When the peer in an NFC-DEP exchange has a
packet to send that is larger than the local
maximum payload, it sets the 'MI' bit in the
'I' PDU. This indicates that NFC-DEP chaining
is to occur.
When such a PDU is received, the local side
responds with an 'ACK' PDU and this continues
until the peer sends an 'I' PDU with the 'MI'
bit cleared. This indicates that the chaining
sequence is complete and the entire packet has
been transferred.
Receiving chained PDUs is currently not supported
by the digital layer so add that support. When a
chaining sequence is initiated by the peer, the
digital layer will allocate an skb large enough
to hold 8 maximum sized frame payloads. The maximum
payload can range from 64 to 254 bytes so 8 * 254 =
2032 seems like a reasonable compromise between
potentially wasting memory and constantly reallocating
new, larger skbs.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When the NFC-DEP code is given a packet to send
that is larger than the peer's maximum payload,
its supposed to set the 'MI' bit in the 'I' PDU's
Protocol Frame Byte (PFB). Setting this bit
indicates that NFC-DEP chaining is to occur.
When NFC-DEP chaining is progress, sender 'I' PDUs
are acknowledged with 'ACK' PDUs until the last 'I'
PDU in the chain (which has the 'MI' bit cleared)
is responded to with a normal 'I' PDU. This can
occur while in Initiator mode or in Target mode.
Sender NFC-DEP chaining is currently not implemented
in the digital layer so add that support. Unfortunately,
since sending a frame may require writing the CRC to the
end of the data, the relevant data part of the original
skb must be copied for each intermediate frame.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The maximum payload for NFC-DEP exchanges (i.e., the
number of bytes between SoD and EoD) is negotiated
using the ATR_REQ, ATR_RES, and PSL_REQ commands.
The valid maximum lengths are 64, 128, 192, and 254
bytes.
Currently, NFC-DEP code assumes that both sides are
always using 254 byte maximums and ignores attempts
by the peer to change it. Instead, implement the
negotiation code, enforce the local maximum when
receiving data from the peer, and don't send payloads
that exceed the remote's maximum. The default local
maximum is 254 bytes.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
NFC-DEP DEP_REQ and DEP_RES exchanges using 'I'
and 'ACK/NACK' PDUs have a sequence number called
the Packet Number Information (PNI). The PNI
is incremented (modulo 4) after every DEP_REQ/
DEP_RES pair and should be verified by the digital
layer code. That verification isn't always done,
though, so add code to make sure that it is done.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
According to chapter 14 of the NFC-DEP Digital
Protocol Spec., the NAD byte should never be
present in DEP_REQ or DEP_RES frames. However,
this is not enforced so add that enforcement code.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When in Target mode, the Initiator specifies whether
subsequent DEP_REQ and DEP_RES frames will include
a DID byte by the value passed in the ATR_REQ. If
the DID value in the ATR_REQ is '0' then no DID
byte will be included. If the DID value is between
'1' and '14' then a DID byte containing the same
value must be included in subsequent DEP_REQ and
DEP_RES frames. Any other DID value is invalid.
This is specified in sections 14.8.1.2 and 14.8.2.2
of the NFC Digital Protocol Spec.
Checking the DID value (if it should be there at all),
is not currently supported by the digital layer's
NFC-DEP code. Add this support by remembering the
DID value in the ATR_REQ, checking the DID value of
received DEP_REQ frames (if it should be there at all),
and including the remembered DID value in DEP_RES
frames when appropriate.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When in Initiator mode, the digital layer's
NFC-DEP code always sets the Device ID (DID)
value in the ATR_REQ to '0'. This means that
subsequent DEP_REQ and DEP_RES frames must
never include a DID byte. This is specified
in sections 14.8.1.1 and 14.8.2.1 of the NFC
Digital Protocol Spec.
Currently, the digital layer's NFC-DEP code
doesn't enforce this rule so add code to ensure
that there is no DID byte in DEP_RES frames.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Rearrange some of the code in digital_in_recv_dep_res()
and digital_tg_recv_dep_req() so the initial code looks
similar. The real reason is prepare the code for some
upcoming patches that require these changes.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When digital_in_send_cmd() or digital_tg_send_cmd()
fail, they do not free the skb that was passed to
them so the routine that allocated the skb should
free it. Currently, there are several routines in
the NFC-DEP code that don't do this so make them.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This option has been marked for deprecation and removal for
a little more than two years, but it's not been very clearly
signalled since it was always possible to just select it.
Make it unselectable now to signal anyone who's still using
it after all this time more clearly. They can still get it
back, but only by patching the kernel.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pull networking fixes from David Miller:
"Several small fixes here:
1) Don't crash in tg3 driver when the number of tx queues has been
configured to be different from the number of rx queues. From
Thadeu Lima de Souza Cascardo.
2) VLAN filter not disabled properly in promisc mode in ixgbe driver,
from Vlad Yasevich.
3) Fix OOPS on dellink op in VTI tunnel driver, from Xin Long.
4) IPV6 GRE driver WCCP code checks skb->protocol for ETH_P_IP
instead of ETH_P_IPV6, whoops. From Yuri Chislov.
5) Socket matching in ping driver is buggy when packet AF does not
match socket's AF. Fix from Jane Zhou.
6) Fix checksum calculation errors in VXLAN due to where the
udp_tunnel6_xmit_skb() helper gets it's saddr/daddr from. From
Alexander Duyck.
7) Fix 5G detection problem in rtlwifi driver, from Larry Finger.
8) Fix NULL deref in tcp_v{4,6}_send_reset, from Eric Dumazet.
9) Various missing netlink attribute verifications in bridging code,
from Thomas Graf.
10) tcp_recvmsg() unconditionally calls ipv4 ip_recv_error even for
ipv6 sockets, whoops. Fix from Willem de Bruijn"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits)
net-timestamp: make tcp_recvmsg call ipv6_recv_error for AF_INET6 socks
bridge: Sanitize IFLA_EXT_MASK for AF_BRIDGE:RTM_GETLINK
bridge: Add missing policy entry for IFLA_BRPORT_FAST_LEAVE
net: Check for presence of IFLA_AF_SPEC
net: Validate IFLA_BRIDGE_MODE attribute length
bridge: Validate IFLA_BRIDGE_FLAGS attribute length
stmmac: platform: fix default values of the filter bins setting
net/mlx4_core: Limit count field to 24 bits in qp_alloc_res
net: dsa: bcm_sf2: reset switch prior to initialization
net: dsa: bcm_sf2: fix unmapping registers in case of errors
tg3: fix ring init when there are more TX than RX channels
tcp: fix possible NULL dereference in tcp_vX_send_reset()
rtlwifi: Change order in device startup
rtlwifi: rtl8821ae: Fix 5G detection problem
Revert "netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpse"
vxlan: Fix boolean flip in VXLAN_F_UDP_ZERO_CSUM6_[TX|RX]
ip6_udp_tunnel: Fix checksum calculation
net-timestamp: Fix a documentation typo
net/ping: handle protocol mismatching scenario
af_packet: fix sparse warning
...
Add a new directory heirarchy under the debugfs sunrpc/ directory:
sunrpc/
rpc_xprt/
<xprt id>/
Within that directory, we can put files that give info about the
xprts. We do have the (minor) problem that there is no succinct,
unique identifier for rpc_xprts. So we generate them synthetically
with a static atomic_t counter.
For now, this directory just holds an "info" file, but we may add
other files to it in the future.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
It's possible to get a dump of the RPC task queue by writing a value to
/proc/sys/sunrpc/rpc_debug. If you write any value to that file, you get
a dump of the RPC client task list into the log buffer. This is a rather
inconvenient interface however, and makes it hard to get immediate info
about the task queue.
Add a new directory hierarchy under debugfs:
sunrpc/
rpc_clnt/
<clientid>/
Within each clientid directory we create a new "tasks" file that will
dump info similar to what shows up in the log buffer, but with a few
small differences -- we avoid printing raw kernel addresses in favor of
symbolic names and the XID is also displayed.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Instead of keeping track of all those special cases where
VLAN interfaces have no bss_conf.chandef, just make sure
they have the same as the AP interface they belong to.
Among others, this fixes a crash getting a VLAN's channel
from userspace since a NULL channel is returned as a good
result (return value 0) for VLANs since the commit below.
Cc: stable@vger.kernel.org [3.18 only]
Fixes: c12bc4885f ("mac80211: return the vif's chandef in ieee80211_cfg_get_channel()")
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
[rewrite commit log]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
One of the cases for an invalid channel definition is that
the channel pointer is NULL, in which case the warning is
a bit late since we'll dereference the pointer. Bail out
of the function upon warning about this.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This resolves linking problems with CONFIG_IPV6=n:
net/built-in.o: In function `redirect_tg6':
xt_REDIRECT.c:(.text+0x6d021): undefined reference to `nf_nat_redirect_ipv6'
Reported-by: Andreas Ruprecht <rupran@einserver.de>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds the missing bits to allow to match per meta l4proto from
the bridge. Example:
nft add rule bridge filter input ether type {ip, ip6} meta l4proto udp counter
Signed-off-by: Alvaro Neira Ayuso <alvaroneay@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch exports the functions nft_reject_iphdr_validate and
nft_reject_ip6hdr_validate to use it in follow up patches.
These functions check if the IPv4/IPv6 header is correct.
Signed-off-by: Alvaro Neira Ayuso <alvaroneay@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
add a __nfct_init_offset annotation member to struct nf_conn to make
it clear which members are covered by the memset when the conntrack
is allocated.
This avoids zeroing timer_list and ct_net; both are already inited
explicitly.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The maximum value for the hitcount parameter is given by
"ip_pkt_list_tot" parameter (default: 20).
Exceeding this value on the command line will cause the rule to be
rejected. The parameter is also readonly, i.e. it cannot be changed
without module unload or reboot.
Store size per table, then base nstamps[] size on the hitcount instead.
The module parameter is retained for backwards compatibility.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The Bluetooth spec states that automatically flushable packets may not
be sent over a LE-U link.
Signed-off-by: Steven Walter <stevenrwalter@gmail.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
These patches various bugfixes and cleanups for using NFS over RDMA, including
better error handling and performance improvements by using pad optimization.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJUdPv2AAoJENfLVL+wpUDrJkMQAKjtPZHLMcj+eHm4f1ZKLJxy
GSrUZV21TU9tL0NVE/5An8US6hoLwHpNXsW8o+gHTAeGRyiCmIaNXGd1Ql/4PYRH
zfzdNXoaJAh1N5iXX11fF3gOWqx/SolqzO2xLDVETK/3lAvq0VwMYoMElBQB6qQW
8sN3z8yVuz/9Ia9oGIFhqu1B6dcKPHkQDMtmsElGxeEX/+9yEg4HUKx+kZDtV0Uj
8/JM8Jh1FKRCQT/P6INkRItdY5KaSJGFc43BkC/8lbugfxa5XCyu/m/qMr9FJsDV
nM6rwaiVcmR/mvD3fL82+Jg/M+P9VUHQ1/Az0sV9G+fEoHH/1Mey3LfMzNpUmf9v
bykrPRuzXkPPQgN1VnjSaF2RF+CWwV9Nme1VVXM/zj8gHX1mcmQF/wPRxDuLjCrt
EObAFsvHOwDTZZmYp9bG5kc6IvwvT8aeeVQMJ4q4PSGD3w8AtoIyJDn+Ee0LFD1K
Zw0oZpTJpI4t7DVxGBSdo2wZWuMU/UKqGqGtGJ+ljXfTRuuq968Q5j5ujaA9vf0v
C9igYTU8hq4teMzhZrfR1jtTWoSS+5zamb1KtvAZy8gsht2PQVgE9xka2k/AV8uE
ul/w5HU4OV+QIrHNbiu7BE8B2Ags6smpdHMqn9fqLBwvG+JEwbWqk1zeTsajxzq+
hkvKkkMq6JjDbsDf96Yk
=YIru
-----END PGP SIGNATURE-----
Merge tag 'nfs-rdma-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma into linux-next
Pull NFS client RDMA changes for 3.19 from Anna Schumaker:
"NFS: Client side changes for RDMA
These patches various bugfixes and cleanups for using NFS over RDMA, including
better error handling and performance improvements by using pad optimization.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>"
* tag 'nfs-rdma-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma:
xprtrdma: Display async errors
xprtrdma: Enable pad optimization
xprtrdma: Re-write rpcrdma_flush_cqs()
xprtrdma: Refactor tasklet scheduling
xprtrdma: unmap all FMRs during transport disconnect
xprtrdma: Cap req_cqinit
xprtrdma: Return an errno from rpcrdma_register_external()
These patches fixes for iostats and SETCLIENTID in addition to cleaning
up the nfs4_init_callback() function.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJUdfhQAAoJENfLVL+wpUDr9x4QANWmG6xjEU7RuIBLalOoxit6
eXnEDNZlwYp6NCkYktHVWaTqXdwq7fdGX+3p4eiwNg3C3SrJHkpWvjeFd4KT/SyP
B/w3vYG3o5H01i3Mb5kCD2uW0gbS9soZXpIg+uHHOl43yZzneC0vPTLhQ/h+9zPc
B32XcgQhvLIHR3LWrC/+uolsa31lcyya0W45PX+iCpzHF7i9qRBrNLODkTlw1hNQ
eEnYIvVy5oW00zlHJUYiTHP3e+0EJn5PAngdYbqiboJ9mK7DbB0QDwqyvJbIT7ql
WAip6cNcJnSv1eiVYqDwlR1ok8drK5X7yQCT3lcLzAMDznLsSAL1Itu0h2Ay3z61
f8XCyTwI0izq0DbdrMcPoqPSitqyM8nkPElnOuitXwzEroPaG40OF67yss3+ixbl
JeQZ+u35pnpCkKUaZdCK3Pn83StxmUaBcFx8eg30NBc0SN13Eiz6aZcGEperClrR
RwMLDUhUtAMcMRunRRxiN9lHafPqqeDeJre7uky0p0sU9CsH+1n5qIKLmk+Ber2d
ZS29TobdR7ktfjQ52XazMAIFzI7r1v4Zn7ziH/WRbvKoKw9ICcyoUGNy9CS5LyWu
BxWCZAJmby5H1i7V7ituJK8TImb81L4aPW06hHX9k+0SoTrgFjas//4jZYfysJ9N
PvR0MRNfMFhuBlsTj7Gy
=PMdS
-----END PGP SIGNATURE-----
Merge tag 'nfs-cel-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma into linux-next
Pull pull additional NFS client changes for 3.19 from Anna Schumaker:
"NFS: Generic client side changes from Chuck
These patches fixes for iostats and SETCLIENTID in addition to cleaning
up the nfs4_init_callback() function.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>"
* tag 'nfs-cel-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma:
NFS: Clean up nfs4_init_callback()
NFS: SETCLIENTID XDR buffer sizes are incorrect
SUNRPC: serialize iostats updates
TCP timestamping introduced MSG_ERRQUEUE handling for TCP sockets.
If the socket is of family AF_INET6, call ipv6_recv_error instead
of ip_recv_error.
This change is more complex than a single branch due to the loadable
ipv6 module. It reuses a pre-existing indirect function call from
ping. The ping code is safe to call, because it is part of the core
ipv6 module and always present when AF_INET6 sockets are active.
Fixes: 4ed2d765 (net-timestamp: TCP timestamping)
Signed-off-by: Willem de Bruijn <willemb@google.com>
----
It may also be worthwhile to add WARN_ON_ONCE(sk->family == AF_INET6)
to ip_recv_error.
Signed-off-by: David S. Miller <davem@davemloft.net>
Only search for IFLA_EXT_MASK if the message actually carries a
ifinfomsg header and validate minimal length requirements for
IFLA_EXT_MASK.
Fixes: 6cbdceeb ("bridge: Dump vlan information from a bridge port")
Cc: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: c2d3babf ("bridge: implement multicast fast leave")
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Payload is currently accessed blindly and may exceed valid message
boundaries.
Fixes: 407af3299 ("bridge: Add netlink interface to configure vlans on bridge ports")
Cc: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Having it as a sub-event for RSSI thresholds is very ugly,
but luckily no userspace actually uses the events yet.
Move the event to its own function call internally and to
its own event attribute in nl80211.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Use standard SKB list APIs associated with struct sk_buff_head to
manage socket outgoing packet chain and name table outgoing packet
chain, having relevant code simpler and more readable.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use standard SKB list APIs associated with struct sk_buff_head to
manage link's receive queue to simplify its relevant code cemplexity.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use standard SKB list APIs associated with struct sk_buff_head to
manage link's deferred queue, simplifying relevant code.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use standard SKB list APIs associated with struct sk_buff_head to
manage link transmission queue, having relevant code more clean.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The pseudo message types of BUNDLE_CLOSED as well as BUNDLE_OPEN are
used to flag whether or not more messages can be bundled into a data
packet in the outgoing transmission queue. Obviously, no more messages
can be appended after the packet has been sent and is waiting to be
acknowledged and deleted. These message types do in reality represent
a send-side local implementation flag, and are not defined as part of
the protocol. It is therefore safe to move it to to where it belongs,
that is, the control area (TIPC_SKB_CB) of the buffer.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In original tipc_link_push_packet(), it pushes messages from protocol
message queue, retransmission queue and next_out queue. But as the two
first queues are removed, we can simplify its relevant code through
deleting tipc_link_push_queue().
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TIPC retransmission queue is intended to record which messages
should be retransmitted when bearer is not congested. However,
as the retransmission queue becomes useless with the removal of
bearer congestion mechanism, it should be removed.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TIPC protocol message queue is intended to save one protocol message
when bearer is congested so that the message stored in the queue can
be immediately transmitted when bearer congestion is released. However,
as now the protocol queue has no mission any more with the removal of
bearer congestion mechanism, it should be removed.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The node subscribe infrastructure represents a virtual base class, so
its users, such as struct tipc_port and struct publication, can derive
its implemented functionalities. However, after the removal of struct
tipc_port, struct publication is left as its only single user now. So
defining an abstract infrastructure for one user becomes no longer
reasonable. If corresponding new functions associated with the
infrastructure are moved to name_table.c file, the node subscription
infrastructure can be removed as well.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The "init_net" test in function addrconf_exit_net is introduced
in commit 44a6bd29 [Create ipv6 devconf-s for namespaces] to avoid freeing
init_net. In commit c900a800 [ipv6: fix bad free of addrconf_init_net],
function addrconf_init_net will allocate memory for every net regardless of
init_net. In this case, it is unnecessary to make "init_net" test.
CC: Hong Zhiguo <honkiko@gmail.com>
CC: Octavian Purdila <opurdila@ixiacom.com>
CC: Pavel Emelyanov <xemul@openvz.org>
CC: Cong Wang <cwang@twopensource.com>
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Zhu Yanjun <Yanjun.Zhu@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change remote checksum offload to call remcsum_adjust. This also
eliminates the optimization to skip an IP header as part of the
adjustment (really does not seem to be much of a win).
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FQ/pacing has a clamp of delay of 125 ms, to avoid some possible harm.
It turns out this delay is too small to allow pacing low rates :
Some ISP setup very aggressive policers as low as 16kbit.
Now TCP stack has spurious rtx prevention, it seems safe to increase
this fixed parameter, without adding a qdisc attribute.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Much of the code can be shared by moving it into helper functions
for the CQM event sending.
Also move the code closer together, even in the header file.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pull nfsd bugfixes from Bruce Fields:
"These fix one mishandling of the case when security labels are
configured out, and two races in the 4.1 backchannel code"
* 'for-3.18' of git://linux-nfs.org/~bfields/linux:
nfsd: Fix slot wake up race in the nfsv4.1 callback code
SUNRPC: Fix locking around callback channel reply receive
nfsd: correctly define v4.2 support attributes
Occasionally mountstats reports a negative retransmission rate.
Ensure that two RPCs completing concurrently don't confuse the sums
in the transport's op_metrics array.
Since pNFS filelayout can invoke rpc_count_iostats() on another
transport from xprt_release(), we can't rely on simply holding the
transport_lock in xprt_release(). There's nothing for it but hard
serialization. One spin lock per RPC operation should make this as
painless as it can be.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
After commit ca777eff51 ("tcp: remove dst refcount false sharing for
prequeue mode") we have to relax check against skb dst in
tcp_v[46]_send_reset() if prequeue dropped the dst.
If a socket is provided, a full lookup was done to find this socket,
so the dst test can be skipped.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=88191
Reported-by: Jaša Bartelj <jasa.bartelj@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Daniel Borkmann <dborkman@redhat.com>
Fixes: ca777eff51 ("tcp: remove dst refcount false sharing for prequeue mode")
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 5195c14c8b.
If the conntrack clashes with an existing one, it is left out of
the unconfirmed list, thus, crashing when dropping the packet and
releasing the conntrack since golden rule is that conntracks are
always placed in any of the existing lists for traceability reasons.
Reported-by: Daniel Borkmann <dborkman@redhat.com>
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=88841
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The UDP checksum calculation for VXLAN tunnels is currently using the
socket addresses instead of the actual packet source and destination
addresses. As a result the checksum calculated is incorrect in some
cases.
Also uh->check was being set twice, first it was set to 0, and then it is
set again in udp6_set_csum. This change removes the redundant assignment
to 0.
Fixes: acbf74a7 ("vxlan: Refactor vxlan driver to make use of the common UDP tunnel functions.")
Cc: Andy Zhou <azhou@nicira.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An async error upcall is a hard error, and should be reported in
the system log.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The Linux NFS/RDMA server used to reject NFSv3 WRITE requests when
pad optimization was enabled. That bug was fixed by commit
e560e3b510 ("svcrdma: Add zero padding if the client doesn't send
it").
We can now enable pad optimization on the client, which helps
performance and is supported now by both Linux and Solaris servers.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Currently rpcrdma_flush_cqs() attempts to avoid code duplication,
and simply invokes rpcrdma_recvcq_upcall and rpcrdma_sendcq_upcall.
1. rpcrdma_flush_cqs() can run concurrently with provider upcalls.
Both flush_cqs() and the upcalls were invoking ib_poll_cq() in
different threads using the same wc buffers (ep->rep_recv_wcs
and ep->rep_send_wcs), added by commit 1c00dd0776 ("xprtrmda:
Reduce calls to ib_poll_cq() in completion handlers").
During transport disconnect processing, this sometimes resulted
in the same reply getting added to the rpcrdma_tasklets_g list
more than once, which corrupted the list.
2. The upcall functions drain only a limited number of CQEs,
thanks to the poll budget added by commit 8301a2c047
("xprtrdma: Limit work done by completion handler").
Fixes: a7bc211ac9 ("xprtrdma: On disconnect, don't ignore ... ")
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=276
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Restore the separate function that schedules the reply handling
tasklet. I need to call it from two different paths.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
When using RPCRDMA_MTHCAFMR memory registration, after a few
transport disconnect / reconnect cycles, ib_map_phys_fmr() starts to
return EINVAL because the provider has exhausted its map pool.
Make sure that all FMRs are unmapped during transport disconnect,
and that ->send_request remarshals them during an RPC retransmit.
This resets the transport's MRs to ensure that none are leaked
during a disconnect.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Recent work made FRMR registration and invalidation completions
unsignaled. This greatly reduces the adapter interrupt rate.
Every so often, however, a posted send Work Request is allowed to
signal. Otherwise, the provider's Work Queue will wrap and the
workload will hang.
The number of Work Requests that are allowed to remain unsignaled is
determined by the value of req_cqinit. Currently, this is set to the
size of the send Work Queue divided by two, minus 1.
For FRMR, the send Work Queue is the maximum number of concurrent
RPCs (currently 32) times the maximum number of Work Requests an
RPC might use (currently 7, though some adapters may need more).
For mlx4, this is 224 entries. This leaves completion signaling
disabled for 111 send Work Requests.
Some providers hold back dispatching Work Requests until a CQE is
generated. If completions are disabled, then no CQEs are generated
for quite some time, and that can stall the Work Queue.
I've seen this occur running xfstests generic/113 over NFSv4, where
eventually, posting a FAST_REG_MR Work Request fails with -ENOMEM
because the Work Queue has overflowed. The connection is dropped
and re-established.
Cap the rep_cqinit setting so completions are not left turned off
for too long.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=269
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The RPC/RDMA send_request method and the chunk registration code
expects an errno from the registration function. This allows
the upper layers to distinguish between a recoverable failure
(for example, temporary memory exhaustion) and a hard failure
(for example, a bug in the registration logic).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Commit e1bd95bf7c ("crypto: algif - zeroize IV buffer") and
2a6af25bef ("crypto: algif - zeroize message digest buffer")
added memzero_explicit() calls on buffers that are later on
passed back to sock_kfree_s().
This is a discussed follow-up that, instead, extends the sock
API and adds sock_kzfree_s(), which internally uses kzfree()
instead of kfree() for passing the buffers back to slab.
Having sock_kzfree_s() allows to keep the changes more minimal
by just having a drop-in replacement instead of adding
memzero_explicit() calls everywhere before sock_kfree_s().
In kzfree(), the compiler is not allowed to optimize the memset()
away and thus there's no need for memzero_explicit(). Both,
sock_kfree_s() and sock_kzfree_s() are wrappers for
__sock_kfree_s() and call into kfree() resp. kzfree(); here,
__sock_kfree_s() needs to be explicitly inlined as we want the
compiler to optimize the call and condition away and thus it
produces e.g. on x86_64 the _same_ assembler output for
sock_kfree_s() before and after, and thus also allows for
avoiding code duplication.
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
If there are no channels allowing 80 MHz to be used, then the
station isn't really VHT capable even if the driver and device
support it in general. In this case, exclude the VHT capability
IE from probe request frames.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We have a channel pointer, and we use its center frequency
to look up a channel pointer - which will thus be exactly
the same as the original pointer.
Remove that pointless lookup and just use the pointer.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
It's always set to the same value as CONFIG_TRACEPOINTS, so we can just
use that instead.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
It's always set to whatever CONFIG_SUNRPC_DEBUG is, so just use that.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
ping_lookup() may return a wrong sock if sk_buff's and sock's protocols
dont' match. For example, sk_buff's protocol is ETH_P_IPV6, but sock's
sk_family is AF_INET, in that case, if sk->sk_bound_dev_if is zero, a wrong
sock will be returned.
the fix is to "continue" the searching, if no matching, return NULL.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Jane Zhou <a17711@motorola.com>
Signed-off-by: Yiwei Zhao <gbjc64@motorola.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
af_packet produces lots of these:
net/packet/af_packet.c:384:39: warning: incorrect type in return expression (different modifiers)
net/packet/af_packet.c:384:39: expected struct page [pure] *
net/packet/af_packet.c:384:39: got struct page *
this seems to be because sparse does not realize that _pure
refers to function, not the returned pointer.
Tweak code slightly to avoid the warning.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using GRE redirection in WCCP, it sets the wrong skb->protocol,
that is, ETH_P_IP instead of ETH_P_IPV6 for the encapuslated traffic.
Fixes: c12b395a46 ("gre: Support GRE over IPv6")
Cc: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: Yuri Chislov <yuri.chislov@gmail.com>
Tested-by: Yuri Chislov <yuri.chislov@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix sparse warnings about non-static declaration of static functions
in the new tipc netlink API.
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
netfilter/ipvs updates for net-next
The following patchset contains Netfilter updates for your net-next
tree, this includes the NAT redirection support for nf_tables, the
cgroup support for nft meta and conntrack zone support for the connlimit
match. Coming after those, a bunch of sparse warning fixes, missing
netns bits and cleanups. More specifically, they are:
1) Prepare IPv4 and IPv6 NAT redirect code to use it from nf_tables,
patches from Arturo Borrero.
2) Introduce the nf_tables redir expression, from Arturo Borrero.
3) Remove an unnecessary assignment in ip_vs_xmit/__ip_vs_get_out_rt().
Patch from Alex Gartrell.
4) Add nft_log_dereference() macro to the nf_log infrastructure, patch
from Marcelo Leitner.
5) Add some extra validation when registering logger families, also
from Marcelo.
6) Some spelling cleanups from stephen hemminger.
7) Fix sparse warning in nf_logger_find_get().
8) Add cgroup support to nf_tables meta, patch from Ana Rey.
9) A Kconfig fix for the new redir expression and fix sparse warnings in
the new redir expression.
10) Fix several sparse warnings in the netfilter tree, from
Florian Westphal.
11) Reduce verbosity when OOM in nfnetlink_log. User can basically do
nothing when this situation occurs.
12) Add conntrack zone support to xt_connlimit, again from Florian.
13) Add netnamespace support to the h323 conntrack helper, contributed
by Vasily Averin.
14) Remove unnecessary nul-pointer checks before free_percpu() and
module_put(), from Markus Elfring.
15) Use pr_fmt in nfnetlink_log, again patch from Marcelo Leitner.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add tracepoints inside the main loop on xs_tcp_data_recv that allow
us to keep an eye on what's happening during each phase of it.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
...so we can keep track of when calls are sent and replies received.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
...just around svc_send, svc_recv and svc_process for now.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The supported bandwidth field is a two-bit field, not a bitmap,
so treat it accordingly when disabling 80+80 or 160 MHz.
Note that we can only advertise "80+80 and 160" or "160", not
"80+80" by itself, so disabling 160 also disables 80+80.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
... and make it handle multi-segment iovecs - deals with that
"fix this later" issue for free. A bit of shame, really - it
had been there since 2.3.15pre3 when the whole thing went into the
tree, practically a historical artefact by now...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now the vti_link_ops do not point the .dellink, for fb tunnel device
(ip_vti0), the net_device will be removed as the default .dellink is
unregister_netdevice_queue,but the tunnel still in the tunnel list,
then if we add a new vti tunnel, in ip_tunnel_find():
hlist_for_each_entry_rcu(t, head, hash_node) {
if (local == t->parms.iph.saddr &&
remote == t->parms.iph.daddr &&
link == t->parms.link &&
==> type == t->dev->type &&
ip_tunnel_key_match(&t->parms, flags, key))
break;
}
the panic will happen, cause dev of ip_tunnel *t is null:
[ 3835.072977] IP: [<ffffffffa04103fd>] ip_tunnel_find+0x9d/0xc0 [ip_tunnel]
[ 3835.073008] PGD b2c21067 PUD b7277067 PMD 0
[ 3835.073008] Oops: 0000 [#1] SMP
.....
[ 3835.073008] Stack:
[ 3835.073008] ffff8800b72d77f0 ffffffffa0411924 ffff8800bb956000 ffff8800b72d78e0
[ 3835.073008] ffff8800b72d78a0 0000000000000000 ffffffffa040d100 ffff8800b72d7858
[ 3835.073008] ffffffffa040b2e3 0000000000000000 0000000000000000 0000000000000000
[ 3835.073008] Call Trace:
[ 3835.073008] [<ffffffffa0411924>] ip_tunnel_newlink+0x64/0x160 [ip_tunnel]
[ 3835.073008] [<ffffffffa040b2e3>] vti_newlink+0x43/0x70 [ip_vti]
[ 3835.073008] [<ffffffff8150d4da>] rtnl_newlink+0x4fa/0x5f0
[ 3835.073008] [<ffffffff812f68bb>] ? nla_strlcpy+0x5b/0x70
[ 3835.073008] [<ffffffff81508fb0>] ? rtnl_link_ops_get+0x40/0x60
[ 3835.073008] [<ffffffff8150d11f>] ? rtnl_newlink+0x13f/0x5f0
[ 3835.073008] [<ffffffff81509cf4>] rtnetlink_rcv_msg+0xa4/0x270
[ 3835.073008] [<ffffffff8126adf5>] ? sock_has_perm+0x75/0x90
[ 3835.073008] [<ffffffff81509c50>] ? rtnetlink_rcv+0x30/0x30
[ 3835.073008] [<ffffffff81529e39>] netlink_rcv_skb+0xa9/0xc0
[ 3835.073008] [<ffffffff81509c48>] rtnetlink_rcv+0x28/0x30
....
modprobe ip_vti
ip link del ip_vti0 type vti
ip link add ip_vti0 type vti
rmmod ip_vti
do that one or more times, kernel will panic.
fix it by assigning ip_tunnel_dellink to vti_link_ops' dellink, in
which we skip the unregister of fb tunnel device. do the same on ip6_vti.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change has no functional impact and simply addresses some coding
style issues detected by checkpatch. Specifically this change
adjusts "if" statements which also include the assignment of a
variable.
No changes to the resultant object files result as determined by objdiff.
Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds SKB_GSO_TCPV4 to the list of supported GSO types handled by
the IPv6 GSO offloads. Without this change VXLAN tunnels running over IPv6
do not currently handle IPv4 TCP TSO requests correctly and end up handing
the non-segmented frame off to the device.
Below is the before and after for a simple netperf TCP_STREAM test between
two endpoints tunneling IPv4 over a VXLAN tunnel running on IPv6 on top of
a 1Gb/s network adapter.
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 10.29 0.88 Before
87380 16384 16384 10.03 895.69 After
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ieee802154/fakehard.c
A bug fix went into 'net' for ieee802154/fakehard.c, which is removed
in 'net-next'.
Add build fix into the merge from Stephen Rothwell in openvswitch, the
logging macros take a new initial 'log' argument, a new call was added
in 'net' so when we merge that in here we have to explicitly add the
new 'log' arg to it else the build fails.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Fix BUG when decrypting empty packets in mac80211, from Ronald Wahl.
2) nf_nat_range is not fully initialized and this is copied back to
userspace, from Daniel Borkmann.
3) Fix read past end of b uffer in netfilter ipset, also from Dan
Carpenter.
4) Signed integer overflow in ipv4 address mask creation helper
inet_make_mask(), from Vincent BENAYOUN.
5) VXLAN, be2net, mlx4_en, and qlcnic need ->ndo_gso_check() methods to
properly describe the device's capabilities, from Joe Stringer.
6) Fix memory leaks and checksum miscalculations in openvswitch, from
Pravin B SHelar and Jesse Gross.
7) FIB rules passes back ambiguous error code for unreachable routes,
making behavior confusing for userspace. Fix from Panu Matilainen.
8) ieee802154fake_probe() doesn't release resources properly on error,
from Alexey Khoroshilov.
9) Fix skb_over_panic in add_grhead(), from Daniel Borkmann.
10) Fix access of stale slave pointers in bonding code, from Nikolay
Aleksandrov.
11) Fix stack info leak in PPP pptp code, from Mathias Krause.
12) Cure locking bug in IPX stack, from Jiri Bohac.
13) Revert SKB fclone memory freeing optimization that is racey and can
allow accesses to freed up memory, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (71 commits)
tcp: Restore RFC5961-compliant behavior for SYN packets
net: Revert "net: avoid one atomic operation in skb_clone()"
virtio-net: validate features during probe
cxgb4 : Fix DCB priority groups being returned in wrong order
ipx: fix locking regression in ipx_sendmsg and ipx_recvmsg
openvswitch: Don't validate IPv6 label masks.
pptp: fix stack info leak in pptp_getname()
brcmfmac: don't include linux/unaligned/access_ok.h
cxgb4i : Don't block unload/cxgb4 unload when remote closes TCP connection
ipv6: delete protocol and unregister rtnetlink when cleanup
net/mlx4_en: Add VXLAN ndo calls to the PF net device ops too
bonding: fix curr_active_slave/carrier with loadbalance arp monitoring
mac80211: minstrel_ht: fix a crash in rate sorting
vxlan: Inline vxlan_gso_check().
can: m_can: update to support CAN FD features
can: m_can: fix incorrect error messages
can: m_can: add missing delay after setting CCCR_INIT bit
can: m_can: fix not set can_dlc for remote frame
can: m_can: fix possible sleep in napi poll
can: m_can: add missing message RAM initialization
...
John W. Linville says:
====================
pull request: wireless-next 2014-11-21
Please pull this batch of updates intended for the 3.19 stream...
For the mac80211 bits, Johannes says:
"It has been a while since my last pull request, so we accumulated
another relatively large set of changes:
* TDLS off-channel support set from Arik/Liad, with some support
patches I did
* custom regulatory fixes from Arik
* minstrel VHT fix (and a small optimisation) from Felix
* add back radiotap vendor namespace support (myself)
* random MAC address scanning for cfg80211/mac80211/hwsim (myself)
* CSA improvements (Luca)
* WoWLAN Net Detect (wake on network found) support (Luca)
* and lots of other smaller changes from many people"
For the Bluetooth bits, Johan says:
"Here's another set of patches for 3.19. Most of it is again fixes and
cleanups to ieee802154 related code from Alexander Aring. We've also got
better handling of hardware error events along with a proper API for HCI
drivers to notify the HCI core of such situations. There's also a minor
fix for mgmt events as well as a sparse warning fix. The code for
sending HCI commands synchronously also gets a fix where we might loose
the completion event in the case of very fast HW (particularly easily
reproducible with an emulated HCI device)."
And...
"Here's another bluetooth-next pull request for 3.19. We've got:
- Various fixes, cleanups and improvements to ieee802154/mac802154
- Support for a Broadcom BCM20702A1 variant
- Lots of lockdep fixes
- Fixed handling of LE CoC errors that should trigger SMP"
For the Atheros bits, Kalle says:
"One ath6kl patch and rest for ath10k, but nothing really major which
stands out. Most notable:
o fix resume (Bartosz)
o firmware restart is now faster and more reliable (Michal)
o it's now possible to test hardware restart functionality without
crashing the firmware using hw-restart parameter with
simulate_fw_crash debugfs file (Michal)"
On top of that...both ath9k and mwifiex get their usual level of
updates. Of note is the ath9k spectral scan work from Oleksij Rempel.
I also pulled from the wireless tree in order to avoid some merge issues.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit a6111d3c "vlan: Pass SIOC[SG]HWTSTAMP ioctls to real device"
intended to enable hardware time stamping on VLAN interfaces, but
passing SIOCSHWTSTAMP is only half of the story. This patch adds
the second half, by letting user space find out the time stamping
capabilities of the device backing a VLAN interface.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit c3ae62af8e ("tcp: should drop incoming frames without ACK
flag set") was created to mitigate a security vulnerability in which a
local attacker is able to inject data into locally-opened sockets by
using TCP protocol statistics in procfs to quickly find the correct
sequence number.
This broke the RFC5961 requirement to send a challenge ACK in response
to spurious RST packets, which was subsequently fixed by commit
7b514a886b ("tcp: accept RST without ACK flag").
Unfortunately, the RFC5961 requirement that spurious SYN packets be
handled in a similar manner remains broken.
RFC5961 section 4 states that:
... the handling of the SYN in the synchronized state SHOULD be
performed as follows:
1) If the SYN bit is set, irrespective of the sequence number, TCP
MUST send an ACK (also referred to as challenge ACK) to the remote
peer:
<SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>
After sending the acknowledgment, TCP MUST drop the unacceptable
segment and stop processing further.
By sending an ACK, the remote peer is challenged to confirm the loss
of the previous connection and the request to start a new connection.
A legitimate peer, after restart, would not have a TCB in the
synchronized state. Thus, when the ACK arrives, the peer should send
a RST segment back with the sequence number derived from the ACK
field that caused the RST.
This RST will confirm that the remote peer has indeed closed the
previous connection. Upon receipt of a valid RST, the local TCP
endpoint MUST terminate its connection. The local TCP endpoint
should then rely on SYN retransmission from the remote end to
re-establish the connection.
This patch lets SYN packets through the discard added in c3ae62af8e,
so that spurious SYN packets are properly dealt with as per the RFC.
The challenge ACK is sent unconditionally and is rate-limited, so the
original vulnerability is not reintroduced by this patch.
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Not sure what I was thinking, but doing anything after
releasing a refcount is suicidal or/and embarrassing.
By the time we set skb->fclone to SKB_FCLONE_FREE, another cpu
could have released last reference and freed whole skb.
We potentially corrupt memory or trap if CONFIG_DEBUG_PAGEALLOC is set.
Reported-by: Chris Mason <clm@fb.com>
Fixes: ce1a4ea3f1 ("net: avoid one atomic operation in skb_clone()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TIPC_NL_NAME_TABLE_GET command to the new tipc netlink API.
This command supports dumping the name table of all nodes.
Netlink logical layout of name table response message:
-> name table
-> publication
-> type
-> lower
-> upper
-> scope
-> node
-> ref
-> key
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TIPC_NL_NET_SET command to the new tipc netlink API.
This command can set the network id and network (tipc) address.
Netlink logical layout of network set message:
-> net
[ -> id ]
[ -> address ]
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TIPC_NL_NET_GET command to the new tipc netlink API.
This command dumps the network id of the node.
Netlink logical layout of returned network data:
-> net
-> id
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TIPC_NL_NODE_GET to the new tipc netlink API.
This command can dump the address and node status of all nodes in the
tipc cluster.
Netlink logical layout of returned node/address data:
-> node
-> address
-> up flag
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TIPC_NL_MEDIA_SET command to the new tipc netlink API.
This command can set one or more link properties for a particular
media.
Netlink logical layout of bearer set message:
-> media
-> name
-> link properties
[ -> tolerance ]
[ -> priority ]
[ -> window ]
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TIPC_NL_MEDIA_GET command to the new tipc netlink API.
This command supports dumping all information about all defined
media as well as getting all information about a specific media.
The information about a media includes name and link properties.
Netlink logical layout of media get response message:
-> media
-> name
-> link properties
-> tolerance
-> priority
-> window
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TIPC_NL_LINK_RESET_STATS command to the new netlink API.
This command resets the link statistics for a particular link.
Netlink logical layout of link reset message:
-> link
-> name
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TIPC_NL_LINK_SET to the new tipc netlink API.
This command can set one or more link properties for a particular
link.
Netlink logical layout of link set message:
-> link
-> name
-> properties
[ -> tolerance ]
[ -> priority ]
[ -> window ]
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TIPC_NL_LINK_GET command to the new tipc netlink API.
This command supports dumping all information about all links
(including the broadcast link) or getting all information about a
specific link (not the broadcast link).
The information about a link includes name, transmission info,
properties and link statistics.
As the tipc broadcast link is special we unfortunately have to treat
it specially. It is a deliberate decision not to abstract the
broadcast link on this (API) level.
Netlink logical layout of link response message:
-> port
-> name
-> MTU
-> RX
-> TX
-> up flag
-> active flag
-> properties
-> priority
-> tolerance
-> window
-> statistics
-> rx_info
-> rx_fragments
-> rx_fragmented
-> rx_bundles
-> rx_bundled
-> tx_info
-> tx_fragments
-> tx_fragmented
-> tx_bundles
-> tx_bundled
-> msg_prof_tot
-> msg_len_cnt
-> msg_len_tot
-> msg_len_p0
-> msg_len_p1
-> msg_len_p2
-> msg_len_p3
-> msg_len_p4
-> msg_len_p5
-> msg_len_p6
-> rx_states
-> rx_probes
-> rx_nacks
-> rx_deferred
-> tx_states
-> tx_probes
-> tx_nacks
-> tx_acks
-> retransmitted
-> duplicates
-> link_congs
-> max_queue
-> avg_queue
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TIPC_NL_PUBL_GET command to the new tipc netlink API.
This command supports dumping of all publications for a specific
socket.
Netlink logical layout of request message:
-> socket
-> reference
Netlink logical layout of response message:
-> publication
-> type
-> lower
-> upper
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TIPC_NL_SOCK_GET command to the new tipc netlink API.
This command supports dumping of all available sockets with their
associated connection or publication(s). It could be extended to reply
with a single socket if the NLM_F_DUMP isn't set.
The information about a socket includes reference, address, connection
information / publication information.
Netlink logical layout of response message:
-> socket
-> reference
-> address
[
-> connection
-> node
-> socket
[
-> connected flag
-> type
-> instance
]
]
[
-> publication flag
]
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TIPC_NL_BEARER_SET command to the new tipc netlink API.
This command can set one or more link properties for a particular
bearer.
Netlink logical layout of bearer set message:
-> bearer
-> name
-> link properties
[ -> tolerance ]
[ -> priority ]
[ -> window ]
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TIPC_NL_BEARER_GET command to the new tipc netlink API.
This command supports dumping all data about all bearers or getting
all information about a specific bearer.
The information about a bearer includes name, link priorities and
domain.
Netlink logical layout of bearer get message:
-> bearer
-> name
Netlink logical layout of returned bearer information:
-> bearer
-> name
-> link properties
-> priority
-> tolerance
-> window
-> domain
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A new netlink API for tipc that can disable or enable a tipc bearer.
The new API is separated from the old API because of a bug in the
user space client (tipc-config). The problem is that older versions
of tipc-config has a very low receive limit and adding commands to
the legacy genl_opts struct causes the ctrl_getfamily() response
message to grow, subsequently breaking the tool.
The new API utilizes netlink policies for input validation. Where the
top-level netlink attributes are tipc-logical entities, like bearer.
The top level entities then contain nested attributes. In this case
a name, nested link properties and a domain.
Netlink commands implemented in this patch:
TIPC_NL_BEARER_ENABLE
TIPC_NL_BEARER_DISABLE
Netlink logical layout of bearer enable message:
-> bearer
-> name
[ -> domain ]
[
-> properties
-> priority
]
Netlink logical layout of bearer disable message:
-> bearer
-> name
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's just silly to hold the skb destructor argument around inside
skb->cb[] as we currently do in SCTP.
Nowadays, we're sort of cheating on data accounting in the sense
that due to commit 4c3a5bdae2 ("sctp: Don't charge for data in
sndbuf again when transmitting packet"), we orphan the skb already
in the SCTP output path, i.e. giving back charged data memory, and
use a different destructor only to make sure the sk doesn't vanish
on skb destruction time. Thus, cb[] is still valid here as we
operate within the SCTP layer. (It's generally actually a big
candidate for future rework, imho.)
However, storing the destructor in the cb[] can easily cause issues
should an non sctp_packet_set_owner_w()'ed skb ever escape the SCTP
layer, since cb[] may get overwritten by lower layers and thus can
corrupt the chunk pointer. There are no such issues at present,
but lets keep the chunk in destructor_arg, as this is the actual
purpose for it.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When sending packets out with PF_PACKET, SOCK_RAW, ensure that the
packet is at least as long as the device's expected link layer header.
This check already exists in tpacket_snd, but not in packet_snd.
Also rate limit the warning in tpacket_snd.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This tc action allows to work with vlan tagged skbs. Two supported
sub-actions are header pop and header push.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So it can be used from out of openvswitch code.
Did couple of cosmetic changes on the way, namely variable naming and
adding support for 8021AD proto.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
note that skb_make_writable already exists in net/netfilter/core.c
but does something slightly different.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use them to push skb->vlan_tci into the payload and avoid code
duplication.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Name fits better. Plus there's going to be introduced
__vlan_insert_tag later on.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Always returns the same skb it gets, so change to void.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace duplicated code by calling skb_postpull_rcsum
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains two bugfixes for your net tree, they are:
1) Validate netlink group from nfnetlink to avoid an out of bound array
access. This should only happen with superuser priviledges though.
Discovered by Andrey Ryabinin using trinity.
2) Don't push ethernet header before calling the netfilter output hook
for multicast traffic, this breaks ebtables since it expects to see
skb->data pointing to the network header, patch from Linus Luessing.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
pull request: wireless 2014-11-20
Please full this little batch of fixes intended for the 3.18 stream!
For the mac80211 patch, Johannes says:
"Here's another last minute fix, for minstrel HT crashing
depending on the value of some uninitialised stack."
On top of that...
Ben Greear fixes an ath9k regression in which a BSSID mask is
miscalculated.
Dmitry Torokhov corrects an error handling routing in brcmfmac which
was checking an unsigned variable for a negative value.
Johannes Berg avoids a build problem in brcmfmac for arches where
linux/unaligned/access_ok.h and asm/unaligned.h conflict.
Mathy Vanhoef addresses another brcmfmac issue so as to eliminate a
use-after-free of the URB transfer buffer if a timeout occurs.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes an old regression introduced by commit
b0d0d915 (ipx: remove the BKL).
When a recvmsg syscall blocks waiting for new data, no data can be sent on the
same socket with sendmsg because ipx_recvmsg() sleeps with the socket locked.
This breaks mars-nwe (NetWare emulator):
- the ncpserv process reads the request using recvmsg
- ncpserv forks and spawns nwconn
- ncpserv calls a (blocking) recvmsg and waits for new requests
- nwconn deadlocks in sendmsg on the same socket
Commit b0d0d915 has simply replaced BKL locking with
lock_sock/release_sock. Unlike now, BKL got unlocked while
sleeping, so a blocking recvmsg did not block a concurrent
sendmsg.
Only keep the socket locked while actually working with the socket data and
release it prior to calling skb_recv_datagram().
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
When userspace doesn't provide a mask, OVS datapath generates a fully
unwildcarded mask for the flow by copying the flow and setting all bits
in all fields. For IPv6 label, this creates a mask that matches on the
upper 12 bits, causing the following error:
openvswitch: netlink: Invalid IPv6 flow label value (value=ffffffff, max=fffff)
This patch ignores the label validation check for masks, avoiding this
error.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
another relatively large set of changes:
* TDLS off-channel support set from Arik/Liad, with some support
patches I did
* custom regulatory fixes from Arik
* minstrel VHT fix (and a small optimisation) from Felix
* add back radiotap vendor namespace support (myself)
* random MAC address scanning for cfg80211/mac80211/hwsim (myself)
* CSA improvements (Luca)
* WoWLAN Net Detect (wake on network found) support (Luca)
* and lots of other smaller changes from many people
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJUbghkAAoJEDBSmw7B7bqroosP/RABYXUMua+k3Ccq7T4eU4jV
AEO2p2gt5nHBzEl1NCJtdUTJkZ6ftT7ehvAkV2zboB0FBoAoBTbZ8YDtcBBiWaY6
wJ5TYuOl6LFo7csAxWxpCzUxW3M+iq26itpyW9Zt9WWxP4QLSNPFyXEXV3SEh45n
HcDW9A0SE+mgdaTQ2LEMBJ5XWxG/Ic7i9Xn6Py3o4x7NsTB4EqFNOD0WXcPCq7M0
H8xlsIYIBYoGNMsV/2Nu7CEgcSXfDLqWcs9uPHQMCvWPjx/vIoEyOgTwJlE9bQHh
2tloc1LBP6XKQ6g2bJ/pBaQVnZGugcOJhD6KUq3ckNm9qIP1ZtRmJslH4V6pUSCS
eGl3TcAKSSE4BWIa7AaETWXeoH4X68dO7PM7pnflQRCQLzCJRbDWCdqjBst/AxBT
6hvAFAvExEcWBkNVSTJ2egRk/C9cDFKRaCWQ1h4wX9yvh+8efe1D0DDWLW9a9qv1
LsoGJE72BZdXn2CaQEME+CjTd3fWmn6u729d/c863cq2kspCSOof0QD0X9uWhBUx
BvqtgbQjGZzAvHFcjBd7yRd5hz0aDfLyBL59bq2IBzaU1QmyekNPqzSMSD+5ZlCp
uxEeE5AY2+pcNZV1KRtkvgAByfUgAVd0FHZcVb8SIM6QZ3IhqiOuzxuXtxv6hrYP
V/76B+ath4Sv1IPF56ex
=q4e6
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-john-2014-11-20' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg <johannes@sipsolutions.net> says:
"It has been a while since my last pull request, so we accumulated
another relatively large set of changes:
* TDLS off-channel support set from Arik/Liad, with some support
patches I did
* custom regulatory fixes from Arik
* minstrel VHT fix (and a small optimisation) from Felix
* add back radiotap vendor namespace support (myself)
* random MAC address scanning for cfg80211/mac80211/hwsim (myself)
* CSA improvements (Luca)
* WoWLAN Net Detect (wake on network found) support (Luca)
* and lots of other smaller changes from many people"
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The functions free_percpu() and module_put() test whether their argument
is NULL and then return immediately. Thus the test around the call is
not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Currently we can't lookup tunnels with wildcard endpoints.
This patch adds a method to lookup these tunnels in the
receive path.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
pim6_protocol was added when initiation, but it not deleted.
Similarly, unregister RTNL_FAMILY_IP6MR rtnetlink.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Reviewed-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
use {compat_,}rw_copy_check_uvector(). As the result, we are
guaranteed that all iovecs seen in ->msg_iov by ->sendmsg()
and ->recvmsg() will pass access_ok().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Kernel-side struct msghdr is (currently) using the same layout as
userland one, but it's not a one-to-one copy - even without considering
32bit compat issues, we have msg_iov, msg_name and msg_control copied
to kernel[1]. It's fairly localized, so we get away with a few functions
where that knowledge is needed (and we could shrink that set even
more). Pretty much everything deals with the kernel-side variant and
the few places that want userland one just use a bunch of force-casts
to paper over the differences.
The thing is, kernel-side definition of struct msghdr is *not* exposed
in include/uapi - libc doesn't see it, etc. So we can add struct user_msghdr,
with proper annotations and let the few places that ever deal with those
beasts use it for userland pointers. Saner typechecking aside, that will
allow to change the layout of kernel-side msghdr - e.g. replace
msg_iov/msg_iovlen there with struct iov_iter, getting rid of the need
to modify the iovec as we copy data to/from it, etc.
We could introduce kernel_msghdr instead, but that would create much more
noise - the absolute majority of the instances would need to have the
type switched to kernel_msghdr and definition of struct msghdr in
include/linux/socket.h is not going to be seen by userland anyway.
This commit just introduces user_msghdr and switches the few places that
are dealing with userland-side msghdr to it.
[1] actually, it's even trickier than that - we copy msg_control for
sendmsg, but keep the userland address on recvmsg.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
depending on the value of some uninitialised stack.
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJUa70EAAoJEDBSmw7B7bqr40gQAMgNUKILuYTyHZaKR7teV8aG
uVFzrMdmNjCZIsxAuH1X8VNOJZEAB+Ro+lTybNjC/tTPyA+Yu3lDCQjSYiE8UOFk
mDrNFG0tjZHGn3axceLIhujfPqXEks/wQ98tDEJU2eMLxS0nClwyTXjhc1IxzUBO
/L2HlhGWADRZ6FIV7yhRO33KN1l3Q8oxW79ybJq/5JCcclfGBt3OpYvi7AW6M4iK
qfgTfS6GbpcRRc4+HVA/cgYOFaRmMI71bvqMgvh8MgmqfQeRw+WSOaUiLx795CsR
OKrIQgdpVp2JHwHkFXIgBy+EBBbXtCozvxKX3u+YKPabj4GC/6dkT370nuvv9hkz
IzTxt+yNBTqJawum7vM6WypgT4wCZOR0JbY0vDW5jYLJnDfGJkYFbny7WtRtCVf7
xWzIixkyWLNFs6swbRdIKoU5GNw2fNe7az4fRoUqNtZGSPHiK7oD5sEkr81hWUV5
5XsifngqekfdVvIGTckpy2SAikxgkrtHOQFIGJz+PN1CjvDT++66Opr3AHnwWeSP
IAjmoVOCiyIBzXlcsZJmAfPa4Hvk+NAPRKCx6N25q/+ESpOZRARWFRV3VuPeGzQW
+SiC4V9VYWHvW9VPJ37WkAFYv5tPaLLyURdU7p6MSwuc0RCPgtcvXty2QFbjQLdf
k0apC9DZydQPF2kre6Pv
=L2Jb
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-john-2014-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg <johannes@sipsolutions.net> says:
"Here's another last minute fix, for minstrel HT crashing
depending on the value of some uninitialised stack."
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The __module_get() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The proc_remove() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
While working on sk_forward_alloc problems reported by Denys
Fedoryshchenko, we found that tcp connect() (and fastopen) do not call
sk_wmem_schedule() for SYN packet (and/or SYN/DATA packet), so
sk_forward_alloc is negative while connect is in progress.
We can fix this by calling regular sk_stream_alloc_skb() both for the
SYN packet (in tcp_connect()) and the syn_data packet in
tcp_send_syn_data()
Then, tcp_send_syn_data() can avoid copying syn_data as we simply
can manipulate syn_data->cb[] to remove SYN flag (and increment seq)
Instead of open coding memcpy_fromiovecend(), simply use this helper.
This leaves in socket write queue clean fast clone skbs.
This was tested against our fastopen packetdrill tests.
Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check the queue mapping earlier, skb->queue_mapping is more likely than
skb->data to still be in d-cache.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This allows drivers with a firmware or chip-based rate lookup table to
use the most recent default rate selection without having to get it from
per-packet data or explicit ieee80211_get_tx_rate calls
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Let the other listeners being notified when a new or del interface
command has been issued, thus reducing later necessary request to be in
sync with current context.
Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Replace NL80211_ATTR_IFACE_SOCKET_OWNER attribute with more generic
NL80211_ATTR_SOCKET_OWNER that can be used with other commands
that interface creation.
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Callback add_virtual_intf is supposed to return ERR_PTR and trying to
return NULL results in some "Unable to handle kernel paging request",
etc. As it may be complicated to debug & trace, let's catch it (WARN).
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Explicitly initialize the DFS state and beacon found state when handling
channels in the custom regulatory path.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Acked-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some channels fields were not being updated in the custom regulatory
path. Update them according to the code in handle_channel().
Signed-off-by: Jonathan Doron <jonathanx.doron@intel.com>
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Acked-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The rate mask code currently assumes that a rate is legacy if
IEEE80211_TX_RC_MCS is not set. This might be the cause of bogus VHT
rates being reported with minstrel_ht.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When 20MHz chandef is used, 40MHz rates shouldn't be
used (by the rate-control algorithm), even if the sta
ht capabilities indicate support for it.
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Singed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Allow drivers to support NL80211_SCAN_FLAG_RANDOM_ADDR with software
based scanning and generate a random MAC address for them for every
scan request with the flag.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In order to use the scan and scheduled scan request pointers during
RX to check for randomisation, make them accessible using RCU.
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add the necessary feature flags and a scan flag to support using
random MAC addresses for scan while unassociated.
The configuration for this supports an arbitrary MAC address
value and mask, so that any kind of configuration (e.g. fixed
OUI or full 46-bit random) can be requested. Full 46-bit random
is the default when no other configuration is passed.
Also add a small helper function to use the addr/mask correctly.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
local->scan_req was tested in the previous line, so it
can't be NULL.
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add a new WoWLAN API to enable net-detect as a wake up trigger.
Net-detect allows the device to scan in the background while the
host is asleep to wake up the host system when a matching network
is found.
Reuse the scheduled scan attributes to specify how the scan is
performed while suspended and the matches that will trigger a
wake event.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
For net detect, we will need to reuse most of the scheduled scan
parsing function, but not all, so split out the attributes parsing
part out of the main start sched_scan function.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In TDLS (e.g., TDLS off-channel) there is a requirement for
some drivers to supply an unused TID between the AP and the
device to the FW, to allow sending PTI requests and to allow
the FW to aggregate on a specific TID for better throughput.
To ensure that the allocated TID is indeed unused, this patch
introduces an API for blocking the driver from TXing on that
TID.
Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If the HW supports IEEE80211_HW_QUEUE_CONTROL, allow
flushing only specific queues rather than all of them.
Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When receiving a TDLS channel switch request or response, parse the frame
and call a new tdls_recv_channel_switch op in the low level driver with
the parsed data.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Implement the cfg80211 TDLS channel switch ops and introduce new mac80211
ones for low-level drivers.
Verify low-level driver support for the new ops when using the relevant
wiphy feature bit. Also verify the peer supports channel switching before
passing the command down.
Add a new STA flag to track the off-channel state with the TDLS peer and
make sure to cancel the channel-switch if the peer STA is unexpectedly
removed.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
These are used in TDLS channel switching code.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Introduce commands to initiate and cancel TDLS channel-switching. Once
TDLS channel-switching is started, the lower level driver is responsible
for continually initiating channel-switch operations and returning to
the base (AP) channel to listen for beacons from time to time.
Upon cancellation of the channel-switch all communication between the
relevant TDLS peers will continue on the base channel.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Split the data-generating from the Tx-sending functionality, as we do
not want to send templates to the lower driver. Also add an optional
chandef argument to the data-generating portion. It will be used for
channel-switch templates.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The AP or peer can prohibit TDLS channel switch via a bit in the
extended capabilities IE. Parse the IE and track this bit. Set an
appropriate STA flag if both the AP and peer STA support TDLS
channel-switching.
Add the new STA flag and the missing TDLS_INITIATOR to debugfs.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Define some related TDLS protocol constants and advertise channel switch
support in the extended-capabilities IE when the feature bit is defined.
Actually supporting TDLS channel-switching also requires support for
some new nl80211 commands, to be introduced by future patches.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add the BSS coex IE in case we support HT40 channels, as mandated by
section 8.5.13 in IEEE802.11 2012.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This information element is mandatory in case TDLS channel-switching is to
be supported. The channels given are ones supported and allowed to be
active in the current regulatory setting.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
For some TDLS channel switch implementations data frames need to be
sent by the firmware based on a template. This template should be
created by mac80211, and thus needs to properly be built from an
802.3 frame into an 802.11 frame. In addition, the device will need
the key information so the select_key handler needs to be run.
However, the driver/device will be responsible for all of the crypto
encapsulation, as the sequence numbers etc. cannot be built by the
host anyway in this case since it's a template to be used multiple
times.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Factor out the 802.11 header building code from the xmit function
to be able to use it separately in a later commit.
While at it, fix up some documentation.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Instead of passing the band as a parameter to ieee80211_xmit()
and ieee80211_tx(), move it outside of the two functions while
making sure info->band is set up before calling them.
This removes the parameter and simplifies the follow commit.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Since the TDLS peer station might not receive the teardown
packet (e.g., when in PS), this makes sure the packet is
retransmitted - this time through the AP - if the TDLS peer
didn't ACK the packet.
Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Allows setting of an skb's flags - if needed - when calling
ieee80211_subif_start_xmit().
Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Both xprt_lookup_rqst() and xprt_complete_rqst() require that you
take the transport lock in order to avoid races with xprt_transmit().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This patch converts the hdev->link_keys list to be protected through
RCU, thereby eliminating the need to hold the hdev lock while accessing
the list.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When a connection is requested the conn->pending_sec_level value gets
set to whatever level the user requested the connection to be. During
the pairing process there are various sanity checks to try to ensure
that the right length PIN or right IO Capability is used to satisfy the
target security level. However, when we finally get hold of the link key
that is to be used we should still set the actual final security level
from the key type.
This way when we eventually get an Encrypt Change event the correct
value gets copied to conn->sec_level.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In __hci_cmd_sync_ev() and __hci_req_sync() if the hci_req_run() call
fails and we return from the functions we should ensure that the state
doesn't remain in TASK_INTERRUPTIBLE that we just set it to. This patch
fixes missing calls to set_current_state(TASK_RUNNING) in both places.
Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Tested-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The commit 5935839ad7
"mac80211: improve minstrel_ht rate sorting by throughput & probability"
introduced a crash on rate sorting that occurs when the rate added to
the sorting array is faster than all the previous rates. Due to an
off-by-one error, it reads the rate index from tp_list[-1], which
contains uninitialized stack garbage, and then uses the resulting index
for accessing the group rate stats, leading to a crash if the garbage
value is big enough.
Cc: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If icmp_rcv() has successfully processed the incoming ICMP datagram, we
should use consume_skb() rather than kfree_skb() because a hit on the likes
of perf -e skb:kfree_skb is not called-for.
Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg <johan.hedberg@gmail.com> says:
"Here's another bluetooth-next pull request for 3.19. We've got:
- Various fixes, cleanups and improvements to ieee802154/mac802154
- Support for a Broadcom BCM20702A1 variant
- Lots of lockdep fixes
- Fixed handling of LE CoC errors that should trigger SMP"
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This brings in some mwifiex changes that further patches will
need to work on top to not cause merge conflicts.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Doing things like hci_conn_hash_flush() while holding the hdev lock is
risky since its synchronous pending work cancellation could cause the
L2CAP layer to try to reacquire the hdev lock. Right now there doesn't
seem to be any obvious places where this would for certain happen but
it's already enough to cause lockdep to start warning against the hdev
and the work struct locks being taken in the "wrong" order:
[ +0.000373] mgmt-tester/1603 is trying to acquire lock:
[ +0.000292] ((&conn->pending_rx_work)){+.+.+.}, at: [<c104266d>] flush_work+0x0/0x181
[ +0.000270]
but task is already holding lock:
[ +0.000000] (&hdev->lock){+.+.+.}, at: [<c13b9a80>] hci_dev_do_close+0x166/0x359
[ +0.000000]
which lock already depends on the new lock.
[ +0.000000]
the existing dependency chain (in reverse order) is:
[ +0.000000]
-> #1 (&hdev->lock){+.+.+.}:
[ +0.000000] [<c105ea8f>] lock_acquire+0xe3/0x156
[ +0.000000] [<c140c663>] mutex_lock_nested+0x54/0x375
[ +0.000000] [<c13d644b>] l2cap_recv_frame+0x293/0x1a9c
[ +0.000000] [<c13d7ca4>] process_pending_rx+0x50/0x5e
[ +0.000000] [<c1041a3f>] process_one_work+0x21c/0x436
[ +0.000000] [<c1041e3d>] worker_thread+0x1be/0x251
[ +0.000000] [<c1045a22>] kthread+0x94/0x99
[ +0.000000] [<c140f801>] ret_from_kernel_thread+0x21/0x30
[ +0.000000]
-> #0 ((&conn->pending_rx_work)){+.+.+.}:
[ +0.000000] [<c105e158>] __lock_acquire+0xa07/0xc89
[ +0.000000] [<c105ea8f>] lock_acquire+0xe3/0x156
[ +0.000000] [<c1042696>] flush_work+0x29/0x181
[ +0.000000] [<c1042864>] __cancel_work_timer+0x76/0x8f
[ +0.000000] [<c104288c>] cancel_work_sync+0xf/0x11
[ +0.000000] [<c13d4c18>] l2cap_conn_del+0x72/0x183
[ +0.000000] [<c13d8953>] l2cap_disconn_cfm+0x49/0x55
[ +0.000000] [<c13be37a>] hci_conn_hash_flush+0x7a/0xc3
[ +0.000000] [<c13b9af6>] hci_dev_do_close+0x1dc/0x359
[ +0.012038] [<c13bbe38>] hci_unregister_dev+0x6e/0x1a3
[ +0.000000] [<c12d33c1>] vhci_release+0x28/0x47
[ +0.000000] [<c10dd6a9>] __fput+0xd6/0x154
[ +0.000000] [<c10dd757>] ____fput+0xd/0xf
[ +0.000000] [<c1044bb2>] task_work_run+0x6b/0x8d
[ +0.000000] [<c1001bd2>] do_notify_resume+0x3c/0x3f
[ +0.000000] [<c140fa70>] work_notifysig+0x29/0x31
[ +0.000000]
other info that might help us debug this:
[ +0.000000] Possible unsafe locking scenario:
[ +0.000000] CPU0 CPU1
[ +0.000000] ---- ----
[ +0.000000] lock(&hdev->lock);
[ +0.000000] lock((&conn->pending_rx_work));
[ +0.000000] lock(&hdev->lock);
[ +0.000000] lock((&conn->pending_rx_work));
[ +0.000000]
*** DEADLOCK ***
Fully fixing this would require some quite heavy refactoring to change
how the hdev lock and hci_conn instances are handled together. A simpler
solution for now which this patch takes is to try ensure that the hdev
workqueue is empty before proceeding with the various cleanup calls,
including hci_conn_hash_flush().
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The common short form of "randomizer" is "rand" in many places
(including the Bluetooth specification). The shorter version also makes
for easier to read code with less forced line breaks. This patch renames
all occurences of "randomizer" to "rand" in the Bluetooth subsystem
code.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
For now the mgmt commands dealing with remote OOB data are strictly
BR/EDR-only. This patch fixes missing checks for the passed address type
so that any non-BR/EDR value triggers the appropriate error response.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Currently all the conntrack lookups are done using default zone.
In case the skb has a ct attached (e.g. template) we should use this zone
for lookups instead. This makes connlimit work with connections assigned
to other zones.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Ebtables on the OUTPUT chain (NF_BR_LOCAL_OUT) would not work as expected
for both locally generated IGMP and MLD queries. The IP header specific
filter options are off by 14 Bytes for netfilter (actual output on
interfaces is fine).
NF_HOOK() expects the skb->data to point to the IP header, not the
ethernet one (while dev_queue_xmit() does not). Luckily there is an
br_dev_queue_push_xmit() helper function already - let's just use that.
Introduced by eb1d164143
("bridge: Add core IGMP snooping support")
Ebtables example:
$ ebtables -I OUTPUT -p IPv6 -o eth1 --logical-out br0 \
--log --log-level 6 --log-ip6 --log-prefix="~EBT: " -j DROP
before (broken):
~EBT: IN= OUT=eth1 MAC source = 02:04:64:a4:39:c2 \
MAC dest = 33:33:00:00:00:01 proto = 0x86dd IPv6 \
SRC=64a4:39c2:86dd:6000:0000:0020:0001:fe80 IPv6 \
DST=0000:0000:0000:0004:64ff:fea4:39c2:ff02, \
IPv6 priority=0x3, Next Header=2
after (working):
~EBT: IN= OUT=eth1 MAC source = 02:04:64:a4:39:c2 \
MAC dest = 33:33:00:00:00:01 proto = 0x86dd IPv6 \
SRC=fe80:0000:0000:0000:0004:64ff:fea4:39c2 IPv6 \
DST=ff02:0000:0000:0000:0000:0000:0000:0001, \
IPv6 priority=0x0, Next Header=0
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Make sure the netlink group exists, otherwise you can trigger an out
of bound array memory access from the netlink_bind() path. This splat
can only be triggered only by superuser.
[ 180.203600] UBSan: Undefined behaviour in ../net/netfilter/nfnetlink.c:467:28
[ 180.204249] index 9 is out of range for type 'int [9]'
[ 180.204697] CPU: 0 PID: 1771 Comm: trinity-main Not tainted 3.18.0-rc4-mm1+ #122
[ 180.205365] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
+04/01/2014
[ 180.206498] 0000000000000018 0000000000000000 0000000000000009 ffff88007bdf7da8
[ 180.207220] ffffffff82b0ef5f 0000000000000092 ffffffff845ae2e0 ffff88007bdf7db8
[ 180.207887] ffffffff8199e489 ffff88007bdf7e18 ffffffff8199ea22 0000003900000000
[ 180.208639] Call Trace:
[ 180.208857] dump_stack (lib/dump_stack.c:52)
[ 180.209370] ubsan_epilogue (lib/ubsan.c:174)
[ 180.209849] __ubsan_handle_out_of_bounds (lib/ubsan.c:400)
[ 180.210512] nfnetlink_bind (net/netfilter/nfnetlink.c:467)
[ 180.210986] netlink_bind (net/netlink/af_netlink.c:1483)
[ 180.211495] SYSC_bind (net/socket.c:1541)
Moreover, define the missing nf_tables and nf_acct multicast groups too.
Reported-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch changes the byteorder handling for short and panid handling.
We now except to get little endian in nl802154 for these attributes.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch moves the 802.15.4 constraints WPAN_NUM_ defines into
"net/ieee802154.h" which should contain all necessary 802.15.4 related
information. Also rename these defines to a common name which is
IEEE802154_MAX_CHANNEL and IEEE802154_MAX_PAGE.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds support for deleting a wpan interface via nl802154.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds support for setting an extended address while
registration a new interface. If ieee802154_is_valid_extended_addr
getting as parameter and invalid extended address then the perm address
is fallback. This is useful to make some default handling while for
example default registration of a wpan interface while phy registration.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a new nl802154 command for adding a new interface
according to a wpan phy via nl802154.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This parameter was grabbed from wireless implementation with the
identically wireless dev struct. We don't need this right now and so we
remove it. Maybe we will add it later again if we found any real reason
to have such parameter.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch replace the depracted IEEE802154_DEV to the new introduced
NL802154_IFTYPE_NODE types. There is a backwards compatibility to have
the identical types for both enum definitions. Also remove some inlcude
issue with "linux/nl802154.h", because the export nl_policy inside this
header it was always necessary to have an include of "net/rtnetlink.h"
before. The reason for this is more complicated. Nevertheless we removed
this now, because "linux/nl802154.h" is the depracted netlink interface.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We don't and we can't name it zigbee anymore. This patch removes
deprecated information for project website.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patches removes the const keyword in variables which are non
pointers. There is no sense to declare call by value parameters as
const.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patches removes the const keyword in variables which are non
pointers. There is no sense to declare call by value parameters as
const.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patches removes the const keyword in variables which are non
pointers. There is no sense to declare call by value parameters as const.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch removes some prototypes which are not used anymore.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
It has been reported that generating an MLD listener report on
devices with large MTUs (e.g. 9000) and a high number of IPv6
addresses can trigger a skb_over_panic():
skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20
head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0
dev:port1
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:100!
invalid opcode: 0000 [#1] SMP
Modules linked in: ixgbe(O)
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4
[...]
Call Trace:
<IRQ>
[<ffffffff80578226>] ? skb_put+0x3a/0x3b
[<ffffffff80612a5d>] ? add_grhead+0x45/0x8e
[<ffffffff80612e3a>] ? add_grec+0x394/0x3d4
[<ffffffff80613222>] ? mld_ifc_timer_expire+0x195/0x20d
[<ffffffff8061308d>] ? mld_dad_timer_expire+0x45/0x45
[<ffffffff80255b5d>] ? call_timer_fn.isra.29+0x12/0x68
[<ffffffff80255d16>] ? run_timer_softirq+0x163/0x182
[<ffffffff80250e6f>] ? __do_softirq+0xe0/0x21d
[<ffffffff8025112b>] ? irq_exit+0x4e/0xd3
[<ffffffff802214bb>] ? smp_apic_timer_interrupt+0x3b/0x46
[<ffffffff8063f10a>] ? apic_timer_interrupt+0x6a/0x70
mld_newpack() skb allocations are usually requested with dev->mtu
in size, since commit 72e09ad107 ("ipv6: avoid high order allocations")
we have changed the limit in order to be less likely to fail.
However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
macros, which determine if we may end up doing an skb_put() for
adding another record. To avoid possible fragmentation, we check
the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
assumption as the actual max allocation size can be much smaller.
The IGMP case doesn't have this issue as commit 57e1ab6ead
("igmp: refine skb allocations") stores the allocation size in
the cb[].
Set a reserved_tailroom to make it fit into the MTU and use
skb_availroom() helper instead. This also allows to get rid of
igmp_skb_size().
Reported-by: Wei Liu <lw1a2.jing@gmail.com>
Fixes: 72e09ad107 ("ipv6: avoid high order allocations")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: David L Stevens <david.stevens@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
RSS (Receive Side Scaling) typically uses Toeplitz hash and a 40 or 52 bytes
RSS key.
Some drivers use a constant (and well known key), some drivers use a random
key per port, making bonding setups hard to tune. Well known keys increase
attack surface, considering that number of queues is usually a power of two.
This patch provides infrastructure to help drivers doing the right thing.
netdev_rss_key_fill() should be used by drivers to initialize their RSS key,
even if they provide ethtool -X support to let user redefine the key later.
A new /proc/sys/net/core/netdev_rss_key file can be used to get the host
RSS key even for drivers not providing ethtool -x support, in case some
applications want to precisely setup flows to match some RX queues.
Tested:
myhost:~# cat /proc/sys/net/core/netdev_rss_key
11:63:99:bb:79:fb:a5:a7:07:45:b2:20:bf:02:42:2d:08:1a:dd:19:2b:6b:23:ac:56:28:9d:70:c3:ac:e8:16:4b:b7:c1:10:53:a4:78:41:36:40:74:b6:15:ca:27:44:aa:b3:4d:72
myhost:~# ethtool -x eth0
RX flow hash indirection table for eth0 with 8 RX ring(s):
0: 0 1 2 3 4 5 6 7
RSS hash key:
11:63:99:bb:79:fb:a5:a7:07:45:b2:20:bf:02:42:2d:08:1a:dd:19:2b:6b:23:ac:56:28:9d:70:c3:ac:e8:16:4b:b7:c1:10:53:a4:78:41
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar says:
====================
Open vSwitch
Following fixes are accumulated in ovs-repo.
Three of them are related to protocol processing, one is
related to memory leak in case of error and one is to
fix race.
Patch "Validate IPv6 flow key and mask values" has conflicts
with net-next, Let me know if you want me to send the patch
for net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Solves possible lockup issues that can be seen from firmware DCB agents calling
into the DCB app api.
DCB firmware event queues can be tied in with NAPI so that dcb events are
generated in softIRQ context. This can results in calls to dcb_*app()
functions which try to take the dcb_lock.
If the the event triggers while we also have the dcb_lock because lldpad or
some other agent happened to be issuing a get/set command we could see a cpu
lockup.
This code was not originally written with firmware agents in mind, hence
grabbing dcb_lock from softIRQ context was not considered.
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no reason to limit the amount of possible links to a
neighboring node to 2. If we have more then two bearers we can also
establish more links.
Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
Reviewed-By: Jon Maloy <jon.maloy@ericsson.com>
cc: Ying Xue <ying.xue@windriver.com>
cc: Erik Hugne <erik.hugne@ericsson.com>
cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter updates for your net tree,
they are:
1) Fix missing initialization of the range structure (allocated in the
stack) in nft_masq_{ipv4, ipv6}_eval, from Daniel Borkmann.
2) Make sure the data we receive from userspace contains the req_version
structure, otherwise return an error incomplete on truncated input.
From Dan Carpenter.
3) Fix handling og skb->sk which may cause incorrect handling
of connections from a local process. Via Simon Horman, patch from
Calvin Owens.
4) Fix wrong netns in nft_compat when setting target and match params
structure.
5) Relax chain type validation in nft_compat that was recently included,
this broke the matches that need to be run from the route chain type.
Now iptables-test.py automated regression tests report success again
and we avoid the only possible problematic case, which is the use of
nat targets out of nat chain type.
6) Use match->table to validate the tablename, instead of the match->name.
Again patch for nft_compat.
7) Restore the synchronous release of objects from the commit and abort
path in nf_tables. This is causing two major problems: splats when using
nft_compat, given that matches and targets may sleep and call_rcu is
invoked from softirq context. Moreover Patrick reported possible event
notification reordering when rules refer to anonymous sets.
8) Fix race condition in between packets that are being confirmed by
conntrack and the ctnetlink flush operation. This happens since the
removal of the central spinlock. Thanks to Jesper D. Brouer to looking
into this.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Trying to add an unreachable route incorrectly returns -ESRCH if
if custom FIB rules are present:
[root@localhost ~]# ip route add 74.125.31.199 dev eth0 via 1.2.3.4
RTNETLINK answers: Network is unreachable
[root@localhost ~]# ip rule add to 55.66.77.88 table 200
[root@localhost ~]# ip route add 74.125.31.199 dev eth0 via 1.2.3.4
RTNETLINK answers: No such process
[root@localhost ~]#
Commit 83886b6b63 ("[NET]: Change "not found"
return value for rule lookup") changed fib_rules_lookup()
to use -ESRCH as a "not found" code internally, but for user space it
should be translated into -ENETUNREACH. Handle the translation centrally in
ipv4-specific fib_lookup(), leaving the DECnet case alone.
On a related note, commit b7a71b51ee
("ipv4: removed redundant conditional") removed a similar translation from
ip_route_input_slow() prematurely AIUI.
Fixes: b7a71b51ee ("ipv4: removed redundant conditional")
Signed-off-by: Panu Matilainen <pmatilai@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Highlights include:
- Stable patches to fix NFSv4.x delegation reclaim error paths
- Fix a bug whereby we were advertising NFSv4.1 but using NFSv4.2 features
- Fix a use-after-free problem with pNFS block layouts
- Fix a memory leak in the pNFS files O_DIRECT code
- Replace an intrusive and Oops-prone performance fix in the NFSv4 atomic
open code with a safer one-line version and revert the two original patches.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUZol9AAoJEGcL54qWCgDyNQQQALnngvpPR51BoO/iTz9ruXol
fGZy0SRIlTUKm1ArsQsQ+HGbV5K0hgP3Tg+z2AtEEZ8u/2Fi2Bqdl6+eNY12tKHd
uUctDdM5TXLrETAn1UULrnd2eX1cvPMBfOlXlAdNHHsGEgC7w7YQ+rzGwnls+HDy
LYXzY7Y3jYGdTMaRgZc5YRdtd8JBpCxciRvPEQLDIobwP0JnZC1afTLe1XInqB2I
TZ4NTHT+DEWA+Ou1P2deL7+RuJNEAeWWBvULJy76n4BqKvN/HNedOO5HyBYXrwSd
3UX3wbx9CWRxN1F0EqNKxjxZ/597JwqBeNoTDRcofLsqumUfAOtlbym1EahcD3Ls
pykopNfgUhGuhxolStmuHdS6CnyQPERpR5lFZcDp7XtcwSq4FcwD8DRzLJMZW5dg
N1lkfFlwQN3rqdk/NEHL+IxS41Hlk4HXjMoP6MNbRtqzIN6tW9tvC4MtAWd1aYxO
YuUW281pbWxXQ731s0kTIrMUdQ9vGSRBMcbnO9rL3o+xkh8y5SPVkx9lhdhJN0UD
VbQ5Ws/xZ54bD1PfyYb+Yx659lI8MSFOsDuMuLmDtfYnVicHwCA3H63StvQ3ihf/
q0gu8Iex9YbNNjf7IfYGuWPmPn3gwPBoURPC0bcZvMPdY6DXodU6Oj4BRTQ5VCie
9N0pt2wp2eRjaSzD7r5A
=8YN6
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
- stable patches to fix NFSv4.x delegation reclaim error paths
- fix a bug whereby we were advertising NFSv4.1 but using NFSv4.2
features
- fix a use-after-free problem with pNFS block layouts
- fix a memory leak in the pNFS files O_DIRECT code
- replace an intrusive and Oops-prone performance fix in the NFSv4
atomic open code with a safer one-line version and revert the two
original patches"
* tag 'nfs-for-3.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
sunrpc: fix sleeping under rcu_read_lock in gss_stringify_acceptor
NFS: Don't try to reclaim delegation open state if recovery failed
NFSv4: Ensure that we call FREE_STATEID when NFSv4.x stateids are revoked
NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return
NFSv4.1: nfs41_clear_delegation_stateid shouldn't trust NFS_DELEGATED_STATE
NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired
NFS: SEEK is an NFS v4.2 feature
nfs: Fix use of uninitialized variable in nfs_getattr()
nfs: Remove bogus assignment
nfs: remove spurious WARN_ON_ONCE in write path
pnfs/blocklayout: serialize GETDEVICEINFO calls
nfs: fix pnfs direct write memory leak
Revert "NFS: nfs4_do_open should add negative results to the dcache."
Revert "NFS: remove BUG possibility in nfs4_open_and_get_state"
NFSv4: Ensure nfs_atomic_open set the dentry verifier on ENOENT
When passed BDADDR_ANY the Remove Remote OOB Data comand is specified to
clear all entries. This patch adds the necessary check and calls
hci_remote_oob_data_clear() when necessary.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds some extra debug logs to L2CAP related code. These are
mainly to help track locking issues but will probably be useful for
debugging other types of issues as well.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Now that the SMP related key lists are converted to RCU there is nothing
in smp_cmd_sign_info() or smp_cmd_ident_addr_info() that would require
taking the hdev lock (including the smp_distribute_keys call). This
patch removes this unnecessary locking.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch set converts the hdev->identity_resolving_keys list to use
RCU to eliminate the need to use hci_dev_lock/unlock.
An additional change that must be done is to remove use of
CRYPTO_ALG_ASYNC for the hdev-specific AES crypto context. The reason is
that this context is used for matching RPAs and the loop that does the
matching is under the RCU read lock, i.e. is an atomic section which
cannot sleep.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch set converts the hdev->long_term_keys list to use RCU to
eliminate the need to use hci_dev_lock/unlock.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The insufficient authentication/encryption errors indicate to the L2CAP
client that it should try to elevate the security level. Since there
really isn't any exception to this rule it makes sense to fully handle
it on the kernel side instead of pushing the responsibility to user
space.
This patch adds special handling of these two error codes and calls
smp_conn_security() with the elevated security level if necessary.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
So far smp_sufficient_security() has returned false if we're encrypted
with an STK but do have an LTK available. However, for the sake of LE
CoC servers we do want to let the incoming connection through even
though we're only encrypted with the STK.
This patch adds a key preference parameter to smp_sufficient_security()
with two possible values (enum used instead of bool for readability).
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
For LE CoC L2CAP servers we don't do security level elevation during the
BT_CONNECT2 state (instead LE CoC simply sends an immediate error
response if the security level isn't high enough). Therefore if we get a
security level change while an LE CoC channel is in the BT_CONNECT2
state we should simply do nothing.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
dp read operations depends on ovs_dp_cmd_fill_info(). This API
needs to looup vport to find dp name, but vport lookup can
fail. Therefore to keep vport reference alive we need to
take ovs lock.
Introduced by commit 6093ae9aba ("openvswitch: Minimize
dp and vport critical sections").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
match_validate() enforce that a mask matching on NDP attributes has also an
exact match on ICMPv6 type.
The ICMPv6 type, which is 8-bit wide, is stored in the 'tp.src' field of
'struct sw_flow_key', which is 16-bit wide.
Therefore, an exact match on ICMPv6 type should only check the first 8 bits.
This commit fixes a bug that prevented flows with an exact match on NDP field
from being installed
Introduced by commit 03f0d916aa ("openvswitch: Mega flow implementation").
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
The checksum of ICMPv6 packets uses the IP pseudoheader as part of
the calculation, unlike ICMP in IPv4. This was not implemented,
which means that modifying the IP addresses of an ICMPv6 packet
would cause the checksum to no longer be correct as the psuedoheader
did not match.
Introduced by commit 3fdbd1ce11 ("openvswitch: add ipv6 'set' action").
Reported-by: Neal Shrader <icosahedral@gmail.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Need to free memory in case of sample action error.
Introduced by commit 651887b0c2 ("openvswitch: Sample
action without side effects").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
John W. Linville says:
====================
pull request: wireless 2014-11-13
Please pull this set of a few more wireless fixes intended for the
3.18 stream...
For the mac80211 bits, Johannes says:
"This has just one fix, for an issue with the CCMP decryption
that can cause a kernel crash. I'm not sure it's remotely
exploitable, but it's an important fix nonetheless."
For the iwlwifi bits, Emmanuel says:
"Two fixes here - we weren't updating mac80211 if a scan
was cut short by RFKILL which confused cfg80211. As a
result, the latter wouldn't allow to run another scan.
Liad fixes a small bug in the firmware dump."
On top of that...
Arend van Spriel corrects a channel width conversion that caused a
WARNING in brcmfmac.
Hauke Mehrtens avoids a NULL pointer dereference in b43.
Larry Finger hits a trio of rtlwifi bugs left over from recent
backporting from the Realtek vendor driver.
Miaoqing Pan fixes a clocking problem in ath9k that could affect
packet timestamps and such.
Stanislaw Gruszka addresses an payload alignment issue that has been
plaguing rt2x00.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
After removal of the central spinlock nf_conntrack_lock, in
commit 93bb0ceb75 ("netfilter: conntrack: remove central
spinlock nf_conntrack_lock"), it is possible to race against
get_next_corpse().
The race is against the get_next_corpse() cleanup on
the "unconfirmed" list (a per-cpu list with seperate locking),
which set the DYING bit.
Fix this race, in __nf_conntrack_confirm(), by removing the CT
from unconfirmed list before checking the DYING bit. In case
race occured, re-add the CT to the dying list.
While at this, fix coding style of the comment that has been
updated.
Fixes: 93bb0ceb75 ("netfilter: conntrack: remove central spinlock nf_conntrack_lock")
Reported-by: bill bonaparte <programme110@gmail.com>
Signed-off-by: bill bonaparte <programme110@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add dependency on INET to fix following build error. I have also
fixed MPLS dependency.
ERROR: "ip_route_output_flow" [net/openvswitch/openvswitch.ko]
undefined!
make[1]: *** [__modpost] Error 1
Reported-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/chelsio/cxgb4vf/sge.c
drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c
sge.c was overlapping two changes, one to use the new
__dev_alloc_page() in net-next, and one to use s->fl_pg_order in net.
ixgbe_phy.c was a set of overlapping whitespace changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) sunhme driver lacks DMA mapping error checks, based upon a report by
Meelis Roos.
2) Fix memory leak in mvpp2 driver, from Sudip Mukherjee.
3) DMA memory allocation sizes are wrong in systemport ethernet driver,
fix from Florian Fainelli.
4) Fix use after free in mac80211 defragmentation code, from Johannes
Berg.
5) Some networking uapi headers missing from Kbuild file, from Stephen
Hemminger.
6) TUN driver gets csum_start offset wrong when VLAN accel is enabled,
and macvtap has a similar bug, from Herbert Xu.
7) Adjust several tunneling drivers to set dev->iflink after registry,
because registry sets that to -1 overwriting whatever we did. From
Steffen Klassert.
8) Geneve forgets to set inner tunneling type, causing GSO segmentation
to fail on some NICs. From Jesse Gross.
9) Fix several locking bugs in stmmac driver, from Fabrice Gasnier and
Giuseppe CAVALLARO.
10) Fix spurious timeouts with NewReno on low traffic connections, from
Marcelo Leitner.
11) Fix descriptor updates in enic driver, from Govindarajulu
Varadarajan.
12) PPP calls bpf_prog_create() with locks held, which isn't kosher.
Fix from Takashi Iwai.
13) Fix NULL deref in SCTP with malformed INIT packets, from Daniel
Borkmann.
14) psock_fanout selftest accesses past the end of the mmap ring, fix
from Shuah Khan.
15) Fix PTP timestamping for VLAN packets, from Richard Cochran.
16) netlink_unbind() calls in netlink pass wrong initial argument, from
Hiroaki SHIMODA.
17) vxlan socket reuse accidently reuses a socket when the address
family is different, so we have to explicitly check this, from
Marcelo Lietner.
18) Fix missing include in nft_reject_bridge.c breaking the build on ppc
and other architectures, from Guenter Roeck.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (75 commits)
vxlan: Do not reuse sockets for a different address family
smsc911x: power-up phydev before doing a software reset.
lib: rhashtable - Remove weird non-ASCII characters from comments
net/smsc911x: Fix delays in the PHY enable/disable routines
net/smsc911x: Fix rare soft reset timeout issue due to PHY power-down mode
netlink: Properly unbind in error conditions.
net: ptp: fix time stamp matching logic for VLAN packets.
cxgb4 : dcb open-lldp interop fixes
selftests/net: psock_fanout seg faults in sock_fanout_read_ring()
net: bcmgenet: apply MII configuration in bcmgenet_open()
net: bcmgenet: connect and disconnect from the PHY state machine
net: qualcomm: Fix dependency
ixgbe: phy: fix uninitialized status in ixgbe_setup_phy_link_tnx
net: phy: Correctly handle MII ioctl which changes autonegotiation.
ipv6: fix IPV6_PKTINFO with v4 mapped
net: sctp: fix memory leak in auth key management
net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet
net: ppp: Don't call bpf_prog_create() in ppp_lock
net/mlx4_en: Advertize encapsulation offloads features only when VXLAN tunnel is set
cxgb4 : Fix bug in DCB app deletion
...
In DC world, GSO packets initially cooked by tcp_sendmsg() are usually
big, as sk_pacing_rate is high.
When network is congested, cwnd can be smaller than the GSO packets
found in socket write queue. tcp_write_xmit() splits GSO packets
using the available cwnd, and we end up sending a single GSO packet,
consuming all available cwnd.
With GRO aggregation on the receiver, we might handle a single GRO
packet, sending back a single ACK.
1) This single ACK might be lost
TLP or RTO are forced to attempt a retransmit.
2) This ACK releases a full cwnd, sender sends another big GSO packet,
in a ping pong mode.
This behavior does not fill the pipes in the best way, because of
scheduling artifacts.
Make sure we always have at least two GSO packets in flight.
This allows us to safely increase GRO efficiency without risking
spurious retransmits.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reallocation is only required for shrinking and expanding and both rely
on a mutex for synchronization and callers of rhashtable_init() are in
non atomic context. Therefore, no reason to continue passing allocation
hints through the API.
Instead, use GFP_KERNEL and add __GFP_NOWARN | __GFP_NORETRY to allow
for silent fall back to vzalloc() without the OOM killer jumping in as
pointed out by Eric Dumazet and Eric W. Biederman.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently mutex_is_held can only test locks in the that are global
since it takes no arguments. This prevents rhashtable from being
used in places where locks are lock, e.g., per-namespace locks.
This patch adds a parent field to mutex_is_held and rhashtable_params
so that local locks can be used (and tested).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rhashtable function mutex_is_held is only used when PROVE_LOCKING
is enabled. This patch modifies netfilter so that we can rhashtable.h
itself can later make mutex_is_held optional depending on PROVE_LOCKING.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rhashtable function mutex_is_held is only used when PROVE_LOCKING
is enabled. This patch modifies netlink so that we can rhashtable.h
itself can later make mutex_is_held optional depending on PROVE_LOCKING.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Large receive offloading is known to cause problems if received packets
are passed to other host. Therefore the kernel disables it by calling
dev_disable_lro() whenever a network device is enslaved in a bridge or
forwarding is enabled for it (or globally). For virtual devices we need
to disable LRO on the underlying physical device (which is actually
receiving the packets).
Current dev_disable_lro() code handles this propagation for a vlan
(including 802.1ad nested vlan), macvlan or a vlan on top of a macvlan.
It doesn't handle other stacked devices and their combinations, in
particular propagation from a bond to its slaves which often causes
problems in virtualization setups.
As we now have generic data structures describing the upper-lower device
relationship, dev_disable_lro() can be generalized to disable LRO also
for all lower devices (if any) once it is disabled for the device
itself.
For bonding and teaming devices, it is necessary to disable LRO not only
on current slaves at the moment when dev_disable_lro() is called but
also on any slave (port) added later.
v2: use lower device links for all devices (including vlan and macvlan)
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/ipv4/fou.c: In function ‘ip_tunnel_encap_del_fou_ops’:
net/ipv4/fou.c:861:1: warning: no return statement in function returning non-void [-Wreturn-type]
Fixes: a8c5f90fb5 ("ip_tunnel: Ops registration for secondary encap (fou, gue)")
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
kick_requests() can put linger requests on the notarget list. This
means we need to clear the much-overloaded req->r_req_lru_item in
__unregister_linger_request() as well, or we get an assertion failure
in ceph_osdc_release_request() - !list_empty(&req->r_req_lru_item).
AFAICT the assumption was that registered linger requests cannot be on
any of req->r_req_lru_item lists, but that's clearly not the case.
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Requests have to be unlinked from both osd->o_requests (normal
requests) and osd->o_linger_requests (linger requests) lists when
clearing req->r_osd. Otherwise __unregister_linger_request() gets
confused and we trip over a !list_empty(&osd->o_linger_requests)
assert in __remove_osd().
MON=1 OSD=1:
# cat remove-osd.sh
#!/bin/bash
rbd create --size 1 test
DEV=$(rbd map test)
ceph osd out 0
sleep 3
rbd map dne/dne # obtain a new osdmap as a side effect
rbd unmap $DEV & # will block
sleep 3
ceph osd in 0
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
In case of OOM, there's nothing userspace can do.
If there's no room to put the payload in __build_packet_message(),
jump to nla_put_failure which already performs the corresponding
error reporting.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
net/bridge/br_netfilter.c:870:6: symbol 'br_netfilter_enable' was not declared. Should it be static?
no; add include
net/ipv4/netfilter/nft_reject_ipv4.c:22:6: symbol 'nft_reject_ipv4_eval' was not declared. Should it be static?
yes
net/ipv6/netfilter/nf_reject_ipv6.c:16:6: symbol 'nf_send_reset6' was not declared. Should it be static?
no; add include
net/ipv6/netfilter/nft_reject_ipv6.c:22:6: symbol 'nft_reject_ipv6_eval' was not declared. Should it be static?
yes
net/netfilter/core.c:33:32: symbol 'nf_ipv6_ops' was not declared. Should it be static?
no; add include
net/netfilter/xt_DSCP.c:40:57: cast truncates bits from constant value (ffffff03 becomes 3)
net/netfilter/xt_DSCP.c:57:59: cast truncates bits from constant value (ffffff03 becomes 3)
add __force, 3 is what we want.
net/ipv4/netfilter/nf_log_arp.c:77:6: symbol 'nf_log_arp_packet' was not declared. Should it be static?
yes
net/ipv4/netfilter/nf_reject_ipv4.c:17:6: symbol 'nf_send_reset' was not declared. Should it be static?
no; add include
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
For a long time we couldn't actually use __xfrm_policy_link in
xfrm_policy_insert because the latter wanted to do hashing at
a specific position.
Now that __xfrm_policy_link no longer does hashing it can now
be safely used in xfrm_policy_insert to kill some duplicate code,
finally reuniting general policies with socket policies.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Back in 2003 when I added policy expiration, I half-heartedly
did a clean-up and renamed xfrm_sk_policy_link/xfrm_sk_policy_unlink
to __xfrm_policy_link/__xfrm_policy_unlink, because the latter
could be reused for all policies. I never actually got around
to using __xfrm_policy_link for non-socket policies.
Later on hashing was added to all xfrm policies, including socket
policies. In fact, we don't need hashing on socket policies at
all since they're always looked up via a linked list.
This patch restores xfrm_sk_policy_link/xfrm_sk_policy_unlink
as wrappers around __xfrm_policy_link/__xfrm_policy_unlink so
that it's obvious we're dealing with socket policies.
This patch also removes hashing from __xfrm_policy_link as for
now it's only used by socket policies which do not need to be
hashed. Ironically this will in fact allow us to use this helper
for non-socket policies which I shall do later.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Server channels in BT_LISTEN state should use L2CAP_NESTING_PARENT. This
patch fixes the nesting value for the 6lowpan channel.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There's no reason why all users of L2CAP would need to worry about
initializing chan->nesting to L2CAP_NESTING_NORMAL (which is important
since 0 is the same as NESTING_SMP). This patch moves the initialization
to the common place that's used to create all new channels, i.e. the
l2cap_chan_create() function.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The teardown callback for L2CAP channels is problematic in that it is
explicitly called for all types of channels from l2cap_chan_del(),
meaning it's not possible to hard-code a nesting level when taking the
socket lock. The simplest way to have a correct nesting level for the
socket locking is to use the same value as for the chan. This also means
that the other places trying to lock parent sockets need to be update to
use the chan value (since L2CAP_NESTING_PARENT is defined as 2 whereas
SINGLE_DEPTH_NESTING has the value 1).
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
By default lockdep considers all L2CAP channels equal. This would mean
that we get warnings if a channel is locked when another one's lock is
tried to be acquired in the same thread. This kind of inter-channel
locking dependencies exist in the form of parent-child channels as well
as any channel wishing to elevate the security by requesting procedures
on the SMP channel.
To eliminate the chance for these lockdep warnings we introduce a
nesting level for each channel and use that when acquiring the channel
lock. For now there exists the earlier mentioned three identified
categories: SMP, "normal" channels and parent channels (i.e. those in
BT_LISTEN state). The nesting level is defined as atomic_t since we need
access to it before the lock is actually acquired.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a new interframe spacing time handling into mac802154
layer. Interframe spacing time is a time period between each transmit.
This patch adds a high resolution timer into mac802154 and starts on
xmit complete with corresponding interframe spacing expire time if
ifs_handling is true. We make it variable because it depends if
interframe spacing time is handled by transceiver or mac802154. At the
timer complete function we wake the netdev queue again. This avoids
new frame transmit in range of interframe spacing time.
For synced driver we add no handling of interframe spacing time. This
is currently a lack of support in all synced xmit drivers. I suppose
it's working because the latency of workqueue which is needed to call
spi_sync.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Fix the build failures that result from the use of pr_debug
without the referenced char * arrays being defined.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Even if netlink_kernel_cfg::unbind is implemented the unbind() method is
not called, because cfg->unbind is omitted in __netlink_kernel_create().
And fix wrong argument of test_bit() and off by one problem.
At this point, no unbind() method is implemented, so there is no real
issue.
Fixes: 4f52090052 ("netlink: have netlink per-protocol bind function return an error code.")
Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Cc: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of calling fou and gue functions directly from ip_tunnel
use ops for these that were previously registered. This patch adds the
logic to add and remove encapsulation operations for ip_tunnel,
and modified fou (and gue) to register with ip_tunnels.
This patch also addresses a circular dependency between ip_tunnel
and fou that was causing link errors when CONFIG_NET_IP_TUNNEL=y
and CONFIG_NET_FOU=m. References to fou an gue have been removed from
ip_tunnel.c
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Standardize function pointer uses.
Convert calling style from:
(*foo)(args...);
to:
foo(args...);
Other miscellanea:
o Add braces around loops with single ifs on multiple lines
o Realign arguments around these functions
o Invert logic in if to return immediately.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the normal kernel debugging mechanism which also
enables dynamic_debug at the same time.
Other miscellanea:
o Remove sysctl for irda_debug
o Remove function tracing like uses (use ftrace instead)
o Coalesce formats
o Realign arguments
o Remove unnecessary OOM messages
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing xtables matches and targets, when used from nft_compat, may
sleep from the destroy path, ie. when removing rules. Since the objects
are released via call_rcu from softirq context, this results in lockdep
splats and possible lockups that may be hard to reproduce.
Patrick also indicated that delayed object release via call_rcu can
cause us problems in the ordering of event notifications when anonymous
sets are in place.
So, this patch restores the synchronous object release from the commit
and abort paths. This includes a call to synchronize_rcu() to make sure
that no packets are walking on the objects that are going to be
released. This is slowier though, but it's simple and it resolves the
aforementioned problems.
This is a partial revert of c7c32e7 ("netfilter: nf_tables: defer all
object release via rcu") that was introduced in 3.16 to speed up
interaction with userspace.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Instead of the match->name, which is of course not relevant.
Fixes: f3f5dde ("netfilter: nft_compat: validate chain type in match/target")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Check for nat chain dependency only, which is the one that can
actually crash the kernel. Don't care if mangle, filter and security
specific match and targets are used out of their scope, they are
harmless.
This restores iptables-compat with mangle specific match/target when
used out of the OUTPUT chain, that are actually emulated through filter
chains, which broke when performing strict validation.
Fixes: f3f5dde ("netfilter: nft_compat: validate chain type in match/target")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
warning: (NETFILTER_XT_TARGET_REDIRECT) selects NF_NAT_REDIRECT_IPV4 which has unmet direct dependencies (NET && INET && NETFILTER && NF_NAT_IPV4)
warning: (NETFILTER_XT_TARGET_REDIRECT) selects NF_NAT_REDIRECT_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && NF_NAT_IPV6)
Fixes: 8b13edd ("netfilter: refactor NAT redirect IPv4 to use it from nf_tables")
Fixes: 9de920e ("netfilter: refactor NAT redirect IPv6 code to use it from nf_tables")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The mgmt_user_passkey_request and related functions do not do anything
else except read access to hdev->id. This member never changes after the
hdev creation so there is no need to acquire a lock to read it.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Any code calling bt_accept_dequeue() to get a new child socket from a
server socket should use lock_sock_nested to avoid lockdep warnings due
to the parent and child sockets being locked at the same time. The
l2cap_sock_accept() function is already doing this correctly but a
second place calling bt_accept_dequeue() is the code path from
l2cap_sock_teardown_cb() that calls l2cap_sock_cleanup_listen().
This patch fixes the proper nested locking annotation and thereby avoids
the following style of lockdep warning.
[ +0.000224] [ INFO: possible recursive locking detected ]
[ +0.000222] 3.17.0+ #1153 Not tainted
[ +0.000130] ---------------------------------------------
[ +0.000227] l2cap-tester/562 is trying to acquire lock:
[ +0.000210] (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+...}, at: [<c1393f47>] bt_accept_dequeue+0x68/0x11b
[ +0.000467]
but task is already holding lock:
[ +0.000186] (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+...}, at: [<c13b949a>] lock_sock+0xa/0xc
[ +0.000421]
other info that might help us debug this:
[ +0.000199] Possible unsafe locking scenario:
[ +0.000117] CPU0
[ +0.000000] ----
[ +0.000000] lock(sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP);
[ +0.000000] lock(sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP);
[ +0.000000]
*** DEADLOCK ***
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds support for setting listen before transmit mode via
nl802154 framework.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch add support for setting mac frame retries setting via
nl802154 framework.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch add support for max csma backoffs setting via nl802154
framework.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds support for setting backoff exponents via nl802154
framework.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds support for setting short address via nl802154 framework.
Also added a comment because a 0xffff seems to be valid address that we
don't have a short address. This is a valid setting but we need
more checks in upper layers to don't allow this address as source address.
Also the current netlink interface doesn't allow to set the short_addr
to 0xffff. Same for the 0xfffe short address which describes a not
allocated short address.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds support for setting pan_id via nl802154 framework.
Adding a comment because setting 0xffff as pan_id seems to be valid
setting. The pan_id 0xffff as source pan is invalid. I am not sure now
about this setting but for the current netlink interface this is an
invalid setting, so we do the same now. Maybe we need to change that
when we have coordinator support and association support.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds page and channel setting support to nl802154 framework.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a netdev notifier for interface renaming. We have a name
attribute inside of subif data struct. This is needed to have always the
actual netdev name in sdata name attribute.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch changes the module description like wireless which is IEEE
802.11 "subsystem" and not "implementation".
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds an unique id for an wpan_phy. This behaviour is mostly
grabbed from wireless stack. This is needed for upcomming patches which
identify the wpan netdev while NETDEV_CHANGENAME in netdev notify function.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Instead of always re-lock the iflist_mtx at multiple interfaces we lock
the complete for each loop at start and at the end.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch is just a cleanup to name the temporary variable for
protected list for each loop as tmp.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch move the iface unregistration into iface.c file to have
a behaviour which is similar like mac80211. Also iface handling should
be inside iface.c file only.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
ip_vs_prepare_tunneled_skb() ignores ->sk when allocating a new
skb, either unconditionally setting ->sk to NULL or allowing
the uninitialized ->sk from a newly allocated skb to leak through
to the caller.
This patch properly copies ->sk and increments its reference count.
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
And use the more common mechanisms directly.
Other miscellanea:
o Coalesce formats
o Add missing newlines
o Realign arguments
o Remove unnecessary OOM message logging as
there's a generic stack dump already on OOM.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use IS_ENABLED(CONFIG_IPV6), to enable this code if IPv6 is
a module.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: c8e6ad0829 ("ipv6: honor IPV6_PKTINFO with v4 mapped addresses on sendmsg")
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently there are only three neigh tables in the whole kernel:
arp table, ndisc table and decnet neigh table. What's more,
we don't support registering multiple tables per family.
Therefore we can just make these tables statically built-in.
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A very minimal and simple user space application allocating an SCTP
socket, setting SCTP_AUTH_KEY setsockopt(2) on it and then closing
the socket again will leak the memory containing the authentication
key from user space:
unreferenced object 0xffff8800837047c0 (size 16):
comm "a.out", pid 2789, jiffies 4296954322 (age 192.258s)
hex dump (first 16 bytes):
01 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff816d7e8e>] kmemleak_alloc+0x4e/0xb0
[<ffffffff811c88d8>] __kmalloc+0xe8/0x270
[<ffffffffa0870c23>] sctp_auth_create_key+0x23/0x50 [sctp]
[<ffffffffa08718b1>] sctp_auth_set_key+0xa1/0x140 [sctp]
[<ffffffffa086b383>] sctp_setsockopt+0xd03/0x1180 [sctp]
[<ffffffff815bfd94>] sock_common_setsockopt+0x14/0x20
[<ffffffff815beb61>] SyS_setsockopt+0x71/0xd0
[<ffffffff816e58a9>] system_call_fastpath+0x12/0x17
[<ffffffffffffffff>] 0xffffffffffffffff
This is bad because of two things, we can bring down a machine from
user space when auth_enable=1, but also we would leave security sensitive
keying material in memory without clearing it after use. The issue is
that sctp_auth_create_key() already sets the refcount to 1, but after
allocation sctp_auth_set_key() does an additional refcount on it, and
thus leaving it around when we free the socket.
Fixes: 65b07e5d0d ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An SCTP server doing ASCONF will panic on malformed INIT ping-of-death
in the form of:
------------ INIT[PARAM: SET_PRIMARY_IP] ------------>
While the INIT chunk parameter verification dissects through many things
in order to detect malformed input, it misses to actually check parameters
inside of parameters. E.g. RFC5061, section 4.2.4 proposes a 'set primary
IP address' parameter in ASCONF, which has as a subparameter an address
parameter.
So an attacker may send a parameter type other than SCTP_PARAM_IPV4_ADDRESS
or SCTP_PARAM_IPV6_ADDRESS, param_type2af() will subsequently return 0
and thus sctp_get_af_specific() returns NULL, too, which we then happily
dereference unconditionally through af->from_addr_param().
The trace for the log:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
IP: [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp]
PGD 0
Oops: 0000 [#1] SMP
[...]
Pid: 0, comm: swapper Not tainted 2.6.32-504.el6.x86_64 #1 Bochs Bochs
RIP: 0010:[<ffffffffa01e9c62>] [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp]
[...]
Call Trace:
<IRQ>
[<ffffffffa01f2add>] ? sctp_bind_addr_copy+0x5d/0xe0 [sctp]
[<ffffffffa01e1fcb>] sctp_sf_do_5_1B_init+0x21b/0x340 [sctp]
[<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
[<ffffffffa01e5c09>] ? sctp_endpoint_lookup_assoc+0xc9/0xf0 [sctp]
[<ffffffffa01e61f6>] sctp_endpoint_bh_rcv+0x116/0x230 [sctp]
[<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
[<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
[<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
[<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
[<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
[<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
[<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
[...]
A minimal way to address this is to check for NULL as we do on all
other such occasions where we know sctp_get_af_specific() could
possibly return with NULL.
Fixes: d6de309759 ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the more common dynamic_debug capable net_dbg_ratelimited
and remove the LIMIT_NETDEBUG macro.
All messages are still ratelimited.
Some KERN_<LEVEL> uses are changed to KERN_DEBUG.
This may have some negative impact on messages that were
emitted at KERN_INFO that are not not enabled at all unless
DEBUG is defined or dynamic_debug is enabled. Even so,
these messages are now _not_ emitted by default.
This also eliminates the use of the net_msg_warn sysctl
"/proc/sys/net/core/warnings". For backward compatibility,
the sysctl is not removed, but it has no function. The extern
declaration of net_msg_warn is removed from sock.h and made
static in net/core/sysctl_net_core.c
Miscellanea:
o Update the sysctl documentation
o Remove the embedded uses of pr_fmt
o Coalesce format fragments
o Realign arguments
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar says:
====================
Open vSwitch
Following batch of patches brings feature parity between upstream
ovs and out of tree ovs module.
Two features are added, first adds support to export egress
tunnel information for a packet. This is used to improve
visibility in network traffic. Second feature allows userspace
vswitchd process to probe ovs module features. Other patches
are optimization and code cleanup.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Neaten and standardize the logging output.
Other miscellanea:
o Use pr_notice_once instead of a guard flag.
o Convert existing pr_<level> uses too.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alternative to RPS/RFS is to use hardware support for multiple
queues.
Then split a set of million of sockets into worker threads, each
one using epoll() to manage events on its own socket pool.
Ideally, we want one thread per RX/TX queue/cpu, but we have no way to
know after accept() or connect() on which queue/cpu a socket is managed.
We normally use one cpu per RX queue (IRQ smp_affinity being properly
set), so remembering on socket structure which cpu delivered last packet
is enough to solve the problem.
After accept(), connect(), or even file descriptor passing around
processes, applications can use :
int cpu;
socklen_t len = sizeof(cpu);
getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len);
And use this information to put the socket into the right silo
for optimal performance, as all networking stack should run
on the appropriate cpu, without need to send IPI (RPS/RFS).
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_mark_napi_id() is used to record for a flow napi id of incoming
packets for busypoll sake.
We should do this only on established flows, not on listeners.
This was 'working' by virtue of the socket cloning, but doing
this on SYN packets in unecessary cache line dirtying.
Even if we move sk_napi_id in the same cache line than sk_lock,
we are working to make SYN processing lockless, so it is desirable
to set sk_napi_id only for established flows.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Put duplicate detection into its own RX handler, and separate
out the conditions a bit to make the code more readable.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When kfree() is all that's needed to free an object protected by RCU
there's a kfree_rcu() convenience function that can be used. This patch
updates the 6lowpan code to use this, thereby eliminating the need for
the separate peer_free() function.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We could be reading 8 bytes into a 4 byte buffer here. It seems
harmless but adding a check is the right thing to do and it silences a
static checker warning.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch fixes a regression that was introduced by commit
cb77c3ec07. In addition to BT_CONFIG,
BT_CONNECTED is also a state in which we may get a remote name and need
to indicate over mgmt the connection status. This scenario is
particularly likely to happen for incoming connections that do not need
authentication since there the hci_conn state will reach BT_CONNECTED
before the remote name is received.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This fixes the following sparse warning:
net/bluetooth/amp.c:152:53: warning: Variable length array is used.
The warning itself is probably harmless since this kind of usage of
shash_desc is present also in other places in the kernel (there's even a
convenience macro SHASH_DESC_ON_STACK available for defining such stack
variables). However, dynamically allocated versions are also used in
several places of the kernel (e.g. kernel/kexec.c and lib/digsig.c)
which have the benefit of not exhibiting the sparse warning.
Since there are no more sparse warnings in the Bluetooth subsystem after
fixing this one it is now easier to spot whenever new ones might get
introduced by future patches.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When doing GRO processing for UDP tunnels, we never add
SKB_GSO_UDP_TUNNEL to gso_type - only the type of the inner protocol
is added (such as SKB_GSO_TCPV4). The result is that if the packet is
later resegmented we will do GSO but not treat it as a tunnel. This
results in UDP fragmentation of the outer header instead of (i.e.) TCP
segmentation of the inner header as was originally on the wire.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
pull request: wireless-next 2014-11-07
Please pull this batch of updates intended for the 3.19 stream!
For the mac80211 bits, Johannes says:
"This relatively large batch of changes is comprised of the following:
* large mac80211-hwsim changes from Ben, Jukka and a bit myself
* OCB/WAVE/11p support from Rostislav on behalf of the Czech Technical
University in Prague and Volkswagen Group Research
* minstrel VHT work from Karl
* more CSA work from Luca
* WMM admission control support in mac80211 (myself)
* various smaller fixes, spelling corrections, and minor API additions"
For the Bluetooth bits, Johan says:
"Here's the first bluetooth-next pull request for 3.19. The vast majority
of patches are for ieee802154 from Alexander Aring with various fixes
and cleanups. There are also several LE/SMP fixes as well as improved
support for handling LE devices that have lost their pairing information
(the patches from Alfonso). Jukka provides a couple of stability fixes
for 6lowpan and Szymon conformance fixes for RFCOMM. For the HCI drivers
we have one new USB ID for an Acer controller as well as a reset
handling fix for H5."
For the Atheros bits, Kalle says:
"Major changes are:
o ethtool support (Ben)
o print dev string prefix with debug hex buffers dump (Michal)
o debugfs file to read calibration data from the firmware verification
purposes (me)
o fix fw_stats debugfs file, now results are more reliable (Michal)
o firmware crash counters via debugfs (Ben&me)
o various tracing points to debug firmware (Rajkumar)
o make it possible to provide firmware calibration data via a file (me)
And we have quite a lot of smaller fixes and clean up."
For the iwlwifi bits, Emmanuel says:
"The big new thing here is netdetect which allows the
firmware to wake up the platform when a specific network
is detected. Along with that I have fixes for d3 operation.
The usual amount of rate scaling stuff - we now support STBC.
The other commit that stands out is Johannes's work on
devcoredump. He basically starts to use the standard
infrastructure he built."
Along with that are the usual sort of updates and such for ath9k,
brcmfmac, wil6210, and a handful of other bits here and there...
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ever since raw_probe_proto_opt was added it had the problem of
causing the user iov to be read twice, once during the probe for
the protocol header and once again in ip_append_data.
This is a potential security problem since it means that whatever
we're probing may be invalid. This patch plugs the hole by
firstly advancing the iov so we don't read the same spot again,
and secondly saving what we read the first time around for use
by ip_append_data.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function raw_probe_proto_opt tries to extract the first two
bytes from the user input in order to seed the IPsec lookup for
ICMP packets. In doing so it's processing iovec by hand and
overcomplicating things.
This patch replaces the manual iovec processing with a call to
memcpy_fromiovecend.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
that can cause a kernel crash. I'm not sure it's remotely
exploitable, but it's an important fix nonetheless.
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJUYJNkAAoJEDBSmw7B7bqrgooP/ivTw/pJJ5i5wLsfAUUNV1tI
FDe3+w/kBsD0d0lNsnKoyq58iORyYIxHtmLhvFV+yuBMvpfADTNm9CkLDDjg939m
mU6/kjO6wbHv/SoLSL4rss/AggGaEsu+3ggGpmhXl2JApSUIVjw0z9XixNcU4Vnz
CEjIw9jsnKA5UOeH2cv+fwQLXImXKSsofAf2NgZ3zJ6Ci0zLOZtr/ZmO2o4IU9Do
HWp3CHsaehoWlaqtPx6r8JaodE7XPEPtim66aWoUaeSYueL9iYKJU7BS2jkTWJnQ
Kv8JuVQIQPlMsTuvTR8oHvpHZ29m0Z6mc8XsKeNu6UCNzoOBGS2Ak2YZwcAeqSLr
oQRDDBc4aKsY5YWcaChPx6N8gOEXRr4cQgfyY79o5vaPPJwixxLMPUUOU1rKHC3D
E61YfQ/tfoFWzEitksX14P3Gynas6abySku5mhM5y2L/1XFdDHugZjzRjcBv/1ry
J9x44k2EWbduvz8GBGGethmKMghKYYTqfB5rWh1cKxafgBEFt61ZcE4LuqaNnP17
sg+IlXFxZjJLnhhg0YL2oZt+++/CyCLl9G+mx6wZE4HLqsl26ElZ6ql2kekJ5baV
HKsPyBa2r48icJfrogA9WpjumI8b19ztG5OrsF0AQmK77IVSj80JFVnsTBal9/Y7
Ao2xMau9OghJq8AEJpOi
=H7HX
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-john-2014-11-10' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg <johannes@sipsolutions.net> says:
"This has just one fix, for an issue with the CCMP decryption
that can cause a kernel crash. I'm not sure it's remotely
exploitable, but it's an important fix nonetheless."
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Tuning coalescing parameters on NIC can be really hard.
Servers can handle both bulk and RPC like traffic, with conflicting
goals : bulk flows want as big GRO packets as possible, RPC want minimal
latencies.
To reach big GRO packets on 10Gbe NIC, one can use :
ethtool -C eth0 rx-usecs 4 rx-frames 44
But this penalizes rpc sessions, with an increase of latencies, up to
50% in some cases, as NICs generally do not force an interrupt when
a packet with TCP Push flag is received.
Some NICs do not have an absolute timer, only a timer rearmed for every
incoming packet.
This patch uses a different strategy : Let GRO stack decides what do do,
based on traffic pattern.
Packets with Push flag wont be delayed.
Packets without Push flag might be held in GRO engine, if we keep
receiving data.
This new mechanism is off by default, and shall be enabled by setting
/sys/class/net/ethX/gro_flush_timeout to a value in nanosecond.
To fully enable this mechanism, drivers should use napi_complete_done()
instead of napi_complete().
Tested:
Ran 200 netperf TCP_STREAM from A to B (10Gbe mlx4 link, 8 RX queues)
Without this feature, we send back about 305,000 ACK per second.
GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet)
Setting a timer of 2000 nsec is enough to increase GRO packet sizes
and reduce number of ACK packets. (811/19.2 = 42)
Receiver performs less calls to upper stacks, less wakes up.
This also reduces cpu usage on the sender, as it receives less ACK
packets.
Note that reducing number of wakes up increases cpu efficiency, but can
decrease QPS, as applications wont have the chance to warmup cpu caches
doing a partial read of RPC requests/answers if they fit in one skb.
B:~# sar -n DEV 1 10 | grep eth0 | tail -1
Average: eth0 811269.80 305732.30 1199462.57 19705.72 0.00
0.00 0.50
B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout
B:~# sar -n DEV 1 10 | grep eth0 | tail -1
Average: eth0 811577.30 19230.80 1199916.51 1239.80 0.00
0.00 0.50
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When transferring from the original range in nf_nat_masquerade_{ipv4,ipv6}()
we copy over values from stack in from min_proto/max_proto due to uninitialized
range variable in both, nft_masq_{ipv4,ipv6}_eval. As we only initialize
flags at this time from nft_masq struct, just zero out the rest.
Fixes: 9ba1f726be ("netfilter: nf_tables: add new nft_masq expression")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Allow setting bandwidth related regulatory flags. These flags are mapped
to the corresponding channel flags in the specified range.
Make sure the new flags are consulted when calculating the maximum
bandwidth allowed by a regulatory-rule.
Also allow propagating the GO_CONCURRENT modifier from a reg-rule to a
channel.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Reviewed-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Radiotap vendor namespace data might still be useful, but we
reverted it because it used too much space in the RX status.
Put it back, but address the space problem by using a single
bit only and putting everything else into the skb->data.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
For multi-vif channel switches, we want to send
NL80211_CMD_CH_SWITCH_NOTIFY to the userspace to let it decide whether
other interfaces need to be moved as well. This is needed when we
want a P2P GO interface to follow the channel of a station, for
example.
Modify the code so that all interfaces can send CSA notifications.
Additionally, send notifications for STA CSA as well.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Send a channel switch notification to userspace when a channel switch
is requested or when we react to a remote CSA.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add a new NL80211_CH_SWITCH_STARTED_NOTIFY message that can be sent to
the userspace when a channel switch process has started. This allows
userspace to take action, for instance, by requesting other interfaces
to switch channel as necessary.
This patch introduces a function that allows the drivers to send this
notification. It should be used when the driver starts processing a
channel switch initiated by a remote device (eg. when a STA receives a
CSA from the AP) and when it successfully starts a userspace-triggered
channel switch (eg. when hostapd triggers a channel swith in the AP).
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The device_timestamp value was left out of the event trace for
drv_pre_channel_switch by mistake. Add it.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There was a mistake when merging commit 6d027bcc (mac80211: add
pre_channel_switch driver operation) for upstream. The assignment of
the values in the ch_switch structure came below the call to
drv_pre_channel_switch. Fix the order.
Fixes: 6d027bcc (mac80211: add pre_channel_switch driver operation)
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This new flag is useful for suppressing error logging while probing
for datapath features using flow commands. For backwards
compatibility reasons the commands are executed normally, but error
logging is suppressed.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
struct dp_upcall_info has pointer to pkt_key which is already
available in OVS_CB. This also simplifies upcall handling
for gso packet.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
OVS need to flow key for flow lookup in recic action. OVS
does key extract in recic action. Most of cases we could
use OVS_CB packet key directly and can avoid packet flow key
extract. SET action we can update flow-key along with packet
to keep it consistent. But there are some action like MPLS
pop which forces OVS to do flow-extract. In such cases we
can mark flow key as invalid so that subsequent recirc
action can do full flow extract.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
OVS vswitch has extended IPFIX exporter to export tunnel headers
to improve network visibility.
To export this information userspace needs to know egress tunnel
for given packet. By extending packet attributes datapath can
export egress tunnel info for given packet. So that userspace
can ask for egress tunnel info in userspace action. This
information is used to build IPFIX data for given flow.
Signed-off-by: Wenyu Zhang <wenyuz@vmware.com>
Acked-by: Romain Lenglet <rlenglet@vmware.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
vport can be compiled as modules, therefore openvswitch needs
to export few symbols. Export them as GPL symbols.
CC: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
This patch adds a netif_running check while trying to change the address
attributes via ioctl. While netif_running is true these attributes
should be only readable.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a hacked solution for an interface dump with a running
lowpan interface. This will crash because lowpan and wpan interface use
the same arphdr. To change the arphdr will change the UAPI, this patch
checks on mtu which should on lowpan interface always different than
IEEE802154_MTU.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds rtnl lock hold mechanism while accessing wpan_dev
attributes. Furthermore these attributes should be protected by rtnl
lock and netif_running only.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch removes the setting of dev_addr on a monitor device. This
address should be zero. A monitor should only sniff and send raw frames
out. The address should be never used by upper layers and receiving
frame parsing.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds support for wpan_dev dump via nl802154 framework.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds support for dumping wpan_phy attributes via nl802154.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a basic nl802154 framework. Most of this code was
grabbed from nl80211 framework.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds new sysfs entries for wpan_phy index and name. This
needed for the new 802.15.4 userspace tool.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a wpan_dev_list list into cfg802154_registered_device
struct. Also adding new wpan_dev into this list while
cfg802154_netdev_notifier_call. This behaviour is mostly grab from
wireless core.c implementation and is needed for preparing nl802154
framework.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds an iftype argument to the wpan_dev. This is needed to
get the interface type from netdev ieee802154_ptr. The subif data struct
can only accessible in mac802154 branch.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a new cfg802154_rdev_list to remember all registered
cfg802154_registered_device structs. This is needed to prepare the
upcomming nl802154 framework.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch renames the wpan_phy_alloc function to wpan_phy_new. This
naming convention is like wireless and "wiphy_new" function.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch removes the mac_params from subif data struct. Instead we
manipulate the wpan attributes directly.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch moves all mac pib attributes into the wpan_dev struct.
Furthermore we can easier access these attributes over the netdev
802154_ptr pointer. Currently this is only possible over a complicated
callback structure in mac802154 because subif data structure is
accessable inside mac802154 only.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This allows you to filter traffic by process control group (cgroup).
Signed-off-by: Ana Rey <anarey@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Remove the dependency on the "warning" sysctl (net_msg_warn)
which is only used by the LIMIT_NETDEBUG macro.
Convert the LIMIT_NETDEBUG use in DCCP_WARN to the more
common net_warn_ratelimited mechanism.
This still ratelimits based on the net_ratelimit()
function, but removes the check for the sysctl.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch reverts commit:
a7807d73 ("Bluetooth: 6lowpan: Avoid memory leak if memory allocation
fails")
which was wrong suggested by Alexander Aring. The function skb_unshare
run also kfree_skb on failure.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org # 3.18.x
As NIC multicast filtering isn't perfect, and some platforms are
quite content to spew broadcasts, we should not trigger an event
for skb:kfree_skb when we do not have a match for such an incoming
datagram. We do though want to avoid sweeping the matter under the
rug entirely, so increment a suitable statistic.
This incorporates feedback from David L. Stevens, Karl Neiss and Eric
Dumazet.
V3 - use bool per David Miller
Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that both macvtap and tun are using skb_copy_datagram_iter, we
can kill the abomination that is skb_copy_datagram_const_iovec.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds skb_copy_datagram_iter, which is identical to
skb_copy_datagram_iovec except that it operates on iov_iter
instead of iovec.
Eventually all users of skb_copy_datagram_iovec should switch
over to iov_iter and then we can remove skb_copy_datagram_iovec.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
pull request: wireless 2014-11-06
Please pull this batch of fixes intended for the 3.18 stream...
For the mac80211 bits, Johannes says:
"This contains another small set of fixes for 3.18, these are all
over the place and most of the bugs are old, one even dates back
to the original mac80211 we merged into the kernel."
For the iwlwifi bits, Emmanuel says:
"I fix here two issues that are related to the firmware
loading flow. A user reported that he couldn't load the
driver because the rfkill line was pulled up while we
were running the calibrations. This was happening while
booting the system: systemd was restoring the "disable
wifi settings" and that raised an RFKILL interrupt during
the calibration. Our driver didn't handle that properly
and this is now fixed."
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar says:
====================
Open vSwitch
First two patches are related to OVS MPLS support. Rest of patches
are mostly refactoring and minor improvements to openvswitch.
v1-v2:
- Fix conflicts due to "gue: Remote checksum offload"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we ensure that the skb is freed on every error path in IPHC
decompression which makes it easy to introduce skb leaks. By centralising
the skb_free into the receive function it makes future decompression routines
easier to maintain. It does come at the expense of ensuring that the skb
passed into the decompression routine must not be copied.
Signed-off-by: Martin Townsend <mtownsend1973@gmail.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Commit 64ce207306 ("[NET]: Make NETDEBUG pure printk wrappers")
originally had these NETDEBUG printks as always emitting.
Commit a2a316fd06 ("[NET]: Replace CONFIG_NET_DEBUG with sysctl")
added a net_msg_warn sysctl to these NETDEBUG uses.
Convert these NETDEBUG uses to normal pr_info calls.
This changes the output prefix from "ESP: " to include
"IPSec: " for the ipv4 case and "IPv6: " for the ipv6 case.
These output lines are now like the other messages in the files.
Other miscellanea:
Neaten the arithmetic spacing to be consistent with other
arithmetic spacing in the files.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These messages aren't useful as there's a generic dump_stack()
on OOM.
Neaten the comment and if test above the OOM by separating the
assign in if into an allocation then if test.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the ports phys are connected to the switches internal MDIO bus,
we need to connect the phy to the slave netdev, otherwise
auto-negotiation etc, does not work.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 4bba3925 ("[PKT_SCHED]: Prefix tc actions with act_")
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for tunnels with local or
remote wildcard endpoints. With this we get a
NBMA tunnel mode like we have it for ipv4 and
sit tunnels.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we need the IP6_TNL_F_CAP_XMIT capabiltiy to transmit
packets through an ipv6 tunnel. This capability is set when the
tunnel gets configured, based on the tunnel endpoint addresses.
On tunnels with wildcard tunnel endpoints, we need to do the
capabiltiy checking on a per packet basis like it is done in
the receive path.
This patch extends ip6_tnl_xmit_ctl() to take local and remote
addresses as parameters to allow for per packet capabiltiy
checking.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Opcodes in switch/case in hci_cmd_status_evt are not sorted
by value. This patch restores proper ordering.
Signed-off-by: Kuba Pawlak <kubax.t.pawlak@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
If role switch was rejected by the controller and HCI Event: Command Status
returned with status "Command Disallowed" (0x0C) the flag
HCI_CONN_RSWITCH_PEND remains set. No further role switches are
possible as this flag prevents us from sending any new HCI Switch Role
requests and the only way to clear it is to receive a valid
HCI Event Switch Role.
This patch clears the flag if command was rejected.
2013-01-01 00:03:44.209913 < HCI Command: Switch Role (0x02|0x000b) plen 7
bdaddr BC:C6:DB:C4:6F:79 role 0x00
Role: Master
2013-01-01 00:03:44.210867 > HCI Event: Command Status (0x0f) plen 4
Switch Role (0x02|0x000b) status 0x0c ncmd 1
Error: Command Disallowed
Signed-off-by: Kuba Pawlak <kubax.t.pawlak@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Commit 7ec7c4a9a6 (mac80211: port CCMP to
cryptoapi's CCM driver) introduced a regression when decrypting empty
packets (data_len == 0). This will lead to backtraces like:
(scatterwalk_start) from [<c01312f4>] (scatterwalk_map_and_copy+0x2c/0xa8)
(scatterwalk_map_and_copy) from [<c013a5a0>] (crypto_ccm_decrypt+0x7c/0x25c)
(crypto_ccm_decrypt) from [<c032886c>] (ieee80211_aes_ccm_decrypt+0x160/0x170)
(ieee80211_aes_ccm_decrypt) from [<c031c628>] (ieee80211_crypto_ccmp_decrypt+0x1ac/0x238)
(ieee80211_crypto_ccmp_decrypt) from [<c032ef28>] (ieee80211_rx_handlers+0x870/0x1d24)
(ieee80211_rx_handlers) from [<c0330c7c>] (ieee80211_prepare_and_rx_handle+0x8a0/0x91c)
(ieee80211_prepare_and_rx_handle) from [<c0331260>] (ieee80211_rx+0x568/0x730)
(ieee80211_rx) from [<c01d3054>] (__carl9170_rx+0x94c/0xa20)
(__carl9170_rx) from [<c01d3324>] (carl9170_rx_stream+0x1fc/0x320)
(carl9170_rx_stream) from [<c01cbccc>] (carl9170_usb_tasklet+0x80/0xc8)
(carl9170_usb_tasklet) from [<c00199dc>] (tasklet_hi_action+0x88/0xcc)
(tasklet_hi_action) from [<c00193c8>] (__do_softirq+0xcc/0x200)
(__do_softirq) from [<c0019734>] (irq_exit+0x80/0xe0)
(irq_exit) from [<c0009c10>] (handle_IRQ+0x64/0x80)
(handle_IRQ) from [<c000c3a0>] (__irq_svc+0x40/0x4c)
(__irq_svc) from [<c0009d44>] (arch_cpu_idle+0x2c/0x34)
Such packets can appear for example when using the carl9170 wireless driver
because hardware sometimes generates garbage when the internal FIFO overruns.
This patch adds an additional length check.
Cc: stable@vger.kernel.org
Fixes: 7ec7c4a9a6 ("mac80211: port CCMP to cryptoapi's CCM driver")
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Ronald Wahl <ronald.wahl@raritan.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
OVS does mask validation even if it does not need to convert
netlink mask attributes to mask structure. ovs_nla_get_match()
caller can pass NULL mask structure pointer if the caller does
not need mask. Therefore NULL check is required in SW_FLOW_KEY*
macros. Following patch does not convert mask netlink attributes
if mask pointer is NULL, so we do not need these checks in
SW_FLOW_KEY* macro.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Andy Zhou <azhou@nicira.com>
There are two separate API to allocate and copy actions list. Anytime
OVS needs to copy action list, it needs to call both functions.
Following patch moves action allocation to copy function to avoid
code duplication.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
The 'flow' memeber was chosen for removal because it's only used
in ovs_execute_actions() we can pass it as argument to this
function.
Signed-off-by: Lorand Jakab <lojakab@cisco.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
If the internal device is not up, it should drop received
packets. Sometimes it receive the broadcast or multicast
packets, and the ip protocol stack will casue more cpu
usage wasted.
Signed-off-by: Chunhe Li <lichunhe@huawei.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Avoid recursive read_rcu_lock() by using the lighter weight
get_dp_rcu() API. Add proper locking assertions to get_dp().
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Split up ovs_flow_cmd_fill_info() to make it easier to cache parts of a
dump reply. This will be used to streamline flow_dump in a future patch.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
skb_clone() NULL check is implemented in do_output(), as past of the
common (fast) path. Refactoring so that NULL check is done in the
slow path, immediately after skb_clone() is called.
Besides optimization, this change also improves code readability by
making the skb_clone() NULL check consistent within OVS datapath
module.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
There are many possible ways that a flow can be invalid so we've
added logging for most of them. This adds logs for the remaining
possible cases so there isn't any ambiguity while debugging.
CC: Federico Iezzi <fiezzi@enter.it>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
These two cases used to be treated differently for IPv4/IPv6,
but they are now identical.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Ths simplifies flow-table-destroy API. No need to pass explicit
parameter about context.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
Allow datapath to recognize and extract MPLS labels into flow keys
and execute actions which push, pop, and set labels on packets.
Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and Joe Stringer.
Cc: Ravi K <rkerur@gmail.com>
Cc: Leo Alterman <lalterman@nicira.com>
Cc: Isaku Yamahata <yamahata@valinux.co.jp>
Cc: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Device can export MPLS GSO support in dev->mpls_features same way
it export vlan features in dev->vlan_features. So it is safe to
remove NETIF_F_GSO_MPLS redundant flag.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
When filling netlink info, dport is being returned as flags. Fix
instances to return correct value.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It has been reported that generating an MLD listener report on
devices with large MTUs (e.g. 9000) and a high number of IPv6
addresses can trigger a skb_over_panic():
skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20
head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0
dev:port1
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:100!
invalid opcode: 0000 [#1] SMP
Modules linked in: ixgbe(O)
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4
[...]
Call Trace:
<IRQ>
[<ffffffff80578226>] ? skb_put+0x3a/0x3b
[<ffffffff80612a5d>] ? add_grhead+0x45/0x8e
[<ffffffff80612e3a>] ? add_grec+0x394/0x3d4
[<ffffffff80613222>] ? mld_ifc_timer_expire+0x195/0x20d
[<ffffffff8061308d>] ? mld_dad_timer_expire+0x45/0x45
[<ffffffff80255b5d>] ? call_timer_fn.isra.29+0x12/0x68
[<ffffffff80255d16>] ? run_timer_softirq+0x163/0x182
[<ffffffff80250e6f>] ? __do_softirq+0xe0/0x21d
[<ffffffff8025112b>] ? irq_exit+0x4e/0xd3
[<ffffffff802214bb>] ? smp_apic_timer_interrupt+0x3b/0x46
[<ffffffff8063f10a>] ? apic_timer_interrupt+0x6a/0x70
mld_newpack() skb allocations are usually requested with dev->mtu
in size, since commit 72e09ad107 ("ipv6: avoid high order allocations")
we have changed the limit in order to be less likely to fail.
However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
macros, which determine if we may end up doing an skb_put() for
adding another record. To avoid possible fragmentation, we check
the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
assumption as the actual max allocation size can be much smaller.
The IGMP case doesn't have this issue as commit 57e1ab6ead
("igmp: refine skb allocations") stores the allocation size in
the cb[].
Set a reserved_tailroom to make it fit into the MTU and use
skb_availroom() helper instead. This also allows to get rid of
igmp_skb_size().
Reported-by: Wei Liu <lw1a2.jing@gmail.com>
Fixes: 72e09ad107 ("ipv6: avoid high order allocations")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: David L Stevens <david.stevens@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using a single fixed string is smaller code size than using
a format and many string arguments.
Reduces overall code size a little.
$ size net/ipv4/igmp.o* net/ipv6/mcast.o* net/ipv6/ip6_flowlabel.o*
text data bss dec hex filename
34269 7012 14824 56105 db29 net/ipv4/igmp.o.new
34315 7012 14824 56151 db57 net/ipv4/igmp.o.old
30078 7869 13200 51147 c7cb net/ipv6/mcast.o.new
30105 7869 13200 51174 c7e6 net/ipv6/mcast.o.old
11434 3748 8580 23762 5cd2 net/ipv6/ip6_flowlabel.o.new
11491 3748 8580 23819 5d0b net/ipv6/ip6_flowlabel.o.old
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ueki Kohei reported that when we are using NewReno with connections that
have a very low traffic, we may timeout the connection too early if a
second loss occurs after the first one was successfully acked but no
data was transfered later. Below is his description of it:
When SACK is disabled, and a socket suffers multiple separate TCP
retransmissions, that socket's ETIMEDOUT value is calculated from the
time of the *first* retransmission instead of the *latest*
retransmission.
This happens because the tcp_sock's retrans_stamp is set once then never
cleared.
Take the following connection:
Linux remote-machine
| |
send#1---->(*1)|--------> data#1 --------->|
| | |
RTO : :
| | |
---(*2)|----> data#1(retrans) ---->|
| (*3)|<---------- ACK <----------|
| | |
| : :
| : :
| : :
16 minutes (or more) :
| : :
| : :
| : :
| | |
send#2---->(*4)|--------> data#2 --------->|
| | |
RTO : :
| | |
---(*5)|----> data#2(retrans) ---->|
| | |
| | |
RTO*2 : :
| | |
| | |
ETIMEDOUT<----(*6)| |
(*1) One data packet sent.
(*2) Because no ACK packet is received, the packet is retransmitted.
(*3) The ACK packet is received. The transmitted packet is acknowledged.
At this point the first "retransmission event" has passed and been
recovered from. Any future retransmission is a completely new "event".
(*4) After 16 minutes (to correspond with retries2=15), a new data
packet is sent. Note: No data is transmitted between (*3) and (*4).
The socket's timeout SHOULD be calculated from this point in time, but
instead it's calculated from the prior "event" 16 minutes ago.
(*5) Because no ACK packet is received, the packet is retransmitted.
(*6) At the time of the 2nd retransmission, the socket returns
ETIMEDOUT.
Therefore, now we clear retrans_stamp as soon as all data during the
loss window is fully acked.
Reported-by: Ueki Kohei
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This encapsulates all of the skb_copy_datagram_iovec() callers
with call argument signature "skb, offset, msghdr->msg_iov, length".
When we move to iov_iters in the networking, the iov_iter object will
sit in the msghdr.
Having a helper like this means there will be less places to touch
during that transformation.
Based upon descriptions and patch from Al Viro.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add processing of the remote checksum offload option in both the normal
path as well as the GRO path. The implements patching the affected
checksum to derive the offloaded checksum.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add if_tunnel flag TUNNEL_ENCAP_FLAG_REMCSUM to configure
remote checksum offload on an IP tunnel. Add logic in gue_build_header
to insert remote checksum offload option.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new GSO type, SKB_GSO_TUNNEL_REMCSUM, which indicates remote
checksum offload being done (in this case inner checksum must not
be offloaded to the NIC).
Added logic in __skb_udp_tunnel_segment to handle remote checksum
offload case.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add functions and basic definitions for processing standard flags,
private flags, and control messages. This includes definitions
to compute length of optional fields corresponding to a set of flags.
Flag validation is in validate_gue_flags function. This checks for
unknown flags, and that length of optional fields is <= length
in guehdr hlen.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In __skb_udp_tunnel_segment if outer UDP checksums are enabled and
ip_summed is not already CHECKSUM_PARTIAL, set up checksum offload
if device features allow it.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move fou_build_header out of ip_tunnel.c and into fou.c splitting
it up into fou_build_header, gue_build_header, and fou_build_udp.
This allows for other users for TX of FOU or GUE. Change ip_tunnel_encap
to call fou_build_header or gue_build_header based on the tunnel
encapsulation type. Similarly, added fou_encap_hlen and gue_encap_hlen
functions which are called by ip_encap_hlen. New net/fou.h has
prototypes and defines for this.
Added NET_FOU_IP_TUNNELS configuration. When this is set, IP tunnels
can use FOU/GUE and fou module is also selected.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the af_ieee802154 defines and use the
IEEE802154_EXTENDED_ADDR_LEN. We should do this everywhere in the
802.15.4 subsystem because af_ieee802154 should be normally an uapi
header.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adding support for a perm extended address. This is useful
when a device supports an eeprom with a programmed static extended address.
If a device doesn't support such eeprom or serial registers then the
driver should generate a random extended address.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch cleanups the ieee802154_be64_to_le64 to have a similar
function like ieee802154_le64_to_be64 only with switched source and
destionation types.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds an ieee802154_vif similar like the ieee80211_vif which
holds the interface type and maybe further more attributes like the
ieee80211_vif structure.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Varka Bhadram <varkabhadram@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a default interface registration for a wpan interface
type. Currently the 802.15.4 subsystem need to call userspace tools to
add an interface. This patch is like mac80211 handling for registration
a station interface type by default.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch removes the get_phy callback from mlme ops structure. Instead
we doing a dereference via ieee802154_ptr dev pointer. For backwards
compatibility we need to run get_device after dereference wpan_phy via
ieee802154_ptr.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch meld mac802154_netdev_register into ieee802154_if_add
function. Also we have now only one alloc_netdev call with one interface
setup routine "ieee802154_if_setup" instead two different one for each
interface type. This patch checks via runtime the interface type and do
different handling now. Additional we add the wpan_dev struct in
ieee802154_sub_if_data and set the new ieee802154_ptr while netdev
registration. This behaviour is very similar the mac80211 netdev
registration functionality.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch moves the dev_hold call inside of nl-phy ieee802154_add_iface
function. The ieee802154_add_iface is the only one function which use the
ieee802154_if_add function and contains the corresponding dev_put call.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch moves and renames the mac802154_add_iface and
mac802154_netdev_register functions into iface.c. The function
mac802154_add_iface is renamed to ieee802154_if_add which is a similar naming
convention like mac80211.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch moves and rename the mac802154_del_iface function into
iface.c and rename the function to ieee802154_if_remove which is a similar
naming convention like mac80211.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The include/net/nl802154.h file contains a lot of prototypes which are
not used inside of ieee802154 subsystem. This patch removes this file
and make the only one used prototype "ieee802154_nl_start_confirm" as
static declaration in ieee802154/nl-mac.c
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch reworks the wpan_phy index incrementation. It's now similar
like wireless wiphy index incrementation. We move the wpan_phy index
attribute inside of cfg802154_registered_device and use atomic
operations instead locking mechanism via wpan_phy_mutex.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The pernet ops aren't ever unregistered, which causes a memory
leak and an OOPs if the module is ever reinserted.
Fixes: 0b5e8b8eea ("net: Add Geneve tunneling protocol driver")
CC: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geneve does not currently set the inner protocol type when
transmitting packets. This causes GSO segmentation to fail on NICs
that do not support Geneve offloading.
CC: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The return value of seq_printf() is soon to be removed. Remove the
checks from seq_printf() in favor of seq_has_overflowed().
Link: http://lkml.kernel.org/r/20141104142236.GA10239@salvia
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Since adding a new function to seq_file (seq_has_overflowed())
there isn't any value for functions called from seq_show to
return anything. Remove the int returns of the various
print_tuple/<foo>_print_tuple functions.
Link: http://lkml.kernel.org/p/f2e8cf8df433a197daa62cbaf124c900c708edc7.1412031505.git.joe@perches.com
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The seq_printf() and friends are having their return values removed.
The print_conntrack() returns the result of seq_printf(), which is
meaningless when seq_printf() returns void. Might as well remove the
return values of print_conntrack() as well.
Link: http://lkml.kernel.org/r/20141029220107.465008329@goodmis.org
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The "else" block is on several lines and use bracket.
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
following:
* large mac80211-hwsim changes from Ben, Jukka and a bit myself
* OCB/WAVE/11p support from Rostislav on behalf of the Czech Technical
University in Prague and Volkswagen Group Research
* minstrel VHT work from Karl
* more CSA work from Luca
* WMM admission control support in mac80211 (myself)
* various smaller fixes, spelling corrections, and minor API additions
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJUWMs/AAoJEDBSmw7B7bqrmBQQAIbfAe7wH1WifRtOnhw3zWQQ
K36+Edf3HlQ+EIkSs63QousRj2e7pGDOyhzMWLaqsmeTLteUtlGbr7qwiJO1QZdf
Ml2V5O2s+b8hUIClDBVQF2L6+GGUmRUdQqvDDhkN1guoxD/Nk8cNtsRkSdiXWJWy
R48NzvYDflBhc8uqPtR8jDb10eM3c00YP9HB+w9hYAfizD+FRue7UNp4MQIqwp9V
HdKRT6L2n/6QA+Mzse0rMDes5qI7nIUNgj+hjqgJSnhITPMgGR5j/pitnVHrr81M
ngOipBFG3svsQrwZh8nM4Llp0cM4Gs+GlgCieu9+TJpr2sY00Z3kYcp0pxtDoSxz
Wblqz9n/bnW9mrkEfl12XqwwT5vguchwHoZ9cXhejDxSawWXoTRx20uW4ahO8ArA
kWwwjTBVsQ5WMCtOBiqggzNKghwCc2ILmcZnjGdg9aNXcWsmQ4vyeCfG2QxBz/UB
Grv/f9NSy6mzKQ34yv+lyR7rFZ8XcT03EVAnZSYz8X0ZZGxwtFupRp1RrBh1KPtD
TJoe6Q71FfHKYRJ2xgygYkQFo+r9d0BKBeerq+Vu2hBeaqyi4aUwSj7d1sUaaq6N
tL8fmAUqFjVOOUFeH1g07Xke5QD+yrEC7sJKkeRMfcRGB+dEa+2m3I5p4WDz9bWM
AEvFSsYr/I9KI4d1huXD
=6GIj
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-john-2014-11-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg <johannes@sipsolutions.net> says:
"This relatively large batch of changes is comprised of the
following:
* large mac80211-hwsim changes from Ben, Jukka and a bit myself
* OCB/WAVE/11p support from Rostislav on behalf of the Czech Technical
University in Prague and Volkswagen Group Research
* minstrel VHT work from Karl
* more CSA work from Luca
* WMM admission control support in mac80211 (myself)
* various smaller fixes, spelling corrections, and minor API additions"
Conflicts:
drivers/net/wireless/ath/wil6210/cfg80211.c
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch allows to set ECN on a per-route basis in case the sysctl
tcp_ecn is not set to 1. In other words, when ECN is set for specific
routes, it provides a tcp_ecn=1 behaviour for that route while the rest
of the stack acts according to the global settings.
One can use 'ip route change dev $dev $net features ecn' to toggle this.
Having a more fine-grained per-route setting can be beneficial for various
reasons, for example, 1) within data centers, or 2) local ISPs may deploy
ECN support for their own video/streaming services [1], etc.
There was a recent measurement study/paper [2] which scanned the Alexa's
publicly available top million websites list from a vantage point in US,
Europe and Asia:
Half of the Alexa list will now happily use ECN (tcp_ecn=2, most likely
blamed to commit 255cac91c3 ("tcp: extend ECN sysctl to allow server-side
only ECN") ;)); the break in connectivity on-path was found is about
1 in 10,000 cases. Timeouts rather than receiving back RSTs were much
more common in the negotiation phase (and mostly seen in the Alexa
middle band, ranks around 50k-150k): from 12-thousand hosts on which
there _may_ be ECN-linked connection failures, only 79 failed with RST
when _not_ failing with RST when ECN is not requested.
It's unclear though, how much equipment in the wild actually marks CE
when buffers start to fill up.
We thought about a fallback to non-ECN for retransmitted SYNs as another
global option (which could perhaps one day be made default), but as Eric
points out, there's much more work needed to detect broken middleboxes.
Two examples Eric mentioned are buggy firewalls that accept only a single
SYN per flow, and middleboxes that successfully let an ECN flow establish,
but later mark CE for all packets (so cwnd converges to 1).
[1] http://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf, p.15
[2] http://ecn.ethz.ch/
Joint work with Daniel Borkmann.
Reference: http://thread.gmane.org/gmane.linux.network/335797
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function cookie_check_timestamp(), both called from IPv4/6 context,
is being used to decode the echoed timestamp from the SYN/ACK into TCP
options used for follow-up communication with the peer.
We can remove ECN handling from that function, split it into a separate
one, and simply rename the original function into cookie_decode_options().
cookie_decode_options() just fills in tcp_option struct based on the
echoed timestamp received from the peer. Anything that fails in this
function will actually discard the request socket.
While this is the natural place for decoding options such as ECN which
commit 172d69e63c ("syncookies: add support for ECN") added, we argue
that in particular for ECN handling, it can be checked at a later point
in time as the request sock would actually not need to be dropped from
this, but just ECN support turned off.
Therefore, we split this functionality into cookie_ecn_ok(), which tells
us if the timestamp indicates ECN support AND the tcp_ecn sysctl is enabled.
This prepares for per-route ECN support: just looking at the tcp_ecn sysctl
won't be enough anymore at that point; if the timestamp indicates ECN
and sysctl tcp_ecn == 0, we will also need to check the ECN dst metric.
This would mean adding a route lookup to cookie_check_timestamp(), which
we definitely want to avoid. As we already do a route lookup at a later
point in cookie_{v4,v6}_check(), we can simply make use of that as well
for the new cookie_ecn_ok() function w/o any additional cost.
Joint work with Daniel Borkmann.
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Was a bit more difficult to read than needed due to magic shifts;
add defines and document the used encoding scheme.
Joint work with Daniel Borkmann.
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>