Commit Graph

507960 Commits

Author SHA1 Message Date
Mitch Williams
63e18c2520 i40evf: resequence close operations
Call the netdev carrier off and TX disable functions first, before other
shutdown operations. This stops the stack from hitting us with
transmits while we're shutting down. Additionally, disable NAPI before
disabling interrupts, or the interrupt might get re-enabled
inappropriately. Finally, remove the call to netif_tx_stop_all_queues,
as it is redundant - the call to netif_tx_disable already did the same
thing.

Change-ID: I8b2dd25231b82817746cc256234a5eeeb4abaccc
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-03-27 02:57:43 -07:00
Mitch Williams
e284fc88df i40evf: delay releasing rings
When the VF interface is closed, we cannot immediately free our rings
and RX buffers, because the hardware hasn't yet stopped accessing this
memory. This shows up as a panic or memory corruption when the device is
brought down while under heavy stress.

To fix this, delay releasing resources until we receive acknowledgment
from the PF driver that the rings have indeed been stopped. Because of
this delay, we also need to check to make sure that all of our admin
queue requests have been handled before allowing the device to be
opened.

Change-ID: I44edd35529ce2fa2a9512437a3a8e6f14ed8ed63
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-03-27 02:57:03 -07:00
Jesse Brandeburg
ae24b4095c i40e/i40evf: implement KR2 support
The new devices need a new device ID some other defines to
handle the new 20G speed for KR2.

Change-ID: I03f717e364afe59657e8c9ce5ffaad856b4b21df
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-03-27 00:12:09 -07:00
Hannes Frederic Sowa
5a352dd0a3 ipv6: hash net ptr into fragmentation bucket selection
As namespaces are sometimes used with overlapping ip address ranges,
we should also use the namespace as input to the hash to select the ip
fragmentation counter bucket.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-25 14:07:04 -04:00
Hannes Frederic Sowa
b6a7719aed ipv4: hash net ptr into fragmentation bucket selection
As namespaces are sometimes used with overlapping ip address ranges,
we should also use the namespace as input to the hash to select the ip
fragmentation counter bucket.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-25 14:07:04 -04:00
David S. Miller
8fa38a38ac Merge branch 'tipc-next'
Jon Maloy says:

====================
tipc: some improvements and fixes

We introduce a better algorithm for selecting when and which
users should be subject to link congestion control, plus clean
up some code for that mechanism.
Commit #3 fixes another rare race condition during packet reception.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-25 14:05:56 -04:00
Jon Paul Maloy
8b4ed8634f tipc: eliminate race condition at dual link establishment
Despite recent improvements, the establishment of dual parallel
links still has a small glitch where messages can bypass each
other. When the second link in a dual-link configuration is
established, part of the first link's traffic will be steered over
to the new link. Although we do have a mechanism to ensure that
packets sent before and after the establishment of the new link
arrive in sequence to the destination node, this is not enough.
The arriving messages will still be delivered upwards in different
threads, something entailing a risk of message disordering during
the transition phase.

To fix this, we introduce a synchronization mechanism between the
two parallel links, so that traffic arriving on the new link cannot
be added to its input queue until we are guaranteed that all
pre-establishment messages have been delivered on the old, parallel
link.

This problem seems to always have been around, but its occurrence is
so rare that it has not been noticed until recent intensive testing.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-25 14:05:56 -04:00
Jon Paul Maloy
3127a0200d tipc: clean up handling of link congestion
After the recent changes in message importance handling it becomes
possible to simplify handling of messages and sockets when we
encounter link congestion.

We merge the function tipc_link_cong() into link_schedule_user(),
and simplify the code of the latter. The code should now be
easier to follow, especially regarding return codes and handling
of the message that caused the situation.

In case the scheduling function is unable to pre-allocate a wakeup
message buffer, it now returns -ENOBUFS, which is a more correct
code than the previously used -EHOSTUNREACH.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-25 14:05:56 -04:00
Jon Paul Maloy
1f66d161ab tipc: introduce starvation free send algorithm
Currently, we only use a single counter; the length of the backlog
queue, to determine whether a message should be accepted to the queue
or not. Each time a message is being sent, the queue length is compared
to a threshold value for the message's importance priority. If the queue
length is beyond this threshold, the message is rejected. This algorithm
implies a risk of starvation of low importance senders during very high
load, because it may take a long time before the backlog queue has
decreased enough to accept a lower level message.

We now eliminate this risk by introducing a counter for each importance
priority. When a message is sent, we check only the queue level for that
particular message's priority. If that is ok, the message can be added
to the backlog, irrespective of the queue level for other priorities.
This way, each level is guaranteed a certain portion of the total
bandwidth, and any risk of starvation is eliminated.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-25 14:05:56 -04:00
Guenter Roeck
b06b107a4c net: dsa: Handle non-bridge master change
Master change notifications may occur other than when joining or
leaving a bridge, for example when being added to or removed from
a bond or Open vSwitch. In that case, do nothing instead of asking
the switch driver to remove a port from a bridge that it didn't join.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-25 14:04:57 -04:00
tadeusz.struk@intel.com
ac110f4954 crypto: algif - fix warn: unsigned 'used' is never less than zero
Change type from unsigned long to int to fix an issue reported by kbuild robot:
crypto/algif_skcipher.c:596 skcipher_recvmsg_async() warn: unsigned 'used' is
never less than zero.

Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-25 11:44:18 -04:00
Ying Xue
bc14b8d6a9 tipc: fix a link reset issue due to retransmission failures
When a node joins a cluster while we are transmitting a fragment
stream over the broadcast link, it's missing the preceding fragments
needed to build a meaningful message. As a result, the node has to
drop it. However, as the fragment message is not acknowledged to
its sender before it's dropped, it accidentally causes link reset
of retransmission failure on the node.

Reported-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-25 11:43:32 -04:00
Sebastian Ott
358e048d67 s390: fix /proc/interrupts output
The irqclass_sub_desc array and enum interruption_class are out of sync
thus /proc/interrupts is broken. Remove IRQIO_CLW.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-25 11:41:41 -04:00
Ying Xue
7e3ea6d5c4 sctp: avoid to repeatedly declare external variables
Move the declaration for external variables to sctp.h file avoiding
to repeatedly declare them with extern keyword.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-25 11:40:16 -04:00
Eric Dumazet
0144a81ccc tcp: fix ipv4 mapped request socks
ss should display ipv4 mapped request sockets like this :

tcp    SYN-RECV   0      0  ::ffff:192.168.0.1:8080   ::ffff:192.0.2.1:35261

and not like this :

tcp    SYN-RECV   0      0  192.168.0.1:8080   192.0.2.1:35261

We should init ireq->ireq_family based on listener sk_family,
not the actual protocol carried by SYN packet.

This means we can set ireq_family in inet_reqsk_alloc()

Fixes: 3f66b083a5 ("inet: introduce ireq_family")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-25 00:57:48 -04:00
stephen hemminger
d631b94e7a virtio: change comment in transmit
The original comment was not really informative or funny
as well as sexist. Replace it with a better explanation of
why the driver does stop and what the impacts are.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 21:22:50 -04:00
Daniel Borkmann
835c3d9b7a tools: bpf_asm: cleanup vlan extension related token
We now have K_VLANT, K_VLANP and K_VLANTPID. Clean them up into more
descriptive token, namely K_VLAN_TCI, K_VLAN_AVAIL and K_VLAN_TPID.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 21:21:41 -04:00
David S. Miller
b1275eb32e Merge branch 'listener_refactor_16'
Eric Dumazet says:

====================
tcp: listener refactor part 16

A CONFIG_PROVE_RCU=y build revealed an RCU splat I had to fix.

I added const qualifiers to various md5 methods, as I expect
to call them on behalf of request sock traffic even if
the listener socket is not locked. This seems ok, but adding
const makes the contract clearer. Note a good reduction
of code size thanks to request/establish sockets convergence.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 21:16:30 -04:00
Eric Dumazet
fd3a154a00 tcp: md5: get rid of tcp_v[46]_reqsk_md5_lookup()
With request socks convergence, we no longer need
different lookup methods. A request socket can
use generic lookup function.

Add const qualifier to 2nd tcp_v[46]_md5_lookup() parameter.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 21:16:30 -04:00
Eric Dumazet
39f8e58e53 tcp: md5: remove request sock argument of calc_md5_hash()
Since request and established sockets now have same base,
there is no need to pass two pointers to tcp_v4_md5_hash_skb()
or tcp_v6_md5_hash_skb()

Also add a const qualifier to their struct tcp_md5sig_key argument.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 21:16:30 -04:00
Eric Dumazet
ff74e23f7e tcp: md5: input path is run under rcu protected sections
It is guaranteed that both tcp_v4_rcv() and tcp_v6_rcv()
run from rcu read locked sections :

ip_local_deliver_finish() and ip6_input_finish() both
use rcu_read_lock()

Also align tcp_v6_inbound_md5_hash() on tcp_v4_inbound_md5_hash()
by returning a boolean.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 21:16:29 -04:00
Eric Dumazet
0980c1e308 tcp: use C99 initializers in new_state[]
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 21:16:29 -04:00
Eric Dumazet
80f03e27a3 tcp: md5: fix rcu lockdep splat
While timer handler effectively runs a rcu read locked section,
there is no explicit rcu_read_lock()/rcu_read_unlock() annotations
and lockdep can be confused here :

net/ipv4/tcp_ipv4.c-906-        /* caller either holds rcu_read_lock() or socket lock */
net/ipv4/tcp_ipv4.c:907:        md5sig = rcu_dereference_check(tp->md5sig_info,
net/ipv4/tcp_ipv4.c-908-                                       sock_owned_by_user(sk) ||
net/ipv4/tcp_ipv4.c-909-                                       lockdep_is_held(&sk->sk_lock.slock));

Let's explicitely acquire rcu_read_lock() in tcp_make_synack()

Before commit fa76ce7328 ("inet: get rid of central tcp/dccp listener
timer"), we were holding listener lock so lockdep was happy.

Fixes: fa76ce7328 ("inet: get rid of central tcp/dccp listener timer")
Signed-off-by: Eric DUmazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 21:16:29 -04:00
David S. Miller
9ead3527f5 Merge branch 'rhashtable-next'
Thomas Graf says:

====================
rhashtable updates on top of Herbert's work

Patch 1 is a bugfix for an RCU splash I encountered while testing.
Patch 2 & 3 are pure cleanups. Patch 4 disables automatic shrinking
by default as discussed in previous thread. Patch 5 removes some
rhashtable internal knowledge from nft_hash and fixes another RCU
splash.

I've pushed various rhashtable tests (Netlink, nft) together with a
Makefile to a git tree [0] for easier stress testing.

[0] https://github.com/tgraf/rhashtable
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 17:53:06 -04:00
Thomas Graf
6b6f302ced rhashtable: Add rhashtable_free_and_destroy()
rhashtable_destroy() variant which stops rehashes, iterates over
the table and calls a callback to release resources.

Avoids need for nft_hash to embed rhashtable internals and allows to
get rid of the being_destroyed flag. It also saves a 2nd mutex
lock upon destruction.

Also fixes an RCU lockdep splash on nft set destruction due to
calling rht_for_each_entry_safe() without holding bucket locks.
Open code this loop as we need know that no mutations may occur in
parallel.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 17:48:40 -04:00
Thomas Graf
b5e2c150ac rhashtable: Disable automatic shrinking by default
Introduce a new bool automatic_shrinking to require the
user to explicitly opt-in to automatic shrinking of tables.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 17:48:40 -04:00
Thomas Graf
ac833bddb5 rhashtable: Mark internal/private inline functions as such
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 17:48:39 -04:00
Thomas Graf
299e5c32a3 rhashtable: Use 'unsigned int' consistently
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 17:48:39 -04:00
Thomas Graf
58be8a583d rhashtable: Extend RCU read lock into rhashtable_insert_rehash()
rhashtable_insert_rehash() requires RCU locks to be held in order
to access ht->tbl and traverse to the last table.

Fixes: ccd57b1bd3 ("rhashtable: Add immediate rehash during insertion")
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 17:48:39 -04:00
Michal Sekletar
27cd545247 filter: introduce SKF_AD_VLAN_TPID BPF extension
If vlan offloading takes place then vlan header is removed from frame
and its contents, both vlan_tci and vlan_proto, is available to user
space via TPACKET interface. However, only vlan_tci can be used in BPF
filters.

This commit introduces a new BPF extension. It makes possible to load
the value of vlan_proto (vlan TPID) to register A. Support for classic
BPF and eBPF is being added, analogous to skb->protocol.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Jiri Pirko <jpirko@redhat.com>

Signed-off-by: Michal Sekletar <msekleta@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 15:25:15 -04:00
David S. Miller
f6bb76cd4d Merge branch 'cxgb4-fcoe'
Varun Prakash says:

====================
FCoE support in cxgb4 driver

This patch series enables FCoE support in cxgb4 driver, it enables
FCOE_CRC and FCOE_MTU net device features.

This series is created against net-next tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 15:24:39 -04:00
Varun Prakash
241e924731 cxgb4: update Kconfig and Makefile for FCoE support
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 15:24:38 -04:00
Varun Prakash
84a200b390 cxgb4: add cxgb4_fcoe.c for FCoE
This patch adds cxgb4_fcoe.c and enables FCOE_CRC, FCOE_MTU
net device features.

Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 15:24:38 -04:00
Varun Prakash
76fed8a989 cxgb4: add cxgb4_fcoe.h and macro definitions for FCoE
This patch adds new header file cxgb4_fcoe.h and defines new
macros for FCoE support in cxgb4 driver.

Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 15:24:38 -04:00
Hannes Frederic Sowa
ff40217e73 ipv6: fix sparse warnings in privacy stable addresses generation
Those warnings reported by sparse endianness check (via kbuild test robot)
are harmless, nevertheless fix them up and make the code a little bit
easier to read.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 622c81d57b ("ipv6: generation of stable privacy addresses for link-local and autoconf")
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 15:21:35 -04:00
Ying Xue
ed3e852aa5 tipc: fix compile error when IPV6=m and TIPC=y
When IPV6=m and TIPC=y, below error will appear during building kernel
image:

net/tipc/udp_media.c:196:
undefined reference to `ip6_dst_lookup'
make: *** [vmlinux] Error 1

As ip6_dst_lookup() is implemented in IPV6 and IPV6 is compiled as
module, ip6_dst_lookup() is not built-in core kernel image. As a
result, compiler cannot find 'ip6_dst_lookup' reference while
compiling TIPC code into core kernel image.

But with the method introduced by commit 5f81bd2e5d ("ipv6: export a
stub for IPv6 symbols used by vxlan"), we can avoid the compile error
through "ipv6_stub" pointer to access ip6_dst_lookup().

Fixes: d0f91938be ("tipc: add ip/udp media type")
Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 15:20:29 -04:00
WANG Cong
66400d5430 net: allow to delete a whole device group
With dev group, we can change a batch of net devices,
so we should allow to delete them together too.

Group 0 is not allowed to be deleted since it is
the default group.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 15:00:01 -04:00
Herbert Xu
27ed44a5d6 rhashtable: Add comment on choice of elasticity value
This patch adds a comment on the choice of the value 16 as the
maximum chain length before we force a rehash.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 14:57:04 -04:00
Wu Fengguang
8263d57e37 cx82310_eth: fix semicolon.cocci warnings
drivers/net/usb/cx82310_eth.c:175:2-3: Unneeded semicolon

 Removes unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

CC: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 14:56:02 -04:00
Sasha Levin
610600c8c5 tipc: validate length of sockaddr in connect() for dgram/rdm
Commit f2f8036 ("tipc: add support for connect() on dgram/rdm sockets")
hasn't validated user input length for the sockaddr structure which allows
a user to overwrite kernel memory with arbitrary input.

Fixes: f2f8036 ("tipc: add support for connect() on dgram/rdm sockets")
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 12:50:39 -04:00
Hannes Frederic Sowa
0117ec1970 net: remove never used forwarding_accel_ops pointer from net_device
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 12:43:45 -04:00
David S. Miller
d5c1d8c567 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/netfilter/nf_tables_core.c

The nf_tables_core.c conflict was resolved using a conflict resolution
from Stephen Rothwell as a guide.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:22:43 -04:00
Herbert Xu
ba7c95ea38 rhashtable: Fix sleeping inside RCU critical section in walk_stop
The commit 963ecbd41a ("rhashtable:
Fix use-after-free in rhashtable_walk_stop") fixed a real bug
but created another one because we may end up sleeping inside an
RCU critical section.

This patch fixes it properly by replacing the mutex with a spin
lock that specifically protects the walker lists.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:16:07 -04:00
David S. Miller
ce046c568c Merge branch 'ipv6_stable_privacy_address'
Hannes Frederic Sowa says:

====================
ipv6: RFC7217 stable privacy addresses implementation

this is an implementation of basic support for RFC7217 stable privacy
addresses. Please review and consider for net-next.

v2:
* Correct references to RFC 7212 -> RFC 7217 in documentation patch (thanks, Eric!)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:12:15 -04:00
Hannes Frederic Sowa
9f0761c154 ipv6: add documentation for stable_secret, idgen_delay and idgen_retries knobs
Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:12:09 -04:00
Hannes Frederic Sowa
1855b7c3e8 ipv6: introduce idgen_delay and idgen_retries knobs
This is specified by RFC 7217.

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:12:09 -04:00
Hannes Frederic Sowa
5f40ef77ad ipv6: do retries on stable privacy addresses
If a DAD conflict is detected, we want to retry privacy stable address
generation up to idgen_retries (= 3) times with a delay of idgen_delay
(= 1 second). Add the logic to addrconf_dad_failure.

By design, we don't clean up dad failed permanent addresses.

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:12:09 -04:00
Hannes Frederic Sowa
8e8e676d0b ipv6: collapse state_lock and lock
Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:12:09 -04:00
Hannes Frederic Sowa
64236f3f3d ipv6: introduce IFA_F_STABLE_PRIVACY flag
We need to mark appropriate addresses so we can do retries in case their
DAD failed.

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:12:09 -04:00
Hannes Frederic Sowa
622c81d57b ipv6: generation of stable privacy addresses for link-local and autoconf
This patch implements the stable privacy address generation for
link-local and autoconf addresses as specified in RFC7217.

  RID = F(Prefix, Net_Iface, Network_ID, DAD_Counter, secret_key)

is the RID (random identifier). As the hash function F we chose one
round of sha1. Prefix will be either the link-local prefix or the
router advertised one. As Net_Iface we use the MAC address of the
device. DAD_Counter and secret_key are implemented as specified.

We don't use Network_ID, as it couples the code too closely to other
subsystems. It is specified as optional in the RFC.

As Net_Iface we only use the MAC address: we simply have no stable
identifier in the kernel we could possibly use: because this code might
run very early, we cannot depend on names, as they might be changed by
user space early on during the boot process.

A new address generation mode is introduced,
IN6_ADDR_GEN_MODE_STABLE_PRIVACY. With iproute2 one can switch back to
none or eui64 address configuration mode although the stable_secret is
already set.

We refuse writes to ipv6/conf/all/stable_secret but only allow
ipv6/conf/default/stable_secret and the interface specific file to be
written to. The default stable_secret is used as the parameter for the
namespace, the interface specific can overwrite the secret, e.g. when
switching a network configuration from one system to another while
inheriting the secret.

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:12:08 -04:00