Commit Graph

871323 Commits

Author SHA1 Message Date
Shannon Nelson
e982ae6aa4 ionic: add lif_quiesce to wait for queue activity to stop
Even though we've already turned off the queue activity with
the ionic_qcq_disable(), we need to wait for any device queues
that are processing packets to drain down before we try to
flush our packets and tear down the queues.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-02 11:55:12 -04:00
Shannon Nelson
e95f922f4c ionic: implement ethtool set-fec
Wire up the --set-fec and --show-fec features in the ethtool
callbacks and pull the related code out of set_link_ksettings.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-02 11:55:12 -04:00
Shannon Nelson
780eded34c ionic: report users coalesce request
The user's request for an interrupt coalescing value gets
translated into a hardware value to be used with the NIC,
and was getting reported back based on the hw value, which,
due to hw tic resolution, could be reported as a different
number than what the user originally asked for.  This code
now tracks both the user request and what was put into the
hardware so we can report back to the user what they
requested.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-02 11:55:12 -04:00
Shannon Nelson
d229be4b27 ionic: use wait_on_bit_lock() rather than open code
Replace the open-coded ionic_wait_for_bit() with the
kernel's wait_on_bit_lock().

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-02 11:55:12 -04:00
Shannon Nelson
da0729e8d2 ionic: simplify returns in devlink info
There is no need for a goto in this bit of code.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-02 11:55:12 -04:00
David S. Miller
df428e4049 Merge branch 'per-netns-notifier'
Jiri Pirko says:

====================
net: introduce per-netns netdevice notifiers and use them in mlxsw

Some drivers, like mlxsw, are not interested in notifications coming in
for netdevices from other network namespaces. So introduce per-netns
notifiers and allow to reduce overhead by listening only for
notifications from the same netns.

This is also a preparation for upcoming patchset "devlink: allow devlink
instances to change network namespace". This resolves deadlock during
reload mlxsw into initial netns made possible by
328fbe747a ("net: Close race between {un, }register_netdevice_notifier() and setup_net()/cleanup_net()").
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-02 11:48:44 -04:00
Jiri Pirko
f1cdaa077c mlxsw: spectrum: Use per-netns netdevice notifier registration
The mlxsw_sp instance is not interested in events happening in other
network namespaces. So use "_net" variants for netdevice notifier
registration/unregistration and get only events which are happening in
the net the instance is in.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-02 11:48:44 -04:00
Jiri Pirko
a30c7b429f net: introduce per-netns netdevice notifiers
Often the code for example in drivers is interested in getting notifier
call only from certain network namespace. In addition to the existing
global netdevice notifier chain introduce per-netns chains and allow
users to register to that. Eventually this would eliminate unnecessary
overhead in case there are many netdevices in many network namespaces.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-02 11:48:44 -04:00
Jiri Pirko
afa0df5998 net: push loops and nb calls into helper functions
Push iterations over net namespaces and netdevices from
register_netdevice_notifier() and unregister_netdevice_notifier()
into helper functions. Along with that introduce continue_reverse macros
to make the code a bit nicer allowing to get rid of "last" marks.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-02 11:48:44 -04:00
Michal Kubecek
8b6b82ad16 mlx5: avoid 64-bit division in dr_icm_pool_mr_create()
Recently added code introduces 64-bit division in dr_icm_pool_mr_create()
so that build on 32-bit architectures fails with

  ERROR: "__umoddi3" [drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko] undefined!

As the divisor is always a power of 2, we can use bitwise operation
instead.

Fixes: 29cf8febd1 ("net/mlx5: DR, ICM pool memory allocator")
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-02 11:08:20 -04:00
Tuong Lien
e95584a889 tipc: fix unlimited bundling of small messages
We have identified a problem with the "oversubscription" policy in the
link transmission code.

When small messages are transmitted, and the sending link has reached
the transmit window limit, those messages will be bundled and put into
the link backlog queue. However, bundles of data messages are counted
at the 'CRITICAL' level, so that the counter for that level, instead of
the counter for the real, bundled message's level is the one being
increased.
Subsequent, to-be-bundled data messages at non-CRITICAL levels continue
to be tested against the unchanged counter for their own level, while
contributing to an unrestrained increase at the CRITICAL backlog level.

This leaves a gap in congestion control algorithm for small messages
that can result in starvation for other users or a "real" CRITICAL
user. Even that eventually can lead to buffer exhaustion & link reset.

We fix this by keeping a 'target_bskb' buffer pointer at each levels,
then when bundling, we only bundle messages at the same importance
level only. This way, we know exactly how many slots a certain level
have occupied in the queue, so can manage level congestion accurately.

By bundling messages at the same level, we even have more benefits. Let
consider this:
- One socket sends 64-byte messages at the 'CRITICAL' level;
- Another sends 4096-byte messages at the 'LOW' level;

When a 64-byte message comes and is bundled the first time, we put the
overhead of message bundle to it (+ 40-byte header, data copy, etc.)
for later use, but the next message can be a 4096-byte one that cannot
be bundled to the previous one. This means the last bundle carries only
one payload message which is totally inefficient, as for the receiver
also! Later on, another 64-byte message comes, now we make a new bundle
and the same story repeats...

With the new bundling algorithm, this will not happen, the 64-byte
messages will be bundled together even when the 4096-byte message(s)
comes in between. However, if the 4096-byte messages are sent at the
same level i.e. 'CRITICAL', the bundling algorithm will again cause the
same overhead.

Also, the same will happen even with only one socket sending small
messages at a rate close to the link transmit's one, so that, when one
message is bundled, it's transmitted shortly. Then, another message
comes, a new bundle is created and so on...

We will solve this issue radically by another patch.

Fixes: 365ad353c2 ("tipc: reduce risk of user starvation during link congestion")
Reported-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-02 11:02:05 -04:00
Juergen Gross
0951570685 xen/efi: have a common runtime setup function
Today the EFI runtime functions are setup in architecture specific
code (x86 and arm), with the functions themselves living in drivers/xen
as they are not architecture dependent.

As the setup is exactly the same for arm and x86 move the setup to
drivers/xen, too. This at once removes the need to make the single
functions global visible.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
[boris: "Dropped EXPORT_SYMBOL_GPL(xen_efi_runtime_setup)"]
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2019-10-02 10:31:07 -04:00
Peng Fan
ec066de1a5 arm: xen: mm: use __GPF_DMA32 for arm64
arm64 shares some code under arch/arm/xen, including mm.c.
However ZONE_DMA is removed by commit
ad67f5a6545("arm64: replace ZONE_DMA with ZONE_DMA32").

So add a check if CONFIG_ZONE_DMA32 is enabled use __GFP_DMA32.

Signed-off-by: Peng Fan <peng.fan@nxp.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
2019-10-01 18:50:07 -07:00
Dongli Zhang
a761129e36 xen-netfront: do not use ~0U as error return value for xennet_fill_frags()
xennet_fill_frags() uses ~0U as return value when the sk_buff is not able
to cache extra fragments. This is incorrect because the return type of
xennet_fill_frags() is RING_IDX and 0xffffffff is an expected value for
ring buffer index.

In the situation when the rsp_cons is approaching 0xffffffff, the return
value of xennet_fill_frags() may become 0xffffffff which xennet_poll() (the
caller) would regard as error. As a result, queue->rx.rsp_cons is set
incorrectly because it is updated only when there is error. If there is no
error, xennet_poll() would be responsible to update queue->rx.rsp_cons.
Finally, queue->rx.rsp_cons would point to the rx ring buffer entries whose
queue->rx_skbs[i] and queue->grant_rx_ref[i] are already cleared to NULL.
This leads to NULL pointer access in the next iteration to process rx ring
buffer entries.

The symptom is similar to the one fixed in
commit 00b368502d ("xen-netfront: do not assume sk_buff_head list is
empty in error handling").

This patch changes the return type of xennet_fill_frags() to indicate
whether it is successful or failed. The queue->rx.rsp_cons will be
always updated inside this function.

Fixes: ad4f15dc2c ("xen/netfront: don't bug in case of too many frags")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 21:49:51 -04:00
David Ahern
a3ce2a21bb ipv6: Handle race in addrconf_dad_work
Rajendra reported a kernel panic when a link was taken down:

[ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
[ 6870.271856] IP: [<ffffffff8efc5764>] __ipv6_ifa_notify+0x154/0x290

<snip>

[ 6870.570501] Call Trace:
[ 6870.573238] [<ffffffff8efc58c6>] ? ipv6_ifa_notify+0x26/0x40
[ 6870.579665] [<ffffffff8efc98ec>] ? addrconf_dad_completed+0x4c/0x2c0
[ 6870.586869] [<ffffffff8efe70c6>] ? ipv6_dev_mc_inc+0x196/0x260
[ 6870.593491] [<ffffffff8efc9c6a>] ? addrconf_dad_work+0x10a/0x430
[ 6870.600305] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
[ 6870.606732] [<ffffffff8ea93a7a>] ? process_one_work+0x18a/0x430
[ 6870.613449] [<ffffffff8ea93d6d>] ? worker_thread+0x4d/0x490
[ 6870.619778] [<ffffffff8ea93d20>] ? process_one_work+0x430/0x430
[ 6870.626495] [<ffffffff8ea99dd9>] ? kthread+0xd9/0xf0
[ 6870.632145] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
[ 6870.638573] [<ffffffff8ea99d00>] ? kthread_park+0x60/0x60
[ 6870.644707] [<ffffffff8f01ae77>] ? ret_from_fork+0x57/0x70
[ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0

addrconf_dad_work is kicked to be scheduled when a device is brought
up. There is a race between addrcond_dad_work getting scheduled and
taking the rtnl lock and a process taking the link down (under rtnl).
The latter removes the host route from the inet6_addr as part of
addrconf_ifdown which is run for NETDEV_DOWN. The former attempts
to use the host route in ipv6_ifa_notify. If the down event removes
the host route due to the race to the rtnl, then the BUG listed above
occurs.

This scenario does not occur when the ipv6 address is not kept
(net.ipv6.conf.all.keep_addr_on_down = 0) as addrconf_ifdown sets the
state of the ifp to DEAD. Handle when the addresses are kept by checking
IF_READY which is reset by addrconf_ifdown.

The 'dead' flag for an inet6_addr is set only under rtnl, in
addrconf_ifdown and it means the device is getting removed (or IPv6 is
disabled). The interesting cases for changing the idev flag are
addrconf_notify (NETDEV_UP and NETDEV_CHANGE) and addrconf_ifdown
(reset the flag). The former does not have the idev lock - only rtnl;
the latter has both. Based on that the existing dead + IF_READY check
can be moved to right after the rtnl_lock in addrconf_dad_work.

Fixes: f1705ec197 ("net: ipv6: Make address flushing on ifdown optional")
Reported-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 21:43:41 -04:00
Eric Dumazet
3256a2d6ab tcp: adjust rto_base in retransmits_timed_out()
The cited commit exposed an old retransmits_timed_out() bug
which assumed it could call tcp_model_timeout() with
TCP_RTO_MIN as rto_base for all states.

But flows in SYN_SENT or SYN_RECV state uses a different
RTO base (1 sec instead of 200 ms, unless BPF choses
another value)

This caused a reduction of SYN retransmits from 6 to 4 with
the default /proc/sys/net/ipv4/tcp_syn_retries value.

Fixes: a41e8a88b0 ("tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 21:40:49 -04:00
Prashant Malani
c01ebd6c46 r8152: Use guard clause and fix comment typos
Use a guard clause in tx_bottom() to reduce the indentation of the
do-while loop.

Also, fix a couple of spelling and grammatical mistakes in the
r8152_csum_workaround() function comment.

Change-Id: I460befde150ad92248fd85b0f189ec2df2ab8431
Signed-off-by: Prashant Malani <pmalani@chromium.org>
Reviewed-by: Grant Grundler <grundler@chromium.org>
Acked-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 21:28:47 -04:00
Matias Ezequiel Vara Larsen
a786ab36ae vsock/virtio: add support for MSG_PEEK
This patch adds support for MSG_PEEK. In such a case, packets are not
removed from the rx_queue and credit updates are not sent.

Signed-off-by: Matias Ezequiel Vara Larsen <matiasevara@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Tested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 21:26:51 -04:00
Dexuan Cui
0d9138ffac vsock: Fix a lockdep warning in __vsock_release()
Lockdep is unhappy if two locks from the same class are held.

Fix the below warning for hyperv and virtio sockets (vmci socket code
doesn't have the issue) by using lock_sock_nested() when __vsock_release()
is called recursively:

============================================
WARNING: possible recursive locking detected
5.3.0+ #1 Not tainted
--------------------------------------------
server/1795 is trying to acquire lock:
ffff8880c5158990 (sk_lock-AF_VSOCK){+.+.}, at: hvs_release+0x10/0x120 [hv_sock]

but task is already holding lock:
ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(sk_lock-AF_VSOCK);
  lock(sk_lock-AF_VSOCK);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by server/1795:
 #0: ffff8880c5d05ff8 (&sb->s_type->i_mutex_key#10){+.+.}, at: __sock_release+0x2d/0xa0
 #1: ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]

stack backtrace:
CPU: 5 PID: 1795 Comm: server Not tainted 5.3.0+ #1
Call Trace:
 dump_stack+0x67/0x90
 __lock_acquire.cold.67+0xd2/0x20b
 lock_acquire+0xb5/0x1c0
 lock_sock_nested+0x6d/0x90
 hvs_release+0x10/0x120 [hv_sock]
 __vsock_release+0x24/0xf0 [vsock]
 __vsock_release+0xa0/0xf0 [vsock]
 vsock_release+0x12/0x30 [vsock]
 __sock_release+0x37/0xa0
 sock_close+0x14/0x20
 __fput+0xc1/0x250
 task_work_run+0x98/0xc0
 do_exit+0x344/0xc60
 do_group_exit+0x47/0xb0
 get_signal+0x15c/0xc50
 do_signal+0x30/0x720
 exit_to_usermode_loop+0x50/0xa0
 do_syscall_64+0x24e/0x270
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f4184e85f31

Tested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 21:23:35 -04:00
Johan Hovold
8353da9fa6 hso: fix NULL-deref on tty open
Fix NULL-pointer dereference on tty open due to a failure to handle a
missing interrupt-in endpoint when probing modem ports:

	BUG: kernel NULL pointer dereference, address: 0000000000000006
	...
	RIP: 0010:tiocmget_submit_urb+0x1c/0xe0 [hso]
	...
	Call Trace:
	hso_start_serial_device+0xdc/0x140 [hso]
	hso_serial_open+0x118/0x1b0 [hso]
	tty_open+0xf1/0x490

Fixes: 542f548236 ("tty: Modem functions for the HSO driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 21:21:10 -04:00
Johan Hovold
678799194a hso: fix NULL-deref on tty open
Fix NULL-pointer dereference on tty open due to a failure to handle a
missing interrupt-in endpoint when probing modem ports:

	BUG: kernel NULL pointer dereference, address: 0000000000000006
	...
	RIP: 0010:tiocmget_submit_urb+0x1c/0xe0 [hso]
	...
	Call Trace:
	hso_start_serial_device+0xdc/0x140 [hso]
	hso_serial_open+0x118/0x1b0 [hso]
	tty_open+0xf1/0x490

Fixes: 542f548236 ("tty: Modem functions for the HSO driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 21:06:49 -04:00
Simon Horman
37a2fce090 dt-bindings: sh_eth convert bindings to json-schema
Convert Renesas Electronics SH EtherMAC bindings documentation to
json-schema.  Also name bindings documentation file according to the compat
string being documented.

Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 15:31:33 -07:00
Peter Fink
9fb137aef3 net: usb: ax88179_178a: allow optionally getting mac address from device tree
Adopt and integrate the feature to pass the MAC address via device tree
from asix_device.c (03fc5d4) also to other ax88179 based asix chips.
E.g. the bootloader fills in local-mac-address and the driver will then
pick up and use this MAC address.

Signed-off-by: Peter Fink <pfink@christ-es.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 15:19:28 -07:00
Nicolas Dichtel
0d7982ce6e ipv6: minor code reorg in inet6_fill_ifla6_attrs()
Just put related code together to ease code reading: the memcpy() is
related to the nla_reserve().

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 14:59:03 -07:00
David S. Miller
7a56493f06 Merge branch 'netdev-altnames'
Jiri Pirko says:

====================
net: introduce alternative names for network interfaces

In the past, there was repeatedly discussed the IFNAMSIZ (16) limit for
netdevice name length. Now when we have PF and VF representors
with port names like "pfXvfY", it became quite common to hit this limit:
0123456789012345
enp131s0f1npf0vf6
enp131s0f1npf0vf22

Udev cannot rename these interfaces out-of-the-box and user needs to
create custom rules to handle them.

Also, udev has multiple schemes of netdev names. From udev code:
 * Type of names:
 *   b<number>                             - BCMA bus core number
 *   c<bus_id>                             - bus id of a grouped CCW or CCW device,
 *                                           with all leading zeros stripped [s390]
 *   o<index>[n<phys_port_name>|d<dev_port>]
 *                                         - on-board device index number
 *   s<slot>[f<function>][n<phys_port_name>|d<dev_port>]
 *                                         - hotplug slot index number
 *   x<MAC>                                - MAC address
 *   [P<domain>]p<bus>s<slot>[f<function>][n<phys_port_name>|d<dev_port>]
 *                                         - PCI geographical location
 *   [P<domain>]p<bus>s<slot>[f<function>][u<port>][..][c<config>][i<interface>]
 *                                         - USB port number chain
 *   v<slot>                               - VIO slot number (IBM PowerVM)
 *   a<vendor><model>i<instance>           - Platform bus ACPI instance id
 *   i<addr>n<phys_port_name>              - Netdevsim bus address and port name

One device can be often renamed by multiple patterns at the
same time (e.g. pci address/mac).

This patchset introduces alternative names for network interfaces.
Main goal is to:
1) Overcome the IFNAMSIZ limitation (altname limitation is 128 bytes)
2) Allow to have multiple names at the same time (multiple udev patterns)
3) Allow to use alternative names as handle for commands

The patchset introduces two new commands to add/delete list of properties.
Currently only alternative names are implemented but the ifrastructure
could be easily extended later on. This is very similar to the list of vlan
and tunnels being added/deleted to/from bridge ports.

See following examples.

$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff

-> Add alternative names for dummy0:

$ ip link prop add dummy0 altname someothername
$ ip link prop add dummy0 altname someotherveryveryveryverylongname
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff
    altname someothername
    altname someotherveryveryveryverylongname
$ ip link show someotherveryveryveryverylongname
2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff
    altname someothername
    altname someotherveryveryveryverylongname

-> Add bridge brx, add it's alternative name and use alternative names to
   do enslavement.

$ ip link add name brx type bridge
$ ip link prop add brx altname mypersonalsuperspecialbridge
$ ip link set someotherveryveryveryverylongname master mypersonalsuperspecialbridge
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop master brx state DOWN mode DEFAULT group default qlen 1000
    link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff
    altname someothername
    altname someotherveryveryveryverylongname
3: brx: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff
    altname mypersonalsuperspecialbridge

-> Add ipv4 address to the bridge using alternative name:

$ ip addr add 192.168.0.1/24 dev mypersonalsuperspecialbridge
$ ip addr show mypersonalsuperspecialbridge
3: brx: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff
    altname mypersonalsuperspecialbridge
    inet 192.168.0.1/24 scope global brx
       valid_lft forever preferred_lft forever

-> Delete one of dummy0 alternative names:

$ ip link prop del dummy0 altname someotherveryveryveryverylongname
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop master brx state DOWN mode DEFAULT group default qlen 1000
    link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff
    altname someothername
3: brx: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff
    altname mypersonalsuperspecialbridge

-> Add multiple alternative names at once

$ ip link prop add dummy0 altname a altname b altname c altname d
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop master brx state DOWN mode DEFAULT group default qlen 1000
    link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff
    altname someothername
    altname a
    altname b
    altname c
    altname d
3: brx: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether ae:67:a9:67:46:86 brd ff:ff:ff:ff:ff:ff
    altname mypersonalsuperspecialbridge
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 14:47:19 -07:00
Jiri Pirko
76c9ac0ee8 net: rtnetlink: add possibility to use alternative names as message handle
Extend the basic rtnetlink commands to use alternative interface names
as a handle instead of ifindex and ifname.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 14:47:19 -07:00
Jiri Pirko
cc6090e985 net: rtnetlink: introduce helper to get net_device instance by ifname
Introduce helper function rtnl_get_dev() that gets net_device structure
instance pointer according to passed ifname or ifname attribute.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 14:47:19 -07:00
Jiri Pirko
7af12cba4e net: rtnetlink: unify the code in __rtnl_newlink get dev with the rest
__rtnl_newlink() code flow is a bit different around tb[IFLA_IFNAME]
processing comparing to the other places. Change that to be unified with
the rest.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 14:47:19 -07:00
Jiri Pirko
88f4fb0c74 net: rtnetlink: put alternative names to getlink message
Extend exiting getlink info message with list of properties. Now the
only ones are alternative names.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 14:47:19 -07:00
Jiri Pirko
36fbf1e52b net: rtnetlink: add linkprop commands to add and delete alternative ifnames
Add two commands to add and delete list of link properties. Implement
the first property type along - alternative ifnames.
Each net device can have multiple alternative names.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 14:47:19 -07:00
Jiri Pirko
ff92741270 net: introduce name_node struct to be used in hashlist
Introduce name_node structure to hold name of device and put it into
hashlist instead of putting there struct net_device directly. Add a
necessary infrastructure to manipulate the hashlist. This prepares
the code to use the same hashlist for alternative names introduced
later in this set.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 14:47:19 -07:00
Jiri Pirko
6958c97a48 net: procfs: use index hashlist instead of name hashlist
Name hashlist is going to be used for more than just dev->name, so use
rather index hashlist for iteration over net_device instances.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 14:47:19 -07:00
Albert Ou
922b0375fc riscv: Fix memblock reservation for device tree blob
This fixes an error with how the FDT blob is reserved in memblock.
An incorrect physical address calculation exposed the FDT header to
unintended corruption, which typically manifested with of_fdt_raw_init()
faulting during late boot after fdt_totalsize() returned a wrong value.
Systems with smaller physical memory sizes more frequently trigger this
issue, as the kernel is more likely to allocate from the DMA32 zone
where bbl places the DTB after the kernel image.

Commit 671f9a3e2e ("RISC-V: Setup initial page tables in two stages")
changed the mapping of the DTB to reside in the fixmap area.
Consequently, early_init_fdt_reserve_self() cannot be used anymore in
setup_bootmem() since it relies on __pa() to derive a physical address,
which does not work with dtb_early_va that is no longer a valid kernel
logical address.

The reserved[0x1] region shows the effect of the pointer underflow
resulting from the __pa(initial_boot_params) offset subtraction:

[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x000000001fe00000 reserved size = 0x0000000000a2e514
[    0.000000]  memory.cnt  = 0x1
[    0.000000]  memory[0x0]     [0x0000000080200000-0x000000009fffffff], 0x000000001fe00000 bytes flags: 0x0
[    0.000000]  reserved.cnt  = 0x2
[    0.000000]  reserved[0x0]   [0x0000000080200000-0x0000000080c2dfeb], 0x0000000000a2dfec bytes flags: 0x0
[    0.000000]  reserved[0x1]   [0xfffffff080100000-0xfffffff080100527], 0x0000000000000528 bytes flags: 0x0

With the fix applied:

[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x000000001fe00000 reserved size = 0x0000000000a2e514
[    0.000000]  memory.cnt  = 0x1
[    0.000000]  memory[0x0]     [0x0000000080200000-0x000000009fffffff], 0x000000001fe00000 bytes flags: 0x0
[    0.000000]  reserved.cnt  = 0x2
[    0.000000]  reserved[0x0]   [0x0000000080200000-0x0000000080c2dfeb], 0x0000000000a2dfec bytes flags: 0x0
[    0.000000]  reserved[0x1]   [0x0000000080e00000-0x0000000080e00527], 0x0000000000000528 bytes flags: 0x0

Fixes: 671f9a3e2e ("RISC-V: Setup initial page tables in two stages")
Signed-off-by: Albert Ou <aou@eecs.berkeley.edu>
Tested-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-10-01 13:22:39 -07:00
Palmer Dabbelt
18856604b3 RISC-V: Clear load reservations while restoring hart contexts
This is almost entirely a comment.  The bug is unlikely to manifest on
existing hardware because there is a timeout on load reservations, but
manifests on QEMU because there is no timeout.

Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-10-01 13:16:40 -07:00
Eric Dumazet
be2644aac3 tcp: add ipv6_addr_v4mapped_loopback() helper
tcp_twsk_unique() has a hard coded assumption about ipv4 loopback
being 127/8

Lets instead use the standard ipv4_is_loopback() method,
in a new ipv6_addr_v4mapped_loopback() helper.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 13:07:53 -07:00
Jiaxun Yang
d345d9cad2
MIPS: cpu-bugs64: Mark inline functions as __always_inline
Commit ac7c3e4ff4 ("compiler: enable CONFIG_OPTIMIZE_INLINING
forcibly") allows compiler to uninline functions marked as 'inline'.
Leading to section mismatch in this case.

Since we're using const variables to pass assembly flags, 'inline's
can't be dropped. So we simply mark them as __always_inline.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: linux-mips@vger.kernel.org
[paul.burton@mips.com:
  - Annotate these functions with __init, even if it only serves to
    inform human readers when the code can be used.
  - Drop the __always_inline from check_daddi() & check_daddiu() which
    don't use arguments as immediates in inline asm.
  - Rewrap the commit message.]
Signed-off-by: Paul Burton <paul.burton@mips.com>
2019-10-01 12:59:59 -07:00
David Hildenbrand
04ccbdc390 xen/balloon: Clear PG_offline in balloon_retrieve()
Let's move the clearing to balloon_retrieve(). In
bp_state increase_reservation(), we now clear the flag a little earlier
than before, however, this should not matter for XEN.

Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2019-10-01 15:52:53 -04:00
David Hildenbrand
59b52f105f xen/balloon: Mark pages PG_offline in balloon_append()
Let's move the __SetPageOffline() call which all callers perform into
balloon_append().

In bp_state decrease_reservation(), pages are now marked PG_offline a
little later than before, however, this should not matter for XEN.

Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2019-10-01 15:52:53 -04:00
David Hildenbrand
dde3285ffa xen/balloon: Drop __balloon_append()
Let's simply use balloon_append() directly.

Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2019-10-01 15:52:53 -04:00
David Hildenbrand
c5ad81eb02 xen/balloon: Set pages PageOffline() in balloon_add_region()
We are missing a __SetPageOffline(), which is why we can get
!PageOffline() pages onto the balloon list, where
alloc_xenballooned_pages() will complain:

page:ffffea0003e7ffc0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0xffffe00001000(reserved)
raw: 000ffffe00001000 dead000000000100 dead000000000200 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: VM_BUG_ON_PAGE(!PageOffline(page))
------------[ cut here ]------------
kernel BUG at include/linux/page-flags.h:744!
invalid opcode: 0000 [#1] SMP NOPTI

Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Fixes: 77c4adf6a6 ("xen/balloon: mark inflated pages PG_offline")
Cc: stable@vger.kernel.org # v5.1+
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2019-10-01 15:52:53 -04:00
Stefano Stabellini
ade77a2da8 ARM: xen: unexport HYPERVISOR_platform_op function
HYPERVISOR_platform_op() is an inline function and should not
be exported. Since commit 15bfc2348d ("modpost: check for
static EXPORT_SYMBOL* functions"), this causes a warning:

WARNING: "HYPERVISOR_platform_op" [vmlinux] is a static EXPORT_SYMBOL_GPL

Instead, export the underlying function called by the static inline:
HYPERVISOR_platform_op_raw.

Fixes: 15bfc2348d ("modpost: check for static EXPORT_SYMBOL* functions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
2019-10-01 12:43:12 -07:00
Peter Zijlstra
73956fc07d membarrier: Fix RCU locking bug caused by faulty merge
The following commit:

  227a4aadc7 ("sched/membarrier: Fix p->mm->membarrier_state racy load")

got fat fingered by me when merging it with other patches. It meant to move
the RCU section out of the for loop but ended up doing it partially, leaving
a superfluous rcu_read_lock() inside, causing havok.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-tip-commits@vger.kernel.org
Fixes: 227a4aadc7 ("sched/membarrier: Fix p->mm->membarrier_state racy load")
Link: https://lkml.kernel.org/r/20191001085033.GP4519@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-01 21:27:50 +02:00
Oleksij Rempel
569aad4fcd net: ag71xx: fix mdio subnode support
This patch is syncing driver with actual devicetree documentation:
Documentation/devicetree/bindings/net/qca,ar71xx.txt
|Optional subnodes:
|- mdio : specifies the mdio bus, used as a container for phy nodes
|  according to phy.txt in the same directory

The driver was working with fixed phy without any noticeable issues. This bug
was uncovered by introducing dsa ar9331-switch driver.
Since no one reported this bug until now, I assume no body is using it
and this patch should not brake existing system.

Fixes: d51b6ce441 ("net: ethernet: add ag71xx driver")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 10:19:19 -07:00
David S. Miller
b33210e379 Merge branch 'stmmac-fixes'
Jose Abreu says:

====================
net: stmmac: Fixes for -net

Misc fixes for -net tree. More info in commit logs.

v2 is just a rebase of v1 against -net and we added a new patch (09/09) to
fix RSS feature.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 10:12:38 -07:00
Jose Abreu
56627336b6 net: stmmac: xgmac: Fix RSS writing wrong keys
Commit b6b6cc9acd, changed the call to dwxgmac2_rss_write_reg()
passing it the variable cfg->key[i].

As key is an u8 but we write 32 bits at a time we need to cast it into
an u32 so that the correct key values are written. Notice that the for
loop already takes this into account so we don't try to write past the
keys size.

Fixes: b6b6cc9acd ("net: stmmac: selftest: avoid large stack usage")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 10:12:38 -07:00
Jose Abreu
3c72d4d330 net: stmmac: xgmac: Fix RSS not writing all Keys to HW
The sizeof(cfg->key) is != ARRAY_SIZE(cfg->key). Fix it. This warning is
triggered when running with cc flag -Wsizeof-array-div.

Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Fixes: 76067459c6 ("net: stmmac: Implement RSS and enable it in XGMAC core")
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 10:12:38 -07:00
Jose Abreu
30300d9f91 net: stmmac: xgmac: Disable the Timestamp interrupt by default
We don't use it anyway as XGMAC only supports polling for timestamp (in
current SW implementation). This greatly reduces the system load by
reducing the number of interrupts.

Fixes: 2142754f8b ("net: stmmac: Add MAC related callbacks for XGMAC2")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 10:12:37 -07:00
Jose Abreu
3e2bf04fb0 net: stmmac: Do not stop PHY if WoL is enabled
If WoL is enabled we can't really stop the PHY, otherwise we will not
receive the WoL packet. Fix this by telling phylink that only the MAC is
down and only stop the PHY if WoL is not enabled.

Fixes: 74371272f9 ("net: stmmac: Convert to phylink and remove phylib logic")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 10:12:37 -07:00
Jose Abreu
14f347334b net: stmmac: Correctly take timestamp for PTPv2
The case for PTPV2_EVENT requires event packets to be captured so add
this setting to the list of enabled captures.

Fixes: 891434b18e ("stmmac: add IEEE PTPv1 and PTPv2 support.")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 10:12:37 -07:00
Jose Abreu
f79bfda375 net: stmmac: dwmac4: Always update the MAC Hash Filter
We need to always update the MAC Hash Filter so that previous entries
are invalidated.

Found out while running stmmac selftests.

Fixes: b8ef7020d6 ("net: stmmac: add support for hash table size 128/256 in dwmac4")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 10:12:37 -07:00