Commit Graph

575800 Commits

Author SHA1 Message Date
Yuval Mintz
ff38577aa9 qed: Print HW attention reasons
Each HW block contains common information about attention reasons,
raising a bit for each one of the different sub-reasons that caused it
to raise an attention.

This patch extends the infrastructure by allowing logging of the various
reasons causing the HW blocks to generate an attention.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:39:50 -05:00
Yuval Mintz
0d956e8a65 qed: Add support for HW attentions
HW is capable of generating attentnions for a multitude of reasons,
but current driver is enabling attention generation only for management
firmware [required for link notifications].

This patch enables almost all of the possible reasons for HW attentions,
logging the HW block generating the attention and preventing further
attentions from that source [to prevent possible attention flood].
It also lays the infrastructure for additional exploration of the various
attentions.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:39:50 -05:00
Yuval Mintz
4ac801b77e qed: Semantic refactoring of interrupt code
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:39:49 -05:00
WANG Cong
64d4e3431e net: remove skb_sender_cpu_clear()
After commit 52bd2d62ce ("net: better skb->sender_cpu and skb->napi_id cohabitation")
skb_sender_cpu_clear() becomes empty and can be removed.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:36:47 -05:00
David S. Miller
8b4837c879 Merge branch 'mlx5-next'
Saeed Mahameed says:

====================
mlx5 driver updates

This series includes some bug fixes and updates for the mlx5 core
and ethernet driver.

From Gal, two fixes that protects the update CQ moderation flows
when it is not allowed.

From Moshe, two fixes for the core and ethernet driver in
non-cached(NC) and write combining(WC) buffers mappings,
which prevents the driver from double memory mappings.

From Or, reduce the firmware command completion timeout.

From Tariq, several small trivial fixes.

Changes from v0:
	- "Fix global UAR mapping" commit messages updated to explain ARCH_HAS_IOREMAP_WC usage.
	- rebased to commit 8d3f2806f8 'Merge branch ethtool-ksettings'

Changes from v1:
	- Removed ARCH_HAS_IOREMAP_WC config flag from "Fix global UAR mapping" commit,	as it was not accurate to use it.
	- Squashed "Fix global UAR mapping" and "net/mlx5: Avoid double mapping of io mapped memory"
	- Added more info for "Fix global UAR mapping" in commit message

Changes from v2:
	- None. resubmission per Dave's request due to two parallel submissions to mlx5 driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:28:01 -05:00
Moshe Lazer
0ba422410b net/mlx5: Fix global UAR mapping
Avoid double mapping of io mapped memory, Device page may be
mapped to non-cached(NC) or to write-combining(WC).
The code before this fix tries to map it both to WC and NC
contrary to what stated in Intel's software developer manual.

Here we remove the global WC mapping of all UARS
"dev->priv.bf_mapping", since UAR mapping should be decided
per UAR (e.g we want different mappings for EQs, CQs vs QPs).

Caller will now have to choose whether to map via
write-combining API or not.

mlx5e SQs will choose write-combining in order to perform
BlueFlame writes.

Fixes: 88a85f99e5 ('TX latency optimization to save DMA reads')
Signed-off-by: Moshe Lazer <moshel@mellanox.com>
Reviewed-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:28:00 -05:00
Or Gerlitz
6b6c07bdcd net/mlx5: Make command timeout way shorter
The command timeout is terribly long, whole two hours. Make it 60s so if
things do go wrong, the user gets feedback in relatively short time, so
they can take corrective actions and/or investigate using tools and such.

Fixes: e126ba97db ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:28:00 -05:00
Gal Pressman
2fcb92fbd0 net/mlx5e: Don't modify CQ before it was created
Calling mlx5e_set_coalesce while the interface is down will result in
modifying CQs that don't exist.

Fixes: f62b8bb8f2 ('net/mlx5: Extend mlx5_core to support ConnectX-4
Ethernet functionality')
Signed-off-by: Gal Pressman <galp@mellanox.com>

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:28:00 -05:00
Gal Pressman
7524a5d88b net/mlx5e: Don't try to modify CQ moderation if it is not supported
If CQ moderation is not supported by the device, print a warning on
netdevice load, and return error when trying to modify/query cq
moderation via ethtool.

Fixes: f62b8bb8f2 ('net/mlx5: Extend mlx5_core to support ConnectX-4
Ethernet functionality')
Signed-off-by: Gal Pressman <galp@mellanox.com>

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:28:00 -05:00
Tariq Toukan
556dd1b9c3 net/mlx5e: Set drop RQ's necessary parameters only
By its role, there is no need to set all the other parameters
for the drop RQ.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:27:59 -05:00
Tariq Toukan
c89fb18b65 net/mlx5e: Move common case counters within sq_stats struct
For data cache locality considerations, we moved the nop and
csum_offload_inner within sq_stats struct as they are more
commonly accessed in xmit path.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:27:59 -05:00
Tariq Toukan
3b6195240c net/mlx5e: Changed naming convention of tx queues in ethtool stats
Instead of the pair (channel, tc), we now use a single number that
goes over all tx queues of a TC, for all TCs.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:27:59 -05:00
Tariq Toukan
ce89ef36d2 net/mlx5e: Placement changed for carrier state updates
More proper to declare carrier state UP only after the channels
are ready for traffic.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:27:59 -05:00
Tariq Toukan
daa21560a2 net/mlx5e: Replace async events spinlock with synchronize_irq()
We only need to flush the irq handler to make sure it does not
queue a work into the global work queue after we start to flush it.
So using synchronize_irq() is more appropriate than a spin lock.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:27:58 -05:00
David Ahern
4f25a1110c net: ipv6/l3mdev: Move host route on saved address if necessary
Commit f1705ec197 allows IPv6 addresses to be retained on a link down.
The address can have a cached host route which can point to the wrong
FIB table if the L3 enslavement is changed (e.g., route can point to local
table instead of VRF table if device is added to an L3 domain).

On link up check the table of the cached host route against the FIB
table associated with the device and correct if needed.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:22:52 -05:00
David S. Miller
292264de63 Merge branch 'net-timestamps-y2038'
Deepa Dinamani says:

====================
Convert network timestamps to be y2038 safe

Introduction:

The series is aimed at transitioning network timestamps to being
y2038 safe.
All patches can be reviewed and merged independently.

Socket timestamps and ioctl calls will be handled separately.

Thanks to Arnd Bergmann for discussing solution options with me.

Solution:

Data type struct timespec is not y2038 safe.
Replace timespec with struct timespec64 which is y2038 safe.

Changes v1 -> v2:
  Move and rename inet_current_time() as discussed
  Squash patches 1 and 2
  Reword commit text for patch 2/3
  Carry over review tags
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:18:45 -05:00
Deepa Dinamani
6497c7e640 net: sctp: Convert log timestamps to be y2038 safe
SCTP probe log timestamps use struct timespec which is
not y2038 safe.
Use struct timespec64 which is 2038 safe instead.

Use monotonic time instead of real time as only time
differences are logged.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-sctp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:18:44 -05:00
Deepa Dinamani
b1b270d863 net: ipv4: tcp_probe: Replace timespec with timespec64
TCP probe log timestamps use struct timespec which is
not y2038 safe. Even though timespec might be good enough here
as it is used to represent delta time, the plan is to get rid
of all uses of timespec in the kernel.
Replace with struct timespec64 which is y2038 safe.

Prints still use unsigned long format and type.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:18:44 -05:00
Deepa Dinamani
822c868532 net: ipv4: Convert IP network timestamps to be y2038 safe
ICMP timestamp messages and IP source route options require
timestamps to be in milliseconds modulo 24 hours from
midnight UT format.

Add inet_current_timestamp() function to support this. The function
returns the required timestamp in network byte order.

Timestamp calculation is also changed to call ktime_get_real_ts64()
which uses struct timespec64. struct timespec64 is y2038 safe.
Previously it called getnstimeofday() which uses struct timespec.
struct timespec is not y2038 safe.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Cc: Patrick McHardy <kaber@trash.net>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:18:44 -05:00
David S. Miller
e9c0d61d5f Merge branch 'ife'
Jamal Hadi Salim says:

====================
net_sched: Add support for IFE action

As agreed at netconf in Seville, here's the patch finally (1 year
was just too long to wait for an ethertype. Now we are just going
have the user configure one).
Described in netdev01 paper:
            "Distributing Linux Traffic Control Classifier-Action Subsystem"
             Authors: Jamal Hadi Salim and Damascene M. Joachimpillai

The original motivation and deployment of this work was to horizontally
scale packet processing at scope of a chasis or rack. This means one
could take a tc policy and split it across machines connected over
L2. The paper refers to this as "pipeline stage indexing". Other
use cases which evolved out of the original intent include but are
not limited to carrying OAM information, carrying exception handling
metadata, carrying programmed authentication and authorization information,
encapsulating programmed compliance information, service IDs etc.
Read the referenced paper for more details.

The architecture allows for incremental updates for new metadatum support
to cover different use cases.
This patch set includes support for basic skb metadatum.
Followup patches will have more examples of metadata and other features.

v4 changes:
Integrate more feedback from Cong

v3 changes:
Integrate with the new namespace changes
Remove skbhash and queue mapping metadata (but keep their claim for ids)
Integrate feedback from Cong
Integrate feedback from Daniel

v2 changes:
Remove module option for an upper bound of metadata
Integrate feedback from Cong
Integrate feedback from Daniel
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:15:23 -05:00
Jamal Hadi Salim
200e10f469 Support to encoding decoding skb prio on IFE action
Example usage:
    Set the skb priority using skbedit then allow it to be encoded

    sudo tc qdisc add dev $ETH root handle 1: prio
    sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
    u32 match ip protocol 1 0xff flowid 1:2 \
    action skbedit prio 17 \
    action ife encode \
    allow prio \
    dst 02:15:15:15:15:15

    Note: You dont need the skbedit action if you are already encoding the
    skb priority earlier. A zero skb priority will not be sent

    Alternative hard code static priority of decimal 33 (unlike skbedit)
    then mark of 0x12 every time the filter matches

    sudo $TC filter add dev $ETH parent 1: protocol ip prio 10 \
    u32 match ip protocol 1 0xff flowid 1:2 \
    action ife encode \
    type 0xDEAD \
    use prio 33 \
    use mark 0x12 \
    dst 02:15:15:15:15:15

    Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:15:23 -05:00
Jamal Hadi Salim
084e2f6566 Support to encoding decoding skb mark on IFE action
Example usage:
Set the skb using skbedit then allow it to be encoded

sudo tc qdisc add dev $ETH root handle 1: prio
sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
u32 match ip protocol 1 0xff flowid 1:2 \
action skbedit mark 17 \
action ife encode \
allow mark \
dst 02:15:15:15:15:15

Note: You dont need the skbedit action if you are already encoding the
skb mark earlier. A zero skb mark, when seen, will not be encoded.

Alternative hard code static mark of 0x12 every time the filter matches

sudo $TC filter add dev $ETH parent 1: protocol ip prio 10 \
u32 match ip protocol 1 0xff flowid 1:2 \
action ife encode \
type 0xDEAD \
use mark 0x12 \
dst 02:15:15:15:15:15

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:15:23 -05:00
Jamal Hadi Salim
ef6980b6be introduce IFE action
This action allows for a sending side to encapsulate arbitrary metadata
which is decapsulated by the receiving end.
The sender runs in encoding mode and the receiver in decode mode.
Both sender and receiver must specify the same ethertype.
At some point we hope to have a registered ethertype and we'll
then provide a default so the user doesnt have to specify it.
For now we enforce the user specify it.

Lets show example usage where we encode icmp from a sender towards
a receiver with an skbmark of 17; both sender and receiver use
ethertype of 0xdead to interop.

YYYY: Lets start with Receiver-side policy config:
xxx: add an ingress qdisc
sudo tc qdisc add dev $ETH ingress

xxx: any packets with ethertype 0xdead will be subjected to ife decoding
xxx: we then restart the classification so we can match on icmp at prio 3
sudo $TC filter add dev $ETH parent ffff: prio 2 protocol 0xdead \
u32 match u32 0 0 flowid 1:1 \
action ife decode reclassify

xxx: on restarting the classification from above if it was an icmp
xxx: packet, then match it here and continue to the next rule at prio 4
xxx: which will match based on skb mark of 17
sudo tc filter add dev $ETH parent ffff: prio 3 protocol ip \
u32 match ip protocol 1 0xff flowid 1:1 \
action continue

xxx: match on skbmark of 0x11 (decimal 17) and accept
sudo tc filter add dev $ETH parent ffff: prio 4 protocol ip \
handle 0x11 fw flowid 1:1 \
action ok

xxx: Lets show the decoding policy
sudo tc -s filter ls dev $ETH parent ffff: protocol 0xdead
xxx:
filter pref 2 u32
filter pref 2 u32 fh 800: ht divisor 1
filter pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1  (rule hit 0 success 0)
  match 00000000/00000000 at 0 (success 0 )
        action order 1: ife decode action reclassify
         index 1 ref 1 bind 1 installed 14 sec used 14 sec
         type: 0x0
         Metadata: allow mark allow hash allow prio allow qmap
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
xxx:
Observe that above lists all metadatum it can decode. Typically these
submodules will already be compiled into a monolithic kernel or
loaded as modules

YYYY: Lets show the sender side now ..

xxx: Add an egress qdisc on the sender netdev
sudo tc qdisc add dev $ETH root handle 1: prio
xxx:
xxx: Match all icmp packets to 192.168.122.237/24, then
xxx: tag the packet with skb mark of decimal 17, then
xxx: Encode it with:
xxx:	ethertype 0xdead
xxx:	add skb->mark to whitelist of metadatum to send
xxx:	rewrite target dst MAC address to 02:15:15:15:15:15
xxx:
sudo $TC filter add dev $ETH parent 1: protocol ip prio 10  u32 \
match ip dst 192.168.122.237/24 \
match ip protocol 1 0xff \
flowid 1:2 \
action skbedit mark 17 \
action ife encode \
type 0xDEAD \
allow mark \
dst 02:15:15:15:15:15

xxx: Lets show the encoding policy
sudo tc -s filter ls dev $ETH parent 1: protocol ip
xxx:
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:2  (rule hit 0 success 0)
  match c0a87aed/ffffffff at 16 (success 0 )
  match 00010000/00ff0000 at 8 (success 0 )

	action order 1:  skbedit mark 17
	 index 6 ref 1 bind 1
 	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0

	action order 2: ife encode action pipe
	 index 3 ref 1 bind 1
	 dst MAC: 02:15:15:15:15:15 type: 0xDEAD
 	 Metadata: allow mark
 	Action statistics:
	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
	backlog 0b 0p requeues 0
xxx:

test by sending ping from sender to destination

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:15:22 -05:00
David S. Miller
d67703fced Here's another round of updates for -next:
* big A-MSDU RX performance improvement (avoid linearize of paged RX)
  * rfkill changes: cleanups, documentation, platform properties
  * basic PBSS support in cfg80211
  * MU-MIMO action frame processing support
  * BlockAck reordering & duplicate detection offload support
  * various cleanups & little fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJW0LZ0AAoJEGt7eEactAAdde0P/2meIOehuHBuAtL7REVoNhri
 bz9eSHMTg+ozCspL7F6vW1ifDI9AaJEaqJccmriueE/UQVC3VXRPGRJ4SFCwZGo9
 Zrtys2v9wOq0+XhxyN65Ucf41O9F/5FFabR5OFbf/pZhW5b2cubEjD1P4BB76Iya
 8O6wf9oDDjt3zJgYK+sygm3k9wtDVrH3qEbj8IDnCy22P7010qCsfok9swfaq8OB
 DBgb6BVfDOFTNXvJGH5fRuUKZdtovzzxorXnoG+zjmKmFdMVdgIYj9+2QfnMjW03
 B4/W85svcLLH8V3lHZc4G8oKM4J4XtjH1PskKIMF7ThJsKGMf8tL2vpt9rr8iscd
 Y9SwTEGc9JmhL7n2FaQFlY6ScLcp4ML+2rXxDOMpBmgF3Ne3yfBsJhLKZEl8vSfI
 mKhzGXpUKjJxJWIxkR0ylJy4/zHeIXkgRlUEhb8t+jgAqvOBTwiVY+vljHCDUERa
 sH40r1OqnGJtOHkSRqXSpxwXW+eKgyDd7fnnRX/tyttp2Fuew27/fN63SjpsfN6O
 3lfSM5bl3FcCKx7vqTLuqzsoqGvDDYkSq6GDfKDqeZIk0vaXA3SJNEOKgymFWQfR
 rzsaXvTbBT34GYRg3xS2NCxlmcBPemei/q0x6ZOffxhF41Qpqjs1dPB1Yq3AW4jD
 HGF+NdRbWEqEFVIjQa8w
 =JHOe
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2016-02-26' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Here's another round of updates for -next:
 * big A-MSDU RX performance improvement (avoid linearize of paged RX)
 * rfkill changes: cleanups, documentation, platform properties
 * basic PBSS support in cfg80211
 * MU-MIMO action frame processing support
 * BlockAck reordering & duplicate detection offload support
 * various cleanups & little fixes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:03:27 -05:00
David S. Miller
4ec620700c Merge branch 'bridge-mcast-tmp-router-port'
Nikolay Aleksandrov says:

====================
bridge: mcast: add support for temp router port

This set adds support for temporary router port which doesn't depend only
on the incoming queries. It can be refreshed by setting multicast_router to
the same value (3). The first two patches are minor changes that prepare
the code for the third which adds this new type of router port.
In order to be able to dump its information the mdb router port format
is changed in patch 04 and extended similar to how mdb entries format was
done recently.
The related iproute2 changes will be posted if this is accepted.

v2: set val first and adjust router type later in patch 01, patch 03 was
split in 2
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:55:08 -05:00
Nikolay Aleksandrov
59f78f9f6c bridge: mcast: add support for more router port information dumping
Allow for more multicast router port information to be dumped such as
timer and type attributes. For that that purpose we need to extend the
MDBA_ROUTER_PORT attribute similar to how it was done for the mdb entries
recently. The new format is thus:
[MDBA_ROUTER_PORT] = { <- nested attribute
    u32 ifindex <- router port ifindex for user-space compatibility
    [MDBA_ROUTER_PATTR attributes]
}
This way it remains compatible with older users (they'll simply retrieve
the u32 in the beginning) and new users can parse the remaining
attributes. It would also allow to add future extensions to the router
port without breaking compatibility.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:55:07 -05:00
Nikolay Aleksandrov
a55d8246ab bridge: mcast: add support for temporary port router
Add support for a temporary router port which doesn't depend only on the
incoming query. It can be refreshed if set to the same value, which is
a no-op for the rest.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:55:07 -05:00
Nikolay Aleksandrov
4950cfd1e6 bridge: mcast: do nothing if port's multicast_router is set to the same val
This is needed for the upcoming temporary port router. There's no point
to go through the logic if the value is the same.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:55:07 -05:00
Nikolay Aleksandrov
7f0aec7a66 bridge: mcast: use names for the different multicast_router types
Using raw values makes it difficult to extend and also understand the
code, give them names and do explicit per-option manipulation in
br_multicast_set_port_router.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:55:07 -05:00
David S. Miller
ec1606c090 Merge branch 'mv88e6xxx-vlan-filtering'
Vivien Didelot says:

====================
net: dsa: mv88e6xxx: implement VLAN filtering

This patchset fixes hardware bridging for non 802.1Q aware systems.

The mv88e6xxx DSA driver currently depends on CONFIG_VLAN_8021Q and
CONFIG_BRIDGE_VLAN_FILTERING enabled for correct bridging between switch ports.

Patch 1/9 adds support for the VLAN filtering switchdev attribute in DSA.

Patchs 2/9 and 3/9 add helper functions for the following patches.

Patchs 4/9 to 6/9 assign dynamic address databases to VLANs, ports, and
bridge groups (the lowest available FID is cleared and assigned), and thus
restore support for per-port FDB operations.

Patchs 7/9 to 9/9 refine ports isolation and setup 802.1Q on user demand.

With this patchset, ports get correctly bridged and the driver behaves as
expected, with or without 802.1Q support.

With CONFIG_VLAN_8021Q enabled, setting a default PVID to the bridge correctly
propagates the corresponding VLAN, in addition to the hardware bridging:

    # echo 42 > /sys/class/net/<bridge>/bridge/default_pvid

But considering CONFIG_BRIDGE_VLAN_FILTERING enabled, the hardware VLAN
filtering is enabled on all bridge members only when the user requests it:

    # echo 1 > /sys/class/net/<bridge>/bridge/vlan_filtering
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:24:54 -05:00
Vivien Didelot
214cdb9987 net: dsa: mv88e6xxx: support VLAN filtering
Implement port_vlan_filtering in the driver to toggle the related port
802.1Q mode between DISABLED and SECURE, on user request.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:24:53 -05:00
Vivien Didelot
46fbe5e5af net: dsa: mv88e6xxx: remove reserved VLANs
Now that ports isolation is correctly configured when joining or leaving
a bridge, there is no need to rely on reserved VLANs to isolate
unbridged ports anymore. Thus remove them, and disable 802.1Q on setup.

This restores the expected behavior of hardware bridging for systems
without 802.1Q or VLAN filtering enabled.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:24:53 -05:00
Vivien Didelot
b7666efe46 net: dsa: mv88e6xxx: restore VLANTable map control
The In Chip Port Based VLAN Table contains bits used to restrict which
output ports this input port can send frames to.

With the VLAN filtering enabled, these tables work in conjunction with
the VLAN Table Unit to allow egressing frames.

In order to remove the current dependency to BRIDGE_VLAN_FILTERING for
basic hardware bridging to work, it is necessary to restore a fine
control of each port's VLANTable, on setup and when a port joins or
leaves a bridge.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:24:52 -05:00
Vivien Didelot
466dfa0770 net: dsa: mv88e6xxx: assign dynamic FDB to bridges
Give a new bridge a fresh FDB, assign it to its members, and restore a
fresh FDB to a port leaving a bridge.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:24:52 -05:00
Vivien Didelot
2db9ce1fd9 net: dsa: mv88e6xxx: assign default FDB to ports
Restore per-port FDB. Assign them on setup, allow adding and deleting
addresses into them, and dump them.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:24:52 -05:00
Vivien Didelot
3285f9e869 net: dsa: mv88e6xxx: assign dynamic FDB to VLANs
Add a _mv88e6xxx_fid_new function which gives and flushes the lowest FID
available. Call it when preparing a new VTU entry.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:24:52 -05:00
Vivien Didelot
74b6ba0d76 net: dsa: mv88e6xxx: extract single FDB dump
Move out the code which dumps a single FDB to its own function.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:24:52 -05:00
Vivien Didelot
2fb5ef09de net: dsa: mv88e6xxx: extract single VLAN retrieval
Rename _mv88e6xxx_vlan_init in _mv88e6xxx_vtu_new, eventually called
from a new _mv88e6xxx_vtu_get function, which abstracts the VTU GetNext
VID-1 trick to retrieve a single entry.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:24:51 -05:00
Vivien Didelot
fb2dabad69 net: dsa: support VLAN filtering switchdev attr
When a user explicitly requests VLAN filtering with something like:

    # echo 1 > /sys/class/net/<bridge>/bridge/vlan_filtering

Switchdev propagates a SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING port
attribute.

Add support for it in the DSA layer with a new port_vlan_filtering
function to let drivers toggle 802.1Q filtering on user demand.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:24:51 -05:00
David S. Miller
7f66ee4156 Merge branch 'devlink'
Jiri Pirko says:

====================
Introduce devlink interface and first drivers to use it

There a is need for some userspace API that would allow to expose things
that are not directly related to any device class like net_device of
ib_device, but rather chip-wide/switch-ASIC-wide stuff.

Use cases:
1) get/set of port type (Ethernet/InfiniBand)
2) setting up port splitters - split port into multiple ones and squash again,
   enables usage of splitter cable
3) setting up shared buffers - shared among multiple ports within
   one chip (work in progress)
4) configuration of switch wide properties - resources division etc - This will
   allow to pass configuration that is unacceptable to be passed as
   a module option.

First patch of this set introduces a new generic Netlink based interface,
called "devlink". It is similar to nl80211 model and it is heavily
influenced by it, including the API definition. The devlink introduction patch
implements use cases 1) and 2). Other 2 are in development atm and will
be addressed by follow-ups.

It is very convenient for drivers to use devlink, as you can see in other
patches in this set.

Counterpart for devlink is userspace tool for now called "dl". Command line
interface and outputs are derived from "ip" tool so it should be easy
for users to get used to it.

It is available here as a standalone tool for now:
https://github.com/jpirko/devlink
After this is merge in kernel, I will include the "dl" or "devlink" tool
into iproute2 toolset.

Port type setting example:
	myhost:~$ dl help
	Usage: dl [ OPTIONS ] OBJECT { COMMAND | help }
	where  OBJECT := { dev | port | monitor }
	       OPTIONS := { -v/--verbose }

	myhost:~$ dl dev help
	Usage: dl dev show [DEV]

	myhost:~$ dl dev show
	pci/0000:01:00.0

	myhost:~$ dl port help
	Usage: dl port show [DEV/PORT_INDEX]
	Usage: dl port set DEV/PORT_INDEX [ type { eth | ib | auto} ]
	Usage: dl port split DEV/PORT_INDEX count
	Usage: dl port unsplit DEV/PORT_INDEX

	myhost:~$ dl port show
	pci/0000:01:00.0/1: type ib ibdev mlx4_0
	pci/0000:01:00.0/2: type ib ibdev mlx4_0

	myhost:~$ sudo dl port set pci/0000:01:00.0/1 type eth

	myhost:~$ dl port show
	pci/0000:01:00.0/1: type eth netdev ens4
	pci/0000:01:00.0/2: type ib ibdev mlx4_0

	myhost:~$ sudo dl port set ens4 type auto

	myhost:~$ dl port show
	pci/0000:01:00.0/1: type eth(auto) netdev ens4
	pci/0000:01:00.0/2: type ib ibdev mlx4_0

Port splitting example:
	myswitch:~$ sudo modprobe mlxsw_pci
	myswitch:~$ dl port
	pci/0000:03:00.0/1: type eth netdev eth0
	pci/0000:03:00.0/3: type eth netdev eth1
	pci/0000:03:00.0/5: type eth netdev eth2
	...
	pci/0000:03:00.0/63: type eth netdev eth31

	myswitch:~$ sudo dl port split pci/0000:03:00.0/1 2   (or "sudo dl port split eth0 2")

	myswitch:~$ dl port
	pci/0000:03:00.0/3: type eth netdev eth1
	pci/0000:03:00.0/5: type eth netdev eth2
	...
	pci/0000:03:00.0/63: type eth netdev eth31
	pci/0000:03:00.0/1: type eth netdev eth0 split_group 16
	pci/0000:03:00.0/2: type eth netdev eth32 split_group 16

	myswitch:~$ sudo dl port unsplit pci/0000:03:00.0/1

	myswitch:~$ dl port
	pci/0000:03:00.0/3: type eth netdev eth1
	pci/0000:03:00.0/5: type eth netdev eth2
	pci/0000:03:00.0/63: type eth netdev eth31
	pci/0000:03:00.0/1: type eth netdev eth0

v2->v3:
patch 1/9
 -removed generated devlink index and name, use bus name and dev name as
  a handle for all userspace originated commands. Along with that,
  remove sysfs stub. Requested by Hannes Sowa.
patch 2/9
 -add dev param to devlink_register (api change)
patch 4/9
 -add dev param to devlink_register (api change)
patch 9/9
 -set port's speed according to width fix by Ido
v1->v2:
patch 1/9
 -removed no longer used "devlink_dev" helper
 -fix couple of typos and misspells
patch 4/9:
 -removed SET_NETDEV_DEV set to devlink dev
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:07:31 -05:00
Ido Schimmel
18f1e70c41 mlxsw: spectrum: Introduce port splitting
Allow a user to split or unsplit a port using the newly introduced
devlink ops.

Once split, the original netdev is destroyed and 2 or 4 others are
created, according to user configuration. The new ports are like any
other port, with the sole difference of supporting a lower maximum
speed. When unsplit, the reverse process takes place.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:07:31 -05:00
Ido Schimmel
a133318cde mlxsw: spectrum: Mark unused ports using NULL
When splitting and unsplitting we'll destroy usable ports on the fly, so
mark them using a NULL pointer to indicate that their local port number
is free and can be re-used.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:07:30 -05:00
Ido Schimmel
558c2d5e52 mlxsw: spectrum: Store local port to module mapping during init
The port netdevs are each associated with a different local port number
in the device. These local ports are grouped into groups of 4 (e.g.
(1-4), (5-8)) called clusters. The cluster constitutes the one of two
possible modules they can be mapped to. This mapping is board-specific
and done by the device's firmware during init.

When splitting a port by 4, the device requires us to first unmap all
the ports in the cluster and then map each to a single lane in the module
associated with the port netdev used as the handle for the operation.
This means that two port netdevs will disappear, as only 100Gb/s (4
lanes) ports can be split and we are guaranteed to have two of these
((1, 3), (5, 7) etc.) in a cluster.

When unsplit occurs we need to reinstantiate the two original 100Gb/s
ports and map each to its origianl module. Therefore, during driver init
store the initial local port to module mapping, so it can be used later
during unsplitting.

Note that a by 2 split doesn't require us to store the mapping, as we
only need to reinstantiate one port whose module is known.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:07:30 -05:00
Ido Schimmel
3e9b27b8fc mlxsw: spectrum: Unmap local port from module during teardown
When splitting a port we replace it with 2 or 4 other ports. To be able
to do that we need to remove the original port netdev and unmap it from
its module. However, we first mark it as disabled, as active ports
cannot be unmapped.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:07:30 -05:00
Jiri Pirko
284ef80357 mlxsw: core: Add devlink port splitter callbacks
Add middle layer in mlxsw core code to forward port split/unsplit calls
into specific ASIC drivers.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:07:30 -05:00
Jiri Pirko
c4745500e9 mlxsw: Implement devlink interface
Implement newly introduced devlink interface. Add devlink port instances
for every port and set the port types accordingly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:07:30 -05:00
Jiri Pirko
b2facd95ab mlx4: Implement port type setting via devlink interface
So far, there has been an mlx4-specific sysfs file allowing user to
change port type to either Ethernet of InfiniBand. This is very
inconvenient.

Allow to expose the same ability to set port type in a generic way
using devlink interface.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:07:29 -05:00
Jiri Pirko
09d4d087cd mlx4: Implement devlink interface
Implement newly introduced devlink interface. Add devlink port instances
for every port and set the port types accordingly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
v2->v3:
-add dev param to devlink_register (api change)
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:07:29 -05:00
Jiri Pirko
bfcd3a4661 Introduce devlink infrastructure
Introduce devlink infrastructure for drivers to register and expose to
userspace via generic Netlink interface.

There are two basic objects defined:
devlink - one instance for every "parent device", for example switch ASIC
devlink port - one instance for every physical port of the device.

This initial portion implements basic get/dump of objects to userspace.
Also, port splitter and port type setting is implemented.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:07:29 -05:00
David S. Miller
bd070e2126 Merge branch 'tc-sw-only'
John Fastabend says:

====================
tc software only

This adds a software only flag to tc but incorporates a bunch of comments
from the original attempt at this.

First instead of having the offload decision logic be embedded in cls_u32
I lifted into cls_pkt.h so it can be used anywhere and named the flag
TCA_CLS_FLAGS_SKIP_HW (Thanks Jiri ;)

In order to do this I put the flag defines in pkt_cls.h as well. However
it was suggested that perhaps these flags could be lifted into the
upper layer of TCA_ as well but I'm afraid this can not be done with
existing tc design as far as I can tell. The problem is the filters are
packed and unpacked in the classifier specific code and pushing the flags
through the high level doesn't seem easily doable. And we already have
this design where classifiers handle generic options such as actions and
policers. So I think adding one more thing here is OK as 'tc', et. al.
already know how to handle this type of thing.
====================

Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:05:40 -05:00