Commit Graph

246 Commits

Author SHA1 Message Date
Bhaktipriya Shridhar
3d5479e920 mlxsw: core: Remove deprecated create_workqueue
alloc_workqueue replaces deprecated create_workqueue().

A dedicated workqueue has been used since the workqueue
mlxsw_wq is used for FDB notif. processing with workitems that are
involved in normal device operation && because it's a network device
which can be depended upon during memory reclaim.

Workitems &trans->timeout_dw and &mlxsw_sp->fdb_notify.dw,
map to mlxsw_sp_fdb_notify_work (processes FDB notifications from the
underlying device and resolves the netdev to which the entry points to
and notifies the bridge using the switchdev notifier) and
mlxsw_emad_trans_timeout_work (provides async EMAD register access)
respectively. They require forward progress under memory pressure and
hence, WQ_MEM_RECLAIM has been set.

Since there are only a fixed number of work items, explicit concurrency
limit is unnecessary here.

Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09 23:49:43 -07:00
Ido Schimmel
d664b41e2a mlxsw: spectrum: Don't sleep during ndo_get_phys_port_name()
When rtnl_fill_ifinfo() is called for a certain netdevice it queries its
various parameters such as switch id and physical port name. The
function might get called in an atomic context, which means the
underlying driver must not sleep during the query operation.

Don't query the device and sleep during ndo_get_phys_port_name(), but
instead store the needed parameters in port creation time.

Fixes: 2bf9a58675 ("mlxsw: spectrum: Add support for physical port names")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09 11:20:05 -07:00
Ido Schimmel
be94535f95 mlxsw: spectrum: Make split flow match firmware requirements
When a port is created following a split / unsplit we need to map it to
the correct module and lane, enable it and then continue to initialize
its various parameters such as MTU and VLAN filters.

Under certain conditions, such as trying to split ports at the bottom
row of the front panel by four, we get firmware errors.

After evaluating this with the firmware team it was decided to alter the
split / unsplit flow, so that first all the affected ports are mapped,
then enabled and finally each is initialized separately.

Fix the split / unsplit flow by first mapping and enabling all the
affected ports. Newer firmware versions will support both flows.

Fixes: 18f1e70c41 ("mlxsw: spectrum: Introduce port splitting")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09 11:20:05 -07:00
David S. Miller
e800072c18 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
In netdevice.h we removed the structure in net-next that is being
changes in 'net'.  In macsec.c and rtnetlink.c we have overlaps
between fixes in 'net' and the u64 attribute changes in 'net-next'.

The mlx5 conflicts have to do with vxlan support dependencies.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09 15:59:24 -04:00
Jiri Pirko
5113bfdbc6 mlxsw: spectrum: Fix ordering in mlxsw_sp_fini
Fixes: 0f433fa0ec ("mlxsw: spectrum_buffers: Implement shared buffer configuration")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 19:00:00 -04:00
Ido Schimmel
2889286585 mlxsw: spectrum: Add missing rollback in flood configuration
When we fail to set the flooding configuration for the broadcast and
unregistered multicast traffic, we should revert the flooding
configuration of the unknown unicast traffic.

Fixes: 0293038e0c ("mlxsw: spectrum: Add support for flood control")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 18:27:43 -04:00
Ido Schimmel
51554db2d2 mlxsw: spectrum: Fix rollback order in LAG join failure
Make the leave procedure in the error path symmetric to the join
procedure and first remove the port from the collector before
potentially destroying the LAG.

Fixes: 0d65fc1304 ("mlxsw: spectrum: Implement LAG port join/leave")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 18:27:42 -04:00
Jiri Pirko
b94cdabbf1 mlxsw: spectrum_buffers: Use MLXSW_SP_PB_UNUSED define for unused pb
Suggested-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-15 13:02:43 -04:00
Jiri Pirko
ce78f02042 mlxsw: spectrum_buffers: Use designated initializers for mlxsw_sp_pbs
Suggested-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-15 13:02:42 -04:00
Jiri Pirko
2d0ed39fbd mlxsw: spectrum_buffers: Implement occupancy monitoring
Implement occupancy API introduced in devlink and mlxsw core. This is
done by accessing SBPM register for Port-Pool and SBSR for Port-TC
current and max occupancy values. Max clear is implemented using the
same registers.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 16:22:06 -04:00
Jiri Pirko
caf7297e7a mlxsw: core: Introduce support for asynchronous EMAD register access
So far it was possible to have one EMAD register access at a time,
locked by mutex. This patch extends this interface to allow multiple
EMAD register accesses to be in fly at once. That allows faster
processing on firmware side avoiding unused time in between EMADs.
Measured speedup is ~30% for shared occupancy snapshot operation.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 16:22:06 -04:00
Jiri Pirko
dd9bdb04d2 mlxsw: core: Add mlxsw specific workqueue and use it for FDB notif. processing
Follow-up patch is going to need to use delayed work as well and
frequently. The FDB notification processing is already using that and
also quite frequently. It makes sense to create separate workqueue just
for mlxsw driver in this case and do not pollute system_wq.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 16:22:06 -04:00
Jiri Pirko
42a7f1d774 mlxsw: reg: Extend SBPM register for occupancy control
Since it is not possible to get and clear Port-Pool occupancy data using
SBSR register, there's a need to implement that using SBPM.
Extend pack helper and add unpack helper to get occupancy values.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 16:22:06 -04:00
Jiri Pirko
26176def3c mlxsw: reg: Add Shared Buffer Status register definition
This register allows to query HW for current and maximal buffer usage.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 16:22:05 -04:00
Jiri Pirko
1ceecc88d2 mlxsw: core: Add devlink shared buffer occupancy callbacks
Add middle layer in mlxsw core code to forward shared buffer occupancy
calls into specific ASIC drivers.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 16:22:05 -04:00
Jiri Pirko
0f433fa0ec mlxsw: spectrum_buffers: Implement shared buffer configuration
Implement previously introduced mlxsw core shared buffer API.
For Spectrum, that is done utilizing registers SBPR, SBCM and SBPM.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 16:22:05 -04:00
Jiri Pirko
325f2f197d mlxsw: core: Add mlxsw_core_port_driver_priv helper
Needed in following patch.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 16:22:05 -04:00
Jiri Pirko
c30a53c7de mlxsw: spectrum_buffers: Get max_buff defaults into limits exposed to user
Although the device supports max_buff magic values 0 and 0xff, these are
not exposed to the user via devlink.
Therefore, adjust the default values to be within configurable range.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 16:22:05 -04:00
Jiri Pirko
bc872506f5 mlxsw: spectrum_buffers: Change initialization of PG 9
As explained in commit ff6551ec0c ("mlxsw: spectrum: Correctly
configure headroom size") control packets are directed to priority group
buffer 9 (PG9) in the ports' headroom buffers.

Since we don't want to drop control packets in case they can't be
admitted to the switch's shared buffer we bind PG9 to a different
ingress pool from the one used by all other PGs.

Unlike other PGs, we currently don't expose the binding between PG9 to a
pool and leave it fixed.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 16:22:04 -04:00
Jiri Pirko
5408f7cba3 mlxsw: spectrum_buffers: Remove eg pool 3 default init and CPU port TC binding to it
Since there is no congestion control for CPU port traffic, we can change
the CPU port TC binding to pool 0 with min_buff and max_buff zeroed.
Remove initialization for pool egress pool 3 since it is no longer used
by dafault.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 16:22:04 -04:00
Jiri Pirko
078f9c7132 mlxsw: spectrum_buffers: Cache shared buffer configuration
In order to achieve faster dumping of current setting and also in order
to provide possibility to get pool mode without a need to query hardware,
do cache the configuration in driver.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 16:22:04 -04:00
Jiri Pirko
aa99bc70ba mlxsw: spectrum_buffers: Rename "pool" to "pr" in initialization
Be consintent with rest of the registers (pm, cm) and use "pr" here.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 16:22:04 -04:00
Jiri Pirko
b11c3b4018 mlxsw: spectrum_buffers: Push out indexes and direction out of SB structs
Structs are in arrays so use array index as pool/tc/prio index. With
that, there is need to maintain separate arrays for ingress and egress.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 16:22:04 -04:00
Jiri Pirko
94266e3278 mlxsw: spectrum_buffers: Push out shared buffer register writes
Pushed them into helper functions.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 16:22:03 -04:00
Jiri Pirko
a6179bf0d1 mlxsw: core: Add devlink shared buffer callbacks
Add middle layer in mlxsw core code to forward shared buffer calls
into specific ASIC drivers.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 16:22:03 -04:00
Jiri Pirko
9efc8f655c mlxsw: reg: Fix SBPM register name
Fix copy&paste error and state the name of SBPM register correctly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-08 15:38:43 -04:00
Jiri Pirko
497e8592c6 mlxsw: reg: Share direction enum between SBPR, SBCM, SBPM
Same field, same values, so share the same enum.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-08 15:38:43 -04:00
Jiri Pirko
b2f10571b9 mlxsw: Do not pass around driver_priv directly
Instead of that, pass mlxsw_core and use a helper to get driver priv
from driver code. Looks much cleaner that way.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-08 15:38:42 -04:00
Jiri Pirko
307c2431ab mlxsw: Pass mlxsw_core as a param of mlxsw_core_skb_transmit*
Instead of passing around driver priv, pass struct mlxsw_core *
directly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-08 15:38:42 -04:00
Jiri Pirko
932762b69a mlxsw: Move devlink port registration into common core code
Remove devlink port reg/unreg from spectrum and switchx2 code and rather
do the common work in core. That also ensures code separation where
devlink is only used in core.c.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-08 15:38:42 -04:00
Ido Schimmel
d81a6bdb87 mlxsw: spectrum: Add IEEE 802.1Qbb PFC support
Implement the appropriate DCB ops and allow a user to configure certain
traffic classes as lossless.

The operation configures PFC for both the egress (respecting PFC frames)
and ingress (sending PFC frames) parts of the port.

At egress, when a PFC frame is received for a PFC enabled priority, then
all the priorities mapped to the same TC are stopped.

At ingress, the priority group (PG) buffers to which the enabled PFC
priorities are mapped are configured to be lossless. PFC frames will be
transmitted when the Xoff threshold is crossed.

The user-supplied delay parameter is used to determine the PG's size
according to the following formula:

PG_SIZE = PG_SIZE_LOSSY + delay * CELL_FACTOR + MTU

In the worst case scenario the delay will be made up of packets that
are all of size CELL_SIZE + 1, which means each packet will require
almost twice its true size when buffered in the switch. We therefore
multiply this value by the "cell factor", which is close to 2.

Another MTU is added in case the transmitting host already started
transmitting a maximum length frame when the PFC packet was received.

As with PAUSE enabled ports, when the port's MTU is changed both the
PGs' size and threshold are adjusted accordingly.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 17:24:20 -04:00
Ido Schimmel
34dba0a59d mlxsw: reg: Introduce per priority counters
We are going to add support for PFC as part of DCB ops, which requires us
to report the number of PFC frames sent and received per priority.

Add per priority counters in order to report number of PFC frames sent
and received per priority.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 17:24:20 -04:00
Ido Schimmel
9f7ec052b7 mlxsw: spectrum: Add support for PAUSE frames
When a packet ingress the switch it's placed in its assigned priority
group (PG) buffer in the port's headroom buffer while it goes through
the switch's pipeline. After going through the pipeline - which
determines its egress port(s) and traffic class - it's moved to the
switch's shared buffer awaiting transmission.

However, some packets are not eligible to enter the shared buffer due to
exceeded quotas or insufficient space. Marking their associated PGs as
lossless will cause the packets to accumulate in the PG buffer. Another
reason for packets accumulation are complicated pipelines (e.g.
involving a lot of ACLs).

To prevent packets from being dropped a user can enable PAUSE frames on
the port. This will mark all the active PGs as lossless and set their
size according to the maximum delay, as it's not configured by user.

                         +----------------+   +
                         |                |   |
                         |                |   |
                         |                |   |
                         |                |   |
                         |                |   |
                         |                |   | Delay
                         |                |   |
                         |                |   |
                         |                |   |
                         |                |   |
                         |                |   |
    Xon/Xoff threshold   +----------------+   +
                         |                |   |
                         |                |   | 2 * MTU
                         |                |   |
                         +----------------+   +

The delay (612 [Cells]) was calculated according to worst-case scenario
involving maximum MTU and 100m cables.

After marking the PGs as lossless the device is configured to respect
incoming PAUSE frames (Rx PAUSE) and generate PAUSE frames (Tx PAUSE)
according to user's settings.

Whenever the port's headroom configuration changes we take into account
the PAUSE configuration, so that we correctly set the PG's type (lossy /
lossless), size and threshold. This can happen when:

a) The port's MTU changes, as it directly affects the PG's size.

b) A PG is created following user configuration, by binding a priority
to it.

Note that the relevant SUPPORTED flags were already mistakenly set by
the driver before this commit.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 17:24:19 -04:00
Ido Schimmel
155f9de2e0 mlxsw: reg: Add lossless settings for PBMC register
When configuring PAUSE frames and PFC we'll need to configure the
Xon/Xoff threshold for the priority group (PG) buffers.

Add the Xon/Xoff threshold fields to the PBMC register so that we can
configure these when needed.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 17:24:19 -04:00
Ido Schimmel
6f253d8381 mlxsw: reg: Add Port Flow Control Configuration register
Add the Port Flow Control Configuration (PFCC) register, which
configures both flow control and Priority-based Flow Control (PFC).

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 17:24:19 -04:00
Ido Schimmel
cc7cf51758 mlxsw: spectrum: Allow setting maximum rate for a TC
Allow a user to set maximum rate for a particular TC using DCB ops.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 17:24:19 -04:00
Ido Schimmel
8e8dfe9fdf mlxsw: spectrum: Add IEEE 802.1Qaz ETS support
Implement the appropriate DCB ops and allow a user to configure:
	* Priority to traffic class (TC) mapping with a total of 8
	  supported TCs
	* Transmission selection algorithm (TSA) for each TC and the
	  corresponding weights in case of weighted round robin (WRR)

As previously explained, we treat the priority group (PG) buffer in the
port's headroom as the ingress counterpart of the egress TC. Therefore,
when a certain priority to TC mapping is configured, we also configure
the port's headroom buffer.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 17:24:18 -04:00
Ido Schimmel
f00817df2b mlxsw: spectrum: Introduce support for Data Center Bridging (DCB)
Introduce basic infrastructure for DCB and add the missing ops in
following patches.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 17:24:18 -04:00
Ido Schimmel
90183b980d mlxsw: spectrum: Initialize egress scheduling
Before introducing support for DCB ops we should first make sure we
initialize the relevant parts in the device correctly. Specifically, the
egress scheduling.

The device supports a superset of the 802.1Qaz standard with 4 hierarchy
levels that can be linked to each other in multiple ways and with
different transmission selection algorithms (TSA) employed between them.

However, since we only intend to support the 802.1Qaz standard we
flatten the hierarchies and let the user configure via DCB ops the TSA
and max rate shaper at the subgroup hierarchy (see figure below) and the
mapping between switch priority to traffic class. By default, all switch
priorities are mapped to traffic class 0, strict priority is employed
and max shaper is disabled.

Default configuration:

         switch priority 0      ...         switch priority 7
                 +                                  +
                 |                                  |
                 +----------------------------------+
                 |
              +--v--+                          +-----+
Traffic Class |     |                          |     |
  Hierarchy   | TC0 |           ...            | TC7 |
              |     |                          |     |
              +--+--+                          +--+--+
                 |                                |
              +--v--+                          +--v--+
  Subgroup    | SG0 |                          | SG7 |
  Hierarchy   |     |                          |     |
              +-----+                          +-----+
              | TSA |                          | TSA |
              +-----+           ...            +-----+
              | MAX |                          | MAX |
              +--+--+                          +--+--+
                 |                                |
                 +---------------+----------------+
                                 |
                              +--v--+
                      Group   |     |
                    Hierarchy | GR0 |
                              |     |
                              +--+--+
                                 |
                              +--v--+
                      Port    |     |
                    Hierarchy | PR0 |
                              |     |
                              +-----+

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 17:24:18 -04:00
Ido Schimmel
2c63a555e8 mlxsw: reg: Add QoS Switch Traffic Class Table register
As part of DCB ops we'll have to configure the priority to traffic class
mapping of a port.

Add the QoS Switch Traffic Class Table (QTCT) register, which configures
the mapping between the packet switch priority and traffic class on the
transmit port.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 17:24:18 -04:00
Ido Schimmel
b9b7cee405 mlxsw: reg: Add QoS ETS Element Configuration register
We are going to introduce support for DCB, so we need to be able to
configure the traffic selection algorithm (TSA) used by each traffic
class (TC), as well as the bandwidth percentage allocated to each TC in
case of ETS.

Add the QoS ETS Element Configuration register, which controls the
above parameters.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 17:24:17 -04:00
Ido Schimmel
d6b7c13b01 mlxsw: spectrum: Set port's shared buffer size to 0
In addition to the priority group (PG) buffers in the headroom, the
device enables the allocation of headroom shared buffer, which can
be shared between different PGs.

However, we are not going to use the headroom shared buffer and instead
allow the user to use its size for PGs or the switch's shared buffer.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 17:24:17 -04:00
Ido Schimmel
7ad7cd6113 mlxsw: reg: Use correct PBMC register length
The last field of the PBMC register is at offset 0x64 and its size is
0x8, so the correct register's length is 0x6C bytes.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 17:24:17 -04:00
Ido Schimmel
ff6551ec0c mlxsw: spectrum: Correctly configure headroom size
When packets ingress the switch they are assigned a switch priority and
directed to the corresponding priority group (PG) buffer in the port's
headroom buffer.

Since we now map all switch priorities to priority group 0 (PG0) by
default, there is no need to allocate the other priority groups during
initialization. The only exception is PG9, which is used for control
traffic.

At minimum, the PG should be able to store the currently classified
packet (pipeline latency isn't 0) and also the packets arriving during
the classification time. However, an incoming packet will not be
buffered if there is no available MTU-sized buffer space for storing it.

The buffer needed to accommodate for pipeline latency is variable and
needs to take into account both the current link speed and current
latency of the pipeline, which is time-dependent. Testing showed that
setting the PG's size to twice the current MTU is optimal.

Since PG9 is used strictly for control packets and not subject to flow
control, we are not going to resize it according to user configuration,
so we simply set it according to worst case scenario, which is twice the
maximum MTU.

In any case, later patches in the series will allow a user to direct
lossless flows to other PGs than PG0 and set their size to accommodate
for round-trip propagation delay.

The above change also requires us to resize the PG buffer whenever the
port's MTU is changed.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 17:24:17 -04:00
Ido Schimmel
1a1984490f mlxsw: spectrum: Add bytes to cells helper
Buffers in the switch store packets in units called buffer cells. Add a
helper to convert from bytes to cells, so that the actual number of
cells required (result is round up) is returned.

Also, drop the SB (shared buffer) acronym from the BYTES_PER_CELL macro,
as this unit is also used in the ports' buffers and not only the
switch's shared buffer.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 17:24:16 -04:00
Ido Schimmel
dd6cb0f9fd mlxsw: spectrum: Map all switch priorities to priority group 0
During transmission, the skb's priority is used to map the skb to a
traffic class, where the idea is to group priorities with similar
characteristics (e.g. lossy, lossless) to the same traffic class. By
default, all priorities are mapped to traffic class 0.

In the device, we model the skb's priority as the switch priority, which
is assigned to a packet according to its PCP value and ingress port
(untagged packets are assigned the port's default switch priority - 0).

At ingress, the packet is directed to a priority group (PG) buffer in
the port's headroom buffer according to the packet's switch priority and
switch priority to buffer mapping.

While it's possible to configure the egress mapping between skb's
priority (switch priority) and traffic class, there is no mechanism to
configure the ingress mapping to a PG.

In order to keep things simple and since grouping certain priorities into
a traffic class at egress also implies they should be grouped the same
at ingress, treat a PG as the ingress counterpart of an egress traffic
class.

Having established the above, during initialization map all the switch
priorities to PG0 in accordance with the Linux defaults for traffic
class mapping.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 17:24:16 -04:00
Ido Schimmel
b98ff151b6 mlxsw: reg: Add Port Prio To Buffer register
When packets ingress the switch they are assigned a switch priority
number that dictates the packet's priority group (PG) buffer in the
port's headroom buffer.

Add the Port Prio To Buffer (PPTB) register, which configures the switch
priority to PG mapping.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 17:24:16 -04:00
Ido Schimmel
2bf9a58675 mlxsw: spectrum: Add support for physical port names
Export to userspace the front panel name of the port, so that udev can
rename the ports accordingly. The convention suggested by switchdev
documentation is used:

1) Non-split: pX
2) Split: pXsY

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-05 15:07:54 -04:00
Ido Schimmel
b555cf4a50 mlxsw: spectrum: Reduce number of supported 802.1D bridges
Resources allocated for these bridges at init time cannot be later used
for other purposes. While current number is supported by the device,
it's mostly theoretical with regards to any real use case, which leads
to poor utilization of device's resources. Solve that by reducing the
number.

The long term plan is to make this value (along with others) user
configurable via devlink and write it to NVRAM, so that it can be used
during the next init. Until then we must hardcode such values.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-05 15:07:54 -04:00
Jiri Pirko
233fa44bd6 mlxsw: pci: Implement reset done check
Firmware now tells us that the reset is done by passing a magic value
via register. Use it to shorten the wait in case this is supported.
With old firmware, we still wait until the timeout is reached.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-13 22:30:01 -04:00
Ido Schimmel
869f63a4d2 mlxsw: spectrum: Check requested ageing time is valid
Commit c62987bbd8 ("bridge: push bridge setting ageing_time down to
switchdev") added a check for minimum and maximum ageing time, but this
breaks existing behaviour where one can set ageing time to 0 for a
non-learning bridge.

Push this check down to the driver and allow the check in the bridge
layer to be removed. Currently ageing time 0 is refused by the driver,
but we can later add support for this functionality.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-11 14:47:58 -05:00
David S. Miller
810813c47a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Several cases of overlapping changes, as well as one instance
(vxlan) of a bug fix in 'net' overlapping with code movement
in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08 12:34:12 -05:00
Ido Schimmel
5091730d77 mlxsw: pci: Correctly determine if descriptor queue is full
The descriptor queues for sending (SDQs) and receiving (RDQs) packets
are managed by two counters - producer and consumer - which are both
16-bit in size. A queue is considered full when the difference between
the two equals the queue's maximum number of descriptors.

However, if the producer counter overflows, then it's possible for the
full queue check to fail, as it doesn't take the overflow into account.
In such a case, descriptors already passed to the device - but for which
a completion has yet to be posted - will be overwritten, thereby causing
undefined behavior. The above can be achieved under heavy load (~30
netperf instances).

Fix that by casting the subtraction result to u16, preventing it from
being treated as a signed integer.

Fixes: eda6500a98 ("mlxsw: Add PCI bus implementation")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-07 11:39:15 -05:00
Ido Schimmel
912b1c89c5 mlxsw: spectrum: Always decrement bridge's ref count
Since we only support one VLAN filtering bridge we need to associate a
reference count with it, so that when the last port netdev leaves it, we
would know that a different bridge can be offloaded to hardware.

When a LAG device is memeber in a bridge and port netdevs are leaving
the LAG, we should always decrement the bridge's reference count, as it's
incremented for any port in the LAG.

Fixes: 4dc236c317 ("mlxsw: spectrum: Handle port leaving LAG while bridged")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-07 11:39:15 -05:00
Arnd Bergmann
3d1cbe839a net: mellanox: add DEVLINK dependencies
The new NET_DEVLINK infrastructure can be a loadable module, but the drivers
using it might be built-in, which causes link errors like:

drivers/net/built-in.o: In function `mlx4_load_one':
:(.text+0x2fbfda): undefined reference to `devlink_port_register'
:(.text+0x2fc084): undefined reference to `devlink_port_unregister'
drivers/net/built-in.o: In function `mlxsw_sx_port_remove':
:(.text+0x33a03a): undefined reference to `devlink_port_type_clear'
:(.text+0x33a04e): undefined reference to `devlink_port_unregister'

There are multiple ways to avoid this:

a) add 'depends on NET_DEVLINK || !NET_DEVLINK' dependencies
   for each user
b) use 'select NET_DEVLINK' from each driver that uses it
   and hide the symbol in Kconfig.
c) make NET_DEVLINK a 'bool' option so we don't have to
   list it as a dependency, and rely on the APIs to be
   stubbed out when it is disabled
d) use IS_REACHABLE() rather than IS_ENABLED() to check for
   NET_DEVLINK in include/net/devlink.h

This implements a variation of approach a) by adding an
intermediate symbol that drivers can depend on, and changes
the three drivers using it.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 09d4d087cd ("mlx4: Implement devlink interface")
Fixes: c4745500e9 ("mlxsw: Implement devlink interface")
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-03 17:08:59 -05:00
Ido Schimmel
18f1e70c41 mlxsw: spectrum: Introduce port splitting
Allow a user to split or unsplit a port using the newly introduced
devlink ops.

Once split, the original netdev is destroyed and 2 or 4 others are
created, according to user configuration. The new ports are like any
other port, with the sole difference of supporting a lower maximum
speed. When unsplit, the reverse process takes place.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:07:31 -05:00
Ido Schimmel
a133318cde mlxsw: spectrum: Mark unused ports using NULL
When splitting and unsplitting we'll destroy usable ports on the fly, so
mark them using a NULL pointer to indicate that their local port number
is free and can be re-used.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:07:30 -05:00
Ido Schimmel
558c2d5e52 mlxsw: spectrum: Store local port to module mapping during init
The port netdevs are each associated with a different local port number
in the device. These local ports are grouped into groups of 4 (e.g.
(1-4), (5-8)) called clusters. The cluster constitutes the one of two
possible modules they can be mapped to. This mapping is board-specific
and done by the device's firmware during init.

When splitting a port by 4, the device requires us to first unmap all
the ports in the cluster and then map each to a single lane in the module
associated with the port netdev used as the handle for the operation.
This means that two port netdevs will disappear, as only 100Gb/s (4
lanes) ports can be split and we are guaranteed to have two of these
((1, 3), (5, 7) etc.) in a cluster.

When unsplit occurs we need to reinstantiate the two original 100Gb/s
ports and map each to its origianl module. Therefore, during driver init
store the initial local port to module mapping, so it can be used later
during unsplitting.

Note that a by 2 split doesn't require us to store the mapping, as we
only need to reinstantiate one port whose module is known.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:07:30 -05:00
Ido Schimmel
3e9b27b8fc mlxsw: spectrum: Unmap local port from module during teardown
When splitting a port we replace it with 2 or 4 other ports. To be able
to do that we need to remove the original port netdev and unmap it from
its module. However, we first mark it as disabled, as active ports
cannot be unmapped.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:07:30 -05:00
Jiri Pirko
284ef80357 mlxsw: core: Add devlink port splitter callbacks
Add middle layer in mlxsw core code to forward port split/unsplit calls
into specific ASIC drivers.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:07:30 -05:00
Jiri Pirko
c4745500e9 mlxsw: Implement devlink interface
Implement newly introduced devlink interface. Add devlink port instances
for every port and set the port types accordingly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:07:30 -05:00
Ido Schimmel
28a01d2d7d mlxsw: spectrum: Allow for PVID deletion
When PVID is toggled off on a port member in a VLAN filtering bridge or
the PVID VLAN is deleted, make the port drop untagged packets. Reverse
the operation when PVID is toggled back on.

Set the PVID back to the default (1), when leaving the bridge so that
untagged traffic will be directed to the CPU.

Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 10:44:26 -05:00
Ido Schimmel
148f472da5 mlxsw: reg: Add the Switch Port Acceptable Frame Types register
When VLAN filtering is enabled on a bridge and PVID is deleted from a
bridge port, then untagged frames are not allowed to ingress into the
bridge from this port.

Add the Switch Port Acceptable Frame Types (SPAFT) register, which
configures the frame admittance of the port.

Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 10:44:26 -05:00
Ido Schimmel
6a9863a622 mlxsw: spectrum: Set STP state when leaving 802.1D bridge
When a VLAN device leaves a bridge its STP state is set to DISABLED,
which causes the hardware to discard any packets coming through the port
with this VLAN.

Fix that by setting STP state to FORWARDING when the device leaves its
bridge and allow traffic to be directed to CPU.

Fixes: 26f0e7fb15 ("mlxsw: spectrum: Add support for VLAN devices bridging")
Reported-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 15:52:58 -05:00
Ido Schimmel
1e5ad30c64 mlxsw: Treat local port 64 as valid
MLXSW_PORT_MAX_PORTS represents the maximum number of local ports, which
is 65 for both ASICs (SwitchX-2 and Spectrum) supported by this driver.

Fixes: 93c1edb27f ("mlxsw: Introduce Mellanox switch driver core")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 15:52:58 -05:00
Ido Schimmel
4f2c6ae5c6 switchdev: Require RTNL mutex to be held when sending FDB notifications
When switchdev drivers process FDB notifications from the underlying
device they resolve the netdev to which the entry points to and notify
the bridge using the switchdev notifier.

However, since the RTNL mutex is not held there is nothing preventing
the netdev from disappearing in the middle, which will cause
br_switchdev_event() to dereference a non-existing netdev.

Make switchdev drivers hold the lock at the beginning of the
notification processing session and release it once it ends, after
notifying the bridge.

Also, remove switchdev_mutex and fdb_lock, as they are no longer needed
when RTNL mutex is held.

Fixes: 03bf0c2812 ("switchdev: introduce switchdev notifier")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-28 16:21:31 -08:00
Ido Schimmel
bbeeda27ab mlxsw: reg: Use correct offset in field definiton
The rx_lane, tx_lane and module fields in the PMLP register don't have
an additional offset besides the base one (0x04), so set it to 0x00.

Fixes: 4ec14b7634 ("mlxsw: Add interface to access registers and process events")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-28 15:55:32 -08:00
Ido Schimmel
3f47f86781 mlxsw: spectrum: Compare local ports instead of pointers
When dumping the FDB we can't compare the actual pointers of the ports
structs, as it's possible the struct represents a vPort instead of the
underlying physical port.

Solve this by comparing the local port number instead, as it's shared
between the physical ports and all the vPorts on top of him.

Fixes: 54a732018d ("mlxsw: spectrum: Adjust switchdev ops for VLAN devices")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-28 15:55:32 -08:00
Ido Schimmel
304f51584f mlxsw: spectrum: Dump LAG FDB records only once
LAG FDB records can only point to LAG devices or VLAN devices configured
on top of them. Therefore, when dumping the FDB we shouldn't associate
these records with the underlying physical ports.

Fixes: 8a1ab5d766 ("mlxsw: spectrum: Implement FDB add/remove/dump for LAG")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-28 15:55:31 -08:00
Ido Schimmel
e43aca22ed mlxsw: spectrum: Use correct netdev when notifying bridge
LAG FDB entries pointing to VLAN devices should be reported to the
bridge with the matching VLAN device and not the underlying LAG device.

Fixes: aac78a4408 ("mlxsw: spectrum: Adjust FDB notifications for VLAN devices")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-28 15:55:31 -08:00
Ido Schimmel
004f85ea82 mlxsw: spectrum: Don't report VLAN for 802.1D FDB entries
When dumping the hardware FDB we should report entries pointing to VLAN
devices with VLAN 0, as packets coming into the bridge are untagged.
Likewise, pass FDB_{ADD,DEL} notifications with VLAN 0 for these
devices.

Fixes: 54a732018d ("mlxsw: spectrum: Adjust switchdev ops for VLAN devices")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-28 15:55:31 -08:00
Ido Schimmel
45827d780a mlxsw: spectrum: Notify bridge's FDB only based on learning_sync
When we disable learning on bridge port we should still update the
software bridge's FDB when entry pointing to this bridge port is
aged-out. We can otherwise have an inconsistency between software and
hardware tables.

Fixes: 8a1ab5d766 ("mlxsw: spectrum: Implement FDB add/remove/dump for LAG")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-28 15:55:31 -08:00
Ido Schimmel
454911333b mlxsw: spectrum: Disable learning according to STP state
When port is put into LISTENING state it shouldn't populate the FDB, so
set the port's STP state in hardware to DISCARDING instead of LEARNING.
It will therefore keep listening to BPDU packets, but discard other
non-control packets and won't perform any learning.

Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-28 15:55:31 -08:00
Ido Schimmel
9cb026ebb8 mlxsw: spectrum: Don't forward packets when STP state is DISABLED
When STP state is set to DISABLED the port is assumed to be inactive, but
currently we forward packets ingressing through it.

Instead, set the port's STP state in hardware to DISCARDING, which means
it doesn't forward packets or perform any learning, but it does trap
control packets. However, these packets will be dropped by bridge code,
which results in the expected behavior.

Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-28 15:55:30 -08:00
Ido Schimmel
039c49a6e6 mlxsw: spectrum: Flush FDB when leaving bridge
As explained in previous commit, we should always take care of flushing
the FDB in the driver and not rely on bridge code.

We need to distinguish between two cases with regards to LAG:

1) Port is leaving LAG while LAG is bridged (or VLAN devices on top of
it). In this case don't flush the FDB entries pointing to the LAG ID, as
this will affect other ports still member in the LAG. Only flush the FDB
when the last port in the LAG is leaving the bridge.

2) LAG device is leaving the bridge. In this case the CHANGEUPPER event
is simply propagated to each member port, so make each port flush the
FDB in its turn.

Note that emptying a bridged LAG from ports creates an inconsistency
between hardware and software. A user who later (< ageing_time)
re-populates the LAG won't have any FDB entries pointing to the LAG ID
in hardware, but they will be present in the software bridge's FDB.
Currently there is no good solution to this problem, but this will be
addressed by us in the future.

In order to optimize the flushing process, flush by port or LAG ID if
there are no VLAN interfaces on top of the port. Otherwise, flush using
(Port / LAG ID, FID=VID} for each of the lower 4K FIDs. In the case of
VLAN device simply flush using {Port / LAG ID, vFID} with the vFID to
which the VLAN device is mapped to.

Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-28 15:55:30 -08:00
Ido Schimmel
4193327135 mlxsw: reg: Add the Switch Filtering DB Flush register
When removing a net device from a bridge we should flush the FDB entries
associated with this net device. Up until now, we relied upon bridge
code to do that for us, but it is possible for user to prevent hardware
from syncing with the software bridge (learning_sync=0), so we need to
flush overselves.

Add the Switch Filtering DB Flush (SFDF) register that is used to flush
FDB entries according to different parameters (per-port, per-FID etc).

Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-28 15:55:30 -08:00
Ido Schimmel
4dc236c317 mlxsw: spectrum: Handle port leaving LAG while bridged
It is possible for a user to remove a port from a LAG device, while the
LAG device or VLAN devices on top of it are bridged. In these cases,
bridge's teardown sequence is never issued, so we need to take care of
it ourselves.

When LAG's unlinking event is received by port netdev:

1) Traverse its vPorts list and make those member in a bridge leave it.
   They will be deleted later by LAG code.

2) Make the port netdev itself leave its bridge if member in one.

Fixes: 0d65fc1304 ("mlxsw: spectrum: Implement LAG port join/leave")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-28 15:55:30 -08:00
Dan Carpenter
00ae40e71d mlxsw: fix SWITCHDEV_OBJ_ID_PORT_MDB
There is a missing break statement so we always return -EOPNOTSUPP.

Fixes: 3a49b4fde2 ('mlxsw: Adding layer 2 multicast support')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-13 10:31:21 -05:00
David S. Miller
9d367eddf3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/bonding/bond_main.c
	drivers/net/ethernet/mellanox/mlxsw/spectrum.h
	drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c

The bond_main.c and mellanox switch conflicts were cases of
overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11 23:55:43 -05:00
Ido Schimmel
366ce60315 mlxsw: spectrum: Add FDB lock to prevent session interleaving
Dumping the FDB (invoked with a process context) or handling FDB
notifications (polled periodicly in delayed work) might each entail
multiple EMAD transcations due to the number of entries.

While we only allow one EMAD transaction at a time, there is nothing
stopping the dump and notification processing sessions from
interleaving. However, this is forbidden by the hardware, so we need to
make sure only one of these sessions can run at a time.

Solve this by adding a mutex ('fdb_lock'), as both kernel threads can
sleep while waiting for the response EMAD.

Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11 00:21:19 -05:00
Elad Raz
3a49b4fde2 mlxsw: Adding layer 2 multicast support
Add SWITCHDEV_OBJ_ID_PORT_MDB switchdev ops support. On first MDB insertion
creates a new multicast group (MID) and add members port to the MID. Also
add new MDB entry for the flooding-domain (fid-vid) and link the MDB entry
to the newly constructed MC group.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 16:50:21 -05:00
Elad Raz
e4b6f6931c mlxsw: Adding VID to FID translatation
Adding a generic function that translate VID to FID.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 16:50:21 -05:00
Elad Raz
53ae628316 mlxsw: Changing the maximum number of multicast group to a define
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 16:50:21 -05:00
Elad Raz
fabe548322 mlxsw: reg: Adding SMID register
Adding back SMID register definition and packing. For each MC group a new
SMID entry will be generated.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 16:50:21 -05:00
Elad Raz
5230b25f06 mlxsw: reg: Add definition of multicast record for SFD register
Multicast-related records have specific format in SFD register.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 16:50:21 -05:00
Jiri Pirko
12f1501e75 mlxsw: spectrum: remove FDB entry in case we get unknown object notification
It may happen that we get notification for FDB entry for object (port,
lag, vport), which does not exist. Currently we ignore that, which only
causes this being re-sent in next notification. The entry will never
disappear. So get rid of it by simply removing it using SFD register.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-09 21:06:33 -05:00
Jiri Pirko
2fa9d45e16 mlxsw: spectrum: pass local_port to mlxsw_sp_port_fdb_uc_op
Do not pass struct mlxsw_sp_port to mlxsw_sp_port_fdb_uc_op and rather
just pass local_port. This is needed in case this is called from SFN
process function and mlxsw_sp_port is not existent for particular
local_port.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-09 21:06:33 -05:00
Dan Carpenter
719255d0e2 mlxsw: core: remove an unnecessary condition
We checked "err" on the lines before so we know it's zero here.

These cause a static checker warning because checking known things can
indicate a bug.  Maybe there is a missing assignment or we are checking
the wrong variable.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06 15:07:47 -05:00
Elad Raz
fc1273afb2 mlxsw: Remember untagged VLANs
When a vlan is been configured, remeber the untagged mode of the vlan.
When displaying the list of configured VLANs, show the untagged attribute.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06 14:42:42 -05:00
Elad Raz
26a4ea0f45 mlxsw: Disable vlan_filtering for non .1D bridge
When a port is bridged, the bridge must be vlan aware bridge (.1Q)
or the bridging should be on top of VLAN interfaces (.1D bridge).

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06 14:42:41 -05:00
Elad Raz
e4a1305507 mlxsw: Renaming local variable names for consistency
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06 14:42:41 -05:00
Elad Raz
29edf44f85 mlxsw: Fixing vlans init range
Initialize VLANs 0..4095 (Remove init for VID 4096).

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06 14:42:41 -05:00
Ido Schimmel
f0138e2596 mlxsw: pci: Adjust value of CPU egress traffic class
During initialization, when creating the send descriptor queues (SDQs),
we specify the CPU egress traffic class of each SDQ. The maximum number
of classes of this type is different in the two ASICs supported by this
PCI driver.

New firmware versions check this value is set correctly, which causes
errors on the Spectrum ASIC, as its max exposed egress traffic class is
lower than 7.

Solve this by setting this field to 3, which is an acceptable value for
both ASICs.

Note that we currently do not expose the QoS capabilities of the ASICs,
so setting this to an hardcoded value is OK for now.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06 00:48:50 -05:00
Ido Schimmel
6c72a3d0d3 mlxsw: spectrum: Change bridge port attributes only when bridged
Bridge port attributes are offloaded to hardware when invoked with SELF
flag set, but it really makes no sense to reflect them when port is not
bridged.

Allow a user to change these attribute only when port is bridged and
initialize them correctly when joining or leaving a bridge.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 22:07:58 -05:00
Ido Schimmel
5a8f45258e mlxsw: spectrum: Set bridge status in appropriate functions
Set the bridge status of physical ports in the appropriate functions, to
be consistent with LAG join/leave and vPorts joining/leaving bridge.

Also, remove the error messages in these two functions, as we already
emit errors in both the single functions they call.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 22:07:58 -05:00
Ido Schimmel
78124078c4 mlxsw: spectrum: Return NOTIFY_BAD on bridge failure
It is possible for us to fail when joining or leaving a bridge, so let
the user know about that by returning NOTIFY_BAD, as already done for
LAG join/leave and 802.1D bridges.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 22:07:58 -05:00
Ido Schimmel
7b31abe70b mlxsw: spectrum: Initialize PVID only once
We set PVID to 1 in mlxsw_sp_port_vlan_init(), so we can remove this
statement.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 22:07:58 -05:00
Jiri Pirko
f4cee3af0d mlxsw: core: Use devm_kzalloc to allocate mlxsw_hwmon structure
KASan reported use-after-free for the hwmon structure. So fix this by
using devm_kzalloc and let the core take care about freeing the memory
during device dettach.

Reported-by: Ido Schimmel <idosch@mellanox.com>
Fixes: 89309da39 ("mlxsw: core: Implement temperature hwmon interface")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-22 16:25:09 -05:00
Jiri Pirko
e7bc73cbb5 mlxsw: core: Allow to reset temperature history via hwmon interface
Add another sysfs hwmon attribute to expose possibility to reset
temperature sensors history.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-22 16:00:04 -05:00
Ido Schimmel
272c447017 mlxsw: spectrum: Add support for VLAN devices on top of LAG
When creating a VLAN device on top of LAG, we are basically creating a
vPort on top of each of the port netdevs member in the LAG. Therefore,
these vPorts should inherit both the LAG status and LAG ID from the
underlying port netdevs.

In addition, when the VLAN device joins or leaves a bridge each of the
underlying vPorts should know about it and act accordingly. This is
achieved by propagating the VLAN event down to the lower devices.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-15 11:58:24 -05:00