Since the resource list needs to be used also for other entries different
to fwd_entry_ref, make the list generic. For that purpose, introduce a
resource structure with couple of helpers that the code which need to
store a per-block resource should use.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce extension of mlxsw_afa_ops in order to get/put counter indexes
and implement the ops for Spectrum.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For devices with known capabilities of Fibre media type we now report that
to ethtool.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The detailed reset sequence ensures all HW components are in aligned
state before NIC startup. It also supports cards with signed firmware (RBL)
and checks if their FW is valid.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This defines fw2x operations table and corresponding methods.
Some of the functions are being shared with 1.x firmware
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New AQC cards will have an updated firmware with new binary interface.
This patch extracts firmware specific operations into a separate table
and prepares for the introduction of new fw 2.x and 3.x
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The address to check if HW is not dead/hang could be stored in
capabilities, since it is a constant. Change its name to better reflect
the idea.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These ops are not related to HW and are now implemented in pci module.
Thus, remove these ops pointers and implementation.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver contained a dead code of maintaining multiple pci port instances.
That will never be used since for each pci function a separate NIC
instance is created.
Simplify this, making pci module only responsible for pci resource
management.
NIC initialization is also simplified accordingly.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This removes unnecessary structure copying, and prepares the driver for
separate firmware ops table introduction.
We also remove extra copy of capabilities structure (which is const actually)
and also replace it with a const pointer in aq_nic_cfg.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A number of new AQC devices is going to be released. To support more
flexible capabilities management a number of static caps instances is now
declared. Devices now are mainly differs by supported speeds, but in future
more parameters will be customized. A set of AQC100 devices have
fibre media, not twisted pair - this is also reflected in
new capabilities definitions.
HW level also now directly exports hw_ops for each of A0/B0 hardware.
PCI configuration now uses a device configuration table where each
device ID is explicitly mapped with hardware OPs and capabilities
structures.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New set of aquantia devices has an upgraded hardware (B1).
The hardware interface is identical to B0. The difference will
be in firmware which is incompatible with old one.
Reorganized and removed duplicate speed and devid definitions
Introduced explicit flow control configuration defines
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Or Gerlitz <ogerlitz@mellanox.com>
=======
First six patches of this series further enhances the mlx5 hairpin support.
The first two patches deal with using different hairpin instances
for flows whose packets have different priorities to align with the port
TX QoS model. The next four patches allow us to do HW spreading
of flows over a set of hairpin pairs using RSS. The last two patches
change the driver to also set the size of the HW hairpin queues.
========
Next four patches from Eran Ben Elisha <eranbe@mellanox.com>:
Add more debug data for TX timeout handling, and further enhance and optimize
TX timeout handling upon lost interrupts, which adds a mechanism for explicitly
polling EQ in case of a TX timeout in order to recover from a lost interrupt.
If this is not the case (no pending EQEs), perform a channels full recovery as
usual.
From Kamal Heib <kamalh@mellanox.com>, Two patches to extend the stats group API
to have an update_stats() callback which will be used to fetch the hardware or
software counters data, this will improve the current API and reduce code
duplication.
From Gal Pressman <galp@mellanox.com>, Last patch, Add likely to the common RX checksum
flow.
Thanks,
Saeed.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJaYmJ7AAoJEEg/ir3gV/o+SBIH/2VUS1RPNQfgv1pU2WI78Me3
2Iy8zA2fyx5Ko28Kzu/QljlIUs5/4K0rIjDoT5NpmkLf22lWB3QSqyCkqducdl6J
6kAwZ5kPfA8r1jnhlQfhy5VLhaoNhcHeMafXwP9jSy3BvCYRWTQOsp8fN6fBcSX5
s75DcmqE5ljm5b2Y9pIVkYdzz3usQ9/kq+5MPcZodw3XISuXgv1UPlkon4DdVsXQ
7zf3frQnctfTAepkflWMm8BGUO15a4uYfhrMUyvOFHihtIWA9ILEpUC7qdn3uQ6X
phNA6BRoo9bdo7ZSl8+ZsCi+K4LtyevQTHs7Hn/OMSMvxvDLWZswldu2mbg9xVQ=
=YP/D
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2018-01-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2018-01-19
From: Or Gerlitz <ogerlitz@mellanox.com>
=======
First six patches of this series further enhances the mlx5 hairpin support.
The first two patches deal with using different hairpin instances
for flows whose packets have different priorities to align with the port
TX QoS model. The next four patches allow us to do HW spreading
of flows over a set of hairpin pairs using RSS. The last two patches
change the driver to also set the size of the HW hairpin queues.
========
Next four patches from Eran Ben Elisha <eranbe@mellanox.com>:
Add more debug data for TX timeout handling, and further enhance and optimize
TX timeout handling upon lost interrupts, which adds a mechanism for explicitly
polling EQ in case of a TX timeout in order to recover from a lost interrupt.
If this is not the case (no pending EQEs), perform a channels full recovery as
usual.
From Kamal Heib <kamalh@mellanox.com>, Two patches to extend the stats group API
to have an update_stats() callback which will be used to fetch the hardware or
software counters data, this will improve the current API and reduce code
duplication.
From Gal Pressman <galp@mellanox.com>, Last patch, Add likely to the common RX checksum
flow.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously it was possible to interrupt processing stats updates because
they were handled in a work queue. Interrupting the stats updates could
lead to a situation where we backup the control message queue. This patch
moves the stats update processing out of the work queue to be processed as
soon as hardware sends a request.
Reported-by: Louis Peens <louis.peens@netronome.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Helmut reported a bug about division by zero while
running traffic and doing physical cable pull test.
When the cable unplugged the ppms become zero, so when
dividing the current ppms by the previous ppms in the
next dim iteration there is division by zero.
This patch prevent this division for both ppms and epms.
Fixes: c3164d2fc4 ("net/mlx5e: Added BW check for DIM decision mechanism")
Reported-by: Helmut Grauer <helmut.grauer@de.ibm.com>
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The zeroday builder notices that since Usermode Linux does not
have IO memory, the build fails for them when selecting everything
it can enable.
As the driver is clearly using memory-mapped registers to access
the network adapter, we add depends on HAS_IOMEM to solve this
problem.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2018-01-19
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) bpf array map HW offload, from Jakub.
2) support for bpf_get_next_key() for LPM map, from Yonghong.
3) test_verifier now runs loaded programs, from Alexei.
4) xdp cpumap monitoring, from Jesper.
5) variety of tests, cleanups and small x64 JIT optimization, from Daniel.
6) user space can now retrieve HW JITed program, from Jiong.
Note there is a minor conflict between Russell's arm32 JIT fixes
and removal of bpf_jit_enable variable by Daniel which should
be resolved by keeping Russell's comment and removing that variable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The BPF verifier conflict was some minor contextual issue.
The TUN conflict was less trivial. Cong Wang fixed a memory leak of
tfile->tx_array in 'net'. This is an skb_array. But meanwhile in
net-next tun changed tfile->tx_arry into tfile->tx_ring which is a
ptr_ring.
Signed-off-by: David S. Miller <davem@davemloft.net>
During initialization the driver checks whether the flashed FW image
suits its requirements by checking that it's sufficiently new.
However, there's only a weak backward compatibility scheme that is
actually guaranteed by the FW, so driver must also upper bound the
version to prevent compatibility issues between current driver and some
possible future fw.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
BPF firmware currently exposes IRQ moderation capability.
The driver will make use of it by default, inserting 50 usec
delay to every control message exchange. This cuts the number
of messages per second we can exchange by almost half.
None of the other capabilities make much sense for BPF control
vNIC, either. Disable them all.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most vNIC capabilities are netdev related. It makes no sense
to initialize them and waste FW resources. Some are even
counter-productive, like IRQ moderation, which will slow
down exchange of control messages.
Add to nfp_app a mask of enabled control vNIC capabilities
for apps to use. Make flower and BPF enable all capabilities
for now. No functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nfp_net_init() is a little long and we are about to add more
code to reading capabilties. Move the capability reading,
parsing and validating out. Only actual initialization
will stay in nfp_net_init().
No functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow specifying alternative vNIC mailbox location in TLV caps.
This way we can size the mailbox to the needs and not necessarily
waste 512B of ctrl memory space.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PCIe island clock frequency is used when converting coalescing
parameters from usecs to NFP timestamps. Most chips don't run
at 1200MHz, allow FW to provide us with the real frequency.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NFP is entirely programmable, including the PCI data interface.
Using a fixed control BAR layout certainly makes implementations
easier, but require careful considerations when space is allocated.
Once BAR area is allocated to one feature nothing else can use it.
Allocating space statically also requires it to be sized upfront,
which leads to either unnecessary limitation or wastage.
We currently have a 32bit capability word defined which tells drivers
which application FW features are supported. Most of the bits
are exhausted. The same bits are also reused for enabling specific
features. Bulk of capabilities don't have a need for an enable bit,
however, leading to confusion and wastage.
TLVs seems like a better fit for expressing capabilities of applications
running on programmable hardware.
This patch leaves the front of the BAR as is, and declares a TLV
capability start at offset 0x58. Most of the space up to 0x0d90
is already allocated, but the used space can be wrapped with RESERVED
TLVs. E.g.:
Address Type Length
0x0058 RESERVED 0xe00 /* Wrap basic structures */
0x0e5c FEATURE_A 0x004
0x0e64 FEATURE_B 0x004
0x0e6c RESERVED 0x990 /* Wrap qeueue stats */
0x1800 FEATURE_C 0x100
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When driver app matching loaded FW is not found users are faced with:
nfp: failed to find app with ID 0x%02x
This message does not properly explain that matching driver code is
either not built into the driver or the driver is too old.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Representors are grouped in sets by type. Currently the whole
sets are under RCU protection, but individual representor pointers
are not. This causes some inconveniences when representors have
to be destroyed, because we have to allocate new sets to remove
any representors. Protect the individual pointers with RCU.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The write side of repr tables is always done under pf->lock.
Add a helper to dereference repr table pointers under protection
of that lock.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Devlink used to have two global locks: devlink lock and port lock,
our lock ordering looked like this:
devlink lock -> driver's pf->lock -> devlink port lock
After recent changes port lock was replaced with per-instance
lock. Unfortunately, new per-instance lock is taken on most
operations now. This means we can only grab the pf->lock from
the port split/unsplit ops. Lock ordering looks like this:
devlink lock -> driver's pf->lock -> devlink instance lock
Since we can't take pf->lock from most devlink ops, make sure
nfp_apps are prepared to service them as soon as devlink is
registered. Locking the pf must be pushed down after
nfp_app_init() callback.
The init order looks like this:
nfp_app_init
devlink_register
nfp_app_start
netdev/port_register
As soon as app_init is done nfp_apps must be ready to service
devlink-related callbacks. apps can only register their own
devlink objects from nfp_app_start.
Fixes: 2406e7e546 ("devlink: Add per devlink instance lock")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NFP app is currently shut down as soon as all the vNICs are gone.
This means we can't depend on the app existing throughout the
lifetime of the device. Free the app only from PCI remove path.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the helpers for accessing 4 or 8 byte values over
the CPP bus return the length of IO on success. If the IO
was short caller has to deal with error handling. The short
IO for 4/8B values is completely impractical. Make the
helpers return an error if full access was not possible.
Fix the few places which are actually dealing with errors
correctly, most call sites already only deal with negative
return codes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most of the packets return true for is_last_ethertype_ip, surround it
with likely compiler hint.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Extend the stats group API to have an update_stats() callback which
will be used to fetch the hardware or software counters data.
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Merge the per priority traffic and pfc groups into one group, because
both groups share the same update_stats() callback which will be
introduced in the upcoming patch.
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add per-channel counter ch#_eq_rearm to monitor how many lost interrupt
recovery actions happened upon TX timeouts.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Up until this patch, on every TX timeout we would try to do channels
recovery. However, in case of a lost interrupt for an EQ, the channel
associated to it cannot be recovered if reopened as it would never get
another interrupt on sent/received traffic, and eventually ends up with
another TX timeout (Restarting the EQ is not part of channel recovery).
This patch adds a mechanism for explicitly polling EQ in case of a TX
timeout in order to recover from a lost interrupt. If this is not the
case (no pending EQEs), perform a channels full recovery as usual.
Once a lost EQE is recovered, it triggers the NAPI to run and handle all
pending completions. This will free some budget in the bql (via calling
netdev_tx_completed_queue) or by clearing pending TXWQEs and waking up
the queue. One of the above actions will move the queue to be ready for
transmit again.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When TX timeout occurs, EQ consumer index and irqn can help in debug for
understanding the SW state of EQ. Add them to the logger prints for the
relevant EQ only.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When driver callback for TX timeout is being called, it handles all
stopped xmit queues (not only the ones which their timeout expired).
Add usecs since last transmit to TX timeout logs per send queue in order
to monitor if the queue timeout expired.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
For a given hairpin packet buffer size, different queue sizes
(values of log_hairpin_num_packets) determine how the data is broken
to strides on the RQ. Currently the chosen value is set to 64B strides.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Allow to specify the size of the hairpin queues along with the
packet buffer data size from the core setup code.
If the driver doesn't provide this, the FW applies proper value that
matches the provided data size and a FW chosen RQ stride size.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Support RSS for hairpin traffic. We create multiple hairpin RQ/SQ pairs
and RSS TTC table per hairpin instance and steer the related flows
through that table so they are spread between the pairs.
We open one pair per 50Gbs link speed, for all speeds <= 50Gbs, there
is one pair and no RSS while for 100Gbs ports two RSSed pairs.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Enhance the hairpin setup code at the core to support a set of N
(RQ,SQ) pairs. This will be later used by the caller to set RSS
spreading among the different RQs.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This will allow to be able and set TC rule whose steering dest is
RSS TTC steering table.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In order to use RSS for hairpin, we refactor the code that deals with
setup of the TTC steering tables. This is done using an interim ttc
params object that has the flow table attributes, TIR numbers, etc.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
As part of the QoS model, on xmit, the HW mandates that all packets going
through a given SQ have the same priority. To align hairpin SQs with that,
we use the priority given as part of the matching for the hairpin hash key.
This ensures that flows/packets mapped to different HW priorities will
go through different hairpin instances. If no priority is given for
matching, we treat that as an 8th priority, this is in order not to
harm cases where priority is specified.
Only the PCP priority trust model is supported.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The peer vhca id spans less bits vs the ifindex and can
well serve for the hairpin hash key, move to use that.
This is a pre-step to put more info into the hairpin hash
key in downstream patch while keeping it at 32 bits.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
on T6, IPv6 filter would occupy 2 tids instead of 4.
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use zlib deflate to compress firmware dump. Collect and compress
as much firmware dump as possible into a 32 MB buffer.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update firmware dump collection logic to use compression when available.
Let collection logic attempt to do compression, instead of returning out
of memory early.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following sparse warning:
drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c:289:5: warning:
symbol 'mlxsw_sp_kvdl_part_occ' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable miistat is not used. So it is removed.
CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Packet descriptor generation for IPv6 is broken.
Properly set L3 and L4 protocol flags for IPv6 descriptors.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set some missing fields in the IP control offload buffer. This buffer is
used to enable checksum and TCP segmentation offload in the VNIC server.
The buffer length field and the checksum offloading bits were not set
properly, so fix that here.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a new LPM tree is created, we try to replace the trees in the
existing virtual routers with it. If we fail, the tree needs to be
freed.
Currently, this does not happen in the unlikely case where we fail to
bind the tree to the first virtual router, since its reference count
never transitions from 1 to 0.
Fix that by taking a reference before binding the tree.
Fixes: fc922bb0dd ("mlxsw: spectrum_router: Use one LPM tree for all virtual routers")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Scheduling out and in for every FW message can slow us down
unnecessarily. Our experiments show that even under heavy load
the FW responds to 99.9% messages within 200 us. Add a short
busy wait before entering the wait queue.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The special handling of different map types is left to the driver.
Allow offload of array maps by simply adding it to accepted types.
For nfp we have to make sure array elements are not deleted.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The trailing semicolon is an empty statement that does no operation.
Removing it since it doesn't do anything.
Signed-off-by: Luis de Bethencourt <luisbg@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A cleanup of the PM code left an incorrect #ifdef in place, leading
to a harmless build warning:
drivers/net/ethernet/intel/fm10k/fm10k_pci.c:2502:12: error: 'fm10k_suspend' defined but not used [-Werror=unused-function]
drivers/net/ethernet/intel/fm10k/fm10k_pci.c:2475:12: error: 'fm10k_resume' defined but not used [-Werror=unused-function]
It's easier to use __maybe_unused attributes here, since you
can't pick the wrong one.
Fixes: 8249c47c6b ("fm10k: use generic PM hooks instead of legacy PCIe power hooks")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch set those new jit info fields introduced in this patch set.
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
In the receive queue for 4096 bytes fragments, the page address
set in the SW data0 field of the descriptor is not the one we got
when doing the reassembly in receive. The page structure was retrieved
from the wrong descriptor into SW data0 which is then causing a
page fault when UDP checksum is accessing data above 1500.
Signed-off-by: Rex Chang <rchang@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
restructure the code which adds support for configuring
PCIe VF via mgmt netdevice. which was added by
commit 7829451c69 ("cxgb4: Add control net_device for
configuring PCIe VF")
Original work by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to convert from mlxsw_sp_port to net_device and back again.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Benefit from the prepared TC and in-driver ACL infrastructure and
introduce block sharing offload. For that, a new struct "block" is
introduced in spectrum_acl in order to hold a list of specific
block-port bindings.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead, pass netdev and ingress flag to ruleset unbind op.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to prepare for follow-up changes, make the bind/unbind helpers
very simple. That required move of ht insertion/removal and bind/unbind
calls into mlxsw_sp_acl_ruleset_create/destroy.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the driver exports different switchdev PARENT_IDs for
representors belonging to different SR-IOV PF-pools of an adapter.
This is not correct as the adapter can switch across all vports
of an adapter. This patch fixes this by exporting a common switchdev
PARENT_ID for all reps of an adapter. The PCIE DSN is used as the id.
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The chip supports 64-byte and 128-byte cache line size for more optimal
DMA performance when matched to the CPU cache line size. The default is 64.
If the system is using 128-byte cache line size, set it to 128.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Forward hwrm_func_vf_cfg command from VF to PF driver, to store
VF MAC address in PF's context. This will allow "ip link show"
to display all VF MAC addresses.
Maintain 2 locations of MAC address in VF info structure, one for
a PF assigned MAC and one for VF assigned MAC.
Display VF assigned MAC in "ip link show", only if PF assigned MAC is
not valid.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bnxt_check_rings() is called by ethtool, XDP setup, and ndo_setup_tc()
to see if there are enough resources to support the new configuration.
Expand the call to test all resources if the firmware supports the new
API. With the more flexible resource allocation scheme, this call must
be made to check that all resources are available before committing to
allocate the resources.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of the old method of evenly dividing the resources to the VFs,
use the new firmware API to specify min and max resources for each VF.
This way, there is more flexibility for each VF to allocate more or less
resources.
The min is the absolute minimum for each VF to function. The max is the
global resources minus the resources used by the PF. Each VF is
guaranteed the min. Up to max resources may be available for some VFs.
The PF driver can use one of 2 strategies specified in NVRAM to assign
the resources. The old legacy strategy of evenly dividing the resources
or the new flexible strategy.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In bnxt_rfs_capable(), add call to reserve vnic resources to support
NTUPLE. Return true if we can successfully reserve enough vnics.
Otherwise, reserve the minimum 1 VNIC for normal operations not
supporting NTUPLE and return false.
Also, suppress warning message about not enough resources for NTUPLE when
only 1 RX ring is in use. NTUPLE filters by definition require multiple
RX rings.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new method will call firmware to reserve the desired tx, rx, cmpl
rings, ring groups, stats context, and vnic resources. A second query
call will check the actual resources that firmware is able to reserve.
The driver will then trim and adjust based on the actual resources
provided by firmware. The driver will then reserve the final resources
in use.
This method is a more flexible way of using hardware resources. The
resources are not fixed and can by adjusted by firmware. The driver
adapts to the available resources that the firmware can reserve for
the driver.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In combined mode, the driver is currently not setting RX and TX ring
numbers the same when firmware can allocate more RX than TX or vice versa.
This will confuse the user as the ethtool convention assumes they are the
same in combined mode. Fix it by adding bnxt_trim_dflt_sh_rings() to trim
RX and TX ring numbers to be the same as the completion ring number in
combined mode.
Note that if TCs are enabled and/or XDP is enabled, the number of TX rings
will not be the same as RX rings in combined mode.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new API HWRM_FUNC_RESOURCE_QCAPS provides min and max hardware
resources. Use the new API when it is supported by firmware.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for new firmware APIs to allocate hardware resources,
add a new struct bnxt_hw_resc to hold various min, max and reserved
resources. This new structure is common for PFs and VFs.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After SRIOV has been enabled and disabled, the MSIX vectors assigned to
the VFs have to be re-initialized. Otherwise they cannot be re-used by
the PF. For example, increasing the number of PF rings after disabling
SRIOV may fail if the PF uses MSIX vectors previously assigned to the VFs.
To fix this, we add logic in bnxt_restore_pf_fw_resources() to close the
NIC, clear and re-init MSIX, and re-open the NIC.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new __bnxt_close_nic() function to do all the work previously done
in bnxt_close_nic() except waiting for SRIOV configuration. The new
function will be used in the next patch as part of SRIOV cleanup.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The version has new firmware APIs to allocate PF/VF resources more
flexibly.
New toolchains were used to generate this file, resulting in a one-time
large diffstat.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On Meson8b the only valid input clock is MPLL2. The bootloader
configures that to run at 500002394Hz which cannot be divided evenly
down to 125MHz using the m250_div clock. Currently the common clock
framework chooses a m250_div of 2 - with the internal fixed
"divide by 10" this results in a RGMII TX clock of 125001197Hz (120Hz
above the requested 125MHz).
Letting the common clock framework propagate the rate changes up to the
parent of m250_mux allows us to get the best possible clock rate. With
this patch the common clock framework calculates a rate of
very-close-to-250MHz (249999701Hz to be exact) for the MPLL2 clock
(which is the mux input). Dividing that by 2 (which is an internal,
fixed divider for the RGMII TX clock) gives us an RGMII TX clock of
124999850Hz (which is only 150Hz off the requested 125MHz, compared to
1197Hz based on the MPLL2 rate set by u-boot and the Amlogic GPL kernel
sources).
SoCs from the Meson GX series are not affected by this change because
the input clock is FCLK_DIV2 whose rate cannot be changed (which is fine
since it's running at 1GHz, so it's already a multiple of 250MHz and
125MHz).
Fixes: 566e825162 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
Suggested-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>
Tested-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Meson8b only supports MPLL2 as clock input. The rate of the MPLL2 clock
set by Odroid-C1's u-boot is close to (but not exactly) 500MHz. The
exact rate is 500002394Hz, which is calculated in
drivers/clk/meson/clk-mpll.c using the following formula:
DIV_ROUND_UP_ULL((u64)parent_rate * SDM_DEN, (SDM_DEN * n2) + sdm)
Odroid-C1's u-boot configures MPLL2 with the following values:
- SDM_DEN = 16384
- SDM = 1638
- N2 = 5
The 250MHz clock (m250_div) inside dwmac-meson8b driver is derived from
the MPLL2 clock. Due to MPLL2 running slightly faster than 500MHz the
common clock framework chooses a divider which is too big to generate
the 250MHz clock (a divider of 2 would be needed, but this is rounded up
to a divider of 3). This breaks the RTL8211F RGMII PHY on Odroid-C1
because it requires a (close to) 125MHz RGMII TX clock (on Gbit speeds,
the IP block internally divides that down to 25MHz on 100Mbit/s
connections and 2.5MHz on 10Mbit/s connections - we don't need any
special configuration for that).
Round the divider to the closest value to prevent this issue on Meson8b.
This means we'll now end up with a clock rate for the RGMII TX clock of
125001197Hz (= 125MHz plus 1197Hz), which is close-enough to 125MHz.
This has no effect on the Meson GX SoCs since there fclk_div2 is used as
input clock, which has a rate of 1000MHz (and thus is divisible cleanly
to 250MHz and 125MHz).
Fixes: 566e825162 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
Reported-by: Emiliano Ingrassia <ingrassia@epigenesys.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>
Tested-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tests (using an oscilloscope and an Odroid-C1 board with a RTL8211F
RGMII PHY) have shown that the PRG_ETH0 register behaves as follows:
- bit 4 is a mux to choose between two parent clocks. according to the
public S805 datasheet the only supported parent clock is MPLL2 (this
was not verified using the oscilloscope).
The public S805/S905 datasheet claims that this bit is reserved.
- bits 9:7 control a one-based divider (register value 1 means "divide
by 1", etc.) for the input clock. we call this clock the "m250_div"
clock because it's value is always supposed to be (close to) 250MHz
(see below for an explanation).
The description in the public S805/S905 datasheet is a bit cryptic,
but it comes down to "input clock = 250MHz * value" (which could also
be expressed as "250MHz = input clock / value")
- there seems to be an internal fixed divide-by-2 clock which takes the
output from the m250_div and divides it by 2. This is not unusual on
Amlogic SoCs, since the SDIO (MMC) driver also uses an internal fixed
divide-by-2 clock.
This is not documented in the public S805/S905 datasheet
- bit 10 controls a gate clock which enables or disables the RGMII TX
clock (which is an output on the MAC/SoC and an input in the PHY). we
call this the "rgmii_tx_en" clock. if this bit is set to "0" the RGMII
TX clock output is close to 0
The description for this bit in the public S805/S905 datasheet is
"Generate 25MHz clock for PHY". Based on these tests it's believed
that this is wrong, and should probably read "Generate the 125MHz
RGMII TX clock for the PHY"
- the RGMII TX clock has to be set to 125MHz - the IP block adjusts the
output (automatically) depending on the line speed (RGMII specifies
that Gbit connections use a 125MHz clock, 100Mbit/s connections use a
25MHz clock and 10Mbit/s connections use a 2.5MHz clock. only Gbit and
100Mbit/s were tested with an oscilloscope). Due to the requirement
that this clock always has to be set to 125MHz and due to the fixed
divide-by-2 parent clock this means that m250_div will always end up
with a rate of (close to) 250MHz.
- bits 6:5 are the TX delay, which is also named "clock phase" in some
of Amlogic's older GPL kernel sources.
The PHY also has an XTAL_IN pin where a 25MHz clock has to be provided.
Tests with the oscilloscope have shown that this is routed to a crystal
right next to the RTL8211F PHY. The same seems to be true on the Khadas
VIM2 (which uses a GXM SoC) board - however the 25MHz crystal is on the
other side of the PCB there.
This updates the clocks in the dwmac-meson8b driver by replacing the
"m25_div" with the "rgmii_tx_en" clock and additionally introducing a
fixed divide-by-2 clock between "m250_div" and "rgmii_tx_en".
Now we also need to set a frequency of 125MHz on the RGMII clock
(opposed to the 25MHz we set before, with that non-existing
divide-by-5-or-10 divider).
Special thanks go to Linus Lüssing for testing the various bits and
checking the results with an oscilloscope on his Odroid-C1!
Fixes: 566e825162 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
Reported-by: Emiliano Ingrassia <ingrassia@epigenesys.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Acked-by: Jerome Brunet <jbrunet@baylibre.com>
Tested-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neither the m25_div_clk nor the m250_div_clk or m250_mux_clk are used in
RMII mode. The m25_div_clk output is routed to the RGMII PHY's "RGMII
clock".
This means that we don't need to configure the clocks in RMII mode. The
driver however did this - with no effect since the clocks are not routed
to the PHY in RMII mode.
While here also rename meson8b_init_clk to meson8b_init_rgmii_tx_clk to
make it easier to understand the code.
Fixes: 566e825162 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Tested-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 0dfb33a0d7 ("sch_red: report backlog information") copied
child's backlog into RED's backlog. Back then RED did not maintain
its own backlog counts. This has changed after commit 2ccccf5fb4
("net_sched: update hierarchical backlog too") and commit d7f4f332f0
("sch_red: update backlog as well"). Copying is no longer necessary.
Tested:
$ tc -s qdisc show dev veth0
qdisc red 1: root refcnt 2 limit 400000b min 30000b max 30000b ecn
Sent 20942 bytes 221 pkt (dropped 0, overlimits 0 requeues 0)
backlog 1260b 14p requeues 14
marked 0 early 0 pdrop 0 other 0
qdisc tbf 2: parent 1: rate 1Kbit burst 15000b lat 3585.0s
Sent 20942 bytes 221 pkt (dropped 0, overlimits 138 requeues 0)
backlog 1260b 14p requeues 14
Recently RED offload was added. We need to make sure drivers don't
depend on resetting the stats. This means backlog should be treated
like any other statistic:
total_stat = new_hw_stat - prev_hw_stat;
Adjust mlxsw.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Nogah Frankel <nogahf@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-01-17
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add initial BPF map offloading for nfp driver. Currently only
programs were supported so far w/o being able to access maps.
Offloaded programs are right now only allowed to perform map
lookups, and control path is responsible for populating the
maps. BPF core infrastructure along with nfp implementation is
provided, from Jakub.
2) Various follow-ups to Josef's BPF error injections. More
specifically that includes: properly check whether the error
injectable event is on function entry or not, remove the percpu
bpf_kprobe_override and rather compare instruction pointer
with original one, separate error-injection from kprobes since
it's not limited to it, add injectable error types in order to
specify what is the expected type of failure, and last but not
least also support the kernel's fault injection framework, all
from Masami.
3) Various misc improvements and cleanups to the libbpf Makefile.
That is, fix permissions when installing BPF header files, remove
unused variables and functions, and also install the libbpf.h
header, from Jesper.
4) When offloading to nfp JIT and the BPF insn is unsupported in the
JIT, then reject right at verification time. Also fix libbpf with
regards to ELF section name matching by properly treating the
program type as prefix. Both from Quentin.
5) Add -DPACKAGE to bpftool when including bfd.h for the disassembler.
This is needed, for example, when building libfd from source as
bpftool doesn't supply a config.h for bfd.h. Fix from Jiong.
6) xdp_convert_ctx_access() is simplified since it doesn't need to
set target size during verification, from Jesper.
7) Let bpftool properly recognize BPF_PROG_TYPE_CGROUP_DEVICE
program types, from Roman.
8) Various functions in BPF cpumap were not declared static, from Wei.
9) Fix a double semicolon in BPF samples, from Luis.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If an eBPF instruction is unknown to the driver JIT compiler, we can
reject the program at verification time.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Use the verifier log to output error messages if map lookup
can't be offloaded.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Currently in the cases where cmp_type == CMP_TYPE_RX_L2_TPA_START_CMP or
CMP_TYPE_RX_L2_TPA_END_CMP the exit path updates cpr->rx_bytes with an
uninitialized length len. Fix this by adding a new exit path that does
not update the cpr stats with the bogus length len and remove the unused
label next_rx_no_prod.
Detected by CoverityScan, CID#1463807 ("Uninitialized scalar variable")
Fixes: 6a8788f256 ("bnxt_en: add support for software dynamic interrupt moderation")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to check if p_ent->comp_mode is QED_SPQ_MODE_EBLOCK before
calling qed_spq_add_entry(). The test is fine is the mode is EBLOCK,
but if it isn't then qed_spq_add_entry() might kfree(p_ent).
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sts variable is holding link speed as well as state. We should
be using ls to index into ls_to_ethtool.
Fixes: 265aeb511b ("nfp: add support for .get_link_ksettings()")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb len should be fetched before gro_receive - otherwise we may get
wrong or even outdated skb data.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Internal functions for registers and HW access were not prefixed.
This introduce noise in global kernel symbols. Here we add explicit prefix
'hw_atl' to all the HW access layer functions.
Alignment and styling were fixed as well.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Original driver code had internal registers and masks declarations
in low case and without any prefix.
Here we make all these uppercase and add already used HW_ATL prefix
to recognize these.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
aq_nic_s was hidden in aq_nic_internal.h, that made it difficult to access
nic fields and structures from other modules.
This change moves aq_nic_s struct into aq_nic.h and thus makes it available
to other driver modules, mainly pci module and hw related module.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate useless passing of net_device_ops and ethtools_ops through
deep chain of calls.
Move all pci related code into aq_pci_func module.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hardware operations and capabilities tables are constants and
never changed. Declare these as constants.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use direct aq_hw_s *self reference where possible
Eliminate useless abstraction PHAL, duplicated structures definitions,
Simplify nic config structure creation and management.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Usage of aq_obj_s structure is noop, here we remove it
replacing access to flags filed directly.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds timestamping support for both receive and transmit
paths. On the receive side no filters are supported i.e either
all pkts will get a timestamp appended infront of the packet or none.
On the transmit side HW doesn't support timestamp insertion but
only generates a separate CQE with transmitted packet's timestamp.
Also HW supports only one packet at a time for timestamping on the
transmit side.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@cavium.com>
Acked-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for the Precision Time Protocol
Clocks and Timestamping hardware found on Cavium ThunderX
processors.
Signed-off-by: Radoslaw Biernacki <rad@semihalf.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@cavium.com>
Acked-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following sparse warning:
drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c:464:1: warning:
symbol 'mlxsw_sp_qdisc_prio_unoffload' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for hot reload. First, all the driver/core resources are
released but the PCI and devlink instances, then reset is performed
through the PCI interface. Finally the driver performs initialization.
In case of reload failure the driver is left in a partially initialized
state. Special care is taken during the driver removal in order to
properly handle this state.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Up until now the KVD partition was static. This patch introduces the
ability to get the resource sizes via devlink. In case the resource is not
available the default configuration is used.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for getting the kvdl occupancy through the resource interface.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Connect current dpipe tables to resources. The tables are connected
in the following fashion:
1. IPv4 host -> KVD hash single
2. IPv6 host -> KVD hash double
3. Adjacency -> KVD linear
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Register the KVD resources with devlink. The KVD is a memory resource
which is subdivided into three partitions which are the linear, hash
single and hash double.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a preparation stage before introducing hot reload. During the
reload process the ASIC should be resetted by accessing the PCI BAR due
to unavailability of the mailbox/emad interfaces.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to architecture limitations, the IBM VNIC client driver is unable
to perform MAC address changes unless the device has "logged in" to
its backing device. Currently, pending MAC changes are handled before
login, resulting in an error and failure to change the MAC address.
Moving that chunk to the end of the ibmvnic_login function, when we are
sure that it was successful, fixes that.
The MAC address can be changed when the device is up or down, so
only check if the device is in a "PROBED" state before setting the
MAC address.
Fixes: c26eba03e4 ("ibmvnic: Update reset infrastructure to support tunable parameters")
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Reviewed-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dual-port Ether configurations always have a shared TSU to e.g. pass
the packets between those ports. With the TSU init. code gathered under
the single *if*, we now can only get the port # from 'platform_device::id'
only when we actually need it (and not recalculate it each time)...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sh_eth_cpu_data::chip_reset() method always resets using ARSTR and
this register is always located at the start of the TSU register region.
Therefore, we can only call this method if we know TSU is there and thus
simplify the probing code a bit...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
ARSTR is always located at the start of the TSU register region, thus
using add_reg() instead of add_tsu_reg() in __sh_eth_get_regs() to dump it
causes EDMR or EDSR (depending on the register layout) to be dumped instead
of ARSTR. Use the correct condition/macro there...
Fixes: 6b4b4fead3 ("sh_eth: Implement ethtool register dump operations")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Gemini ethernet has been around for years as an out-of-tree
patch used with the NAS boxen and routers built on StorLink
SL3512 and SL3516, later Storm Semiconductor, later Cortina
Systems. These ASICs are still being deployed and brand new
off-the-shelf systems using it can easily be acquired.
The full name of the IP block is "Net Engine and Gigabit
Ethernet MAC" commonly just called "GMAC".
The hardware block contains a common TCP Offload Enginer (TOE)
that can be used by both MACs. The current driver does not use
it.
Cc: Tobias Waldvogel <tobias.waldvogel@gmail.com>
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
I see two issues with parameter new_link:
1. It's not needed. See also phy_interrupt(), works w/o this parameter.
phy_mac_interrupt sets the state to PHY_CHANGELINK and triggers the
state machine which then calls phy_read_status. And phy_read_status
updates the link state.
2. phy_mac_interrupt is used in interrupt context and getting the link
state may sleep (at least when having to access the PHY registers
via MDIO bus).
So let's remove it.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This driver lacks a MODULE_LICENSE tag, leading to a Kbuild warning:
WARNING: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/cirrus/cs89x0.o
This adds license, author, and description according to the
comment block at the start of the file.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Plug in to the stack's map offload callbacks for BPF map offload.
Get next call needs some special handling on the FW side, since
we can't send a NULL pointer to the FW there is a get first entry
FW command.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Map memory needs to use 40 bit addressing. Add handling of such
accesses. Since 40 bit addresses are formed by using both 32 bit
operands we need to pre-calculate the actual address instead of
adding in the offset inside the instruction, like we did in 32 bit
mode.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Verify our current constraints on the location of the key are
met and generate the code for calling map lookup on the datapath.
New relocation types have to be added - for helpers and return
addresses.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Immediate loads are used to load the return address of a helper.
We need to be able to update those loads for relocations.
Immediate loads can be slightly more complex and spread over
two instructions in general, but here we only care about simple
loads of small (< 65k) constants, so complex cases are not handled.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
For map support we will need to send and receive control messages.
Add basic support for sending a message to FW, and waiting for a
reply.
Control messages are tagged with a 16 bit ID. Add a simple ID
allocator and make sure we don't allow too many messages in flight,
to avoid request <> reply mismatches.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
To be able to split code into reasonable chunks we need to add
the map data structures already. Later patches will add code
piece by piece.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
With map offload coming, we need to call program offload structure
something less ambiguous. Pure rename, no functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jeff Kirsher says:
====================
10GbE Intel Wired LAN Driver Updates 2018-01-12
This series contains updates to ixgbe, fm10k and net core.
Alex updates the driver to remove a duplicate MAC address check and
verifies that we have not run out of resources to configure a MAC rule
in our filter table. Also do not assume that dev->num_tc was populated
and configured with the driver, since it can be configured via mqprio
without any hardware coordination. Fixed the recording of stats for
MACVLAN in ixgbe and fm10k instead of recording the receive queue on
MACVLAN offloaded frames. When handling a MACVLAN offload, we should
be stopping/starting traffic on our own queues instead of the upper
devices transmit queues. Fixed possible race conditions with the
MACVLAN cleanup with the interface cleanup on shutdown. With the
recent fixes to ixgbe, we can cap the number of queues regardless of
accel_priv being in use or not, since the actual number of queues are
being reported via real_num_tx_queues.
Tony fixes up the kernel documentation for ixgbe and ixgbevf to resolve
warnings when W=1 is used.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Support basic stats for PRIO qdisc, which includes tx packets and bytes
count, drops count and backlog size. The rest of the stats are irrelevant
for this qdisc offload.
Since backlog is not only incremental but reflecting momentary value, in
case of a qdisc that stops being offloaded but is not destroyed, backlog
value needs to be updated about the un-offloading.
For that reason an unoffload function is being added to the ops struct.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for offloading PRIO qdisc as root qdisc.
The support is for up to 8 bands.
Routed packets priority is determined by the DSCP field with the default
translations. Bridged packets priority is determined by the PCP field, if
exist, otherwise it is set to 0.
Since both options have only priorities 0-7, higher priorities mapping are
being ignored.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When routing ip packets, the kernel is setting the SKB's priority
based on the tos field of the packet.
Imitate this behavior in the mlxsw router, having the internal
switch priority of a routed packet determined according to its DS
field.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add rdpm definition - router DSCP to priority mapping register.
This register will be utilized later to align the default mapping between
packet DSCP and switch-priority to the kernel's mapping between
packet priority and skb priority.
This is the first non-bit indexed register where the entries are arranged
in descending order, i.e., entry at offset 0 matches configuration for
dscp[63]. As a result, the item's step is converted into a signed variable
to support descending arrays [where step would be negative].
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit eb789980d0 ("mlxsw: spectrum_router: Populate adjacency
entries according to weights") the driver includes support for
non-equal-cost multipath, but IPv4 nexthops were the only user.
Now that the kernel supports weighted IPv6 nexthops, we can extend the
driver to support it as well.
This is done by assigning each nexthop its configured weight, so that it
will be populated accordingly in the device's adjacency table. The
`weight` parameter is also taken into account when comparing nexthop
groups in order not to consolidate non-identical groups.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On targets that have different sizes for phys_addr_t and dma_addr_t,
we get a type mismatch error:
drivers/net/ethernet/socionext/netsec.c: In function 'netsec_alloc_dring':
drivers/net/ethernet/socionext/netsec.c:970:9: error: passing argument 3 of 'dma_zalloc_coherent' from incompatible pointer type [-Werror=incompatible-pointer-types]
The code is otherwise correct, as the address is never actually used as a
physical address but only passed into a DMA register. For consistently,
I'm changing the variable name as well, to clarify that this is a DMA
address.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent checks added for formatting kernel-doc comments are causing warnings
if W= is run with a non-zero value. This patch fixes function comments to
resolve warnings when W=1 is used.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Recent checks added for formatting kernel-doc comments are causing warnings
if W= is run with a non-zero value. This patch fixes function comments to
resolve warnings when W=1 is used.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This update makes it so that we report the actual number of Tx queues via
real_num_tx_queues but are still restricted to RSS on only the first pool
by setting num_tc equal to 1. Doing this locks us into only having the
ability to setup XPS on the queues in that pool, and only those queues
should be used for transmitting anything other than macvlan traffic.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change makes it so that instead of bringing rings up/down for various
we just update the netdev pointer for the Rx ring and set or clear the MAC
filter for the interface. By doing it this way we can avoid a number of
races and issues in the code as things were getting messy with the macvlan
clean-up racing with the interface clean-up to bring the rings down on
shutdown.
With this change we opt to leave the rings owned by the PF interface for
both Tx and Rx and just direct the packets once they are received to the
macvlan netdev.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We should not be stopping/starting the upper devices Tx queues when
handling a macvlan offload. Instead we should be stopping and starting
traffic on our own queues.
In order to prevent us from doing this I am updating the code so that we no
longer change the queue configuration on the upper device, nor do we update
the queue_index on our own device. Instead we can just use the queue index
for our local device and not update the netdev in the case of the transmit
rings.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We shouldn't be recording the Rx queue on macvlan offloaded frames since
the macvlan is normally brought up as a single queue device, and it will
trigger warnings for RPS if we have recorded queue IDs larger than the
"real_num_rx_queues" value recorded for the device.
Instead we should be recording the macvlan statistics since we are
bypassing the normal macvlan statistics that would have been generated by
the receive path.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The code throughout ixgbe was assuming that dev->num_tc was populated and
configured with the driver, when in fact this can be configured via mqprio
without any hardware coordination other than restricting us to the real
number of Tx queues we advertise.
Instead of handling things this way we need to keep a local copy of the
number of TCs in use so that we don't accidentally pull in the TC
configuration from mqprio when it is configured in software mode.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We might as well configure the limit to default to 1 pool always for the
interface. This accounts for the fact that the PF counts as 1 pool if
SR-IOV is enabled, and in general we are always running in 1 pool mode when
RSS or DCB is enabled as well, though we don't need to actually evaluate
any of the VMDq features in those cases.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The macvlan driver itself will validate the MAC address that is configured
for a given interface. There is no need for us to verify it again.
Instead we should be checking to verify that we actually allocate the filter
and have not run out of resources to configure a MAC rule in our filter
table.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
It's necessary to check hook whether being defined before
calling, improve the reliability.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Local variable "changed" was defined to indicates features changed,
but was used only for feature NETIF_F_HW_VLAN_CTAG_RX. Add checking
for other features.
Fixes: 052ece6dc1 ("net: hns3: add ethtool related offload command")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the int_gl_idx does not be set, the default interrupt coalesce index
is 0. The TX queues and the RX queues will both use the GL0 as the
interrupt coalesce GL switch. But it should be GL1 for TX queues and GL0
for RX queues.
This patch adds the int_gl_idx setup for TX queues and RX queues.
Fixes: 76ad4f0ee7 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously, driver used 2us as the GL unit. The time unit ethtool
command "-c" and "-C" use is 1us, so now the GL unit driver uses
actually is 1us.
This patch changes the unit of GL value macro from
2us to 1us.
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the TX GL and the RX GL need to be set separately,
hns3_set_vector_coalesc_gl() has been replaced with
hns3_set_vector_coalesce_rx_gl() and hns3_set_vector_coalesce_tx_gl().
This patch removes hns3_set_vector_coalesc_gl().
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The GL update function uses the max GL value between tx_int_gl and
rx_int_gl to set both new tx_int_gl and new rx_int_gl. Therefore, User
can not enable TX GL self-adaptive or RX GL self-adaptive individually.
This patch refactors the code to update the TX GL and the RX GL
separately, making user can enable TX GL self-adaptive or RX GL
self-adaptive individually.
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>