The failure in releasing one UAR doesn't mean that we can't continue to
release rest of system pages, so don't return too early.
As part of cleanup, there is no need to print warning if
mlx5_cmd_free_uar() fails because such warning will be printed as part of
mlx5_cmd_exec().
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This function is not in use - delete it.
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
port_num is untrusted data from the user, so it should be checked after
calling fill_sgid_attr, which validates it.
Fixes: 8d9ec9addd ("IB/core: Add a sgid_attr pointer to struct rdma_ah_attr")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Since slave GID's do not exist in the core gid table we can no longer use
the core code to help do this without creating inconsistencies. Directly
create the AH using mlx4 internal APIs.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
kern_spec->reserved is checked prior to calling
kern_spec_to_ib_spec_filter() which makes this second check redundant.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
dma_map_sg_attrs() returns 0 on error and can't return a negative number
(ensured by BUG_ON), so don't check.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Number of specs is provided by user and in valid case can be equal to zero.
Such argument causes to call to kcalloc() with zero-length request and in
return the ZERO_SIZE_PTR is assigned. This pointer is different from NULL
and makes various if (..) checks to success.
Fixes: b6ba4a9aa5 ("IB/uverbs: Add support for flow counters")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Following the removal of ib_create_flow(), adjust the code to get rid of
ib_destroy_flow() too.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There are no kernel users of this interface so lets drop it.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In the accepted series "Refactor ib_uverbs_write path", we presented the
roadmap to get rid of uverbs_cmd_mask and uverbs_ex_cmd_mask fields in
favor of simple check of function pointer. So let's put NULL check of
create_flow function callback despite the fact that uverbs_ex_cmd_mask
still exists.
Link: https://www.spinics.net/lists/linux-rdma/msg60753.html
Suggested-by: Michael J Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
usnic has a modified version of the core codes' ib_umem_get() and
related, and the copy misses many of the bug fixes done over the years:
Commit bc3e53f682 ("mm: distinguish between mlocked and pinned pages")
Commit 87773dd56d ("IB: ib_umem_release() should decrement mm->pinned_vm
from ib_umem_get")
Commit 8494057ab5 ("IB/uverbs: Prevent integer overflow in ib_umem_get
address arithmetic")
Commit 8abaae62f3 ("IB/core: disallow registering 0-sized memory region")
Commit 66578b0b2f ("IB/core: don't disallow registering region starting
at 0x0")
Commit 53376fedb9 ("RDMA/core: not to set page dirty bit if it's already
set.")
Commit 8e907ed488 ("IB/umem: Use the correct mm during ib_umem_release")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch follows the logic from ib_core but considers the internal
device state upon executing the involved commands.
Specifically, Upon internal error state modify QP to an error state can
be assumed to be success as each in-progress WR going to be flushed in
error in any case as expected by that modify command.
In addition,
As the drain should never fail the driver makes sure that post_send/recv
will succeed even if the device is already in an internal error state.
As such once the driver will supply the simulated/SW CQEs the CQE for
the drain WR will be handled as well.
In case of an internal error state the CQE for the drain WR may be
completed as part of the main task that handled the error state or by
the task that issued the drain WR.
As the above depends on scheduling the code takes the relevant locks
and actions to make sure that the completion handler for that WR will
always be called after that the post_send/recv were issued but not in
parallel to the other task that handles the error flow.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch follows the logic from ib_core but considers the internal
device state upon executing the involved commands.
Specifically,
Upon internal error state modify QP to an error state can be assumed to
be success as each in-progress WR going to be flushed in error in any
case as expected by that modify command.
In addition,
As the drain should never fail the driver makes sure that post_send/recv
will succeed even if the device is already in an internal error state.
As such once the driver will supply the simulated/SW CQEs the CQE for
the drain WR will be handled as well.
In case of an internal error state the CQE for the drain WR may be
completed as part of the main task that handled the error state or by
the task that issued the drain WR.
As the above depends on scheduling the code takes the relevant locks and
actions to make sure that the completion handler for that WR will always
be called after that the post_send/recv were issued but not in parallel
to the other task that handles the error flow.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Now that all users have been converted to use the version of these APIs
that returns a gid_attr pointer we can delete the old entry points.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Prior patches now ensure that the AV has a sgid_attr, if one would have
been required. Instead of querying for one, take it directly from the AH.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
While processing a path record entry in CM messages the associated GID
attribute is now also supplied.
Currently for RoCE a netdevice's net namespace pointer and ifindex are
stored in path record entry. Both of these fields of the netdev can change
anytime while processing CM messages. Additionally storing net namespace
without holding reference will lead to use-after-free crash. Therefore it
is removed. Netdevice information for RoCE is instead provided via
referenced gid attribute in ib_cm requests.
Such a design leads to a situation where the kernel can crash when the net
pointer becomes invalid. However today it is always initialized to
init_net, which cannot become invalid. In order to support processing
packets in any arbitrary namespace of the received packet, it is necessary
to avoid such conditions.
This patch removes the dependency on the net pointer and ifindex; instead
it will rely on SGID attribute which contains a pointer to netdev.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Make the sgid_attr available along with path information to the event
consumer, this allows the consumer to keep using the same GID table entry
as the event is related to.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Hold reference to the the sgid_attr which is used in a cm_id until the
cm_id is destroyed.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Use the sgid and other information from the path record to figure out the
sgid_attrs.
Store the selected table entry in the sgid_attr for everything else to
use.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
This is really just a CM support function, normally a multicast address
does not have a specific SGID - but the RDMA CM usage model does restrict
things to the netdevice the CM id is bound to, at least for roce case.
Store the selected table entry in the sgid_attr for everything else to
use.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
The work completion is inspected to determine what dgid table entry was
used to receieve the packet, produces a sgid_attr that matches and sticks
it in the ah_attr.
All callers of this function are now required to release the ah_attr on
success.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
The INTx IRQ support does not work for all HF1 IRQ handlers
(specifically the receive data IRQs).
Remove all supporting code for the INTx IRQ.
If the requested MSIx vector request is unsuccessful, do not allow the
driver to continue.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Many fields in ctxtdata are incorrectly sized and the organization of the
fields within the structure is a jumble.
Fix by:
- Correcting oversize fields.
- Putting fields common to all contexts at the top with hot fields
at the top.
- Moving PSM fields to the bottom of the structure.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Remove the sizeable cache of the chip sizing CSRs and replace with CSR
reads as needed.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Fields in this structure are sized excessively based on hardware
limitations and input values.
Fix by reducing fields as appropriate and repositioning to close holes in
the structure.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
It is only ever written.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The usage of this ctxt data field is not hot path and the value can be
computed on demand to cut down the ctxtdata bloat.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
If we already hold the table->lock when doing the kref_put it means we are
in a context where it is safe to do the deletion synchronously, with no
need for the work queue.
This helps to eliminate issues when GID change is requested as part of MAC
address change or bonding event change where expectation is to replace the
GID almost immediately.
Fixes: b150c3862d ("IB/core: Introduce GID entry reference counts")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When sending multicast leave request, consider the net ns in which this
cm_id is created.
Code was duplicated in cma_leave_mc_groups() and rdma_leave_multicast(),
which is now done using a helper function cma_leave_roce_mc_group().
Fixes: bee3c3c918 ("IB/cma: Join and leave multicast groups with IGMP")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In this context the uobject is not allowed to be NULL, so type is the same
as uobject->type, and at least for IDR, id is the same as uobject->id.
FD objects should never handle the FD number outside the uAPI boundary
code.
Suggested-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
For dependencies, branch based on 'mellanox/mlx5-next' of
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git
Pull RoCE ICRC counters from Leon Romanovsky:
====================
This series exposes RoCE ICRC counter through existing RDMA hw_counters
sysfs interface.
The first patch has all HW definitions in mlx5_ifc.h file and second patch
is the actual counter implementation.
====================
* branch 'icrc-counter':
IB/mlx5: Support RoCE ICRC encapsulated error counter
net/mlx5: Add RoCE RX ICRC encapsulated counter
This patch adds support to query the counter that counts the
RoCE packets with corrupted ICRC (Invariant Cyclic Redundancy Code).
This counter will be under
/sys/class/infiniband/<mlx5-dev>/ports/<port>/hw_counters/
rx_icrc_encapsulated - The number of RoCE packets with ICRC
error.
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Put all relevant checks for transport domain in the
mlx5_ib_alloc/dealloc_transport_domain functions.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Move some s_flags defines out of rdmavt and into hfi1 because they are
hfi1 specific and therefore should remain in the driver instead of
bubbling up to rdmavt.
Document device specific ranges in rdmavt and remap
those in hfi1.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The field is based on a constant that can never change.
Use the define to assign the register instead.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This field should be in ctxtdata to allow for better locality of access by
eliminating a dd dereference.
The new field is now side-by-side with rcvhdrqentsize since the rhf_offset
is a function of the rcvhdrqentsize.
Both fields are now correctly sized as u8.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The current implementation precludes having receive context specific
packet type receive handlers.
Fix this by adding adding c99 const array for the existing handlers and
remove the current 72 bytes of pointers from devdata.
A new pointer in hfi1_ctxtdata will point to the const array.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Expose DEVX tree to be used by upper layers.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Return the matching device EQN for a given user vector number via the
DEVX interface.
Note:
EQs are owned by the kernel and shared by all user processes.
Basically, a user CQ can point to any EQ.
The kernel doesn't enforce any such limitation today either.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add support to register a memory with the firmware via the DEVX
interface.
The driver translates a given user address to ib_umem then it will
register the physical addresses with the firmware and get a unique id
for this registration to be used for this virtual address.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Return a device UAR index for a given user index via the DEVX interface.
Security note:
The hardware protection mechanism works like this: Each device object that
is subject to UAR doorbells (QP/SQ/CQ) gets a UAR ID (called uar_page in
the device specification manual) upon its creation. Then upon doorbell,
hardware fetches the object context for which the doorbell was rang, and
validates that the UAR through which the DB was rang matches the UAR ID
of the object.
If no match the doorbell is silently ignored by the hardware. Of
course, the user cannot ring a doorbell on a UAR that was not mapped to
it.
Now in devx, as the devx kernel does not manipulate the QP/SQ/CQ command
mailboxes (except tagging them with UID), we expose to the user its UAR
ID, so it can embed it in these objects in the expected specification
format. So the only thing the user can do is hurt itself by creating a
QP/SQ/CQ with a UAR ID other than his, and then in this case other users
may ring a doorbell on its objects.
The consequence of that will be that another user can schedule a QP/SQ
of the buggy user for execution (just insert it to the hardware schedule
queue or arm its CQ for event generation), no further harm is expected.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add support in DEVX for modify and query commands, the required lock is
taken (i.e. READ/WRITE) by the KABI infrastructure accordingly.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add support to create and destroy firmware objects via the DEVX
interface.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add support to run general firmware command via the DEVX interface.
A command that works on some object (e.g. CQ, WQ, etc.) will be added
in next patches while maintaining the required object lock.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Introduce DEVX to enable direct device commands in downstream patches
from this series.
In that mode of work the firmware manages the isolation between
processes' resources and as such a DEVX user id is created and assigned
to the given user context upon allocation request.
A capability check is done to make sure that this feature is really
supported by the firmware prior to creating the DEVX user id.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Drivers that use the IOCTL API may have the ib_uverbs_file and need a
way to get the related ib_ucontext from it, this is enabled by this
patch.
Downstream patches from this series will use it.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The ioctl parser framework wrongly assumed that each namespace is
populated. This could lead to NULL dereferences. Fix the parser to
always check that a given namespace indeed exists.
Fixes: fac9658cab ("IB/core: Add new ioctl interface")
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Sometimes the uverbs uAPI doesn't really care about the structure it gets
from user-space. All it wants to do is to allocate enough space and send
it to the hardware/provider driver. Adding a UVERBS_ATTR_MIN_SIZE that
could be used for this scenarios. We use USHRT_MAX as the kernel known
size to bypass any zero validations.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Adding UVERBS_ATTR_SPEC_F_ALLOC_AND_COPY flag to PTR_IN attributes.
By using this flag, the parse automatically allocates and copies the
user-space data. This data is accessible by using uverbs_attr_get_len
and uverbs_attr_get_alloced_ptr inline accessor functions from the
handler.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
uverbs_finalize_objects is currently used only to commit or abort
objects. Since we want to add automatic allocation/free of PTR_IN
attributes, moving it to uverbs_ioctl.c and renamit it to
uverbs_finalize_attrs.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
As provider drivers could use UVERBS_ATTR_FD and UVERBS_ATTR_IDR macros
need to export them.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
memcpy() of mapped addresses is done twice in c4iw_create_listen(),
removing the duplicate memcpy().
Fixes: 170003c894 ("iw_cxgb4: remove port mapper related code")
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch replaces the ib_device_attr.max_sge with max_send_sge and
max_recv_sge. It allows ulps to take advantage of devices that have very
different send and recv sge depths. For example cxgb4 has a max_recv_sge
of 4, yet a max_send_sge of 16. Splitting out these attributes allows
much more efficient use of the SQ for cxgb4 with ulps that use the RDMA_RW
API. Consider a large RDMA WRITE that has 16 scattergather entries.
With max_sge of 4, the ulp would send 4 WRITE WRs, but with max_sge of
16, it can be done with 1 WRITE WR.
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Before goto err2, the variable qp is checked. So it is not necessary
to check qp in label err2.
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Set the vlan flag and vlan_id field in the wc for rdma_listen()
to work over VLAN. This is required by ib_init_ah_attr_from_wc()
which is called by the CM REQ handler.
Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>
Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Increase the max MR limit to support more I/O queues
for NVMe over Fabrics hosts.
Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Allocate agent IDs from a global IDR instead of an atomic variable.
This eliminates the possibility of reusing an ID which is already in
use after 4 billion registrations. We limit the assigned ID to be less
than 2^24 as the mlx4 driver uses the most significant byte of the agent
ID to store the slave number. Users unlucky enough to see a collision
between agent numbers and slave numbers see messages like:
mlx4_ib: egress mad has non-null tid msb:1 class:4 slave:0
and the MAD layer stops working.
We look up the agent under protection of the RCU lock, which means we
have to free the agent using kfree_rcu, and only increment the reference
counter if it is not 0.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reported-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Tested-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
For UD the drivers were doing a sgid_index lookup into the cache to get
the attrs, however we can now directly access the same attrs stores in
the ib_ah instead and remove the lookup.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
If the AH has a GRH then hold a reference to the sgid_attr inside the
common struct.
If the QP is modified with an AV that includes a GRH then also hold a
reference to the sgid_attr inside the common struct.
This informs the cache that the sgid_index is in-use so long as the AH or
QP using it exists.
This also means that all drivers can access the sgid_attr directly from
the ah_attr instead of querying the cache during their UD post-send paths.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
While converting GID index from attribute to that of the HCA, GID
attribute is available from the ah_attr. Make use of GID attribute
to simplify the code and also avoid avoid GID query.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
The core code now ensures that all driver callbacks that receive an
rdma_ah_attrs will have a sgid_attr's pointer if there is a GRH present.
Drivers can use this pointer instead of calling a query function with
sgid_index. This simplifies the drivers and also avoids races where a
gid_index lookup may return different data if it is changed.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Introduce AH attribute copy, move and replace APIs to be used by core and
provider drivers.
In CM code flow when ah attribute might be re-initialized twice while
processing incoming request, or initialized once while from path record
while sending out CM requests. Therefore use rdma_move_ah_attr API to
handle such scenarios instead of memcpy().
Provider drivers keeps a copy ah_attr during the lifetime of the ah.
Therefore, use rdma_replace_ah_attr() which conditionally release
reference to old ah_attr and holds reference to new attribute whose
referrence is released when the AH is freed.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
No reason to call rdma_ah_retrieve_grh, tidy whitespace, and add a
function comment block.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
The sgid_attr will ultimately replace the sgid_index in the ah_attr.
This will allow for all layers to have a consistent view of what
gid table entry was selected as processing runs through all stages of the
stack.
This commit introduces the pointer and ensures it is set before calling
any driver callback that includes a struct ah_attr callback, allowing
future patches to adjust both the drivers and the callers to use
sgid_attr instead of sgid_index.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Since we are adding some new fields to this structure it is safest if all
users reliably initialize the struct to zero.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Document this (it's implicitly true due to sleeping operations already
in use in both registration and deregistration). Use this fact to use
spin_lock_irq instead of spin_lock_irqsave. This improves performance
slightly.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
struct rxe_global_route and struct ib_global_route are not the same thing
and should not be memcpy'd over each other, do a member by member copy
instead. This allows the layout of the in-kernel struct ib_global_route to
be changed without breaking rxe.
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Commit f43c00c04b ("i40iw: Extend port reuse support for listeners")
introduces a sparse warning:
include/linux/spinlock.h:365:9: sparse: context imbalance in
'i40iw_manage_apbvt' - unexpected unlock
Fix this by reorganizing the acquire/release of locks in
i40iw_manage_apbvt and add a new function i40iw_cqp_manage_abvpt_cmd
to perform the CQP command. Also, use __clear_bit and __test_and_set_bit
as we do not need atomic versions.
Fixes: f43c00c04b ("i40iw: Extend port reuse support for listeners")
Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Simplify the flow_resources_alloc() function call by reducing
number of goto statements.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Port capability flag represents IBTA PortInfo:CapabilityMask,
but was mistakenly mixed with non-relevant fields. Return that
information for IB only.
Link: https://patchwork.kernel.org/patch/10386245/
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
rxe_netdev_from_av can now be done by the core code directly from the
gid_attrs, no need for a helper in the driver.
ib_find_cached_gid_by_port can be switched to use the rdma version here as
well.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
If the gid_attr argument is NULL then the functions behave identically to
rdma_query_gid. ib_query_gid just calls ib_get_cached_gid, so everything
can be consolidated to one function.
Now that all callers either use rdma_query_gid() or ib_get_cached_gid(),
ib_query_gid() API is removed.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There is no reason to restrict this function to roce only these days,
allow the filter function to be called on any protocol.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
These versions are functionally similar but all return gid_attrs and
related information via reference instead of via copy.
The old API is preserved, implemented as wrappers around the new, until
all callers can be converted.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
These call sites have a use of ib_query_gid with a simple lifetime for the
struct gid_attr pointer, with an easy conversion.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch introduces three APIs, rdma_get_gid_attr(),
rdma_put_gid_attr(), and rdma_hold_gid_attr() which expose the reference
counting for GID table entries to the entire stack. The kref counting is
based on the struct ib_gid_attr pointer
Later patches will convert more cache query function to return struct
ib_gid_attrs.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Now that ib_gid_attr contains the GID, make use of that in the add_gid()
callback functions for the provider drivers to simplify the add_gid()
implementations.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In order to be able to expose pointers to the ib_gid_attrs in the GID
table we need to make it so the value of the pointer cannot be
changed. Thus each GID table entry gets a unique piece of kref'd memory
that is written only during initialization and remains constant for its
lifetime.
This eventually will allow the struct ib_gid_attrs to be returned without
copy from many of query the APIs, but it also provides a way to track when
all users of a HW table index go away.
For roce we no longer allow an in-use HW table index to be re-used for a
new an different entry. When a GID table entry needs to be removed it is
hidden from the find API, but remains as a valid HW index and all
ib_gid_attr points remain valid. The HW index is not relased until all
users put the kref.
Later patches will broadly replace the use of the sgid_index integer with
the kref'd structure.
Ultimately this will prevent security problems where the OS changes the
properties of a HW GID table entry while an active user object is still
using the entry.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There are at max one or two default GIDs for RoCE. Instead of storing
a default GID property for all the GIDs, store default GID indices as
individual bit per table.
This allows a future simplification to get rid of the GID property field.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When default GIDs are added, their gid type is set by
ib_cache_gid_set_default_gid(). There is no need to set the gid type of a
free GID entry during GID table initialization.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The flows were hidden from the C compiler; expose them as a zero-length
array to allow struct_size to work.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
This has been a quiet cycle for RDMA, the big bulk is the usual smallish
driver updates and bug fixes. About four new uAPI related things. Not as much
Szykaller patches this time, the bugs it finds are getting harder to fix.
- More work cleaning up the RDMA CM code
- Usual driver bug fixes and cleanups for qedr, qib, hfi1, hns, i40iw, iw_cxgb4, mlx5, rxe
- Driver specific resource tracking and reporting via netlink
- Continued work for name space support from Parav
- MPLS support for the verbs flow steering uAPI
- A few tricky IPoIB fixes improving robustness
- HFI1 driver support for the '16B' management packet format
- Some auditing to not print kernel pointers via %llx or similar
- Mark the entire 'UCM' user-space interface as BROKEN with the intent to remove it
entirely. The user space side of this was long ago replaced with RDMA-CM and
syzkaller is finding bugs in the residual UCM interface nobody wishes to fix because
nobody uses it.
- Purge more bogus BUG_ON's from Leon
- 'flow counters' verbs uAPI
- T10 fixups for iser/isert, these are Acked by Martin but going through the RDMA
tree due to dependencies
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCgAGBQJbGEcPAAoJEDht9xV+IJsarBMQAIsAFOizycF0kQfDtvz1yHyV
YjkT3NA71379DsDsCOezVKqZ6RtXdQncJoqqEG1FuNKiXh/rShR3rk9XmdBwUCTq
mIY0ySiQggdeSIJclROiBuzLE3F/KIIkY3jwM80DzT9GUEbnVuvAMt4M56X48Xo8
RpFc13/1tY09ZLBVjInlfmCpRWyNgNccDBDywB/5hF5KCFR/BG/vkp4W0yzksKiU
7M/rZYyxQbtwSfe/ZXp7NrtwOpkpn7vmhED59YgKRZWhqnHF9KKmV+K1FN+BKdXJ
V1KKJ2RQINg9bbLJ7H2JPdQ9EipvgAjUJKKBoD+XWnoVJahp6X2PjX351R/h4Lo5
TH+0XwuCZ2EdjRxhnm3YE+rU10mDY9/UUi1xkJf9vf0r25h6Fgt6sMnN0QBpqkTh
euRZnPyiFeo1b+hCXJfKqkQ6An+F3zes5zvVf59l0yfVNLVmHdlz0lzKLf/RPk+t
U+YZKxfmHA+mwNhMXtKx7rKVDrko+uRHjaX2rPTEvZ0PXE7lMzFMdBWYgzP6sx/b
4c55NiJMDAGTyLCxSc7ziGgdL9Lpo/pRZJtFOHqzkDg8jd7fb07ID7bMPbSa05y0
BU5VpC8yEOYRpOEFbkJSPtHc0Q8cMCv/q1VcMuuhKXYnfSho3TWvtOSQIjUoU/q0
8T6TXYi2yF+f+vZBTFlV
=Mb8m
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"This has been a quiet cycle for RDMA, the big bulk is the usual
smallish driver updates and bug fixes. About four new uAPI related
things. Not as much Szykaller patches this time, the bugs it finds are
getting harder to fix.
Summary:
- More work cleaning up the RDMA CM code
- Usual driver bug fixes and cleanups for qedr, qib, hfi1, hns,
i40iw, iw_cxgb4, mlx5, rxe
- Driver specific resource tracking and reporting via netlink
- Continued work for name space support from Parav
- MPLS support for the verbs flow steering uAPI
- A few tricky IPoIB fixes improving robustness
- HFI1 driver support for the '16B' management packet format
- Some auditing to not print kernel pointers via %llx or similar
- Mark the entire 'UCM' user-space interface as BROKEN with the
intent to remove it entirely. The user space side of this was long
ago replaced with RDMA-CM and syzkaller is finding bugs in the
residual UCM interface nobody wishes to fix because nobody uses it.
- Purge more bogus BUG_ON's from Leon
- 'flow counters' verbs uAPI
- T10 fixups for iser/isert, these are Acked by Martin but going
through the RDMA tree due to dependencies"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (138 commits)
RDMA/mlx5: Update SPDX tags to show proper license
RDMA/restrack: Change SPDX tag to properly reflect license
IB/hfi1: Fix comment on default hdr entry size
IB/hfi1: Rename exp_lock to exp_mutex
IB/hfi1: Add bypass register defines and replace blind constants
IB/hfi1: Remove unused variable
IB/hfi1: Ensure VL index is within bounds
IB/hfi1: Fix user context tail allocation for DMA_RTAIL
IB/hns: Use zeroing memory allocator instead of allocator/memset
infiniband: fix a possible use-after-free bug
iw_cxgb4: add INFINIBAND_ADDR_TRANS dependency
IB/isert: use T10-PI check mask definitions from core layer
IB/iser: use T10-PI check mask definitions from core layer
RDMA/core: introduce check masks for T10-PI offload
IB/isert: fix T10-pi check mask setting
IB/mlx5: Add counters read support
IB/mlx5: Add flow counters read support
IB/mlx5: Add flow counters binding support
IB/mlx5: Add counters create and destroy support
IB/uverbs: Add support for flow counters
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlsZdg0UHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vwJOBAAsuuWsOdiJRRhQLU5WfEMFgzcL02R
gsumqZkK7E8LOq0DPNMtcgv9O0KgYZyCiZyTMJ8N7sEYohg04lMz8mtYXOibjcwI
p+nVMko8jQXV9FXwSMGVqigEaLLcrbtkbf/mPriD63DDnRMa/+/Jh15SwfLTydIH
QRTJbIxkS3EiOauj5C8QY3UwzjlvV9mDilzM/x+MSK27k2HFU9Pw/3lIWHY716rr
grPZTwBTfIT+QFZjwOm6iKzHjxRM830sofXARkcH4CgSNaTeq5UbtvAs293MHvc+
v/v/1dfzUh00NxfZDWKHvTUMhjazeTeD9jEVS7T+HUcGzvwGxMSml6bBdznvwKCa
46ynePOd1VcEBlMYYS+P4njRYBLWeUwt6/TzqR4yVwb0keQ6Yj3Y9H2UpzscYiCl
O+0qz6RwyjKY0TpxfjoojgHn4U5ByI5fzVDJHbfr2MFTqqRNaabVrfl6xU4sVuhh
OluT5ym+/dOCTI/wjlolnKNb0XThVre8e2Busr3TRvuwTMKMIWqJ9sXLovntdbqE
furPD/UnuZHkjSFhQ1SQwYdWmsZI5qAq2C9haY8sEWsXEBEcBGLJ2BEleMxm8UsL
KXuy4ER+R4M+sFtCkoWf3D4NTOBUdPHi4jyk6Ooo1idOwXCsASVvUjUEG5YcQC6R
kpJ1VPTKK1XN64I=
=aFAi
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- unify AER decoding for native and ACPI CPER sources (Alexandru
Gagniuc)
- add TLP header info to AER tracepoint (Thomas Tai)
- add generic pcie_wait_for_link() interface (Oza Pawandeep)
- handle AER ERR_FATAL by removing and re-enumerating devices, as
Downstream Port Containment does (Oza Pawandeep)
- factor out common code between AER and DPC recovery (Oza Pawandeep)
- stop triggering DPC for ERR_NONFATAL errors (Oza Pawandeep)
- share ERR_FATAL recovery path between AER and DPC (Oza Pawandeep)
- disable ASPM L1.2 substate if we don't have LTR (Bjorn Helgaas)
- respect platform ownership of LTR (Bjorn Helgaas)
- clear interrupt status in top half to avoid interrupt storm (Oza
Pawandeep)
- neaten pci=earlydump output (Andy Shevchenko)
- avoid errors when extended config space inaccessible (Gilles Buloz)
- prevent sysfs disable of device while driver attached (Christoph
Hellwig)
- use core interface to report PCIe link properties in bnx2x, bnxt_en,
cxgb4, ixgbe (Bjorn Helgaas)
- remove unused pcie_get_minimum_link() (Bjorn Helgaas)
- fix use-before-set error in ibmphp (Dan Carpenter)
- fix pciehp timeouts caused by Command Completed errata (Bjorn
Helgaas)
- fix refcounting in pnv_php hotplug (Julia Lawall)
- clear pciehp Presence Detect and Data Link Layer Status Changed on
resume so we don't miss hotplug events (Mika Westerberg)
- only request pciehp control if we support it, so platform can use
ACPI hotplug otherwise (Mika Westerberg)
- convert SHPC to be builtin only (Mika Westerberg)
- request SHPC control via _OSC if we support it (Mika Westerberg)
- simplify SHPC handoff from firmware (Mika Westerberg)
- fix an SHPC quirk that mistakenly included *all* AMD bridges as well
as devices from any vendor with device ID 0x7458 (Bjorn Helgaas)
- assign a bus number even to non-native hotplug bridges to leave
space for acpiphp additions, to fix a common Thunderbolt xHCI
hot-add failure (Mika Westerberg)
- keep acpiphp from scanning native hotplug bridges, to fix common
Thunderbolt hot-add failures (Mika Westerberg)
- improve "partially hidden behind bridge" messages from core (Mika
Westerberg)
- add macros for PCIe Link Control 2 register (Frederick Lawler)
- replace IB/hfi1 custom macros with PCI core versions (Frederick
Lawler)
- remove dead microblaze and xtensa code (Bjorn Helgaas)
- use dev_printk() when possible in xtensa and mips (Bjorn Helgaas)
- remove unused pcie_port_acpi_setup() and portdrv_acpi.c (Bjorn
Helgaas)
- add managed interface to get PCI host bridge resources from OF (Jan
Kiszka)
- add support for unbinding generic PCI host controller (Jan Kiszka)
- fix memory leaks when unbinding generic PCI host controller (Jan
Kiszka)
- request legacy VGA framebuffer only for VGA devices to avoid false
device conflicts (Bjorn Helgaas)
- turn on PCI_COMMAND_IO & PCI_COMMAND_MEMORY in pci_enable_device()
like everybody else, not in pcibios_fixup_bus() (Bjorn Helgaas)
- add generic enable function for simple SR-IOV hardware (Alexander
Duyck)
- use generic SR-IOV enable for ena, nvme (Alexander Duyck)
- add ACS quirk for Intel 7th & 8th Gen mobile (Alex Williamson)
- add ACS quirk for Intel 300 series (Mika Westerberg)
- enable register clock for Armada 7K/8K (Gregory CLEMENT)
- reduce Keystone "link already up" log level (Fabio Estevam)
- move private DT functions to drivers/pci/ (Rob Herring)
- factor out dwc CONFIG_PCI Kconfig dependencies (Rob Herring)
- add DesignWare support to the endpoint test driver (Gustavo
Pimentel)
- add DesignWare support for endpoint mode (Gustavo Pimentel)
- use devm_ioremap_resource() instead of devm_ioremap() in dra7xx and
artpec6 (Gustavo Pimentel)
- fix Qualcomm bitwise NOT issue (Dan Carpenter)
- add Qualcomm runtime PM support (Srinivas Kandagatla)
- fix DesignWare enumeration below bridges (Koen Vandeputte)
- use usleep() instead of mdelay() in endpoint test (Jia-Ju Bai)
- add configfs entries for pci_epf_driver device IDs (Kishon Vijay
Abraham I)
- clean up pci_endpoint_test driver (Gustavo Pimentel)
- update Layerscape maintainer email addresses (Minghuan Lian)
- add COMPILE_TEST to improve build test coverage (Rob Herring)
- fix Hyper-V bus registration failure caused by domain/serial number
confusion (Sridhar Pitchai)
- improve Hyper-V refcounting and coding style (Stephen Hemminger)
- avoid potential Hyper-V hang waiting for a response that will never
come (Dexuan Cui)
- implement Mediatek chained IRQ handling (Honghui Zhang)
- fix vendor ID & class type for Mediatek MT7622 (Honghui Zhang)
- add Mobiveil PCIe host controller driver (Subrahmanya Lingappa)
- add Mobiveil MSI support (Subrahmanya Lingappa)
- clean up clocks, MSI, IRQ mappings in R-Car probe failure paths
(Marek Vasut)
- poll more frequently (5us vs 5ms) while waiting for R-Car data link
active (Marek Vasut)
- use generic OF parsing interface in R-Car (Vladimir Zapolskiy)
- add R-Car V3H (R8A77980) "compatible" string (Sergei Shtylyov)
- add R-Car gen3 PHY support (Sergei Shtylyov)
- improve R-Car PHYRDY polling (Sergei Shtylyov)
- clean up R-Car macros (Marek Vasut)
- use runtime PM for R-Car controller clock (Dien Pham)
- update arm64 defconfig for Rockchip (Shawn Lin)
- refactor Rockchip code to facilitate both root port and endpoint
mode (Shawn Lin)
- add Rockchip endpoint mode driver (Shawn Lin)
- support VMD "membar shadow" feature (Jon Derrick)
- support VMD bus number offsets (Jon Derrick)
- add VMD "no AER source ID" quirk for more device IDs (Jon Derrick)
- remove unnecessary host controller CONFIG_PCIEPORTBUS Kconfig
selections (Bjorn Helgaas)
- clean up quirks.c organization and whitespace (Bjorn Helgaas)
* tag 'pci-v4.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (144 commits)
PCI/AER: Replace struct pcie_device with pci_dev
PCI/AER: Remove unused parameters
PCI: qcom: Include gpio/consumer.h
PCI: Improve "partially hidden behind bridge" log message
PCI: Improve pci_scan_bridge() and pci_scan_bridge_extend() doc
PCI: Move resource distribution for single bridge outside loop
PCI: Account for all bridges on bus when distributing bus numbers
ACPI / hotplug / PCI: Drop unnecessary parentheses
ACPI / hotplug / PCI: Mark stale PCI devices disconnected
ACPI / hotplug / PCI: Don't scan bridges managed by native hotplug
PCI: hotplug: Add hotplug_is_native()
PCI: shpchp: Add shpchp_is_native()
PCI: shpchp: Fix AMD POGO identification
PCI: mobiveil: Add MSI support
PCI: mobiveil: Add Mobiveil PCIe Host Bridge IP driver
PCI/AER: Decode Error Source Requester ID
PCI/AER: Remove aer_recover_work_func() forward declaration
PCI/DPC: Use the generic pcie_do_fatal_recovery() path
PCI/AER: Pass service type to pcie_do_fatal_recovery()
PCI/DPC: Disable ERR_NONFATAL handling by DPC
...
Pull networking updates from David Miller:
1) Add Maglev hashing scheduler to IPVS, from Inju Song.
2) Lots of new TC subsystem tests from Roman Mashak.
3) Add TCP zero copy receive and fix delayed acks and autotuning with
SO_RCVLOWAT, from Eric Dumazet.
4) Add XDP_REDIRECT support to mlx5 driver, from Jesper Dangaard
Brouer.
5) Add ttl inherit support to vxlan, from Hangbin Liu.
6) Properly separate ipv6 routes into their logically independant
components. fib6_info for the routing table, and fib6_nh for sets of
nexthops, which thus can be shared. From David Ahern.
7) Add bpf_xdp_adjust_tail helper, which can be used to generate ICMP
messages from XDP programs. From Nikita V. Shirokov.
8) Lots of long overdue cleanups to the r8169 driver, from Heiner
Kallweit.
9) Add BTF ("BPF Type Format"), from Martin KaFai Lau.
10) Add traffic condition monitoring to iwlwifi, from Luca Coelho.
11) Plumb extack down into fib_rules, from Roopa Prabhu.
12) Add Flower classifier offload support to igb, from Vinicius Costa
Gomes.
13) Add UDP GSO support, from Willem de Bruijn.
14) Add documentation for eBPF helpers, from Quentin Monnet.
15) Add TLS tx offload to mlx5, from Ilya Lesokhin.
16) Allow applications to be given the number of bytes available to read
on a socket via a control message returned from recvmsg(), from
Soheil Hassas Yeganeh.
17) Add x86_32 eBPF JIT compiler, from Wang YanQing.
18) Add AF_XDP sockets, with zerocopy support infrastructure as well.
From Björn Töpel.
19) Remove indirect load support from all of the BPF JITs and handle
these operations in the verifier by translating them into native BPF
instead. From Daniel Borkmann.
20) Add GRO support to ipv6 gre tunnels, from Eran Ben Elisha.
21) Allow XDP programs to do lookups in the main kernel routing tables
for forwarding. From David Ahern.
22) Allow drivers to store hardware state into an ELF section of kernel
dump vmcore files, and use it in cxgb4. From Rahul Lakkireddy.
23) Various RACK and loss detection improvements in TCP, from Yuchung
Cheng.
24) Add TCP SACK compression, from Eric Dumazet.
25) Add User Mode Helper support and basic bpfilter infrastructure, from
Alexei Starovoitov.
26) Support ports and protocol values in RTM_GETROUTE, from Roopa
Prabhu.
27) Support bulking in ->ndo_xdp_xmit() API, from Jesper Dangaard
Brouer.
28) Add lots of forwarding selftests, from Petr Machata.
29) Add generic network device failover driver, from Sridhar Samudrala.
* ra.kernel.org:/pub/scm/linux/kernel/git/davem/net-next: (1959 commits)
strparser: Add __strp_unpause and use it in ktls.
rxrpc: Fix terminal retransmission connection ID to include the channel
net: hns3: Optimize PF CMDQ interrupt switching process
net: hns3: Fix for VF mailbox receiving unknown message
net: hns3: Fix for VF mailbox cannot receiving PF response
bnx2x: use the right constant
Revert "net: sched: cls: Fix offloading when ingress dev is vxlan"
net: dsa: b53: Fix for brcm tag issue in Cygnus SoC
enic: fix UDP rss bits
netdev-FAQ: clarify DaveM's position for stable backports
rtnetlink: validate attributes in do_setlink()
mlxsw: Add extack messages for port_{un, }split failures
netdevsim: Add extack error message for devlink reload
devlink: Add extack to reload and port_{un, }split operations
net: metrics: add proper netlink validation
ipmr: fix error path when ipmr_new_table fails
ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds
net: hns3: remove unused hclgevf_cfg_func_mta_filter
netfilter: provide udp*_lib_lookup for nf_tproxy
qed*: Utilize FW 8.37.2.0
...
- Use overflow helpers in 2-factor allocators (Kees, Rasmus)
- Introduce overflow test module (Rasmus, Kees)
- Introduce saturating size helper functions (Matthew, Kees)
- Treewide use of struct_size() for allocators (Kees)
-----BEGIN PGP SIGNATURE-----
Comment: Kees Cook <kees@outflux.net>
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlsYJ1gWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJlCTEACwdEeriAd2VwxknnsstojGD/3g
8TTFA19vSu4Gxa6WiDkjGoSmIlfhXTlZo1Nlmencv16ytSvIVDNLUIB3uDxUIv1J
2+dyHML9JpXYHHR7zLXXnGFJL0wazqjbsD3NYQgXqmun7EVVYnOsAlBZ7h/Lwiej
jzEJd8DaHT3TA586uD3uggiFvQU0yVyvkDCDONIytmQx+BdtGdg9TYCzkBJaXuDZ
YIthyKDvxIw5nh/UaG3L+SKo73tUr371uAWgAfqoaGQQCWe+mxnWL4HkCKsjFzZL
u9ouxxF/n6pij3E8n6rb0i2fCzlsTDdDF+aqV1rQ4I4hVXCFPpHUZgjDPvBWbj7A
m6AfRHVNnOgI8HGKqBGOfViV+2kCHlYeQh3pPW33dWzy/4d/uq9NIHKxE63LH+S4
bY3oO2ela8oxRyvEgXLjqmRYGW1LB/ZU7FS6Rkx2gRzo4k8Rv+8K/KzUHfFVRX61
jEbiPLzko0xL9D53kcEn0c+BhofK5jgeSWxItdmfuKjLTW4jWhLRlU+bcUXb6kSS
S3G6aF+L+foSUwoq63AS8QxCuabuhreJSB+BmcGUyjthCbK/0WjXYC6W/IJiRfBa
3ZTxBC/2vP3uq/AGRNh5YZoxHL8mSxDfn62F+2cqlJTTKR/O+KyDb1cusyvk3H04
KCDVLYPxwQQqK1Mqig==
=/3L8
-----END PGP SIGNATURE-----
Merge tag 'overflow-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull overflow updates from Kees Cook:
"This adds the new overflow checking helpers and adds them to the
2-factor argument allocators. And this adds the saturating size
helpers and does a treewide replacement for the struct_size() usage.
Additionally this adds the overflow testing modules to make sure
everything works.
I'm still working on the treewide replacements for allocators with
"simple" multiplied arguments:
*alloc(a * b, ...) -> *alloc_array(a, b, ...)
and
*zalloc(a * b, ...) -> *calloc(a, b, ...)
as well as the more complex cases, but that's separable from this
portion of the series. I expect to have the rest sent before -rc1
closes; there are a lot of messy cases to clean up.
Summary:
- Introduce arithmetic overflow test helper functions (Rasmus)
- Use overflow helpers in 2-factor allocators (Kees, Rasmus)
- Introduce overflow test module (Rasmus, Kees)
- Introduce saturating size helper functions (Matthew, Kees)
- Treewide use of struct_size() for allocators (Kees)"
* tag 'overflow-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
treewide: Use struct_size() for devm_kmalloc() and friends
treewide: Use struct_size() for vmalloc()-family
treewide: Use struct_size() for kmalloc()-family
device: Use overflow helpers for devm_kmalloc()
mm: Use overflow helpers in kvmalloc()
mm: Use overflow helpers in kmalloc_array*()
test_overflow: Add memory allocation overflow tests
overflow.h: Add allocation size calculation helpers
test_overflow: Report test failures
test_overflow: macrofy some more, do more tests for free
lib: add runtime test of check_*_overflow functions
compiler.h: enable builtin overflow checkers and add fallback code
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct foo {
int stuff;
void *entry[];
};
instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:
instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);
This patch makes the changes for kmalloc()-family (and kvmalloc()-family)
uses. It was done via automatic conversion with manual review for the
"CHECKME" non-standard cases noted below, using the following Coccinelle
script:
// pkey_cache = kmalloc(sizeof *pkey_cache + tprops->pkey_tbl_len *
// sizeof *pkey_cache->table, GFP_KERNEL);
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
identifier VAR, ELEMENT;
expression COUNT;
@@
- alloc(sizeof(*VAR) + COUNT * sizeof(*VAR->ELEMENT), GFP)
+ alloc(struct_size(VAR, ELEMENT, COUNT), GFP)
// mr = kzalloc(sizeof(*mr) + m * sizeof(mr->map[0]), GFP_KERNEL);
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
identifier VAR, ELEMENT;
expression COUNT;
@@
- alloc(sizeof(*VAR) + COUNT * sizeof(VAR->ELEMENT[0]), GFP)
+ alloc(struct_size(VAR, ELEMENT, COUNT), GFP)
// Same pattern, but can't trivially locate the trailing element name,
// or variable name.
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
expression SOMETHING, COUNT, ELEMENT;
@@
- alloc(sizeof(SOMETHING) + COUNT * sizeof(ELEMENT), GFP)
+ alloc(CHECKME_struct_size(&SOMETHING, ELEMENT, COUNT), GFP)
Signed-off-by: Kees Cook <keescook@chromium.org>
Mellanox code is supposed to be OpenIB compliant code,
so let's update SPDX tags to show it.
Fixes: fc385b7ac4 ("IB/mlx5: Add basic regiser/unregister representors code")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Resource tracking is supposed to be dual licensed: GPL-2.0 and
OpenIB, but the SPDX tag was not compliant to it. Update the tag to
properly reflect license.
Fixes: 02d8883f52 ("RDMA/restrack: Add general infrastructure to track RDMA resources")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This FW contains several fixes and features.
RDMA
- Several modifications and fixes for Memory Windows
- drop vlan and tcp timestamp from mss calculation in driver for
this FW
- Fix SQ completion flow when local ack timeout is infinite
- Modifications in t10dif support
ETH
- Fix aRFS for tunneled traffic without inner IP.
- Fix chip configuration which may fail under heavy traffic conditions.
- Support receiving any-VNI in VXLAN and GENEVE RX classification.
iSCSI / FcoE
- Fix iSCSI recovery flow
- Drop vlan and tcp timestamp from mss calc for fw 8.37.2.0
Misc
- Several registers (split registers) won't read correctly with
ethtool -d
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The comment for the default header queue entry size is incorrect.
Correct the comment and fix the resulting S_IRUGO warning that shows
up in the widened patch context.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The mutex exp_lock in struct hfi1_ctxtdata is used to protect all
Expected TID data of a user context. This patch renames it to exp_mutex
to better reflect its identity and prepare for upcoming patches.
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
These registers were not added in the 16B work.
Add them and replace blind constants with the correct defines.
Fixes: 72c07e2b67 ("IB/hfi1: Add support to receive 16B bypass packets")
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>