* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/ehca: Reject dynamic memory add/remove when ehca adapter is present
IB/ehca: Fix reported max number of QPs and CQs in systems with >1 adapter
IPoIB: Set netdev offload features properly for child (VLAN) interfaces
IPoIB: Clean up ethtool support
mlx4_core: Add Ethernet PCI device IDs
mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC
mlx4_core: Multiple port type support
mlx4_core: Ethernet MAC/VLAN management
mlx4_core: Get ethernet MTU and default address from firmware
mlx4_core: Support multiple pre-reserved QP regions
Update NetEffect maintainer emails to Intel emails
RDMA/cxgb3: Remove cmid reference on tid allocation failures
IB/mad: Use krealloc() to resize snoop table
IPoIB: Always initialize poll_timer to avoid crash on unload
IB/ehca: Don't allow creating UC QP with SRQ
mlx4_core: Add QP range reservation support
RDMA/ucma: Test ucma_alloc_multicast() return against NULL, not with IS_ERR()
The Mellanox ConnectX can operate as an InfiniBand adapter, as an
Ethernet NIC, or as a Fibre Channel (FC) HBA. The kernel has a
low-level driver, mlx4_core, which handles multiplexing access to the
device, and there is also already an InfiniBad driver, mlx4_ib.
This patch adds a new driver, mlx4_en, which implements a standard
Ethernet NIC driver.
Signed-off-by: Liran Liss <liranl@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Multi-protocol adapters support different port types. Each consumer
of mlx4_core queries for supported port types; in particular mlx4_ib
can no longer assume that all physical ports belong to it. Port type
is configured through a sysfs interface. When the type of a port is
changed, all mlx4 interfaces are unregistered, and then registered
again with the new port types.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support for managing MAC and VLAN filters for each port.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Oren Duer <oren@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Get maximum ethernet MTU and default MAC address from the firmware
QUERY_DEV_CAP command.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
For ethernet support, we need to reserve QPs for the ethernet and
fibre channel driver. The QPs are reserved at the end of the QP
table. (This way we assure that they are aligned to their size)
We need to consider these reserved ranges in bitmap creation, so we
extend the mlx4 bitmap utility functions to allow reserved ranges at
both the bottom and the top of the range.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
To allow allocating an aligned range of consecutive QP numbers, add an
interface to reserve an aligned range of QP numbers and have the QP
allocation function always take a QP number.
This will be used for RSS support in the mlx4_en Ethernet driver and
also potentially by IPoIB RSS support.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
sparc32 allmodconfig with linux-next:
drivers/net/mlx4/alloc.c: In function 'mlx4_buf_alloc':
drivers/net/mlx4/alloc.c:164: error: 'PAGE_KERNEL' undeclared (first use in this function)
drivers/net/mlx4/alloc.c:164: error: (Each undeclared identifier is reported only once
drivers/net/mlx4/alloc.c:164: error: for each function it appears in.)
this is due to some header shuffle in linux-next. I didn't look to see what
it was. I'd sugges that this patch be merged ahead of a linux-next merge to
avoid bisection breaks.
We strictly only need asm/pgtable.h, but going direct to asm includes always
seems grubby.
Cc: Jeff Garzik <jeff@garzik.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Byte swap the addresses in the page list for fast register work requests
to big endian to match what the HCA expectx. Also, the addresses must
have the "present" bit set so that the HCA knows it can access them.
Otherwise the HCA will fault the first time it accesses the memory
region.
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Set the RAE (remote access enable) bit and correctly initialize the
MTT size in MPT entries being set up for fast register memory
regions. Otherwise the callers can't enable remote access and in fact
can't fast register at all (since the HCA will think no MTT entries
are allocated).
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
mlx4: Update/add Mellanox Technologies copyright lines to mlx4 driver files
mlx4_core: Add VLAN tag field to WQE control segment struct
RDMA/nes: CM connection setup/teardown rework
IPoIB: Correct help text for INFINIBAND_IPOIB_DEBUG
IPoIB/cm: Connected mode is no longer EXPERIMENTAL
RDMA/ucm: BKL is not needed for ib_ucm_open()
RDMA/ucma: BKL is not needed for ucma_open()
Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER
architecture does:
This enables us to cleanly fix the Calgary IOMMU issue that some devices
are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).
I think that per-device dma_mapping_ops support would be also helpful for
KVM people to support PCI passthrough but Andi thinks that this makes it
difficult to support the PCI passthrough (see the above thread). So I
CC'ed this to KVM camp. Comments are appreciated.
A pointer to dma_mapping_ops to struct dev_archdata is added. If the
pointer is non NULL, DMA operations in asm/dma-mapping.h use it. If it's
NULL, the system-wide dma_ops pointer is used as before.
If it's useful for KVM people, I plan to implement a mechanism to register
a hook called when a new pci (or dma capable) device is created (it works
with hot plugging). It enables IOMMUs to set up an appropriate
dma_mapping_ops per device.
The major obstacle is that dma_mapping_error doesn't take a pointer to the
device unlike other DMA operations. So x86 can't have dma_mapping_ops per
device. Note all the POWER IOMMUs use the same dma_mapping_error function
so this is not a problem for POWER but x86 IOMMUs use different
dma_mapping_error functions.
The first patch adds the device argument to dma_mapping_error. The patch
is trivial but large since it touches lots of drivers and dma-mapping.h in
all the architecture.
This patch:
dma_mapping_error() doesn't take a pointer to the device unlike other DMA
operations. So we can't have dma_mapping_ops per device.
Note that POWER already has dma_mapping_ops per device but all the POWER
IOMMUs use the same dma_mapping_error function. x86 IOMMUs use device
argument.
[akpm@linux-foundation.org: fix sge]
[akpm@linux-foundation.org: fix svc_rdma]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix bnx2x]
[akpm@linux-foundation.org: fix s2io]
[akpm@linux-foundation.org: fix pasemi_mac]
[akpm@linux-foundation.org: fix sdhci]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix sparc]
[akpm@linux-foundation.org: fix ibmvscsi]
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update existing Mellanox copyright lines to 2008, and add such lines
to files where they are missing.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
MAINTAINERS: Remove Glenn Streiff from NetEffect entry
mlx4_core: Improve error message when not enough UAR pages are available
IB/mlx4: Add support for memory management extensions and local DMA L_Key
IB/mthca: Keep free count for MTT buddy allocator
mlx4_core: Keep free count for MTT buddy allocator
mlx4_code: Add missing FW status return code
IB/mlx4: Rename struct mlx4_lso_seg to mlx4_wqe_lso_seg
mlx4_core: Add module parameter to enable QoS support
RDMA/iwcm: Remove IB_ACCESS_LOCAL_WRITE from remote QP attributes
IPoIB: Include err code in trace message for ib_sa_path_rec_get() failures
IB/sa_query: Check if sm_ah is NULL in ib_sa_remove_one()
IB/ehca: Release mutex in error path of alloc_small_queue_page()
IB/ehca: Use default value for Local CA ACK Delay if FW returns 0
IB/ehca: Filter PATH_MIG events if QP was never armed
IB/iser: Add support for RDMA_CM_EVENT_ADDR_CHANGE event
RDMA/cma: Add RDMA_CM_EVENT_TIMEWAIT_EXIT event
RDMA/cma: Add RDMA_CM_EVENT_ADDR_CHANGE event
On 32-bit architectures PAGE_ALIGN() truncates 64-bit values to the 32-bit
boundary. For example:
u64 val = PAGE_ALIGN(size);
always returns a value < 4GB even if size is greater than 4GB.
The problem resides in PAGE_MASK definition (from include/asm-x86/page.h for
example):
#define PAGE_SHIFT 12
#define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT)
#define PAGE_MASK (~(PAGE_SIZE-1))
...
#define PAGE_ALIGN(addr) (((addr)+PAGE_SIZE-1)&PAGE_MASK)
The "~" is performed on a 32-bit value, so everything in "and" with
PAGE_MASK greater than 4GB will be truncated to the 32-bit boundary.
Using the ALIGN() macro seems to be the right way, because it uses
typeof(addr) for the mask.
Also move the PAGE_ALIGN() definitions out of include/asm-*/page.h in
include/linux/mm.h.
See also lkml discussion: http://lkml.org/lkml/2008/6/11/237
[akpm@linux-foundation.org: fix drivers/media/video/uvc/uvc_queue.c]
[akpm@linux-foundation.org: fix v850]
[akpm@linux-foundation.org: fix powerpc]
[akpm@linux-foundation.org: fix arm]
[akpm@linux-foundation.org: fix mips]
[akpm@linux-foundation.org: fix drivers/media/video/pvrusb2/pvrusb2-dvb.c]
[akpm@linux-foundation.org: fix drivers/mtd/maps/uclinux.c]
[akpm@linux-foundation.org: fix powerpc]
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If an mlx4 device with default FW (which gives a UAR BAR size of 8 MB)
is used in a system with 64 KB pages, then there are only 8192/64==128
UAR pages available. However, the first 128 UAR pages are reserved
for use with event queue doorbells, so no UAR pages are available to
do anything else with, which means that the driver cannot work.
The current driver fails with a fairly cryptic "Failed to allocate
driver access region, aborting" message in this situation. Fix the
driver to detect the problem earlier and print out a clearer
description of the problem and a suggestion of how to fix it (use a
new firmware image).
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support for the following operations to mlx4 when device firmware
supports them:
- Send with invalidate and local invalidate send queue work requests;
- Allocate/free fast register MRs;
- Allocate/free fast register MR page lists;
- Fast register MR send queue work requests;
- Local DMA L_Key.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
MTT entries are allocated with a buddy allocator, which just keeps
bitmaps for each level of the buddy table. However, all free space
starts out at the highest order, and small allocations start scanning
from the lowest order. When the lowest order tables have no free
space, this can lead to scanning potentially millions of bits before
finding a free entry at a higher order.
We can avoid this by just keeping a count of how many free entries
each order has, and skipping the bitmap scan when an order is
completely empty. This provides a nice performance boost for a
negligible increase in memory usage.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add ICM_ERROR firmware status code. In mapping to errnos, -ENFILE
seems closest.
This is in preparation for providing more detailed log info using
mlx4_err() in low-level driver when a non-zero status is returned.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add a module parameter "enable_qos" to mlx4_core. If this param is
set, enable support for QoS in the INIT_HCA command. By default, the
parameter is set to 0 (disabled).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There was a bug in some versions of the mlx4 driver in
mlx4_alloc_fmr(), which hardcoded the minimum acceptable page_shift to
be 12. However, new ConnectX firmware can support a minimum
page_shift of 9 (log_pg_sz of 9 returned by QUERY_DEV_LIM) -- so with
old drivers, ib_fmr_alloc() would fail for ULPs using the device
minimum when creating FMRs.
To preserve firmware compatibility with released mlx4 drivers, the
firmware will continue to return 12 as before for log_page_sz in
QUERY_DEV_CAP for these drivers. However, to enable new drivers to
take advantage of the available smaller page size, the mlx4 driver now
first sets the log_pg_sz to the device minimum by setting a
log_page_sz value to 0 via the MOD_STAT_CFG command and then reading
the real minimum via QUERY_DEV_CAP.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support for handling the IB_QP_CREATE_MULTICAST_BLOCK_LOOPBACK
flag by using the per-multicast group loopback blocking feature of
mlx4 hardware.
Signed-off-by: Ron Livne <ronli@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Don't hard code a test against a minimum page shift of 12, since the
device may support smaller pages. Test against the actual smallest
page size from the device capabilities.
Signed-off-by: Oren Duer <oren@mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When a FMR is unmapped, mlx4 resets the map count to 0, and clears the
upper part of the R_Key which is used as the sequence counter.
This poses a problem for RDS, which uses ib_fmr_unmap as a fence
operation. RDS assumes that after issuing an unmap, the old R_Keys
will be invalid for a "reasonable" period of time. For instance,
Oracle processes uses shared memory buffers allocated from a pool of
buffers. When a process dies, we want to reclaim these buffers -- but
we must make sure there are no pending RDMA operations to/from those
buffers. The only way to achieve that is by using unmap and sync the
TPT.
However, when the sequence count is reset on unmap, there is a high
likelihood that a new mapping will be given the same R_Key that was
issued a few milliseconds ago.
To prevent this, don't reset the sequence count when unmapping a FMR.
Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Extend the mlx4_cq_resize() API with a way to set the "collapsed" flag
for the CQ being created.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Avoid duplicating code in ethernet and FC modules.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Wrap doorbell, buffer and MTT allocation in helper functions for
ethernet and FC modules to use.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The call to mlx4_MODIFY_CQ() had a typo so that mlx4_cq_resize() was
actually asking the FW to modify a CQ's interrupt moderation rather than
asking it to resize a CQ.
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In addition to mlx4_ib, there will be ethernet and FC consumers of
mlx4_core, so move the code for managing kernel doorbells into the
core module to avoid having to duplicate this multiple times.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
mlx4 hardware does not support external DDR memory. Moreover, UAR
area (BAR 2) can change depending on FW version.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When detaching the last QP from an MCG entry, we need to make
sure that at any time, there will be no entry with zero number of
QPs which is linked to the list of the MCGs of the corresponding
hash index. So don't write back the MCG entry if we are removing the
last QP; just unlink the entry.
Also, remove an unnecessary MCG read when attaching a QP requires
allocation of a new entry in the AMGM.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
With the advent large clusters which utilize multicore hosts, 64K QPs
is not enough. We should increase the default maximum for QPs to 128K.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ConnectX devices support checksum generation and verification of TCP
and UDP packets for UD IPoIB messages. This patch checks if the HCA
supports this and sets the IB_DEVICE_UD_IP_CSUM capability flag if it
does. It implements support for handling the IB_SEND_IP_CSUM send
flag and setting the csum_ok field in receive work completions.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Ali Ayub <ali@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The struct mlx4_interface.event() method was supposed to get an enum
mlx4_dev_event, but the driver code was actually passing in the
hardware enum mlx4_event values. Fix up the callers of
mlx4_dispatch_event() so that they pass in the right type of value,
and fix up the event method in mlx4_ib so that it can handle the enum
mlx4_dev_event values.
This eliminates the need for the subtype parameter to the event
method, so remove it.
This also fixes the sparse warning
drivers/net/mlx4/intf.c:127:48: warning: mixing different enum types
drivers/net/mlx4/intf.c:127:48: int enum mlx4_event versus
drivers/net/mlx4/intf.c:127:48: int enum mlx4_dev_event
Signed-off-by: Roland Dreier <rolandd@cisco.com>
mlx4_table_find (for FMR MPTs) requires that ICM memory already be
mapped. Before this fix, FMR allocation depended on ICM memory
already being mapped for the MPT entry. If all currently mapped
entries are taken, the find operation fails (even if the MPT ICM table
still had more entries, which were just not mapped yet).
This fix moves the mpt find operation to fmr_enable, to guarantee that
any required ICM memory mapping has already occurred.
Found by Oren Duer of Mellanox.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit 313abe55 ("mlx4_core: For 64-bit systems, vmap() kernel queue
buffers") caused this to pop up on powerpc allyesconfig, looks like a
missing include file:
drivers/net/mlx4/alloc.c: In function 'mlx4_buf_alloc':
drivers/net/mlx4/alloc.c:162: error: implicit declaration of function 'vmap'
drivers/net/mlx4/alloc.c:162: error: 'VM_MAP' undeclared (first use in this function)
drivers/net/mlx4/alloc.c:162: error: (Each undeclared identifier is reported only once
drivers/net/mlx4/alloc.c:162: error: for each function it appears in.)
drivers/net/mlx4/alloc.c:162: warning: assignment makes pointer from integer without a cast
drivers/net/mlx4/alloc.c: In function 'mlx4_buf_free':
drivers/net/mlx4/alloc.c:187: error: implicit declaration of function 'vunmap'
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Now that struct mlx4_buf.u is a struct instead of a union because of
the vmap() changes, there's no point in having a struct at all. So
move .direct and .page_list directly into struct mlx4_buf and get rid
of a bunch of unnecessary ".u"s.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Since kernel virtual memory is not a problem on 64-bit systems, there
is no reason to use our own 2-layer page mapping scheme for large
kernel queue buffers on such systems. Instead, map the page list to a
single virtually contiguous buffer with vmap(), so that can we access
buffer memory via direct indexing.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The firmware QUERY_ADAPTER command does not return vendor_id,
device_id, and revision_id; eliminate these fields from the query.
Initialize the rev_id field of the mlx4 device via init_node_data (MAD
IFC query), as is done in the query_device verb implementation.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit 3d73c288 ("mlx4_core: Fix section mismatches") fixed some of
the section mismatches introduced when error recovery was added, but
there were still more cases of errory recovery code calling into
__devinit code from regular .text. Fix this by getting rid of the
now-incorrect __devinit annotations.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
log_max_eqs is a 4-bit field, not a 3-bit field in the response to the
QUERY_DEV_CAP FW command, so we should mask with 0xf instead of 0x7
when reading it.
Found by Yossi Leybovitch of Mellanox.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>