Commit Graph

2749 Commits

Author SHA1 Message Date
Arun Sharma
60063497a9 atomic: use <linux/atomic.h>
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>

Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 16:49:47 -07:00
Linus Torvalds
ece236ce2f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (26 commits)
  IB/qib: Defer HCA error events to tasklet
  mlx4_core: Bump the driver version to 1.0
  RDMA/cxgb4: Use printk_ratelimited() instead of printk_ratelimit()
  IB/mlx4: Support PMA counters for IBoE
  IB/mlx4: Use flow counters on IBoE ports
  IB/pma: Add include file for IBA performance counters definitions
  mlx4_core: Add network flow counters
  mlx4_core: Fix location of counter index in QP context struct
  mlx4_core: Read extended capabilities into the flags field
  mlx4_core: Extend capability flags to 64 bits
  IB/mlx4: Generate GID change events in IBoE code
  IB/core: Add GID change event
  RDMA/cma: Don't allow IPoIB port space for IBoE
  RDMA: Allow for NULL .modify_device() and .modify_port() methods
  IB/qib: Update active link width
  IB/qib: Fix potential deadlock with link down interrupt
  IB/qib: Add sysfs interface to read free contexts
  IB/mthca: Remove unnecessary read of PCI_CAP_ID_EXP
  IB/qib: Remove double define
  IB/qib: Remove unnecessary read of PCI_CAP_ID_EXP
  ...
2011-07-22 14:50:12 -07:00
Roland Dreier
4460207561 Merge branches 'cma', 'cxgb4', 'ipath', 'misc', 'mlx4', 'mthca', 'qib' and 'srp' into for-next 2011-07-22 11:56:11 -07:00
Mike Marciniszyn
e67306a380 IB/qib: Defer HCA error events to tasklet
With ib_qib options:

    options ib_qib krcvqs=1 pcie_caps=0x51 rcvhdrcnt=4096 singleport=1 ibmtu=4

a run of ib_write_bw -a yields the following:

    ------------------------------------------------------------------
     #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
     1048576   5000           2910.64            229.80
    ------------------------------------------------------------------

The top cpu use in a profile is:

    CPU: Intel Architectural Perfmon, speed 2400.15 MHz (estimated)
    Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask
    of 0x00 (No unit mask) count 1002300
    Counted LLC_MISSES events (Last level cache demand requests from this core that
    missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000
    samples  %        samples  %        app name                 symbol name
    15237    29.2642  964      17.1195  ib_qib.ko                qib_7322intr
    12320    23.6618  1040     18.4692  ib_qib.ko                handle_7322_errors
    4106      7.8860  0              0  vmlinux                  vsnprintf


Analysis of the stats, profile, the code, and the annotated profile indicate:
 - All of the overflow interrupts (one per packet overflow) are
   serviced on CPU0 with no mitigation on the frequency.
 - All of the receive interrupts are being serviced by CPU0.  (That is
   the way truescale.cmds statically allocates the kctx IRQs to CPU)
 - The code is spending all of its time servicing QIB_I_C_ERROR
   RcvEgrFullErr interrupts on CPU0, starving the packet receive
   processing.
 - The decode_err routine is very inefficient, using a printf variant
   to format a "%s" and continues to loop when the errs mask has been
   cleared.
 - Both qib_7322intr and handle_7322_errors read pci registers, which
   is very inefficient.

The fix does the following:
 - Adds a tasklet to service QIB_I_C_ERROR
 - Replaces the very inefficient scnprintf() with a memcpy().  A field
   is added to qib_hwerror_msgs to save the sizeof("string") at
   compile time so that a strlen is not needed during err_decode().
 - The most frequent errors (Overflows) are serviced first to exit the
   loop as early as possible.
 - The loop now exits as soon as the errs mask is clear rather than
   fruitlessly looping through the msp array.

With this fix the performance changes to:

    ------------------------------------------------------------------
     #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
     1048576   5000           2990.64            2941.35
    ------------------------------------------------------------------

During testing of the error handling overflow patch, it was determined
that some CPU's were slower when servicing both overflow and receive
interrupts on CPU0 with different MSI interrupt vectors.

This patch adds an option (krcvq01_no_msi) to not use a dedicated MSI
interrupt for kctx's < 2 and to service them on the default interrupt.
For some CPUs, the cost of the interrupt enter/exit is more costly
than then the additional PCI read in the default handler.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-22 11:56:05 -07:00
Jiri Pirko
7033c4ad87 nes: do vlan cleanup
- unify vlan and nonvlan rx path
- kill nesvnic->vlan_grp and nes_netdev_vlan_rx_register
- allow to turn on/off rx/tx vlan accel via ethtool (set_features)

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-21 13:47:53 -07:00
Manuel Zerpies
3cbe182a5b RDMA/cxgb4: Use printk_ratelimited() instead of printk_ratelimit()
Since printk_ratelimit() shouldn't be used anymore (see comment in
include/linux/printk.h), replace it with printk_ratelimited().

Signed-off-by: Manuel Zerpies <manuel.f.zerpies@ww.stud.uni-erlangen.de>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 21:18:39 -07:00
Or Gerlitz
c37791349c IB/mlx4: Support PMA counters for IBoE
Use the per port counter attached to all QPs created on that port to
implement port level packets/bytes performance counters a la IB.
Derived from a patch by Eli Cohen <eli@mellanox.co.il>

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 21:04:36 -07:00
Or Gerlitz
cfcde11c3d IB/mlx4: Use flow counters on IBoE ports
Allocate flow counter per Ethernet/IBoE port, and attach this counter
to all the QPs created on that port.  Based on patch by Eli Cohen
<eli@mellanox.co.il>.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 21:04:36 -07:00
Or Gerlitz
6aea213a62 IB/pma: Add include file for IBA performance counters definitions
Move the various definitions and mad structures needed for software
implementation of IBA PM agent from the ipath and qib drivers into a
single include file, which in turn could be used by more consumers.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 21:04:35 -07:00
Or Gerlitz
6451c712fe IB/mlx4: Generate GID change events in IBoE code
IBoE doesn't use LIDs.  Use the GID change event to update the IB core
cache for addition/deletion of GIDs.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 21:04:31 -07:00
Or Gerlitz
761d90ed4c IB/core: Add GID change event
Add IB GID change event type.  This is needed for IBoE when the HW
driver updates the GID (e.g when new VLANs are added/deleted) table
and the change should be reflected to the IB core cache.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 21:04:30 -07:00
Moni Shoua
2efdd6a038 RDMA/cma: Don't allow IPoIB port space for IBoE
This patch fixes a kernel crash in cma_set_qkey().

When the link layer is Ethernet, it is wrong to use IPoIB port space
since no IPoIB interface is available.  Specifically, setting the
Q_Key when port space is RDMA_PS_IPOIB requires MGID calculation and
an SA query, which doesn't make sense over Ethernet.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 16:50:25 -07:00
Bart Van Assche
10e1b54bbb RDMA: Allow for NULL .modify_device() and .modify_port() methods
These methods don't make sense for iWARP devices, so rather than
forcing them to implement stubs, just return -ENOSYS in the core if
the hardware driver doesn't set .modify_device and/or .modify_port.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 16:44:30 -07:00
Mitko Haralanov
e800bd032c IB/qib: Update active link width
Update the active link width on QLE7220 chips when link goes down if
chip width does not match shadowed width.

Signed-off-by: Mitko Haralanov <mitko@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 12:09:26 -07:00
Ram Vepa
4356d0b64b IB/qib: Fix potential deadlock with link down interrupt
There is a possibility of a deadlock due to the way locks are
acquired and released in qib_set_uevent_bits(). The function
qib_set_uevent_bits() is called in process context and it uses
spin_lock() and spin_unlock().  This same lock is acquired/released
in interrupt context which can lead to a deadlock when running on
the same cpu.

The fix is to replace spin_lock() and spin_unlock() with
spin_lock_irqsave() and spin_unlock_irqrestore() respectively in
qib_set_uevent_bits().

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 12:09:23 -07:00
Ram Vepa
2df4f7579d IB/qib: Add sysfs interface to read free contexts
Indicate the number of free user contexts via the sysfs file
/sys/class/infiniband/qib0/nfreectxts as required for PSM.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 12:09:19 -07:00
Jon Mason
9b89925c0d IB/mthca: Remove unnecessary read of PCI_CAP_ID_EXP
The PCIE capability offset is saved during PCI bus walking.  It will
remove an unnecessary search in the PCI configuration space if this
value is referenced instead of reacquiring it.  Also, pci_is_pcie is a
better way of determining if the device is PCIE or not (as it uses the
same saved PCIE capability offset).

Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 12:01:22 -07:00
Edwin van Vliet
ac0cae4495 IB/qib: Remove double define
Signed-off-by: Edwin van Vliet <edwin@cheatah.nl>
Reviewed-by: Jesper Juhl <jj@chaosbits.net>
Acked-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 11:59:23 -07:00
Jon Mason
7f27cda037 IB/qib: Remove unnecessary read of PCI_CAP_ID_EXP
The PCIE capability offset is saved during PCI bus walking.  It will
remove an unnecessary search in the PCI configuration space if this
value is referenced instead of reacquiring it.

Signed-off-by: Jon Mason <jdmason@kudzu.us>
Acked-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 11:57:52 -07:00
Motohiro KOSAKI
5763181172 IB/ipath: Convert old cpumask api into new one
Adapt to new api.  We plan to remove old one later.  Almost all
changes are trivial, but there is one real fix: the following code is
unsafe:

	int ncpus = num_online_cpus()
	for (i = 0; i < ncpus; i++) {
		..
	}

because 1) we don't guarantee last bit of online cpus is equal to
num_online_cpus(). some arch assign sparse cpu number.  2) cpu
hotplugging may change cpu_online_mask at same time.  we need to pin
it by get_online_cpus().

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 11:56:18 -07:00
Motohiro KOSAKI
0cd85e6738 IB/qib: Convert old cpumask api into new one
Adapt to use new APIs.  We plan to remove old one later and plan to
change current->cpus_allowed implementation.

No functional change.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 11:56:03 -07:00
Jack Morgenstein
0c9361fcde RDMA/cma: Avoid assigning an IS_ERR value to cm_id pointer in CMA id object
Avoid assigning an IS_ERR value to the cm_id pointer.  This fixes a
few anomalies in the error flow due to confusion about checking for
NULL vs IS_ERR, and eliminates the need to test for the IS_ERR value
every time we wish to determine if the cma_id object has a cm device
associated with it.

Also, eliminate the now-unnecessary procedure cma_has_cm_dev (we can
check directly for the existence of the device pointer -- for a
non-NULL check, makes no difference if it is the iwarp or the ib
pointer).

Finally, make a few code changes here to improve coding consistency.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-18 09:44:55 -07:00
David S. Miller
69cce1d140 net: Abstract dst->neighbour accesses behind helpers.
dst_{get,set}_neighbour()

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-17 23:11:35 -07:00
Goldwyn Rodrigues
cdb73db0b6 IB/mthca: Stop returning separate error and status from FW commands
Instead of having firmware command functions return an error and also
a status, leading to code like:

	err = mthca_FW_COMMAND(..., &status);
	if (err)
		goto out;
        if (status) {
		err = -E...;
		goto out;
	}

all over the place, just handle the FW status inside the FW command
handling code (the way mlx4 does it), so we can simply write:

	err = mthca_FW_COMMAND(...);
	if (err)
		goto out;

In addition to simplifying the source code, this also saves a healthy
chunk of text:

    add/remove: 0/0 grow/shrink: 10/88 up/down: 510/-3357 (-2847)
    function                                     old     new   delta
    static.trans_table                           324     584    +260
    mthca_cmd_poll                               352     477    +125
    mthca_cmd_wait                               511     567     +56
    mthca_table_put                              213     240     +27
    mthca_cleanup_db_tab                         372     387     +15
    __mthca_remove_one                           314     323      +9
    mthca_cleanup_user_db_tab                    275     283      +8
    __mthca_init_one                            1738    1746      +8
    mthca_cleanup                                 20      21      +1
    mthca_MAD_IFC                               1081    1082      +1
    mthca_MGID_HASH                               43      40      -3
    mthca_MAP_ICM_AUX                             23      20      -3
    mthca_MAP_ICM                                 19      16      -3
    mthca_MAP_FA                                  23      20      -3
    mthca_READ_MGM                                43      38      -5
    mthca_QUERY_SRQ                               43      38      -5
    mthca_QUERY_QP                                59      54      -5
    mthca_HW2SW_SRQ                               43      38      -5
    mthca_HW2SW_MPT                               60      55      -5
    mthca_HW2SW_EQ                                43      38      -5
    mthca_HW2SW_CQ                                43      38      -5
    mthca_free_icm_table                         120     114      -6
    mthca_query_srq                              214     206      -8
    mthca_free_qp                                662     654      -8
    mthca_cmd                                     38      28     -10
    mthca_alloc_db                              1321    1311     -10
    mthca_setup_hca                             1067    1055     -12
    mthca_WRITE_MTT                               35      22     -13
    mthca_WRITE_MGM                               40      27     -13
    mthca_UNMAP_ICM_AUX                           36      23     -13
    mthca_UNMAP_FA                                36      23     -13
    mthca_SYS_DIS                                 36      23     -13
    mthca_SYNC_TPT                                36      23     -13
    mthca_SW2HW_SRQ                               35      22     -13
    mthca_SW2HW_MPT                               35      22     -13
    mthca_SW2HW_EQ                                35      22     -13
    mthca_SW2HW_CQ                                35      22     -13
    mthca_RUN_FW                                  36      23     -13
    mthca_DISABLE_LAM                             36      23     -13
    mthca_CLOSE_IB                                36      23     -13
    mthca_CLOSE_HCA                               38      25     -13
    mthca_ARM_SRQ                                 39      26     -13
    mthca_free_icms                              178     164     -14
    mthca_QUERY_DDR                              389     375     -14
    mthca_resize_cq                             1063    1048     -15
    mthca_unmap_eq_icm                           123     107     -16
    mthca_map_eq_icm                             396     380     -16
    mthca_cmd_box                                 90      74     -16
    mthca_SET_IB                                 433     417     -16
    mthca_RESIZE_CQ                              369     353     -16
    mthca_MAP_ICM_page                           240     224     -16
    mthca_MAP_EQ                                 183     167     -16
    mthca_INIT_IB                                473     457     -16
    mthca_INIT_HCA                               745     729     -16
    mthca_map_user_db                            816     798     -18
    mthca_SYS_EN                                 157     139     -18
    mthca_cleanup_qp_table                        78      59     -19
    mthca_cleanup_eq_table                       168     149     -19
    mthca_UNMAP_ICM                              143     121     -22
    mthca_modify_srq                             172     149     -23
    mthca_unmap_fmr                              198     174     -24
    mthca_query_qp                               814     790     -24
    mthca_query_pkey                             343     319     -24
    mthca_SET_ICM_SIZE                            34      10     -24
    mthca_QUERY_DEV_LIM                         1870    1846     -24
    mthca_map_cmd                               1130    1105     -25
    mthca_ENABLE_LAM                             401     375     -26
    mthca_modify_port                            247     220     -27
    mthca_query_device                           884     850     -34
    mthca_NOP                                     75      41     -34
    mthca_table_get                              287     249     -38
    mthca_init_qp_table                          333     293     -40
    mthca_MODIFY_QP                              348     308     -40
    mthca_close_hca                              131      89     -42
    mthca_free_eq                                435     390     -45
    mthca_query_port                             755     705     -50
    mthca_free_cq                                581     528     -53
    mthca_alloc_icm_table                        578     524     -54
    mthca_multicast_attach                      1041     986     -55
    mthca_init_hca                               326     271     -55
    mthca_query_gid                              487     431     -56
    mthca_free_srq                               524     468     -56
    mthca_free_mr                                168     111     -57
    mthca_create_eq                             1560    1501     -59
    mthca_multicast_detach                       790     728     -62
    mthca_write_mtt                              918     854     -64
    mthca_register_device                       1406    1342     -64
    mthca_fmr_alloc                              947     883     -64
    mthca_mr_alloc                               652     582     -70
    mthca_process_mad                           1242    1164     -78
    mthca_dev_lim                                910     830     -80
    find_mgm                                     482     400     -82
    mthca_modify_qp                             3852    3753     -99
    mthca_init_cq                               1281    1181    -100
    mthca_alloc_srq                             1719    1610    -109
    mthca_init_eq_table                         1807    1679    -128
    mthca_init_tavor                             761     491    -270
    mthca_init_arbel                            2617    2098    -519

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de>
2011-07-15 13:33:20 -07:00
David S. Miller
6a7ebdf2fd Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/bluetooth/l2cap_core.c
2011-07-14 07:56:40 -07:00
Bart Van Assche
fd1b6c4a69 IB/srp: Avoid duplicate devices from LUN scan
SCSI scanning of a channel🆔lun triplet in Linux works as follows
(function scsi_scan_target() in drivers/scsi/scsi_scan.c):

- If lun == SCAN_WILD_CARD, send a REPORT LUNS command to the target
  and process the result.

- If lun != SCAN_WILD_CARD, send an INQUIRY command to the LUN
  corresponding to the specified channel🆔lun triplet to verify
  whether the LUN exists.

So a SCSI driver must either take the channel and target id values in
account in its quecommand() function or it should declare that it only
supports one channel and one target id.

Currently the ib_srp driver does neither.  As a result scanning the
SCSI bus via e.g. rescan-scsi-bus.sh causes many duplicate SCSI
devices to be created. For each 0:0:L device, several duplicates are
created with the same LUN number and with (C:I) != (0:0). Fix this by
declaring that the ib_srp driver only supports one channel and one
target id.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@kernel.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-13 09:19:16 -07:00
David S. Miller
e12fe68ce3 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2011-07-05 23:23:37 -07:00
Goldwyn Rodrigues
b2bc478219 RDMA: Check for NULL mode in .devnode methods
Commits 71c29bd5c2 ("IB/uverbs: Add devnode method to set path/mode")
and c3af0980ce ("IB: Add devnode methods to cm_class and umad_class")
added devnode methods that set the mode.

However, these methods don't check for a NULL mode, and so we get a
crash when unloading modules because devtmpfs_delete_node() calls
device_get_devnode() with mode == NULL.

Add the missing checks.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de>
[ Also fix cm.c.  - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-04 15:53:28 -07:00
Roland Dreier
c7d74b0909 Merge branches 'cxgb4' and 'qib' into for-next 2011-06-17 11:57:55 -07:00
Mitko Haralanov
3126448451 IB/qib: Ensure that LOS and DFE are being turned off
Due to timing, it is possible for the LOS and DFE to remain on. This
is due to the link progressing to LinkUP prior to the driver getting
the first Status Changed interrupt.  By expanding the conditions under
which LOS is turned off and DFE timeout is being set, timing is no
longer an issue.

Signed-off-by: Mitko Haralanov <mitko@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-06-17 11:56:59 -07:00
Steve Wise
8da7e7a552 RDMA/cxgb4: Couple of abort fixes
- fix a race where the driver could end up sending a close_con_req
  after an abort_rpl.  In c4iw_ep_disconnect(), send abort or close
  request with the ep mutex held.

- fix a hang where driver fails to wake up when a connection is reset
  during a normal close.  Wake up any waiters in the interrupt path,
  and correctly cleanup after rdma_fini() failures.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-06-17 11:54:56 -07:00
Steve Wise
301c2c3f03 RDMA/cxgb4: Don't truncate MR lengths
Remove left-over code from T3 that limited MR sizes to 32b.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-06-17 11:54:50 -07:00
Steve Wise
2ff7d09a1b RDMA/cxgb4: Don't exceed hw IQ depth limit for user CQs
Memory allocated for user CQs gets rounded up to the next page
boundary.  And after rounding, we recalculate the resulting IQ depth
and we need to make sure we don't exceed the HW limits.

This bug can result a much smaller CQ allocated than was expected if
the HW size field is exceeded, resulting in CQ overflow failures.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-06-17 11:52:45 -07:00
Greg Rose
c7ac8679be rtnetlink: Compute and store minimum ifinfo dump size
The message size allocated for rtnl ifinfo dumps was limited to
a single page.  This is not enough for additional interface info
available with devices that support SR-IOV and caused a bug in
which VF info would not be displayed if more than approximately
40 VFs were created per interface.

Implement a new function pointer for the rtnl_register service that will
calculate the amount of data required for the ifinfo dump and allocate
enough data to satisfy the request.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-06-09 20:38:07 -07:00
Alexey Dobriyan
a6b7a40786 net: remove interrupt.h inclusion from netdevice.h
* remove interrupt.g inclusion from netdevice.h -- not needed
* fixup fallout, add interrupt.h and hardirq.h back where needed.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-06 22:55:11 -07:00
Linus Torvalds
4c171acc20 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  RDMA/cma: Save PID of ID's owner
  RDMA/cma: Add support for netlink statistics export
  RDMA/cma: Pass QP type into rdma_create_id()
  RDMA: Update exported headers list
  RDMA/cma: Export enum cma_state in <rdma/rdma_cm.h>
  RDMA/nes: Add a check for strict_strtoul()
  RDMA/cxgb3: Don't post zero-byte read if endpoint is going away
  RDMA/cxgb4: Use completion objects for event blocking
  IB/srp: Fix integer -> pointer cast warnings
  IB: Add devnode methods to cm_class and umad_class
  IB/mad: Return EPROTONOSUPPORT when an RDMA device lacks the QP required
  IB/uverbs: Add devnode method to set path/mode
  RDMA/ucma: Add .nodename/.mode to tell userspace where to create device node
  RDMA: Add netlink infrastructure
  RDMA: Add error handling to ib_core_init()
2011-05-26 12:13:57 -07:00
Roland Dreier
8dc4abdf4c Merge branches 'cma', 'cxgb3', 'cxgb4', 'misc', 'nes', 'netlink', 'srp' and 'uverbs' into for-next 2011-05-25 13:47:20 -07:00
Nir Muchtar
83e9502d8d RDMA/cma: Save PID of ID's owner
Save the PID associated with an RDMA CM ID for reporting via netlink.

Signed-off-by: Nir Muchtar <nirm@voltaire.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-25 13:46:23 -07:00
Nir Muchtar
753f618ae0 RDMA/cma: Add support for netlink statistics export
Add callbacks and data types for statistics export of all current
devices/ids.  The schema for RDMA CM is a series of netlink messages.
Each one contains an rdma_cm_stat struct.  Additionally, two netlink
attributes are created for the addresses for each message (if
applicable).

Their types used are:
RDMA_NL_RDMA_CM_ATTR_SRC_ADDR (The source address for this ID)
RDMA_NL_RDMA_CM_ATTR_DST_ADDR (The destination address for this ID)
sockaddr_* structs are encapsulated within these attributes.

In other words, every transaction contains a series of messages like:

-------message 1-------
struct rdma_cm_id_stats {
       __u32 qp_num;
       __u32 bound_dev_if;
       __u32 port_space;
       __s32 pid;
       __u8 cm_state;
       __u8 node_type;
       __u8 port_num;
       __u8 reserved;
}
RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute - contains the source address
RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute - contains the destination address
-------end 1-------
-------message 2-------
struct rdma_cm_id_stats
RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute
RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute
-------end 2-------

Signed-off-by: Nir Muchtar <nirm@voltaire.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-25 13:46:23 -07:00
Sean Hefty
b26f9b9949 RDMA/cma: Pass QP type into rdma_create_id()
The RDMA CM currently infers the QP type from the port space selected
by the user.  In the future (eg with RDMA_PS_IB or XRC), there may not
be a 1-1 correspondence between port space and QP type.  For netlink
export of RDMA CM state, we want to export the QP type to userspace,
so it is cleaner to explicitly associate a QP type to an ID.

Modify rdma_create_id() to allow the user to specify the QP type, and
use it to make our selections of datagram versus connected mode.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-25 13:46:23 -07:00
Nir Muchtar
550e5ca77e RDMA/cma: Export enum cma_state in <rdma/rdma_cm.h>
Move cma.c's internal definition of enum cma_state to enum rdma_cm_state
in an exported header so that it can be exported via RDMA netlink.

Signed-off-by: Nir Muchtar <nirm@voltaire.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-25 13:46:22 -07:00
Liu Yuan
52f81dbaf1 RDMA/nes: Add a check for strict_strtoul()
It should check if strict_strtoul() succeeds before using
'wqm_quanta_value'.

Signed-off-by: Liu Yuan <tailai.ly@taobao.com>

[ Convert to kstrtoul() directly while we're here.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-24 10:06:25 -07:00
Steve Wise
807838686e RDMA/cxgb3: Don't post zero-byte read if endpoint is going away
tx_ack() wasn't checking the endpoint state and consequently would
attempt to post the p2p 0B read on an endpoint/QP that is closing or
aborting.  This causes a NULL pointer dereference crash.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-24 10:01:04 -07:00
Steve Wise
c337374bf2 RDMA/cxgb4: Use completion objects for event blocking
There exists a race condition when using wait_queue_head_t objects
that are declared on the stack.  This was being done in a few places
where we are sending work requests to the FW and awaiting replies, but
we don't have an endpoint structure with an embedded c4iw_wr_wait
struct.  So the code was allocating it locally on the stack.  Bad
design.  The race is:

  1) thread on cpuX declares the wait_queue_head_t on the stack, then
     posts a firmware WR with that wait object ptr as the cookie to be
     returned in the WR reply.  This thread will proceed to block in
     wait_event_timeout() but before it does:

  2) An interrupt runs on cpuY with the WR reply.  fw6_msg() handles
     this and calls c4iw_wake_up().  c4iw_wake_up() sets the condition
     variable in the c4iw_wr_wait object to TRUE and will call
     wake_up(), but before it calls wake_up():

  3) The thread on cpuX calls c4iw_wait_for_reply(), which calls
     wait_event_timeout().  The wait_event_timeout() macro checks the
     condition variable and returns immediately since it is TRUE.  So
     this thread never blocks/sleeps. The function then returns
     effectively deallocating the c4iw_wr_wait object that was on the
     stack.

  4) So at this point cpuY has a pointer to the c4iw_wr_wait object
     that is no longer valid.  Further its pointing to a stack frame
     that might now be in use by some other context/thread.  So cpuY
     continues execution and calls wake_up() on a ptr to a wait object
     that as been effectively deallocated.

This race, when it hits, can cause a crash in wake_up(), which I've
seen under heavy stress. It can also corrupt the referenced stack
which can cause any number of failures.

The fix:

Use struct completion, which supports on-stack declarations.
Completions use a spinlock around setting the condition to true and
the wake up so that steps 2 and 4 above are atomic and step 3 can
never happen in-between.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
2011-05-24 09:47:38 -07:00
Roland Dreier
737b94eb41 IB/srp: Fix integer -> pointer cast warnings
Fix

    drivers/infiniband/ulp/srp/ib_srp.c: In function 'srp_handle_recv':
    drivers/infiniband/ulp/srp/ib_srp.c:1150: warning: cast to pointer from integer of different size
    drivers/infiniband/ulp/srp/ib_srp.c: In function 'srp_send_completion':
    drivers/infiniband/ulp/srp/ib_srp.c🔢 warning: cast to pointer from integer of different size

by adding an intermediate cast to uintptr_t.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Acked-by: David Dillow <dillowda@ornl.gov>
2011-05-23 11:30:04 -07:00
Roland Dreier
c3af0980ce IB: Add devnode methods to cm_class and umad_class
We want the ucmX, umadX and issmX device nodes to show up under
/dev/infiniband, and additionally ucmX should have mode 0666.  Add
appropriate devnode methods to their class structs for this.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-23 11:24:28 -07:00
Ira Weiny
c8367c4cd9 IB/mad: Return EPROTONOSUPPORT when an RDMA device lacks the QP required
We had a script which was looping through the devices returned from
ibstat and attempted to register a SMI agent on an ethernet device.
This caused a kernel panic for IBoE devices that don't have QP0.

Fix this by checking if the QP exists before using it.

Signed-off-by: Ira Weiny <weiny2@llnl.gov>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-23 11:15:48 -07:00
Roland Dreier
71c29bd5c2 IB/uverbs: Add devnode method to set path/mode
We want udev to create a device node under /dev/infiniband with
permission 0666 for uverbsX devices, so add a devnode method to set the
appropriate info.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-23 11:10:05 -07:00
Roland Dreier
04ea2f8197 RDMA/ucma: Add .nodename/.mode to tell userspace where to create device node
We want udev to create a device node under /dev/infiniband with
permission 0666 for rdma_cm, so add that info to our struct miscdevice.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
2011-05-23 10:48:43 -07:00
Paul Gortmaker
70c7160619 Add appropriate <linux/prefetch.h> include for prefetch users
After discovering that wide use of prefetch on modern CPUs
could be a net loss instead of a win, net drivers which were
relying on the implicit inclusion of prefetch.h via the list
headers showed up in the resulting cleanup fallout.  Give
them an explicit include via the following $0.02 script.

 =========================================
 #!/bin/bash
 MANUAL=""
 for i in `git grep -l 'prefetch(.*)' .` ; do
 	grep -q '<linux/prefetch.h>' $i
 	if [ $? = 0 ] ; then
 		continue
 	fi

 	(	echo '?^#include <linux/?a'
 		echo '#include <linux/prefetch.h>'
 		echo .
 		echo w
 		echo q
 	) | ed -s $i > /dev/null 2>&1
 	if [ $? != 0 ]; then
 		echo $i needs manual fixup
 		MANUAL="$i $MANUAL"
 	fi
 done
 echo ------------------- 8\<----------------------
 echo vi $MANUAL
 =========================================

Signed-off-by: Paul <paul.gortmaker@windriver.com>
[ Fixed up some incorrect #include placements, and added some
  non-network drivers and the fib_trie.c case    - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-22 21:41:57 -07:00