Commit Graph

3948 Commits

Author SHA1 Message Date
Linus Torvalds
1d21b1bf53 Main batch of InfiniBand/RDMA changes for 3.16:

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull main InfiniBand/RDMA updates from Roland Dreier:

 - add iWARP port mapper to avoid conflicts between RDMA and normal
   stack TCP connections.

 - fixes for i386 / x86-64 structure padding differences (ABI
   compatibility for 32-on-64) from Yann Droneaud.

 - a pile of SRP initiator fixes from Bart Van Assche.

 - fixes for a writeback / memory allocation deadlock with NFS over
   IPoIB connected mode from Jiri Kosina.

 - the usual fixes and cleanups to mlx4, mlx5, cxgb4 and other low-level
   drivers.

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (61 commits)
  RDMA/cxgb4: Add support for iWARP Port Mapper user space service
  RDMA/nes: Add support for iWARP Port Mapper user space service
  RDMA/core: Add support for iWARP Port Mapper user space service
  IB/mlx4: Fix gfp passing in create_qp_common()
  IB/umad: Fix use-after-free on close
  IB/core: Fix kobject leak on device register error flow
  RDMA/cxgb4: add missing padding at end of struct c4iw_alloc_ucontext_resp
  mlx4_core: Fix GFP flags parameters to be gfp_t
  IB/core: Fix port kobject deletion during error flow
  IB/core: Remove unneeded kobject_get/put calls
  IB/core: Fix sparse warnings about redeclared functions
  IB/mad: Fix sparse warning about gfp_t use
  IB/mlx4: Implement IB_QP_CREATE_USE_GFP_NOIO
  IB: Add a QP creation flag to use GFP_NOIO allocations
  IB: Return error for unsupported QP creation flags
  IB: Allow build of hw/ and ulp/ subdirectories independently
  mlx4_core: Move handling of MLX4_QP_ST_MLX to proper switch statement
  RDMA/cxgb4: Add missing padding at end of struct c4iw_create_cq_resp
  IB/srp: Avoid problems if a header uses pr_fmt
  IB/umad: Fix error handling
  ...
2014-06-10 10:41:33 -07:00
Roland Dreier
eeaddf3670 Merge branches 'core', 'cxgb3', 'cxgb4', 'iser', 'iwpm', 'misc', 'mlx4', 'mlx5', 'noio', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next 2014-06-10 10:12:14 -07:00
Steve Wise
9eccfe109b RDMA/cxgb4: Add support for iWARP Port Mapper user space service
Based on original work by Vipul Pandya.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>

[ Fix htons -> ntohs to make sparse happy.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-10 10:12:06 -07:00
Tatyana Nikolova
5647263cb1 RDMA/nes: Add support for iWARP Port Mapper user space service
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-10 10:12:06 -07:00
Tatyana Nikolova
30dc5e63d6 RDMA/core: Add support for iWARP Port Mapper user space service
This patch adds iWARP Port Mapper (IWPM) Version 2 support.  The iWARP
Port Mapper implementation is based on the port mapper specification
section in the Sockets Direct Protocol paper -
http://www.rdmaconsortium.org/home/draft-pinkerton-iwarp-sdp-v1.0.pdf

Existing iWARP RDMA providers use the same IP address as the native
TCP/IP stack when creating RDMA connections.  They need a mechanism to
claim the TCP ports used for RDMA connections to prevent TCP port
collisions when other host applications use TCP ports.  The iWARP Port
Mapper provides a standard mechanism to accomplish this.  Without this
service it is possible for an RDMA application to bind/listen on the same
port that is already being used by a native TCP host application.  If
that happens, the incoming TCP connection data can be passed to the
RDMA stack in error.

The iWARP Port Mapper solution doesn't contain any changes to the
existing network stack in the kernel space.  All the changes are
contained within the InfiniBand tree and in user space.

The iWARP Port Mapper service is implemented as a user space daemon
process.  Source for the IWPM service is located at
http://git.openfabrics.org/git?p=~tnikolova/libiwpm-1.0.0/.git;a=summary

When starting a connection, the iWARP driver (port mapper client) sends
the IWPM service the local IP address and TCP port it has received from
the RDMA application.  The IWPM service performs a socket bind from user
space to get an available TCP port, called a mapped port, and
communicates it back to the client.  In that sense, the IWPM service
maps the TCP port that the RDMA application uses to any port available
in the host TCP port space.  The mapped ports are used in iWARP RDMA
connections to avoid collisions with the native TCP stack, which is
aware that these ports are taken.  When an RDMA connection using a
mapped port is terminated, the client notifies the IWPM service, which
then releases the TCP port.
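The mapping step can be sketched in userspace C using POSIX sockets. The daemon binds a TCP socket to port 0 so that the host stack picks a free port. It reads that port back with getsockname() and keeps the socket bound until the mapping is released. The struct port_mapping type and the map_port()/unmap_port() helpers are hypothetical and are not taken from libiwpm. The netlink exchange with the driver is omitted.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical mapping record; not the libiwpm data structure. */
struct port_mapping {
	int fd;               /* kept bound so the native stack sees the port as taken */
	uint16_t local_port;  /* port requested by the RDMA application (host order) */
	uint16_t mapped_port; /* port reserved from the host TCP port space (host order) */
};

/* Reserve a free TCP port on local_ip: bind to port 0, let the kernel
 * choose one, then read it back with getsockname(). Returns 0 or -1. */
static int map_port(const char *local_ip, uint16_t local_port,
		    struct port_mapping *m)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);

	m->fd = -1;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(0);	/* 0 = any free port */
	if (inet_pton(AF_INET, local_ip, &addr.sin_addr) != 1)
		return -1;

	m->fd = socket(AF_INET, SOCK_STREAM, 0);
	if (m->fd < 0)
		return -1;
	if (bind(m->fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    getsockname(m->fd, (struct sockaddr *)&addr, &len) < 0) {
		close(m->fd);
		m->fd = -1;
		return -1;
	}
	m->local_port = local_port;
	m->mapped_port = ntohs(addr.sin_port);
	return 0;
}

/* Release the reservation once the RDMA connection is torn down. */
static void unmap_port(struct port_mapping *m)
{
	if (m->fd >= 0)
		close(m->fd);
	m->fd = -1;
}
```

The socket stays bound rather than being closed right after the lookup. A bound socket without SO_REUSEADDR is what makes the native TCP stack refuse to hand out the same port again.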

The message exchange between the IWPM service and the iWARP drivers
(between user space and kernel space) is implemented using netlink
sockets.

1) Netlink interface functions are added: ibnl_unicast() and
   ibnl_multicast() for sending netlink messages to user space

2) The signature of the existing ibnl_put_msg() is changed to be more
   generic

3) Two netlink clients are added, RDMA_NL_NES and RDMA_NL_C4IW,
   corresponding to the two iWARP drivers, nes and cxgb4, which use
   the IWPM service

4) Enums are added to enumerate the attributes in the netlink
   messages, which are exchanged between the user space IWPM service
   and the iWARP drivers

Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: PJ Waskiewicz <pj.waskiewicz@solidfire.com>

[ Fold in range checking fixes and nlh_next removal as suggested by Dan
  Carpenter and Steve Wise.  Fix sparse endianness in hash.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-10 10:11:45 -07:00
Jiri Kosina
6fcd8d0d93 IB/mlx4: Fix gfp passing in create_qp_common()
Two kzalloc() calls were not converted by 40f2287bd5 ("IB/mlx4:
Implement IB_QP_CREATE_USE_GFP_NOIO") to use the gfp value passed to
create_qp_common(); they still use a hardcoded GFP_KERNEL.  Fix this
by passing the gfp value down properly.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-09 10:17:12 -07:00
Linus Torvalds
052e5c7e28 Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "Here are the remaining fixes for v3.15.

  This series includes:

   - iser-target fix for ImmediateData exception reference count bug
     (Sagi + nab)
   - iscsi-target fix for MC/S login + potential iser-target MRDSL
     buffer overrun (Santosh + Roland)
   - iser-target fix for v3.15-rc multi network portal shutdown
     regression (nab)
   - target fix for allowing READ_CAPACITY during ALUA Standby access
     state (Chris + nab)
   - target fix for NULL pointer dereference of alua_access_state for
     un-configured devices (Chris + nab)"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  target: Fix alua_access_state attribute OOPs for un-configured devices
  target: Allow READ_CAPACITY opcode in ALUA Standby access state
  iser-target: Fix multi network portal shutdown regression
  iscsi-target: Fix wrong buffer / buffer overrun in iscsi_change_param_value()
  iser-target: Add missing target_put_sess_cmd for ImmediateData failure
2014-06-07 15:01:39 -07:00
Bart Van Assche
60e1751cb5 IB/umad: Fix use-after-free on close
Prevent closing /dev/infiniband/umad<n> or /dev/infiniband/issm<n>
from triggering a use-after-free.  __fput() invokes f_op->release()
before it invokes cdev_put().  Make sure that the ib_umad_device
structure is freed by the cdev_put() call instead of f_op->release().
This prevents the following kernel oops, which is triggered by
changing the port mode from IB to Ethernet and back to IB and then
restarting opensmd:

    general protection fault: 0000 [#1] PREEMPT SMP
    RIP: 0010:[<ffffffff810cc65c>]  [<ffffffff810cc65c>] module_put+0x2c/0x170
    Call Trace:
     [<ffffffff81190f20>] cdev_put+0x20/0x30
     [<ffffffff8118e2ce>] __fput+0x1ae/0x1f0
     [<ffffffff8118e35e>] ____fput+0xe/0x10
     [<ffffffff810723bc>] task_work_run+0xac/0xe0
     [<ffffffff81002a9f>] do_notify_resume+0x9f/0xc0
     [<ffffffff814b8398>] int_signal+0x12/0x17

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=75051
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
Cc: <stable@vger.kernel.org> # 3.x: 8ec0a0e6b5: IB/umad: Fix error handling
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-06 11:38:31 -07:00
Haggai Eran
584482ac80 IB/core: Fix kobject leak on device register error flow
The ports kobject isn't being released during error flow in device
registration.  This patch refactors the ports kobject cleanup into a
single function called from both the error flow in device registration
and from the unregistration function.

A couple of attributes aren't being deleted (iw_stats_group, and
ib_class_attributes).  While this may be handled implicitly by the
destruction of their kobjects, it seems better to handle all the
attributes the same way.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>

[ Make free_port_list_attributes() static.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-05 09:37:10 -07:00
Yann Droneaud
b7dfa8895f RDMA/cxgb4: add missing padding at end of struct c4iw_alloc_ucontext_resp
The i386 ABI disagrees with most other ABIs regarding alignment of
data types larger than 4 bytes: on most ABIs, padding must be added
at the end of the structure, while it is not required on i386.

So on most ABIs, struct c4iw_alloc_ucontext_resp is implicitly padded
to a multiple of 8 bytes, while on i386 no such padding is added.
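The effect can be checked directly in C. The sketch below uses <stdint.h> types in place of __u64/__u32 and mirrors the two members of the unpatched struct. It then measures how much tail padding the compiler adds. The names resp_implicit and tail_padding() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same members as the unpatched struct c4iw_alloc_ucontext_resp. */
struct resp_implicit {
	uint64_t status_page_key;
	uint32_t status_page_size;
};

/* Bytes the compiler appends after the last member: expected to be 4
 * where uint64_t is 8-byte aligned in structs (x86_64), 0 on i386. */
static size_t tail_padding(void)
{
	size_t end = offsetof(struct resp_implicit, status_page_size) +
		     sizeof(uint32_t);

	return sizeof(struct resp_implicit) - end;
}
```

Built with gcc -m64 on x86_64, tail_padding() should return 4. Built with -m32 it should return 0. This matches the 16 versus 12 byte sizes in the pahole output.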

The tool pahole can be used to find such implicit padding:

  $ pahole --anon_include \
           --nested_anon_include \
           --recursive \
           --class_name c4iw_alloc_ucontext_resp \
           drivers/infiniband/hw/cxgb4/iw_cxgb4.o

Then, the structure layout can be compared between i386 and x86_64:

  --- obj-i386/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt   2014-03-28 11:43:05.547432195 +0100
  +++ obj-x86_64/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 10:55:10.990133017 +0100
  @@ -2,9 +2,8 @@ struct c4iw_alloc_ucontext_resp {
          __u64                      status_page_key;      /*     0     8 */
          __u32                      status_page_size;     /*     8     4 */

  -       /* size: 12, cachelines: 1, members: 2 */
  -       /* last cacheline: 12 bytes */
  +       /* size: 16, cachelines: 1, members: 2 */
  +       /* padding: 4 */
  +       /* last cacheline: 16 bytes */
   };

This ABI disagreement will make an x86_64 kernel try to write past the
buffer provided by an i386 binary.

Once boundary checking is implemented, the x86_64 kernel will refuse
to write past the buffer provided by i386 userspace, and the uverb
will fail.

If the structure is on a page boundary and the next page is not
mapped, ib_copy_to_udata() will fail and the uverb will fail.

Additionally, as reported by Dan Carpenter, if the implicit padding
is not properly cleared, an information leak occurs on most
architectures.

This patch adds explicit padding to struct c4iw_alloc_ucontext_resp
and, like 92b0ca7cb1 ("IB/mlx5: Fix stack info leak in
mlx5_ib_alloc_ucontext()"), makes c4iw_alloc_ucontext() not write
this padding field to userspace.  This way, an x86_64 kernel will be
able to write struct c4iw_alloc_ucontext_resp as expected by both
unpatched and patched i386 libcxgb4.
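Here is a userspace sketch of the approach, using <stdint.h> types. The explicit reserved member makes the struct 16 bytes on every ABI. Only the bytes before reserved (12 of them) are copied out, so the write fits the buffer an unpatched i386 libcxgb4 provides. copy_to_user_buf() is a hypothetical stand-in for ib_copy_to_udata() that simply refuses oversized writes. struct resp_padded and send_resp() are also illustrative, not the driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout after the fix: explicit trailing field, 16 bytes on every ABI. */
struct resp_padded {
	uint64_t status_page_key;
	uint32_t status_page_size;
	uint32_t reserved;	/* explicit padding, never copied out */
};

/* Hypothetical stand-in for ib_copy_to_udata(): fails if asked to
 * write more than the destination buffer holds. */
static int copy_to_user_buf(void *dst, size_t dst_len,
			    const void *src, size_t len)
{
	if (len > dst_len)
		return -1;
	memcpy(dst, src, len);
	return 0;
}

/* Copy everything except the reserved field. */
static int send_resp(void *dst, size_t dst_len, const struct resp_padded *r)
{
	return copy_to_user_buf(dst, dst_len, r,
				sizeof(*r) - sizeof(r->reserved));
}
```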

Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com
Link: http://marc.info/?i=1395848977.3297.15.camel@localhost.localdomain
Link: http://marc.info/?i=20140328082428.GH25192@mwanda
Cc: <stable@vger.kernel.org>
Fixes: 05eb23893c ("cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes")
Reported-by: Yann Droneaud <ydroneaud@opteya.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-05 09:13:54 -07:00
Haggai Eran
cad6d02acc IB/core: Fix port kobject deletion during error flow
When add_port(), which adds a port to sysfs, encounters an error, the
port kobject is freed without being deleted from sysfs.

Instead of freeing it directly, this patch uses kobject_put() to
release the kobject and delete it.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-04 10:03:49 -07:00
Haggai Eran
373c0ea181 IB/core: Remove unneeded kobject_get/put calls
The ib_core module calls kobject_get() on the parent object of each
kobject it creates.  This is redundant, since kobject_add() does that
anyway.

As a side effect, this patch should fix leaking the ports kobject and
the device kobject during unregister flow, since the previous code
didn't seem to take into account the kobject_get calls on behalf of
the child kobjects.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-04 10:03:49 -07:00
Roland Dreier
8385fd8414 IB/core: Fix sparse warnings about redeclared functions
Fix a few functions that are declared with __attribute_const__ in the
ib_verbs.h header file but defined without it in verbs.c.  This gets rid
of the following sparse warnings:

    drivers/infiniband/core/verbs.c:51:5: error: symbol 'ib_rate_to_mult' redeclared with different type (originally declared at include/rdma/ib_verbs.h:469) - different modifiers
    drivers/infiniband/core/verbs.c:68:14: error: symbol 'mult_to_ib_rate' redeclared with different type (originally declared at include/rdma/ib_verbs.h:607) - different modifiers
    drivers/infiniband/core/verbs.c:85:5: error: symbol 'ib_rate_to_mbps' redeclared with different type (originally declared at include/rdma/ib_verbs.h:476) - different modifiers
    drivers/infiniband/core/verbs.c:111:1: error: symbol 'rdma_node_get_transport' redeclared with different type (originally declared at include/rdma/ib_verbs.h:84) - different modifiers

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-04 10:01:42 -07:00
Nicholas Bellinger
6cc44a6fb4 iser-target: Add missing target_put_sess_cmd for ImmediateData failure
This patch addresses a bug where an early exception for SCSI WRITE
with ImmediateData=Yes was missing the target_put_sess_cmd() call
to drop the extra se_cmd->cmd_kref reference obtained during the
normal iscsit_setup_scsi_cmd() codepath execution.

This bug was manifesting itself during session shutdown within
isert_cq_rx_comp_err() where target_wait_for_sess_cmds() would
end up waiting indefinitely for the last se_cmd->cmd_kref put to
occur for the failed SCSI WRITE + ImmediateData descriptors.

This fix follows what traditional iscsi-target code already does
for the same failure case within iscsit_get_immediate_data().

Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Cc: Sagi Grimberg <sagig@dev.mellanox.co.il>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-03 19:17:31 -07:00
Roland Dreier
5343c00dd0 IB/mad: Fix sparse warning about gfp_t use
Properly convert gfp_t & result to bool to fix:

    drivers/infiniband/core/sa_query.c:621:33: warning: incorrect type in initializer (different base types)
    drivers/infiniband/core/sa_query.c:621:33:    expected bool [unsigned] [usertype] preload
    drivers/infiniband/core/sa_query.c:621:33:    got restricted gfp_t

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-03 10:24:24 -07:00
Jiri Kosina
40f2287bd5 IB/mlx4: Implement IB_QP_CREATE_USE_GFP_NOIO
Modify the various routines used to allocate memory resources which
serve QPs in mlx4 to take an input GFP directive.  Have the Ethernet
driver use GFP_KERNEL in its QP allocations, as done prior to this
commit, and the IB driver use GFP_NOIO when the IB verbs
IB_QP_CREATE_USE_GFP_NOIO QP creation flag is provided.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-02 14:58:11 -07:00
Or Gerlitz
09b93088d7 IB: Add a QP creation flag to use GFP_NOIO allocations
This addresses a problem where NFS client writes over IPoIB connected
mode may deadlock on memory allocation/writeback.

The problem is not directly memory reclamation.  There is an indirect
dependency between network filesystems writing back pages and
ipoib_cm_tx_init() due to how a kworker is used.  Page reclaim cannot
make forward progress until ipoib_cm_tx_init() succeeds and it is
stuck in page reclaim itself waiting for network transmission.
Ordinarily this situation may be avoided by having the caller use
GFP_NOFS but ipoib_cm_tx_init() does not have that information.

To address this, take a general approach and add a new QP creation
flag that tells the low-level hardware driver to use GFP_NOIO for the
memory allocations related to the new QP.

Use the new flag in the ipoib connected mode path, and if the driver
doesn't support it, re-issue the QP creation without the flag.
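The fallback can be modelled in a few lines of C. It also relies on the companion rule in this series that drivers must reject creation flags they do not understand. This is a sketch under stated assumptions: FAKE_QP_CREATE_USE_GFP_NOIO, struct fake_hca, fake_create_qp() and create_tx_qp() are hypothetical stand-ins, not the ib_create_qp() API. It also assumes -EINVAL is the rejection code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for IB_QP_CREATE_USE_GFP_NOIO. */
#define FAKE_QP_CREATE_USE_GFP_NOIO (1u << 0)

/* Hypothetical driver model: the creation flags it understands. */
struct fake_hca {
	unsigned int supported_create_flags;
};

/* Driver side: reject any creation flag it does not understand. */
static int fake_create_qp(const struct fake_hca *hca, unsigned int flags)
{
	if (flags & ~hca->supported_create_flags)
		return -EINVAL;
	return 0;
}

/* Caller side: ask for GFP_NOIO allocations first, and re-issue the
 * creation without the flag if the driver rejects it. */
static int create_tx_qp(const struct fake_hca *hca, unsigned int *used_flags)
{
	unsigned int flags = FAKE_QP_CREATE_USE_GFP_NOIO;
	int ret = fake_create_qp(hca, flags);

	if (ret == -EINVAL) {
		flags &= ~FAKE_QP_CREATE_USE_GFP_NOIO;
		ret = fake_create_qp(hca, flags);
	}
	if (ret == 0)
		*used_flags = flags;
	return ret;
}
```

The retry depends on drivers returning an error for unknown flags. If a driver silently ignored the flag, the caller could not tell that GFP_NOIO was not honoured.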

Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-02 14:58:11 -07:00
Or Gerlitz
60093dc0c8 IB: Return error for unsupported QP creation flags
Fix the usnic and qib drivers to return an error when QP creation
flags that they don't understand are provided.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-02 14:58:11 -07:00
Yann Droneaud
729ee4efcc IB: Allow build of hw/ and ulp/ subdirectories independently
It is not possible to build only the drivers/infiniband/hw/ (or ulp/)
subdirectory with a command such as:

    $ make ARCH=x86_64 O=./obj-x86_64/ drivers/infiniband/hw/

This fails with the following error messages:

    make[2]: Nothing to be done for `all'.
    make[2]: Nothing to be done for `relocs'.
      CHK     include/config/kernel.release
      Using /home/ydroneaud/src/linux as source for kernel
      GEN     /home/ydroneaud/src/linux/obj-x86_64/Makefile
      CHK     include/generated/uapi/linux/version.h
      CHK     include/generated/utsrelease.h
      CALL    /home/ydroneaud/src/linux/scripts/checksyscalls.sh
    /home/ydroneaud/src/linux/scripts/Makefile.build:44: /home/ydroneaud/src/linux/drivers/infiniband/hw/Makefile: No such file or directory
    make[2]: *** No rule to make target `/home/ydroneaud/src/linux/drivers/infiniband/hw/Makefile'.  Stop.
    make[1]: *** [drivers/infiniband/hw/] Error 2
    make: *** [sub-make] Error 2

This patch creates a Makefile in hw/ and ulp/ and moves the
corresponding parts of drivers/infiniband/Makefile into the new
Makefiles.

This should not break the build unless some hw/ drivers or ulp/
modules could previously be built with CONFIG_INFINIBAND set to 'n',
which drivers/infiniband/Kconfig does not allow.  So it should be
safe to apply.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-02 14:51:12 -07:00
Yann Droneaud
b6f04d3d21 RDMA/cxgb4: Add missing padding at end of struct c4iw_create_cq_resp
The i386 ABI disagrees with most other ABIs regarding alignment of
data types larger than 4 bytes: on most ABIs, padding must be added
at the end of the structure, while it is not required on i386.

So on most ABIs, struct c4iw_create_cq_resp is implicitly padded
to a multiple of 8 bytes, while on i386 no such padding is added.

The tool pahole can be used to find such implicit padding:

  $ pahole --anon_include \
           --nested_anon_include \
           --recursive \
           --class_name c4iw_create_cq_resp \
           drivers/infiniband/hw/cxgb4/iw_cxgb4.o

Then, the structure layout can be compared between i386 and x86_64:

  --- obj-i386/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt   2014-03-28 11:43:05.547432195 +0100
  +++ obj-x86_64/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 10:55:10.990133017 +0100
  @@ -14,9 +13,8 @@ struct c4iw_create_cq_resp {
          __u32                      size;                 /*    28     4 */
          __u32                      qid_mask;             /*    32     4 */

  -       /* size: 36, cachelines: 1, members: 6 */
  -       /* last cacheline: 36 bytes */
  +       /* size: 40, cachelines: 1, members: 6 */
  +       /* padding: 4 */
  +       /* last cacheline: 40 bytes */
   };

This ABI disagreement will make an x86_64 kernel try to write past the
buffer provided by an i386 binary.

Once boundary checking is implemented, the x86_64 kernel will refuse
to write past the buffer provided by i386 userspace, and the uverb
will fail.

If the structure is on a page boundary and the next page is not
mapped, ib_copy_to_udata() will fail and the uverb will fail.

This patch adds explicit padding at the end of struct
c4iw_create_cq_resp and, like 92b0ca7cb1 ("IB/mlx5: Fix stack info
leak in mlx5_ib_alloc_ucontext()"), makes c4iw_create_cq() not write
this padding field to userspace.  This way, an x86_64 kernel will be
able to write struct c4iw_create_cq_resp as expected by both
unpatched and patched i386 libcxgb4.

Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com
Cc: <stable@vger.kernel.org>
Fixes: cfdda9d764 ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC")
Fixes: e24a72a330 ("RDMA/cxgb4: Fix four byte info leak in c4iw_create_cq()")
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-29 21:44:57 -07:00
Joe Perches
d236cd0e20 IB/srp: Avoid problems if a header uses pr_fmt
SRP defines pr_fmt(fmt) to be "PFX fmt", and then includes a bunch of
header files before it gets around to defining PFX.  This causes
problems if any of the header files do a pr_... and use pr_fmt().

Fix this by using KBUILD_MODNAME instead of the private PFX.
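Here is a small sketch of the resulting pattern, assuming a userspace build. pr_fmt() depends only on KBUILD_MODNAME, which Kbuild supplies as a string literal on the compiler command line. It therefore expands correctly no matter which header uses it first. The #ifndef fallback and the pr_info_buf() macro are hypothetical stand-ins that let the sketch compile outside the kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Kbuild normally defines KBUILD_MODNAME on the command line; this
 * fallback exists only so the sketch builds standalone. */
#ifndef KBUILD_MODNAME
#define KBUILD_MODNAME "ib_srp"
#endif

/* In the driver this sits above the #include lines.  It needs nothing
 * defined later in the file, so an early expansion in a header works. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/* Hypothetical userspace stand-in for pr_info(): formats into a buffer.
 * Strict C99 variadic macro, so at least one argument after fmt. */
#define pr_info_buf(buf, size, fmt, ...) \
	snprintf((buf), (size), pr_fmt(fmt), __VA_ARGS__)
```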

Acked-by: Chris Metcalf <cmetcalf@tilera.com>

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-29 21:32:54 -07:00
Bart Van Assche
8ec0a0e6b5 IB/umad: Fix error handling
Avoid leaking a kref count in ib_umad_open() if port->ib_dev == NULL
or if nonseekable_open() fails.

In ib_umad_sm_open(), if nonseekable_open() fails, avoid leaking a
kref count, avoid keeping sm_sem down, and make sure the IB_PORT_SM
capability mask is cleared.

Since container_of() never returns NULL, remove the code that tests
whether container_of() returns NULL.

Moving the kref_get() call from the start of ib_umad_*open() to the
end is safe since it is the responsibility of the caller of these
functions to ensure that the cdev pointer remains valid until at least
when these functions return.
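Two ideas from the paragraphs above can be sketched in userspace C. First, container_of() is plain pointer arithmetic. Second, the reference is taken only once nothing else can fail. struct fake_port, struct fake_cdev and fake_umad_open() are hypothetical simplifications of the umad structures. The plain int refcount stands in for the kref, and -ENXIO is an assumed error code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified container_of(): subtracts the member offset, so a valid
 * member pointer can never yield NULL. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct fake_cdev {		/* hypothetical stand-in for struct cdev */
	int minor;
};

struct fake_port {		/* hypothetical stand-in for the umad port */
	int refcount;		/* stand-in for the kref */
	void *ib_dev;		/* NULL once the device has gone away */
	struct fake_cdev cdev;
};

/* Open path: every failure check happens before the reference is
 * taken, so the error paths have nothing to undo. */
static int fake_umad_open(struct fake_cdev *cdev)
{
	struct fake_port *port = container_of(cdev, struct fake_port, cdev);

	if (!port->ib_dev)
		return -ENXIO;

	port->refcount++;	/* success path only */
	return 0;
}
```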

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@vger.kernel.org>

[ydroneaud@opteya.com: rework a bit to reduce the amount of code changed]

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>

[ nonseekable_open() can't actually fail, but....  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-29 21:27:30 -07:00
Jack Morgenstein
65fed8a8c1 IB/mlx4: Add interface for selecting VFs to enable QP0 via MLX proxy QPs
This commit adds the sysfs interface for enabling QP0 on VFs for
selected VF/port.

By default, no VFs are enabled for QP0 operation.

To enable QP0 operation on a VF/port, under
/sys/class/infiniband/mlx4_x/iov/<b:d:f>/ports/x there are two new entries:

- smi_enabled (read-only). Indicates whether smi is currently
  enabled for the indicated VF/port

- enable_smi_admin (rw). Used by the admin to request that smi
  capability be enabled or disabled for the indicated VF/port.
  0 = disable, 1 = enable.
  The requested enablement will occur at the next reset of the
  VF (e.g. driver restart on the VM which owns the VF).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-29 21:13:19 -07:00
Jack Morgenstein
99ec41d0a4 mlx4: Add infrastructure for selecting VFs to enable QP0 via MLX proxy QPs
This commit adds the infrastructure for enabling selected VFs to
operate SMI (QP0) MADs without restriction.

Additionally, for these enabled VFs, their QP0 proxy and tunnel QPs
are MLX QPs.  As such, they operate over VL15.  Therefore, they are
not affected by "credit" problems or changes in the VLArb table (which
may shut down VL0).

Non-enabled VFs may only create UD proxy QP0 QPs (which are forced by
the hypervisor to send packets using the q-key it assigns and places
in the qp-context).  Thus, non-enabled VFs will not pose a security
risk.  The hypervisor discards any privileged MADs it receives from
these non-enabled VFs.

By default, all VFs are NOT enabled, and must explicitly be enabled
by the administrator.

The sysfs interface which operates the VF enablement infrastructure
is provided in the next commit.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-29 21:13:09 -07:00
Jack Morgenstein
97982f5a91 IB/mlx4: Preparation for VFs to issue/receive SMI (QP0) requests/responses
Currently, SR-IOV VFs are denied QP0 access.  The main reason for
this decision is security, since Subnet Management Packets (SMPs) are
not restricted by network partitioning and may affect the physical
network topology.  Moreover, even the SM may be denied access to
portions of the network by setting management keys unknown to the
SM.

However, it is desirable to grant SMI access to certain privileged
VFs, so that certain network management activities may be conducted
within virtual machines instead of the hypervisor.

This commit does the following:

1. Create QP0 tunnel QPs for all VFs.

2. Discard SMI mads sent-from/received-for non-privileged VFs in the
   hypervisor MAD multiplex/demultiplex logic.  SMI mads from/for
   privileged VFs are allowed to pass.

3. MAD_IFC wrapper changes/fixes.  For non-privileged VFs, only
   host-view MAD_IFC commands are allowed, and only for SMI LID-Routed
   GET mads.  For privileged VFs, there are no restrictions.

This commit does not yet allow privileged VFs.  To determine if a VF
is privileged, it calls function mlx4_vf_smi_enabled().  This function
returns 0 unconditionally for now.

The next two commits allow defining and activating privileged VFs.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-29 21:12:58 -07:00
Jack Morgenstein
61565013cf IB/mlx4: SET_PORT called by mlx4_ib_modify_port should be wrapped
mlx4_ib_modify_port is invoked in IB for resetting the Q_Key violations
counters and for modifying the IB port capability flags.

For example, when opensm is started up on the hypervisor,
mlx4_ib_modify_port is called to set the port's IsSM flag.

In multifunction mode, the SET_PORT command used in this flow should
be wrapped (so that the PF port capability flags are also tracked,
thus enabling the aggregate of all the VF/PF capability flags to be
tracked properly).

The procedure mlx4_SET_PORT() in main.c is also renamed to mlx4_ib_SET_PORT()
to differentiate it from procedure mlx4_SET_PORT() in port.c.
mlx4_ib_SET_PORT() is used exclusively by mlx4_ib_modify_port().

Finally, the CM invokes ib_modify_port() to set the IsCMSupported flag
even when running over RoCE.  Therefore, when RoCE is active,
mlx4_ib_modify_port should return OK unconditionally (since the
capability flags and qkey violations counter are not relevant).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-29 21:12:58 -07:00
Vinit Agnihotri
0a66d2bd30 IB/qib: Additional Intel branding changes
This patch changes user-visible function names containing "qlogic"
in module init and cleanup.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-29 21:06:39 -07:00
Dan Carpenter
3c735d481b RDMA/cxgb3: Remove a couple unneeded conditions
We know that "reset_tpt_entry" is false on this side of the if else
statement so there is no need to check again.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-28 10:04:00 -07:00
Colin Ian King
bfdfcfee3c IB/mlx4: fix uninitialised variable is_mcast
Commit 297e0dad72 ("IB/mlx4: Handle Ethernet L2 parameters for IP
based GID addressing") introduced a bug where is_mcast is now no
longer initialized on the non-multicast condition and so it can be
any random value from the stack.  This issue was detected by cppcheck:

    [drivers/infiniband/hw/mlx4/ah.c:103]: (error) Uninitialized
      variable: is_mcast

The simple fix is to initialise is_mcast to zero.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-28 10:00:06 -07:00
Manuel Schölling
49410185c3 IB/ipath: Use time_before()/_after()
Time comparisons must use time_after / time_before to avoid problems
when jiffies wraps.
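To make the wraparound problem concrete, here is a minimal userspace sketch of the comparison that time_after()/time_before() perform on unsigned long counters. The _ul helper names are hypothetical. They leave out the typecheck() guards that the kernel macros include.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Wrap-safe "a is after b": the unsigned subtraction wraps modulo 2^N,
 * and the signed cast (implementation-defined in ISO C, two's
 * complement on gcc and clang) makes "b is behind a" negative. */
static bool time_after_ul(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

static bool time_before_ul(unsigned long a, unsigned long b)
{
	return time_after_ul(b, a);
}
```

Suppose now = ULONG_MAX - 5 and the timeout is set 10 ticks later, so it wraps to 4. A plain now > timeout wrongly reports that the timeout has already passed, while time_after_ul(now, timeout) correctly returns false. The trick holds only while the two values are less than LONG_MAX ticks apart.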

Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-28 09:57:06 -07:00
Roland Dreier
6c9b5d9b00 IB/mlx5: Fix warning about cast of wr_id back to pointer on 32 bits
We need to cast wr_id to unsigned long before casting to a pointer.
This fixes:

       drivers/infiniband/hw/mlx5/mr.c: In function 'mlx5_umr_cq_handler':
    >> drivers/infiniband/hw/mlx5/mr.c:724:13: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
          context = (struct mlx5_ib_umr_context *)wc.wr_id;
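Here is a sketch of the round trip, assuming userspace C. The pointer is widened to the 64-bit wr_id on submission. On completion it is narrowed back to pointer width before the pointer cast. The fix above uses unsigned long, which has pointer width on Linux; uintptr_t is the portable userspace equivalent. struct umr_context and both helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct mlx5_ib_umr_context. */
struct umr_context {
	int status;
};

/* Store a pointer in the 64-bit wr_id of a work request. */
static uint64_t ctx_to_wr_id(struct umr_context *ctx)
{
	return (uint64_t)(uintptr_t)ctx;
}

/* Narrow to pointer width first, then cast; casting the 64-bit value
 * straight to a pointer is what warns on 32-bit builds. */
static struct umr_context *wr_id_to_ctx(uint64_t wr_id)
{
	return (struct umr_context *)(uintptr_t)wr_id;
}
```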

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-28 09:23:03 -07:00
Upinder Malhi
ed477c4c83 IB/usnic: Fix source file missing copyright and license
Prepend the copyright and license header to usnic_uiom_interval_tree.c.

Signed-off-by: Upinder Malhi <umalhi@cisco.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 13:24:40 -07:00
Dennis Dalessandro
7e6d3e5c70 IB/ipath: Translate legacy diagpkt into newer extended diagpkt
This patch addresses an issue where userspace sends in the legacy
diagpkt, but the driver operates only on the extended diagpkt. This
patch specifically initializes the extended diagpkt based on the
legacy packet.

Cc: <stable@vger.kernel.org>
Reported-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 13:21:04 -07:00
Mike Marciniszyn
911eccd284 IB/qib: Fix port in pkey change event
The code used a literal 1 in dispatching an IB_EVENT_PKEY_CHANGE.

With the dual-port qib QDR card, this is no longer necessarily correct.

Change the code to use the port specified in the call.

Cc: <stable@vger.kernel.org>
Reported-by: Alex Estrin <alex.estrin@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 13:20:26 -07:00
Dan Carpenter
e4514cbd97 RDMA/cxgb3: Fix information leak in send_abort()
The cpl_abort_req struct has several reserved members which need to be
cleared to avoid disclosing kernel information.  I have added a memset()
so now it matches the cxgb4 version of this function.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 11:55:40 -07:00
Yann Droneaud
43bc889380 IB/mlx5: add missing padding at end of struct mlx5_ib_create_srq
The i386 ABI disagrees with most other ABIs regarding the alignment of
data types larger than 4 bytes: on most ABIs padding must be added at
the end of such structures, while it is not required on i386.

So on most ABIs struct mlx5_ib_create_srq gets implicitly padded to a
multiple of 8 bytes, while on i386 no such padding is added.

The tool pahole can be used to find such implicit padding:

  $ pahole --anon_include \
           --nested_anon_include \
           --recursive \
           --class_name mlx5_ib_create_srq \
           drivers/infiniband/hw/mlx5/mlx5_ib.o

Then, structure layout can be compared between i386 and x86_64:

  +++ obj-i386/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt    2014-03-28 11:43:07.386413682 +0100
  --- obj-x86_64/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt  2014-03-27 13:06:17.788472721 +0100
  @@ -69,7 +68,6 @@ struct mlx5_ib_create_srq {
          __u64                      db_addr;              /*     8     8 */
          __u32                      flags;                /*    16     4 */

  -       /* size: 20, cachelines: 1, members: 3 */
  -       /* last cacheline: 20 bytes */
  +       /* size: 24, cachelines: 1, members: 3 */
  +       /* padding: 4 */
  +       /* last cacheline: 24 bytes */
   };

ABI disagreement will make an x86_64 kernel try to read past
the buffer provided by an i386 binary.

Once a boundary check is implemented, the x86_64 kernel will
refuse to read past the buffer provided by i386 userspace, and the
uverb will fail.

Even now, if the structure lies in memory at a page boundary and the
next page is not mapped, ib_copy_from_udata() will fail and so will
the uverb.

This patch makes create_srq_user() take care of the input
data size to handle the case where no padding is provided.

This way, an x86_64 kernel will be able to handle struct mlx5_ib_create_srq
as sent by both unpatched and patched i386 libmlx5.

Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com
Cc: <stable@vger.kernel.org>
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapter")
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 11:53:16 -07:00
Yann Droneaud
a8237b32a3 IB/mlx5: add missing padding at end of struct mlx5_ib_create_cq
The i386 ABI disagrees with most other ABIs regarding the alignment of
data types larger than 4 bytes: on most ABIs padding must be added at
the end of such structures, while it is not required on i386.

So on most ABIs struct mlx5_ib_create_cq gets padded to a multiple of
8 bytes, while on i386 no such padding is added.

The tool pahole can be used to find such implicit padding:

  $ pahole --anon_include \
           --nested_anon_include \
           --recursive \
           --class_name mlx5_ib_create_cq \
           drivers/infiniband/hw/mlx5/mlx5_ib.o

Then, structure layout can be compared between i386 and x86_64:

  +++ obj-i386/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt    2014-03-28 11:43:07.386413682 +0100
  --- obj-x86_64/drivers/infiniband/hw/mlx5/mlx5_ib.o.pahole.txt  2014-03-27 13:06:17.788472721 +0100
  @@ -34,9 +34,8 @@ struct mlx5_ib_create_cq {
          __u64                      db_addr;              /*     8     8 */
          __u32                      cqe_size;             /*    16     4 */

  -       /* size: 20, cachelines: 1, members: 3 */
  -       /* last cacheline: 20 bytes */
  +       /* size: 24, cachelines: 1, members: 3 */
  +       /* padding: 4 */
  +       /* last cacheline: 24 bytes */
   };

This ABI disagreement will make an x86_64 kernel try to read past the
buffer provided by an i386 binary.

Once a boundary check is implemented, an x86_64 kernel will refuse
to read past the buffer provided by i386 userspace, and the uverb will
fail.

Even now, if the structure lies in memory at a page boundary and the
next page is not mapped, ib_copy_from_udata() will fail when trying to
read the 4 bytes of padding, and the uverb will fail.

This patch makes create_cq_user() take care of the input data size to
handle the case where no padding is provided.

This way, an x86_64 kernel will be able to handle struct
mlx5_ib_create_cq as sent by both unpatched and patched i386 libmlx5.

Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com
Cc: <stable@vger.kernel.org>
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapter")
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 11:53:13 -07:00
Shachar Raindel
a74d24168d IB/mlx5: Refactor UMR to have its own context struct
Instead of making the UMR context part of each memory region, allocate
a context struct on the stack.  This allows queuing multiple UMRs that
access the same memory region.

Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 11:53:09 -07:00
Haggai Eran
48fea837bb IB/mlx5: Set QP offsets and parameters for user QPs and not just for kernel QPs
For user QPs, the creation process does not currently initialize the fields:

 * qp->rq.offset
 * qp->sq.offset
 * qp->sq.wqe_shift

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 11:53:08 -07:00
Haggai Eran
b475598aec mlx5_core: Store MR attributes in mlx5_mr_core during creation and after UMR
The patch stores iova, pd and size during MR creation and after UMRs
that modify them.  It also removes the unused access flags field.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 11:53:06 -07:00
Haggai Eran
8605933a22 IB/mlx5: Add MR to radix tree in reg_mr_callback
For memory regions that are allocated using reg_umr, the suffix of
mlx5_core_create_mkey isn't being called.  Instead the creation is
completed in a callback function (reg_mr_callback).  This means that
these MRs aren't being added to the MR radix tree.  Add them in the
callback.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 11:53:05 -07:00
Haggai Eran
096f7e72c6 IB/mlx5: Fix error handling in reg_umr
If ib_post_send fails when posting the UMR work request in reg_umr,
the code neither releases the temporary pas buffer it allocated nor
DMA-unmaps it.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 11:53:05 -07:00
Sagi Grimberg
c7f44fbda6 mlx5_core: Copy DIF fields only when input and output space values match
Some DIF implementations (SCSI initiator/target) may want to use different
input/output values for the application tag and/or reference tag. So when
the memory and wire domain values don't match, the HW must not copy them.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 11:53:02 -07:00
Sagi Grimberg
5c273b1677 mlx5_core: Simplify signature handover wqe for interleaved buffers
There is no need for a repetition format pattern when the data and
protection are already interleaved in the memory domain, since the
pattern already exists there. A single key entry is sufficient and may
save some extra fetch ops.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 11:52:58 -07:00
Sagi Grimberg
8524867b9c mlx5_core: Fix signature handover operation for interleaved buffers
When the data and protection are interleaved in the memory domain, there
is no need to expand the mkey total length.

At the moment no Linux user (iSER initiator & target) works in
interleaved mode. This may change in the future, as for SCSI
pass-through devices there is no real point in the target
de-interleaving and re-interleaving the protection data in the PT
stage. Regardless, signature verbs support this mode.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-27 11:52:54 -07:00
Or Gerlitz
c7ca4b69d9 IB/iser: Bump version to 1.4
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-26 08:19:48 -07:00
Roi Dayan
e7eeffa4a0 IB/iser: Add missing newlines to logging messages
Logging messages need terminating newlines to avoid possible message
interleaving.  Add them.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-26 08:19:48 -07:00
Ariel Nahum
66d4e62d27 IB/iser: Fix a possible race in iser connection states transition
In some circumstances (multiple targets), the RDMA_CM ESTABLISHED event
and ep_disconnect may race. In this case, the iser connection state
may transition to UP (after ep_disconnect transitioned it to
TERMINATING), while the connection is being torn down.

Upon the RDMA_CM ESTABLISHED event, we allow the iser connection state
to transition to UP only from PENDING. We also protect this state
change by performing it under the connection lock.

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-26 08:19:48 -07:00
Ariel Nahum
b73c3adabd IB/iser: Simplify connection management
iSER relies on refcounting to manage iser connection establishment
and teardown.

Following commit 39ff05dbbb ("IB/iser: Enhance disconnection logic
for multi-pathing"), an iser connection maintains 3 references:

 - iscsi_endpoint (at creation stage)
 - cma_id (at connection request stage)
 - iscsi_conn (at bind stage)

We can avoid taking explicit refcounts by correctly serializing iser
teardown flows (graceful and non-graceful).

Our approach is to schedule a work item that handles ordered teardown
by gracefully waiting for 2 cleanup stages to complete:

 1. Cleanup of live pending tasks indicated by iscsi_conn_stop completion
 2. Flush errors processing

Each stage notifies the waiting worker thread when it is done,
allowing the teardown to continue.

Since iSCSI connection establishment may trigger an endpoint disconnect
without a successful endpoint connect, we rely on the iscsi <-> iser
binding (.conn_bind) to learn which teardown policy to apply with
respect to the cleanup stages.

Since all cleanup workers are scheduled (on release_wq) in
.ep_disconnect, it is safe to assume that all cleanup workers are
already scheduled when module_exit is called. Thus a proper module
unload must flush all scheduled work before exiting, to guarantee that
no resources are left behind.

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-26 08:19:48 -07:00
Linus Torvalds
5fa6a683c0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "It looks like a sizeable collection, but this is nearly 3 weeks of bug
  fixing while you were away.

   1) Fix crashes over IPSEC tunnels with NAT: the latter can reroute
      the packet through a non-IPSEC protected path, and the code has to
      be able to handle SKBs attached to routes lacking an attached xfrm
      state.  From Steffen Klassert.

   2) Fix OOPSs in ipv4 and ipv6 ipsec layers for unsupported
      sub-protocols, also from Steffen Klassert.

   3) Set local_df on fragmented netfilter skbs otherwise we won't be
      able to forward successfully, from Florian Westphal.

   4) cdc_mbim ipv6 neighbour code does __vlan_find_dev_deep without
      holding RCU lock, from Bjorn Mork.

   5) local_df test in ip_may_fragment is inverted, from Florian
      Westphal.

   6) jme driver doesn't check for DMA mapping failures, from Neil
      Horman.

   7) qlogic driver doesn't calculate number of TX queues properly, from
      Shahed Shaikh.

   8) fib_info_cnt can drift irreversibly positive if we fail to
      allocate the fi->fib_metrics array, from Sergey Popovich.

   9) Fix use after free in ip6_route_me_harder(), also from Sergey
      Popovich.

  10) When SYSCTL is disabled, we don't handle local_port_range and
      ping_group_range defaults properly at all, from Cong Wang.

  11) Unaccelerated VLAN tagged frames improperly handled by cdc_mbim
      driver, fix from Bjorn Mork.

  12) cassini driver needs nested lock annotations for TX locking, from
      Emil Goode.

  13) On init error the ipv6 VTI driver can unregister pernet ops twice,
      oops.  Fix from Mathias Krause.

  14) If macvlan device is down, don't propagate IFF_ALLMULTI changes,
      from Peter Christensen.

  15) Missing NULL pointer check while parsing netlink config options in
      ip6_tnl_validate().  From Susant Sahani.

  16) Fix handling of neighbour entries during ipv6 router reachability
      probing, from Duan Jiong.

  17) x86 and s390 JIT address randomization has some address
      calculation bugs leading to crashes, from Alexei Starovoitov and
      Heiko Carstens.

  18) Clear up those uglies with nop patching and net_get_random_once(),
      from Hannes Frederic Sowa.

  19) Option length miscalculated in ip6_append_data(), fix also from
      Hannes Frederic Sowa.

  20) A while ago we fixed a race during device unregistration when a
      namespace went down; it turns out there is a second place that
      needs similar protection.  From Cong Wang.

  21) In the new Altera TSE driver multicast filtering isn't working,
      disable it and just use promisc mode until the cause is found.
      From Vince Bridgers.

  22) When we disable router enabling in ipv6 we have to flush the
      cached routes explicitly, from Duan Jiong.

  23) NBMA tunnels should not cache routes on the tunnel object because
      the key is variable, from Timo Teräs.

  24) With stacked devices, GRO information in skb->cb[] may not be set
      up properly; make sure it is in all code paths.  From Eric Dumazet.

  25) Really fix stacked vlan locking, multiple levels of nesting with
      intervening non-vlan devices are possible.  From Vlad Yasevich.

  26) The fallback ipip tunnel device's mtu is not set up properly, from
      Steffen Klassert.

  27) The packet scheduler's tcindex filter can crash because we
      structure-copy objects with list_heads inside, oops.  From Cong
      Wang.

  28) Fix CHECKSUM_COMPLETE handling for ipv6 GRE tunnels, from Eric
      Dumazet.

  29) In some configurations 'itag' in __mkroute_input() can end up
      being used uninitialized because of how fib_validate_source()
      works.  Fix it by explicitly initializing itag to zero like all the
      other fib_validate_source() callers do, from Li RongQing"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
  batman: fix a bogus warning from batadv_is_on_batman_iface()
  ipv4: initialise the itag variable in __mkroute_input
  bonding: Send ALB learning packets using the right source
  bonding: Don't assume 802.1Q when sending alb learning packets.
  net: doc: Update references to skb->rxhash
  stmmac: Remove unbalanced clk_disable call
  ipv6: gro: fix CHECKSUM_COMPLETE support
  net_sched: fix an oops in tcindex filter
  can: peak_pci: prevent use after free at netdev removal
  ip_tunnel: Initialize the fallback device properly
  vlan: Fix build error wth vlan_get_encap_level()
  can: c_can: remove obsolete STRICT_FRAME_ORDERING Kconfig option
  MAINTAINERS: Pravin Shelar is Open vSwitch maintainer.
  bnx2x: Convert return 0 to return rc
  bonding: Fix alb mode to only use first level vlans.
  bonding: Fix stacked device detection in arp monitoring
  macvlan: Fix lockdep warnings with stacked macvlan devices
  vlan: Fix lockdep warning with stacked vlan devices.
  net: Allow for more then a single subclass for netif_addr_lock
  net: Find the nesting level of a given device by type.
  ...
2014-05-23 15:29:43 -07:00