Commit Graph

148 Commits

Author SHA1 Message Date
Michael S. Tsirkin
46aa92d1ba vhost/net: fix heads usage of ubuf_info
ubuf info allocator uses guest controlled head as an index,
so a malicious guest could put the same head entry in the ring twice,
and we will get two callbacks on the same value.
To fix use upend_idx which is guaranteed to be unique.

Reported-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-17 14:28:54 -04:00
Linus Torvalds
ecc88efbe7 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull scsi target updates from Nicholas Bellinger:
 "The highlights in this series include:

   - Improve sg_table lookup scalability in RAMDISK_MCP (martin)

   - Add device attribute to expose config name for INQUIRY model (tregaron)

   - Convert tcm_vhost to use lock-less list for cmd completion (asias)

   - Add tcm_vhost support for multiple target's per endpoint (asias)

   - Add tcm_vhost support for multiple queues per vhost (asias)

   - Add missing mapped_lun bounds checking during make_mappedlun setup
     in generic fabric configfs code (jan engelhardt + nab)

   - Enforce individual iscsi-target network portal export once per
     TargetName endpoint (grover + nab)

   - Add WRITE_SAME w/ UNMAP=0 emulation to FILEIO backend (nab)

  Things have been mostly quiet this round, with majority of the work
  being done on the iser-target WIP driver + associated iscsi-target
  refactoring patches currently in flight for v3.10 code.

  At this point there is one patch series left outstanding from Asias to
  add support for UNMAP + WRITE_SAME w/ UNMAP=1 to FILEIO awaiting
  feedback from hch & Co, that will likely be included in a post
  v3.9-rc1 PULL request if there are no objections.

  Also, there is a regression bug recently reported off-list that seems
  to be effecting v3.5 and v3.6 kernels with MSFT iSCSI initiators that
  is still being tracked down.  No word if this effects >= v3.7 just
  yet, but if so there will likely another PULL request coming your
  way.."

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (26 commits)
  target: Rename spc_get_write_same_sectors -> sbc_get_write_same_sectors
  target/file: Add WRITE_SAME w/ UNMAP=0 emulation support
  iscsi-target: Enforce individual network portal export once per TargetName
  iscsi-target: Refactor iscsit_get_np sockaddr matching into iscsit_check_np_match
  target: Add missing mapped_lun bounds checking during make_mappedlun setup
  target: Fix lookup of dynamic NodeACLs during cached demo-mode operation
  target: Fix parameter list length checking in MODE SELECT
  target: Fix error checking for UNMAP commands
  target: Fix sense data for out-of-bounds IO operations
  target_core_rd: break out unterminated loop during copy
  tcm_vhost: Multi-queue support
  tcm_vhost: Multi-target support
  target: Add device attribute to expose config_item_name for INQUIRY model
  target: don't truncate the fail intr address
  target: don't always say "ipv6" as address type
  target/iblock: Use backend REQ_FLUSH hint for WriteCacheEnabled status
  iscsi-target: make some temporary buffers larger
  tcm_vhost: Optimize gup in vhost_scsi_map_to_sgl
  tcm_vhost: Use iov_num_pages to calculate sgl_count
  tcm_vhost: Introduce iov_num_pages
  ...
2013-02-26 11:42:23 -08:00
Linus Torvalds
06991c28f3 Driver core patches for 3.9-rc1
Here is the big driver core merge for 3.9-rc1
 
 There are two major series here, both of which touch lots of drivers all
 over the kernel, and will cause you some merge conflicts:
   - add a new function called devm_ioremap_resource() to properly be
     able to check return values.
   - remove CONFIG_EXPERIMENTAL
 
 If you need me to provide a merged tree to handle these resolutions,
 please let me know.
 
 Other than those patches, there's not much here, some minor fixes and
 updates.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlEmV0cACgkQMUfUDdst+yncCQCfbmnQZju7kzWXk6PjdFuKspT9
 weAAoMCzcAtEzzc4LXuUxxG/sXBVBCjW
 =yWAQ
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core patches from Greg Kroah-Hartman:
 "Here is the big driver core merge for 3.9-rc1

  There are two major series here, both of which touch lots of drivers
  all over the kernel, and will cause you some merge conflicts:

   - add a new function called devm_ioremap_resource() to properly be
     able to check return values.

   - remove CONFIG_EXPERIMENTAL

  Other than those patches, there's not much here, some minor fixes and
  updates"

Fix up trivial conflicts

* tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits)
  base: memory: fix soft/hard_offline_page permissions
  drivercore: Fix ordering between deferred_probe and exiting initcalls
  backlight: fix class_find_device() arguments
  TTY: mark tty_get_device call with the proper const values
  driver-core: constify data for class_find_device()
  firmware: Ignore abort check when no user-helper is used
  firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER
  firmware: Make user-mode helper optional
  firmware: Refactoring for splitting user-mode helper code
  Driver core: treat unregistered bus_types as having no devices
  watchdog: Convert to devm_ioremap_resource()
  thermal: Convert to devm_ioremap_resource()
  spi: Convert to devm_ioremap_resource()
  power: Convert to devm_ioremap_resource()
  mtd: Convert to devm_ioremap_resource()
  mmc: Convert to devm_ioremap_resource()
  mfd: Convert to devm_ioremap_resource()
  media: Convert to devm_ioremap_resource()
  iommu: Convert to devm_ioremap_resource()
  drm: Convert to devm_ioremap_resource()
  ...
2013-02-21 12:05:51 -08:00
Asias He
1b7f390eb3 tcm_vhost: Multi-queue support
This adds virtio-scsi multi-queue support to tcm_vhost. In order to use
multi-queue, guest side multi-queue support is need. It can
be found here:

   https://lkml.org/lkml/2012/12/18/166

Currently, only one thread is created by vhost core code for each
vhost_scsi instance. Even if there are multi-queues, all the handling of
guest kick (vhost_scsi_handle_kick) are processed in one thread. This is
not optimal. Luckily, most of the work is offloaded to the tcm_vhost
workqueue.

Some initial perf numbers:
1 queue,  4 targets, 1 lun per target
4K request size, 50% randread + 50% randwrite: 127K/127k IOPS

4 queues, 4 targets, 1 lun per target
4K request size, 50% randread + 50% randwrite: 181K/181k IOPS

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-02-13 11:30:14 -08:00
Asias He
67e18cf9ab tcm_vhost: Multi-target support
In order to take advantages of Paolo's multi-queue virito-scsi, we need
multi-target support in tcm_vhost first. Otherwise all the requests go
to one queue and other queues are idle.

This patch makes:

1. All the targets under the wwpn is seen and can be used by guest.
2. No need to pass the tpgt number in struct vhost_scsi_target to
   tcm_vhost.ko. Only wwpn is needed.
3. We can always pass max_target = 255 to guest now, since we abort the
   request who's target id does not exist.

Changes in v2:
- Handle non-contiguous tpgt

Changes in v3:
- Simplfy lock in vhost_scsi_set_endpoint
- Return -EEXIST when does not match

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-02-13 11:29:53 -08:00
Asias He
1810053e8d tcm_vhost: Optimize gup in vhost_scsi_map_to_sgl
We can get all the pages in one time instead of calling
gup N times.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-02-13 11:27:41 -08:00
Asias He
f3158f362c tcm_vhost: Use iov_num_pages to calculate sgl_count
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-02-13 11:27:41 -08:00
Asias He
765b34fdc1 tcm_vhost: Introduce iov_num_pages
Add a helper to calculate the number of pages needed for a iov entry.

(nab: Drop unnecessary inline)

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-02-13 11:27:40 -08:00
Asias He
9d6064a347 tcm_vhost: Use llist for cmd completion list
This drops the cmd completion list spin lock and makes the cmd
completion queue lock-less.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-02-13 11:27:31 -08:00
Linus Torvalds
e06b84052a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Revert iwlwifi reclaimed packet tracking, it causes problems for a
    bunch of folks.  From Emmanuel Grumbach.

 2) Work limiting code in brcmsmac wifi driver can clear tx status
    without processing the event.  From Arend van Spriel.

 3) rtlwifi USB driver processes wrong SKB, fix from Larry Finger.

 4) l2tp tunnel delete can race with close, fix from Tom Parkin.

 5) pktgen_add_device() failures are not checked at all, fix from Cong
    Wang.

 6) Fix unintentional removal of carrier off from tun_detach(),
    otherwise we confuse userspace, from Michael S.  Tsirkin.

 7) Don't leak socket reference counts and ubufs in vhost-net driver,
    from Jason Wang.

 8) vmxnet3 driver gets it's initial carrier state wrong, fix from Neil
    Horman.

 9) Protect against USB networking devices which spam the host with 0
    length frames, from Bjørn Mork.

10) Prevent neighbour overflows in ipv6 for locally destined routes,
    from Marcelo Ricardo.  This is the best short-term fix for this, a
    longer term fix has been implemented in net-next.

11) L2TP uses ipv4 datagram routines in it's ipv6 code, whoops.  This
    mistake is largely because the ipv6 functions don't even have some
    kind of prefix in their names to suggest they are ipv6 specific.
    From Tom Parkin.

12) Check SYN packet drops properly in tcp_rcv_fastopen_synack(), from
    Yuchung Cheng.

13) Fix races and TX skb freeing bugs in via-rhine's NAPI support, from
    Francois Romieu and your's truly.

14) Fix infinite loops and divides by zero in TCP congestion window
    handling, from Eric Dumazet, Neal Cardwell, and Ilpo Järvinen.

15) AF_PACKET tx ring handling can leak kernel memory to userspace, fix
    from Phil Sutter.

16) Fix error handling in ipv6 GRE tunnel transmit, from Tommi Rantala.

17) Protect XEN netback driver against hostile frontend putting garbage
    into the rings, don't leak pages in TX GOP checking, and add proper
    resource releasing in error path of xen_netbk_get_requests().  From
    Ian Campbell.

18) SCTP authentication keys should be cleared out and released with
    kzfree(), from Daniel Borkmann.

19) L2TP is a bit too clever trying to maintain skb->truesize, and ends
    up corrupting socket memory accounting to the point where packet
    sending is halted indefinitely.  Just remove the adjustments
    entirely, they aren't really needed.  From Eric Dumazet.

20) ATM Iphase driver uses a data type with the same name as the S390
    headers, rename to fix the build.  From Heiko Carstens.

21) Fix a typo in copying the inner network header offset from one SKB
    to another, from Pravin B Shelar.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (56 commits)
  net: sctp: sctp_endpoint_free: zero out secret key data
  net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree
  atm/iphase: rename fregt_t -> ffreg_t
  net: usb: fix regression from FLAG_NOARP code
  l2tp: dont play with skb->truesize
  net: sctp: sctp_auth_key_put: use kzfree instead of kfree
  netback: correct netbk_tx_err to handle wrap around.
  xen/netback: free already allocated memory on failure in xen_netbk_get_requests
  xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop.
  xen/netback: shutdown the ring if it contains garbage.
  net: qmi_wwan: add more Huawei devices, including E320
  net: cdc_ncm: add another Huawei vendor specific device
  ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit()
  tcp: fix for zero packets_in_flight was too broad
  brcmsmac: rework of mac80211 .flush() callback operation
  ssb: unregister gpios before unloading ssb
  bcma: unregister gpios before unloading bcma
  rtlwifi: Fix scheduling while atomic bug
  net: usbnet: fix tx_dropped statistics
  tcp: ipv6: Update MIB counters for drops
  ...
2013-02-09 07:55:24 +11:00
Michael S. Tsirkin
71f1e45aa9 tcm_vhost: fix pr_err on early kick
It's OK to get kick before backend is set or after
it is cleared, we can just ignore it.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-01-31 13:06:06 -08:00
Jason Wang
2b8b328b61 vhost_net: handle polling errors when setting backend
Currently, the polling errors were ignored, which can lead following issues:

- vhost remove itself unconditionally from waitqueue when stopping the poll,
  this may crash the kernel since the previous attempt of starting may fail to
  add itself to the waitqueue
- userspace may think the backend were successfully set even when the polling
  failed.

Solve this by:

- check poll->wqh before trying to remove from waitqueue
- report polling errors in vhost_poll_start(), tx_poll_start(), the return value
  will be checked and returned when userspace want to set the backend

After this fix, there still could be a polling failure after backend is set, it
will addressed by the next patch.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 15:43:03 -05:00
Jason Wang
692a998b90 vhost_net: correct error handling in vhost_net_set_backend()
Currently, when vhost_init_used() fails the sock refcnt and ubufs were
leaked. Correct this by calling vhost_init_used() before assign ubufs and
restore the oldsock when it fails.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-29 15:43:03 -05:00
Kees Cook
43893cbefc drivers/vhost: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21 14:52:45 -08:00
Linus Torvalds
5bd665f28d Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull target updates from Nicholas Bellinger:
 "It has been a very busy development cycle this time around in target
  land, with the highlights including:

   - Kill struct se_subsystem_dev, in favor of direct se_device usage
     (hch)
   - Simplify reservations code by combining SPC-3 + SCSI-2 support for
     virtual backends only (hch)
   - Simplify ALUA code for virtual only backends, and remove left over
     abstractions (hch)
   - Pass sense_reason_t as return value for I/O submission path (hch)
   - Refactor MODE_SENSE emulation to allow for easier addition of new
     mode pages.  (roland)
   - Add emulation of MODE_SELECT (roland)
   - Fix bug in handling of ExpStatSN wrap-around (steve)
   - Fix bug in TMR ABORT_TASK lookup in qla2xxx target (steve)
   - Add WRITE_SAME w/ UNMAP=0 support for IBLOCK backends (nab)
   - Convert ib_srpt to use modern target_submit_cmd caller + drop
     legacy ioctx->kref usage (nab)
   - Convert ib_srpt to use modern target_submit_tmr caller (nab)
   - Add link_magic for fabric allow_link destination target_items for
     symlinks within target_core_fabric_configfs.c code (nab)
   - Allocate pointers in instead of full structs for
     config_group->default_groups (sebastian)
   - Fix 32-bit highmem breakage for FILEIO (sebastian)

  All told, hch was able to shave off another ~1K LOC by killing the
  se_subsystem_dev abstraction, along with a number of PR + ALUA
  simplifications.  Also, a nice patch by Roland is the refactoring of
  MODE_SENSE handling, along with the addition of initial MODE_SELECT
  emulation support for virtual backends.

  Sebastian found a long-standing issue wrt to allocation of full
  config_group instead of pointers for config_group->default_group[]
  setup in a number of areas, which ends up saving memory with big
  configurations.  He also managed to fix another long-standing BUG wrt
  to broken 32-bit highmem support within the FILEIO backend driver.

  Thank you again to everyone who contributed this round!"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (50 commits)
  target/iscsi_target: Add NodeACL tags for initiator group support
  target/tcm_fc: fix the lockdep warning due to inconsistent lock state
  sbp-target: fix error path in sbp_make_tpg()
  sbp-target: use simple assignment in tgt_agent_rw_agent_state()
  iscsi-target: use kstrdup() for iscsi_param
  target/file: merge fd_do_readv() and fd_do_writev()
  target/file: Fix 32-bit highmem breakage for SGL -> iovec mapping
  target: Add link_magic for fabric allow_link destination target_items
  ib_srpt: Convert TMR path to target_submit_tmr
  ib_srpt: Convert I/O path to target_submit_cmd + drop legacy ioctx->kref
  target: Make spc_get_write_same_sectors return sector_t
  target/configfs: use kmalloc() instead of kzalloc() for default groups
  target/configfs: allocate only 6 slots for dev_cg->default_groups
  target/configfs: allocate pointers instead of full struct for default_groups
  target: update error handling for sbc_setup_write_same()
  iscsit: use GFP_ATOMIC under spin lock
  iscsi_target: Remove redundant null check before kfree
  target/iblock: Forward declare bio helpers
  target: Clean up flow in transport_check_aborted_status()
  target: Clean up logic in transport_put_cmd()
  ...
2012-12-15 14:25:10 -08:00
Linus Torvalds
a2013a13e6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial branch from Jiri Kosina:
 "Usual stuff -- comment/printk typo fixes, documentation updates, dead
  code elimination."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
  HOWTO: fix double words typo
  x86 mtrr: fix comment typo in mtrr_bp_init
  propagate name change to comments in kernel source
  doc: Update the name of profiling based on sysfs
  treewide: Fix typos in various drivers
  treewide: Fix typos in various Kconfig
  wireless: mwifiex: Fix typo in wireless/mwifiex driver
  messages: i2o: Fix typo in messages/i2o
  scripts/kernel-doc: check that non-void fcts describe their return value
  Kernel-doc: Convention: Use a "Return" section to describe return values
  radeon: Fix typo and copy/paste error in comments
  doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c
  various: Fix spelling of "asynchronous" in comments.
  Fix misspellings of "whether" in comments.
  eisa: Fix spelling of "asynchronous".
  various: Fix spelling of "registered" in comments.
  doc: fix quite a few typos within Documentation
  target: iscsi: fix comment typos in target/iscsi drivers
  treewide: fix typo of "suport" in various comments and Kconfig
  treewide: fix typo of "suppport" in various comments
  ...
2012-12-13 12:00:02 -08:00
Wei Yongjun
405d55c99d tcm_vhost: remove unused variable in vhost_scsi_allocate_cmd()
The variable se_sess is initialized but never used
otherwise, so remove the unused variable.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-12-06 17:09:19 +02:00
Michael S. Tsirkin
f9611c43ab vhost-net: enable zerocopy tx by default
Zero copy TX has been around for a while now.
We seem to be down to eliminating theoretical bugs
and performance tuning at this point:
it's probably time to enable it by default so that
most users get the benefit.

Keep the flag around meanwhile so users can experiment
with disabling this if they experience regressions.
I expect that we will remove it in the future.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-12-06 17:09:18 +02:00
Michael S. Tsirkin
cedb9bdce0 vhost-net: skip head management if no outstanding
For short packets zerocopy mode adds overhead
of managing heads which isn't necessary: we
could simly update used ring directly
same as with zerocopy disabled.

Things seem to run a bit faster if we detect
and bypass head management when zcopy isn't used.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-12-06 17:09:18 +02:00
Michael S. Tsirkin
1280c27f8e vhost-net: flush outstanding DMAs on memory change
When memory map changes, we need to flush outstanding
DMAs as they might in theory reference old memory addresses.
To do this simply stop initiating new DMAs
and wait for ubufs ref count to drop to 0.
Afterwards reset the count back to 1.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-12-06 17:09:18 +02:00
Michael S. Tsirkin
935cdee7ee vhost: avoid backend flush on vring ops
vring changes already do a flush internally where appropriate, so we do
not need a second flush.

It's currently not very expensive but a follow-up patch makes flush more
heavy-weight, so remove the extra flush here to avoid regressing
performance if call or kick fds are changed on data path.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-12-06 17:09:18 +02:00
Michael S. Tsirkin
64e9a9b8a0 vhost-net: initialize zcopy packet counters
These packet counters are used to drive the zercopy
selection heuristic so nothing too bad happens if they are off a bit -
and they are also reset once in a while.
But it's cleaner to clear them when backend is set so that
we start in a known state.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-03 13:42:17 -05:00
David S. Miller
8a2cf062b2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-29 12:51:17 -05:00
Michael S. Tsirkin
bd97120fc3 vhost: fix length for cross region descriptor
If a single descriptor crosses a region, the
second chunk length should be decremented
by size translated so far, instead it includes
the full descriptor length.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28 11:27:01 -05:00
Sachin Kamat
91aa76518d vhost: Remove duplicate inclusion of linux/vhost.h
linux/vhost.h was included twice.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-19 14:51:40 -05:00
Masanari Iida
744627e91c treewide: fix printk typo in multiple drivers
Correct spelling typo in multiple drivers.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-11-19 11:08:17 +01:00
David S. Miller
d4185bbf62 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c

Minor conflict between the BCM_CNIC define removal in net-next
and a bug fix added to net.  Based upon a conflict resolution
patch posted by Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-10 18:32:51 -05:00
Christoph Hellwig
de103c93af target: pass sense_reason as a return value
Pass the sense reason as an explicit return value from the I/O submission
path instead of storing it in struct se_cmd and using negative return
values.  This cleans up a lot of the code pathes, and with the sparse
annotations for the new sense_reason_t type allows for much better
error checking.

(nab: Convert spc_emulate_modesense + spc_emulate_modeselect to use
      sense_reason_t with Roland's MODE SELECT changes)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-11-06 20:55:46 -08:00
Michael S. Tsirkin
24eb21a148 vhost-net: reduce vq polling on tx zerocopy
It seems that to avoid deadlocks it is enough to poll vq before
 we are going to use the last buffer.  This is faster than
c70aa540c7.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-02 21:29:58 -04:00
Michael S. Tsirkin
eaae8132ef vhost-net: select tx zero copy dynamically
Even when vhost-net is in zero-copy transmit mode,
net core might still decide to copy the skb later
which is somewhat slower than a copy in user
context: data copy overhead is added to the cost of
page pin/unpin. The result is that enabling tx zero copy
option leads to higher CPU utilization for guest to guest
and guest to host traffic.

To fix this, suppress zero copy tx after a given number of
packets triggered late data copy. Re-enable periodically
to detect workload changes.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-02 21:29:58 -04:00
Michael S. Tsirkin
b211616d71 vhost: move -net specific code out
Zerocopy handling code is vhost-net specific.
Move it from vhost.c/vhost.h out to net.c

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-02 21:29:58 -04:00
Michael S. Tsirkin
c4fcb586c3 vhost: track zero copy failures using DMA length
This will be used to disable zerocopy when error rate
is high.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-02 21:29:57 -04:00
Michael S. Tsirkin
70e4cb9aaf vhost-net: cleanup macros for DMA status tracking
Better document macros for DMA tracking. Add an
explicit one for DMA in progress instead of
relying on user supplying len != 1.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-02 21:29:57 -04:00
Michael S. Tsirkin
e19d6763cc skb: report completion status for zero copy skbs
Even if skb is marked for zero copy, net core might still decide
to copy it later which is somewhat slower than a copy in user context:
besides copying the data we need to pin/unpin the pages.

Add a parameter reporting such cases through zero copy callback:
if this happens a lot, device can take this into account
and switch to copying in user context.

This patch updates all users but ignores the passed value for now:
it will be used by follow-up patches.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-02 21:29:57 -04:00
Michael S. Tsirkin
910a578f7e vhost: fix mergeable bufs on BE hosts
We copy head count to a 16 bit field, this works by chance on LE but on
BE guest gets 0. Fix it up.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Alexander Graf <agraf@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-24 23:19:30 -04:00
Linus Torvalds
a188e7e93a Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull scsi target updates from Nicholas Bellinger:
 "Things have been calm for the most part with no new fabric drivers in
  flight for v3.7 (we're up to eight now !), so this update is primarily
  focused on addressing a few long-standing items within target-core and
  iscsi-target fabric code.

  The highlights include:

   - target: Simplify fabric sense data length handling (roland)
   - qla2xxx: Fix endianness of task management response code (roland)
   - target: fix truncation of mode data, support zero allocation length
     (paolo)
   - target: Properly support zero-length commands in normal processing
     path (paolo)
   - iscsi-target: Correctly set 0xffffffff field within ISCSI_OP_REJECT
     PDU (ronnie + nab)
   - iscsi-target: Add explicit set of cache_dynamic_acls=1 for TPG
     demo-mode (ronnie + nab)
   - target/file: Re-enable optional fd_buffered_io=1 operation (nab +
     hch)
   - iscsi-target: Add MaxXmitDataSegmenthLength forr target ->
     initiator MDRSL declaration (nab)
   - target: Add target_submit_cmd_map_sgls for SGL fabric memory
     passthrough (nab + hch)
   - tcm_loop: Convert I/O path to use target_submit_cmd_map_sgls (hch +
     nab)
   - tcm_vhost: Convert I/O path to use target_submit_cmd_map_sgls (nab
     + hch)

  The last series for adding a new target_submit_cmd_map_sgls() fabric
  caller (as requested by hch) that accepts pre-allocated SGL memory
  (using existing logic), along with converting tcm_loop + tcm_vhost has
  only been in -next for the last days, but has gotten enough review
  +testing and is clear enough a mechanical change that I think it's
  reasonable to merge for -rc1 code.

  Thanks again to everyone who contributed this round! Extra special
  thanks to Roland (PureStorage) for tracking down the qla2xxx target
  TMR response code endian issue, and to Paolo (Redhat) for resolving
  the long standing zero-length CDB issues within target-core between
  virtual and pSCSI backends."

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (44 commits)
  iscsi-target: Bump defaults for nopin_timeout + nopin_response_timeout values
  iscsit: proper endianess conversions
  iscsit: use the itt_t abstract type
  iscsit: add missing endianess conversion in iscsit_check_inaddr_any
  iscsit: remove incorrect unlock in iscsit_build_sendtargets_resp
  iscsit: mark various functions static
  target/iscsi: precedence bug in iscsit_set_dataout_sequence_values()
  target/usb-gadget: strlen() doesn't count the terminator
  target/usb-gadget: remove duplicate initialization
  tcm_vhost: Convert I/O path to use target_submit_cmd_map_sgls
  target: Add control CDB READ payload zero work-around
  tcm_loop: Convert I/O path to use target_submit_cmd_map_sgls
  target: Add target_submit_cmd_map_sgls for SGL fabric memory passthrough
  iscsi-target: Add explicit set of cache_dynamic_acls=1 for TPG demo-mode
  iscsi-target: Change iscsi_target_seq_pdu_list.c to honor MaxXmitDataSegmentLength
  iscsi-target: Add MaxXmitDataSegmentLength connection recovery check
  iscsi-target: Convert incoming PDU payload checks to MaxXmitDataSegmentLength
  iscsi-target: Enable MaxXmitDataSegmentLength operation in login path
  iscsi-target: Add base MaxXmitDataSegmentLength code
  target/file: Re-enable optional fd_buffered_io=1 operation
  ...
2012-10-10 19:52:19 +09:00
Nicholas Bellinger
9f0abc1554 tcm_vhost: Convert I/O path to use target_submit_cmd_map_sgls
This patch converts tcm_vhost to use target_submit_cmd_map_sgls() for
I/O submission and mapping of pre-allocated SGL memory from incoming
virtio-scsi SGL memory -> se_cmd descriptors.

This includes removing the original open-coded fabric uses of target
core callers to support transport_generic_map_mem_to_cmd() between
target_setup_cmd_from_cdb() and transport_handle_cdb_direct() logic.

It also includes adding a handful of new tcm_vhost_cmnd member +
assignments in vhost_scsi_allocate_cmd() used from cmwq process
context I/O submission within tcm_vhost_submission_work()

(v2: Use renamed target_submit_cmd_map_sgls)

Reported-by: Christoph Hellwig <hch@lst.de>
Cc: Christoph Hellwig <hch@lst.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-10-02 14:16:20 -07:00
Al Viro
cecb46f194 vhost_set_vring(): turn pollstart/pollstop into bool
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26 21:10:12 -04:00
Roland Dreier
9c58b7ddd7 target: Simplify fabric sense data length handling
Every fabric driver has to supply a se_tfo->set_fabric_sense_len()
method, just so iSCSI can return an offset of 2.  However, every fabric
driver is already allocating a sense buffer and passing it into the
target core, either via transport_init_se_cmd() or target_submit_cmd().

So instead of having iSCSI pass the start of its sense buffer into the
core and then later tell the core to skip the first 2 bytes, it seems
easier for iSCSI just to do the offset of 2 when it passes the sense
buffer into the core.  Then we can drop the se_tfo->set_fabric_sense_len()
everywhere, and just add a couple of lines of code to iSCSI to set the
sense data length to the beginning of the buffer right before it sends
it over the network.

(nab: Remove .set_fabric_sense_len usage from tcm_qla2xxx_npiv_ops +
      change transport_get_sense_buffer to follow v3.6-rc6 code w/o
      ->set_fabric_sense_len usage)

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-09-17 17:12:58 -07:00
Roland Dreier
2ed772b7b9 target: Remove unused target_core_fabric_ops.get_fabric_sense_len method
There are no callers of se_tfo->get_fabric_sense_len(), so we should
stop having every fabric driver implement it.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-09-17 16:15:47 -07:00
Michael S. Tsirkin
6de7145ca3 tcm_vhost: Fix vhost_scsi_target structure alignment
Here TRANSPORT_IQN_LEN is 224, which is a multiple of 4.
Since vhost_tpgt is 2 bytes and abi_version is 4, the total size would
be 230.  But gcc needs struct size be aligned to first field size, which
is 4 bytes, so it pads the structure by extra 2 bytes to the total of
232.

This padding is very undesirable in an ABI:
- it can not be initialized easily
- it can not be checked easily
- it can leak information between kernel and userspace

Simplest solution is probably just to make the padding
explicit.

(v2: Add check for zero'ed backend->reserved field for VHOST_SCSI_SET_ENDPOINT
     and VHOST_SCSI_CLEAR_ENDPOINT ops as requested by MST)

Reported-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-08-20 14:52:11 -07:00
Nicholas Bellinger
5b7517f814 tcm_vhost: Change vhost_scsi_target->vhost_wwpn to char *
This patch changes the vhost_scsi_target->vhost_wwpn[] type used
by VHOST_SCSI_* ioctls to 'char *' as requested by Blue Swirl in
order to match the latest QEMU vhost-scsi RFC-v3 userspace code.

Queuing this up into target-pending/master for a -rc3 PULL.

Reported-by: Blue Swirl <blauwirbel@gmail.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-08-16 19:03:13 -07:00
Nicholas Bellinger
101998f6fc tcm_vhost: Post-merge review changes requested by MST
This patch contains the post RFC-v5 (post-merge) changes, this includes:

- Add locking comment
- Move vhost_scsi_complete_cmd ahead of TFO callbacks in order to
  drop forward declarations
- Drop extra '!= NULL' usage in vhost_scsi_complete_cmd_work()
- Change vhost_scsi_*_handle_kick() to use pr_debug
- Fix possible race in vhost_scsi_set_endpoint() for vs->vs_tpg checking
  + assignment.
- Convert tv_tpg->tpg_vhost_count + ->tv_tpg_port_count from atomic_t ->
  int, and make sure reference is protected by ->tv_tpg_mutex.
- Drop unnecessary vhost_scsi->vhost_ref_cnt
- Add 'err:' label for exception path in vhost_scsi_clear_endpoint()
- Add enum for VQ numbers, add usage in vhost_scsi_open()
- Add vhost_scsi_flush() + vhost_scsi_flush_vq() following
  drivers/vhost/net.c
- Add smp_wmb() + vhost_scsi_flush() call during vhost_scsi_set_features()
- Drop unnecessary copy_from_user() usage with GET_ABI_VERSION ioctl
- Add missing vhost_scsi_compat_ioctl() caller for vhost_scsi_fops
- Fix function parameter definition first line to follow existing
  vhost code style
- Change 'vHost' usage -> 'vhost' in handful of locations
- Change -EPERM -> -EBUSY usage for two failures in tcm_vhost_drop_nexus()
- Add comment for tcm_vhost_workqueue in tcm_vhost_init()
- Make GET_ABI_VERSION return 'int' + add comment in tcm_vhost.h

Reported-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-08-16 17:33:40 -07:00
Fengguang Wu
f0e0e9bba5 tcm_vhost: Fix incorrect IS_ERR() usage in vhost_scsi_map_iov_to_sgl
Fix up a new coccinelle warnings reported by Fengguang Wu + Intel
0-DAY kernel build testing backend:

drivers/vhost/tcm_vhost.c:537:23-29: ERROR: allocation function on line
533 returns NULL not ERR_PTR on failure

vim +537 drivers/vhost/tcm_vhost.c
   534          if (!sg)
   535                  return -ENOMEM;
   536          pr_debug("%s sg %p sgl_count %u is_err %ld\n", __func__,
 > 537                 sg, sgl_count, IS_ERR(sg));
   538          sg_init_table(sg, sgl_count);
   539
   540          tv_cmd->tvc_sgl = sg;

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-08-16 17:33:28 -07:00
Nicholas Bellinger
057cbf49a1 tcm_vhost: Initial merge for vhost level target fabric driver
This patch adds the initial code for tcm_vhost, a Vhost level TCM
fabric driver for virtio SCSI initiators into KVM guest.

This code is currently up and running on v3.5-rc2 host+guest
from target-pending/for-next-merge.

Using tcm_vhost requires Zhi's -> Stefan -> nab's qemu vhost-scsi tree here:

  http://git.kernel.org/?p=virt/kvm/nab/qemu-kvm.git;a=shortlog;h=refs/heads/vhost-scsi

--

Changelog v4 -> v5:

  Expose ABI version via VHOST_SCSI_GET_ABI_VERSION + use Rev 0 as
  starting point for v3.6-rc code (Stefan + ALiguori + nab)
  Convert vhost_scsi_handle_vq() to vq_err() (nab + MST)
  Minor style fixes from checkpatch (nab)

Changelog v3 -> v4:

  Rename vhost_vring_target -> vhost_scsi_target (mst + nab)
  Use TRANSPORT_IQN_LEN in vhost_scsi_target->vhost_wwpn[] def (nab)
  Move back to drivers/vhost/, and just use drivers/vhost/Kconfig.tcm (mst)
  Move TCM_VHOST related ioctl defines from include/linux/vhost.h ->
  drivers/vhost/tcm_vhost.h as requested by MST (nab)
  Move Kbuild.tcm include from drivers/staging -> drivers/vhost/, and
  just use 'if STAGING' around 'source drivers/vhost/Kbuild.tcm'

Changelog v2 -> v3:

  Unlock on error in tcm_vhost_drop_nexus() (DanC)
  Fix strlen() doesn't count the terminator (DanC)
  Call kfree() on an error path (DanC)
  Convert tcm_vhost_write_pending to use target_execute_cmd (hch + nab)
  Fix another strlen() off by one in tcm_vhost_make_tport (DanC)
  Add option under drivers/staging/Kconfig, and move to drivers/vhost/tcm/
  as requested by MST (nab)

Changelog v1 -> v2:

  Fix tv_cmd completion -> release SGL memory leak (nab)
  Fix sparse warnings for static variable usage ((Fengguang Wu)
  Fix sparse warnings for min() typing + printk format specs (Fengguang Wu)
  Convert to cmwq submission for I/O dispatch (nab + hch)

Changelog v0 -> v1:

  Merge into single source + header file, and move to drivers/vhost/

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-07-29 13:49:10 -07:00
Stefan Hajnoczi
163049aefd vhost: make vhost work queue visible
The vhost work queue allows processing to be done in vhost worker thread
context, which uses the owner process mm.  Access to the vring and guest
memory is typically only possible from vhost worker context so it is
useful to allow work to be queued directly by users.

Currently vhost_net only uses the poll wrappers which do not expose the
work queue functions.  However, for tcm_vhost (vhost_scsi) it will be
necessary to queue custom work.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-07-22 01:22:23 +03:00
Stefan Hajnoczi
0dd05a3b60 vhost: Separate vhost-net features from vhost features
In order for other vhost devices to use the VHOST_FEATURES bits the
vhost-net specific bits need to be moved to their own VHOST_NET_FEATURES
constant.

(Asias: Update drivers/vhost/test.c to use VHOST_NET_FEATURES)

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Asias He <asias@redhat.com>
Signed-off-by: Nicholas A. Bellinger <nab@risingtidesystems.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2012-07-22 01:21:53 +03:00
Jens Freimann
d7ffde35e3 vhost: use USER_DS in vhost_worker thread
On some architectures address spaces are set up in a way that this is
not necessary to work properly but on some others (like s390) it is.
Make sure we operate on the user address space to allow copy_xxx_user()
from the vhost_worker() thread by setting it explicitly before calling
use_mm() and revert it after unuse_mm().

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-26 21:10:56 -07:00
David S. Miller
028940342a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-05-16 22:17:37 -04:00
Basil Gor
c53cff5e42 vhost-net: fix handle_rx buffer size
Take vlan header length into account, when vlan id is stored as
vlan_tci. Otherwise tagged packets coming from macvtap will be
truncated.

Signed-off-by: Basil Gor <basil.gor@gmail.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-11 18:16:57 -04:00