Move stats related fields - stamp, in_flight, dkstats - from disk to
part0 and unify stat handling such that...
* part_stat_*() now updates part0 together if the specified partition
is not part0. ie. part_stat_*() are now essentially all_stat_*().
* {disk|all}_stat_*() are gone.
* part_round_stats() is updated similary. It handles part0 stats
automatically and disk_round_stats() is killed.
* part_{inc|dec}_in_fligh() is implemented which automatically updates
part0 stats for parts other than part0.
* disk_map_sector_rcu() is updated to return part0 if no part matches.
Combined with the above changes, this makes NULL special case
handling in callers unnecessary.
* Separate stats show code paths for disk are collapsed into part
stats show code paths.
* Rename disk_stat_lock/unlock() to part_stat_lock/unlock()
While at it, reposition stat handling macros a bit and add missing
parentheses around macro parameters.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Move disk->capacity to part0->nr_sects and convert all users who
directly accessed the field to use {get|set}_capacity(). This is done
early to allow the __dev field to be moved.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Implement {disk|part}_to_dev() and use them to access generic device
instead of directly dereferencing {disk|part}->dev. To make sure no
user is left behind, rename generic devices fields to __dev.
This is in preparation of unifying partition 0 handling with other
partitions.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
There are two variants of stat functions - ones prefixed with double
underbars which don't care about preemption and ones without which
disable preemption before manipulating per-cpu counters. It's unclear
whether the underbarred ones assume that preemtion is disabled on
entry as some callers don't do that.
This patch unifies diskstats access by implementing disk_stat_lock()
and disk_stat_unlock() which take care of both RCU (for partition
access) and preemption (for per-cpu counter access). diskstats access
should always be enclosed between the two functions. As such, there's
no need for the versions which disables preemption. They're removed
and double underbars ones are renamed to drop the underbars. As an
extra argument is added, there's no danger of using the old version
unconverted.
disk_stat_lock() uses get_cpu() and returns the cpu index and all
diskstat functions which access per-cpu counters now has @cpu
argument to help RT.
This change adds RCU or preemption operations at some places but also
collapses several preemption ops into one at others. Overall, the
performance difference should be negligible as all involved ops are
very lightweight per-cpu ones.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
disk->part[] is protected by its matching bdev's lock. However,
non-critical accesses like collecting stats and printing out sysfs and
proc information used to be performed without any locking. As
partitions can come and go dynamically, partitions can go away
underneath those non-critical accesses. As some of those accesses are
writes, this theoretically can lead to silent corruption.
This patch fixes the race by using RCU for the partition array and dev
reference counter to hold partitions.
* Rename disk->part[] to disk->__part[] to make sure no one outside
genhd layer proper accesses it directly.
* Use RCU for disk->__part[] dereferencing.
* Implement disk_{get|put}_part() which can be used to get and put
partitions from gendisk respectively.
* Iterators are implemented to help iterate through all partitions
safely.
* Functions which require RCU readlock are marked with _rcu suffix.
* Use disk_put_part() in __blkdev_put() instead of directly putting
the contained kobject.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* Implement disk_devt() and part_devt() and use them to directly
access devt instead of computing it from ->major and ->first_minor.
Note that all references to ->major and ->first_minor outside of
block layer is used to determine devt of the disk (the part0) and as
->major and ->first_minor will continue to represent devt for the
disk, converting these users aren't strictly necessary. However,
convert them for consistency.
* Implement disk_max_parts() to avoid directly deferencing
genhd->minors.
* Update bdget_disk() such that it doesn't assume consecutive minor
space.
* Move devt computation from register_disk() to add_disk() and make it
the only one (all other usages use the initially determined value).
These changes clean up the code and will help disk->part dereference
fix and extended block device numbers.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch makes the following misc updates in preparation for
disk->part dereference fix and extended block devt support.
* implment part_to_disk()
* fix comment about gendisk->part indexing
* rename get_part() to disk_map_sector()
* don't use n which is always zero while printing disk information in
diskstats_show()
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
struct request has an ioprio member but it is never updated because
currently bios do not hold io context information. The implication of
this is that virtio_blk ends up passing useless information to the
backend driver.
That said, some IO schedulers such as CFQ do store io context
information in struct request, but use private members for that, which
means that that information cannot be directly accessed in a IO
scheduler-independent way.
This patch adds a function to obtain the ioprio of a request. We should
avoid accessing ioprio directly and use this function instead, so that
its users do not have to care about future changes in block layer
structures or what the currently active IO controller is.
This patch does not introduce any functional changes but paves the way
for future clean-ups and enhancements.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
It was only used by ps3disk, and it should probably have been
REQ_TYPE_LINUX_BLOCK + REQ_LB_OP_FLUSH.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This reverts commit 5b6155ee70, because
the block device ioctl's really aren't ready for it.
In particular, the "struct file *" and the "struct inode *" arguments do
not necessarily match, which means that the unlocked version of the
ioctl (that only gets a "struct file *") isn't actually able to handle
the cases it needs to handle.
This fixes bugzilla
http://bugzilla.kernel.org/show_bug.cgi?id=11401
Reported-and-bisected-by: Laurent Riffard <laurent.riffard@free.fr>
Acked-by: Peter Osterlund <petero2@telia.com>
Cc: Alan Cox <alan@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The name of brd block device is "ramdisk", it's not "brd".
(The block device is registered by register_blkdev(RAMDISK_MAJOR, "ramdisk")
So it should be unregistered by unregister_blkdev(RAMDISK_MAJOR, "ramdisk")
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We leak the memory allocated for the nbd_dev array at multiple places.
Fix them by either adding a kfree() or by rearranging code to return
before we allocate the memory.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Cc: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch makes the needlessly global blkif_ioctl() static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Bug fix. If SCSI tape support is turned off we get an implicit declaration
of cciss_unregister_scsi error in cciss_remove_one.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch adds support for multi-lun devices in a SAS environment. It's
required for the support of media changers.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch changes way we notify the scsi layer that something has changed
on the SCSI tape side of the driver. The user can now just tell the driver
to rescan a particular controller rather than having to know the SCSI nexus
to echo into the SCSI mid-layer.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch fixes a problem where the logical volume count may go negative.
In some instances if several logical are configured on a controller and all
of them are deleted using the online utilities the volume count in /proc may
go negative with no way get it correct again.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch removes redundant code where ever logical volumes are added or
removed. It adds 3 new functions that are called instead of having the same
code spread throughout the driver. It also removes the cciss_getgeometry
function.
The patch is fairly complex but we haven't figured out how to make it any
simpler and still do everything that needs to be done. Some of the
complexity comes from having to special case booting from cciss. Otherwise
the gendisk doesn't get added in time and the switchroot will fail.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch makes the rebuild_lun_table smart enough to not rip a logical
volume out from under the OS. Without this fix if a customer is running
hpacucli to monitor their storage the driver will blindly remove and re-add
the disks whenever the utility calls the CCISS_REGNEWD ioctl. Unfortunately,
both hpacucli and ACUXE call the ioctl repeatedly. Customers have reported
IO coming to a standstill. Calling the ioctl is the problem, this patch is
the fix.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Return -EFAULT instead of -ENOMEM if copy_from_user() fails.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
curr_queue is a local variable in a for loop, and it's being initialized
at the start of each loop. So any assignment at the end of the loop is
pointless.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Cc: Mike Miller <mike.miller@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some module parameters with only one line have the '\n' at the end of the
description. This is not needed nor wanted as after the description the
type (i.e. int) is followed by a newline.
Some modules contain a multi-line description, these are not affected
by this patch.
Signed-off-by: Niels de Vos <niels.devos@wincor-nixdorf.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: John W. Linville <linville@tuxdriver.com>
Cc: Ed L. Cashin <ecashin@coraid.com>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Roland Dreier <rolandd@cisco.com>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ATA over Ethernet: The semaphore emsgs_sema is used for signalling an
event, convert it in a completion.
Signed-off-by: Matthias Kaehlcke <matthias@kaehlcke.net>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently virtio_blk assumes a 512 byte hard sector size. This can cause
trouble / performance issues if the backing has a different block size
(like a file on an ext3 file system formatted with 4k block size or a dasd).
Lets add a feature flag that tells the guest to use a different hard sector
size than 512 byte.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
device_create() is race-prone, so use the race-free
device_create_drvdata() instead as device_create() is going away.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
According to the tests in do_initcalls(), the proper error code in case no
device is found is -ENODEV, not -ENXIO or -EIO.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The code that needed this #include was removed one year ago.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: rmk@arm.linux.org.uk
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Many people will see this option the first time now that it is in
drivers/block/
Make it clear that virtually noone needs it.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: rmk@arm.linux.org.uk
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
This patch moves hd.c to drivers/block/
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: rmk@arm.linux.org.uk
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (80 commits)
ide-floppy: fix unfortunate function naming
ide-tape: unify idetape_create_read/write_cmd
ide: add ide_pc_intr() helper
ide-{floppy,scsi}: read Status Register before stopping DMA engine
ide-scsi: add more debugging to idescsi_pc_intr()
ide-scsi: use pc->callback
ide-floppy: add more debugging to idefloppy_pc_intr()
ide-tape: always log debug info in idetape_pc_intr() if debugging is enabled
ide-tape: add ide_tape_io_buffers() helper
ide-tape: factor out DSC handling from idetape_pc_intr()
ide-{floppy,tape}: move checking of ->failed_pc to ->callback
ide: add ide_issue_pc() helper
ide: add PC_FLAG_DRQ_INTERRUPT pc flag
ide-scsi: move idescsi_map_sg() call out from idescsi_issue_pc()
ide: add ide_transfer_pc() helper
ide-scsi: set drive->scsi flag for devices handled by the driver
ide-{cd,floppy,tape}: remove checking for drive->scsi
ide: add PC_FLAG_ZIP_DRIVE pc flag
ide-tape: factor out waiting for good ireason from idetape_transfer_pc()
ide-tape: set PC_FLAG_DMA_IN_PROGRESS flag in idetape_transfer_pc()
...
pd_special_command uses blk_put_request with struct request on the
stack. As a result, blk_put_request needs a hack to catch a NULL
request_queue. This converts pd_special_command to use
blk_execute_rq.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (37 commits)
splice: fix generic_file_splice_read() race with page invalidation
ramfs: enable splice write
drivers/block/pktcdvd.c: avoid useless memset
cdrom: revert commit 22a9189 (cdrom: use kmalloced buffers instead of buffers on stack)
scsi: sr avoids useless buffer allocation
block: blk_rq_map_kern uses the bounce buffers for stack buffers
block: add blk_queue_update_dma_pad
DAC960: push down BKL
pktcdvd: push BKL down into driver
paride: push ioctl down into driver
block: use get_unaligned_* helpers
block: extend queue_flag bitops
block: request_module(): use format string
Add bvec_merge_data to handle stacked devices and ->merge_bvec()
block: integrity flags can't use bit ops on unsigned short
cmdfilter: extend default read filter
sg: fix odd style (extra parenthesis) introduced by cmd filter patch
block: add bounce support to blk_rq_map_user_iov
cfq-iosched: get rid of enable_idle being unused warning
allow userspace to modify scsi command filter on per device basis
...
This patch changes the way we determine the maximum number of outstanding
commands for each controller.
Most Smart Array controllers can support up to 1024 commands, the notable
exceptions are the E200 and E200i.
The next generation of controllers which were just added support a mode of
operation called Zero Memory Raid (ZMR). In this mode they only support
64 outstanding commands. In Full Function Raid (FFR) mode they support
1024.
We have been setting the queue depth by arbitrarily assigning some value
for each controller. We needed a better way to set the queue depth to
avoid lots of annoying "fifo full" messages. So we made the driver a
little smarter. We now read the config table and subtract 4 from the
returned value. The -4 is to allow some room for ioctl calls which are
not tracked the same way as io commands are tracked.
Please consider this for inclusion.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix regression in cciss driver that if no logical drives are configured,
no device nodes at all get created.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cca.cpqcorp.net>
Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Avoid the 'memset(...,0, ...)' before calling 'init_cdrom_command' because
this function already does it.
Signed-off-by: Christophe Jaillet <jaillet.christophe@wanadoo.fr>
Acked-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Push the lock_kernel down into the driver and switch to unlocked_ioctl
[akpm@linux-foundation.org: build fix]
Signed-off-by: Alan Cox <alan@redhat.com>
Acked-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Leaves us with lock_kernel for two methods. Also remove a bogus printk
with no printk level and return -ENOTTY not -EINVAL for correctness.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(Jens: added smp_lock.h include to pt.c, otherwise it wont compile because
of missing {un}lock_kernel() definition)
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
When devices are stacked, one device's merge_bvec_fn may need to perform
the mapping and then call one or more functions for its underlying devices.
The following bio fields are used:
bio->bi_sector
bio->bi_bdev
bio->bi_size
bio->bi_rw using bio_data_dir()
This patch creates a new struct bvec_merge_data holding a copy of those
fields to avoid having to change them directly in the struct bio when
going down the stack only to have to change them back again on the way
back up. (And then when the bio gets mapped for real, the whole
exercise gets repeated, but that's a problem for another day...)
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Milan Broz <mbroz@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Avoid allocations causing swap activity on the resume path by
preventing the allocations from doing IO and allowing them
to access the emergency pools.
These paths are used when a frontend device is trying to connect
to its backend driver over Xenbus. These reconnections are triggered
on demand by IO, so by definition there is already IO underway,
and further IO would naturally deadlock. On resume, this path
is triggered when the running system tries to continue using its
devices. If it cannot then the resume will fail; to try to avoid this
we let it dip into the emergency pools.
[ linux-2.6.18-xen changesets e8b49cfbdac, fdb998e79aba ]
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Return 0 instead of -EINVAL if the blkfront device is a cdrom,
i.e. had the VDISK_CDROM attribute. This allows udev's cdrom_id
to correctly detect the device as a cdrom device.
[ Add blkif_ioctl, and CDROMMULTISESSION ]
[ linux-2.6.18-xen changeset d2bd9af846b5 ]
Signed-off-by: Christian Limpach <Christian.Limpach@xensource.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Add support for the next generation of HP Smart Array SAS/SATA
controllers. Shipping date is late Fall 2008.
Bump the driver version to 3.6.20 to reflect the new hardware support from
patch 1 of this set.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>