We normally just use the BIO_UPTODATE flag to signal 0/-EIO. If
we have more information available, we should pass that along to
the trace output.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
blktrace.c block bio complete callback needs to gain a new argument to reflect
the newly added "error" tracepoint argument. This is needed to match the new
block_bio_complete TRACE_EVENT as of
commit de983a7bfcb7c020901ca6e2314cf55a4207ab5a.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Jeff Moyer <jmoyer@redhat.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
cfq_group->ref is used with queue_lock hold, the only exception is
cfq_set_request, which looks like a bug to me, so ref doesn't need
to be an atomic and atomic operation is slower.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
cfq_queue->ref is used with queue_lock hold, so ref doesn't need to be an atomic
and atomic operation is slower.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
The "error" field in block_bio_complete is not assigned, leaving the memory area
uninitialized (keeping garbage data). Pass an additional tracepoint argument to
this event to initialize this field.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Li Zefan <lizf@cn.fujitsu.com>
CC: Alan.Brunelle@hp.com
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
We can't use krefs since it's apparently restricted to very basic
reference counting.
This reverts commit e4a683c8.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
/proc/diskstats would display a strange output as follows.
$ cat /proc/diskstats |grep sda
8 0 sda 90524 7579 102154 20464 0 0 0 0 0 14096 20089
8 1 sda1 19085 1352 21841 4209 0 0 0 0 4294967064 15689 4293424691
~~~~~~~~~~
8 2 sda2 71252 3624 74891 15950 0 0 0 0 232 23995 1562390
8 3 sda3 54 487 2188 92 0 0 0 0 0 88 92
8 4 sda4 4 0 8 0 0 0 0 0 0 0 0
8 5 sda5 81 2027 2130 138 0 0 0 0 0 87 137
Its reason is the wrong way of accounting hd_struct->in_flight. When a bio is
merged into a request belongs to different partition by ELEVATOR_FRONT_MERGE.
The detailed root cause is as follows.
Assuming that there are two partition, sda1 and sda2.
1. A request for sda2 is in request_queue. Hence sda1's hd_struct->in_flight
is 0 and sda2's one is 1.
| hd_struct->in_flight
---------------------------
sda1 | 0
sda2 | 1
---------------------------
2. A bio belongs to sda1 is issued and is merged into the request mentioned on
step1 by ELEVATOR_BACK_MERGE. The first sector of the request is changed
from sda2 region to sda1 region. However the two partition's
hd_struct->in_flight are not changed.
| hd_struct->in_flight
---------------------------
sda1 | 0
sda2 | 1
---------------------------
3. The request is finished and blk_account_io_done() is called. In this case,
sda2's hd_struct->in_flight, not a sda1's one, is decremented.
| hd_struct->in_flight
---------------------------
sda1 | -1
sda2 | 1
---------------------------
The patch fixes the problem by caching the partition lookup
inside the request structure, hence making sure that the increment
and decrement will always happen on the same partition struct. This
also speeds up IO with accounting enabled, since it cuts down on
the number of lookups we have to do.
Also add a refcount to struct hd_struct to keep the partition in
memory as long as users exist. We use kref_test_and_get() to ensure
we don't add a reference to a partition which is going away.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Add kref_test_and_get() function, which atomically add a reference only if
refcount is not zero. This prevent to add a reference to an object that is
already being removed.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Work items processed by kintegrityd_wq won't block much, may burn a
lot of CPU cycles and affect IO latency. Use alloc_workqueue() to
mark it highpri and CPU intensive with max concurrency of 1.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
kblockd is used for unplugging and may affect IO latency and
throughput and the max number of concurrent work items are bound by
the number of block devices. Make it HIGHPRI workqueue w/ default max
concurrency.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
This reverts commit c8d2e93735.
We run into merging problems with the SCSI tree, revert this one
so it can be handled by a postmerge tree there.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
This patch fixes a spelling error in a source code comment and removes
superfluous braces in the function exit_io_context().
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Commit a8adbe3 forgot to remove the return variable, kill it.
drivers/block/loop.c: In function 'lo_splice_actor':
drivers/block/loop.c:398: warning: unused variable 'ret'
[...]
fs/nfsd/vfs.c: In function 'nfsd_splice_actor':
fs/nfsd/vfs.c:848: warning: unused variable 'ret'
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
The major/minor device numbers are always defined and used as `unsigned'.
Signed-off-by: Yang Zhang <kthreadd@gmail.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
When cfq_choose_cfqg() is called in select_queue(), there must be at least one
backlogged CFQ queue waiting for dispatching, hence there must be at least one
backlogged CFQ group on service tree. So we never call choose_service_tree()
with cfqg == NULL.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
This patch pulls calls to buf->ops->confirm() from all actors passed
(also indirectly) to splice_from_pipe_feed().
Is avoiding the call to buf->ops->confirm() while splice()ing to
/dev/null is an intentional optimization? No other user does that
and this will remove this special case.
Against current linux.git 6313e3c217.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Replace sd_media_change() with sd_check_events(). sd used to set the
changed state whenever the device is not ready, which can cause event
loop while the device is not ready. Media presence handling code is
changed such that the changed state is set iff the media presence
actually changes. UA still always sets the changed state and
NOT_READY always (at least where it used to set ->changed) clears
media presence, so no event is lost.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Replace sr_media_change() with sr_check_events(). It normally only
uses GET_EVENT_STATUS_NOTIFICATION to check both media change and
eject request. If @clearing includes DISK_EVENT_MEDIA_CHANGE, it
issues TUR and compares whether media presence has changed. The SCSI
specific media change uevent is kept for compatibility.
sr_media_change() was doing both media change check and revalidation.
The revalidation part is split into sr_block_revalidate_disk().
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
The usage of TUR has been confusing involving several different
commits updating different parts over time. Currently, the only
differences between scsi_test_unit_ready() and sr_test_unit_ready()
are,
* scsi_test_unit_ready() also sets sdev->changed on NOT_READY.
* scsi_test_unit_ready() returns 0 if TUR ended with UNIT_ATTENTION or
NOT_READY.
Due to the above two differences, sr is using its own
sr_test_unit_ready(), but sd - the sole user of the above extra
handling - doesn't even need them.
Where scsi_test_unit_ready() is used in sd_media_changed(), the code
is looking for device ready w/ media present state which is true iff
TUR succeeds w/o sense data or UA, and when the device is not ready
for whatever reason sd_media_changed() explicitly marks media as
missing so there's no reason to set sdev->changed automatically from
scsi_test_unit_ready() on NOT_READY.
Drop both special handlings from scsi_test_unit_ready(), which makes
it equivalant to sr_test_unit_ready(), and replace
sr_test_unit_ready() with scsi_test_unit_ready(). Also, drop the
unnecessary explicit NOT_READY check from sd_media_changed().
Checking return value is enough for testing device readiness.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
sr_test_unit_ready() returns 0 iff TUR succeeded - IOW, when media is
present and the device is actually ready, so the return value wouldn't
be zero when TUR ends with sense data. sr_media_change() incorrectly
tests (retval || (scsi_sense_valid(sshdr)...)) when it tries to test
whether TUR failed without sense data or with sense data indicating
media-not-present.
Fix the test using scsi_status_is_good() and update comments.
- Fixed a comment typo spotted by Eike.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
In principle, cdrom just needs to pass through ->check_events() but
CDROM_MEDIA_CHANGED ioctl makes things a bit more complex. Just as
with ->media_changed() support, cdrom code needs to buffer the events
and serve them to ioctl and vfs as requested.
As the code has to deal with both ->check_events() and
->media_changed(), and vfs and ioctl event buffering, this patch adds
check_events caching on top of the existing cdi->mc_flags buffering.
It may be a good idea to deprecate CDROM_MEDIA_CHANGED ioctl and
remove all this mess.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Currently, media presence polling for removeable block devices is done
from userland. There are several issues with this.
* Polling is done by periodically opening the device. For SCSI
devices, the command sequence generated by such action involves a
few different commands including TEST_UNIT_READY. This behavior,
while perfectly legal, is different from Windows which only issues
single command, GET_EVENT_STATUS_NOTIFICATION. Unfortunately, some
ATAPI devices lock up after being periodically queried such command
sequences.
* There is no reliable and unintrusive way for a userland program to
tell whether the target device is safe for media presence polling.
For example, polling for media presence during an on-going burning
session can make it fail. The polling program can avoid this by
opening the device with O_EXCL but then it risks making a valid
exclusive user of the device fail w/ -EBUSY.
* Userland polling is unnecessarily heavy and in-kernel implementation
is lighter and better coordinated (workqueue, timer slack).
This patch implements framework for in-kernel disk event handling,
which includes media presence polling.
* bdops->check_events() is added, which supercedes ->media_changed().
It should check whether there's any pending event and return if so.
Currently, two events are defined - DISK_EVENT_MEDIA_CHANGE and
DISK_EVENT_EJECT_REQUEST. ->check_events() is guaranteed not to be
called parallelly.
* gendisk->events and ->async_events are added. These should be
initialized by block driver before passing the device to add_disk().
The former contains the mask of all supported events and the latter
the mask of all events which the device can report without polling.
/sys/block/*/events[_async] export these to userland.
* Kernel parameter block.events_dfl_poll_msecs controls the system
polling interval (default is 0 which means disable) and
/sys/block/*/events_poll_msecs control polling intervals for
individual devices (default is -1 meaning use system setting). Note
that if a device can report all supported events asynchronously and
its polling interval isn't explicitly set, the device won't be
polled regardless of the system polling interval.
* If a device is opened exclusively with write access, event checking
is automatically disabled until all write exclusive accesses are
released.
* There are event 'clearing' events. For example, both of currently
defined events are cleared after the device has been successfully
opened. This information is passed to ->check_events() callback
using @clearing argument as a hint.
* Event checking is always performed from system_nrt_wq and timer
slack is set to 25% for polling.
* Nothing changes for drivers which implement ->media_changed() but
not ->check_events(). Going forward, all drivers will be converted
to ->check_events() and ->media_change() will be dropped.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
There's no reason for register_disk() and del_gendisk() to be in
fs/partitions/check.c. Move both to genhd.c. While at it, collapse
unlink_gendisk(), which was artificially in a separate function due to
genhd.c / check.c split, into del_gendisk().
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
If priority is changed, continuing to check workload_expires and service tree
count of the previous workload does not make sense. We should always choose
the workload with lowest key of new priority in such case.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
It's able to check whether a CFQ group on a service tree by
checking "cfqg->rb_node". There's no need to maintain an
extra flag here.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
When a cfq group is running, it won't be dequeued from service tree, so
there's no need to store the active one in st->active. Just gid rid of it.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
The addition of CONFIG_SECURITY_DMESG_RESTRICT resulted in a build
failure when CONFIG_PRINTK=n. This is because the capabilities code
which used the new option was built even though the variable in question
didn't exist.
The patch here fixes this by moving the capabilities checks out of the
LSM and into the caller. All (known) LSMs should have been calling the
capabilities hook already so it actually makes the code organization
better to eliminate the hook altogether.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
arm: omap1: devices: need to return with a value
OMAP1: camera.h: add missing include
omap: dma: Add read-back to DMA interrupt handler to avoid spuriousinterrupts
OMAP2: Devkit8000: Fix mmc regulator failure
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
hwmon: (w83795) Check for BEEP pin availability
hwmon: (w83795) Clear intrusion alarm immediately
hwmon: (w83795) Read the intrusion state properly
hwmon: (w83795) Print the actual temperature channels as sources
hwmon: (w83795) List all usable temperature sources
hwmon: (w83795) Expose fan control method
hwmon: (w83795) Fix fan control mode attributes
hwmon: (lm95241) Check validity of input values
hwmon: Change mail address of Hans J. Koch
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI: sysfs: fix printk warnings
PCI: fix pci_bus_alloc_resource() hang, prefer positive decode
PCI: read current power state at enable time
PCI: fix size checks for mmap() on /proc/bus/pci files
x86/PCI: coalesce overlapping host bridge windows
PCI hotplug: ibmphp: Add check to prevent reading beyond mapped area
Make sure I2C adapters being registered have the required struct
fields set. If they don't, problems will happen later.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
It's about time to make it clear that i2c_adapter.id is deprecated.
Hopefully this will remind the last user to move over to a different
strategy.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Hans Verkuil <hverkuil@xs4all.nl>
Drivers don't need to include <linux/i2c-id.h>, especially not when
they don't use anything that header file provides.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Michael Hunold <michael@mihu.de>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Delete unused I2C adapter IDs. Special cases are:
* I2C_HW_B_RIVA was still set in driver rivafb, however no other
driver is ever looking for this value, so we can safely remove it.
* I2C_HW_B_HDPVR is used in staging driver lirc_zilog, however no
adapter ID is ever set to this value, so the code in question never
runs. As the code additionally expects that I2C_HW_B_HDPVR may not
be defined, we can delete it now and let the lirc_zilog driver
maintainer rewrite this piece of code.
Big thanks for Hans Verkuil for doing all the hard work :)
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Hans Verkuil <hverkuil@xs4all.nl>
A few new i2c-drivers came into the kernel which clear the clientdata-pointer
on exit. This is obsolete meanwhile, so fix it and hope the word will spread.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Alan Cox <alan@linux.intel.com>
Acked-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Move the logging bits from kernel.h into printk.h so that
there is a bit more logical separation of the generic from
the printk logging specific parts.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The fix in commit 6b4e81db25 ("i8k: Tell gcc that *regs gets
clobbered") to work around the gcc miscompiling i8k.c to add "+m
(*regs)" caused register pressure problems and a build failure.
Changing the 'asm' statement to 'asm volatile' instead should prevent
that and works around the gcc bug as well, so we can remove the "+m".
[ Background on the gcc bug: a memory clobber fails to mark the function
the asm resides in as non-pure (aka "__attribute__((const))"), so if
the function does nothing else that triggers the non-pure logic, gcc
will think that that function has no side effects at all. As a result,
callers will be mis-compiled.
Adding the "+m" made gcc see that it's not a pure function, and so
does "asm volatile". The problem was never really the need to mark
"*regs" as changed, since the memory clobber did that part - the
problem was just a bug in the gcc "pure" function analysis - Linus ]
Signed-off-by: Jim Bos <jim876@xs4all.nl>
Acked-by: Jakub Jelinek <jakub@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On the W83795ADG, there's a single pin for BEEP and OVT#, so you
can't have both. Check the configuration and don't create beep
attributes when BEEP pin is not available.
The W83795G has a dedicated BEEP pin so the functionality is always
available there.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
When asked to clear the intrusion alarm, do so immediately. We have to
invalidate the cache to make sure the new status will be read. But we
also have to read from the status register once to clear the pending
alarm, as writing to CLR_CHS surprising won't clear it automatically.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
We can't read the intrusion state from the real-time alarm registers
as we do for all other alarm flags, because real-time alarm bits don't
stick (by definition) and the intrusion state has to stick until
explicitly cleared (otherwise it has little value.)
So we have to use the interrupt status register instead, which is read
from the same address but with a configuration bit flipped in another
register.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>