This function is used by the block layer queue to bail out of
requests if the current request is towards an RPMB
"block device".
This was done to avoid boot time scanning of this "block
device" which was never really a block device, thus duct-taping
over the fact that it was badly engineered.
This problem is now gone as we removed the offending RPMB block
device in another patch and replaced it with a character
device.
Cc: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The RPMB partition on the eMMC devices is a special area used
for storing cryptographically safe information signed by a
special secret key. To write and read records from this special
area, authentication is needed.
The RPMB area is *only* and *exclusively* accessed using
ioctl():s from userspace. It is not really a block device,
as blocks cannot be read or written from the device, also
the signed chunks that can be stored on the RPMB are actually
256 bytes, not 512 making a block device a real bad fit.
Currently the RPMB partition spawns a separate block device
named /dev/mmcblkNrpmb for each device with an RPMB partition,
including the creation of a block queue with its own kernel
thread and all overhead associated with this. On the Ux500
HREFv60 platform, for example, the two eMMCs means that two
block queues with separate threads are created for no use
whatsoever.
I have concluded that this block device design for RPMB is
actually pretty wrong. The RPMB area should have been designed
to be accessed from /dev/mmcblkN directly, using ioctl()s on
the main block device. It is however way too late to change
that, since userspace expects to open an RPMB device in
/dev/mmcblkNrpmb and we cannot break userspace.
This patch tries to amend the situation using the following
strategy:
- Stop creating a block device for the RPMB partition/area
- Instead create a custom, dynamic character device with
the same name.
- Make this new character device support exactly the same
set of ioctl()s as the old block device.
- Wrap the requests back to the same ioctl() handlers, but
issue them on the block queue of the main partition/area,
i.e. /dev/mmcblkN
We need to create a special "rpmb" bus type in order to get
udev and/or busybox hot/coldplug to instantiate the device
node properly.
Before the patch, this appears in 'ps aux':
101 root 0:00 [mmcqd/2rpmb]
123 root 0:00 [mmcqd/3rpmb]
After applying the patch these surplus block queue threads
are gone, but RPMB is as usable as ever using the userspace
MMC tools, such as 'mmc rpmb read-counter'.
We get instead those dynamice devices in /dev:
brw-rw---- 1 root root 179, 0 Jan 1 2000 mmcblk0
brw-rw---- 1 root root 179, 1 Jan 1 2000 mmcblk0p1
brw-rw---- 1 root root 179, 2 Jan 1 2000 mmcblk0p2
brw-rw---- 1 root root 179, 5 Jan 1 2000 mmcblk0p5
brw-rw---- 1 root root 179, 8 Jan 1 2000 mmcblk2
brw-rw---- 1 root root 179, 16 Jan 1 2000 mmcblk2boot0
brw-rw---- 1 root root 179, 24 Jan 1 2000 mmcblk2boot1
crw-rw---- 1 root root 248, 0 Jan 1 2000 mmcblk2rpmb
brw-rw---- 1 root root 179, 32 Jan 1 2000 mmcblk3
brw-rw---- 1 root root 179, 40 Jan 1 2000 mmcblk3boot0
brw-rw---- 1 root root 179, 48 Jan 1 2000 mmcblk3boot1
brw-rw---- 1 root root 179, 33 Jan 1 2000 mmcblk3p1
crw-rw---- 1 root root 248, 1 Jan 1 2000 mmcblk3rpmb
Notice the (248,0) and (248,1) character devices for RPMB.
Cc: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
In may, Steven sent a patch deleting the bounce buffer handling
and the CONFIG_MMC_BLOCK_BOUNCE option.
I chose the less invasive path of making it a runtime config
option, and we merged that successfully for kernel v4.12.
The code is however just standing in the way and taking up
space for seemingly no gain on any systems in wide use today.
Pierre says the code was there to improve speed on TI SDHCI
controllers on certain HP laptops and possibly some Ricoh
controllers as well. Early SDHCI controllers lacked the
scatter-gather feature, which made software bounce buffers
a significant speed boost.
We are clearly talking about the list of SDHCI PCI-based
MMC/SD card readers found in the pci_ids[] list in
drivers/mmc/host/sdhci-pci-core.c.
The TI SDHCI derivative is not supported by the upstream
kernel. This leaves the Ricoh.
What we can however notice is that the x86 defconfigs in the
kernel did not enable CONFIG_MMC_BLOCK_BOUNCE option, which
means that any such laptop would have to have a custom
configured kernel to actually take advantage of this
bounce buffer speed-up. It simply seems like there was
a speed optimization for the Ricoh controllers that noone
was using. (I have not checked the distro defconfigs but
I am pretty sure the situation is the same there.)
Bounce buffers increased performance on the OMAP HSMMC
at one point, and was part of the original submission in
commit a45c6cb816 ("[ARM] 5369/1: omap mmc: Add new
omap hsmmc controller for 2430 and 34xx, v3")
This optimization was removed in
commit 0ccd76d4c2 ("omap_hsmmc: Implement scatter-gather
emulation")
which found that scatter-gather emulation provided even
better performance.
The same was introduced for SDHCI in
commit 2134a922c6 ("sdhci: scatter-gather (ADMA) support")
I am pretty positively convinced that software
scatter-gather emulation will do for any host controller what
the bounce buffers were doing. Essentially, the bounce buffer
was a reimplementation of software scatter-gather-emulation in
the MMC subsystem, and it should be done away with.
Cc: Pierre Ossman <pierre@ossman.eu>
Cc: Juha Yrjola <juha.yrjola@solidboot.com>
Cc: Steven J. Hill <Steven.Hill@cavium.com>
Cc: Shawn Lin <shawn.lin@rock-chips.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Suggested-by: Steven J. Hill <Steven.Hill@cavium.com>
Suggested-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
If we don't have the block layer enabled, we do not present card
status and extcsd in the debugfs.
Debugfs is not ABI, and maintaining files of no relevance for
non-block devices comes at a high maintenance cost if we shall
support it with the block layer compiled out.
The debugfs entries suffer from all the same starvation
issues as the other userspace things, under e.g. a heavy
dd operation.
The expected number of debugfs users utilizing these two
debugfs files is already low as there is an ioctl() to get the
same information using the mmc-tools, and of these few users
the expected number of people using it on SDIO or combo cards
are expected to be zero.
It is therefore logical to move this over to the block layer
when it is enabled, using the new custom requests and issue
it using the block request queue.
On the other hand it moves some debugfs code from debugfs.c
and into block.c.
Tested during heavy dd load by cat:in the status file.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
We have a data pointer for the ioctl() data, but we need to
pass other data along with the DRV_OP:s, so make this a
void * so it can be reused.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
This moves the boot partition lock command (issued from sysfs)
into a custom block layer request, just like the ioctl()s,
getting rid of yet another instance of mmc_get_card().
Since we now have two operations issuing special DRV_OP's, we
rename the result variable ->drv_op_result.
Tested by locking the boot partition from userspace:
> cd /sys/devices/platform/soc/80114000.sdi4_per2/mmc_host/mmc3/
mmc3:0001/block/mmcblk3/mmcblk3boot0
> echo 1 > ro_lock_until_next_power_on
[ 178.645324] mmcblk3boot1: Locking boot partition ro until next power on
[ 178.652221] mmcblk3boot0: Locking boot partition ro until next power on
Also tested this with a huge dd job in the background: it
is now possible to lock the boot partitions on the card even
under heavy I/O.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
We will expand the DRV_OP usage, so we need to know which
operation we're performing. Tag the operations with an
enum:ed type and rename the function so it is clear that
it deals with any command and put a switch statement in
it. Currently only ioctls are supported.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Just as we can use blk_mq_rq_from_pdu() to get the per-request
tag we can use blk_mq_rq_to_pdu() to get a request from a tag.
Introduce a static inline helper so we are on the clear what
is happening.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
commit cdf8a6fb48
"mmc: block: Introduce queue semantics"
deleted the last user of mmc_req_is_special() and it was
a horrible hack to classify requests as "special" or
"not special" to begin with, so delete the helper.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
This switches also the multiple-command ioctl() call to issue
all ioctl()s through the block layer instead of going directly
to the device.
We extend the passed argument with an argument count and loop
over all passed commands in the ioctl() issue function called
from the block layer.
By doing this we are again loosening the grip on the big host
lock, since two calls to mmc_get_card()/mmc_put_card() are
removed.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Avri Altman <Avri.Altman@sandisk.com>
This wraps single ioctl() commands into block requests using
the custom block layer request types REQ_OP_DRV_IN and
REQ_OP_DRV_OUT.
By doing this we are loosening the grip on the big host lock,
since two calls to mmc_get_card()/mmc_put_card() are removed.
We are storing the ioctl() in/out argument as a pointer in
the per-request struct mmc_blk_request container. Since we
now let the block layer allocate this data, blk_get_request()
will allocate it for us and we can immediately dereference
it and use it to pass the argument into the block layer.
We refactor the if/else/if/else ladder in mmc_blk_issue_rq()
as part of the job, keeping some extra attention to the
case when a NULL req is passed into this function and
making that pipeline flush more explicit.
Tested on the ux500 with the userspace:
mmc extcsd read /dev/mmcblk3
resulting in a successful EXTCSD info dump back to the
console.
This commit fixes a starvation issue in the MMC/SD stack
that can be easily provoked in the following way by
issueing the following commands in sequence:
> dd if=/dev/mmcblk3 of=/dev/null bs=1M &
> mmc extcs read /dev/mmcblk3
Before this patch, the extcsd read command would hang
(starve) while waiting for the dd command to finish since
the block layer was holding the card/host lock.
After this patch, the extcsd ioctl() command is nicely
interpersed with the rest of the block commands and we
can issue a bunch of ioctl()s from userspace while there
is some busy block IO going on without any problems.
Conversely userspace ioctl()s can no longer starve
the block layer by holding the card/host lock.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Avri Altman <Avri.Altman@sandisk.com>
The mmc_queue_req is a per-request state container the MMC core uses
to carry bounce buffers, pointers to asynchronous requests and so on.
Currently allocated as a static array of objects, then as a request
comes in, a mmc_queue_req is assigned to it, and used during the
lifetime of the request.
This is backwards compared to how other block layer drivers work:
they usally let the block core provide a per-request struct that get
allocated right beind the struct request, and which can be obtained
using the blk_mq_rq_to_pdu() helper. (The _mq_ infix in this function
name is misleading: it is used by both the old and the MQ block
layer.)
The per-request struct gets allocated to the size stored in the queue
variable .cmd_size initialized using the .init_rq_fn() and
cleaned up using .exit_rq_fn().
The block layer code makes the MMC core rely on this mechanism to
allocate the per-request mmc_queue_req state container.
Doing this make a lot of complicated queue handling go away. We only
need to keep the .qnct that keeps count of how many request are
currently being processed by the MMC layer. The MQ block layer will
replace also this once we transition to it.
Doing this refactoring is necessary to move the ioctl() operations
into custom block layer requests tagged with REQ_OP_DRV_[IN|OUT]
instead of the custom code using the BigMMCHostLock that we have
today: those require that per-request data be obtainable easily from
a request after creating a custom request with e.g.:
struct request *rq = blk_get_request(q, REQ_OP_DRV_IN, __GFP_RECLAIM);
struct mmc_queue_req *mq_rq = req_to_mq_rq(rq);
And this is not possible with the current construction, as the request
is not immediately assigned the per-request state container, but
instead it gets assigned when the request finally enters the MMC
queue, which is way too late for custom requests.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
[Ulf: Folded in the fix to drop a call to blk_cleanup_queue()]
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Heiner Kallweit <hkallweit1@gmail.com>
eMMC can have multiple internal partitions that are represented as separate
disks / queues. However switching between partitions is only done when the
queue is empty. Consequently the array of mmc requests that are queued can
be shared between partitions saving memory.
Keep a pointer to the mmc request queue on the card, and use that instead
of allocating a new one for each partition.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Change from viewing the requests in progress as 'current' and 'previous',
to viewing them as a queue. The current request is allocated to the first
free slot. The presence of incomplete requests is determined from the
count (mq->qcnt) of entries in the queue. Non-read-write requests (i.e.
discards and flushes) are not added to the queue at all and require no
special handling. Also no special handling is needed for the
MMC_BLK_NEW_REQUEST case.
As well as allowing an arbitrarily sized queue, the queue thread function
is significantly simpler.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Instead of masking and setting two bits in the "flags" field
for the mmc_queue, just use two bools named "suspended" and
"new_request".
The masking and setting would likely have race conditions
anyways, it is better to use a simple member like this.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
The mmc_active member of struct mmc_queue_req has a very
confusing name: this is certainly not always "active", it is
the asynchronous request associated by the mmc_queue_req
but it is not guaranteed to be "active" in any sense, such
as being running on the host.
Simply rename this member to "areq".
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
This is the first step in cleaning up the private mmc header files. In this
change we makes sure each header file builds standalone, as that helps to
resolve dependencies.
While changing this, it also seems reasonable to stop including other
headers from inside a header itself which it don't depend upon.
Additionally, in some cases such dependencies are better resolved by
forward declaring the needed struct.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>
Once upon a time it made sense to keep the mmc block device driver and its
related code, in its own directory called card. Over time, more an more
functions/structures have become shared through generic mmc header files,
between the core and the card directory. In other words, the relationship
between them has become closer.
By sharing functions/structures via generic header files, it becomes easy
for outside users to abuse them. In a way to avoid that from happen, let's
move the files from card directory into the core directory, as it enables
us to move definitions of functions/structures into mmc core specific
header files.
Note, this is only the first step in providing a cleaner mmc interface for
outside users. Following changes will do the actual cleanup, as that is not
part of this change.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>