linux_dsm_epyc7002/block
Jianpeng Ma 975927b942 block: Add blk_rq_pos(rq) to sort rq when plushing
My workload is a raid5 which had 16 disks. And used our filesystem to
write using direct-io mode.

I used the blktrace to find those message:
8,16   0     6647     2.453665504  2579  M   W 7493152 + 8 [md0_raid5]
8,16   0     6648     2.453672411  2579  Q   W 7493160 + 8 [md0_raid5]
8,16   0     6649     2.453672606  2579  M   W 7493160 + 8 [md0_raid5]
8,16   0     6650     2.453679255  2579  Q   W 7493168 + 8 [md0_raid5]
8,16   0     6651     2.453679441  2579  M   W 7493168 + 8 [md0_raid5]
8,16   0     6652     2.453685948  2579  Q   W 7493176 + 8 [md0_raid5]
8,16   0     6653     2.453686149  2579  M   W 7493176 + 8 [md0_raid5]
8,16   0     6654     2.453693074  2579  Q   W 7493184 + 8 [md0_raid5]
8,16   0     6655     2.453693254  2579  M   W 7493184 + 8 [md0_raid5]
8,16   0     6656     2.453704290  2579  Q   W 7493192 + 8 [md0_raid5]
8,16   0     6657     2.453704482  2579  M   W 7493192 + 8 [md0_raid5]
8,16   0     6658     2.453715016  2579  Q   W 7493200 + 8 [md0_raid5]
8,16   0     6659     2.453715247  2579  M   W 7493200 + 8 [md0_raid5]
8,16   0     6660     2.453721730  2579  Q   W 7493208 + 8 [md0_raid5]
8,16   0     6661     2.453721974  2579  M   W 7493208 + 8 [md0_raid5]
8,16   0     6662     2.453728202  2579  Q   W 7493216 + 8 [md0_raid5]
8,16   0     6663     2.453728436  2579  M   W 7493216 + 8 [md0_raid5]
8,16   0     6664     2.453734782  2579  Q   W 7493224 + 8 [md0_raid5]
8,16   0     6665     2.453735019  2579  M   W 7493224 + 8 [md0_raid5]
8,16   0     6666     2.453741401  2579  Q   W 7493232 + 8 [md0_raid5]
8,16   0     6667     2.453741632  2579  M   W 7493232 + 8 [md0_raid5]
8,16   0     6668     2.453748148  2579  Q   W 7493240 + 8 [md0_raid5]
8,16   0     6669     2.453748386  2579  M   W 7493240 + 8 [md0_raid5]
8,16   0     6670     2.453851843  2579  I   W 7493144 + 104 [md0_raid5]
8,16   0        0     2.453853661     0  m   N cfq2579 insert_request
8,16   0     6671     2.453854064  2579  I   W 7493120 + 24 [md0_raid5]
8,16   0        0     2.453854439     0  m   N cfq2579 insert_request
8,16   0     6672     2.453854793  2579  U   N [md0_raid5] 2
8,16   0        0     2.453855513     0  m   N cfq2579 Not idling.st->count:1
8,16   0        0     2.453855927     0  m   N cfq2579 dispatch_insert
8,16   0        0     2.453861771     0  m   N cfq2579 dispatched a request
8,16   0        0     2.453862248     0  m   N cfq2579 activate rq,drv=1
8,16   0     6673     2.453862332  2579  D   W 7493120 + 24 [md0_raid5]
8,16   0        0     2.453865957     0  m   N cfq2579 Not idling.st->count:1
8,16   0        0     2.453866269     0  m   N cfq2579 dispatch_insert
8,16   0        0     2.453866707     0  m   N cfq2579 dispatched a request
8,16   0        0     2.453867061     0  m   N cfq2579 activate rq,drv=2
8,16   0     6674     2.453867145  2579  D   W 7493144 + 104 [md0_raid5]
8,16   0     6675     2.454147608     0  C   W 7493120 + 24 [0]
8,16   0        0     2.454149357     0  m   N cfq2579 complete rqnoidle 0
8,16   0     6676     2.454791505     0  C   W 7493144 + 104 [0]
8,16   0        0     2.454794803     0  m   N cfq2579 complete rqnoidle 0
8,16   0        0     2.454795160     0  m   N cfq schedule dispatch

From above messages,we can find rq[W 7493144 + 104] and rq[W
7493120 + 24] do not merge.
Because the bio order is:
  8,16   0     6638     2.453619407  2579  Q   W 7493144 + 8 [md0_raid5]
  8,16   0     6639     2.453620460  2579  G   W 7493144 + 8 [md0_raid5]
  8,16   0     6640     2.453639311  2579  Q   W 7493120 + 8 [md0_raid5]
  8,16   0     6641     2.453639842  2579  G   W 7493120 + 8 [md0_raid5]
The bio(7493144) first and bio(7493120) later.So the subsequent
bios will be divided into two parts.
When flushing plug-list,because elv_attempt_insert_merge only support
backmerge,not supporting frontmerge.
So rq[7493120 + 24] can't merge with rq[7493144 + 104].

From my test,i found those situation can count 25% in our system.
Using this patch, there is no this situation.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
CC:Shaohua Li <shli@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-10-25 21:58:17 +02:00
..
partitions s390/partitions: make partition detection independent from DASD ioctls 2012-09-26 15:45:05 +02:00
blk-cgroup.c blkcg: stop iteration early if root_rl is the only request list 2012-10-22 22:00:26 +02:00
blk-cgroup.h blkcg: implement per-blkg request allocation 2012-06-26 18:42:49 -04:00
blk-core.c block: Add blk_rq_pos(rq) to sort rq when plushing 2012-10-25 21:58:17 +02:00
blk-exec.c [SCSI] block: Fix blk_execute_rq_nowait() dead queue handling 2012-07-20 08:58:39 +01:00
blk-flush.c blk-flush: move the queue kick into 2011-10-24 16:24:31 +02:00
blk-integrity.c block: add export.h to files using EXPORT_SYMBOL/THIS_MODULE macros 2011-10-31 19:31:12 -04:00
blk-ioc.c block: uninitialized ioc->nr_tasks triggers WARN_ON 2012-08-01 12:17:27 +02:00
blk-iopoll.c tree-wide: fix assorted typos all over the place 2009-12-04 15:39:55 +01:00
blk-lib.c block: Make blkdev_issue_zeroout use WRITE SAME 2012-09-20 14:31:49 +02:00
blk-map.c block: re-use existing 'reading' variable instead of checking direction again 2011-12-21 15:27:24 +01:00
blk-merge.c block: Implement support for WRITE SAME 2012-09-20 14:31:45 +02:00
blk-settings.c block: Implement support for WRITE SAME 2012-09-20 14:31:45 +02:00
blk-softirq.c sched, block: Unify cache detection 2012-01-27 13:28:48 +01:00
blk-sysfs.c block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue() 2012-09-21 15:32:57 +02:00
blk-tag.c block/blk-tag.c: Remove useless kfree 2012-09-12 22:25:12 +02:00
blk-throttle.c workqueue: use mod_delayed_work() instead of __cancel + queue 2012-08-21 13:18:24 -07:00
blk-timeout.c block: Drop dead function blk_abort_queue() 2012-06-15 08:46:23 +02:00
blk.h block: Clean up special command handling logic 2012-09-20 14:31:38 +02:00
bsg-lib.c block: drop custom queue draining used by scsi_transport_{iscsi|fc} 2012-06-25 11:53:48 +02:00
bsg.c bsg: fix sysfs link remove warning 2012-02-08 20:02:03 +01:00
cfq-iosched.c block: blkcg_policy_cfq shouldn't be used if !CONFIG_CFQ_GROUP_IOSCHED 2012-06-04 10:02:29 +02:00
compat_ioctl.c block: Add BLKROTATIONAL ioctl 2012-01-11 16:29:31 +01:00
deadline-iosched.c elevator: make elevator_init_fn() return 0/-errno 2012-03-06 21:27:21 +01:00
elevator.c block: Clean up special command handling logic 2012-09-20 14:31:38 +02:00
genhd.c Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2012-10-02 09:54:49 -07:00
ioctl.c Merge branch 'for-3.7/core' of git://git.kernel.dk/linux-block 2012-10-11 09:04:23 +09:00
Kconfig block: remove CONFIG_EXPERIMENTAL 2012-10-23 22:30:34 +02:00
Kconfig.iosched blkcg: make CONFIG_BLK_CGROUP bool 2012-03-06 21:27:21 +01:00
Makefile separate partition format handling from generic code 2012-01-03 22:54:06 -05:00
noop-iosched.c elevator: make elevator_init_fn() return 0/-errno 2012-03-06 21:27:21 +01:00
partition-generic.c block: add partition resize function to blkpg ioctl 2012-08-01 12:24:18 +02:00
scsi_ioctl.c scsi: Silence unnecessary warnings about ioctl to partition 2012-06-15 12:52:46 +02:00