openwrt/staging/blogic.git
6 years ago mmc: stop abusing the request queue_lock pointer
Christoph Hellwig [Fri, 16 Nov 2018 08:10:06 +0000 (09:10 +0100)]
mmc: stop abusing the request queue_lock pointer

Replace the lock in mmc_blk_data that is only used through a pointer
in struct mmc_queue and to protect fields in that structure with
an actual lock in struct mmc_queue.

Suggested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago ide: don't acquire queue_lock in ide_complete_pm_rq
Christoph Hellwig [Fri, 16 Nov 2018 08:10:05 +0000 (09:10 +0100)]
ide: don't acquire queue_lock in ide_complete_pm_rq

blk_mq_stop_hw_queues doesn't need any locking, and the ide
dev_flags field isn't protected by it either.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago ide: don't acquire queue lock in ide_pm_execute_rq
Christoph Hellwig [Fri, 16 Nov 2018 08:10:04 +0000 (09:10 +0100)]
ide: don't acquire queue lock in ide_pm_execute_rq

There is nothing we can synchronize against over a call to
blk_queue_dying.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago pktcdvd: remove queue_lock around blk_queue_max_hw_sectors
Christoph Hellwig [Fri, 16 Nov 2018 08:10:03 +0000 (09:10 +0100)]
pktcdvd: remove queue_lock around blk_queue_max_hw_sectors

blk_queue_max_hw_sectors can't do anything with queue_lock protection
so don't hold it.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago floppy: remove queue_lock around floppy_end_request
Christoph Hellwig [Fri, 16 Nov 2018 08:10:02 +0000 (09:10 +0100)]
floppy: remove queue_lock around floppy_end_request

There is nothing the queue_lock could protect inside floppy_end_request,
so remove it.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago block: remove the rq_alloc_data request_queue field
Christoph Hellwig [Fri, 16 Nov 2018 08:10:01 +0000 (09:10 +0100)]
block: remove the rq_alloc_data request_queue field

Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago block: don't plug for aio/O_DIRECT HIPRI IO
Jens Axboe [Fri, 16 Nov 2018 02:56:53 +0000 (19:56 -0700)]
block: don't plug for aio/O_DIRECT HIPRI IO

Those will go straight to issue inside blk-mq, so don't bother
setting up a block plug for them.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago block: for async O_DIRECT, mark us as polling if asked to
Jens Axboe [Tue, 6 Nov 2018 21:29:11 +0000 (14:29 -0700)]
block: for async O_DIRECT, mark us as polling if asked to

Inherit the iocb IOCB_HIPRI flag, and pass on REQ_HIPRI for
those kinds of requests.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago block: add polled wakeup task helper
Jens Axboe [Wed, 14 Nov 2018 04:16:54 +0000 (21:16 -0700)]
block: add polled wakeup task helper

If we're polling for IO on a device that doesn't use interrupts, then
the IO completion loop (and the wake of the task) is done by the
submitting task itself. If that is the case, then we don't need to enter
the wake_up_process() function; we can simply mark ourselves as
TASK_RUNNING.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
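The shortcut described in the commit above can be sketched in userspace C. This is only an illustrative model, not the kernel helper: `struct task`, `model_wake_up_process()` and `model_wake_io_task()` are hypothetical stand-ins, and the `wakeups` counter exists purely to make the slow path observable.

```c
/* Hypothetical userspace stand-ins for kernel task state; not kernel types. */
enum task_state { TASK_RUNNING, TASK_UNINTERRUPTIBLE };

struct task {
	enum task_state state;
	int wakeups;	/* counts trips through the slow wake-up path */
};

/* Stand-in for wake_up_process(): the comparatively expensive path. */
static void model_wake_up_process(struct task *t)
{
	t->wakeups++;
	t->state = TASK_RUNNING;
}

/*
 * If the waiter is the task that is doing the polling, it is already
 * running, so flipping its state is enough. Otherwise fall back to the
 * full wake-up.
 */
static void model_wake_io_task(struct task *current_task, struct task *waiter)
{
	if (waiter == current_task)
		waiter->state = TASK_RUNNING;
	else
		model_wake_up_process(waiter);
}
```

In this model a task that completes its own IO never touches the wake-up path; only a different waiter does.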
6 years ago blk-rq-qos: inline check for q->rq_qos functions
Jens Axboe [Thu, 15 Nov 2018 19:25:10 +0000 (12:25 -0700)]
blk-rq-qos: inline check for q->rq_qos functions

Put the short code in the fast path, where we don't have any
functions attached to the queue. This minimizes the impact on
the hot path in the core code.

Cc: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago block: add queue_is_mq() helper
Jens Axboe [Thu, 15 Nov 2018 19:22:51 +0000 (12:22 -0700)]
block: add queue_is_mq() helper

Various spots check for q->mq_ops being non-NULL; provide
a helper to do this instead.

Where the ->mq_ops != NULL check is redundant, remove it.

Since mq == rq-based now that legacy is gone, get rid of the
queue_is_rq_based() and just use queue_is_mq() everywhere.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
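A minimal sketch of what such a helper amounts to, based only on the description above (a check for q->mq_ops being non-NULL). The `*_model` types are hypothetical stand-ins so the snippet compiles outside the kernel; the real structures carry many more fields.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct blk_mq_ops_model { int unused; };

struct request_queue_model {
	const struct blk_mq_ops_model *mq_ops;
};

/* One named predicate instead of open-coded q->mq_ops checks. */
static inline bool model_queue_is_mq(const struct request_queue_model *q)
{
	return q->mq_ops != NULL;
}
```

Callers then read as `if (model_queue_is_mq(q))`, which states the intent rather than the representation.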
6 years ago nvme: provide optimized poll function for separate poll queues
Jens Axboe [Wed, 14 Nov 2018 16:38:28 +0000 (09:38 -0700)]
nvme: provide optimized poll function for separate poll queues

If we have separate poll queues, we know that they aren't using
interrupts. Hence we don't need to disable interrupts around
finding completions.

Provide a separate set of blk_mq_ops for such devices.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago ide: clear ide_req()->special for non-passthrough requests
Jens Axboe [Fri, 16 Nov 2018 02:42:07 +0000 (19:42 -0700)]
ide: clear ide_req()->special for non-passthrough requests

The initial patch cleared this for all requests, which is wrong
since internal uses can't have this cleared as that's what they
are using to pass data. The fix moved the initialization to the
mq_ops->initialize_rq_fn(), but that's only a partial fix since
it only catches uses from blk_get_request(), not requests coming
from the file system.

Keep the non-fs initialization, and add the IDE entry clear
IFF RQF_DONTPREP isn't set and it's a passthrough request.

Fixes: d16a67667c61 ("ide: don't clear special on ide_queue_rq() entry")
Fixes: 22ce0a7ccf23 ("ide: don't use req->special")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago nvme: fix handling of EINVAL on pci_alloc_irq_vectors_affinity()
Jens Axboe [Thu, 15 Nov 2018 23:05:02 +0000 (16:05 -0700)]
nvme: fix handling of EINVAL on pci_alloc_irq_vectors_affinity()

At least on SPARC, if MSI/MSI-X isn't supported, we get EINVAL if
we ask for more than one vector. This isn't covered by our ENOSPC
check.

If we get EINVAL, decrease our ask to just one vector, instead of
bailing out in error.

Fixes: 3b6592f70ad7 ("nvme: utilize two queue maps, one for reads and one for writes")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
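The fallback described above can be modelled in plain C. This is a simplified sketch under stated assumptions, not the driver code: `struct irq_platform_model`, `model_alloc_vectors()` and `model_setup_irqs()` are hypothetical, and the one-at-a-time shrink on -ENOSPC is chosen for brevity rather than taken from the driver.

```c
#include <errno.h>

/* Hypothetical platform description used only by this model. */
struct irq_platform_model {
	int max_vectors;		/* how many vectors the system can hand out */
	int multi_vector_supported;	/* 0 models "no MSI/MSI-X" */
};

/* Stand-in allocator: returns the vectors granted or a negative errno. */
static int model_alloc_vectors(const struct irq_platform_model *p, int want)
{
	if (want > 1 && !p->multi_vector_supported)
		return -EINVAL;
	if (want > p->max_vectors)
		return -ENOSPC;
	return want;
}

/*
 * Start by asking for io_queues + 1 vectors (the extra one is for the
 * admin queue). Shrink the request on -ENOSPC, and on -EINVAL drop
 * straight to a single vector instead of bailing out.
 */
static int model_setup_irqs(const struct irq_platform_model *p, int io_queues)
{
	int want = io_queues + 1;

	for (;;) {
		int ret = model_alloc_vectors(p, want);

		if (ret >= 0 || want == 1)
			return ret;
		if (ret == -EINVAL)
			want = 1;	/* no multi-vector support at all */
		else if (ret == -ENOSPC)
			want--;		/* not enough vectors, ask for fewer */
		else
			return ret;
	}
}
```

With a single vector granted, the IO queue and the admin queue share it, which matches the behaviour the related one-vector fix describes.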
6 years ago block: add wbt_disable_default export for BFQ
Jens Axboe [Thu, 15 Nov 2018 19:31:27 +0000 (12:31 -0700)]
block: add wbt_disable_default export for BFQ

This export isn't unused: if BFQ is modular, we get into trouble without it.

Fixes: b6676f653f13 ("block: remove a few unused exports")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago block: remove the queue_lock indirection
Christoph Hellwig [Thu, 15 Nov 2018 19:17:28 +0000 (12:17 -0700)]
block: remove the queue_lock indirection

With the legacy request path gone there is no good reason to keep
queue_lock as a pointer, we can always use the embedded lock now.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Fixed floppy and blk-cgroup missing conversions and half done edits.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago block: remove the lock argument to blk_alloc_queue_node
Christoph Hellwig [Wed, 14 Nov 2018 16:02:18 +0000 (17:02 +0100)]
block: remove the lock argument to blk_alloc_queue_node

With the legacy request path gone there is no real need to override the
queue_lock.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago mmc: stop abusing the request queue_lock pointer
Christoph Hellwig [Wed, 14 Nov 2018 16:02:17 +0000 (17:02 +0100)]
mmc: stop abusing the request queue_lock pointer

mmc uses the block layer struct request pointer to indirect their own
lock to the mmc_queue structure, given that the original lock isn't
reachable outside of block.c.  Add a lock pointer to struct mmc_queue
instead and stop overriding the block layer lock which protects fields
entirely separate from the mmc use.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago mmc: simplify queue initialization
Christoph Hellwig [Wed, 14 Nov 2018 16:02:16 +0000 (17:02 +0100)]
mmc: simplify queue initialization

Merge three functions initializing the queue into a single one, and drop
an unused argument for it.

Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago umem: don't override the queue_lock
Christoph Hellwig [Wed, 14 Nov 2018 16:02:15 +0000 (17:02 +0100)]
umem: don't override the queue_lock

The umem card->lock and the block layer queue_lock are used for entirely
different resources.  Stop using card->lock as the block layer
queue_lock.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago drbd: don't override the queue_lock
Christoph Hellwig [Wed, 14 Nov 2018 16:02:14 +0000 (17:02 +0100)]
drbd: don't override the queue_lock

The DRBD req_lock and block layer queue_lock are used for entirely
different resources.  Stop using the req_lock as the block layer
queue_lock.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago blk-cgroup: move locking into blkg_destroy_all
Christoph Hellwig [Wed, 14 Nov 2018 16:02:13 +0000 (17:02 +0100)]
blk-cgroup: move locking into blkg_destroy_all

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago blk-cgroup: consolidate error handling in blkcg_init_queue
Christoph Hellwig [Wed, 14 Nov 2018 16:02:12 +0000 (17:02 +0100)]
blk-cgroup: consolidate error handling in blkcg_init_queue

Use a goto label to merge two identical pieces of error handling code.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago block: remove a few unused exports
Christoph Hellwig [Wed, 14 Nov 2018 16:02:11 +0000 (17:02 +0100)]
block: remove a few unused exports

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago block: update a few comments for the legacy request removal
Christoph Hellwig [Wed, 14 Nov 2018 16:02:10 +0000 (17:02 +0100)]
block: update a few comments for the legacy request removal

Only the mq locking is left in the flush state machine.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago block: remove the unused lock argument to rq_qos_throttle
Christoph Hellwig [Wed, 14 Nov 2018 16:02:09 +0000 (17:02 +0100)]
block: remove the unused lock argument to rq_qos_throttle

Unused now that the legacy request path is gone.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago block: remove queue_lockdep_assert_held
Christoph Hellwig [Wed, 14 Nov 2018 16:02:08 +0000 (17:02 +0100)]
block: remove queue_lockdep_assert_held

The only remaining user unconditionally drops and reacquires the lock,
which means we really don't need any additional (conditional) annotation.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago block: use atomic bitops for ->queue_flags
Christoph Hellwig [Wed, 14 Nov 2018 16:02:07 +0000 (17:02 +0100)]
block: use atomic bitops for ->queue_flags

->queue_flags is generally not set or cleared in the fast path, and also
generally set or cleared one flag at a time.  Make use of the normal
atomic bitops for it so that we don't need to take the queue_lock,
which is otherwise mostly unused in the core block layer now.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
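The idea of lock-free flag updates can be illustrated with C11 atomics. This is a userspace sketch of the technique, not the kernel's bitops: `struct queue_model` and the `model_flag_*()` helpers are hypothetical names, and C11 `atomic_fetch_or`/`atomic_fetch_and` stand in for the kernel's atomic bit operations.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the queue; only the flags word is modelled. */
struct queue_model {
	atomic_ulong queue_flags;
};

/* Atomically set one flag bit; no lock is needed around the update. */
static void model_flag_set(struct queue_model *q, unsigned int flag)
{
	atomic_fetch_or(&q->queue_flags, 1UL << flag);
}

/* Atomically clear one flag bit. */
static void model_flag_clear(struct queue_model *q, unsigned int flag)
{
	atomic_fetch_and(&q->queue_flags, ~(1UL << flag));
}

/* Read the current value of one flag bit. */
static bool model_flag_test(struct queue_model *q, unsigned int flag)
{
	return (atomic_load(&q->queue_flags) >> flag) & 1UL;
}
```

Because each update is a single atomic read-modify-write of one bit, concurrent updates to different flags cannot lose each other's changes, which is what the lock used to guarantee.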
6 years ago block: don't hold the queue_lock over blk_abort_request
Christoph Hellwig [Wed, 14 Nov 2018 16:02:06 +0000 (17:02 +0100)]
block: don't hold the queue_lock over blk_abort_request

There is nothing it could synchronize against, so don't go through
the pains of acquiring the lock.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago block: remove deadline __deadline manipulation helpers
Christoph Hellwig [Wed, 14 Nov 2018 16:02:05 +0000 (17:02 +0100)]
block: remove deadline __deadline manipulation helpers

No users left since the removal of the legacy request interface, we can
remove all the magic bit stealing now and make it a normal field.

But use WRITE_ONCE/READ_ONCE on the new deadline field, given that we
don't seem to have any mechanism to guarantee a new value actually
gets seen by other threads.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
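A rough userspace illustration of the access pattern described above. The helpers here are hypothetical and specialised to `unsigned long`; they imitate the intent of READ_ONCE/WRITE_ONCE with volatile accesses and are not the kernel macros. `struct request_model` and the expiry check are assumptions added for the example.

```c
/* Hypothetical stand-in: the deadline is now an ordinary field. */
struct request_model {
	unsigned long deadline;
};

/*
 * Volatile accesses keep the compiler from caching, merging or eliding
 * the load/store. They do not provide any memory ordering.
 */
static inline void model_write_once(unsigned long *p, unsigned long v)
{
	*(volatile unsigned long *)p = v;
}

static inline unsigned long model_read_once(const unsigned long *p)
{
	return *(const volatile unsigned long *)p;
}

static void model_set_deadline(struct request_model *rq,
			       unsigned long now, unsigned long timeout)
{
	model_write_once(&rq->deadline, now + timeout);
}

/* Returns non-zero once 'now' has reached the deadline (wrap-tolerant). */
static int model_rq_expired(const struct request_model *rq, unsigned long now)
{
	unsigned long deadline = model_read_once(&rq->deadline);

	return (long)(now - deadline) >= 0;
}
```

The reader takes one snapshot of the field and works from that copy, so a concurrent writer cannot change the value halfway through the comparison.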
6 years ago block: remove QUEUE_FLAG_BYPASS and ->bypass
Christoph Hellwig [Wed, 14 Nov 2018 16:02:04 +0000 (17:02 +0100)]
block: remove QUEUE_FLAG_BYPASS and ->bypass

Unused since the removal of the legacy request code.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago block: make blk_try_req_merge() static
Eric Biggers [Thu, 15 Nov 2018 01:19:46 +0000 (17:19 -0800)]
block: make blk_try_req_merge() static

blk_try_req_merge() is only used in block/blk-merge.c, so make it
static.

This addresses a gcc warning when -Wmissing-prototypes is enabled.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago block: remove dead queue members
Jens Axboe [Wed, 14 Nov 2018 22:22:49 +0000 (15:22 -0700)]
block: remove dead queue members

No more users of ->in_flight[] or ->nr_sorted, get rid of them.

Fixes: a1ce35fa4985 ("block: remove dead elevator code")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago block: clean up dead code that is now redundant
Colin Ian King [Wed, 14 Nov 2018 22:17:05 +0000 (22:17 +0000)]
block: clean up dead code that is now redundant

The boolean next_sorted is set to false and is never changed, hence
the code that checks if it is true is dead code and can now be
removed.  This dead code occurred from a previous commit that cleaned
up the elevator and removed the setting of next_sorted to true.

Detected by CoverityScan, CID#1475401 ("'Constant' variable guards
dead code")

Fixes: a1ce35fa4985 ("block: remove dead elevator code")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago nvme: fix boot hang with only being able to get one IRQ vector
Jens Axboe [Wed, 14 Nov 2018 17:13:50 +0000 (10:13 -0700)]
nvme: fix boot hang with only being able to get one IRQ vector

NVMe always asks for io_queues + 1 worth of IRQ vectors, which
means that even when we scale all the way down, we still ask
for 2 vectors and get -ENOSPC in return if the system can't
support more than 1.

Getting just 1 vector is fine, it just means that we'll have
1 IO queue and 1 admin queue, with a shared vector between them.
Check for this case and don't add our + 1 if it happens.

Fixes: 3b6592f70ad7 ("nvme: utilize two queue maps, one for reads and one for writes")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago ide: don't clear special on ide_queue_rq() entry
Jens Axboe [Tue, 13 Nov 2018 00:19:32 +0000 (17:19 -0700)]
ide: don't clear special on ide_queue_rq() entry

We can't use RQF_DONTPREP to see if we should clear ->special,
as someone could have set that while inserting the request. Make
sure we clear it in our ->initialize_rq_fn() helper instead.

Fixes: 22ce0a7ccf23 ("ide: don't use req->special")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago loop: Fix double mutex_unlock(&loop_ctl_mutex) in loop_control_ioctl()
Tetsuo Handa [Mon, 12 Nov 2018 15:42:14 +0000 (08:42 -0700)]
loop: Fix double mutex_unlock(&loop_ctl_mutex) in loop_control_ioctl()

Commit 0a42e99b58a20883 ("loop: Get rid of loop_index_mutex") forgot to
remove mutex_unlock(&loop_ctl_mutex) from loop_control_ioctl() when
replacing loop_index_mutex with loop_ctl_mutex.

Fixes: 0a42e99b58a20883 ("loop: Get rid of loop_index_mutex")
Reported-by: syzbot <syzbot+c0138741c2290fc5e63f@syzkaller.appspotmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago null_blk: remove unused nullb device
Jens Axboe [Sat, 10 Nov 2018 20:03:52 +0000 (13:03 -0700)]
null_blk: remove unused nullb device

The compiler rightfully complains:

drivers/block/null_blk_main.c: In function ‘null_complete_rq’:
drivers/block/null_blk_main.c:647:16: warning: unused variable ‘nullb’ [-Wunused-variable]
  struct nullb *nullb = rq->q->queuedata;
                ^~~~~

Fixes: 49f6613632f9 ("nullb: remove leftover legacy request code")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago ide: don't use req->special
Christoph Hellwig [Sat, 10 Nov 2018 08:30:49 +0000 (09:30 +0100)]
ide: don't use req->special

Just replace it with a field of the same name in struct ide_req.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago pd: replace ->special use with private data in the request
Christoph Hellwig [Sat, 10 Nov 2018 08:30:48 +0000 (09:30 +0100)]
pd: replace ->special use with private data in the request

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago aoe: replace ->special use with private data in the request
Christoph Hellwig [Sat, 10 Nov 2018 08:30:47 +0000 (09:30 +0100)]
aoe: replace ->special use with private data in the request

Makes the code a whole lot easier to read.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago skd_main: don't use req->special
Christoph Hellwig [Sat, 10 Nov 2018 08:30:46 +0000 (09:30 +0100)]
skd_main: don't use req->special

Add a retries field to the internal request structure instead, which gets
set to zero on the first submission.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago nullb: remove leftover legacy request code
Christoph Hellwig [Sat, 10 Nov 2018 08:30:45 +0000 (09:30 +0100)]
nullb: remove leftover legacy request code

null_softirq_done_fn is only used for the blk-mq path, so remove the
other branch.  Also rename the function to better match the method name.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago fnic: fix fnic_scsi_host_{start,end}_tag
Christoph Hellwig [Sat, 10 Nov 2018 08:30:44 +0000 (09:30 +0100)]
fnic: fix fnic_scsi_host_{start,end}_tag

The way these functions abuse ->special to try to store the dummy
request looks completely broken, given that it actually stores the
original scsi command.

Instead switch to ->host_scribble and store the actual dummy command.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago block: remove set but not used variable 'et'
YueHaibing [Sat, 10 Nov 2018 02:41:14 +0000 (02:41 +0000)]
block: remove set but not used variable 'et'

Fixes gcc '-Wunused-but-set-variable' warning:

block/blk-ioc.c: In function 'put_io_context_active':
block/blk-ioc.c:174:24: warning:
 variable 'et' set but not used [-Wunused-but-set-variable]

It is not used any more after commit
a1ce35fa4985 ("block: remove dead elevator code")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago block: remove the BLKPREP_* values.
Christoph Hellwig [Fri, 9 Nov 2018 13:42:41 +0000 (14:42 +0100)]
block: remove the BLKPREP_* values.

Unused now.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago scsi: return blk_status_t from device handler ->prep_fn
Christoph Hellwig [Fri, 9 Nov 2018 13:42:40 +0000 (14:42 +0100)]
scsi: return blk_status_t from device handler ->prep_fn

Remove the last use of the old BLKPREP_* values, which get converted
to BLK_STS_* later anyway.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago scsi: return blk_status_t from scsi_init_io and ->init_command
Christoph Hellwig [Fri, 9 Nov 2018 13:42:39 +0000 (14:42 +0100)]
scsi: return blk_status_t from scsi_init_io and ->init_command

Replace the old BLKPREP_* values with the BLK_STS_ ones that they are
converted to later anyway.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago scsi: clean up error handling in scsi_init_io
Christoph Hellwig [Fri, 9 Nov 2018 13:42:38 +0000 (14:42 +0100)]
scsi: clean up error handling in scsi_init_io

There is no need to call scsi_mq_free_sgtables until we have actually
allocated sgtables.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago scsi: push blk_status_t up into scsi_setup_{fs,scsi}_cmnd
Christoph Hellwig [Fri, 9 Nov 2018 13:42:37 +0000 (14:42 +0100)]
scsi: push blk_status_t up into scsi_setup_{fs,scsi}_cmnd

This just moves the prep_to_mq calls up in preparation of further removal
of BLKPREP_* usage.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago scsi: simplify scsi_prep_state_check
Christoph Hellwig [Fri, 9 Nov 2018 13:42:36 +0000 (14:42 +0100)]
scsi: simplify scsi_prep_state_check

Return a blk_status_t directly, and make the code a little more compact
by handling the fast path in the caller.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago ide: cleanup ->prep_rq calling convention
Christoph Hellwig [Fri, 9 Nov 2018 13:42:35 +0000 (14:42 +0100)]
ide: cleanup ->prep_rq calling convention

The return value is just used as a binary yes/no decision, so switch
it to a bool instead of the old BLKPREP_* values returned as an int.

Also clean up a few related comments.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago block: remove req->timeout_list
Christoph Hellwig [Fri, 9 Nov 2018 18:37:44 +0000 (19:37 +0100)]
block: remove req->timeout_list

Unused now that the legacy request path is gone.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago mtip32xxx: use for_each_sg
Christoph Hellwig [Fri, 9 Nov 2018 13:49:02 +0000 (14:49 +0100)]
mtip32xxx: use for_each_sg

Use the proper helper instead of manually iterating the scatterlist,
which is broken in the presence of chained S/G lists.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago mtip32xx: don't use req->special
Christoph Hellwig [Fri, 9 Nov 2018 13:49:01 +0000 (14:49 +0100)]
mtip32xx: don't use req->special

Instead add the icmd to struct mtip_cmd, where it can be unioned
with the scatterlist used for the normal I/O path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago mtip32xx: remove mtip_get_int_command
Christoph Hellwig [Fri, 9 Nov 2018 13:49:00 +0000 (14:49 +0100)]
mtip32xx: remove mtip_get_int_command

Merging this function into the only callers makes the code flow easier.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago mtip32xx: remove mtip_init_cmd_header
Christoph Hellwig [Fri, 9 Nov 2018 13:48:59 +0000 (14:48 +0100)]
mtip32xx: remove mtip_init_cmd_header

There isn't much need for this helper - we can just calculate the offset
for the command header once late in the submission path and fill out
the ctba and ctbau fields there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago mtip32xx: add missing endianess annotations on struct smart_attr
Christoph Hellwig [Fri, 9 Nov 2018 13:48:58 +0000 (14:48 +0100)]
mtip32xx: add missing endianess annotations on struct smart_attr

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago mtip32xx: remove __force_bit2int
Christoph Hellwig [Fri, 9 Nov 2018 13:48:57 +0000 (14:48 +0100)]
mtip32xx: remove __force_bit2int

There is no good excuse not to use proper __le16/32 types.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago mtip32xx: return a blk_status_t from mtip_send_trim
Christoph Hellwig [Fri, 9 Nov 2018 13:48:56 +0000 (14:48 +0100)]
mtip32xx: return a blk_status_t from mtip_send_trim

This allows for better error propagation and simpler code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago mtip32xx: merge mtip_submit_request into mtip_queue_rq
Christoph Hellwig [Fri, 9 Nov 2018 13:48:55 +0000 (14:48 +0100)]
mtip32xx: merge mtip_submit_request into mtip_queue_rq

Factor out a new is_stopped helper that matches the existing
is_se_active helper, and merge the trivial amount of remaining code
into the only caller.  This also allows better error handling by
returning a BLK_STS_* directly instead of explicitly calling
blk_mq_end_request, and moving blk_mq_start_request closer to the
actual issue to hardware.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago mtip32xx: move the blk_rq_map_sg call to mtip_hw_submit_io
Christoph Hellwig [Fri, 9 Nov 2018 13:48:54 +0000 (14:48 +0100)]
mtip32xx: move the blk_rq_map_sg call to mtip_hw_submit_io

We have all arguments at hand in mtip_hw_submit_io, so keep the
rq to sg mapping close to the dma_map_sg call.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago sx8: use a per-host tag_set
Christoph Hellwig [Fri, 9 Nov 2018 13:51:10 +0000 (14:51 +0100)]
sx8: use a per-host tag_set

The current sx8 code spends a lot of effort dealing with the fact that
tags are per-host, but there might be multiple queues.  Now that the
driver has been converted to blk-mq it can take care of the blk-mq
tag_set concept that has been designed just for that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago sx8: cleanup queue and disk allocation / freeing
Christoph Hellwig [Fri, 9 Nov 2018 13:51:09 +0000 (14:51 +0100)]
sx8: cleanup queue and disk allocation / freeing

Make the disk/queue alloc and free helpers per-port by moving the
trivial loops into the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago blk-mq-tag: document tag iteration helper return value
Jens Axboe [Thu, 8 Nov 2018 18:09:50 +0000 (11:09 -0700)]
blk-mq-tag: document tag iteration helper return value

Document the fact that the strategy function passed in can
control whether to continue iterating or not.

Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago blk-mq: provide a helper to check if a queue is busy
Jens Axboe [Thu, 8 Nov 2018 16:03:51 +0000 (09:03 -0700)]
blk-mq: provide a helper to check if a queue is busy

Returns true if the queue currently has requests pending,
false if not.

DM can use this to replace the atomic_inc/dec they do per device
to see if a device is busy.

Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago blk-mq-tag: change busy_iter_fn to return whether to continue or not
Jens Axboe [Thu, 8 Nov 2018 17:24:07 +0000 (10:24 -0700)]
blk-mq-tag: change busy_iter_fn to return whether to continue or not

We have this functionality in sbitmap, but we don't export it in
blk-mq for users of the tags busy iteration. This can be useful
for stopping the iteration, if the caller doesn't need to find
more requests.

Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
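The early-stop iteration described above, and the busy check that builds on it, can be sketched as follows. All names (`struct request_model`, `model_busy_iter()`, `model_rq_inflight()`, `model_queue_busy()`) are hypothetical, and the flat array stands in for the tag set; the real iteration walks allocated tags rather than an array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request; only the in-flight state is modelled. */
struct request_model {
	bool in_flight;
};

/* Callback contract: return true to keep iterating, false to stop. */
typedef bool (*model_busy_iter_fn)(struct request_model *rq, void *priv);

/* Walk the requests until the callback asks to stop; returns calls made. */
static size_t model_busy_iter(struct request_model *rqs, size_t n,
			      model_busy_iter_fn fn, void *priv)
{
	size_t visited = 0;

	for (size_t i = 0; i < n; i++) {
		visited++;
		if (!fn(&rqs[i], priv))
			break;
	}
	return visited;
}

/* Record the first in-flight request and stop: one hit is enough. */
static bool model_rq_inflight(struct request_model *rq, void *priv)
{
	bool *busy = priv;

	if (rq->in_flight) {
		*busy = true;
		return false;
	}
	return true;
}

/* True if any request is pending, false otherwise. */
static bool model_queue_busy(struct request_model *rqs, size_t n)
{
	bool busy = false;

	model_busy_iter(rqs, n, model_rq_inflight, &busy);
	return busy;
}
```

A caller that only needs a yes/no answer stops at the first pending request, so it does not need to maintain a per-device counter on every submission and completion.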
6 years ago loop: Get rid of 'nested' acquisition of loop_ctl_mutex
Jan Kara [Thu, 8 Nov 2018 13:01:16 +0000 (14:01 +0100)]
loop: Get rid of 'nested' acquisition of loop_ctl_mutex

The nested acquisition of loop_ctl_mutex (->lo_ctl_mutex back then) was
introduced by commit f028f3b2f987e "loop: fix circular locking in
loop_clr_fd()" to fix lockdep complaints about bd_mutex being acquired
after lo_ctl_mutex during partition rereading. Now that these are
properly fixed, let's stop fooling lockdep.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years ago loop: Avoid circular locking dependency between loop_ctl_mutex and bd_mutex
Jan Kara [Thu, 8 Nov 2018 13:01:15 +0000 (14:01 +0100)]
loop: Avoid circular locking dependency between loop_ctl_mutex and bd_mutex

Code in loop_change_fd() drops reference to the old file (and also the
new file in a failure case) under loop_ctl_mutex. Similarly to a
situation in loop_set_fd() this can create a circular locking dependency
if this was the last reference holding the file open. Delay dropping of
the file reference until we have released loop_ctl_mutex.

Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
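The pattern in the commit above (swap the pointer under the lock, drop the reference only after unlocking) can be sketched in a single-threaded userspace model. Everything here is hypothetical: the lock is a hand-tracked flag rather than a real mutex, and `model_fput()` merely records whether a final release would have happened with the lock held.

```c
/* Hypothetical stand-in for a file with a reference count. */
struct file_model {
	int refcount;
	int released_under_lock;	/* set if the last put ran with the lock held */
};

/* Hypothetical stand-in for the loop device and its control mutex. */
struct loop_model {
	int ctl_mutex_held;	/* hand-tracked lock state, for illustration only */
	struct file_model *backing;
};

static void model_lock(struct loop_model *lo)   { lo->ctl_mutex_held = 1; }
static void model_unlock(struct loop_model *lo) { lo->ctl_mutex_held = 0; }

/* Dropping the last reference is where a release path could take other locks. */
static void model_fput(struct loop_model *lo, struct file_model *f)
{
	if (--f->refcount == 0 && lo->ctl_mutex_held)
		f->released_under_lock = 1;
}

/* Swap the backing file under the lock, but put the old one after unlock. */
static void model_change_fd(struct loop_model *lo, struct file_model *new_file)
{
	struct file_model *old;

	model_lock(lo);
	old = lo->backing;
	lo->backing = new_file;
	model_unlock(lo);

	model_fput(lo, old);	/* deferred until the lock is released */
}
```

Only the pointer swap needs the lock; the potentially heavy final release runs with nothing held, so it cannot contribute to a lock-ordering cycle.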
6 years ago loop: Fix deadlock when calling blkdev_reread_part()
Jan Kara [Thu, 8 Nov 2018 13:01:14 +0000 (14:01 +0100)]
loop: Fix deadlock when calling blkdev_reread_part()

Calling blkdev_reread_part() under loop_ctl_mutex causes lockdep to
complain about circular lock dependency between bdev->bd_mutex and
lo->lo_ctl_mutex. The problem is that on loop device open or close
lo_open() and lo_release() get called with bdev->bd_mutex held and they
need to acquire loop_ctl_mutex. OTOH when loop_reread_partitions() is
called with loop_ctl_mutex held, it will call blkdev_reread_part() which
acquires bdev->bd_mutex. See syzbot report for details [1].

Move call to blkdev_reread_part() in __loop_clr_fd() from under
loop_ctl_mutex to finish fixing of the lockdep warning and the possible
deadlock.

[1] https://syzkaller.appspot.com/bug?id=bf154052f0eea4bc7712499e4569505907d1588

Reported-by: syzbot <syzbot+4684a000d5abdade83fac55b1e7d1f935ef1936e@syzkaller.appspotmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoloop: Move loop_reread_partitions() out of loop_ctl_mutex
Jan Kara [Thu, 8 Nov 2018 13:01:13 +0000 (14:01 +0100)]
loop: Move loop_reread_partitions() out of loop_ctl_mutex

Calling loop_reread_partitions() under loop_ctl_mutex causes lockdep to
complain about circular lock dependency between bdev->bd_mutex and
lo->lo_ctl_mutex. The problem is that on loop device open or close
lo_open() and lo_release() get called with bdev->bd_mutex held and they
need to acquire loop_ctl_mutex. OTOH when loop_reread_partitions() is
called with loop_ctl_mutex held, it will call blkdev_reread_part() which
acquires bdev->bd_mutex. See syzbot report for details [1].

Move all calls of loop_reread_partitions() out of loop_ctl_mutex to
avoid the lockdep warning and fix the possible deadlock.

[1] https://syzkaller.appspot.com/bug?id=bf154052f0eea4bc7712499e4569505907d1588

Reported-by: syzbot <syzbot+4684a000d5abdade83fac55b1e7d1f935ef1936e@syzkaller.appspotmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
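
The lock-ordering problem described above can be illustrated with two
userspace mutexes. This is a sketch, not driver code: bd_mutex and
ctl_mutex here are plain pthread mutexes, and open_path(), configure() and
reread_partitions() are hypothetical names. The open path nests
bd_mutex -> ctl_mutex, so the rescan (which takes bd_mutex) is only called
once ctl_mutex has been dropped.

```c
#include <pthread.h>

static pthread_mutex_t bd_mutex  = PTHREAD_MUTEX_INITIALIZER; /* block device lock */
static pthread_mutex_t ctl_mutex = PTHREAD_MUTEX_INITIALIZER; /* loop control lock */
static int rescans;      /* number of partition rescans performed */
static int ctl_was_free; /* 1 if the last rescan ran without ctl_mutex held */

/* Open path: bd_mutex is taken first, then ctl_mutex (order bd -> ctl). */
static void open_path(void)
{
    pthread_mutex_lock(&bd_mutex);
    pthread_mutex_lock(&ctl_mutex);
    pthread_mutex_unlock(&ctl_mutex);
    pthread_mutex_unlock(&bd_mutex);
}

/* The rescan takes bd_mutex, so running it under ctl_mutex would create
 * the reverse order (ctl -> bd). Record whether ctl_mutex was free. */
static void reread_partitions(void)
{
    if (pthread_mutex_trylock(&ctl_mutex) == 0) {
        ctl_was_free = 1;
        pthread_mutex_unlock(&ctl_mutex);
    } else {
        ctl_was_free = 0;
    }
    pthread_mutex_lock(&bd_mutex);
    rescans++;
    pthread_mutex_unlock(&bd_mutex);
}

/* Configuration path: update state under ctl_mutex, rescan after unlock. */
static void configure(void)
{
    pthread_mutex_lock(&ctl_mutex);
    /* ... update device state ... */
    pthread_mutex_unlock(&ctl_mutex);
    reread_partitions();
}
```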
6 years agoloop: Move special partition reread handling in loop_clr_fd()
Jan Kara [Thu, 8 Nov 2018 13:01:12 +0000 (14:01 +0100)]
loop: Move special partition reread handling in loop_clr_fd()

The call of __blkdev_reread_part() from loop_reread_partitions() happens
only when we need to invalidate partitions from loop_release(). Thus
move the detection of this case into loop_clr_fd() and simplify
loop_reread_partitions().

This makes loop_reread_partitions() safe to use without loop_ctl_mutex
because we use only lo->lo_number and lo->lo_file_name in case of error
for reporting purposes (thus possibly reporting outdated information is
not a big deal) and we are safe from 'lo' going away under us by
elevated lo->lo_refcnt.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoloop: Push loop_ctl_mutex down to loop_change_fd()
Jan Kara [Thu, 8 Nov 2018 13:01:11 +0000 (14:01 +0100)]
loop: Push loop_ctl_mutex down to loop_change_fd()

Push loop_ctl_mutex down to loop_change_fd(). We will need this to be
able to call loop_reread_partitions() without loop_ctl_mutex.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoloop: Push loop_ctl_mutex down to loop_set_fd()
Jan Kara [Thu, 8 Nov 2018 13:01:10 +0000 (14:01 +0100)]
loop: Push loop_ctl_mutex down to loop_set_fd()

Push loop_ctl_mutex down to loop_set_fd(). We will need this to be able to
call loop_reread_partitions() without loop_ctl_mutex.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoloop: Push loop_ctl_mutex down to loop_set_status()
Jan Kara [Thu, 8 Nov 2018 13:01:09 +0000 (14:01 +0100)]
loop: Push loop_ctl_mutex down to loop_set_status()

Push loop_ctl_mutex down to loop_set_status(). We will need this to be
able to call loop_reread_partitions() without loop_ctl_mutex.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoloop: Push loop_ctl_mutex down to loop_get_status()
Jan Kara [Thu, 8 Nov 2018 13:01:08 +0000 (14:01 +0100)]
loop: Push loop_ctl_mutex down to loop_get_status()

Push loop_ctl_mutex down to loop_get_status() to avoid the unusual
convention that the function gets called with loop_ctl_mutex held and
releases it.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoloop: Push loop_ctl_mutex down into loop_clr_fd()
Jan Kara [Thu, 8 Nov 2018 13:01:07 +0000 (14:01 +0100)]
loop: Push loop_ctl_mutex down into loop_clr_fd()

loop_clr_fd() has a weird locking convention in that it expects
loop_ctl_mutex held, releases it on success and keeps it on failure.
Untangle the mess by moving locking of loop_ctl_mutex into
loop_clr_fd().

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoloop: Split setting of lo_state from loop_clr_fd
Jan Kara [Thu, 8 Nov 2018 13:01:06 +0000 (14:01 +0100)]
loop: Split setting of lo_state from loop_clr_fd

Move setting of lo_state to Lo_rundown out into the callers. That will
allow us to unlock loop_ctl_mutex while the loop device is protected
from other changes by its special state.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
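
One way to picture the "protected by its special state" idea is the
following userspace sketch. The enum values, struct loop_dev and
clear_device() are hypothetical; the assumption is that every other path
checks the state under the lock and backs off unless the device is bound.

```c
#include <pthread.h>

/* Hypothetical device states modelled on the rundown idea above. */
enum lo_state { LO_UNBOUND, LO_BOUND, LO_RUNDOWN };
struct loop_dev { enum lo_state state; };

static pthread_mutex_t ctl_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Returns 0 on success, -1 if the device is not bound (already torn down
 * or being torn down by someone else). */
static int clear_device(struct loop_dev *lo)
{
    pthread_mutex_lock(&ctl_mutex);
    if (lo->state != LO_BOUND) {
        pthread_mutex_unlock(&ctl_mutex);
        return -1;
    }
    lo->state = LO_RUNDOWN; /* the caller marks the device as going away */
    pthread_mutex_unlock(&ctl_mutex);

    /* Slow teardown runs here without the lock; other paths see
     * LO_RUNDOWN and refuse to touch the device. */

    pthread_mutex_lock(&ctl_mutex);
    lo->state = LO_UNBOUND;
    pthread_mutex_unlock(&ctl_mutex);
    return 0;
}
```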
6 years agoloop: Push lo_ctl_mutex down into individual ioctls
Jan Kara [Thu, 8 Nov 2018 13:01:05 +0000 (14:01 +0100)]
loop: Push lo_ctl_mutex down into individual ioctls

Push acquisition of lo_ctl_mutex down into individual ioctl handling
branches. This is a preparatory step for pushing the lock down into
individual ioctl handling functions so that they can release the lock as
they need it. We also factor out some simple ioctl handlers that will
not need any special handling to reduce unnecessary code duplication.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoloop: Get rid of loop_index_mutex
Jan Kara [Thu, 8 Nov 2018 13:01:04 +0000 (14:01 +0100)]
loop: Get rid of loop_index_mutex

Now that loop_ctl_mutex is global, just get rid of loop_index_mutex as
there is no good reason to keep these two separate and it just
complicates the locking.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoloop: Fold __loop_release into loop_release
Jan Kara [Thu, 8 Nov 2018 13:01:03 +0000 (14:01 +0100)]
loop: Fold __loop_release into loop_release

__loop_release() has a single call site. Fold it there. This is
currently not a huge win but it will make following replacement of
loop_index_mutex more obvious.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblock/loop: Use global lock for ioctl() operation.
Tetsuo Handa [Thu, 8 Nov 2018 13:01:02 +0000 (14:01 +0100)]
block/loop: Use global lock for ioctl() operation.

syzbot is reporting a NULL pointer dereference [1] caused by a race
between ioctl(loop_fd, LOOP_CLR_FD, 0) and
ioctl(other_loop_fd, LOOP_SET_FD, loop_fd): loop_validate_file()
traverses other loop devices without holding the corresponding
lo->lo_ctl_mutex locks.

Since ioctl() requests on loop devices are not frequent, we don't
need fine-grained locking. Let's use a global lock in order to allow safe
traversal in loop_validate_file().

Note that syzbot is also reporting a circular locking dependency between
bdev->bd_mutex and lo->lo_ctl_mutex [2] which is caused by calling
blkdev_reread_part() with the lock held. This patch does not address it.

[1] https://syzkaller.appspot.com/bug?id=f3cfe26e785d85f9ee259f385515291d21bd80a3
[2] https://syzkaller.appspot.com/bug?id=bf154052f0eea4bc7712499e4569505907d15889

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+bf89c128e05dd6c62523@syzkaller.appspotmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblock/loop: Don't grab "struct file" for vfs_getattr() operation.
Tetsuo Handa [Thu, 8 Nov 2018 13:01:01 +0000 (14:01 +0100)]
block/loop: Don't grab "struct file" for vfs_getattr() operation.

vfs_getattr() needs "struct path" rather than "struct file".
Let's use path_get()/path_put() rather than get_file()/fput().

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoms_block: remove unused pointer 'set'
Colin Ian King [Thu, 8 Nov 2018 11:08:09 +0000 (11:08 +0000)]
ms_block: remove unused pointer 'set'

Pointer 'set' is declared but not used, remove it. Cleans up warning:

warning: unused variable ‘set’ [-Wunused-variable]

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agosunvdc: fix compiler warning
Jens Axboe [Thu, 8 Nov 2018 04:17:57 +0000 (21:17 -0700)]
sunvdc: fix compiler warning

Stephen reports:

After merging the block tree, today's linux-next build (sparc64 defconfig)
produced this warning:

/home/sfr/next/next/drivers/block/sunvdc.c: In function 'init_queue':
/home/sfr/next/next/drivers/block/sunvdc.c:788:6: warning: unused variable 'ret' [-Wunused-variable]
  int ret;
      ^~~

Kill the unused variable.

Fixes: fa182a1fa97d ("sunvdc: convert to blk-mq")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agonvme: add separate poll queue map
Jens Axboe [Mon, 5 Nov 2018 19:44:33 +0000 (12:44 -0700)]
nvme: add separate poll queue map

Adds support for defining a variable number of poll queues, currently
configurable with the 'poll_queues' module parameter. Defaults to
a single poll queue.

And now we finally have poll support without triggering interrupts!

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblock: add REQ_HIPRI and inherit it from IOCB_HIPRI
Jens Axboe [Wed, 29 Aug 2018 16:36:56 +0000 (10:36 -0600)]
block: add REQ_HIPRI and inherit it from IOCB_HIPRI

We use IOCB_HIPRI to poll for IO in the caller instead of scheduling.
This information is not available for (or after) IO submission. The
driver may make different queue choices based on the type of IO, so
make the fact that we will poll for this IO known to the lower layers
as well.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
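
As a rough illustration of "inheriting" the polling hint, the sketch below
copies a submission-side flag into the request flags so lower layers can
see it. The bit values and the helper name are assumptions made for this
example, not the kernel's definitions.

```c
/* Hypothetical bit positions, for illustration only. */
#define SKETCH_IOCB_HIPRI (1u << 3)
#define SKETCH_REQ_HIPRI  (1u << 25)

/* Build request flags from the caller's iocb flags: if the caller intends
 * to poll for completion, mark the request so the driver can pick a
 * suitable queue. */
static unsigned int req_flags_from_iocb(unsigned int iocb_flags,
                                        unsigned int req_flags)
{
    if (iocb_flags & SKETCH_IOCB_HIPRI)
        req_flags |= SKETCH_REQ_HIPRI;
    return req_flags;
}
```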
6 years agonvme: utilize two queue maps, one for reads and one for writes
Jens Axboe [Wed, 31 Oct 2018 14:36:31 +0000 (08:36 -0600)]
nvme: utilize two queue maps, one for reads and one for writes

NVMe does round-robin between queues by default, which means that
sharing a queue map for both reads and writes can be problematic
in terms of read servicing. It's much easier to flood the queue
with writes and reduce the read servicing.

Implement two queue maps, one for reads and one for writes. The
write queue count is configurable through the 'write_queues'
parameter.

By default, we retain the previous behavior of having a single
queue set, shared between reads and writes. Setting 'write_queues'
to a non-zero value will create two queue sets, one for reads and
one for writes, the latter using the configurable number of
queues (hardware queue counts permitting).

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
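
The read/write split described above might be computed along these lines.
This is a sketch under assumptions: split_queues() and struct queue_split
are hypothetical, and the clamping policy (always leave at least one queue
for reads) is a guess at what "hardware queue counts permitting" could
mean, not the driver's actual logic.

```c
struct queue_split {
    int shared;                /* 1: one set serves both reads and writes */
    unsigned int read_queues;  /* queues in the read set */
    unsigned int write_queues; /* queues in the write set */
};

/* total: hardware queues available; write_queues: requested parameter. */
static struct queue_split split_queues(unsigned int total,
                                       unsigned int write_queues)
{
    struct queue_split s;

    if (write_queues == 0 || total < 2) {
        /* Default behavior: a single set shared by reads and writes. */
        s.shared = 1;
        s.read_queues = total;
        s.write_queues = total;
        return s;
    }

    s.shared = 0;
    /* Assumed policy: clamp so at least one queue remains for reads. */
    s.write_queues = write_queues < total - 1 ? write_queues : total - 1;
    s.read_queues = total - s.write_queues;
    return s;
}
```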
6 years agoblk-mq: initial support for multiple queue maps
Jens Axboe [Wed, 24 Oct 2018 19:16:11 +0000 (13:16 -0600)]
blk-mq: initial support for multiple queue maps

Add a queue offset to the tag map. This enables users to map
iteratively, for each queue map type they support.

Bump the maximum number of supported maps to 2; we're now fully
able to support more than 1 map.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
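
A queue offset lets each map type cover its own slice of the hardware
queues. The sketch below assumes a fixed CPU count and a simple
round-robin spread; struct queue_map and map_queues() are hypothetical
names for this illustration.

```c
#define SKETCH_NR_CPUS 4

struct queue_map {
    unsigned int nr_queues;    /* hardware queues owned by this map */
    unsigned int queue_offset; /* index of this map's first hardware queue */
    unsigned int mq_map[SKETCH_NR_CPUS]; /* cpu -> hardware queue index */
};

/* Spread CPUs round-robin over this map's queues, shifted by its offset. */
static void map_queues(struct queue_map *m)
{
    for (unsigned int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        m->mq_map[cpu] = m->queue_offset + cpu % m->nr_queues;
}
```

Calling map_queues() once per map type, each with a different offset, is
the "map iteratively" step: two maps never hand out the same hardware
queue index as long as their ranges do not overlap.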
6 years agoblk-mq: improve plug list sorting
Jens Axboe [Tue, 30 Oct 2018 18:24:04 +0000 (12:24 -0600)]
blk-mq: improve plug list sorting

Currently we only look at the software queue, but with support
for multiple maps, we should also look at the hardware queue.
This is important since we'll flush out the request list if
either the software queue or hardware queue don't match.

This sorts by software queue first, then hardware queue if
that differs. Finally we sort by request location like before.
This minimizes the flush points per plug list.

Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
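
The three-level ordering described above (software queue, then hardware
queue, then request location) can be written as a comparator. This is a
userspace sketch using qsort(); struct req and its field names are
hypothetical.

```c
#include <stdlib.h>

struct req {
    unsigned int sw_queue;     /* software queue the request came from */
    unsigned int hw_queue;     /* hardware queue it maps to */
    unsigned long long sector; /* request location */
};

/* Order by software queue, then hardware queue, then sector. */
static int plug_cmp(const void *a, const void *b)
{
    const struct req *ra = a, *rb = b;

    if (ra->sw_queue != rb->sw_queue)
        return ra->sw_queue < rb->sw_queue ? -1 : 1;
    if (ra->hw_queue != rb->hw_queue)
        return ra->hw_queue < rb->hw_queue ? -1 : 1;
    if (ra->sector != rb->sector)
        return ra->sector < rb->sector ? -1 : 1;
    return 0;
}

static void sort_plug_list(struct req *reqs, size_t n)
{
    qsort(reqs, n, sizeof(*reqs), plug_cmp);
}
```

After sorting, requests that share both queues sit next to each other, so
a walk over the list changes queue as rarely as possible.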
6 years agoblk-mq: cleanup and improve list insertion
Jens Axboe [Tue, 30 Oct 2018 17:31:51 +0000 (11:31 -0600)]
blk-mq: cleanup and improve list insertion

It's somewhat strange to have a list insertion function that
relies on the fact that the caller has mapped things correctly.
Pass in the hardware queue directly for insertion, which makes
for a much cleaner interface and implementation.

Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblk-mq: cache request hardware queue mapping
Jens Axboe [Mon, 29 Oct 2018 21:06:13 +0000 (15:06 -0600)]
blk-mq: cache request hardware queue mapping

We call blk_mq_map_queue() a lot, at least two times for each
request per IO, sometimes more. Since we now have an indirect
call in that function as well, cache the mapping so we don't
have to re-call blk_mq_map_queue() for the same request
multiple times.

Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
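
A minimal sketch of the caching idea, assuming the mapping is resolved
once when the request is set up and stored in the request itself. All
names here (struct request, request_init(), map_queue(), map_calls) are
hypothetical and exist only to show the shape.

```c
struct hw_queue { int id; };

struct request {
    unsigned int cpu;
    struct hw_queue *hctx; /* cached mapping, set once at init time */
};

static struct hw_queue queues[2] = { {0}, {1} };
static int map_calls; /* counts how often the lookup actually runs */

/* The (comparatively expensive) lookup we want to avoid repeating. */
static struct hw_queue *map_queue(unsigned int cpu)
{
    map_calls++;
    return &queues[cpu % 2];
}

/* Resolve the hardware queue once and remember it in the request. */
static void request_init(struct request *rq, unsigned int cpu)
{
    rq->cpu = cpu;
    rq->hctx = map_queue(cpu);
}

/* Later users read the cached pointer instead of mapping again. */
static struct hw_queue *request_hctx(const struct request *rq)
{
    return rq->hctx;
}
```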
6 years agoblk-mq: separate number of hardware queues from nr_cpu_ids
Jens Axboe [Mon, 29 Oct 2018 19:25:27 +0000 (13:25 -0600)]
blk-mq: separate number of hardware queues from nr_cpu_ids

With multiple maps, nr_cpu_ids is no longer the maximum number of
hardware queues we support on a given device. The initializer of
the tag_set can have set ->nr_hw_queues larger than the available
number of CPUs, since we can exceed that with multiple queue maps.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblk-mq: support multiple hctx maps
Jens Axboe [Tue, 30 Oct 2018 16:36:06 +0000 (10:36 -0600)]
blk-mq: support multiple hctx maps

Add support for the tag set carrying multiple queue maps, and
for the driver to inform blk-mq how many it wishes to support
through setting set->nr_maps.

This adds an mq_ops helper for drivers that support more than 1
map, mq_ops->rq_flags_to_type(). The function takes request/bio
flags and CPU, and returns a queue map index for that. We then
use the type information in blk_mq_map_queue() to index the map
set.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
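
The lookup described above (flags and CPU select a map type, which then
indexes the map set) might look like this in miniature. The enum, the
flag bit, struct tag_set and both functions are hypothetical stand-ins,
and the policy of sending reads to their own map is only an example.

```c
#define SKETCH_NR_CPUS   4
#define SKETCH_FLAG_READ 0x1u /* hypothetical request flag */

enum map_type { MAP_DEFAULT, MAP_READ, MAP_MAX };

struct tag_set {
    unsigned int nr_maps; /* how many maps the driver supports */
    unsigned int map[MAP_MAX][SKETCH_NR_CPUS]; /* [type][cpu] -> hw queue */
};

/* Driver-side helper: choose a map type from the request flags and CPU. */
static unsigned int flags_to_type(const struct tag_set *set,
                                  unsigned int flags, unsigned int cpu)
{
    (void)cpu; /* unused in this sketch; a driver could take it into account */
    if (set->nr_maps > 1 && (flags & SKETCH_FLAG_READ))
        return MAP_READ;
    return MAP_DEFAULT;
}

/* Core-side lookup: index the map set by (type, cpu). */
static unsigned int map_queue(const struct tag_set *set,
                              unsigned int flags, unsigned int cpu)
{
    unsigned int type = flags_to_type(set, flags, cpu);

    return set->map[type][cpu % SKETCH_NR_CPUS];
}
```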
6 years agoblk-mq: add 'type' attribute to the sysfs hctx directory
Jens Axboe [Thu, 25 Oct 2018 14:58:14 +0000 (08:58 -0600)]
blk-mq: add 'type' attribute to the sysfs hctx directory

It can be useful for a user to verify what type a given hardware
queue is. Expose this information in sysfs.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblk-mq: allow software queue to map to multiple hardware queues
Jens Axboe [Mon, 29 Oct 2018 19:13:29 +0000 (13:13 -0600)]
blk-mq: allow software queue to map to multiple hardware queues

The mapping used to be dependent on just the CPU location, but
now it's a tuple of (type, cpu) instead. This is a prep patch
for allowing a single software queue to map to multiple hardware
queues. No functional changes in this patch.

This changes the software queue count to an unsigned short
to save a bit of space. We can still support 64K-1 CPUs,
which should be enough. Add a check to catch a wrap.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
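
The "check to catch a wrap" can be pictured as a range test before the
count is narrowed to an unsigned short. struct hctx and set_nr_ctx() are
hypothetical names used only for this sketch.

```c
#include <limits.h>

struct hctx {
    unsigned short nr_ctx; /* software queue count, narrowed to save space */
};

/* Returns 0 on success, -1 if the count would not fit in the field. */
static int set_nr_ctx(struct hctx *h, unsigned int nr_cpus)
{
    if (nr_cpus > USHRT_MAX)
        return -1; /* storing it would silently wrap */
    h->nr_ctx = (unsigned short)nr_cpus;
    return 0;
}
```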
6 years agoblk-mq: pass in request/bio flags to queue mapping
Jens Axboe [Mon, 29 Oct 2018 19:11:38 +0000 (13:11 -0600)]
blk-mq: pass in request/bio flags to queue mapping

Prep patch for being able to place requests based not just on
CPU location, but also on the type of request.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblk-mq: provide dummy blk_mq_map_queue_type() helper
Jens Axboe [Mon, 29 Oct 2018 19:07:33 +0000 (13:07 -0600)]
blk-mq: provide dummy blk_mq_map_queue_type() helper

Doesn't do anything right now, but it's needed as a prep patch
to get the interfaces right.

While in there, correct the blk_mq_map_queue() CPU type to an unsigned
int.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblk-mq: abstract out queue map
Jens Axboe [Mon, 29 Oct 2018 19:06:14 +0000 (13:06 -0600)]
blk-mq: abstract out queue map

This is in preparation for allowing multiple sets of maps per
queue, if so desired.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblk-mq: kill q->mq_map
Jens Axboe [Tue, 16 Oct 2018 20:23:06 +0000 (14:23 -0600)]
blk-mq: kill q->mq_map

It's just a pointer to set->mq_map, use that instead. Move the
assignment a bit earlier, so we always know it's valid.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>