Jens Axboe [Thu, 21 Nov 2019 17:53:47 +0000 (10:53 -0700)]
Merge branch 'nvme-5.5' of git://git.infradead.org/nvme into for-5.5/drivers-post
Pull NVMe changes from Keith:
"- The only new feature is the optional hwmon support for nvme (Guenter
and Akinobu)
- A universal work-around for controllers reading discard payloads
beyond the range boundary (Eduard)
- Chaitanya graciously agreed to share the target driver maintenance"
* 'nvme-5.5' of git://git.infradead.org/nvme:
nvme: hwmon: add quirk to avoid changing temperature threshold
nvme: hwmon: provide temperature min and max values for each sensor
nvmet: add another maintainer
nvme: Discard workaround for non-conformant devices
nvme: Add hardware monitoring support
Akinobu Mita [Thu, 14 Nov 2019 15:40:01 +0000 (00:40 +0900)]
nvme: hwmon: add quirk to avoid changing temperature threshold
This adds a new quirk NVME_QUIRK_NO_TEMP_THRESH_CHANGE to avoid changing
the value of the temperature threshold feature for specific devices that
show undesirable behavior.
Guenter reported:
"On my Intel NVME drive (SSDPEKKW512G7), writing any minimum limit on the
Composite temperature sensor results in a temperature warning, and that
warning is sticky until I reset the controller.
It doesn't seem to matter which temperature I write; writing -273000 has
the same result."
The Intel NVMe has the latest firmware version installed, so this isn't
a problem that was ever fixed.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Jean Delvare <jdelvare@suse.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Akinobu Mita [Thu, 14 Nov 2019 15:40:00 +0000 (00:40 +0900)]
nvme: hwmon: provide temperature min and max values for each sensor
According to the NVMe specification, the over temperature threshold and
under temperature threshold features shall be implemented for Composite
Temperature if a non-zero WCTEMP field value is reported in the Identify
Controller data structure. The features are also implemented for all
implemented temperature sensors (i.e., all Temperature Sensor fields that
report a non-zero value).
This provides the over temperature threshold and under temperature
threshold for each sensor as temperature min and max values of hwmon
sysfs attributes.
The WCTEMP is already provided as a temperature max value for Composite
Temperature, but this change isn't incompatible. Because the default
value of the over temperature threshold for Composite Temperature is
the WCTEMP.
Now the alarm attribute for Composite Temperature indicates one of the
temperature is outside of a temperature threshold. Because there is only
a single bit in Critical Warning field that indicates a temperature is
outside of a threshold.
Example output from the "sensors" command:
nvme-pci-0100
Adapter: PCI adapter
Composite: +33.9°C (low = -273.1°C, high = +69.8°C)
(crit = +79.8°C)
Sensor 1: +34.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +31.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 5: +47.9°C (low = -273.1°C, high = +65261.8°C)
This also adds helper macros for kelvin from/to milli Celsius conversion,
and replaces the repeated code in hwmon.c.
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Jean Delvare <jdelvare@suse.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Christoph Hellwig [Sat, 16 Nov 2019 17:50:38 +0000 (18:50 +0100)]
nvmet: add another maintainer
Sagi and I have been pretty busy lately, and Chaitanya has been
helping a lot with target work and agreed to share the load.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Eduard Hasenleithner [Tue, 12 Nov 2019 20:55:01 +0000 (21:55 +0100)]
nvme: Discard workaround for non-conformant devices
Users observe IOMMU related errors when performing discard on nvme from
non-compliant nvme devices reading beyond the end of the DMA mapped
ranges to discard.
Two different variants of this behavior have been observed: SM22XX
controllers round up the read size to a multiple of 512 bytes, and Phison
E12 unconditionally reads the maximum discard size allowed by the spec
(256 segments or 4kB).
Make nvme_setup_discard unconditionally allocate the maximum DSM buffer
so the driver DMA maps a memory range that will always succeed.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202665
Signed-off-by: Eduard Hasenleithner <eduard@hasenleithner.at>
[changelog, use existing define, kernel coding style]
Signed-off-by: Keith Busch <kbusch@kernel.org>
Guenter Roeck [Wed, 6 Nov 2019 14:35:18 +0000 (06:35 -0800)]
nvme: Add hardware monitoring support
nvme devices report temperature information in the controller information
(for limits) and in the smart log. Currently, the only means to retrieve
this information is the nvme command line interface, which requires
super-user privileges.
At the same time, it would be desirable to be able to use NVMe temperature
information for thermal control.
This patch adds support to read NVMe temperatures from the kernel using the
hwmon API and adds temperature zones for NVMe drives. The thermal subsystem
can use this information to set thermal policies, and userspace can access
it using libsensors and/or the "sensors" command.
Example output from the "sensors" command:
nvme0-pci-0100
Adapter: PCI adapter
Composite: +39.0°C (high = +85.0°C, crit = +85.0°C)
Sensor 1: +39.0°C
Sensor 2: +41.0°C
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Ajay Joshi [Sun, 27 Oct 2019 14:05:47 +0000 (23:05 +0900)]
scsi: sd_zbc: add zone open, close, and finish support
Implement REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH
support to allow explicit control of zone states.
Contains contributions from Matias Bjorling, Hans Holmberg,
Keith Busch and Damien Le Moal.
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com>
Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 7 Nov 2019 13:45:53 +0000 (06:45 -0700)]
Merge branch 'fixes' of git://git./linux/kernel/git/mkp/scsi into for-5.5/drivers-post
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi:
scsi: core: Handle drivers which set sg_tablesize to zero
scsi: qla2xxx: fix NPIV tear down process
scsi: sd_zbc: Fix sd_zbc_complete()
Jens Axboe [Thu, 7 Nov 2019 13:43:18 +0000 (06:43 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi into for-5.5/drivers-post
SCSI fixes on
20191101
Nine changes, eight in drivers [ufs, target, lpfc x 2, qla2xxx x 4]
and one core change in sd that fixes an I/O failure on DIF type 3
devices.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (23 commits)
scsi: qla2xxx: stop timer in shutdown path
scsi: sd: define variable dif as unsigned int instead of bool
scsi: target: cxgbit: Fix cxgbit_fw4_ack()
scsi: qla2xxx: Fix partial flash write of MBI
scsi: qla2xxx: Initialized mailbox to prevent driver load failure
scsi: lpfc: Honor module parameter lpfc_use_adisc
scsi: ufs-bsg: Wake the device before sending raw upiu commands
scsi: lpfc: Check queue pointer before use
scsi: qla2xxx: fixup incorrect usage of host_byte
scsi: lpfc: remove left-over BUILD_NVME defines
scsi: core: try to get module before removing device
scsi: hpsa: add missing hunks in reset-patch
scsi: target: core: Do not overwrite CDB byte 1
scsi: ch: Make it possible to open a ch device multiple times again
scsi: fix kconfig dependency warning related to 53C700_LE_ON_BE
scsi: sni_53c710: fix compilation error
scsi: scsi_dh_alua: handle RTPG sense code correctly during state transitions
scsi: qla2xxx: fix a potential NULL pointer dereference
scsi: MAINTAINERS: Update qla2xxx driver
scsi: zfcp: fix reaction on bit error threshold notification
...
Ajay Joshi [Sun, 27 Oct 2019 14:05:49 +0000 (23:05 +0900)]
null_blk: add zone open, close, and finish support
Implement REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH
support to allow explicit control of zone states.
Contains contributions from Matias Bjorling, Hans Holmberg,
Keith Busch and Damien Le Moal.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com>
Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ajay Joshi [Sun, 27 Oct 2019 14:05:48 +0000 (23:05 +0900)]
dm: add zone open, close and finish support
Implement REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH
support to allow explicit control of zone states.
Contains contributions from Matias Bjorling, Hans Holmberg and
Damien Le Moal.
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com>
Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 7 Nov 2019 13:33:05 +0000 (06:33 -0700)]
Merge branch 'for-5.5/block' into for-5.5/drivers
Pull in dependencies for the new zoned open/close/finish support.
* for-5.5/block: (32 commits)
block: add zone open, close and finish ioctl support
block: add zone open, close and finish operations
block: Simplify REQ_OP_ZONE_RESET_ALL handling
block: Remove REQ_OP_ZONE_RESET plugging
block: Warn if elevator= parameter is used
block: avoid blk_bio_segment_split for small I/O operations
blk-mq: make sure that line break can be printed
block: sed-opal: Introduce Opal Datastore UID
block: sed-opal: Add support to read/write opal tables generically
block: sed-opal: Generalizing write data to any opal table
bdev: Refresh bdev size for disks without partitioning
bdev: Factor out bdev revalidation into a common helper
blk-mq: avoid sysfs buffer overflow with too many CPU cores
blk-mq: Make blk_mq_run_hw_queue() return void
fcntl: fix typo in RWH_WRITE_LIFE_NOT_SET r/w hint name
blk-mq: fill header with kernel-doc
blk-mq: remove needless goto from blk_mq_get_driver_tag
block: reorder bio::__bi_remaining for better packing
block: Reduce the amount of memory used for tag sets
block: Reduce the amount of memory required per request queue
...
Ajay Joshi [Sun, 27 Oct 2019 14:05:46 +0000 (23:05 +0900)]
block: add zone open, close and finish ioctl support
Introduce three new ioctl commands BLKOPENZONE, BLKCLOSEZONE and
BLKFINISHZONE to allow applications to control the condition of zones
on a zoned block device through the execution of the REQ_OP_ZONE_OPEN,
REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH operations.
Contains contributions from Matias Bjorling, Hans Holmberg,
Dmitry Fomichev, Keith Busch, Damien Le Moal and Christoph Hellwig.
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com>
Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ajay Joshi [Sun, 27 Oct 2019 14:05:45 +0000 (23:05 +0900)]
block: add zone open, close and finish operations
Zoned block devices (ZBC and ZAC devices) allow an explicit control
over the condition (state) of zones. The operations allowed are:
* Open a zone: Transition to open condition to indicate that a zone will
actively be written
* Close a zone: Transition to closed condition to release the drive
resources used for writing to a zone
* Finish a zone: Transition an open or closed zone to the full
condition to prevent write operations
To enable this control for in-kernel zoned block device users, define
the new request operations REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE
and REQ_OP_ZONE_FINISH as well as the generic function
blkdev_zone_mgmt() for submitting these operations on a range of zones.
This results in blkdev_reset_zones() removal and replacement with this
new zone magement function. Users of blkdev_reset_zones() (f2fs and
dm-zoned) are updated accordingly.
Contains contributions from Matias Bjorling, Hans Holmberg,
Dmitry Fomichev, Keith Busch, Damien Le Moal and Christoph Hellwig.
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com>
Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Sun, 27 Oct 2019 14:05:43 +0000 (23:05 +0900)]
block: Simplify REQ_OP_ZONE_RESET_ALL handling
There is no need for the function __blkdev_reset_all_zones() as
REQ_OP_ZONE_RESET_ALL can be handled directly in blkdev_reset_zones()
bio loop with an early break from the loop. This patch removes this
function and modifies blkdev_reset_zones(), simplifying the code.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Sun, 27 Oct 2019 14:05:42 +0000 (23:05 +0900)]
block: Remove REQ_OP_ZONE_RESET plugging
REQ_OP_ZONE_RESET operations cannot be merged as these bios and requests
do not have a size and are never sequential due to the zone start sector
position required for their execution. As a result, there is no point in
using a plug around blkdev_reset_zones() bio issuing loop. This patch
removes this unnecessary plugging.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jan Kara [Wed, 6 Nov 2019 10:48:57 +0000 (11:48 +0100)]
block: Warn if elevator= parameter is used
With transition to blk-mq, the elevator= kernel argument was removed as
it makes less and less sense with the current variety of devices. Since
this may surprise some users and there are advices on the Internet that
still suggest to use it, let's at least warn if the parameter is used.
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Michael Schmitz [Tue, 5 Nov 2019 02:49:10 +0000 (15:49 +1300)]
scsi: core: Handle drivers which set sg_tablesize to zero
In scsi_mq_setup_tags(), cmd_size is calculated based on zero size for the
scatter-gather list in case the low level driver uses SG_NONE in its host
template.
cmd_size is passed on to the block layer for calculation of the request
size, and we've seen NULL pointer dereference errors from the block layer
in drivers where SG_NONE is used and a mq IO scheduler is active,
apparently as a consequence of this (see commit
68ab2d76e4be ("scsi:
cxlflash: Set sg_tablesize to 1 instead of SG_NONE"), and a recent patch by
Finn Thain converting the three m68k NFR5380 drivers to avoid setting
SG_NONE).
Try to avoid these errors by accounting for at least one sg list entry when
calculating cmd_size, regardless of whether the low level driver set a zero
sg_tablesize.
Tested on 030 m68k with the atari_scsi driver - setting sg_tablesize to
SG_NONE no longer results in a crash when loading this driver.
CC: Finn Thain <fthain@telegraphics.com.au>
Link: https://lore.kernel.org/r/1572922150-4358-1-git-send-email-schmitzmic@gmail.com
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Martin Wilck [Tue, 5 Nov 2019 14:56:00 +0000 (14:56 +0000)]
scsi: qla2xxx: fix NPIV tear down process
Fix two issues with commit
f5187b7d1ac6 ("scsi: qla2xxx: Optimize NPIV
tear down process"): a missing negation in a wait_event_timeout()
condition, and a missing loop end condition.
Fixes: f5187b7d1ac6 ("scsi: qla2xxx: Optimize NPIV tear down process")
Link: https://lore.kernel.org/r/20191105145550.10268-1-martin.wilck@suse.com
Signed-off-by: Martin Wilck <mwilck@suse.com>
Acked-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Damien Le Moal [Sun, 27 Oct 2019 14:05:44 +0000 (23:05 +0900)]
scsi: sd_zbc: Fix sd_zbc_complete()
The ILLEGAL REQUEST/INVALID FIELD IN CDB error generated by an attempt to
reset a conventional zone does not apply to the reset write pointer command
with the ALL bit set, that is, to REQ_OP_ZONE_RESET_ALL requests. Fix
sd_zbc_complete() to be quiet only in the case of REQ_OP_ZONE_RESET,
excluding REQ_OP_ZONE_RESET_ALL.
Since REQ_OP_ZONE_RESET is the only request handled by sd_zbc_complete(),
also simplify the code using a simple if statement.
[mkp: applied by hand]
Fixes: d81e9d494354 ("scsi: implement REQ_OP_ZONE_RESET_ALL")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191027140549.26272-4-damien.lemoal@wdc.com
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Christoph Hellwig [Mon, 4 Nov 2019 18:05:43 +0000 (10:05 -0800)]
block: avoid blk_bio_segment_split for small I/O operations
__blk_queue_split() adds significant overhead for small I/O operations.
Add a shortcut to avoid it for cases where we know we never need to
split.
Based on a patch from Ming Lei.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Prabhath Sajeepa [Mon, 28 Oct 2019 22:56:48 +0000 (16:56 -0600)]
nvme: Fix parsing of ANA log page
Check validity of offset into ANA log buffer before accessing
nvme_ana_group_desc. This check ensures the size of ANA log buffer >=
offset + sizeof(nvme_ana_group_desc)
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Prabhath Sajeepa <psajeepa@purestorage.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Tue, 29 Oct 2019 07:12:23 +0000 (08:12 +0100)]
nvmet: stop using bio_set_op_attrs
bio_set_op_attrs has been long deprecated, replace it with a direct
assignment of the flags to bio->bi_opf.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Mon, 28 Oct 2019 18:23:26 +0000 (11:23 -0700)]
nvmet: add plugging for read/write when ns is bdev
With reference to the following issue reported on the mailing list :-
http://lists.infradead.org/pipermail/linux-nvme/2019-October/027604.html
this patch adds plugging for the bdev-ns under nvmet_bdev_execute_rw().
We can see the following performance improvement in random write
workload I/Os with the setup described in the link when device_path
configured as /dev/md0.
Without this patch :-
write: IOPS=40.8k, BW=159MiB/s (167MB/s)(4777MiB/30002msec)
write: IOPS=41.2k, BW=161MiB/s (169MB/s)(4831MiB/30011msec)
slat (usec): min=8, max=10823, avg=15.64, stdev=16.85
slat (usec): min=8, max=401, avg=15.40, stdev= 9.56
clat (usec): min=54, max=2492, avg=759.07, stdev=172.62
clat (usec): min=56, max=1997, avg=768.06, stdev=178.72
With this patch :-
write: IOPS=123k, BW=480MiB/s (504MB/s)(14.1GiB/30011msec)
write: IOPS=123k, BW=481MiB/s (504MB/s)(14.1GiB/30002msec)
slat (usec): min=8, max=9941, avg=13.31, stdev= 8.04
slat (usec): min=8, max=289, avg=13.31, stdev= 3.37
clat (usec): min=43, max=17635, avg=245.46, stdev=171.23
clat (usec): min=44, max=17751, avg=245.25, stdev=183.14
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 25 Oct 2019 13:38:58 +0000 (15:38 +0200)]
nvmet: clean up command parsing a bit
Move the special cases for fabrics commands and the discovery controller
to nvmet_parse_admin_cmd in preparation for adding passthrough support.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Geert Uytterhoeven [Thu, 24 Oct 2019 15:24:00 +0000 (17:24 +0200)]
nvme-pci: Spelling s/resdicovered/rediscovered/
Fix misspelling of "rediscovered".
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Sagi Grimberg [Thu, 24 Oct 2019 16:55:58 +0000 (09:55 -0700)]
nvmet: fill discovery controller sn, fr and mn correctly
Discovery controllers need this information as well.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 23 Oct 2019 16:35:45 +0000 (10:35 -0600)]
nvmet: Open code nvmet_req_execute()
Now that nvmet_req_execute does nothing, open code it.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[split patch, update changelog]
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 23 Oct 2019 16:35:44 +0000 (10:35 -0600)]
nvmet: Remove the data_len field from the nvmet_req struct
Instead of storing the expected length and checking it when it's
executed, just check the length inside the command themselves.
A new helper, nvmet_check_data_len() is created to help with this
check.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[split patch, udpate changelog]
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 23 Oct 2019 16:35:43 +0000 (10:35 -0600)]
nvmet: Introduce nvmet_dsm_len() helper
Similar to the nvmet_rw_len helper.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[split patch, update changelog]
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 23 Oct 2019 16:35:42 +0000 (10:35 -0600)]
nvmet: Cleanup discovery execute handlers
Push the lid and cns check into their respective handlers and, while
we're at it, rename the functions to be consistent with other
discovery handlers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[split patch, update changelog]
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 23 Oct 2019 16:35:41 +0000 (10:35 -0600)]
nvmet: Introduce common execute function for get_log_page and identify
Instead of picking the sub-command handler to execute in a nested
switch statement introduce a landing functions that calls out
to the appropriate sub-command handler.
This will allow us to have a common place in the handler to check
the transfer length in a future patch.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[split patch, update change log]
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Logan Gunthorpe [Wed, 23 Oct 2019 16:35:40 +0000 (10:35 -0600)]
nvmet-tcp: Don't set the request's data_len
It's not apprporiate for the transports to set the data_len
field of the request which is only used by the core.
In this case, just use a variable on the stack to store the
length of the sgl for comparison.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Logan Gunthorpe [Wed, 23 Oct 2019 16:35:39 +0000 (10:35 -0600)]
nvmet-tcp: Don't check data_len in nvmet_tcp_map_data()
None of the other transports check data_len which is verified
in core code. The function should instead check that the sgl length
is non-zero.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Mon, 21 Oct 2019 03:40:04 +0000 (12:40 +0900)]
nvme: Introduce nvme_lba_to_sect()
Introduce the new helper function nvme_lba_to_sect() to convert a device
logical block number to a 512B sector number. Use this new helper in
obvious places, cleaning up the code.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Damien Le Moal [Mon, 21 Oct 2019 03:40:03 +0000 (12:40 +0900)]
nvme: Cleanup and rename nvme_block_nr()
Rename nvme_block_nr() to nvme_sect_to_lba() and use SECTOR_SHIFT
instead of its hard coded value 9. Also add a comment to decribe this
helper.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Revanth Rajashekar [Mon, 14 Oct 2019 17:16:07 +0000 (11:16 -0600)]
nvme: resync include/linux/nvme.h with nvmecli
Update enumerations and structures in include/linux/nvme.h
to resync with the nvmecli.
All the updates are mentioned in the ratified NVMe 1.4 spec
https://nvmexpress.org/wp-content/uploads/NVM-Express-1_4-2019.06.10-Ratified.pdf
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Max Gurtovoy [Sun, 13 Oct 2019 16:57:36 +0000 (19:57 +0300)]
nvme: move common call to nvme_cleanup_cmd to core layer
nvme_cleanup_cmd should be called for each call to nvme_setup_cmd
(symmetrical functions). Move the call for nvme_cleanup_cmd to the common
core layer and call it during nvme_complete_rq for the good flow. For
error flow, each transport will call nvme_cleanup_cmd independently. Also
take care of a special case of path failure, where we call
nvme_complete_rq without doing nvme_setup_cmd.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Max Gurtovoy [Sun, 13 Oct 2019 16:57:35 +0000 (19:57 +0300)]
nvme: introduce "Command Aborted By host" status code
Fix the status code of canceled requests initiated by the host according
to TP4028 (Status Code 0x371):
"Command Aborted By host: The command was aborted as a result of host
action (e.g., the host disconnected the Fabric connection)."
Also in a multipath environment, unless otherwise specified, errors of
this type (path related) should be retried using a different path, if
one is available.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Israel Rukshin [Sun, 13 Oct 2019 16:57:34 +0000 (19:57 +0300)]
nvmet-rdma: add unlikely check at nvmet_rdma_map_sgl_keyed
The calls to nvmet_req_alloc_sgl and rdma_rw_ctx_init should usually
succeed, so add this simple optimization to the fast path.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Israel Rukshin [Sun, 13 Oct 2019 16:57:33 +0000 (19:57 +0300)]
nvmet: add unlikely check at nvmet_req_alloc_sgl
The call to sgl_alloc shouldn't fail so add this simple optimization to
the fast path.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Israel Rukshin [Sun, 13 Oct 2019 16:57:32 +0000 (19:57 +0300)]
nvmet: use bio_io_error instead of duplicating it
This commit doesn't change any logic.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Israel Rukshin [Sun, 13 Oct 2019 16:57:31 +0000 (19:57 +0300)]
nvme: introduce nvme_is_aen_req function
This function improves code readability and reduces code duplication.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
James Smart [Fri, 27 Sep 2019 21:27:22 +0000 (14:27 -0700)]
nvme-fc: ensure association_id is cleared regardless of a Disconnect LS
Code today only clears the association_id if a Disconnect LS is transmit.
Remove ambiguity and unconditionally clear the association_id if the
association has been terminated.
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
James Smart [Fri, 27 Sep 2019 22:27:11 +0000 (15:27 -0700)]
nvme-fc: clarify error messages
Change wording on a couple of messages to clarify what happened.
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
James Smart [Fri, 27 Sep 2019 21:51:36 +0000 (14:51 -0700)]
nvme-fc: Set new cmd set indicator in nvme-fc cmnd iu
Set the new category field in the FC-NVME CMND_IU based on queue number.
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
James Smart [Fri, 27 Sep 2019 21:51:35 +0000 (14:51 -0700)]
nvme-fc and nvmet-fc: sync with FC-NVME-2 header changes
Sync sources with revised structure and field names to correspond with
FC-NVME-2 header sync-up.
Tested interoperability with success:
- prior initiator with new target
- prior target with new initiator
- new on new
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
James Smart [Fri, 27 Sep 2019 21:51:34 +0000 (14:51 -0700)]
nvme-fc: Sync nvme-fc header to FC-NVME-2
Sync the header to FC-NVME-2 r1.06 (T11-2019-00210-v001).
Includes some minor mods where pre-release field names changed
by the time the spec was released.
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Mon, 4 Nov 2019 08:26:53 +0000 (16:26 +0800)]
blk-mq: make sure that line break can be printed
8962842ca5ab ("blk-mq: avoid sysfs buffer overflow with too many CPU cores")
avoids sysfs buffer overflow, and reserves one character for line break.
However, the last snprintf() doesn't get correct 'size' parameter passed
in, so fixed it.
Fixes: 8962842ca5ab ("blk-mq: avoid sysfs buffer overflow with too many CPU cores")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Revanth Rajashekar [Thu, 31 Oct 2019 16:13:22 +0000 (10:13 -0600)]
block: sed-opal: Introduce Opal Datastore UID
This patch introduces Opal Datastore UID.
The generic read/write table ioctl can use this UID
to access the Opal Datastore.
Reviewed-by: Scott Bauer <sbauer@plzdonthack.me>
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Revanth Rajashekar [Thu, 31 Oct 2019 16:13:21 +0000 (10:13 -0600)]
block: sed-opal: Add support to read/write opal tables generically
This feature gives the user RW access to any opal table with admin1
authority. The flags described in the new structure determines if the user
wants to read/write the data. Flags are checked for valid values in
order to allow future features to be added to the ioctl.
The user can provide the desired table's UID. Also, the ioctl provides a
size and offset field and internally will loop data accesses to return
the full data block. Read overrun is prevented by the initiator's
sec_send_recv() backend. The ioctl provides a private field with the
intention to accommodate any future expansions to the ioctl.
Reviewed-by: Scott Bauer <sbauer@plzdonthack.me>
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Revanth Rajashekar [Thu, 31 Oct 2019 16:13:20 +0000 (10:13 -0600)]
block: sed-opal: Generalizing write data to any opal table
This patch refactors the existing "write_shadowmbr" func and
creates a new generalized function "generic_table_write_data",
to write data to any opal table. Also, a few cleanups are included
in this patch.
Reviewed-by: Scott Bauer <sbauer@plzdonthack.me>
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jan Kara [Mon, 21 Oct 2019 08:38:00 +0000 (10:38 +0200)]
bdev: Refresh bdev size for disks without partitioning
Currently, block device size in not updated on second and further open
for block devices where partition scan is disabled. This is particularly
annoying for example for DVD drives as that means block device size does
not get updated once the media is inserted into a drive if the device is
already open when inserting the media. This is actually always the case
for example when pktcdvd is in use.
Fix the problem by revalidating block device size on every open even for
devices with partition scan disabled.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jan Kara [Mon, 21 Oct 2019 08:37:59 +0000 (10:37 +0200)]
bdev: Factor out bdev revalidation into a common helper
Factor out code handling revalidation of bdev on disk change into a
common helper.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Sat, 2 Nov 2019 08:02:15 +0000 (16:02 +0800)]
blk-mq: avoid sysfs buffer overflow with too many CPU cores
It is reported that sysfs buffer overflow can be triggered if the system
has too many CPU cores(>841 on 4K PAGE_SIZE) when showing CPUs of
hctx via /sys/block/$DEV/mq/$N/cpu_list.
Use snprintf to avoid the potential buffer overflow.
This version doesn't change the attribute format, and simply stops
showing CPU numbers if the buffer is going to overflow.
Cc: stable@vger.kernel.org
Fixes: 676141e48af7("blk-mq: don't dump CPU -> hw queue map on driver load")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Darrick J. Wong [Thu, 31 Oct 2019 03:29:48 +0000 (20:29 -0700)]
loop: fix no-unmap write-zeroes request behavior
Currently, if the loop device receives a WRITE_ZEROES request, it asks
the underlying filesystem to punch out the range. This behavior is
correct if unmapping is allowed. However, a NOUNMAP request means that
the caller doesn't want us to free the storage backing the range, so
punching out the range is incorrect behavior.
To satisfy a NOUNMAP | WRITE_ZEROES request, loop should ask the
underlying filesystem to FALLOC_FL_ZERO_RANGE, which is (according to
the fallocate documentation) required to ensure that the entire range is
backed by real storage, which suffices for our purposes.
Fixes: 19372e2769179dd ("loop: implement REQ_OP_WRITE_ZEROES")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
John Garry [Tue, 29 Oct 2019 16:59:30 +0000 (00:59 +0800)]
blk-mq: Make blk_mq_run_hw_queue() return void
Since commit
97889f9ac24f ("blk-mq: remove synchronize_rcu() from
blk_mq_del_queue_tag_set()"), the return value of blk_mq_run_hw_queue()
is never checked, so make it return void, which very marginally simplifies
the code.
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Nicholas Piggin [Thu, 24 Oct 2019 06:38:04 +0000 (16:38 +1000)]
scsi: qla2xxx: stop timer in shutdown path
In shutdown/reboot paths, the timer is not stopped:
qla2x00_shutdown
pci_device_shutdown
device_shutdown
kernel_restart_prepare
kernel_restart
sys_reboot
This causes lockups (on powerpc) when firmware config space access calls
are interrupted by smp_send_stop later in reboot.
Fixes: e30d1756480dc ("[SCSI] qla2xxx: Addition of shutdown callback handler.")
Link: https://lore.kernel.org/r/20191024063804.14538-1-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Geert Uytterhoeven [Thu, 24 Oct 2019 15:16:41 +0000 (17:16 +0200)]
block: mtip32xx: Spelling s/configration/configuration/
Fix misspelling of "configuration".
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Eugene Syromiatnikov [Fri, 20 Sep 2019 15:58:21 +0000 (17:58 +0200)]
fcntl: fix typo in RWH_WRITE_LIFE_NOT_SET r/w hint name
According to commit message in the original commit
c75b1d9421f8 ("fs:
add fcntl() interface for setting/getting write life time hints"),
as well as userspace library[1] and man page update[2], R/W hint constants
are intended to have RWH_* prefix. However, RWF_WRITE_LIFE_NOT_SET retained
"RWF_*" prefix used in the early versions of the proposed patch set[3].
Rename it and provide the old name as a synonym for the new one
for backward compatibility.
[1] https://github.com/axboe/fio/commit/
bd553af6c849
[2] https://github.com/mkerrisk/man-pages/commit/
580082a186fd
[3] https://www.mail-archive.com/linux-block@vger.kernel.org/msg09638.html
Fixes: c75b1d9421f8 ("fs: add fcntl() interface for setting/getting write life time hints")
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
André Almeida [Tue, 22 Oct 2019 00:07:24 +0000 (21:07 -0300)]
blk-mq: fill header with kernel-doc
Insert documentation for structs, enums and functions at header file.
Format existing and new comments at struct blk_mq_ops as
kernel-doc comments.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: André Almeida <andrealmeid@collabora.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
André Almeida [Fri, 25 Oct 2019 20:16:51 +0000 (14:16 -0600)]
blk-mq: remove needless goto from blk_mq_get_driver_tag
The only usage of the label "done" is when (rq->tag != -1) at the
beginning of the function. Rather than jumping to label, we can just
remove this label and execute the code at the "if". Besides that, the
code that would be executed after the label "done" is the return of the
logical expression (rq->tag != -1) but since we are already inside the
if, we now that this is true. Remove the label and replace the goto with
the proper result of the label.
Signed-off-by: André Almeida <andrealmeid@collabora.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David Sterba [Thu, 24 Oct 2019 17:31:10 +0000 (19:31 +0200)]
block: reorder bio::__bi_remaining for better packing
Simple reordering of __bi_remaining can reduce bio size by 8 bytes that
are now wasted on padding (measured on x86_64):
struct bio {
struct bio * bi_next; /* 0 8 */
struct gendisk * bi_disk; /* 8 8 */
unsigned int bi_opf; /* 16 4 */
short unsigned int bi_flags; /* 20 2 */
short unsigned int bi_ioprio; /* 22 2 */
short unsigned int bi_write_hint; /* 24 2 */
blk_status_t bi_status; /* 26 1 */
u8 bi_partno; /* 27 1 */
/* XXX 4 bytes hole, try to pack */
struct bvec_iter bi_iter; /* 32 24 */
/* XXX last struct has 4 bytes of padding */
atomic_t __bi_remaining; /* 56 4 */
/* XXX 4 bytes hole, try to pack */
[...]
/* size: 104, cachelines: 2, members: 19 */
/* sum members: 96, holes: 2, sum holes: 8 */
/* paddings: 1, sum paddings: 4 */
/* last cacheline: 40 bytes */
};
Now becomes:
struct bio {
struct bio * bi_next; /* 0 8 */
struct gendisk * bi_disk; /* 8 8 */
unsigned int bi_opf; /* 16 4 */
short unsigned int bi_flags; /* 20 2 */
short unsigned int bi_ioprio; /* 22 2 */
short unsigned int bi_write_hint; /* 24 2 */
blk_status_t bi_status; /* 26 1 */
u8 bi_partno; /* 27 1 */
atomic_t __bi_remaining; /* 28 4 */
struct bvec_iter bi_iter; /* 32 24 */
/* XXX last struct has 4 bytes of padding */
[...]
/* size: 96, cachelines: 2, members: 19 */
/* paddings: 1, sum paddings: 4 */
/* last cacheline: 32 bytes */
};
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Fri, 25 Oct 2019 16:50:10 +0000 (09:50 -0700)]
block: Reduce the amount of memory used for tag sets
Instead of allocating an array of size nr_cpu_ids for set->tags, allocate
an array of size set->nr_hw_queues. This patch improves behavior that was
introduced by commit
868f2f0b7206 ("blk-mq: dynamic h/w context count").
Reallocating tag sets from inside __blk_mq_update_nr_hw_queues() is safe
because:
- All request queues that share the tag sets are frozen before the tag sets
are reallocated.
- blk_mq_queue_tag_busy_iter() holds q->q_usage_counter while active and
hence is serialized against __blk_mq_update_nr_hw_queues().
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Fri, 25 Oct 2019 16:50:09 +0000 (09:50 -0700)]
block: Reduce the amount of memory required per request queue
Instead of always allocating at least nr_cpu_ids hardware queues per request
queue, reallocate q->queue_hw_ctx if it has to grow. This patch improves
behavior that was introduced by commit
868f2f0b7206 ("blk-mq: dynamic h/w
context count").
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Fri, 25 Oct 2019 16:50:08 +0000 (09:50 -0700)]
block: Remove the synchronize_rcu() call from __blk_mq_update_nr_hw_queues()
Since the blk_mq_{,un}freeze_queue() calls in __blk_mq_update_nr_hw_queues()
already serialize __blk_mq_update_nr_hw_queues() against
blk_mq_queue_tag_busy_iter(), the synchronize_rcu() call in
__blk_mq_update_nr_hw_queues() is not necessary. Hence remove it.
Note: the synchronize_rcu() call in __blk_mq_update_nr_hw_queues() was
introduced by commit
f5bbbbe4d635 ("blk-mq: sync the update nr_hw_queues with
blk_mq_queue_tag_busy_iter"). Commit
530ca2c9bd69 ("blk-mq: Allow blocking
queue tag iter callbacks") removed the rcu_read_{,un}lock() calls that
correspond to the synchronize_rcu() call in __blk_mq_update_nr_hw_queues().
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Xiang Chen [Tue, 22 Oct 2019 06:27:08 +0000 (14:27 +0800)]
scsi: sd: define variable dif as unsigned int instead of bool
Variable dif in function sd_setup_read_write_cmnd() is the return value of
function scsi_host_dif_capable() which returns dif capability of disks. If
define it as bool, even for the disks which support DIF3, the function
still return dif=1, which causes IO error. So define variable dif as
unsigned int instead of bool.
Fixes: e249e42d277e ("scsi: sd: Clean up sd_setup_read_write_cmnd()")
Link: https://lore.kernel.org/r/1571725628-132736-1-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bart Van Assche [Wed, 23 Oct 2019 20:21:50 +0000 (13:21 -0700)]
scsi: target: cxgbit: Fix cxgbit_fw4_ack()
Use the pointer 'p' after having tested that pointer instead of before.
Fixes: 5cadafb236df ("target/cxgbit: Fix endianness annotations")
Cc: Varun Prakash <varun@chelsio.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191023202150.22173-1-bvanassche@acm.org
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Jens Axboe [Thu, 24 Oct 2019 22:31:57 +0000 (16:31 -0600)]
Merge branch 'md-next' of https://git./linux/kernel/git/song/md into for-5.5/drivers
Pull MD changes from Song.
* 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md: no longer compare spare disk superblock events in super_load
md: improve handling of bio with REQ_PREFLUSH in md_flush_request()
md/bitmap: avoid race window between md_bitmap_resize and bitmap_file_clear_bit
md/raid0: Fix an error message in raid0_make_request()
Yufen Yu [Wed, 16 Oct 2019 08:00:03 +0000 (16:00 +0800)]
md: no longer compare spare disk superblock events in super_load
We have a test case as follow:
mdadm -CR /dev/md1 -l 1 -n 4 /dev/sd[a-d] \
--assume-clean --bitmap=internal
mdadm -S /dev/md1
mdadm -A /dev/md1 /dev/sd[b-c] --run --force
mdadm --zero /dev/sda
mdadm /dev/md1 -a /dev/sda
echo offline > /sys/block/sdc/device/state
echo offline > /sys/block/sdb/device/state
sleep 5
mdadm -S /dev/md1
echo running > /sys/block/sdb/device/state
echo running > /sys/block/sdc/device/state
mdadm -A /dev/md1 /dev/sd[a-c] --run --force
When we readd /dev/sda to the array, it started to do recovery.
After offline the other two disks in md1, the recovery have
been interrupted and superblock update info cannot be written
to the offline disks. While the spare disk (/dev/sda) can continue
to update superblock info.
After stopping the array and assemble it, we found the array
run fail, with the follow kernel message:
[ 172.986064] md: kicking non-fresh sdb from array!
[ 173.004210] md: kicking non-fresh sdc from array!
[ 173.022383] md/raid1:md1: active with 0 out of 4 mirrors
[ 173.022406] md1: failed to create bitmap (-5)
[ 173.023466] md: md1 stopped.
Since both sdb and sdc have the value of 'sb->events' smaller than
that in sda, they have been kicked from the array. However, the only
remained disk sda is in 'spare' state before stop and it cannot be
added to conf->mirrors[] array. In the end, raid array assemble
and run fail.
In fact, we can use the older disk sdb or sdc to assemble the array.
That means we should not choose the 'spare' disk as the fresh disk in
analyze_sbs().
To fix the problem, we do not compare superblock events when it is
a spare disk, as same as validate_super.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
David Jeffery [Mon, 16 Sep 2019 17:15:14 +0000 (13:15 -0400)]
md: improve handling of bio with REQ_PREFLUSH in md_flush_request()
If pers->make_request fails in md_flush_request(), the bio is lost. To
fix this, pass back a bool to indicate if the original make_request call
should continue to handle the I/O and instead of assuming the flush logic
will push it to completion.
Convert md_flush_request to return a bool and no longer calls the raid
driver's make_request function. If the return is true, then the md flush
logic has or will complete the bio and the md make_request call is done.
If false, then the md make_request function needs to keep processing like
it is a normal bio. Let the original call to md_handle_request handle any
need to retry sending the bio to the raid driver's make_request function
should it be needed.
Also mark md_flush_request and the make_request function pointer as
__must_check to issue warnings should these critical return values be
ignored.
Fixes: 2bc13b83e629 ("md: batch flush requests.")
Cc: stable@vger.kernel.org # # v4.19+
Cc: NeilBrown <neilb@suse.com>
Signed-off-by: David Jeffery <djeffery@redhat.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Guoqing Jiang [Thu, 26 Sep 2019 11:53:50 +0000 (13:53 +0200)]
md/bitmap: avoid race window between md_bitmap_resize and bitmap_file_clear_bit
We need to move "spin_lock_irq(&bitmap->counts.lock)" before unmap previous
storage, otherwise panic like belows could happen as follows.
[ 902.353802] sdl: detected capacity change from
1077936128 to
3221225472
[ 902.616948] general protection fault: 0000 [#1] SMP
[snip]
[ 902.618588] CPU: 12 PID: 33698 Comm: md0_raid1 Tainted: G O 4.14.144-1-pserver #4.14.144-1.1~deb10
[ 902.618870] Hardware name: Supermicro SBA-7142G-T4/BHQGE, BIOS 3.00 10/24/2012
[ 902.619120] task:
ffff9ae1860fc600 task.stack:
ffffb52e4c704000
[ 902.619301] RIP: 0010:bitmap_file_clear_bit+0x90/0xd0 [md_mod]
[ 902.619464] RSP: 0018:
ffffb52e4c707d28 EFLAGS:
00010087
[ 902.619626] RAX:
ffe8008b0d061000 RBX:
ffff9ad078c87300 RCX:
0000000000000000
[ 902.619792] RDX:
ffff9ad986341868 RSI:
0000000000000803 RDI:
ffff9ad078c87300
[ 902.619986] RBP:
ffff9ad0ed7a8000 R08:
0000000000000000 R09:
0000000000000000
[ 902.620154] R10:
ffffb52e4c707ec0 R11:
ffff9ad987d1ed44 R12:
ffff9ad0ed7a8360
[ 902.620320] R13:
0000000000000003 R14:
0000000000060000 R15:
0000000000000800
[ 902.620487] FS:
0000000000000000(0000) GS:
ffff9ad987d00000(0000) knlGS:
0000000000000000
[ 902.620738] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 902.620901] CR2:
000055ff12aecec0 CR3:
0000001005207000 CR4:
00000000000406e0
[ 902.621068] Call Trace:
[ 902.621256] bitmap_daemon_work+0x2dd/0x360 [md_mod]
[ 902.621429] ? find_pers+0x70/0x70 [md_mod]
[ 902.621597] md_check_recovery+0x51/0x540 [md_mod]
[ 902.621762] raid1d+0x5c/0xeb0 [raid1]
[ 902.621939] ? try_to_del_timer_sync+0x4d/0x80
[ 902.622102] ? del_timer_sync+0x35/0x40
[ 902.622265] ? schedule_timeout+0x177/0x360
[ 902.622453] ? call_timer_fn+0x130/0x130
[ 902.622623] ? find_pers+0x70/0x70 [md_mod]
[ 902.622794] ? md_thread+0x94/0x150 [md_mod]
[ 902.622959] md_thread+0x94/0x150 [md_mod]
[ 902.623121] ? wait_woken+0x80/0x80
[ 902.623280] kthread+0x119/0x130
[ 902.623437] ? kthread_create_on_node+0x60/0x60
[ 902.623600] ret_from_fork+0x22/0x40
[ 902.624225] RIP: bitmap_file_clear_bit+0x90/0xd0 [md_mod] RSP:
ffffb52e4c707d28
Because mdadm was running on another cpu to do resize, so bitmap_resize was
called to replace bitmap as below shows.
PID: 38801 TASK:
ffff9ad074a90e00 CPU: 0 COMMAND: "mdadm"
[exception RIP: queued_spin_lock_slowpath+56]
[snip]
-- <NMI exception stack> --
#5 [
ffffb52e60f17c58] queued_spin_lock_slowpath at
ffffffff9c0b27b8
#6 [
ffffb52e60f17c58] bitmap_resize at
ffffffffc0399877 [md_mod]
#7 [
ffffb52e60f17d30] raid1_resize at
ffffffffc0285bf9 [raid1]
#8 [
ffffb52e60f17d50] update_size at
ffffffffc038a31a [md_mod]
#9 [
ffffb52e60f17d70] md_ioctl at
ffffffffc0395ca4 [md_mod]
And the procedure to keep resize bitmap safe is allocate new storage
space, then quiesce, copy bits, replace bitmap, and re-start.
However the daemon (bitmap_daemon_work) could happen even the array is
quiesced, which means when bitmap_file_clear_bit is triggered by raid1d,
then it thinks it should be fine to access store->filemap since
counts->lock is held, but resize could change the storage without the
protection of the lock.
Cc: Jack Wang <jinpu.wang@cloud.ionos.com>
Cc: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Dan Carpenter [Sat, 21 Sep 2019 06:00:31 +0000 (09:00 +0300)]
md/raid0: Fix an error message in raid0_make_request()
The first argument to WARN() is supposed to be a condition. The
original code will just print the mdname() instead of the full warning
message.
Fixes: c84a1372df92 ("md/raid0: avoid RAID0 data corruption due to layout confusion.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Quinn Tran [Tue, 22 Oct 2019 19:36:43 +0000 (12:36 -0700)]
scsi: qla2xxx: Fix partial flash write of MBI
For new adapters with multiple flash regions to write to, current code
allows FW & Boot regions to be written, while other regions are blocked via
sysfs. The fix is to block all flash read/write through sysfs interface.
Fixes: e81d1bcbde06 ("scsi: qla2xxx: Further limit FLASH region write access from SysFS")
Cc: stable@vger.kernel.org # 5.2
Link: https://lore.kernel.org/r/20191022193643.7076-3-hmadhani@marvell.com
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Girish Basrur <gbasrur@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Himanshu Madhani [Tue, 22 Oct 2019 19:36:42 +0000 (12:36 -0700)]
scsi: qla2xxx: Initialized mailbox to prevent driver load failure
This patch fixes issue with Gen7 adapter in a blade environment where one
of the ports will not be detected by driver. Firmware expects mailbox 11 to
be set or cleared by driver for newer ISP.
Following message is seen in the log file:
[ 18.810892] qla2xxx [0000:d8:00.0]-1820:1: **** Failed=102 mb[0]=4005 mb[1]=37 mb[2]=20 mb[3]=8
[ 18.819596] cmd=2 ****
[mkp: typos]
Link: https://lore.kernel.org/r/20191022193643.7076-2-hmadhani@marvell.com
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Daniel Wagner [Tue, 22 Oct 2019 07:21:12 +0000 (09:21 +0200)]
scsi: lpfc: Honor module parameter lpfc_use_adisc
The initial lpfc_desc_set_adisc implementation in commit
dea3101e0a5c ("lpfc: add Emulex FC driver version 8.0.28") enabled ADISC if
cfg_use_adisc && RSCN_MODE && FCP_2_DEVICE
In commit
92d7f7b0cde3 ("[SCSI] lpfc: NPIV: add NPIV support on top of
SLI-3") this changed to
(cfg_use_adisc && RSC_MODE) || FCP_2_DEVICE
and later in commit
ffc954936b13 ("[SCSI] lpfc 8.3.13: FC Discovery Fixes
and enhancements.") to
(cfg_use_adisc && RSC_MODE) || (FCP_2_DEVICE && FCP_TARGET)
A customer reports that after a devloss, an ADISC failure is logged. It
turns out the ADISC flag is set even the user explicitly set lpfc_use_adisc
= 0.
[Sat Dec 22 22:55:58 2018] lpfc 0000:82:00.0: 2:(0):0203 Devloss timeout on WWPN 50:01:43:80:12:8e:40:20 NPort x05df00 Data: x82000000 x8 xa
[Sat Dec 22 23:08:20 2018] lpfc 0000:82:00.0: 2:(0):2755 ADISC failure DID:05DF00 Status:x9/x70000
[mkp: fixed Hannes' email]
Fixes: 92d7f7b0cde3 ("[SCSI] lpfc: NPIV: add NPIV support on top of SLI-3")
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Cc: James Smart <james.smart@broadcom.com>
Link: https://lore.kernel.org/r/20191022072112.132268-1-dwagner@suse.de
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Avri Altman [Thu, 10 Oct 2019 08:31:07 +0000 (11:31 +0300)]
scsi: ufs-bsg: Wake the device before sending raw upiu commands
The scsi async probe process is calling blk_pm_runtime_init for each lun,
and then those request queues are monitored by the block layer pm
engine (blk-pm.c). This is however, not the case for scsi-passthrough
queues, created by bsg_setup_queue().
So the ufs-bsg driver might send various commands, disregarding the pm
status of the device. This is wrong, regardless if its request queue is
pm-aware or not.
Fixes: df032bf27a41 (scsi: ufs: Add a bsg endpoint that supports UPIUs)
Link: https://lore.kernel.org/r/1570696267-8487-1-git-send-email-avri.altman@wdc.com
Reported-by: Yuliy Izrailov <yuliy.izrailov@wdc.com>
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Daniel Wagner [Fri, 18 Oct 2019 16:21:11 +0000 (18:21 +0200)]
scsi: lpfc: Check queue pointer before use
The queue pointer might not be valid. The rest of the code checks the
pointer before accessing it. lpfc_sli4_process_missed_mbox_completions is
the only place where the check is missing.
Fixes: 657add4e5e15 ("scsi: lpfc: Fix poor use of hardware queues if fewer irq vectors")
Cc: James Smart <jsmart2021@gmail.com>
Link: https://lore.kernel.org/r/20191018162111.8798-1-dwagner@suse.de
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Hannes Reinecke [Fri, 18 Oct 2019 14:04:58 +0000 (16:04 +0200)]
scsi: qla2xxx: fixup incorrect usage of host_byte
DRIVER_ERROR is a a driver byte setting, not a host byte. The qla2xxx
driver should rather return DID_ERROR here to be in line with the other
drivers.
Link: https://lore.kernel.org/r/20191018140458.108278-1-hare@suse.de
Signed-off-by: Hannes Reinecke <hare@suse.com>
Acked-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Hannes Reinecke [Thu, 17 Oct 2019 15:00:19 +0000 (17:00 +0200)]
scsi: lpfc: remove left-over BUILD_NVME defines
The BUILD_NVME define never got defined anywhere, causing NVMe commands to
be treated as SCSI commands when freeing the buffers. This was causing a
stuck discovery and a horrible crash in lpfc_set_rrq_active() later on.
Link: https://lore.kernel.org/r/20191017150019.75769-1-hare@suse.de
Fixes: c00f62e6c546 ("scsi: lpfc: Merge per-protocol WQ/CQ pairs into single per-cpu pair")
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Yufen Yu [Tue, 15 Oct 2019 13:05:56 +0000 (21:05 +0800)]
scsi: core: try to get module before removing device
We have a test case like block/001 in blktests, which will create a scsi
device by loading scsi_debug module and then try to delete the device by
sysfs interface. At the same time, it may remove the scsi_debug module.
And getting a invalid paging request BUG_ON as following:
[ 34.625854] BUG: unable to handle page fault for address:
ffffffffa0016bb8
[ 34.629189] Oops: 0000 [#1] SMP PTI
[ 34.629618] CPU: 1 PID: 450 Comm: bash Tainted: G W 5.4.0-rc3+ #473
[ 34.632524] RIP: 0010:scsi_proc_hostdir_rm+0x5/0xa0
[ 34.643555] CR2:
ffffffffa0016bb8 CR3:
000000012cd88000 CR4:
00000000000006e0
[ 34.644545] Call Trace:
[ 34.644907] scsi_host_dev_release+0x6b/0x1f0
[ 34.645511] device_release+0x74/0x110
[ 34.646046] kobject_put+0x116/0x390
[ 34.646559] put_device+0x17/0x30
[ 34.647041] scsi_target_dev_release+0x2b/0x40
[ 34.647652] device_release+0x74/0x110
[ 34.648186] kobject_put+0x116/0x390
[ 34.648691] put_device+0x17/0x30
[ 34.649157] scsi_device_dev_release_usercontext+0x2e8/0x360
[ 34.649953] execute_in_process_context+0x29/0x80
[ 34.650603] scsi_device_dev_release+0x20/0x30
[ 34.651221] device_release+0x74/0x110
[ 34.651732] kobject_put+0x116/0x390
[ 34.652230] sysfs_unbreak_active_protection+0x3f/0x50
[ 34.652935] sdev_store_delete.cold.4+0x71/0x8f
[ 34.653579] dev_attr_store+0x1b/0x40
[ 34.654103] sysfs_kf_write+0x3d/0x60
[ 34.654603] kernfs_fop_write+0x174/0x250
[ 34.655165] __vfs_write+0x1f/0x60
[ 34.655639] vfs_write+0xc7/0x280
[ 34.656117] ksys_write+0x6d/0x140
[ 34.656591] __x64_sys_write+0x1e/0x30
[ 34.657114] do_syscall_64+0xb1/0x400
[ 34.657627] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 34.658335] RIP: 0033:0x7f156f337130
During deleting scsi target, the scsi_debug module have been removed. Then,
sdebug_driver_template belonged to the module cannot be accessd, resulting
in scsi_proc_hostdir_rm() BUG_ON.
To fix the bug, we add scsi_device_get() in sdev_store_delete() to try to
increase refcount of module, avoiding the module been removed.
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191015130556.18061-1-yuyufen@huawei.com
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Don Brace [Mon, 14 Oct 2019 18:03:58 +0000 (13:03 -0500)]
scsi: hpsa: add missing hunks in reset-patch
Correct returning from reset before outstanding commands are completed
for the device.
Link: https://lore.kernel.org/r/157107623870.17997.11208813089704833029.stgit@brunhilda
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bodo Stroesser [Mon, 14 Oct 2019 18:29:04 +0000 (20:29 +0200)]
scsi: target: core: Do not overwrite CDB byte 1
passthrough_parse_cdb() - used by TCMU and PSCSI - attepts to reset the LUN
field of SCSI-2 CDBs (bits 5,6,7 of byte 1). The current code is wrong as
for newer commands not having the LUN field it overwrites relevant command
bits (e.g. for SECURITY PROTOCOL IN / OUT). We think this code was
unnecessary from the beginning or at least it is no longer useful. So we
remove it entirely.
Link: https://lore.kernel.org/r/12498eab-76fd-eaad-1316-c2827badb76a@ts.fujitsu.com
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Ajay Joshi [Thu, 17 Oct 2019 21:19:43 +0000 (14:19 -0700)]
null_blk: return fixed zoned reads > write pointer
A zoned block device maintains a write pointer within a zone, and reads
beyond the write pointer are undefined. Fill data buffer returned above
the write pointer with 0xFF.
Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Logan Gunthorpe [Thu, 10 Oct 2019 23:36:26 +0000 (17:36 -0600)]
block: account statistics for passthrough requests
Presently, passthrough requests are not accounted for because
blk_do_io_stat() expressly rejects them. Based on some digging
in the history, this doesn't seem like a concious decision but
one that evolved from the change from blk_fs_request() to
blk_rq_is_passthrough().
To support this, call blk_account_io_start() in blk_execute_rq_nowait()
and remove the passthrough check in blk_do_io_stat().
Link: https://lore.kernel.org/linux-block/20191010100526.GA27209@lst.de/
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Wed, 9 Oct 2019 17:35:36 +0000 (10:35 -0700)]
scsi: ch: Make it possible to open a ch device multiple times again
Clearing ch->device in ch_release() is wrong because that pointer must
remain valid until ch_remove() is called. This patch fixes the following
crash the second time a ch device is opened:
BUG: kernel NULL pointer dereference, address:
0000000000000790
RIP: 0010:scsi_device_get+0x5/0x60
Call Trace:
ch_open+0x4c/0xa0 [ch]
chrdev_open+0xa2/0x1c0
do_dentry_open+0x13a/0x380
path_openat+0x591/0x1470
do_filp_open+0x91/0x100
do_sys_open+0x184/0x220
do_syscall_64+0x5f/0x1a0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 085e56766f74 ("scsi: ch: add refcounting")
Cc: Hannes Reinecke <hare@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191009173536.247889-1-bvanassche@acm.org
Reported-by: Rob Turk <robtu@rtist.nl>
Suggested-by: Rob Turk <robtu@rtist.nl>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Thomas Bogendoerfer [Wed, 9 Oct 2019 15:11:28 +0000 (17:11 +0200)]
scsi: fix kconfig dependency warning related to 53C700_LE_ON_BE
When building a kernel with SCSI_SNI_53C710 enabled, Kconfig warns:
WARNING: unmet direct dependencies detected for 53C700_LE_ON_BE
Depends on [n]: SCSI_LOWLEVEL [=y] && SCSI [=y] && SCSI_LASI700 [=n]
Selected by [y]:
- SCSI_SNI_53C710 [=y] && SCSI_LOWLEVEL [=y] && SNI_RM [=y] && SCSI [=y]
Add the missing depends SCSI_SNI_53C710 to 53C700_LE_ON_BE to fix it.
Link: https://lore.kernel.org/r/20191009151128.32411-1-tbogendoerfer@suse.de
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Thomas Bogendoerfer [Wed, 9 Oct 2019 15:11:18 +0000 (17:11 +0200)]
scsi: sni_53c710: fix compilation error
Drop out memory dev_printk() with wrong device pointer argument.
[mkp: typo]
Link: https://lore.kernel.org/r/20191009151118.32350-1-tbogendoerfer@suse.de
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Hannes Reinecke [Mon, 7 Oct 2019 13:57:01 +0000 (15:57 +0200)]
scsi: scsi_dh_alua: handle RTPG sense code correctly during state transitions
Some arrays are not capable of returning RTPG data during state
transitioning, but rather return an 'LUN not accessible, asymmetric access
state transition' sense code. In these cases we can set the state to
'transitioning' directly and don't need to evaluate the RTPG data (which we
won't have anyway).
Link: https://lore.kernel.org/r/20191007135701.32389-1-hare@suse.de
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Allen Pais [Wed, 18 Sep 2019 16:36:58 +0000 (22:06 +0530)]
scsi: qla2xxx: fix a potential NULL pointer dereference
alloc_workqueue is not checked for errors and as a result a potential
NULL dereference could occur.
Link: https://lore.kernel.org/r/1568824618-4366-1-git-send-email-allen.pais@oracle.com
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Acked-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pavel Begunkov [Mon, 7 Oct 2019 21:16:51 +0000 (00:16 +0300)]
blk-stat: Optimise blk_stat_add()
blk_stat_add() calls {get,put}_cpu_ptr() in a loop, which entails
overhead of disabling/enabling preemption. The loop is under RCU
(i.e.short) anyway, so do get_cpu() in advance.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Mon, 30 Sep 2019 23:00:47 +0000 (16:00 -0700)]
null_blk: Enable modifying 'submit_queues' after an instance has been configured
This patch makes it possible to test blk_mq_update_nr_hw_queues() from
inside a VM.
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Mon, 30 Sep 2019 23:00:46 +0000 (16:00 -0700)]
null_blk: Improve nullb_device_##NAME##_store() readability
Introduce a local variable to make the code easier to read. This patch
does not change any functionality but makes the next patch in this
series easier to read.
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Mon, 30 Sep 2019 18:55:34 +0000 (21:55 +0300)]
blk-mq: Embed counters into struct mq_inflight
Store inflight counters immediately in struct mq_inflight.
That's type-safer and removes extra indirection.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Mon, 30 Sep 2019 18:55:33 +0000 (21:55 +0300)]
blk-mq: Reuse callback in blk_mq_in_flight*()
Reuse a more generic callback in both blk_mq_in_flight() and
blk_mq_in_flight_rw().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Mon, 30 Sep 2019 08:25:49 +0000 (11:25 +0300)]
blk-mq: Inline status checkers
blk_mq_request_completed() and blk_mq_request_started() are
short, inline it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Mon, 30 Sep 2019 23:00:45 +0000 (16:00 -0700)]
block: Document all members of blk_mq_tag_set and bkl_mq_queue_map
The meaning of several member variables of these two data structures is
nontrivial. Hence document all member variables using the kernel-doc
syntax.
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Mon, 30 Sep 2019 23:00:44 +0000 (16:00 -0700)]
block: Reduce sysfs_lock locking inside blk_cleanup_queue()
Since blk_cleanup_queue() is called after blk_unregister_queue() and
since that last function removes all sysfs attributes, serializing
any code in blk_cleanup_queue() against sysfs callback methods nor against
I/O scheduler changes is necessary. Hence remove the syfs_lock locking
calls from the start of blk_cleanup_queue().
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Mon, 30 Sep 2019 23:00:43 +0000 (16:00 -0700)]
block: Remove "dying" checks from sysfs callbacks
Block drivers must call del_gendisk() before blk_cleanup_queue().
del_gendisk() calls kobject_del() and kobject_del() waits until any
ongoing sysfs callback functions have finished. In other words, the
sysfs callback functions won't be called for a queue in the dying
state. Hence remove the "dying" checks from the sysfs callback
functions.
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Mon, 30 Sep 2019 23:00:42 +0000 (16:00 -0700)]
block: Remove request_queue.nr_queues
Commit
897bb0c7f1ea ("blk-mq: Use proper cpumask iterator"; v4.6)
removed the last use of request_queue.nr_queues from outside
blk_mq_init_allocate_queue(). Remove this member variable to make
struct request_queue smaller. This patch does not change any
functionality.
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>