Bart Van Assche [Thu, 17 Aug 2017 20:12:56 +0000 (13:12 -0700)]
skd: Remove a set-but-not-used variable from struct skd_device
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:12:55 +0000 (13:12 -0700)]
skd: Remove set-but-not-used local variables
These variables have been detected by building with W=1. Declare
'acc' as __maybe_unused because most access_ok() implementations
ignore their first argument. This patch does not change any
functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:12:54 +0000 (13:12 -0700)]
skd: Fix a function name in a comment
There is no function skd_completion_posted_isr() in the skd driver
but there is a function called skd_isr_completion_posted(). Fix
the function name in the comment.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:12:53 +0000 (13:12 -0700)]
skd: Fix spelling in a source code comment
Change "ptimal" into "optimal" and remove the misleading reference
to sysfs.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:12:52 +0000 (13:12 -0700)]
skd: Avoid that gcc 7 warns about fall-through when building with W=1
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:12:51 +0000 (13:12 -0700)]
skd: Remove unnecessary blank lines
This patch does not change any functionality but makes the skd
driver source code more uniform with that of other kernel drivers.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:12:50 +0000 (13:12 -0700)]
skd: Remove ESXi code
Since the code guarded by #ifdef SKD_VMK_POLL_HANDLER / #endif
is never built on Linux systems, remove it.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:12:49 +0000 (13:12 -0700)]
skd: Remove unneeded #include directives
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:12:48 +0000 (13:12 -0700)]
skd: Update maintainer information
E-mails sent to support@stec-inc.com bounce. Hence remove that
e-mail address from the driver. Add an entry to the MAINTAINERS
file instead.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:12:47 +0000 (13:12 -0700)]
skd: Switch to GPLv2
This change does not affect any skd driver version derived from a
dual licensed code base but makes all code derived from future
upstream skd driver versions GPLv2 only.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:12:46 +0000 (13:12 -0700)]
skd: Submit requests to firmware before triggering the doorbell
Ensure that the members of struct skd_msg_buf have been transferred
to the PCIe adapter before the doorbell is triggered. This patch
avoids that I/O fails sporadically and that the following error
message is reported:
(skd0:STM000196603:[0000:00:09.0]): Completion mismatch comp_id=0x0000 skreq=0x0400 new=0x0000
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:12:45 +0000 (13:12 -0700)]
skd: Avoid that module unloading triggers a use-after-free
Since put_disk() triggers a disk_release() call and since that
last function calls blk_put_queue() if disk->queue != NULL, clear
the disk->queue pointer before calling put_disk(). This avoids
that unloading the skd kernel module triggers the following
use-after-free:
WARNING: CPU: 8 PID: 297 at lib/refcount.c:128 refcount_sub_and_test+0x70/0x80
refcount_t: underflow; use-after-free.
CPU: 8 PID: 297 Comm: kworker/8:1 Not tainted 4.11.10-300.fc26.x86_64 #1
Workqueue: events work_for_cpu_fn
Call Trace:
dump_stack+0x63/0x84
__warn+0xcb/0xf0
warn_slowpath_fmt+0x5a/0x80
refcount_sub_and_test+0x70/0x80
refcount_dec_and_test+0x11/0x20
kobject_put+0x1f/0x50
blk_put_queue+0x15/0x20
disk_release+0xae/0xf0
device_release+0x32/0x90
kobject_release+0x67/0x170
kobject_put+0x2b/0x50
put_disk+0x17/0x20
skd_destruct+0x5c/0x890 [skd]
skd_pci_probe+0x124d/0x13a0 [skd]
local_pci_probe+0x42/0xa0
work_for_cpu_fn+0x14/0x20
process_one_work+0x19e/0x470
worker_thread+0x1dc/0x4a0
kthread+0x125/0x140
ret_from_fork+0x25/0x30
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 20:12:44 +0000 (13:12 -0700)]
block: Relax a check in blk_start_queue()
Calling blk_start_queue() from interrupt context with the queue
lock held and without disabling IRQs, as the skd driver does, is
safe. This patch avoids that loading the skd driver triggers the
following warning:
WARNING: CPU: 11 PID: 1348 at block/blk-core.c:283 blk_start_queue+0x84/0xa0
RIP: 0010:blk_start_queue+0x84/0xa0
Call Trace:
skd_unquiesce_dev+0x12a/0x1d0 [skd]
skd_complete_internal+0x1e7/0x5a0 [skd]
skd_complete_other+0xc2/0xd0 [skd]
skd_isr_completion_posted.isra.30+0x2a5/0x470 [skd]
skd_isr+0x14f/0x180 [skd]
irq_forced_thread_fn+0x2a/0x70
irq_thread+0x144/0x1a0
kthread+0x125/0x140
ret_from_fork+0x2a/0x40
Fixes: commit a038e2536472 ("[PATCH] blk_start_queue() must be called with irq disabled - add warning")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 23:23:11 +0000 (16:23 -0700)]
xen-blkfront: Avoid that gcc 7 warns about fall-through when building with W=1
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Roger Pau Monn303251 <roger.pau@citrix.com>
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 23:23:10 +0000 (16:23 -0700)]
xen-blkback: Avoid that gcc 7 warns about fall-through when building with W=1
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Roger Pau Monn303251 <roger.pau@citrix.com>
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 23:23:09 +0000 (16:23 -0700)]
xen-blkback: Fix indentation
Avoid that smatch reports the following warning when building with
C=2 CHECK="smatch -p=kernel":
drivers/block/xen-blkback/blkback.c:710 xen_blkbk_unmap_prepare() warn: inconsistent indenting
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Roger Pau Monn303251 <roger.pau@citrix.com>
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 23:23:08 +0000 (16:23 -0700)]
virtio_blk: Use blk_rq_is_scsi()
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 23:23:07 +0000 (16:23 -0700)]
ide-floppy: Use blk_rq_is_scsi()
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: linux-ide@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 23:23:06 +0000 (16:23 -0700)]
genhd: Annotate all part and part_tbl pointer dereferences
Annotate gendisk.part_tbl and disk_part_tbl.part dereferences with
rcu_dereference_protected(). This patch does not change the behavior
of the modified code but ensures that sparse does not complain about
disk->part_tbl manipulations nor about part_tbl->part accesses.
Additionally, improve documentation of the locking requirements of
the modified functions.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 23:23:04 +0000 (16:23 -0700)]
blk-mq-debugfs: Declare a local symbol static
This was detected by sparse.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 23:23:03 +0000 (16:23 -0700)]
blk-mq: Make blk_mq_reinit_tagset() calls easier to read
Since blk_mq_ops.reinit_request is only called from inside
blk_mq_reinit_tagset(), make this function pointer an argument of
blk_mq_reinit_tagset() instead of a member of struct blk_mq_ops.
This patch does not change any functionality but makes
blk_mq_reinit_tagset() calls easier to read and to analyze.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: James Smart <james.smart@broadcom.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 23:23:01 +0000 (16:23 -0700)]
block: Unexport blk_queue_end_tag()
This function is only used inside the block layer core. Hence
unexport it.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Thu, 17 Aug 2017 23:23:00 +0000 (16:23 -0700)]
block: Fix two comments that refer to .queue_rq() return values
Since patch "blk-mq: switch .queue_rq return value to blk_status_t"
.queue_rq() returns a BLK_STS_* value instead of a BLK_MQ_RQ_*
value. Hence refer to the former in comments about .queue_rq()
return values.
Fixes: commit 39a70c76b89b ("blk-mq: clarify dispatch may not be drained/blocked by stopping queue")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Josef Bacik [Mon, 14 Aug 2017 18:56:16 +0000 (18:56 +0000)]
nbd: change the default nbd partitions
There's no reason to have partitions disabled for nbd by default, it costs us
nothing to have it enabled and is just confusing/obnoxious to users who try to
use partitions with nbd.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Josef Bacik [Mon, 14 Aug 2017 18:25:33 +0000 (18:25 +0000)]
nbd: allow device creation at a specific index
If users really want to use a particular index for their nbd device and it
doesn't already exist there's no reason we can't just create it for them. Do
this instead of erroring out.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Anton Volkov [Mon, 7 Aug 2017 12:37:50 +0000 (15:37 +0300)]
loop: fix to a race condition due to the early registration of device
The early device registration made possible a race leading to allocations
of disks with wrong minors.
This patch moves the device registration further down the loop_init
function to make the race infeasible.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Anton Volkov <avolkov@ispras.ru>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ritesh Harjani [Wed, 9 Aug 2017 12:58:32 +0000 (18:28 +0530)]
cfq: Give a chance for arming slice idle timer in case of group_idle
In below scenario blkio cgroup does not work as per their assigned
weights :-
1. When the underlying device is nonrotational with a single HW queue
with depth of >= CFQ_HW_QUEUE_MIN
2. When the use case is forming two blkio cgroups cg1(weight 1000) &
cg2(wight 100) and two processes(file1 and file2) doing sync IO in
their respective blkio cgroups.
For above usecase result of fio (without this patch):-
file1: (groupid=0, jobs=1): err= 0: pid=685: Thu Jan 1 19:41:49 1970
write: IOPS=1315, BW=41.1MiB/s (43.1MB/s)(1024MiB/24906msec)
<...>
file2: (groupid=0, jobs=1): err= 0: pid=686: Thu Jan 1 19:41:49 1970
write: IOPS=1295, BW=40.5MiB/s (42.5MB/s)(1024MiB/25293msec)
<...>
// both the process BW is equal even though they belong to diff.
cgroups with weight of 1000(cg1) and 100(cg2)
In above case (for non rotational NCQ devices),
as soon as the request from cg1 is completed and even
though it is provided with higher set_slice=10, because of CFQ
algorithm when the driver tries to fetch the request, CFQ expires
this group without providing any idle time nor weight priority
and schedules another cfq group (in this case cg2).
And thus both cfq groups(cg1 & cg2) keep alternating to get the
disk time and hence loses the cgroup weight based scheduling.
Below patch gives a chance to cfq algorithm (cfq_arm_slice_timer)
to arm the slice timer in case group_idle is enabled.
In case if group_idle is also not required (including for nonrotational
NCQ drives), we need to explicitly set group_idle = 0 from sysfs for
such cases.
With this patch result of fio(for above usecase) :-
file1: (groupid=0, jobs=1): err= 0: pid=690: Thu Jan 1 00:06:08 1970
write: IOPS=1706, BW=53.3MiB/s (55.9MB/s)(1024MiB/19197msec)
<..>
file2: (groupid=0, jobs=1): err= 0: pid=691: Thu Jan 1 00:06:08 1970
write: IOPS=1043, BW=32.6MiB/s (34.2MB/s)(1024MiB/31401msec)
<..>
// In this processes BW is as per their respective cgroups weight.
Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Paolo Valente [Fri, 4 Aug 2017 05:35:11 +0000 (07:35 +0200)]
block, bfq: boost throughput with flash-based non-queueing devices
When a queue associated with a process remains empty, there are cases
where throughput gets boosted if the device is idled to await the
arrival of a new I/O request for that queue. Currently, BFQ assumes
that one of these cases is when the device has no internal queueing
(regardless of the properties of the I/O being served). Unfortunately,
this condition has proved to be too general. So, this commit refines it
as "the device has no internal queueing and is rotational".
This refinement provides a significant throughput boost with random
I/O, on flash-based storage without internal queueing. For example, on
a HiKey board, throughput increases by up to 125%, growing, e.g., from
6.9MB/s to 15.6MB/s with two or three random readers in parallel.
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Luca Miccio <lucmiccio@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Paolo Valente [Fri, 4 Aug 2017 05:35:10 +0000 (07:35 +0200)]
block,bfq: refactor device-idling logic
The logic that decides whether to idle the device is scattered across
three functions. Almost all of the logic is in the function
bfq_bfqq_may_idle, but (1) part of the decision is made in
bfq_update_idle_window, and (2) the function bfq_bfqq_must_idle may
switch off idling regardless of the output of bfq_bfqq_may_idle. In
addition, both bfq_update_idle_window and bfq_bfqq_must_idle make
their decisions as a function of parameters that are used, for similar
purposes, also in bfq_bfqq_may_idle. This commit addresses these
issues by moving all the logic into bfq_bfqq_may_idle.
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 10 Aug 2017 14:25:38 +0000 (08:25 -0600)]
block: remove unused syncfull/asyncfull queue flags
We haven't used these in years, but somehow the definitions still
remained. Kill them, and renumber the QUEUE_FLAG_ space. We had
a hole in the beginning of the space, too.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 8 Aug 2017 23:53:33 +0000 (17:53 -0600)]
blk-mq: enable checking two part inflight counts at the same time
Modify blk_mq_in_flight() to count both a partition and root at
the same time. Then we only have to call it once, instead of
potentially looping the tags twice.
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 8 Aug 2017 23:51:45 +0000 (17:51 -0600)]
blk-mq: provide internal in-flight variant
We don't have to inc/dec some counter, since we can just
iterate the tags. That makes inc/dec a noop, but means we
have to iterate busy tags to get an in-flight count.
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 8 Aug 2017 23:49:47 +0000 (17:49 -0600)]
block: make part_in_flight() take an array of two ints
Instead of returning the count that matches the partition, pass
in an array of two ints. Index 0 will be filled with the inflight
count for the partition in question, and index 1 will filled
with the root inflight count, if the partition passed in is not the
root.
This is in preparation for being able to calculate both in one
go.
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Sat, 1 Jul 2017 03:55:08 +0000 (21:55 -0600)]
block: pass in queue to inflight accounting
No functional change in this patch, just in preparation for
basing the inflight mechanism on the queue in question.
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 4 Aug 2017 19:37:03 +0000 (13:37 -0600)]
blk-mq-tag: check for NULL rq when iterating tags
Since we introduced blk-mq-sched, the tags->rqs[] array has been
dynamically assigned. So we need to check for NULL when iterating,
since there's a window of time where the bit is set, but we haven't
dynamically assigned the tags->rqs[] array position yet.
This is perfectly safe, since the memory backing of the request is
never going away while the device is alive.
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 9 Aug 2017 15:48:09 +0000 (17:48 +0200)]
dm-crypt: don't mess with BIP_BLOCK_INTEGRITY
This flag is never set right after calling bio_integrity_alloc,
so don't clear it and confuse the reader.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 9 Aug 2017 15:48:08 +0000 (17:48 +0200)]
bio-integrity: move the bio integrity profile check earlier in bio_integrity_prep
This makes the code more obvious, and moves the most likely branch first
in the function.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
weiping zhang [Wed, 2 Aug 2017 16:27:37 +0000 (00:27 +0800)]
null_blk: make sure submit_queues > 0
set submit_queues to 1 by default, and make sure it's value > 0.
Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
weiping zhang [Wed, 2 Aug 2017 16:26:39 +0000 (00:26 +0800)]
null_blk: simplify logic for use_per_node_hctx
make sure submit_queues equal nr_online_nodes.
Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jan Kara [Wed, 2 Aug 2017 08:25:21 +0000 (10:25 +0200)]
block: Add comment to submit_bio_wait()
submit_bio_wait() does not consume bio reference. Add comment about
that.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 1 Aug 2017 15:28:24 +0000 (09:28 -0600)]
blk-mq: add warning to __blk_mq_run_hw_queue() for ints disabled
We recently had a bug in the IPR SCSI driver, where it would end up
making the SCSI mid layer run the mq hardware queue with interrupts
disabled. This isn't legal, since the software queue locking relies
on never being grabbed from interrupt context. Additionally, drivers
that set BLK_MQ_F_BLOCKING may schedule from this context.
Add a WARN_ON_ONCE() to catch bad users up front.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 27 Jul 2017 14:03:57 +0000 (08:03 -0600)]
blk-mq: blk_mq_requeue_work() doesn't need to save IRQ flags
We know we're in process context, so don't bother using the
IRQ safe versions of the spin lock.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Arnd Bergmann [Fri, 14 Jul 2017 12:07:11 +0000 (14:07 +0200)]
block: DAC960: shut up format-overflow warning
gcc-7 points out that a large controller number would overflow the
string length for the procfs name and the firmware version string:
drivers/block/DAC960.c: In function 'DAC960_Probe':
drivers/block/DAC960.c:6591:38: warning: 'sprintf' may write a terminating nul past the end of the destination [-Wformat-overflow=]
drivers/block/DAC960.c: In function 'DAC960_V1_ReadControllerConfiguration':
drivers/block/DAC960.c:1681:40: error: '%02d' directive writing between 2 and 3 bytes into a region of size between 2 and 5 [-Werror=format-overflow=]
drivers/block/DAC960.c:1681:40: note: directive argument in the range [0, 255]
drivers/block/DAC960.c:1681:3: note: 'sprintf' output between 10 and 14 bytes into a destination of size 12
Both of these seem appropriately sized, and using snprintf()
instead of sprintf() improves this by ensuring that even
incorrect data won't cause undefined behavior here.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shaohua Li [Wed, 12 Jul 2017 18:49:56 +0000 (11:49 -0700)]
block: use standard blktrace API to output cgroup info for debug notes
Currently cfq/bfq/blk-throttle output cgroup info in trace in their own
way. Now we have standard blktrace API for this, so convert them to use
it.
Note, this changes the behavior a little bit. cgroup info isn't output
by default, we only do this with 'blk_cgroup' option enabled. cgroup
info isn't output as a string by default too, we only do this with
'blk_cgname' option enabled. Also cgroup info is output in different
position of the note string. I think these behavior changes aren't a big
issue (actually we make trace data shorter which is good), since the
blktrace note is solely for debugging.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shaohua Li [Wed, 12 Jul 2017 18:49:55 +0000 (11:49 -0700)]
blktrace: add an option to allow displaying cgroup path
By default we output cgroup id in blktrace. This adds an option to
display cgroup path. Since get cgroup path is a relativly heavy
operation, we don't enable it by default.
with the option enabled, blktrace will output something like this:
dd-1353 [007] d..2 293.015252: 8,0 /test/level D R 24 + 8 [dd]
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shaohua Li [Wed, 12 Jul 2017 18:49:54 +0000 (11:49 -0700)]
block: always attach cgroup info into bio
blkcg_bio_issue_check() already gets blkcg for a BIO.
bio_associate_blkcg() uses a percpu refcounter, so it's a very cheap
operation. There is no point we don't attach the cgroup info into bio at
blkcg_bio_issue_check. This also makes blktrace outputs correct cgroup
info.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shaohua Li [Wed, 12 Jul 2017 18:49:53 +0000 (11:49 -0700)]
blktrace: export cgroup info in trace
Currently blktrace isn't cgroup aware. blktrace prints out task name of
current context, but the task of current context isn't always in the
cgroup where the BIO comes from. We can't use task name to find out IO
cgroup. For example, Writeback BIOs always comes from flusher thread but
the BIOs are for different blk cgroups. Request could be requeued and
dispatched from completely different tasks. MD/DM are another examples.
This patch tries to fix the gap. We print out cgroup fhandle info in
blktrace. Userspace can use open_by_handle_at() syscall to find the
cgroup by fhandle. Or userspace can use name_to_handle_at() syscall to
find fhandle for a cgroup and use a BPF program to filter out blktrace
for a specific cgroup.
We add a new 'blk_cgroup' trace option for blk tracer. It's default off.
Application which doesn't know the new option isn't affected. When it's
on, we output fhandle info right after blk_io_trace with an extra bit
set in event action. So from application point of view, blktrace with
the option will output new actions.
I didn't change blk trace event yet, since I'm not sure if changing the
trace event output is an ABI issue. If not, I'll do it later.
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shaohua Li [Wed, 12 Jul 2017 18:49:52 +0000 (11:49 -0700)]
cgroup: export fhandle info for a cgroup
Add an API to export cgroup fhandle info. We don't export a full 'struct
file_handle', there are unrequired info. Sepcifically, cgroup is always
a directory, so we don't need a 'FILEID_INO32_GEN_PARENT' type fhandle,
we only need export the inode number and generation number just like
what generic_fh_to_dentry does. And we can avoid the overhead of getting
an inode too, since kernfs_node_id (ino and generation) has all the info
required.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shaohua Li [Wed, 12 Jul 2017 18:49:51 +0000 (11:49 -0700)]
kernfs: add exportfs operations
Now we have the facilities to implement exportfs operations. The idea is
cgroup can export the fhandle info to userspace, then userspace uses
fhandle to find the cgroup name. Another example is userspace can get
fhandle for a cgroup and BPF uses the fhandle to filter info for the
cgroup.
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shaohua Li [Wed, 12 Jul 2017 18:49:50 +0000 (11:49 -0700)]
kernfs: introduce kernfs_node_id
inode number and generation can identify a kernfs node. We are going to
export the identification by exportfs operations, so put ino and
generation into a separate structure. It's convenient when later patches
use the identification.
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shaohua Li [Wed, 12 Jul 2017 18:49:49 +0000 (11:49 -0700)]
kernfs: don't set dentry->d_fsdata
When working on adding exportfs operations in kernfs, I found it's hard
to initialize dentry->d_fsdata in the exportfs operations. Looks there
is no way to do it without race condition. Look at the kernfs code
closely, there is no point to set dentry->d_fsdata. inode->i_private
already points to kernfs_node, and we can get inode from a dentry. So
this patch just delete the d_fsdata usage.
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shaohua Li [Wed, 12 Jul 2017 18:49:48 +0000 (11:49 -0700)]
kernfs: add an API to get kernfs node from inode number
Add an API to get kernfs node from inode number. We will need this to
implement exportfs operations.
This API will be used in blktrace too later, so it should be as fast as
possible. To make the API lock free, kernfs node is freed in RCU
context. And we depend on kernfs_node count/ino number to filter out
stale kernfs nodes.
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shaohua Li [Wed, 12 Jul 2017 18:49:47 +0000 (11:49 -0700)]
kernfs: implement i_generation
Set i_generation for kernfs inode. This is required to implement
exportfs operations. The generation is 32-bit, so it's possible the
generation wraps up and we find stale files. To reduce the posssibility,
we don't reuse inode numer immediately. When the inode number allocation
wraps, we increase generation number. In this way generation/inode
number consist of a 64-bit number which is unlikely duplicated. This
does make the idr tree more sparse and waste some memory. Since idr
manages 32-bit keys, idr uses a 6-level radix tree, each level covers 6
bits of the key. In a 100k inode kernfs, the worst case will have around
300k radix tree node. Each node is 576bytes, so the tree will use about
~150M memory. Sounds not too bad, if this really is a problem, we should
find better data structure.
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Shaohua Li [Wed, 12 Jul 2017 18:49:46 +0000 (11:49 -0700)]
kernfs: use idr instead of ida to manage inode number
kernfs uses ida to manage inode number. The problem is we can't get
kernfs_node from inode number with ida. Switching to use idr, next patch
will add an API to get kernfs_node from inode number.
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Sat, 29 Jul 2017 00:21:41 +0000 (17:21 -0700)]
Merge tag 'devicetree-fixes-for-4.13' of git://git./linux/kernel/git/robh/linux
Pull DeviceTree fixes from Rob Herring:
"Two small DT fixes:
- Fix error handling in of_irq_to_resource_table() due to
of_irq_to_resource() error return changes.
- Fix dtx_diff script due to dts include path changes"
* tag 'devicetree-fixes-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of: irq: fix of_irq_to_resource() error check
scripts/dtc: dtx_diff - update include dts paths to match build
Linus Torvalds [Fri, 28 Jul 2017 21:44:56 +0000 (14:44 -0700)]
Merge tag 'nfs-for-4.13-3' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
"More NFS client bugfixes for 4.13.
Most of these fix locking bugs that Ben and Neil noticed, but I also
have a patch to fix one more access bug that was reported after last
week.
Stable fixes:
- Fix a race where CB_NOTIFY_LOCK fails to wake a waiter
- Invalidate file size when taking a lock to prevent corruption
Other fixes:
- Don't excessively generate tiny writes with fallocate
- Use the raw NFS access mask in nfs4_opendata_access()"
* tag 'nfs-for-4.13-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFSv4.1: Fix a race where CB_NOTIFY_LOCK fails to wake a waiter
NFS: Optimize fallocate by refreshing mapping when needed.
NFS: invalidate file size when taking a lock.
NFS: Use raw NFS access mask in nfs4_opendata_access()
Linus Torvalds [Fri, 28 Jul 2017 21:29:48 +0000 (14:29 -0700)]
Merge tag 'xfs-4.13-fixes-2' of git://git./fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
- fix firstfsb variables that we left uninitialized, which could lead
to locking problems.
- check for NULL metadata buffer pointers before using them.
- don't allow btree cursor manipulation if the btree block is corrupt.
Better to just shut down.
- fix infinite loop problems in quotacheck.
- fix buffer overrun when validating directory blocks.
- fix deadlock problem in bunmapi.
* tag 'xfs-4.13-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix multi-AG deadlock in xfs_bunmapi
xfs: check that dir block entries don't off the end of the buffer
xfs: fix quotacheck dquot id overflow infinite loop
xfs: check _alloc_read_agf buffer pointer before using
xfs: set firstfsb to NULLFSBLOCK before feeding it to _bmapi_write
xfs: check _btree_check_block value
Linus Torvalds [Fri, 28 Jul 2017 20:36:56 +0000 (13:36 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"s390:
- SRCU fix
PPC:
- host crash fixes
x86:
- bugfixes, including making nested posted interrupts really work
Generic:
- tweaks to kvm_stat and to uevents"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: LAPIC: Fix reentrancy issues with preempt notifiers
tools/kvm_stat: add '-f help' to get the available event list
tools/kvm_stat: use variables instead of hard paths in help output
KVM: nVMX: Fix loss of L2's NMI blocking state
KVM: nVMX: Fix posted intr delivery when vcpu is in guest mode
x86: irq: Define a global vector for nested posted interrupts
KVM: x86: do mask out upper bits of PAE CR3
KVM: make pid available for uevents without debugfs
KVM: s390: take srcu lock when getting/setting storage keys
KVM: VMX: remove unused field
KVM: PPC: Book3S HV: Fix host crash on changing HPT size
KVM: PPC: Book3S HV: Enable TM before accessing TM registers
Linus Torvalds [Fri, 28 Jul 2017 20:35:12 +0000 (13:35 -0700)]
Merge tag 'for-linus-4.13b-rc3-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Three minor cleanups for xen related drivers"
* tag 'for-linus-4.13b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: dont fiddle with event channel masking in suspend/resume
xen: selfballoon: remove unnecessary static in frontswap_selfshrink()
xen: Drop un-informative message during boot
Linus Torvalds [Fri, 28 Jul 2017 20:29:36 +0000 (13:29 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"I'd been collecting these whilst we debugged a CPU hotplug failure,
but we ended up diagnosing that one to tglx, who has taken a fix via
the -tip tree separately.
We're seeing some NFS issues that we haven't gotten to the bottom of
yet, and we've uncovered some issues with our backtracing too so there
might be another fixes pull before we're done.
Summary:
- Ensure we have a guard page after the kernel image in vmalloc
- Fix incorrect prefetch stride in copy_page
- Ensure irqs are disabled in die()
- Fix for event group validation in QCOM L2 PMU driver
- Fix requesting of PMU IRQs on AMD Seattle
- Minor cleanups and fixes"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mmu: Place guard page after mapping of kernel image
drivers/perf: arm_pmu: Request PMU SPIs with IRQF_PER_CPU
arm64: sysreg: Fix unprotected macro argmuent in write_sysreg
perf: qcom_l2: fix column exclusion check
arm64/lib: copy_page: use consistent prefetch stride
arm64/numa: Drop duplicate message
perf: Convert to using %pOF instead of full_name
arm64: Convert to using %pOF instead of full_name
arm64: traps: disable irq in die()
arm64: atomics: Remove '&' from '+&' asm constraint in lse atomics
arm64: uaccess: Remove redundant __force from addr cast in __range_ok
Linus Torvalds [Fri, 28 Jul 2017 20:25:15 +0000 (13:25 -0700)]
Merge tag 'powerpc-4.13-4' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"The highlight is Ben's patch to work around a host killing bug when
running KVM guests with the Radix MMU on Power9. See the long change
log of that commit for more detail.
And then three fairly minor fixes:
- fix of_node_put() underflow during reconfig remove, using old DLPAR
tools.
- fix recently introduced ld version check with 64-bit LE-only
toolchain.
- free the subpage_prot_table correctly, avoiding a memory leak.
Thanks to: Aneesh Kumar K.V, Benjamin Herrenschmidt, Laurent Vivier"
* tag 'powerpc-4.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm/hash: Free the subpage_prot_table correctly
powerpc/Makefile: Fix ld version check with 64-bit LE-only toolchain
powerpc/pseries: Fix of_node_put() underflow during reconfig remove
powerpc/mm/radix: Workaround prefetch issue with KVM
Benjamin Coddington [Fri, 28 Jul 2017 16:33:54 +0000 (12:33 -0400)]
NFSv4.1: Fix a race where CB_NOTIFY_LOCK fails to wake a waiter
nfs4_retry_setlk() sets the task's state to TASK_INTERRUPTIBLE within the
same region protected by the wait_queue's lock after checking for a
notification from CB_NOTIFY_LOCK callback. However, after releasing that
lock, a wakeup for that task may race in before the call to
freezable_schedule_timeout_interruptible() and set TASK_WAKING, then
freezable_schedule_timeout_interruptible() will set the state back to
TASK_INTERRUPTIBLE before the task will sleep. The result is that the task
will sleep for the entire duration of the timeout.
Since we've already set TASK_INTERRUPTIBLE in the locked section, just use
freezable_schedule_timout() instead.
Fixes: a1d617d8f134 ("nfs: allow blocking locks to be awoken by lock callbacks")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Linus Torvalds [Fri, 28 Jul 2017 19:31:49 +0000 (12:31 -0700)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
- remove broken dt bindings in inside-secure
- fix authencesn crash when used with digest_null
- fix cavium/nitrox firmware path
- fix SHA3 failure in brcm
- fix Kconfig dependency for brcm
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: authencesn - Fix digest_null crash
crypto: brcm - remove BCM_PDC_MBOX dependency in Kconfig
Documentation/bindings: crypto: remove the dma-mask property
crypto: inside-secure - do not parse the dma mask from dt
crypto: cavium/nitrox - Change in firmware path.
crypto: brcm - Fix SHA3-512 algorithm failure
Linus Torvalds [Fri, 28 Jul 2017 19:26:59 +0000 (12:26 -0700)]
Merge branch 'for-4.13-part3' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Fixes addressing problems reported by users, and there's one more
regression fix"
* 'for-4.13-part3' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: round down size diff when shrinking/growing device
Btrfs: fix early ENOSPC due to delalloc
btrfs: fix lockup in find_free_extent with read-only block groups
Btrfs: fix dir item validation when replaying xattr deletes
Linus Torvalds [Fri, 28 Jul 2017 19:24:21 +0000 (12:24 -0700)]
Merge branch 'for-next' of git://git./linux/kernel/git/shli/md
Pull MD fixes from Shaohua Li:
"This fixes several bugs, three of them are marked for stable:
- an initialization issue fixed by Ming
- a bio clone race issue fixed by me
- an async tx flush issue fixed by Ofer
- other cleanups"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
MD: fix warnning for UP case
md/raid5: add thread_group worker async_tx_issue_pending_all
md: simplify code with bio_io_error
md/raid1: fix writebehind bio clone
md: raid1-10: move raid1/raid10 common code into raid1-10.c
md: raid1/raid10: initialize bvec table via bio_add_page()
md: remove 'idx' from 'struct resync_pages'
Linus Torvalds [Fri, 28 Jul 2017 19:17:17 +0000 (12:17 -0700)]
Merge tag 'for-4.13/dm-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- a few DM integrity fixes that improve performance. One that address
inefficiencies in the on-disk journal device layout. Another that
makes use of the block layer's on-stack plugging when writing the
journal.
- a dm-bufio fix for the blk_status_t conversion that went in during
the merge window.
- a few DM raid fixes that address correctness when suspending the
device and a validation fix for validation that occurs during device
activation.
- a couple DM zoned target fixes. Important one being the fix to not
use GFP_KERNEL in the IO path due to concerns about deadlock in
low-memory conditions (e.g. swap over a DM zoned device, etc).
- a DM DAX device fix to make sure dm_dax_flush() is called if the
underlying DAX device is operating as a write cache.
* tag 'for-4.13/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm, dax: Make sure dm_dax_flush() is called if device supports it
dm verity fec: fix GFP flags used with mempool_alloc()
dm zoned: use GFP_NOIO in I/O path
dm zoned: remove test for impossible REQ_OP_FLUSH conditions
dm raid: bump target version
dm raid: avoid mddev->suspended access
dm raid: fix activation check in validate_raid_redundancy()
dm raid: remove WARN_ON() in raid10_md_layout_to_format()
dm bufio: fix error code in dm_bufio_write_dirty_buffers()
dm integrity: test for corrupted disk format during table load
dm integrity: WARN_ON if variables representing journal usage get out of sync
dm integrity: use plugging when writing the journal
dm integrity: fix inefficient allocation of journal space
Linus Torvalds [Fri, 28 Jul 2017 19:13:34 +0000 (12:13 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A small collection of fixes that should go into this series. This
contains:
- NVMe pull request from Christoph, with various fixes for nvme
proper and nvme-fc.
- disable runtime PM for blk-mq for now.
With scsi now defaulting to using blk-mq, this reared its head as
an issue. Longer term we'll fix up runtime PM for blk-mq, for now
just disable it to prevent a hang on laptop resume for some folks.
- blk-mq CPU <-> hw queue map fix from Christoph.
- xen/blkfront pull request from Konrad, with two small fixes for the
blkfront driver.
- a few fixups for nbd from Joseph.
- a stable fix for pblk from Javier"
* 'for-linus' of git://git.kernel.dk/linux-block:
lightnvm: pblk: advance bio according to lba index
nvme: validate admin queue before unquiesce
nbd: clear disconnected on reconnect
nvme-pci: fix HMB size calculation
nvme-fc: revise TRADDR parsing
nvme-fc: address target disconnect race conditions in fcp io submit
nvme: fabrics commands should use the fctype field for data direction
nvme: also provide a UUID in the WWID sysfs attribute
xen/blkfront: always allocate grants first from per-queue persistent grants
xen-blkfront: fix mq start/stop race
blk-mq: map queues to all present CPUs
block: disable runtime-pm for blk-mq
xen-blkfront: Fix handling of non-supported operations
nbd: only set sndtimeo if we have a timeout set
nbd: take tx_lock before disconnecting
nbd: allow multiple disconnects to be sent
Linus Torvalds [Fri, 28 Jul 2017 19:04:36 +0000 (12:04 -0700)]
Merge tag 'mmc-v4.13-rc1' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"Here are a couple of mmc fixes intended for v4.13-rc1.
I have also included a couple of cleanup patches in this pull request
for OMAP2+, related to the omap_hsmmc driver. The reason is because of
the changes are also depending on OMAP SoC specific code, so this
simplifies how to deal with this.
Summary:
MMC host:
- sunxi: Correct time phase settings
- omap_hsmmc: Clean up some dead code
- dw_mmc: Fix message printed for deprecated num-slots DT binding
- dw_mmc: Fix DT documentation"
* tag 'mmc-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
Documentation: dw-mshc: deprecate num-slots
mmc: dw_mmc: fix the wrong condition check of getting num-slots from DT
mmc: host: omap_hsmmc: remove unused platform callbacks
ARM: OMAP2+: hsmmc.c: Remove dead code
mmc: sunxi: Keep default timing phase settings for new timing mode
Javier González [Fri, 28 Jul 2017 13:13:16 +0000 (15:13 +0200)]
lightnvm: pblk: advance bio according to lba index
When a lba either hits the cache or corresponds to an empty entry in the
L2P table, we need to advance the bio according to the position in which
the lba is located. Otherwise, we will copy data in the wrong page, thus
causing data corruption for the application.
In case of a cache hit, we assumed that bio->bi_iter.bi_idx would
contain the correct index, but this is no necessarily true. Instead, use
the local bio advance counter and iterator. This guarantees that lbas
hitting the cache are copied into the right bv_page.
In case of an empty L2P entry, we omitted to advance the bio. In the
cases when the same I/O also contains a cache hit, data corresponding
to this lba will be copied to the wrong bv_page. Fix this by advancing
the bio as we do in the case of a cache hit.
Fixes: a4bd217b4326 lightnvm: physical block device (pblk) target
Signed-off-by: Javier González <javier@javigon.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Will Deacon [Mon, 24 Jul 2017 10:46:09 +0000 (11:46 +0100)]
arm64: mmu: Place guard page after mapping of kernel image
The vast majority of virtual allocations in the vmalloc region are followed
by a guard page, which can help to avoid overruning on vma into another,
which may map a read-sensitive device.
This patch adds a guard page to the end of the kernel image mapping (i.e.
following the data/bss segments).
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Linus Torvalds [Fri, 28 Jul 2017 02:54:53 +0000 (19:54 -0700)]
Merge tag 'drm-fixes-for-v4.13-rc3' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"These iare the fixes for 4.13-rc3: vmwgfx, exynos, i915, amdgpu,
nouveau, host1x and displayport fixes.
As expected people woke up this week, i915 didn't do an -rc2 pull so
got a bumper -rc3 pull, and Ben resurfaced on nouveau and fixed a
bunch of major crashers seen on Fedora 26, and there are a few vmwgfx
fixes as well.
Otherwise exynos had some regression fixes/cleanups, and amdgpu has an
rcu locking regression fix and a couple of minor fixes"
* tag 'drm-fixes-for-v4.13-rc3' of git://people.freedesktop.org/~airlied/linux: (44 commits)
drm/i915: Fix bad comparison in skl_compute_plane_wm.
drm/i915: Force CPU synchronisation even if userspace requests ASYNC
drm/i915: Only skip updating execobject.offset after error
drm/i915: Only mark the execobject as pinned on success
drm/i915: Remove assertion from raw __i915_vma_unpin()
drm/i915/cnl: Fix loadgen select programming on ddi vswing sequence
drm/i915: Fix scaler init during CRTC HW state readout
drm/i915/selftests: Fix an error handling path in 'mock_gem_device()'
drm/i915: Unbreak gpu reset vs. modeset locking
gpu: host1x: Free the IOMMU domain when there is no device to attach
drm/i915: Fix cursor updates on some platforms
drm/i915: Fix user ptr check size in eb_relocate_vma()
drm: exynos: mark pm functions as __maybe_unused
drm/exynos: select CEC_CORE if CEC_NOTIFIER
drm/exynos/hdmi: fix disable sequence
drm/exynos: mic: add a bridge at probe
drm/exynos/dsi: Remove error handling for bridge_node DT parsing
drm/exynos: dsi: do not try to find bridge
drm: exynos: hdmi: make of_device_ids const.
drm: exynos: constify mixer_match_types and *_mxr_drv_data.
...
Dave Airlie [Fri, 28 Jul 2017 02:32:59 +0000 (12:32 +1000)]
Merge tag 'exynos-drm-fixes-for-v4.13-rc3' of git://git./linux/kernel/git/daeinki/drm-exynos into drm-fixes
Summary:
- fix probing fail issue of dsi driver without bridge device.
- fix disable sequence of hdmi driver.
- trivial cleanups.
* tag 'exynos-drm-fixes-for-v4.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos:
drm: exynos: mark pm functions as __maybe_unused
drm/exynos: select CEC_CORE if CEC_NOTIFIER
drm/exynos/hdmi: fix disable sequence
drm/exynos: mic: add a bridge at probe
drm/exynos/dsi: Remove error handling for bridge_node DT parsing
drm/exynos: dsi: do not try to find bridge
drm: exynos: hdmi: make of_device_ids const.
drm: exynos: constify mixer_match_types and *_mxr_drv_data.
exynos_drm: Clean up duplicated assignment in exynos_drm_driver
Dave Airlie [Fri, 28 Jul 2017 00:19:08 +0000 (10:19 +1000)]
Merge tag 'drm-intel-fixes-2017-07-27' of git://anongit.freedesktop.org/git/drm-intel into drm-fixes
i915 fixes for -rc3
Bit more than usual since we missed -rc2. 4x cc: stable, 2 gvt
patches, but all fairly minor stuff. Last minute rebase was to add a
few missing cc: stable, I did prep the pull this morning already and
made sure CI approves.
* tag 'drm-intel-fixes-2017-07-27' of git://anongit.freedesktop.org/git/drm-intel:
drm/i915: Fix bad comparison in skl_compute_plane_wm.
drm/i915: Force CPU synchronisation even if userspace requests ASYNC
drm/i915: Only skip updating execobject.offset after error
drm/i915: Only mark the execobject as pinned on success
drm/i915: Remove assertion from raw __i915_vma_unpin()
drm/i915/cnl: Fix loadgen select programming on ddi vswing sequence
drm/i915: Fix scaler init during CRTC HW state readout
drm/i915/selftests: Fix an error handling path in 'mock_gem_device()'
drm/i915: Unbreak gpu reset vs. modeset locking
drm/i915: Fix cursor updates on some platforms
drm/i915: Fix user ptr check size in eb_relocate_vma()
drm/i915/gvt: Extend KBL platform support in GVT-g
drm/i915/gvt: Fix the vblank timer close issue after shutdown VMs in reverse
Dave Airlie [Fri, 28 Jul 2017 00:14:08 +0000 (10:14 +1000)]
Merge tag 'drm-misc-fixes-2017-07-27' of git://anongit.freedesktop.org/git/drm-misc into drm-fixes
Core Changes:
- dp: A few fixes in drm_dp_downstream_debug() (Chris)
- rockchip: sanitize the Kconfig dependencies (fallout from EXTCON) (Arnd)
- host1x: Free the iommu domain when attach_device fails (Paul)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Kocialkowski <contact@paulk.fr>
* tag 'drm-misc-fixes-2017-07-27' of git://anongit.freedesktop.org/git/drm-misc:
gpu: host1x: Free the IOMMU domain when there is no device to attach
drm/rockchip: fix Kconfig dependencies
drm/dp: Don't trust drm_dp_downstream_id()
drm/dp: Fix read pointer for drm_dp_downsteam_debug()
Linus Torvalds [Thu, 27 Jul 2017 22:30:08 +0000 (15:30 -0700)]
Merge tag 'acpi-4.13-rc3' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These are two fixups for the suspend-to-idle handling in the ACPI
subsystem after recent changes in that area and two simple fixes of
the ACPI NUMA code.
Specifics:
- Add an ACPI module parameter to allow users to override the new
default behavior on some systems where the EC GPE is not disabled
during suspend-to-idle in case the EC on their systems generates
excessive wakeup events and they want to sacrifice some
functionality (like power button wakeups) for extra battery life
while suspended (Rafael Wysocki).
- Fix flushing of the outstanding EC work in the ACPI core
suspend-to-idle code (Rafael Wysocki).
- Add a missing include and fix a messed-up comment in the ACPI NUMA
code (Ross Zwisler)"
* tag 'acpi-4.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: NUMA: Fix typo in the full name of SRAT
ACPI: NUMA: add missing include in acpi_numa.h
ACPI / PM / EC: Flush all EC work in acpi_freeze_sync()
ACPI / EC: Add parameter to force disable the GPE on suspend
Rafael J. Wysocki [Thu, 27 Jul 2017 21:14:08 +0000 (23:14 +0200)]
Merge branches 'acpi-pm' and 'acpi-numa'
* acpi-pm:
ACPI / PM / EC: Flush all EC work in acpi_freeze_sync()
ACPI / EC: Add parameter to force disable the GPE on suspend
* acpi-numa:
ACPI: NUMA: Fix typo in the full name of SRAT
ACPI: NUMA: add missing include in acpi_numa.h
Daniel Vetter [Thu, 27 Jul 2017 20:07:52 +0000 (22:07 +0200)]
Merge tag 'gvt-fixes-2017-07-26' of https://github.com/01org/gvt-linux into drm-intel-fixes
gvt-fixes-2017-07-26
- Turn on KBL support for more SKUs (Jianjun)
- Fix vblank timer close bug (Fred)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20170726075621.hrauvik62gi2jecj@zhen-hp.sh.intel.com
Maarten Lankhorst [Mon, 17 Jul 2017 11:13:55 +0000 (13:13 +0200)]
drm/i915: Fix bad comparison in skl_compute_plane_wm.
ddb_allocation && ddb_allocation / blocks_per_line >= 1 is the same
as ddb_allocation >= blocks_per_line, so use the latter to simplify
this.
This fixes the following compiler warning:
drivers/gpu/drm/i915/intel_pm.c:4467]: (warning) Comparison of a
boolean expression with an integer other than 0 or 1.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Fixes: d555cb5827d6 ("drm/i915/skl+: use linetime latency if ddb size is not available")
Cc: "Mahesh Kumar" <mahesh1.kumar@intel.com>
Reported-by: David Binderman <dcb314@hotmail.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.13-rc1+
Reviewed-by: Mahesh Kumar <mahesh1.kumar@intel.com>
(cherry picked from commit
54d20ed1fff23c7d2633f01fc788111bf9c51c5d)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170717111355.4523-1-maarten.lankhorst@linux.intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Chris Wilson [Fri, 21 Jul 2017 14:50:37 +0000 (15:50 +0100)]
drm/i915: Force CPU synchronisation even if userspace requests ASYNC
The goal here was to minimise doing any thing or any check inside the
kernel that was not strictly required. For a userspace that assumes
complete control over the cache domains, the kernel is usually using
outdated information and may trigger clflushes where none were
required.
However, swapping is a situation where userspace has no knowledge of the
domain transfer, and will leave the object in the CPU cache. The kernel
must flush this out to the backing storage prior to use with the GPU. As
we use an asynchronous task tracked by an implicit fence for this, we
also need to cancel the ASYNC flag on the object so that the object will
wait for the clflush to complete before being executed. This also absolves
userspace of the responsibility imposed by commit
77ae9957897d ("drm/i915:
Enable userspace to opt-out of implicit fencing") that its needed to ensure
that the object was out of the CPU cache prior to use on the GPU.
Fixes: 77ae9957897d ("drm/i915: Enable userspace to opt-out of implicit fencing")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101571
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721145037.25105-5-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit
0f46daa1a273779a0b73d768a788ca3f04238f9c)
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Chris Wilson [Fri, 21 Jul 2017 14:50:36 +0000 (15:50 +0100)]
drm/i915: Only skip updating execobject.offset after error
I was being overly paranoid in not updating the execobject.offset after
performing the fallback copy where we set reloc.presumed_offset to -1.
The thinking was to ensure that a subsequent NORELOC execbuf would be
forced to process the invalid relocations. However this is overkill so
long as we *only* update the execobject.offset following a successful
update of the relocation value witin the batch. If we have to repeat the
execbuf due to a later interruption, then we may skip the relocations on
the second pass (honouring NORELOC) since the execobject.offset match
the actual offsets (even though reloc.presumed_offset is garbage).
Subsequent calls to execbuf with NORELOC should themselves ensure that
the reloc.presumed_offset have been corrected in case of future
migration.
Reporting back the actual execobject.offset, even when
reloc.presumed_offset is garbage, ensures that reuse of those objects
use the latest information to avoid relocations.
Fixes: 2889caa92321 ("drm/i915: Eliminate lots of iterations over the execobjects array")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101635
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721145037.25105-4-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit
1f727d9e725a408ef58d159c20fb2e51818ff153)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Chris Wilson [Fri, 21 Jul 2017 14:50:35 +0000 (15:50 +0100)]
drm/i915: Only mark the execobject as pinned on success
If we fail to acquire a fence (for old school fenced GPU access) then we
unwind the vma reservation, including its pin. However, we were making
the execobject as holding the pin before erring out, leading to a double
unpin:
[ 3193.991802] kernel BUG at drivers/gpu/drm/i915/i915_vma.h:287!
[ 3193.998131] invalid opcode: 0000 [#1] PREEMPT SMP
[ 3194.002816] Modules linked in: snd_hda_intel i915 vgem snd_hda_codec_analog snd_hda_codec_generic coretemp snd_hda_codec snd_hwdep snd_hda_core snd_pcm lpc_ich mei_me e1000e mei prime_numbers ptp pps_core [last unloaded: i915]
[ 3194.022841] CPU: 0 PID: 8123 Comm: kms_flip Tainted: G U 4.13.0-rc1-CI-CI_DRM_471+ #1
[ 3194.031765] Hardware name: Dell Inc. OptiPlex 755 /0PU052, BIOS A04 11/05/2007
[ 3194.040343] task:
ffff8800785d4c40 task.stack:
ffffc90001768000
[ 3194.046339] RIP: 0010:eb_release_vmas.isra.6+0x119/0x180 [i915]
[ 3194.052234] RSP: 0018:
ffffc9000176ba80 EFLAGS:
00010246
[ 3194.057439] RAX:
00000000000003c0 RBX:
ffff8800710fc2d8 RCX:
ffff8800588e4f48
[ 3194.064546] RDX:
ffffffff1fffffff RSI:
00000000ffffffff RDI:
ffff8800588e00d0
[ 3194.071654] RBP:
ffffc9000176bab0 R08:
0000000000000000 R09:
0000000000000000
[ 3194.078761] R10:
0000000000000040 R11:
0000000000000001 R12:
ffff880060822f00
[ 3194.085867] R13:
0000000000000310 R14:
00000000000003b8 R15:
ffffc9000176bbb0
[ 3194.092975] FS:
00007fd2b94aba40(0000) GS:
ffff88007d200000(0000) knlGS:
0000000000000000
[ 3194.101033] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 3194.106754] CR2:
00007ffbec3ff000 CR3:
0000000074e67000 CR4:
00000000000006f0
[ 3194.113861] Call Trace:
[ 3194.116321] eb_relocate_slow+0x67/0x4e0 [i915]
[ 3194.120861] i915_gem_do_execbuffer+0x429/0x1260 [i915]
[ 3194.126070] ? lock_acquire+0xb5/0x210
[ 3194.129803] ? __might_fault+0x39/0x90
[ 3194.133563] i915_gem_execbuffer2+0x9b/0x1b0 [i915]
[ 3194.138447] ? i915_gem_execbuffer+0x2b0/0x2b0 [i915]
[ 3194.143478] drm_ioctl_kernel+0x64/0xb0
[ 3194.147298] drm_ioctl+0x2cd/0x390
[ 3194.150710] ? i915_gem_execbuffer+0x2b0/0x2b0 [i915]
[ 3194.155741] ? finish_task_switch+0xa5/0x210
[ 3194.159993] ? finish_task_switch+0x6a/0x210
[ 3194.164247] do_vfs_ioctl+0x90/0x670
[ 3194.167806] ? entry_SYSCALL_64_fastpath+0x5/0xb1
[ 3194.172492] ? __this_cpu_preempt_check+0x13/0x20
[ 3194.177176] ? trace_hardirqs_on_caller+0xe7/0x1c0
[ 3194.181946] SyS_ioctl+0x3c/0x70
[ 3194.185159] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 3194.189756] RIP: 0033:0x7fd2b76a8587
[ 3194.193314] RSP: 002b:
00007fff074845b8 EFLAGS:
00000246 ORIG_RAX:
0000000000000010
[ 3194.200855] RAX:
ffffffffffffffda RBX:
ffffffff8146da43 RCX:
00007fd2b76a8587
[ 3194.207962] RDX:
00007fff074846e0 RSI:
0000000040406469 RDI:
0000000000000003
[ 3194.215068] RBP:
ffffc9000176bf88 R08:
0000000000000000 R09:
0000000000000003
[ 3194.222175] R10:
00007fd2b796bb58 R11:
0000000000000246 R12:
00007fff07484880
[ 3194.229280] R13:
0000000000000003 R14:
0000000040406469 R15:
0000000000000000
[ 3194.236386] ? __this_cpu_preempt_check+0x13/0x20
[ 3194.241070] Code: 24 b0 00 00 00 48 85 c9 0f 84 6c ff ff ff 8b 41 20 85 c0 7e 73 83 e8 01 89 41 20 41 8b 84 24 e8 00 00 00 a8 0f 0f 85 5f ff ff ff <0f> 0b 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d f3 c3 49 8b 84
[ 3194.259943] RIP: eb_release_vmas.isra.6+0x119/0x180 [i915] RSP:
ffffc9000176ba80
[ 3194.268047] ---[ end trace
1d7348c6575d8800 ]---
[ 3673.658819] softdog: Initiating panic
[ 3673.662471] Kernel panic - not syncing: Software Watchdog Timer expired
[ 3673.669066] Kernel Offset: disabled
[ 3673.672541] Rebooting in 1 seconds..
Reported-by: Tomi Sarvela <tomi.p.sarvela@intel.com>
Fixes: 2889caa92321 ("drm/i915: Eliminate lots of iterations over the execobjects array")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721145037.25105-3-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit
1da7b54c46bcfe5484af0b27d8c9003b238031b0)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Chris Wilson [Fri, 21 Jul 2017 14:50:34 +0000 (15:50 +0100)]
drm/i915: Remove assertion from raw __i915_vma_unpin()
After we detect a i915_vma pin overflow, we call __i915_vma_unpin to
cleanup. However, on an overflow the pin_count bitfield will be zero,
triggering an assertion, even though we the intention is to merely warn
and report the error back to the user (as historically the culprit has
be a leak in the display code).
Fixes: 20dfbde463c8 ("drm/i915: Wrap vma->pin_count accessors with small inline helpers")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721145037.25105-2-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit
67fddd902b8e37b15a905c287ce4e40f52a564af)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Navare, Manasi D [Mon, 17 Jul 2017 22:05:22 +0000 (15:05 -0700)]
drm/i915/cnl: Fix loadgen select programming on ddi vswing sequence
The condition for setting the Loadgen Select bit of
PORT_TX_DW4 register during DDI Vswing Sequence should be
Bit rate <=6 GHz whereas the existing code checks only
Bit Rate < 6GHz. This patch fixes this condition.
While at it also remove the redundant paranthesis.
Fixes: cf54ca8bc567 ("drm/i915/cnl: Implement voltage swing sequence.")
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Manasi Navare <manasi.d.navare@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1500329122-32662-1-git-send-email-manasi.d.navare@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit
a8e45a1c42d11597e975f3e5f2fe182f90cdaa7f)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Imre Deak [Thu, 20 Jul 2017 11:28:20 +0000 (14:28 +0300)]
drm/i915: Fix scaler init during CRTC HW state readout
The scaler allocation code depends on a non-zero default value for the
crtc scaler_id, so make sure we initialize the scaler state accordingly
even if the crtc is off. This fixes at least an initial YUV420 modeset
(added in a follow-up patchset by Shashank) when booting with the screen
off: after the initial HW readout and modeset which enables the scaler a
subsequent modeset will disable the scaler which isn't properly
allocated. This results in a funky HW state where the pipe scaler HW
registers can't be modified and the normally black screen is grey and
shifted to the right or jitters.
The problem was revealed by Shashank's YUV420 patchset and first
reported by Ville.
v2:
- In the stable tag also include versions which need backporting (Jani)
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Shashank Sharma <shashank.sharma@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chandra Konduru <chandra.konduru@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: <stable@vger.kernel.org> # 4.2.x
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: a1b2278e4dfc ("drm/i915: skylake panel fitting using shared scalers")
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Mahesh Kumar <mahesh1.kumar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170720112820.26816-1-imre.deak@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit
5fb9dadf336f3590c799e8cbde348215dccc2aa2)
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Christophe JAILLET [Wed, 19 Jul 2017 22:35:03 +0000 (00:35 +0200)]
drm/i915/selftests: Fix an error handling path in 'mock_gem_device()'
Goto the right label in case of error, otherwise there is a leak.
This has been introduced by
c5cf9a9147ff. In this patch a goto has not been
updated.
Fixes: c5cf9a9147ff ("drm/i915: Create a kmem_cache to allocate struct i915_priolist from")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://patchwork.freedesktop.org/patch/msgid/20170719223503.30580-1-christophe.jaillet@wanadoo.fr
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit
a5ec7fe81a6ec38cb8b8a798d0552cbcadce7aa9)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Daniel Vetter [Wed, 19 Jul 2017 12:54:55 +0000 (14:54 +0200)]
drm/i915: Unbreak gpu reset vs. modeset locking
Taking the modeset locks unconditionally isn't the greatest idea,
because atm that part is still broken and times out (and then atomic
keels over). And there's really no reason to do so, the old code
didn't do that either.
To make the patch a bit simpler let's also nuke 2 cases that are only
around for the old mmioflip paths. Atomic nonblocking workers will not
die (minus bugs) when a gpu reset happens.
And of course this doesn't fix any of the gpu reset vs. modeset
deadlock fun, but it at least stop modern CI machines from keeling
over all over the place for no reason at all.
And we still have the explicit testcases to run the fake gpu reset, so
coverage isn't that much worse.
v2: Split out additional changes on top, restrict this to purely reducing
the critical section of modeset locks.
v2: Review from Maarten
- update comments
- don't oops when state is NULL in intel_finish_reset, but try to at
least still drop locks properly. The hw is going to be toast anyway.
Fixes: 739748939974 ("drm/i915: Fix modeset handling during gpu reset, v5.")
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170719125502.25696-3-daniel.vetter@ffwll.ch
(cherry picked from commit
ce87ea15ebc60a9f8f156b2549f7b2cf7fe48d04)
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Linus Torvalds [Thu, 27 Jul 2017 19:44:05 +0000 (12:44 -0700)]
Merge branch 'parisc-4.13-3' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
- The majority of lines changed are due to regenerated defconfig files.
- The support for the Page Deallocation Table (PDT) which was merged in
the merge window for 4.13 contained a bug which crashes the kernel if
a bad page is reported by firmware. This is now fixed and the kernel
messages will show which memory slot holds the broken DIMM.
- Commit
3a166fc2d4ef ("kbuild: handle libs-y archives separately from
built-in.o archives") broke linking the parisc kernel due to
millicode symbols which can't be reached then any longer. This was
fixed by modifying the parisc vmlinux.lds linker script.
- If the stack checker panics on stack overflow, avoid recursive
panics.
- Some parisc machines can't physically power off and thus instead
start after some time to flood the console by presumably detected
soft lockups. Avoid this by disabling the lockup detectors before
entering the endless for-next loop.
- Dave Anglin provided fixes which prevents TLB speculation on flushed
pages on PA8800/PA9000 CPUs.
- Arvind Yadav sent a trivial patch to constify the attribute_group
structure in our firmware on-board-flash storage driver
(pdc_stable.c)
* 'parisc-4.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Extend disabled preemption in copy_user_page
parisc: Prevent TLB speculation on flushed pages on CPUs that only support equivalent aliases
parisc: Suspend lockup detectors before system halt
parisc: Show DIMM slot number which holds broken memory module
parisc: Add function to return DIMM slot of physical address
parisc: Fix crash when calling PDC_PAT_MEM PDT firmware function
parisc: regenerate defconfig files
parisc: pdc_stable: constify attribute_group structures.
parisc: Merge millicode routines via linker script
parisc: Disable further stack checks when panic occurs during stack check
Juergen Gross [Mon, 17 Jul 2017 17:47:03 +0000 (19:47 +0200)]
xen: dont fiddle with event channel masking in suspend/resume
Instead of fiddling with masking the event channels during suspend
and resume handling let do the irq subsystem do its job. It will do
the mask and unmask operations as needed.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Gustavo A. R. Silva [Tue, 4 Jul 2017 18:34:05 +0000 (13:34 -0500)]
xen: selfballoon: remove unnecessary static in frontswap_selfshrink()
Remove unnecessary static on local variables last_frontswap_pages and
tgt_frontswap_pages. Such variables are initialized before being used,
on every execution path throughout the function. The statics have no
benefit and, removing them reduce the code size.
This issue was detected using Coccinelle and the following semantic patch:
@bad exists@
position p;
identifier x;
type T;
@@
static T x@p;
...
x = <+...x...+>
@@
identifier x;
expression e;
type T;
position p != bad.p;
@@
-static
T x@p;
... when != x
when strict
?x = e;
You can see a significant difference in the code size after executing
the size command, before and after the code change:
before:
text data bss dec hex filename
5633 3452 384 9469 24fd drivers/xen/xen-selfballoon.o
after:
text data bss dec hex filename
5576 3308 256 9140 23b4 drivers/xen/xen-selfballoon.o
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Punit Agrawal [Thu, 20 Jul 2017 11:04:02 +0000 (12:04 +0100)]
xen: Drop un-informative message during boot
On systems that are not booted as a Xen domain, the xenfs driver prints
the following message during boot.
[ 3.460595] xenfs: not registering filesystem on non-xen platform
As the user chose not to boot a Xen domain, this message does not
provide useful information. Drop this message.
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Linus Torvalds [Thu, 27 Jul 2017 17:44:28 +0000 (10:44 -0700)]
Merge tag 'sound-4.13-rc3' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This is a pretty boring pull request, containing a few HD-audio quirks
and ID updates as usual suspects, as well as a fix for a regression of
FM801 chip on ia64 (what a legacy combination!)"
* tag 'sound-4.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Add mute led support for HP ProBook 440 G4
ALSA: hda/realtek - No loopback on ALC225/ALC295 codec
ALSA: hda/realtek - Update headset mode for ALC225
ALSA: fm801: Initialize chip after IRQ handler is registered
ALSA: hda/realtek - Update headset mode for ALC298
ALSA: hda - Add missing NVIDIA GPU codec IDs to patch table
Linus Torvalds [Thu, 27 Jul 2017 17:35:07 +0000 (10:35 -0700)]
Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"Two areas addressed by these fixes:
- Fixes from Dave Martin for the signal frames that were broken with
certain configurations. No one noticed until recently.
- More kexec fixes to ensure that the crashkernel region is correctly
allocated, and a fix for the location of the device tree when
several kexec kernels are loaded"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8687/1: signal: Fix unparseable iwmmxt_sigframe in uc_regspace[]
ARM: 8686/1: iwmmxt: Add missing __user annotations to sigframe accessors
ARM: kexec: fix failure to boot crash kernel
ARM: kexec: avoid allocating crashkernel region outside lowmem
NeilBrown [Mon, 24 Jul 2017 03:18:50 +0000 (13:18 +1000)]
NFS: Optimize fallocate by refreshing mapping when needed.
posix_fallocate() will allocate space in an NFS file by considering
the last byte of every 4K block. If it is before EOF, it will read
the byte and if it is zero, a zero is written out. If it is after EOF,
the zero is unconditionally written.
For the blocks beyond EOF, if NFS believes its cache is valid, it will
expand these writes to write full pages, and then will merge the pages.
This results if (typically) 1MB writes. If NFS believes its cache is
not valid (particularly if NFS_INO_INVALID_DATA or
NFS_INO_REVAL_PAGECACHE are set - see nfs_write_pageuptodate()), it will
send the individual 1-byte writes. This results in (typically) 256 times
as many RPC requests, and can be substantially slower.
Currently nfs_revalidate_mapping() is only used when reading a file or
mmapping a file, as these are times when the content needs to be
up-to-date. Writes don't generally need the cache to be up-to-date, but
writes beyond EOF can benefit, particularly in the posix_fallocate()
case.
So this patch calls nfs_revalidate_mapping() when writing beyond EOF -
i.e. when there is a gap between the end of the file and the start of
the write. If the cache is thought to be out of date (as happens after
taking a file lock), this will cause a GETATTR, and the two flags
mentioned above will be cleared. With this, posix_fallocate() on a
newly locked file does not generate excessive tiny writes.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
NeilBrown [Mon, 24 Jul 2017 03:18:50 +0000 (13:18 +1000)]
NFS: invalidate file size when taking a lock.
Prior to commit
ca0daa277aca ("NFS: Cache aggressively when file is open
for writing"), NFS would revalidate, or invalidate, the file size when
taking a lock. Since that commit it only invalidates the file content.
If the file size is changed on the server while wait for the lock, the
client will have an incorrect understanding of the file size and could
corrupt data. This particularly happens when writing beyond the
(supposed) end of file and can be easily be demonstrated with
posix_fallocate().
If an application opens an empty file, waits for a write lock, and then
calls posix_fallocate(), glibc will determine that the underlying
filesystem doesn't support fallocate (assuming version 4.1 or earlier)
and will write out a '0' byte at the end of each 4K page in the region
being fallocated that is after the end of the file.
NFS will (usually) detect that these writes are beyond EOF and will
expand them to cover the whole page, and then will merge the pages.
Consequently, NFS will write out large blocks of zeroes beyond where it
thought EOF was. If EOF had moved, the pre-existing part of the file
will be over-written. Locking should have protected against this,
but it doesn't.
This patch restores the use of nfs_zap_caches() which invalidated the
cached attributes. When posix_fallocate() asks for the file size, the
request will go to the server and get a correct answer.
cc: stable@vger.kernel.org (v4.8+)
Fixes: ca0daa277aca ("NFS: Cache aggressively when file is open for writing")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Paul Kocialkowski [Mon, 10 Jul 2017 19:33:05 +0000 (21:33 +0200)]
gpu: host1x: Free the IOMMU domain when there is no device to attach
When there is no device to attach to the IOMMU domain, as may be the
case when the device-tree does not contain the proper iommu node, it is
best to keep going without IOMMU support rather than failing.
This allows the driver to probe and function instead of taking down
all of the tegra drm driver, leading to missing display support.
Signed-off-by: Paul Kocialkowski <contact@paulk.fr>
Fixes: 404bfb78daf3 ("gpu: host1x: Add IOMMU support")
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Tested-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170710193305.5987-1-contact@paulk.fr
Shawn Lin [Tue, 18 Jul 2017 08:31:38 +0000 (16:31 +0800)]
Documentation: dw-mshc: deprecate num-slots
dwmmc host driver already deprecate it in the driver
but didn't modify the documentation to reflect the fact.
This patch deprecates it and clean up num-slots from the
examples of all variant host drivers.
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Acked-by: Jaehoon Chung <jh80.chung@samsung.com>
Acked-by: Rob Herring <robh@kernel.org>
Fixes: d30a8f7bdf64 ("mmc: dw_mmc: deprecated the "num-slots" property")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Jens Axboe [Thu, 27 Jul 2017 14:08:22 +0000 (08:08 -0600)]
Merge branch 'nvme-4.13' of git://git.infradead.org/nvme into for-linus
Pull NVMe fixes from Christoph
Shawn Lin [Fri, 21 Jul 2017 08:39:56 +0000 (16:39 +0800)]
mmc: dw_mmc: fix the wrong condition check of getting num-slots from DT
Change to print the information about when the deprecated "num-slots" DT
binding is being used, as to avoid confusion when browsing the log:
dwmmc_rockchip
fe320000.dwmmc: 'num-slots' was deprecated.
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Fixes: d30a8f7bdf64 ("mmc: dw_mmc: deprecated the "num-slots" property")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Will Deacon [Tue, 25 Jul 2017 15:30:34 +0000 (16:30 +0100)]
drivers/perf: arm_pmu: Request PMU SPIs with IRQF_PER_CPU
Since the PMU register interface is banked per CPU, CPU PMU interrrupts
cannot be handled by a CPU other than the one with the PMU asserting the
interrupt. This means that migrating PMU SPIs, as we do during a CPU
hotplug operation doesn't make any sense and can lead to the IRQ being
disabled entirely if we route a spurious IRQ to the new affinity target.
This has been observed in practice on AMD Seattle, where CPUs on the
non-boot cluster appear to take a spurious PMU IRQ when coming online,
which is routed to CPU0 where it cannot be handled.
This patch passes IRQF_PERCPU for PMU SPIs and forcefully sets their
affinity prior to requesting them, ensuring that they cannot
be migrated during hotplug events. This interacts badly with the DB8500
erratum workaround that ping-pongs the interrupt affinity from the handler,
so we avoid passing IRQF_PERCPU in that case by allowing the IRQ flags
to be overridden in the platdata.
Fixes: 3cf7ee98b848 ("drivers/perf: arm_pmu: move irq request/free into probe")
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Ville Syrjälä [Fri, 14 Jul 2017 15:52:27 +0000 (18:52 +0300)]
drm/i915: Fix cursor updates on some platforms
Turns out that just writing CURPOS isn't sufficient to move the cursor
on some platforms. My 830 works just fine, but eg. 945 and PNV don't.
On those platforms we need to arm even the CURPOS update with a
CURBASE write.
Even worse, a write to any of the cursor register apart from CURBASE
will cancel an already pending cursor update. So if we have armed a
CURCNTR/CURBASE update, a subsequent CURPOS write prior to vblank
would cancel that armed update. Thus we're left with a cursor that
doesn't appear to move, or even change shape.
Fix the problem by always performing the CURBASE write after a
CURPOS write. Bspec is somewhat unclear which platforms actually
require this CURBASE write and which don't. So to keep it simple
and to make sure we really fix the problem across all supported
devices, let's just perform the CURBASE write unconditionally.
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101790
Fixes: 75343a44c901 ("drm/i915: Drop useless posting reads from cursor commit")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Paul Menzel <paulepanter@users.sourceforge.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20170714155227.6089-1-ville.syrjala@linux.intel.com
(cherry picked from commit
8753d2bc5e49daad301ce65f5dada57ed924fad6)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>