openwrt/staging/blogic.git
4 years agoblock/diskstats: replace time_in_queue with sum of request times
Konstantin Khlebnikov [Wed, 25 Mar 2020 13:07:08 +0000 (16:07 +0300)]
block/diskstats: replace time_in_queue with sum of request times

Column "time_in_queue" in diskstats is supposed to show total waiting time
of all requests. I.e. value should be equal to the sum of times from other
columns. But this is not true, because column "time_in_queue" is counted
separately in jiffies rather than in nanoseconds as other times.

This patch removes redundant counter for "time_in_queue" and shows total
time of read, write, discard and flush requests.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock/diskstats: accumulate all per-cpu counters in one pass
Konstantin Khlebnikov [Wed, 25 Mar 2020 13:07:06 +0000 (16:07 +0300)]
block/diskstats: accumulate all per-cpu counters in one pass

Reading /proc/diskstats iterates over all cpus for summing each field.
It's faster to sum all fields in one pass.

Hammering /proc/diskstats with fio shows 2x performance improvement:

fio --name=test --numjobs=$JOBS --filename=/proc/diskstats \
    --size=1k --bs=1k --fallocate=none --create_on_open=1 \
    --time_based=1 --runtime=10 --invalidate=0 --group_report

  JOBS=1 JOBS=10
Before:   7k iops 64k iops
After:  18k iops      120k iops

Also this way code is more compact:

add/remove: 1/0 grow/shrink: 0/2 up/down: 194/-1540 (-1346)
Function                                     old     new   delta
part_stat_read_all                             -     194    +194
diskstats_show                              1344     631    -713
part_stat_show                              1219     392    -827
Total: Before=14966947, After=14965601, chg -0.01%

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock/diskstats: more accurate approximation of io_ticks for slow disks
Konstantin Khlebnikov [Wed, 25 Mar 2020 13:07:04 +0000 (16:07 +0300)]
block/diskstats: more accurate approximation of io_ticks for slow disks

Currently io_ticks is approximated by adding one at each start and end of
requests if jiffies counter has changed. This works perfectly for requests
shorter than a jiffy or if one of requests starts/ends at each jiffy.

If disk executes just one request at a time and they are longer than two
jiffies then only first and last jiffies will be accounted.

Fix is simple: at the end of request add up into io_ticks jiffies passed
since last update rather than just one jiffy.

Example: common HDD executes random read 4k requests around 12ms.

fio --name=test --filename=/dev/sdb --rw=randread --direct=1 --runtime=30 &
iostat -x 10 sdb

Note changes of iostat's "%util" 8,43% -> 99,99% before/after patch:

Before:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0,00     0,00   82,60    0,00   330,40     0,00     8,00     0,96   12,09   12,09    0,00   1,02   8,43

After:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0,00     0,00   82,50    0,00   330,00     0,00     8,00     1,00   12,10   12,10    0,00  12,12  99,99

Now io_ticks does not loose time between start and end of requests, but
for queue-depth > 1 some I/O time between adjacent starts might be lost.

For load estimation "%util" is not as useful as average queue length,
but it clearly shows how often disk queue is completely empty.

Fixes: 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: merge partition-generic.c and check.c
Christoph Hellwig [Tue, 24 Mar 2020 07:25:30 +0000 (08:25 +0100)]
block: merge partition-generic.c and check.c

Merge block/partition-generic.c and block/partitions/check.c into
a single block/partitions/core.c as the content is closely related
and both files are tiny.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: move the various x86 Unix label formats out of genhd.h
Christoph Hellwig [Tue, 24 Mar 2020 07:25:29 +0000 (08:25 +0100)]
block: move the various x86 Unix label formats out of genhd.h

All these are just used in block/partitions/msdos.c, so move them out of the
genhd.h driver included by every driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agopartitions/msdos: remove LINUX_SWAP_PARTITION
Christoph Hellwig [Tue, 24 Mar 2020 07:25:28 +0000 (08:25 +0100)]
partitions/msdos: remove LINUX_SWAP_PARTITION

Just always use NEW_SOLARIS_X86_PARTITION and explain the situation,
as that is less confusing than two names for a single value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: move the *_PARTITION enum out of genhd.h
Christoph Hellwig [Tue, 24 Mar 2020 07:25:27 +0000 (08:25 +0100)]
block: move the *_PARTITION enum out of genhd.h

The enum containing the *_PARTITION symbolic names is only relevant
for the partition parser.  More specifically most values are MSDOS
partition table system indicators and thus should go straight into
msdos.c.  One value is only used by the sun partition parser, and the
sun and sgi partition parsers use the same value as the x86 Linux
RAID indicator to also indicate RAID autodetection.  Duplicate them
in sun.c and sgi.c given that the different partition types use
entirely different values otherwise.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: move struct partition out of genhd.h
Christoph Hellwig [Tue, 24 Mar 2020 07:25:26 +0000 (08:25 +0100)]
block: move struct partition out of genhd.h

struct partition is the on-disk format of a MSDOS partition table entry.
Move it out of genhd.h into a new msdos_partition.h header and give it
a msdos_ prefix to avoid confusion.
Also move the magic number from block/partitions/msdos.h to the new
header so that it can be used by the SCSI drivers looking at the DOS
partition tables.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: remove block/partitions/sun.h
Christoph Hellwig [Tue, 24 Mar 2020 07:25:25 +0000 (08:25 +0100)]
block: remove block/partitions/sun.h

Just move the two defines to block/partitions/sun.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: remove block/partitions/sgi.h
Christoph Hellwig [Tue, 24 Mar 2020 07:25:24 +0000 (08:25 +0100)]
block: remove block/partitions/sgi.h

Just move the single define to block/partitions/sgi.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: remove block/partitions/osf.h
Christoph Hellwig [Tue, 24 Mar 2020 07:25:23 +0000 (08:25 +0100)]
block: remove block/partitions/osf.h

Just move the single define to block/partitions/osf.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: remove block/partitions/karma.h
Christoph Hellwig [Tue, 24 Mar 2020 07:25:22 +0000 (08:25 +0100)]
block: remove block/partitions/karma.h

Just move the single define to block/partitions/karma.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: declare all partition detection routines in check.h
Christoph Hellwig [Tue, 24 Mar 2020 07:25:21 +0000 (08:25 +0100)]
block: declare all partition detection routines in check.h

There is no good reason to include one header per partition type in
core.c.  Instead move the prototypes for the detection routins to
check.h, and remove all now empty headers in block/partitions/.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: remove warn_no_part
Christoph Hellwig [Tue, 24 Mar 2020 07:25:20 +0000 (08:25 +0100)]
block: remove warn_no_part

The warn_no_part is initialized to 1 and never changed.  Remove
it and execute the code keyed off from it unconditionally.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: cleanup how md_autodetect_dev is called
Christoph Hellwig [Tue, 24 Mar 2020 07:25:19 +0000 (08:25 +0100)]
block: cleanup how md_autodetect_dev is called

Add a new include/linux/raid/detect.h header to declare the
md_autodetect_dev prototype which can be shared between md and
the partition code.  Then use IS_BUILTIN to call it instead of the
ifdef magic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: unexport read_dev_sector and put_dev_sector
Christoph Hellwig [Tue, 24 Mar 2020 07:25:18 +0000 (08:25 +0100)]
block: unexport read_dev_sector and put_dev_sector

read_dev_sector and put_dev_sector are now only used by the partition
parsing code.  Remove the export for read_dev_sector and merge it into
the only caller.  Clean the mess up a bit by using goto labels and
the SECTOR_SHIFT constant.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoscsi: simplify scsi_partsize
Christoph Hellwig [Tue, 24 Mar 2020 07:25:17 +0000 (08:25 +0100)]
scsi: simplify scsi_partsize

Call scsi_bios_ptable from scsi_partsize instead of requiring boilerplate
code in the callers.  Also switch the calling convention to match that
of the ->bios_param instances calling this function, and use true/false
for the return value instead of the weird -1 convention.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoscsi: move scsicam_bios_param to the end of scsicam.c
Christoph Hellwig [Tue, 24 Mar 2020 07:25:16 +0000 (08:25 +0100)]
scsi: move scsicam_bios_param to the end of scsicam.c

This avoids the need for a forward declaration and generally keeps the
file in the lower level first, high level last order.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoscsi: simplify scsi_bios_ptable
Christoph Hellwig [Tue, 24 Mar 2020 07:25:15 +0000 (08:25 +0100)]
scsi: simplify scsi_bios_ptable

Use read_mapping_page and kmemdup instead of the odd read_dev_sector and
put_dev_sector helpers from the partitioning code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: remove alloc_part_info and free_part_info
Christoph Hellwig [Tue, 24 Mar 2020 07:25:14 +0000 (08:25 +0100)]
block: remove alloc_part_info and free_part_info

There isn't any good reason not to simply open code the allocation and
freeing of the partition_meta_info structure.  Especially as one of
the branches in alloc_part_info is entirely dead code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: move sysfs methods shared by disks and partitions to genhd.c
Christoph Hellwig [Tue, 24 Mar 2020 07:25:13 +0000 (08:25 +0100)]
block: move sysfs methods shared by disks and partitions to genhd.c

Move the sysfs _show methods that are used both on the full disk and
partition nodes to genhd.c instead of hiding them in the partitioning
code.  Also move the declaration for these methods to block/blk.h so
that we don't expose them to drivers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: move disk_name and related helpers out of partition-generic.c
Christoph Hellwig [Tue, 24 Mar 2020 07:25:12 +0000 (08:25 +0100)]
block: move disk_name and related helpers out of partition-generic.c

Thes functions aren't really related to partition support, so move them
to a more suitable place.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: remove __bdevname
Christoph Hellwig [Tue, 24 Mar 2020 07:25:11 +0000 (08:25 +0100)]
block: remove __bdevname

There is no good reason for __bdevname to exist.  Just open code
printing the string in the callers.  For three of them the format
string can be trivially merged into existing printk statements,
and in init/do_mounts.c we can at least do the scnprintf once at
the start of the function, and unconditional of CONFIG_BLOCK to
make the output for tiny configfs a little more helpful.

Acked-by: Theodore Ts'o <tytso@mit.edu> # for ext4
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: remove the blk_lookup_devt export
Christoph Hellwig [Tue, 24 Mar 2020 07:25:10 +0000 (08:25 +0100)]
block: remove the blk_lookup_devt export

This function is only used by init/do_mounts.c, which can't be modular.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock, bfq: invoke flush_idle_tree after reparent_active_queues in pd_offline
Paolo Valente [Sat, 21 Mar 2020 09:45:21 +0000 (10:45 +0100)]
block, bfq: invoke flush_idle_tree after reparent_active_queues in pd_offline

In bfq_pd_offline(), the function bfq_flush_idle_tree() is invoked to
flush the rb tree that contains all idle entities belonging to the pd
(cgroup) being destroyed. In particular, bfq_flush_idle_tree() is
invoked before bfq_reparent_active_queues(). Yet the latter may happen
to add some entities to the idle tree. It happens if, in some of the
calls to bfq_bfqq_move() performed by bfq_reparent_active_queues(),
the queue to move is empty and gets expired.

This commit simply reverses the invocation order between
bfq_flush_idle_tree() and bfq_reparent_active_queues().

Tested-by: cki-project@redhat.com
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock, bfq: make reparent_leaf_entity actually work only on leaf entities
Paolo Valente [Sat, 21 Mar 2020 09:45:20 +0000 (10:45 +0100)]
block, bfq: make reparent_leaf_entity actually work only on leaf entities

bfq_reparent_leaf_entity() reparents the input leaf entity (a leaf
entity represents just a bfq_queue in an entity tree). Yet, the input
entity is guaranteed to always be a leaf entity only in two-level
entity trees. In this respect, because of the error fixed by
commit 14afc5936197 ("block, bfq: fix overwrite of bfq_group pointer
in bfq_find_set_group()"), all (wrongly collapsed) entity trees happened
to actually have only two levels. After the latter commit, this does not
hold any longer.

This commit fixes this problem by modifying
bfq_reparent_leaf_entity(), so that it searches an active leaf entity
down the path that stems from the input entity. Such a leaf entity is
guaranteed to exist when bfq_reparent_leaf_entity() is invoked.

Tested-by: cki-project@redhat.com
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock, bfq: turn put_queue into release_process_ref in __bfq_bic_change_cgroup
Paolo Valente [Sat, 21 Mar 2020 09:45:19 +0000 (10:45 +0100)]
block, bfq: turn put_queue into release_process_ref in __bfq_bic_change_cgroup

A bfq_put_queue() may be invoked in __bfq_bic_change_cgroup(). The
goal of this put is to release a process reference to a bfq_queue. But
process-reference releases may trigger also some extra operation, and,
to this goal, are handled through bfq_release_process_ref(). So, turn
the invocation of bfq_put_queue() into an invocation of
bfq_release_process_ref().

Tested-by: cki-project@redhat.com
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock, bfq: move forward the getting of an extra ref in bfq_bfqq_move
Paolo Valente [Sat, 21 Mar 2020 09:45:18 +0000 (10:45 +0100)]
block, bfq: move forward the getting of an extra ref in bfq_bfqq_move

Commit ecedd3d7e199 ("block, bfq: get extra ref to prevent a queue
from being freed during a group move") gets an extra reference to a
bfq_queue before possibly deactivating it (temporarily), in
bfq_bfqq_move(). This prevents the bfq_queue from disappearing before
being reactivated in its new group.

Yet, the bfq_queue may also be expired (i.e., its service may be
stopped) before the bfq_queue is deactivated. And also an expiration
may lead to a premature freeing. This commit fixes this issue by
simply moving forward the getting of the extra reference already
introduced by commit ecedd3d7e199 ("block, bfq: get extra ref to
prevent a queue from being freed during a group move").

Reported-by: cki-project@redhat.com
Tested-by: cki-project@redhat.com
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock, bfq: fix use-after-free in bfq_idle_slice_timer_body
Zhiqiang Liu [Thu, 19 Mar 2020 11:18:13 +0000 (19:18 +0800)]
block, bfq: fix use-after-free in bfq_idle_slice_timer_body

In bfq_idle_slice_timer func, bfqq = bfqd->in_service_queue is
not in bfqd-lock critical section. The bfqq, which is not
equal to NULL in bfq_idle_slice_timer, may be freed after passing
to bfq_idle_slice_timer_body. So we will access the freed memory.

In addition, considering the bfqq may be in race, we should
firstly check whether bfqq is in service before doing something
on it in bfq_idle_slice_timer_body func. If the bfqq in race is
not in service, it means the bfqq has been expired through
__bfq_bfqq_expire func, and wait_request flags has been cleared in
__bfq_bfqd_reset_in_service func. So we do not need to re-clear the
wait_request of bfqq which is not in service.

KASAN log is given as follows:
[13058.354613] ==================================================================
[13058.354640] BUG: KASAN: use-after-free in bfq_idle_slice_timer+0xac/0x290
[13058.354644] Read of size 8 at addr ffffa02cf3e63f78 by task fork13/19767
[13058.354646]
[13058.354655] CPU: 96 PID: 19767 Comm: fork13
[13058.354661] Call trace:
[13058.354667]  dump_backtrace+0x0/0x310
[13058.354672]  show_stack+0x28/0x38
[13058.354681]  dump_stack+0xd8/0x108
[13058.354687]  print_address_description+0x68/0x2d0
[13058.354690]  kasan_report+0x124/0x2e0
[13058.354697]  __asan_load8+0x88/0xb0
[13058.354702]  bfq_idle_slice_timer+0xac/0x290
[13058.354707]  __hrtimer_run_queues+0x298/0x8b8
[13058.354710]  hrtimer_interrupt+0x1b8/0x678
[13058.354716]  arch_timer_handler_phys+0x4c/0x78
[13058.354722]  handle_percpu_devid_irq+0xf0/0x558
[13058.354731]  generic_handle_irq+0x50/0x70
[13058.354735]  __handle_domain_irq+0x94/0x110
[13058.354739]  gic_handle_irq+0x8c/0x1b0
[13058.354742]  el1_irq+0xb8/0x140
[13058.354748]  do_wp_page+0x260/0xe28
[13058.354752]  __handle_mm_fault+0x8ec/0x9b0
[13058.354756]  handle_mm_fault+0x280/0x460
[13058.354762]  do_page_fault+0x3ec/0x890
[13058.354765]  do_mem_abort+0xc0/0x1b0
[13058.354768]  el0_da+0x24/0x28
[13058.354770]
[13058.354773] Allocated by task 19731:
[13058.354780]  kasan_kmalloc+0xe0/0x190
[13058.354784]  kasan_slab_alloc+0x14/0x20
[13058.354788]  kmem_cache_alloc_node+0x130/0x440
[13058.354793]  bfq_get_queue+0x138/0x858
[13058.354797]  bfq_get_bfqq_handle_split+0xd4/0x328
[13058.354801]  bfq_init_rq+0x1f4/0x1180
[13058.354806]  bfq_insert_requests+0x264/0x1c98
[13058.354811]  blk_mq_sched_insert_requests+0x1c4/0x488
[13058.354818]  blk_mq_flush_plug_list+0x2d4/0x6e0
[13058.354826]  blk_flush_plug_list+0x230/0x548
[13058.354830]  blk_finish_plug+0x60/0x80
[13058.354838]  read_pages+0xec/0x2c0
[13058.354842]  __do_page_cache_readahead+0x374/0x438
[13058.354846]  ondemand_readahead+0x24c/0x6b0
[13058.354851]  page_cache_sync_readahead+0x17c/0x2f8
[13058.354858]  generic_file_buffered_read+0x588/0xc58
[13058.354862]  generic_file_read_iter+0x1b4/0x278
[13058.354965]  ext4_file_read_iter+0xa8/0x1d8 [ext4]
[13058.354972]  __vfs_read+0x238/0x320
[13058.354976]  vfs_read+0xbc/0x1c0
[13058.354980]  ksys_read+0xdc/0x1b8
[13058.354984]  __arm64_sys_read+0x50/0x60
[13058.354990]  el0_svc_common+0xb4/0x1d8
[13058.354994]  el0_svc_handler+0x50/0xa8
[13058.354998]  el0_svc+0x8/0xc
[13058.354999]
[13058.355001] Freed by task 19731:
[13058.355007]  __kasan_slab_free+0x120/0x228
[13058.355010]  kasan_slab_free+0x10/0x18
[13058.355014]  kmem_cache_free+0x288/0x3f0
[13058.355018]  bfq_put_queue+0x134/0x208
[13058.355022]  bfq_exit_icq_bfqq+0x164/0x348
[13058.355026]  bfq_exit_icq+0x28/0x40
[13058.355030]  ioc_exit_icq+0xa0/0x150
[13058.355035]  put_io_context_active+0x250/0x438
[13058.355038]  exit_io_context+0xd0/0x138
[13058.355045]  do_exit+0x734/0xc58
[13058.355050]  do_group_exit+0x78/0x220
[13058.355054]  __wake_up_parent+0x0/0x50
[13058.355058]  el0_svc_common+0xb4/0x1d8
[13058.355062]  el0_svc_handler+0x50/0xa8
[13058.355066]  el0_svc+0x8/0xc
[13058.355067]
[13058.355071] The buggy address belongs to the object at ffffa02cf3e63e70#012 which belongs to the cache bfq_queue of size 464
[13058.355075] The buggy address is located 264 bytes inside of#012 464-byte region [ffffa02cf3e63e70ffffa02cf3e64040)
[13058.355077] The buggy address belongs to the page:
[13058.355083] page:ffff7e80b3cf9800 count:1 mapcount:0 mapping:ffff802db5c90780 index:0xffffa02cf3e606f0 compound_mapcount: 0
[13058.366175] flags: 0x2ffffe0000008100(slab|head)
[13058.370781] raw: 2ffffe0000008100 ffff7e80b53b1408 ffffa02d730c1c90 ffff802db5c90780
[13058.370787] raw: ffffa02cf3e606f0 0000000000370023 00000001ffffffff 0000000000000000
[13058.370789] page dumped because: kasan: bad access detected
[13058.370791]
[13058.370792] Memory state around the buggy address:
[13058.370797]  ffffa02cf3e63e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fb fb
[13058.370801]  ffffa02cf3e63e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[13058.370805] >ffffa02cf3e63f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[13058.370808]                                                                 ^
[13058.370811]  ffffa02cf3e63f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[13058.370815]  ffffa02cf3e64000: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[13058.370817] ==================================================================
[13058.370820] Disabling lock debugging due to kernel taint

Here, we directly pass the bfqd to bfq_idle_slice_timer_body func.
--
V2->V3: rewrite the comment as suggested by Paolo Valente
V1->V2: add one comment, and add Fixes and Reported-by tag.

Fixes: aee69d78d ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler")
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Reported-by: Wang Wang <wangwang2@huawei.com>
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Signed-off-by: Feilong Lin <linfeilong@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoscsi: Convert to use set_capacity_revalidate_and_notify
Balbir Singh [Fri, 13 Mar 2020 05:30:09 +0000 (05:30 +0000)]
scsi: Convert to use set_capacity_revalidate_and_notify

block/genhd provides set_capacity_revalidate_and_notify() for sending RESIZE
notifications via uevents. This notification is newly added to scsi sd.

Signed-off-by: Balbir Singh <sblbir@amazon.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonvme: Convert to use set_capacity_revalidate_and_notify
Balbir Singh [Fri, 13 Mar 2020 05:30:08 +0000 (05:30 +0000)]
nvme: Convert to use set_capacity_revalidate_and_notify

block/genhd provides set_capacity_revalidate_and_notify() for
sending RESIZE notifications via uevents. This notification is
newly added to NVME devices

Signed-off-by: Balbir Singh <sblbir@amazon.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoxen-blkfront.c: Convert to use set_capacity_revalidate_and_notify
Balbir Singh [Fri, 13 Mar 2020 05:30:07 +0000 (05:30 +0000)]
xen-blkfront.c: Convert to use set_capacity_revalidate_and_notify

block/genhd provides set_capacity_revalidate_and_notify() for
sending RESIZE notifications via uevents.

Signed-off-by: Balbir Singh <sblbir@amazon.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agovirtio_blk.c: Convert to use set_capacity_revalidate_and_notify
Balbir Singh [Fri, 13 Mar 2020 05:30:06 +0000 (05:30 +0000)]
virtio_blk.c: Convert to use set_capacity_revalidate_and_notify

block/genhd provides set_capacity_revalidate_and_notify() for sending RESIZE
notifications via uevents.

Signed-off-by: Balbir Singh <sblbir@amazon.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock/genhd: Notify udev about capacity change
Balbir Singh [Fri, 13 Mar 2020 05:30:05 +0000 (05:30 +0000)]
block/genhd: Notify udev about capacity change

Allow block/genhd to notify user space (via udev) about disk size changes
using a new helper set_capacity_revalidate_and_notify(), which is a wrapper
on top of set_capacity(). set_capacity_revalidate_and_notify() will only
notify via udev if the current capacity or the target capacity is not zero
and iff the capacity changes.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Someswarudu Sangaraju <ssomesh@amazon.com>
Signed-off-by: Balbir Singh <sblbir@amazon.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: Prevent hung_check firing during long sync IO
Ming Lei [Wed, 18 Mar 2020 03:43:36 +0000 (11:43 +0800)]
block: Prevent hung_check firing during long sync IO

submit_bio_wait() can be called from ioctl(BLKSECDISCARD), which
may take long time to complete, as Salman mentioned, 4K BLKSECDISCARD
takes up to 100 second on some devices. Also any block I/O operation
that occurs after the BLKSECDISCARD is submitted will also potentially
be affected by the hung task timeouts.

Another report is that task hang can be observed when running mkfs
over raid10 which takes a small max discard sectors limit because
of chunk size.

So prevent hung_check from firing by taking same approach used
in blk_execute_rq(), and the wake-up interval is set as half the
hung_check timer period, which keeps overhead low enough.

Cc: Salman Qazi <sqazi@google.com>
Cc: Jesse Barnes <jsbarnes@google.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Link: https://lkml.org/lkml/2020/2/12/1193
Reported-by: Salman Qazi <sqazi@google.com>
Reviewed-by: Jesse Barnes <jsbarnes@google.com>
Reviewed-by: Salman Qazi <sqazi@google.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: fix a device invalidation regression
Christoph Hellwig [Wed, 18 Mar 2020 08:12:06 +0000 (09:12 +0100)]
block: fix a device invalidation regression

Historically we only set the capacity to zero for devices that support
partitions (independ of actually having partitions created).  Doing that
is rather inconsistent, but changing it broke legacy udisks polling for
legacy ide-cdrom devices.  Use the crude a crude check for devices that
either are non-removable or partitionable to get the sane behavior for
most device while not breaking userspace for this particular setup.

Fixes: a1548b674403 ("block: move rescan_partitions to fs/block_dev.c")
Reported-by: He Zhe <zhe.he@windriver.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: He Zhe <zhe.he@windriver.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock, zoned: fix integer overflow with BLKRESETZONE et al
Alexey Dobriyan [Wed, 12 Feb 2020 17:40:27 +0000 (20:40 +0300)]
block, zoned: fix integer overflow with BLKRESETZONE et al

Check for overflow in addition before checking for end-of-block-device.

Steps to reproduce:

#define _GNU_SOURCE 1
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

typedef unsigned long long __u64;

struct blk_zone_range {
        __u64 sector;
        __u64 nr_sectors;
};

#define BLKRESETZONE    _IOW(0x12, 131, struct blk_zone_range)

int main(void)
{
        int fd = open("/dev/nullb0", O_RDWR|O_DIRECT);
        struct blk_zone_range zr = {4096, 0xfffffffffffff000ULL};
        ioctl(fd, BLKRESETZONE, &zr);
        return 0;
}

BUG: KASAN: null-ptr-deref in submit_bio_wait+0x74/0xe0
Write of size 8 at addr 0000000000000040 by task a.out/1590

CPU: 8 PID: 1590 Comm: a.out Not tainted 5.6.0-rc1-00019-g359c92c02bfa #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190711_202441-buildvm-armv7-10.arm.fedoraproject.org-2.fc31 04/01/2014
Call Trace:
 dump_stack+0x76/0xa0
 __kasan_report.cold+0x5/0x3e
 kasan_report+0xe/0x20
 submit_bio_wait+0x74/0xe0
 blkdev_zone_mgmt+0x26f/0x2a0
 blkdev_zone_mgmt_ioctl+0x14b/0x1b0
 blkdev_ioctl+0xb28/0xe60
 block_ioctl+0x69/0x80
 ksys_ioctl+0x3af/0xa50

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alexey Dobriyan (SK hynix) <adobriyan@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblk-iocost: remove duplicated lines in comments
Weiping Zhang [Thu, 27 Feb 2020 01:38:46 +0000 (09:38 +0800)]
blk-iocost: remove duplicated lines in comments

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: sed-opal: Change the check condition for regular session validity
Revanth Rajashekar [Tue, 3 Mar 2020 19:17:00 +0000 (12:17 -0700)]
block: sed-opal: Change the check condition for regular session validity

This patch changes the check condition for the validity/authentication
of the session.

1. The Host Session Number(HSN) in the response should match the HSN for
   the session.
2. The TPER Session Number(TSN) can never be less than 4096 for a regular
   session.

Reference:
Section 3.2.2.1   of https://trustedcomputinggroup.org/wp-content/uploads/TCG_Storage_Opal_SSC_Application_Note_1-00_1-00-Final.pdf
Section 3.3.7.1.1 of https://trustedcomputinggroup.org/wp-content/uploads/TCG_Storage_Architecture_Core_Spec_v2.01_r1.00.pdf

Co-developed-by: Andrzej Jakowski <andrzej.jakowski@linux.intel.com>
Signed-off-by: Andrzej Jakowski <andrzej.jakowski@linux.intel.com>
Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: Document genhd capability flags
Stephen Kitt [Sat, 7 Mar 2020 14:56:59 +0000 (15:56 +0100)]
block: Document genhd capability flags

The kernel documentation includes a brief section about genhd
capabilities, but it turns out that the only documented
capability (GENHD_FL_MEDIA_CHANGE_NOTIFY) isn't used any more.

This patch removes that flag, and documents the rest, based on my
understanding of the current uses of these flags in the kernel. The
documentation is kept in the header file, alongside the declarations,
in the hope that it will be kept up-to-date in future; the kernel
documentation is changed to include the documentation generated from
the header file.

Because the ultimate goal is to provide some end-user
documentation (or end-administrator documentation), the comments are
perhaps more user-oriented than might be expected. Since the values
are shown to users in hexadecimal, the documentation lists them in
hexadecimal, and the constant declarations are adjusted to match.

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Stephen Kitt <steve@sk2.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: cleanup comment for blk_flush_complete_seq
Guoqing Jiang [Mon, 9 Mar 2020 21:41:38 +0000 (22:41 +0100)]
block: cleanup comment for blk_flush_complete_seq

Remove the comment about return value, since it is not valid after
commit 404b8f5a03d84 ("block: cleanup kick/queued handling").

Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: remove unneeded argument from blk_alloc_flush_queue
Guoqing Jiang [Mon, 9 Mar 2020 21:41:37 +0000 (22:41 +0100)]
block: remove unneeded argument from blk_alloc_flush_queue

Remove 'q' from arguments since it is not used anymore after
commit 7e992f847a08e ("block: remove non mq parts from the
flush code").

Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: cleanup for _blk/blk_rq_prep_clone
Guoqing Jiang [Mon, 9 Mar 2020 21:41:36 +0000 (22:41 +0100)]
block: cleanup for _blk/blk_rq_prep_clone

Both cmd and sense had been moved to scsi_request, so remove
the related comments to avoid confusion.

And as Bart suggested, move _blk_rq_prep_clone into the only
caller (blk_rq_prep_clone).

Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: remove redundant setting of QUEUE_FLAG_DYING
Guoqing Jiang [Mon, 9 Mar 2020 21:41:35 +0000 (22:41 +0100)]
block: remove redundant setting of QUEUE_FLAG_DYING

Previously, blk_cleanup_queue has called blk_set_queue_dying to set the
flag, no need to do it again.

Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: use bio_{wouldblock,io}_error in direct_make_request
Guoqing Jiang [Mon, 9 Mar 2020 21:41:34 +0000 (22:41 +0100)]
block: use bio_{wouldblock,io}_error in direct_make_request

Use the two functions to simplify code.

Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: fix comment for blk_cloned_rq_check_limits
Guoqing Jiang [Mon, 9 Mar 2020 21:41:33 +0000 (22:41 +0100)]
block: fix comment for blk_cloned_rq_check_limits

Since the later description mentioned "checked against the new queue
limits", so make the change to avoid confusion.

Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblock: Fix use-after-free issue accessing struct io_cq
Sahitya Tummala [Wed, 11 Mar 2020 10:37:50 +0000 (16:07 +0530)]
block: Fix use-after-free issue accessing struct io_cq

There is a potential race between ioc_release_fn() and
ioc_clear_queue() as shown below, due to which below kernel
crash is observed. It also can result into use-after-free
issue.

context#1: context#2:
ioc_release_fn() __ioc_clear_queue() gets the same icq
->spin_lock(&ioc->lock); ->spin_lock(&ioc->lock);
->ioc_destroy_icq(icq);
  ->list_del_init(&icq->q_node);
  ->call_rcu(&icq->__rcu_head,
   icq_free_icq_rcu);
->spin_unlock(&ioc->lock);
->ioc_destroy_icq(icq);
  ->hlist_del_init(&icq->ioc_node);
  This results into below crash as this memory
  is now used by icq->__rcu_head in context#1.
  There is a chance that icq could be free'd
  as well.

22150.386550:   <6> Unable to handle kernel write to read-only memory
at virtual address ffffffaa8d31ca50
...
Call trace:
22150.607350:   <2>  ioc_destroy_icq+0x44/0x110
22150.611202:   <2>  ioc_clear_queue+0xac/0x148
22150.615056:   <2>  blk_cleanup_queue+0x11c/0x1a0
22150.619174:   <2>  __scsi_remove_device+0xdc/0x128
22150.623465:   <2>  scsi_forget_host+0x2c/0x78
22150.627315:   <2>  scsi_remove_host+0x7c/0x2a0
22150.631257:   <2>  usb_stor_disconnect+0x74/0xc8
22150.635371:   <2>  usb_unbind_interface+0xc8/0x278
22150.639665:   <2>  device_release_driver_internal+0x198/0x250
22150.644897:   <2>  device_release_driver+0x24/0x30
22150.649176:   <2>  bus_remove_device+0xec/0x140
22150.653204:   <2>  device_del+0x270/0x460
22150.656712:   <2>  usb_disable_device+0x120/0x390
22150.660918:   <2>  usb_disconnect+0xf4/0x2e0
22150.664684:   <2>  hub_event+0xd70/0x17e8
22150.668197:   <2>  process_one_work+0x210/0x480
22150.672222:   <2>  worker_thread+0x32c/0x4c8

Fix this by adding a new ICQ_DESTROYED flag in ioc_destroy_icq() to
indicate this icq is once marked as destroyed. Also, ensure
__ioc_clear_queue() is accessing icq within rcu_read_lock/unlock so
that icq doesn't get free'd up while it is still using it.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Co-developed-by: Pradeep P V K <ppvk@codeaurora.org>
Signed-off-by: Pradeep P V K <ppvk@codeaurora.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonull_blk: Add support for init_hctx() fault injection
Bart Van Assche [Tue, 10 Mar 2020 04:26:23 +0000 (21:26 -0700)]
null_blk: Add support for init_hctx() fault injection

This makes it possible to test the error path in blk_mq_realloc_hw_ctxs()
and also several error paths in null_blk.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonull_blk: Handle null_add_dev() failures properly
Bart Van Assche [Tue, 10 Mar 2020 04:26:22 +0000 (21:26 -0700)]
null_blk: Handle null_add_dev() failures properly

If null_add_dev() fails then null_del_dev() is called with a NULL argument.
Make null_del_dev() handle this scenario correctly. This patch fixes the
following KASAN complaint:

null-ptr-deref in null_del_dev+0x28/0x280 [null_blk]
Read of size 8 at addr 0000000000000000 by task find/1062

Call Trace:
 dump_stack+0xa5/0xe6
 __kasan_report.cold+0x65/0x99
 kasan_report+0x16/0x20
 __asan_load8+0x58/0x90
 null_del_dev+0x28/0x280 [null_blk]
 nullb_group_drop_item+0x7e/0xa0 [null_blk]
 client_drop_item+0x53/0x80 [configfs]
 configfs_rmdir+0x395/0x4e0 [configfs]
 vfs_rmdir+0xb6/0x220
 do_rmdir+0x238/0x2c0
 __x64_sys_unlinkat+0x75/0x90
 do_syscall_64+0x6f/0x2f0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonull_blk: Fix the null_add_dev() error path
Bart Van Assche [Tue, 10 Mar 2020 04:26:21 +0000 (21:26 -0700)]
null_blk: Fix the null_add_dev() error path

If null_add_dev() fails, clear dev->nullb.

This patch fixes the following KASAN complaint:

BUG: KASAN: use-after-free in nullb_device_submit_queues_store+0xcf/0x160 [null_blk]
Read of size 8 at addr ffff88803280fc30 by task check/8409

Call Trace:
 dump_stack+0xa5/0xe6
 print_address_description.constprop.0+0x26/0x260
 __kasan_report.cold+0x7b/0x99
 kasan_report+0x16/0x20
 __asan_load8+0x58/0x90
 nullb_device_submit_queues_store+0xcf/0x160 [null_blk]
 configfs_write_file+0x1c4/0x250 [configfs]
 __vfs_write+0x4c/0x90
 vfs_write+0x145/0x2c0
 ksys_write+0xd7/0x180
 __x64_sys_write+0x47/0x50
 do_syscall_64+0x6f/0x2f0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7ff370926317
Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007fff2dd2da48 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007ff370926317
RDX: 0000000000000002 RSI: 0000559437ef23f0 RDI: 0000000000000001
RBP: 0000559437ef23f0 R08: 000000000000000a R09: 0000000000000001
R10: 0000559436703471 R11: 0000000000000246 R12: 0000000000000002
R13: 00007ff370a006a0 R14: 00007ff370a014a0 R15: 00007ff370a008a0

Allocated by task 8409:
 save_stack+0x23/0x90
 __kasan_kmalloc.constprop.0+0xcf/0xe0
 kasan_kmalloc+0xd/0x10
 kmem_cache_alloc_node_trace+0x129/0x4c0
 null_add_dev+0x24a/0xe90 [null_blk]
 nullb_device_power_store+0x1b6/0x270 [null_blk]
 configfs_write_file+0x1c4/0x250 [configfs]
 __vfs_write+0x4c/0x90
 vfs_write+0x145/0x2c0
 ksys_write+0xd7/0x180
 __x64_sys_write+0x47/0x50
 do_syscall_64+0x6f/0x2f0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 8409:
 save_stack+0x23/0x90
 __kasan_slab_free+0x112/0x160
 kasan_slab_free+0x12/0x20
 kfree+0xdf/0x250
 null_add_dev+0xaf3/0xe90 [null_blk]
 nullb_device_power_store+0x1b6/0x270 [null_blk]
 configfs_write_file+0x1c4/0x250 [configfs]
 __vfs_write+0x4c/0x90
 vfs_write+0x145/0x2c0
 ksys_write+0xd7/0x180
 __x64_sys_write+0x47/0x50
 do_syscall_64+0x6f/0x2f0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 2984c8684f96 ("nullb: factor disk parameters")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonull_blk: Fix changing the number of hardware queues
Bart Van Assche [Tue, 10 Mar 2020 04:26:20 +0000 (21:26 -0700)]
null_blk: Fix changing the number of hardware queues

Instead of initializing null_blk hardware queues explicitly after the
request queue has been created, provide .init_hctx() and .exit_hctx()
callback functions. The latter functions are not only called during
request queue allocation but also when the number of hardware queues
changes. Allocate nr_cpu_ids queues during initialization to support
increasing the number of hardware queues above the initial hardware
queue count.

This change fixes increasing the number of hardware queues above the
initial number of hardware queues and also keeps nullb->nr_queues in
sync with the number of hardware queues.

Fixes: 45919fbfe1c4 ("null_blk: Enable modifying 'submit_queues' after an instance has been configured")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agonull_blk: Suppress an UBSAN complaint triggered when setting 'memory_backed'
Bart Van Assche [Tue, 10 Mar 2020 04:26:19 +0000 (21:26 -0700)]
null_blk: Suppress an UBSAN complaint triggered when setting 'memory_backed'

Although it is not clear to me why UBSAN complains when 'memory_backed'
is set, this patch suppresses the UBSAN complaint that is triggered when
setting that configfs attribute.

UBSAN: Undefined behaviour in drivers/block/null_blk_main.c:327:1
load of value 16 is not a valid value for type '_Bool'
CPU: 2 PID: 8396 Comm: check Not tainted 5.6.0-rc1-dbg+ #14
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
 dump_stack+0xa5/0xe6
 ubsan_epilogue+0x9/0x26
 __ubsan_handle_load_invalid_value+0x6d/0x76
 nullb_device_memory_backed_store.cold+0x2c/0x38 [null_blk]
 configfs_write_file+0x1c4/0x250 [configfs]
 __vfs_write+0x4c/0x90
 vfs_write+0x145/0x2c0
 ksys_write+0xd7/0x180
 __x64_sys_write+0x47/0x50
 do_syscall_64+0x6f/0x2f0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblk-mq: Fix a recently introduced regression in blk_mq_realloc_hw_ctxs()
Bart Van Assche [Tue, 10 Mar 2020 04:26:18 +0000 (21:26 -0700)]
blk-mq: Fix a recently introduced regression in blk_mq_realloc_hw_ctxs()

q->nr_hw_queues must only be updated once it is known that
blk_mq_realloc_hw_ctxs() has succeeded. Otherwise it can happen that
reallocation fails and that q->nr_hw_queues is larger than the number of
allocated hardware queues. This patch fixes the following crash if
increasing the number of hardware queues fails:

BUG: KASAN: null-ptr-deref in blk_mq_map_swqueue+0x775/0x810
Write of size 8 at addr 0000000000000118 by task check/977

CPU: 3 PID: 977 Comm: check Not tainted 5.6.0-rc1-dbg+ #8
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
 dump_stack+0xa5/0xe6
 __kasan_report.cold+0x65/0x99
 kasan_report+0x16/0x20
 check_memory_region+0x140/0x1b0
 memset+0x28/0x40
 blk_mq_map_swqueue+0x775/0x810
 blk_mq_update_nr_hw_queues+0x468/0x710
 nullb_device_submit_queues_store+0xf7/0x1a0 [null_blk]
 configfs_write_file+0x1c4/0x250 [configfs]
 __vfs_write+0x4c/0x90
 vfs_write+0x145/0x2c0
 ksys_write+0xd7/0x180
 __x64_sys_write+0x47/0x50
 do_syscall_64+0x6f/0x2f0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: ac0d6b926e74 ("block: Reduce the amount of memory required per request queue")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblk-mq: Keep set->nr_hw_queues and set->map[].nr_queues in sync
Bart Van Assche [Tue, 10 Mar 2020 04:26:17 +0000 (21:26 -0700)]
blk-mq: Keep set->nr_hw_queues and set->map[].nr_queues in sync

blk_mq_map_queues() and multiple .map_queues() implementations expect that
set->map[HCTX_TYPE_DEFAULT].nr_queues is set to the number of hardware
queues. Hence set .nr_queues before calling these functions. This patch
fixes the following kernel warning:

WARNING: CPU: 0 PID: 2501 at include/linux/cpumask.h:137
Call Trace:
 blk_mq_run_hw_queue+0x19d/0x350 block/blk-mq.c:1508
 blk_mq_run_hw_queues+0x112/0x1a0 block/blk-mq.c:1525
 blk_mq_requeue_work+0x502/0x780 block/blk-mq.c:775
 process_one_work+0x9af/0x1740 kernel/workqueue.c:2269
 worker_thread+0x98/0xe40 kernel/workqueue.c:2415
 kthread+0x361/0x430 kernel/kthread.c:255

Fixes: ed76e329d74a ("blk-mq: abstract out queue map") # v5.0
Reported-by: syzbot+d44e1b26ce5c3e77458d@syzkaller.appspotmail.com
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoblk-mq: Fix a comment in include/linux/blk-mq.h
Bart Van Assche [Tue, 10 Mar 2020 04:26:16 +0000 (21:26 -0700)]
blk-mq: Fix a comment in include/linux/blk-mq.h

The 'hctx_list' member of struct blk_mq_hw_ctx is not a list head but
instead an entry in q->unused_hctx_list. Fix the comment above this
struct member.

Fixes: d386732bc142 ("blk-mq: fill header with kernel-doc")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: André Almeida <andrealmeid@collabora.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoLinux 5.6-rc5
Linus Torvalds [Mon, 9 Mar 2020 00:44:44 +0000 (17:44 -0700)]
Linux 5.6-rc5

4 years agoMerge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Linus Torvalds [Mon, 9 Mar 2020 00:36:22 +0000 (17:36 -0700)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/soc/soc

Pull ARM SoC fixes from Olof Johansson:
 "We've been accruing these for a couple of weeks, so the batch is a bit
  bigger than usual.

  Largest delta is due to a led-bl driver that is added -- there was a
  miscommunication before the merge window and the driver didn't make it
  in. Due to this, the platforms needing it regressed. At this point, it
  seemed easier to add the new driver than unwind the changes.

  Besides that, there are a handful of various fixes:

   - AMD tee memory leak fix

   - A handful of fixlets for i.MX SCU communication

   - A few maintainers woke up and realized DEBUG_FS had been missing
     for a while, so a few updates of that.

  ... and the usual collection of smaller fixes to various platforms"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (37 commits)
  ARM: socfpga_defconfig: Add back DEBUG_FS
  arm64: dts: socfpga: agilex: Fix gmac compatible
  ARM: bcm2835_defconfig: Explicitly restore CONFIG_DEBUG_FS
  arm64: dts: meson: fix gxm-khadas-vim2 wifi
  arm64: dts: meson-sm1-sei610: add missing interrupt-names
  ARM: meson: Drop unneeded select of COMMON_CLK
  ARM: dts: bcm2711: Add pcie0 alias
  ARM: dts: bcm283x: Add missing properties to the PWR LED
  tee: amdtee: fix memory leak in amdtee_open_session()
  ARM: OMAP2+: Fix compile if CONFIG_HAVE_ARM_SMCCC is not set
  arm: dts: dra76x: Fix mmc3 max-frequency
  ARM: dts: dra7: Add "dma-ranges" property to PCIe RC DT nodes
  bus: ti-sysc: Fix 1-wire reset quirk
  ARM: dts: r8a7779: Remove deprecated "renesas, rcar-sata" compatible value
  soc: imx-scu: Align imx sc msg structs to 4
  firmware: imx: Align imx_sc_msg_req_cpu_start to 4
  firmware: imx: scu-pd: Align imx sc msg structs to 4
  firmware: imx: misc: Align imx sc msg structs to 4
  firmware: imx: scu: Ensure sequential TX
  ARM: dts: imx7-colibri: Fix frequency for sd/mmc
  ...

4 years agoMerge tag 'edac_urgent-2020-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 9 Mar 2020 00:33:52 +0000 (17:33 -0700)]
Merge tag 'edac_urgent-2020-03-08' of git://git./linux/kernel/git/ras/ras

Pull EDAC fix from Borislav Petkov:
 "Error reporting fix for synopsys_edac: do not overwrite partial
  decoded error message (Sherry Sun)"

* tag 'edac_urgent-2020-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/synopsys: Do not print an error with back-to-back snprintf() calls

4 years agoMerge tag 'char-misc-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sun, 8 Mar 2020 15:49:44 +0000 (10:49 -0500)]
Merge tag 'char-misc-5.6-rc5' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc fixes from Greg KH:
 "Here are four small char/misc driver fixes for reported issues for
  5.6-rc5.

  These fixes are:

   - binder fix for a potential use-after-free problem found (took two
     tries to get it right)

   - interconnect core fix

   - altera-stapl driver fix

  All four of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  binder: prevent UAF for binderfs devices II
  interconnect: Handle memory allocation errors
  altera-stapl: altera_get_note: prevent write beyond end of 'key'
  binder: prevent UAF for binderfs devices

4 years agoMerge tag 'driver-core-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 8 Mar 2020 15:39:40 +0000 (10:39 -0500)]
Merge tag 'driver-core-5.6-rc5' of git://git./linux/kernel/git/gregkh/driver-core

Pull driver core and debugfs fixes from Greg KH:
 "Here are four small driver core / debugfs patches for 5.6-rc3:

   - debugfs api cleanup now that all debugfs_create_regset32() callers
     have been fixed up. This was waiting until after the -rc1 merge as
     these fixes came in through different trees

   - driver core sync state fixes based on reports of minor issues found
     in the feature

  All of these have been in linux-next with no reported issues"

* tag 'driver-core-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  driver core: Skip unnecessary work when device doesn't have sync_state()
  driver core: Add dev_has_sync_state()
  driver core: Call sync_state() even if supplier has no consumers
  debugfs: remove return value of debugfs_create_regset32()

4 years agoMerge tag 'tty-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Sun, 8 Mar 2020 15:35:04 +0000 (10:35 -0500)]
Merge tag 'tty-5.6-rc5' of git://git./linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
 "Here are some small tty/serial fixes for 5.6-rc5

  Just some small serial driver fixes, and a vt core fixup, full details
  are:

   - vt fixes for issues found by syzbot

   - serdev fix for Apple boxes

   - fsl_lpuart serial driver fixes

   - MAINTAINER update for incorrect serial files

   - new device ids for 8250_exar driver

   - mvebu-uart fix

  All of these have been in linux-next with no reported issues"

* tag 'tty-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: serial: fsl_lpuart: free IDs allocated by IDA
  Revert "tty: serial: fsl_lpuart: drop EARLYCON_DECLARE"
  serdev: Fix detection of UART devices on Apple machines.
  MAINTAINERS: Add missed files related to Synopsys DesignWare UART
  serial: 8250_exar: add support for ACCES cards
  tty:serial:mvebu-uart:fix a wrong return
  vt: selection, push sel_lock up
  vt: selection, push console lock down

4 years agoMerge tag 'usb-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sun, 8 Mar 2020 15:32:23 +0000 (10:32 -0500)]
Merge tag 'usb-5.6-rc5' of git://git./linux/kernel/git/gregkh/usb

Pull USB/PHY fixes from Greg KH:
 "Here are some small USB and PHY driver fixes for reported issues for
  5.6-rc5.

  Included in here are:

   - phy driver fixes

   - new USB quirks

   - USB cdns3 gadget driver fixes

   - USB hub core fixes

  All of these have been in linux-next with no reported issues"

* tag 'usb-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: dwc3: gadget: Update chain bit correctly when using sg list
  usb: core: port: do error out if usb_autopm_get_interface() fails
  usb: core: hub: do error out if usb_autopm_get_interface() fails
  usb: core: hub: fix unhandled return by employing a void function
  usb: storage: Add quirk for Samsung Fit flash
  usb: quirks: add NO_LPM quirk for Logitech Screen Share
  usb: usb251xb: fix regulator probe and error handling
  phy: allwinner: Fix GENMASK misuse
  usb: cdns3: gadget: toggle cycle bit before reset endpoint
  usb: cdns3: gadget: link trb should point to next request
  phy: mapphone-mdm6600: Fix timeouts by adding wake-up handling
  phy: brcm-sata: Correct MDIO operations for 40nm platforms
  phy: ti: gmii-sel: do not fail in case of gmii
  phy: ti: gmii-sel: fix set of copy-paste errors
  phy: core: Fix phy_get() to not return error on link creation failure
  phy: mapphone-mdm6600: Fix write timeouts with shorter GPIO toggle interval

4 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Linus Torvalds [Sun, 8 Mar 2020 01:52:55 +0000 (19:52 -0600)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Nothing particularly exciting, some small ODP regressions from the mmu
  notifier rework, another bunch of syzkaller fixes, and a bug fix for a
  botched syzkaller fix in the first rc pull request.

   - Fix busted syzkaller fix in 'get_new_pps' - this turned out to
     crash on certain HW configurations

   - Bug fixes for various missed things in error unwinds

   - Add a missing rcu_read_lock annotation in hfi/qib

   - Fix two ODP related regressions from the recent mmu notifier
     changes

   - Several more syzkaller bugs in siw, RDMA netlink, verbs and iwcm

   - Revert an old patch in CMA as it is now shown to not be allocating
     port numbers properly"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/iwcm: Fix iwcm work deallocation
  RDMA/siw: Fix failure handling during device creation
  RDMA/nldev: Fix crash when set a QP to a new counter but QPN is missing
  RDMA/odp: Ensure the mm is still alive before creating an implicit child
  RDMA/core: Fix protection fault in ib_mr_pool_destroy
  IB/mlx5: Fix implicit ODP race
  IB/hfi1, qib: Ensure RCU is locked when accessing list
  RDMA/core: Fix pkey and port assignment in get_new_pps
  RMDA/cm: Fix missing ib_cm_destroy_id() in ib_cm_insert_listen()
  RDMA/rw: Fix error flow during RDMA context initialization
  RDMA/core: Fix use of logical OR in get_new_pps
  Revert "RDMA/cma: Simplify rdma_resolve_addr() error flow"

4 years agoMerge tag 'io_uring-5.6-2020-03-07' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 7 Mar 2020 20:20:29 +0000 (14:20 -0600)]
Merge tag 'io_uring-5.6-2020-03-07' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "Here are a few io_uring fixes that should go into this release. This
  contains:

   - Removal of (now) unused io_wq_flush() and associated flag (Pavel)

   - Fix cancelation lockup with linked timeouts (Pavel)

   - Fix for potential use-after-free when freeing percpu ref for fixed
     file sets

   - io-wq cancelation fixups (Pavel)"

* tag 'io_uring-5.6-2020-03-07' of git://git.kernel.dk/linux-block:
  io_uring: fix lockup with timeouts
  io_uring: free fixed_file_data after RCU grace period
  io-wq: remove io_wq_flush and IO_WQ_WORK_INTERNAL
  io-wq: fix IO_WQ_WORK_NO_CANCEL cancellation

4 years agoMerge tag 'block-5.6-2020-03-07' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 7 Mar 2020 20:14:38 +0000 (14:14 -0600)]
Merge tag 'block-5.6-2020-03-07' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Here are a few fixes that should go into this release. This contains:

   - Revert of a bad bcache patch from this merge window

   - Removed unused function (Daniel)

   - Fixup for the blktrace fix from Jan from this release (Cengiz)

   - Fix of deeper level bfqq overwrite in BFQ (Carlo)"

* tag 'block-5.6-2020-03-07' of git://git.kernel.dk/linux-block:
  block, bfq: fix overwrite of bfq_group pointer in bfq_find_set_group()
  blktrace: fix dereference after null check
  Revert "bcache: ignore pending signals when creating gc and allocator thread"
  block: Remove used kblockd_schedule_work_on()

4 years agoMerge tag 'media/v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Sat, 7 Mar 2020 18:00:13 +0000 (12:00 -0600)]
Merge tag 'media/v5.6-2' of git://git./linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:

 - a fix for the media controller links in both hantro driver and in
   v4l2-mem2mem core

 - some fixes for the pulse8-cec driver

 - vicodec: handle alpha channel for RGB32 formats, as it may be used

 - mc-entity.c: fix handling of pad flags

* tag 'media/v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: hantro: Fix broken media controller links
  media: mc-entity.c: use & to check pad flags, not ==
  media: v4l2-mem2mem.c: fix broken links
  media: vicodec: process all 4 components for RGB32 formats
  media: pulse8-cec: close serio in disconnect, not adap_free
  media: pulse8-cec: INIT_DELAYED_WORK was called too late

4 years agoio_uring: fix lockup with timeouts
Pavel Begunkov [Fri, 6 Mar 2020 22:15:22 +0000 (01:15 +0300)]
io_uring: fix lockup with timeouts

There is a recipe to deadlock the kernel: submit a timeout sqe with a
linked_timeout (e.g.  test_single_link_timeout_ception() from liburing),
and SIGKILL the process.

Then, io_kill_timeouts() takes @ctx->completion_lock, but the timeout
isn't flagged with REQ_F_COMP_LOCKED, and will try to double grab it
during io_put_free() to cancel the linked timeout. Probably, the same
can happen with another io_kill_timeout() call site, that is
io_commit_cqring().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoMerge tag 's390-5.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Sat, 7 Mar 2020 14:12:47 +0000 (08:12 -0600)]
Merge tag 's390-5.6-5' of git://git./linux/kernel/git/s390/linux

Pull s390 fixes from Vasily Gorbik:

 - Fix panic in gup_fast on large pud by providing an implementation of
   pud_write. This has been overlooked during migration to common gup
   code.

 - Fix unexpected write combining on PCI stores.

* tag 's390-5.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/pci: Fix unexpected write combine on resource
  s390/mm: fix panic in gup_fast on large pud

4 years agoMerge tag 'powerpc-5.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sat, 7 Mar 2020 14:10:34 +0000 (08:10 -0600)]
Merge tag 'powerpc-5.6-4' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Some more powerpc fixes for 5.6:

   - One fix for a recent regression to our breakpoint/watchpoint code.

   - Another fix for our KUAP support, this time a missing annotation in
     a rarely used path in signal handling.

   - A fix for our handling of a CPU feature that effects the PMU, when
     booting guests in some configurations.

   - A minor fix to our linker script to explicitly include the .BTF
     section.

  Thanks to: Christophe Leroy, Desnes A. Nunes do Rosario, Leonardo
  Bras, Naveen N. Rao, Ravi Bangoria, Stefan Berger"

* tag 'powerpc-5.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mm: Fix missing KUAP disable in flush_coherent_icache()
  powerpc: fix hardware PMU exception bug on PowerVM compatibility mode systems
  powerpc: Include .BTF section
  powerpc/watchpoint: Don't call dar_within_range() for Book3S

4 years agoMerge tag 'for-linus-5.6b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 7 Mar 2020 14:04:54 +0000 (08:04 -0600)]
Merge tag 'for-linus-5.6b-rc5-tag' of git://git./linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "Four fixes and a small cleanup patch:

   - two fixes by Dongli Zhang fixing races in the xenbus driver

   - two fixes by me fixing issues introduced in 5.6

   - a small cleanup by Gustavo Silva replacing a zero-length array with
     a flexible-array"

* tag 'for-linus-5.6b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/blkfront: fix ring info addressing
  xen/xenbus: fix locking
  xenbus: req->err should be updated before req->state
  xenbus: req->body should be updated before req->state
  xen: Replace zero-length array with flexible-array member

4 years agoMerge tag 'for-linus-2020-03-07' of gitolite.kernel.org:pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 7 Mar 2020 14:01:43 +0000 (08:01 -0600)]
Merge tag 'for-linus-2020-03-07' of gitolite.pub/scm/linux/kernel/git/brauner/linux

Pull thread fixes from Christian Brauner:
 "Here are a few hopefully uncontroversial fixes:

   - Use RCU_INIT_POINTER() when initializing rcu protected members in
     task_struct to fix sparse warnings.

   - Add pidfd_fdinfo_test binary to .gitignore file"

* tag 'for-linus-2020-03-07' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux:
  selftests: pidfd: Add pidfd_fdinfo_test in .gitignore
  exit: Fix Sparse errors and warnings
  fork: Use RCU_INIT_POINTER() instead of rcu_access_pointer()

4 years agoMerge tag 'sound-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Sat, 7 Mar 2020 13:59:30 +0000 (07:59 -0600)]
Merge tag 'sound-5.6-rc5' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "The regular "bump-in-the-middle" updates, containing mostly ASoC-
  related fixes at this time. All changes are reasonably small.

  A few entries are for ASoC and ALSA core parts (DAPM, PCM, topology)
  for followups of the recent changes and potential buffer overflow by
  snprintf(), while the rest are (both new and old) device-specific
  fixes for Intel, meson, tas2562, rt1015, as well as the usual HD-audio
  quirks"

* tag 'sound-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (25 commits)
  ALSA: sgio2audio: Remove usage of dropped hw_params/hw_free functions
  ALSA: hda/realtek - Enable the headset of ASUS B9450FA with ALC294
  ALSA: hda/realtek - Fix silent output on Gigabyte X570 Aorus Master
  ALSA: hda/realtek - Add Headset Button supported for ThinkPad X1
  ALSA: hda/realtek - Add Headset Mic supported
  ASoC: wm8741: Fix typo in Kconfig prompt
  ASoC: stm32: sai: manage rebind issue
  ASoC: SOF: Fix snd_sof_ipc_stream_posn()
  ASoC: rt1015: modify pre-divider for sysclk
  ASoC: rt1015: add operation callback function for rt1015_dai[]
  ASoC: soc-component: tidyup snd_soc_pcm_component_sync_stop()
  ASoC: dapm: Correct DAPM handling of active widgets during shutdown
  ASoC: tas2562: Fix sample rate error message
  ASoC: Intel: Skylake: Fix available clock counter incrementation
  ASoC: soc-pcm/soc-compress: don't use snd_soc_dapm_stream_stop()
  ASoC: meson: g12a: add tohdmitx reset
  ASoC: pcm512x: Fix unbalanced regulator enable call in probe error path
  ASoC: soc-core: fix for_rtd_codec_dai_rollback() macro
  ASoC: topology: Fix memleak in soc_tplg_manifest_load()
  ASoC: topology: Fix memleak in soc_tplg_link_elems_load()
  ...

4 years agoMerge tag 'asoc-fix-v5.6-rc4' of https://git.kernel.org/pub/scm/linux/kernel/git...
Takashi Iwai [Sat, 7 Mar 2020 06:24:36 +0000 (07:24 +0100)]
Merge tag 'asoc-fix-v5.6-rc4' of https://git./linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v5.6

More fixes that have arrived since the merge window, spread out all
over.  There's a few things like the operation callback addition for
rt1015 and the meson reset addition which add small new bits of
functionality to fix non-working systems, they're all very small and for
parts of newly added functionality.

4 years agoMerge tag 'linux-kselftest-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 6 Mar 2020 23:03:37 +0000 (17:03 -0600)]
Merge tag 'linux-kselftest-5.6-rc5' of git://git./linux/kernel/git/shuah/linux-kselftest

Pull kselftest update from Shuah Khan:
 "This consists of a cleanup patch to undo changes to global .gitignore
  that added selftests/lkdtm objects and add them to a local
  selftests/lkdtm/.gitignore.

  Summary of Linus's comments on local vs. global gitignore scope:

   - Keep local gitignore patterns in local files.

   - Put only global gitignore patterns in the top-level gitignore file.

  Local scope keeps things much better separated. It also incidentally
  means that if a directory gets renamed, the gitignore file continues
  to work unless in the case of renaming the actual files themselves
  that are named in the gitignore"

* tag 'linux-kselftest-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftest/lkdtm: Use local .gitignore

4 years agoMerge tag 'riscv-for-linus-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 6 Mar 2020 22:38:33 +0000 (16:38 -0600)]
Merge tag 'riscv-for-linus-5.6-rc5' of git://git./linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:
 "This contains a handful of fixes that I would like to target for 5.6:

   - A pair of fixes to module loading, which we hope solve the last of
     the issues with module text being loaded too sparsely for our call
     relocations.

   - A Kconfig fix that disallows selecting memory models not supported
     by NOMMU.

   - A series of Kconfig updates to ease selecting the drivers necessary
     to run on QEMU's virt platform.

   - DTS updates for SiFive's HiFive Unleashed.

   - A fix to our seccomp support that avoids mangling restartable
     syscalls"

* tag 'riscv-for-linus-5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: fix seccomp reject syscall code path
  riscv: dts: Add GPIO reboot method to HiFive Unleashed DTS file
  RISC-V: Select Goldfish RTC driver for QEMU virt machine
  RISC-V: Select SYSCON Reboot and Poweroff for QEMU virt machine
  RISC-V: Enable QEMU virt machine support in defconfigs
  RISC-V: Add kconfig option for QEMU virt machine
  riscv: Fix range looking for kernel image memblock
  riscv: Force flat memory model with no-mmu
  riscv: Change code model of module to medany to improve data accessing
  riscv: avoid the PIC offset of static percpu data in module beyond 2G limits

4 years agoparse-maintainers: Mark as executable
Jonathan Neuschäfer [Fri, 6 Mar 2020 22:13:11 +0000 (23:13 +0100)]
parse-maintainers: Mark as executable

This makes the script more convenient to run.

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoMerge tag 'devicetree-fixes-for-5.6-3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 6 Mar 2020 22:11:34 +0000 (16:11 -0600)]
Merge tag 'devicetree-fixes-for-5.6-3' of git://git./linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:
 "Another batch of DT fixes. I think this should be the last of it, but
  sending pull requests seems to cause people to send more fixes.

  Summary:

   - Fixes for warnings introduced by hierarchical PSCI binding changes

   - Fixes for broken doc references due to DT schema conversions

   - Several grammar and typo fixes

   - Fix a bunch of dtc warnings in examples"

* tag 'devicetree-fixes-for-5.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: arm: Fixup the DT bindings for hierarchical PSCI states
  dt-bindings: power: Extend nodename pattern for power-domain providers
  MAINTAINERS: update ALLWINNER CPUFREQ DRIVER entry
  dt-bindings: bus: Drop empty compatible string in example
  dt-bindings: power: Convert domain-idle-states bindings to json-schema
  dt-bindings: arm: Fix cpu compatibles in the hierarchical example for PSCI
  dt-bindings: arm: Correct links to idle states definitions
  dt-bindings: mfd: Fix typo in file name of twl-familly.txt
  dt-bindings: mfd: tps65910: Improve grammar
  dt-bindings: mfd: zii,rave-sp: Fix a typo ("onborad")
  dt-bindings: arm: fsl: fix APF6Dev compatible
  dt-bindings: Fix dtc warnings in examples
  docs: dt: fix several broken doc references
  docs: dt: fix several broken references due to renames
  MAINTAINERS: clean up PCIE DRIVER FOR CAVIUM THUNDERX

4 years agoMerge tag 'drm-fixes-2020-03-06-1' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Fri, 6 Mar 2020 22:08:48 +0000 (16:08 -0600)]
Merge tag 'drm-fixes-2020-03-06-1' of git://anongit.freedesktop.org/drm/drm

Pull vgacon fix from Daniel Vetter:
 "One vgacon input check for stable"

* tag 'drm-fixes-2020-03-06-1' of git://anongit.freedesktop.org/drm/drm:
  vgacon: Fix a UAF in vgacon_invert_region

4 years agoMerge tag 'for-5.6-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Fri, 6 Mar 2020 20:56:46 +0000 (14:56 -0600)]
Merge tag 'for-5.6-rc4-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fix from David Sterba:
 "One fixup for DIO when in use with the new checksums, a missed case
  where the checksum size was still assuming u32"

* tag 'for-5.6-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix RAID direct I/O reads with alternate csums

4 years agoMerge tag 'filelock-v5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton...
Linus Torvalds [Fri, 6 Mar 2020 20:55:27 +0000 (14:55 -0600)]
Merge tag 'filelock-v5.6-1' of git://git./linux/kernel/git/jlayton/linux

Pull file locking fixes from Jeff Layton:
 "Just a couple of late-breaking patches for the file locking code. The
  second patch (from yangerkun) fixes a rather nasty looking potential
  use-after-free that should go to stable.

  The other patch could technically wait for 5.7, but it's fairly
  innocuous so I figured we might as well take it"

* tag 'filelock-v5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  locks: fix a potential use-after-free problem when wakeup a waiter
  fcntl: Distribute switch variables for initialization

4 years agoMerge tag 'spi-fix-v5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Linus Torvalds [Fri, 6 Mar 2020 20:50:16 +0000 (14:50 -0600)]
Merge tag 'spi-fix-v5.6-rc4' of git://git./linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "A selection of small fixes, mostly for drivers, that have arrived
  since the merge window. None of them are earth shattering in
  themselves but all useful for affected systems"

* tag 'spi-fix-v5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: spi_register_controller(): free bus id on error paths
  spi: bcm63xx-hsspi: Really keep pll clk enabled
  spi: atmel-quadspi: fix possible MMIO window size overrun
  spi/zynqmp: remove entry that causes a cs glitch
  spi: pxa2xx: Add CS control clock quirk
  spi: spidev: Fix CS polarity if GPIO descriptors are used
  spi: qup: call spi_qup_pm_resume_runtime before suspending
  spi: spi-omap2-mcspi: Support probe deferral for DMA channels
  spi: spi-omap2-mcspi: Handle DMA size restriction on AM65x

4 years agoMerge tag 'regulator-fix-v5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 6 Mar 2020 20:48:30 +0000 (14:48 -0600)]
Merge tag 'regulator-fix-v5.6-rc4' of git://git./linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
 "A couple of small fixes, one for a minor issue in the stm32-vrefbuf
  driver and a documentation fix in the Qualcomm code"

* tag 'regulator-fix-v5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: stm32-vrefbuf: fix a possible overshoot when re-enabling
  regulator: qcom_spmi: Fix docs for PM8004

4 years agoMerge tag 'hwmon-for-v5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groec...
Linus Torvalds [Fri, 6 Mar 2020 20:47:06 +0000 (14:47 -0600)]
Merge tag 'hwmon-for-v5.6-rc5' of git://git./linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:
 "Fix an error return in the adt7462 driver, bad voltage limits reported
  by the xdpe12284 driver, and a broken documentation reference in the
  adm1177 driver documentation"

* tag 'hwmon-for-v5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (adt7462) Fix an error return in ADT7462_REG_VOLT()
  hwmon: (pmbus/xdpe12284) Add callback for vout limits conversion
  docs: adm1177: fix a broken reference

4 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Fri, 6 Mar 2020 20:35:47 +0000 (14:35 -0600)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "Here are another three arm64 fixes for 5.6, all pretty minor. Main
  thing is fixing a silly bug in the fsl_imx8_ddr PMU driver where we
  would zero the counters when disabling them.

   - Fix misreporting of ASID limit when KPTI is enabled

   - Fix busted NULL pointer checks for GICC structure in ACPI PMU code

   - Avoid nobbling the "fsl_imx8_ddr" PMU counters when disabling them"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: context: Fix ASID limit in boot messages
  drivers/perf: arm_pmu_acpi: Fix incorrect checking of gicc pointer
  drivers/perf: fsl_imx8_ddr: Correct the CLEAR bit definition

4 years agovgacon: Fix a UAF in vgacon_invert_region
Zhang Xiaoxu [Wed, 4 Mar 2020 02:24:29 +0000 (10:24 +0800)]
vgacon: Fix a UAF in vgacon_invert_region

When syzkaller tests, there is a UAF:
  BUG: KASan: use after free in vgacon_invert_region+0x9d/0x110 at addr
    ffff880000100000
  Read of size 2 by task syz-executor.1/16489
  page:ffffea0000004000 count:0 mapcount:-127 mapping:          (null)
  index:0x0
  page flags: 0xfffff00000000()
  page dumped because: kasan: bad access detected
  CPU: 1 PID: 16489 Comm: syz-executor.1 Not tainted
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
  rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
  Call Trace:
    [<ffffffffb119f309>] dump_stack+0x1e/0x20
    [<ffffffffb04af957>] kasan_report+0x577/0x950
    [<ffffffffb04ae652>] __asan_load2+0x62/0x80
    [<ffffffffb090f26d>] vgacon_invert_region+0x9d/0x110
    [<ffffffffb0a39d95>] invert_screen+0xe5/0x470
    [<ffffffffb0a21dcb>] set_selection+0x44b/0x12f0
    [<ffffffffb0a3bfae>] tioclinux+0xee/0x490
    [<ffffffffb0a1d114>] vt_ioctl+0xff4/0x2670
    [<ffffffffb0a0089a>] tty_ioctl+0x46a/0x1a10
    [<ffffffffb052db3d>] do_vfs_ioctl+0x5bd/0xc40
    [<ffffffffb052e2f2>] SyS_ioctl+0x132/0x170
    [<ffffffffb11c9b1b>] system_call_fastpath+0x22/0x27
    Memory state around the buggy address:
     ffff8800000fff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     00 00
     ffff8800000fff80: 00 00 00 00 00 00 00 00 00 00 00 00 00
     00 00 00
    >ffff880000100000: ff ff ff ff ff ff ff ff ff ff ff ff ff
     ff ff ff

It can be reproduce in the linux mainline by the program:
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <fcntl.h>
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <sys/ioctl.h>
  #include <linux/vt.h>

  struct tiocl_selection {
    unsigned short xs;      /* X start */
    unsigned short ys;      /* Y start */
    unsigned short xe;      /* X end */
    unsigned short ye;      /* Y end */
    unsigned short sel_mode; /* selection mode */
  };

  #define TIOCL_SETSEL    2
  struct tiocl {
    unsigned char type;
    unsigned char pad;
    struct tiocl_selection sel;
  };

  int main()
  {
    int fd = 0;
    const char *dev = "/dev/char/4:1";

    struct vt_consize v = {0};
    struct tiocl tioc = {0};

    fd = open(dev, O_RDWR, 0);

    v.v_rows = 3346;
    ioctl(fd, VT_RESIZEX, &v);

    tioc.type = TIOCL_SETSEL;
    ioctl(fd, TIOCLINUX, &tioc);

    return 0;
  }

When resize the screen, update the 'vc->vc_size_row' to the new_row_size,
but when 'set_origin' in 'vgacon_set_origin', vgacon use 'vga_vram_base'
for 'vc_origin' and 'vc_visible_origin', not 'vc_screenbuf'. It maybe
smaller than 'vc_screenbuf'. When TIOCLINUX, use the new_row_size to calc
the offset, it maybe larger than the vga_vram_size in vgacon driver, then
bad access.
Also, if set an larger screenbuf firstly, then set an more larger
screenbuf, when copy old_origin to new_origin, a bad access may happen.

So, If the screen size larger than vga_vram, resize screen should be
failed. This alse fix CVE-2020-8649 and CVE-2020-8647.

Linus pointed out that overflow checking seems absent. We're saved by
the existing bounds checks in vc_do_resize() with rather strict
limits:

if (cols > VC_RESIZE_MAXCOL || lines > VC_RESIZE_MAXROW)
return -EINVAL;

Fixes: 0aec4867dca14 ("[PATCH] SVGATextMode fix")
Reference: CVE-2020-8647 and CVE-2020-8649
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
[danvet: augment commit message to point out overflow safety]
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20200304022429.37738-1-zhangxiaoxu5@huawei.com
4 years agodt-bindings: arm: Fixup the DT bindings for hierarchical PSCI states
Ulf Hansson [Tue, 3 Mar 2020 15:07:47 +0000 (16:07 +0100)]
dt-bindings: arm: Fixup the DT bindings for hierarchical PSCI states

The hierarchical topology with power-domain should be described through
child nodes, rather than as currently described in the PSCI root node. Fix
this by adding a patternProperties with a corresponding reference to the
power-domain DT binding.

Additionally, update the example to conform to the new pattern, but also to
the adjusted domain-idle-state DT binding.

Fixes: a3f048b5424e ("dt: psci: Update DT bindings to support hierarchical PSCI states")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
[robh: Add missing allOf, tweak power-domain node name]
Signed-off-by: Rob Herring <robh@kernel.org>
4 years agodt-bindings: power: Extend nodename pattern for power-domain providers
Ulf Hansson [Tue, 3 Mar 2020 15:07:46 +0000 (16:07 +0100)]
dt-bindings: power: Extend nodename pattern for power-domain providers

The existing binding requires the nodename to have a '@', which is a bit
limiting for the wider use case. Therefore, let's extend the pattern to
allow either '@' or '-'.

Fixes: a3f048b5424e ("dt: psci: Update DT bindings to support hierarchical PSCI states")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
[robh: drop example change]
Signed-off-by: Rob Herring <robh@kernel.org>
4 years agoio_uring: free fixed_file_data after RCU grace period
Jens Axboe [Wed, 4 Mar 2020 14:25:50 +0000 (07:25 -0700)]
io_uring: free fixed_file_data after RCU grace period

The percpu refcount protects this structure, and we can have an atomic
switch in progress when exiting. This makes it unsafe to just free the
struct normally, and can trigger the following KASAN warning:

BUG: KASAN: use-after-free in percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
Read of size 1 at addr ffff888181a19a30 by task swapper/0/0

CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.0-rc4+ #5747
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Call Trace:
 <IRQ>
 dump_stack+0x76/0xa0
 print_address_description.constprop.0+0x3b/0x60
 ? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
 ? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
 __kasan_report.cold+0x1a/0x3d
 ? percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
 percpu_ref_switch_to_atomic_rcu+0xfa/0x1b0
 rcu_core+0x370/0x830
 ? percpu_ref_exit+0x50/0x50
 ? rcu_note_context_switch+0x7b0/0x7b0
 ? run_rebalance_domains+0x11d/0x140
 __do_softirq+0x10a/0x3e9
 irq_exit+0xd5/0xe0
 smp_apic_timer_interrupt+0x86/0x200
 apic_timer_interrupt+0xf/0x20
 </IRQ>
RIP: 0010:default_idle+0x26/0x1f0

Fix this by punting the final exit and free of the struct to RCU, then
we know that it's safe to do so. Jann suggested the approach of using a
double rcu callback to achieve this. It's important that we do a nested
call_rcu() callback, as otherwise the free could be ordered before the
atomic switch, even if the latter was already queued.

Reported-by: syzbot+e017e49c39ab484ac87a@syzkaller.appspotmail.com
Suggested-by: Jann Horn <jannh@google.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agolocks: fix a potential use-after-free problem when wakeup a waiter
yangerkun [Wed, 4 Mar 2020 07:25:56 +0000 (15:25 +0800)]
locks: fix a potential use-after-free problem when wakeup a waiter

'16306a61d3b7 ("fs/locks: always delete_block after waiting.")' add the
logic to check waiter->fl_blocker without blocked_lock_lock. And it will
trigger a UAF when we try to wakeup some waiter:

Thread 1 has create a write flock a on file, and now thread 2 try to
unlock and delete flock a, thread 3 try to add flock b on the same file.

Thread2                         Thread3
                                flock syscall(create flock b)
                        ...flock_lock_inode_wait
    flock_lock_inode(will insert
    our fl_blocked_member list
    to flock a's fl_blocked_requests)
   sleep
flock syscall(unlock)
...flock_lock_inode_wait
    locks_delete_lock_ctx
    ...__locks_wake_up_blocks
        __locks_delete_blocks(
b->fl_blocker = NULL)
...
                                   break by a signal
   locks_delete_block
    b->fl_blocker == NULL &&
    list_empty(&b->fl_blocked_requests)
                            success, return directly
 locks_free_lock b
wake_up(&b->fl_waiter)
trigger UAF

Fix it by remove this logic, and this patch may also fix CVE-2019-19769.

Cc: stable@vger.kernel.org
Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.")
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
4 years agoblock, bfq: fix overwrite of bfq_group pointer in bfq_find_set_group()
Carlo Nonato [Fri, 6 Mar 2020 12:27:31 +0000 (13:27 +0100)]
block, bfq: fix overwrite of bfq_group pointer in bfq_find_set_group()

The bfq_find_set_group() function takes as input a blkcg (which represents
a cgroup) and retrieves the corresponding bfq_group, then it updates the
bfq internal group hierarchy (see comments inside the function for why
this is needed) and finally it returns the bfq_group.
In the hierarchy update cycle, the pointer holding the correct bfq_group
that has to be returned is mistakenly used to traverse the hierarchy
bottom to top, meaning that in each iteration it gets overwritten with the
parent of the current group. Since the update cycle stops at root's
children (depth = 2), the overwrite becomes a problem only if the blkcg
describes a cgroup at a hierarchy level deeper than that (depth > 2). In
this case the root's child that happens to be also an ancestor of the
correct bfq_group is returned. The main consequence is that processes
contained in a cgroup at depth greater than 2 are wrongly placed in the
group described above by BFQ.

This commits fixes this problem by using a different bfq_group pointer in
the update cycle in order to avoid the overwrite of the variable holding
the original group reference.

Reported-by: Kwon Je Oh <kwonje.oh2@gmail.com>
Signed-off-by: Carlo Nonato <carlo.nonato95@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Fri, 6 Mar 2020 13:18:36 +0000 (07:18 -0600)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "7 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  arch/Kconfig: update HAVE_RELIABLE_STACKTRACE description
  mm, hotplug: fix page online with DEBUG_PAGEALLOC compiled but not enabled
  mm/z3fold.c: do not include rwlock.h directly
  fat: fix uninit-memory access for partial initialized inode
  mm: avoid data corruption on CoW fault into PFN-mapped VMA
  mm: fix possible PMD dirty bit lost in set_pmd_migration_entry()
  mm, numa: fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numa

4 years agotty: serial: fsl_lpuart: free IDs allocated by IDA
Michael Walle [Tue, 3 Mar 2020 17:42:59 +0000 (18:42 +0100)]
tty: serial: fsl_lpuart: free IDs allocated by IDA

Since commit 3bc3206e1c0f ("serial: fsl_lpuart: Remove the alias node
dependence") the port line number can also be allocated by IDA, but in
case of an error the ID will no be removed again. More importantly, any
ID will be freed in remove(), even if it wasn't allocated but instead
fetched by of_alias_get_id(). If it was not allocated by IDA there will
be a warning:
  WARN(1, "ida_free called for id=%d which is not allocated.\n", id);

Move the ID allocation more to the end of the probe() so that we still
can use plain return in the first error cases.

Fixes: 3bc3206e1c0f ("serial: fsl_lpuart: Remove the alias node dependence")
Signed-off-by: Michael Walle <michael@walle.cc>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200303174306.6015-3-michael@walle.cc
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoRevert "tty: serial: fsl_lpuart: drop EARLYCON_DECLARE"
Michael Walle [Tue, 3 Mar 2020 17:42:58 +0000 (18:42 +0100)]
Revert "tty: serial: fsl_lpuart: drop EARLYCON_DECLARE"

This reverts commit a659652f6169240a5818cb244b280c5a362ef5a4.

This broke the earlycon on LS1021A processors because the order of the
earlycon_setup() functions were changed. Before the commit the normal
lpuart32_early_console_setup() was called. After the commit the
lpuart32_imx_early_console_setup() is called instead.

Fixes: a659652f6169 ("tty: serial: fsl_lpuart: drop EARLYCON_DECLARE")
Signed-off-by: Michael Walle <michael@walle.cc>
Link: https://lore.kernel.org/r/20200303174306.6015-2-michael@walle.cc
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoserdev: Fix detection of UART devices on Apple machines.
Ronald Tschalär [Tue, 11 Feb 2020 19:47:23 +0000 (11:47 -0800)]
serdev: Fix detection of UART devices on Apple machines.

On Apple devices the _CRS method returns an empty resource template, and
the resource settings are instead provided by the _DSM method. But
commit 33364d63c75d6182fa369cea80315cf1bb0ee38e (serdev: Add ACPI
devices by ResourceSource field) changed the search for serdev devices
to require valid, non-empty resource template, thereby breaking Apple
devices and causing bluetooth devices to not be found.

This expands the check so that if we don't find a valid template, and
we're on an Apple machine, then just check for the device being an
immediate child of the controller and having a "baud" property.

Cc: <stable@vger.kernel.org> # 5.5
Fixes: 33364d63c75d ("serdev: Add ACPI devices by ResourceSource field")
Signed-off-by: Ronald Tschalär <ronald@innovation.ch>
Link: https://lore.kernel.org/r/20200211194723.486217-1-ronald@innovation.ch
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 years agoarch/Kconfig: update HAVE_RELIABLE_STACKTRACE description
Miroslav Benes [Fri, 6 Mar 2020 06:28:45 +0000 (22:28 -0800)]
arch/Kconfig: update HAVE_RELIABLE_STACKTRACE description

save_stack_trace_tsk_reliable() is not the only function providing the
reliable stack traces anymore.  Architecture might define ARCH_STACKWALK
which provides a newer stack walking interface and has
arch_stack_walk_reliable() function.  Update the description accordingly.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: http://lkml.kernel.org/r/20200120154042.9934-1-mbenes@suse.cz
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm, hotplug: fix page online with DEBUG_PAGEALLOC compiled but not enabled
Vlastimil Babka [Fri, 6 Mar 2020 06:28:42 +0000 (22:28 -0800)]
mm, hotplug: fix page online with DEBUG_PAGEALLOC compiled but not enabled

Commit cd02cf1aceea ("mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC")
fixed memory hotplug with debug_pagealloc enabled, where onlining a page
goes through page freeing, which removes the direct mapping.  Some arches
don't like when the page is not mapped in the first place, so
generic_online_page() maps it first.  This is somewhat wasteful, but
better than special casing page freeing fast paths.

The commit however missed that DEBUG_PAGEALLOC configured doesn't mean
it's actually enabled.  One has to test debug_pagealloc_enabled() since
031bc5743f15 ("mm/debug-pagealloc: make debug-pagealloc boottime
configurable"), or alternatively debug_pagealloc_enabled_static() since
8e57f8acbbd1 ("mm, debug_pagealloc: don't rely on static keys too early"),
but this is not done.

As a result, a s390 kernel with DEBUG_PAGEALLOC configured but not enabled
will crash:

Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 0000000000000000 TEID: 0000000000000483
Fault in home space mode while using kernel ASCE.
AS:0000001ece13400b R2:000003fff7fd000b R3:000003fff7fcc007 S:000003fff7fd7000 P:000000000000013d
Oops: 0004 ilc:2 [#1] SMP
CPU: 1 PID: 26015 Comm: chmem Kdump: loaded Tainted: GX 5.3.18-5-default #1 SLE15-SP2 (unreleased)
Krnl PSW : 0704e00180000000 0000001ecd281b9e (__kernel_map_pages+0x166/0x188)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 0000000000000000 0000000000000800 0000400b00000000 0000000000000100
0000000000000001 0000000000000000 0000000000000002 0000000000000100
0000001ece139230 0000001ecdd98d40 0000400b00000100 0000000000000000
000003ffa17e4000 001fffe0114f7d08 0000001ecd4d93ea 001fffe0114f7b20
Krnl Code: 0000001ecd281b8eec17ffff00d8 ahik %r1,%r7,-1
0000001ecd281b94ec111dbc0355 risbg %r1,%r1,29,188,3
>0000001ecd281b9e94fb5006 ni 6(%r5),251
0000001ecd281ba241505008 la %r5,8(%r5)
0000001ecd281ba6ec51fffc6064 cgrj %r5,%r1,6,1ecd281b9e
0000001ecd281bac: 1a07 ar %r0,%r7
0000001ecd281baeec03ff584076 crj %r0,%r3,4,1ecd281a5e
Call Trace:
[<0000001ecd281b9e>] __kernel_map_pages+0x166/0x188
[<0000001ecd4d9516>] online_pages_range+0xf6/0x128
[<0000001ecd2a8186>] walk_system_ram_range+0x7e/0xd8
[<0000001ecda28aae>] online_pages+0x2fe/0x3f0
[<0000001ecd7d02a6>] memory_subsys_online+0x8e/0xc0
[<0000001ecd7add42>] device_online+0x5a/0xc8
[<0000001ecd7d0430>] state_store+0x88/0x118
[<0000001ecd5b9f62>] kernfs_fop_write+0xc2/0x200
[<0000001ecd5064b6>] vfs_write+0x176/0x1e0
[<0000001ecd50676a>] ksys_write+0xa2/0x100
[<0000001ecda315d4>] system_call+0xd8/0x2c8

Fix this by checking debug_pagealloc_enabled_static() before calling
kernel_map_pages(). Backports for kernel before 5.5 should use
debug_pagealloc_enabled() instead. Also add comments.

Fixes: cd02cf1aceea ("mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC")
Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Qian Cai <cai@lca.pw>
Link: http://lkml.kernel.org/r/20200224094651.18257-1-vbabka@suse.cz
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/z3fold.c: do not include rwlock.h directly
Sebastian Andrzej Siewior [Fri, 6 Mar 2020 06:28:39 +0000 (22:28 -0800)]
mm/z3fold.c: do not include rwlock.h directly

rwlock.h should not be included directly. Instead linux/splinlock.h
should be included. One thing it does is to break the RT build.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200224133631.1510569-1-bigeasy@linutronix.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agofat: fix uninit-memory access for partial initialized inode
OGAWA Hirofumi [Fri, 6 Mar 2020 06:28:36 +0000 (22:28 -0800)]
fat: fix uninit-memory access for partial initialized inode

When get an error in the middle of reading an inode, some fields in the
inode might be still not initialized.  And then the evict_inode path may
access those fields via iput().

To fix, this makes sure that inode fields are initialized.

Reported-by: syzbot+9d82b8de2992579da5d0@syzkaller.appspotmail.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/871rqnreqx.fsf@mail.parknet.co.jp
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: avoid data corruption on CoW fault into PFN-mapped VMA
Kirill A. Shutemov [Fri, 6 Mar 2020 06:28:32 +0000 (22:28 -0800)]
mm: avoid data corruption on CoW fault into PFN-mapped VMA

Jeff Moyer has reported that one of xfstests triggers a warning when run
on DAX-enabled filesystem:

WARNING: CPU: 76 PID: 51024 at mm/memory.c:2317 wp_page_copy+0xc40/0xd50
...
wp_page_copy+0x98c/0xd50 (unreliable)
do_wp_page+0xd8/0xad0
__handle_mm_fault+0x748/0x1b90
handle_mm_fault+0x120/0x1f0
__do_page_fault+0x240/0xd70
do_page_fault+0x38/0xd0
handle_page_fault+0x10/0x30

The warning happens on failed __copy_from_user_inatomic() which tries to
copy data into a CoW page.

This happens because of race between MADV_DONTNEED and CoW page fault:

CPU0 CPU1
 handle_mm_fault()
   do_wp_page()
     wp_page_copy()
       do_wp_page()
madvise(MADV_DONTNEED)
  zap_page_range()
    zap_pte_range()
      ptep_get_and_clear_full()
      <TLB flush>
 __copy_from_user_inatomic()
 sees empty PTE and fails
 WARN_ON_ONCE(1)
 clear_page()

The solution is to re-try __copy_from_user_inatomic() under PTL after
checking that PTE is matches the orig_pte.

The second copy attempt can still fail, like due to non-readable PTE, but
there's nothing reasonable we can do about, except clearing the CoW page.

Reported-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Jeff Moyer <jmoyer@redhat.com>
Cc: <stable@vger.kernel.org>
Cc: Justin He <Justin.He@arm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Link: http://lkml.kernel.org/r/20200218154151.13349-1-kirill.shutemov@linux.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: fix possible PMD dirty bit lost in set_pmd_migration_entry()
Huang Ying [Fri, 6 Mar 2020 06:28:29 +0000 (22:28 -0800)]
mm: fix possible PMD dirty bit lost in set_pmd_migration_entry()

In set_pmd_migration_entry(), pmdp_invalidate() is used to change PMD
atomically.  But the PMD is read before that with an ordinary memory
reading.  If the THP (transparent huge page) is written between the PMD
reading and pmdp_invalidate(), the PMD dirty bit may be lost, and cause
data corruption.  The race window is quite small, but still possible in
theory, so need to be fixed.

The race is fixed via using the return value of pmdp_invalidate() to get
the original content of PMD, which is a read/modify/write atomic
operation.  So no THP writing can occur in between.

The race has been introduced when the THP migration support is added in
the commit 616b8371539a ("mm: thp: enable thp migration in generic path").
But this fix depends on the commit d52605d7cb30 ("mm: do not lose dirty
and accessed bits in pmdp_invalidate()").  So it's easy to be backported
after v4.16.  But the race window is really small, so it may be fine not
to backport the fix at all.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Link: http://lkml.kernel.org/r/20200220075220.2327056-1-ying.huang@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>