Liu Bo [Fri, 1 Dec 2017 00:26:39 +0000 (17:26 -0700)]
Btrfs: use struct completion in scrub_submit_raid56_bio_wait
This changes to use struct completion directly and removes 'struct
scrub_bio_ret' along with the code using it.
This struct is used to get the return value from bio, but the caller can
access bio to get the return value directly and is holding a reference
on it so it won't go away underneath us and can be removed safely.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Liu Bo [Tue, 5 Dec 2017 01:09:42 +0000 (18:09 -0700)]
Btrfs: remove unused variable wait in lock_stripe_add
The defined wait is not used anywhere.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Timofey Titovets [Mon, 23 Oct 2017 22:29:48 +0000 (01:29 +0300)]
Btrfs: compress_file_range() change page dirty status once
We need to call extent_range_clear_dirty_for_io()
on compression range to prevent application from changing
page content, while pages compressing.
extent_range_clear_dirty_for_io() runs on each loop iteration,
"(end - start)" can be much (up to 1024 times) bigger
then compression range (BTRFS_MAX_UNCOMPRESSED).
The start pointer is advanced each time we manage to compress part of
the range. The end pointer does not change so we could redirty the
remaining parts repeatedly.
Fix that behaviour by call extent_range_clear_dirty_for_io()
only once, the first time it happens.
This is the safest but probably not the best behaviour. Previous
iterations of the patch tried to redirty only the range that we were not
able to compress. This has been refused by David for safety reasons, the
writeout callchain is complex and there could be some path that relies
on redirtying the entire unwritten range.
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ enhance changelog, the history and safety concerns, add comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
Timofey Titovets [Sun, 3 Dec 2017 21:30:33 +0000 (00:30 +0300)]
Btrfs: compression heuristic: replace heap sort with radix sort
Slowest part of heuristic for now is kernel heap sort()
It's can take up to 55% of runtime on sorting bucket items.
As sorting will always call on most data sets to get correctly
byte_core_set_size, the only way to speed up heuristic, is to
speed up sort on bucket.
Add a general radix_sort function.
Radix sort require 2 buffers, one full size of input array
and one for store counters (jump addresses).
That increase usage per heuristic workspace +1KiB
8KiB + 1KiB -> 8KiB + 2KiB
That is LSD Radix, i use 4 bit as a base for calculating,
to make counters array acceptable small (16 elements * 8 byte).
That Radix sort implementation have several points to adjust,
I added him to make radix sort general usable in kernel,
like heap sort, if needed.
Performance tested in userspace copy of heuristic code,
throughput:
- average <-> random data: ~3500 MiB/s - heap sort
- average <-> random data: ~6000 MiB/s - radix sort
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
[ coding style fixes ]
Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Mon, 4 Dec 2017 04:54:56 +0000 (12:54 +0800)]
btrfs: cleanup device states define BTRFS_DEV_STATE_FLUSH_SENT
Currently device state is being managed by each individual int
variable such as struct btrfs_device::is_tgtdev_for_dev_replace.
Instead of that declare btrfs_device::dev_state
BTRFS_DEV_STATE_FLUSH_SENT and use the bit operations.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Mon, 4 Dec 2017 04:54:55 +0000 (12:54 +0800)]
btrfs: cleanup device states define BTRFS_DEV_STATE_REPLACE_TGT
Currently device state is being managed by each individual int
variable such as struct btrfs_device::is_tgtdev_for_dev_replace.
Instead of that declare btrfs_device::dev_state
BTRFS_DEV_STATE_MISSING and use the bit operations.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
[ whitespace adjustments ]
Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Mon, 4 Dec 2017 04:54:54 +0000 (12:54 +0800)]
btrfs: cleanup device states define BTRFS_DEV_STATE_MISSING
Currently device state is being managed by each individual int
variable such as struct btrfs_device::missing. Instead of that
declare btrfs_device::dev_state BTRFS_DEV_STATE_MISSING and use
the bit operations.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by : Nikolay Borisov <nborisov@suse.com>
[ whitespace adjustments ]
Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Mon, 4 Dec 2017 04:54:53 +0000 (12:54 +0800)]
btrfs: cleanup device states define BTRFS_DEV_STATE_IN_FS_METADATA
Currently device state is being managed by each individual int
variable such as struct btrfs_device::in_fs_metadata. Instead of
that declare device state BTRFS_DEV_STATE_IN_FS_METADATA and use
the bit operations.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
[ whitespace adjustments ]
Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Mon, 4 Dec 2017 04:54:52 +0000 (12:54 +0800)]
btrfs: cleanup device states define BTRFS_DEV_STATE_WRITEABLE
Currently device state is being managed by each individual int
variable such as struct btrfs_device::writeable. Instead of that
declare device state BTRFS_DEV_STATE_WRITEABLE and use the
bit operations.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
[ whitespace adjustments ]
Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Tue, 28 Nov 2017 02:43:10 +0000 (10:43 +0800)]
btrfs: add helper for device path or missing
This patch creates a helper function to get either the rcu device path
or missing.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
[ rename to btrfs_dev_name, switch to if/else ]
Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Wed, 29 Nov 2017 10:53:43 +0000 (18:53 +0800)]
btrfs: drop btrfs_device::can_discard to query directly
We can query the bdev directly when needed at btrfs_discard_extent()
so drop btrfs_device::can_discard.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Suggested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Colin Ian King [Thu, 30 Nov 2017 12:14:47 +0000 (12:14 +0000)]
btrfs: make function update_share_count static
The function update_share_count is local to the source and does
not need to be in global scope, so make it static.
Cleans up sparse warning:
fs/btrfs/backref.c:219:6: warning: symbol 'update_share_count' was not
declared. Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Thu, 23 Nov 2017 08:51:43 +0000 (10:51 +0200)]
btrfs: Remove redundant FLAG_VACANCY
Commit
9036c10208e1 ("Btrfs: update hole handling v2") added the
FLAG_VACANCY to denote holes, however there was already a consistent way
of flagging extents which represent hole - ->block_start =
EXTENT_MAP_HOLE. And also the only place where this flag is checked is
in the fiemap code, but the block_start value is also checked and every
other place in the filesystem detects holes by using block_start
value's. So remove the extra flag. This survived a full xfstest run.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Fri, 17 Nov 2017 07:14:19 +0000 (15:14 +0800)]
btrfs: extent-tree: Make btrfs_inode_rsv_refill function static
This function is no longer used outside of extent-tree.c.
Make it static.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 15 Nov 2017 17:27:39 +0000 (18:27 +0100)]
btrfs: move some zstd work data from stack to workspace
* ZSTD_inBuffer in_buf
* ZSTD_outBuffer out_buf
are used in all functions to pass the compression parameters and the
local variables consume some space. We can move them to the workspace
and reduce the stack consumption:
zstd.c:zstd_decompress -24 (136 -> 112)
zstd.c:zstd_decompress_bio -24 (144 -> 120)
zstd.c:zstd_compress_pages -24 (264 -> 240)
Signed-off-by: David Sterba <dsterba@suse.com>
Reviewed-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 8 Nov 2017 00:54:33 +0000 (01:54 +0100)]
btrfs: reorder btrfs_transaction members for better packing
There are now 20 bytes of holes, we can reduce that to 4 by minor
changes. Moving 'aborted' to the status and flags is also more logical,
similar for num_dirty_bgs. The size goes from 432 to 416.
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 8 Nov 2017 01:12:57 +0000 (02:12 +0100)]
btrfs: use narrower type for btrfs_transaction::num_dirty_bgs
The u64 is an overkill here, we could not possibly create that many
blockgroups in one transaction.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 8 Nov 2017 00:54:33 +0000 (01:54 +0100)]
btrfs: reorder btrfs_trans_handle members for better packing
Recent updates to the structure left some holes, reorder the types so
the packing is tight. The size goes from 112 to 104 on 64bit.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 8 Nov 2017 00:39:58 +0000 (01:39 +0100)]
btrfs: switch to refcount_t type for btrfs_trans_handle::use_count
The use_count is a reference counter, we can use the refcount_t type,
though we don't use the atomicity. This is not a performance critical
code and we could catch the underflows. The type is changed from long,
but the number of references will fit an int.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 8 Nov 2017 00:32:48 +0000 (01:32 +0100)]
btrfs: remove unused member of btrfs_trans_handle
Last user was removed in a monster commit
a22285a6a32390195235171
("Btrfs: Integrate metadata reservation with start_transaction") in
2010.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 8 Nov 2017 00:07:43 +0000 (01:07 +0100)]
btrfs: switch btrfs_trans_handle::adding_csums to bool
The semantics of adding_csums matches bool, 'short' was most likely used
to save space in
a698d0755adb6f2 ("Btrfs: add a type field for the
transaction handle").
Signed-off-by: David Sterba <dsterba@suse.com>
Edmund Nadolski [Mon, 20 Nov 2017 20:24:49 +0000 (13:24 -0700)]
btrfs: remove dead code from btrfs_get_extent
Due to new_inline logic, the create == 0 is always true at this
point in the code, so the create != 0 branch can be removed.
Signed-off-by: Edmund Nadolski <enadolski@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Edmund Nadolski [Mon, 20 Nov 2017 20:24:47 +0000 (13:24 -0700)]
btrfs: btrfs_inode_log_parent should use defined inode_only values.
Replace hardcoded numeric argument values for inode_only with the
constants defined for that use.
Signed-off-by: Edmund Nadolski <enadolski@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 6 Nov 2017 18:23:00 +0000 (19:23 +0100)]
btrfs: switch to on-stack csum buffer in csum_tree_block
The maximum size of a checksum buffer is known, BTRFS_CSUM_SIZE, and we
don't have to allocate it dynamically. This code path is not used at all
as we have only the crc32c and use an on-stack buffer already.
Signed-off-by: David Sterba <dsterba@suse.com>
Liu Bo [Wed, 15 Nov 2017 23:10:28 +0000 (16:10 -0700)]
Btrfs: set plug for fsync
Setting plug can merge adjacent IOs before dispatching IOs to the disk
driver.
Without plug, it'd not be a problem for single disk usecases, but for
multiple disks using raid profile, a large IO can be split to several
IOs of stripe length, and plug can be helpful to bring them together
for each disk so that we can save several disk access.
Moreover, fsync issues synchronous writes, so plug can really take
effect.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Thu, 9 Nov 2017 15:45:24 +0000 (23:45 +0800)]
btrfs: factor __btrfs_open_devices() to create btrfs_open_one_device()
No functional changes, create btrfs_open_one_device() from
__btrfs_open_devices(). This is a preparatory work to add dynamic
device scan.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
[ minor whitespace fixes ]
Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Thu, 9 Nov 2017 15:45:25 +0000 (23:45 +0800)]
btrfs: move check for device generation to the last
No functional changes. This helps to move the entire section into
a new function.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Thu, 9 Nov 2017 15:45:23 +0000 (23:45 +0800)]
btrfs: set fs_devices->seed directly
This is in preparation to move a section of code in __btrfs_open_devices()
into a new function so that it can be reused. As we set seeding if any of
the device is having SB flag BTRFS_SUPER_FLAG_SEEDING, so do it in the
device list loop itself. No functional changes.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Geert Uytterhoeven [Wed, 15 Nov 2017 15:04:40 +0000 (16:04 +0100)]
btrfs: ref-verify: Remove unused parameter from walk_up_tree() to kill warning
With gcc-4.1.2:
fs/btrfs/ref-verify.c: In function ‘btrfs_build_ref_tree’:
fs/btrfs/ref-verify.c:1017: warning: ‘root’ is used uninitialized in this function
The variable is indeed passed uninitialized, but it is never used by the
callee. However, not all versions of gcc are smart enough to notice.
Hence remove the unused parameter from walk_up_tree() to silence the
compiler warning.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Fri, 23 Jun 2017 02:09:57 +0000 (04:09 +0200)]
btrfs: sink get_extent parameter to read_extent_buffer_pages
All callers pass btree_get_extent, which needs to be exported.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Fri, 23 Jun 2017 02:09:57 +0000 (04:09 +0200)]
btrfs: sink get_extent parameter to __do_contiguous_readpages
All callers pass btrfs_get_extent.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Fri, 23 Jun 2017 02:09:57 +0000 (04:09 +0200)]
btrfs: sink get_extent parameter to __extent_readpages
All callers pass btrfs_get_extent.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Fri, 23 Jun 2017 02:09:57 +0000 (04:09 +0200)]
btrfs: sink get_extent parameter to extent_readpages
There's only one caller that passes btrfs_get_extent.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Fri, 23 Jun 2017 02:09:57 +0000 (04:09 +0200)]
btrfs: sink get_extent parameter to get_extent_skip_holes
All callers pass btrfs_get_extent_fiemap and get_extent_skip_holes
itself is used only as a fiemap helper.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Fri, 23 Jun 2017 02:09:57 +0000 (04:09 +0200)]
btrfs: sink get_extent parameter to extent_fiemap
All callers pass btrfs_get_extent_fiemap and we don't expect anything
else in the context of extent_fiemap.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Fri, 23 Jun 2017 02:01:08 +0000 (04:01 +0200)]
btrfs: drop get_extent from extent_page_data
Previous patches cleaned up all places where
extent_page_data::get_extent was set and it was btrfs_get_extent all the
time, so we can simply call that instead.
This also reduces size of extent_page_data by 8 bytes which has positive
effect on stack consumption on various functions on the write out path.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Fri, 23 Jun 2017 01:47:28 +0000 (03:47 +0200)]
btrfs: sink get_extent parameter to extent_write_full_page
There's only one caller.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Fri, 23 Jun 2017 01:47:28 +0000 (03:47 +0200)]
btrfs: sink get_extent parameter to extent_write_locked_range
There's only one caller.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Fri, 23 Jun 2017 01:46:07 +0000 (03:46 +0200)]
btrfs: sink get_extent parameter to extent_writepages
There's only one caller.
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Wed, 8 Nov 2017 00:54:26 +0000 (08:54 +0800)]
btrfs: Cleanup existing name_len checks
Since tree-checker has verified leaf when reading from disk, we don't
need the existing verify_dir_item() or btrfs_is_name_len_valid() checks.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Wed, 8 Nov 2017 00:54:25 +0000 (08:54 +0800)]
btrfs: tree-checker: Add checker for dir item
Add checker for dir item, for key types DIR_ITEM, DIR_INDEX and
XATTR_ITEM.
This checker does comprehensive checks for:
1) dir_item header and its data size
Against item boundary and maximum name/xattr length.
This part is mostly the same as old verify_dir_item().
2) dir_type
Against maximum file types, and against key type.
Since XATTR key should only have FT_XATTR dir item, and normal dir
item type should not have XATTR key.
The check between key->type and dir_type is newly introduced by this
patch.
3) name hash
For XATTR and DIR_ITEM key, key->offset is name hash (crc32c).
Check the hash of the name against the key to ensure it's correct.
The name hash check is only found in btrfs-progs before this patch.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 31 Oct 2017 16:08:27 +0000 (17:08 +0100)]
btrfs: use GFP_KERNEL in btrfs_alloc_inode
This callback is called directly from VFS, no locks are held at the
allocation time.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 31 Oct 2017 16:02:39 +0000 (17:02 +0100)]
btrfs: sink gfp parameter to clear_extent_uptodate
There's only one callsite with GFP_NOFS.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 31 Oct 2017 15:37:52 +0000 (16:37 +0100)]
btrfs: sink gfp parameter to clear_extent_bit
All callers use GFP_NOFS, we don't have to pass it as an argument. The
built-in tests pass GFP_KERNEL, but they run only at module load time
and NOFS works there as well.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 31 Oct 2017 15:30:47 +0000 (16:30 +0100)]
btrfs: prepare to drop gfp mask parameter from clear_extent_bit
Use __clear_extent_bit directly in case we want to pass unknown
gfp flags. Otherwise all clear_extent_bit callers use GFP_NOFS, so we
can sink them to the function and reduce argument count, at the cost
that __clear_extent_bit has to be exported.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Thu, 15 Jun 2017 22:28:47 +0000 (00:28 +0200)]
btrfs: use non-RCU list traversal in write_all_supers callees
We take the fs_devices::device_list_mutex mutex in write_all_supers
which will prevent any add/del changes to the device list. Therefore we
don't need to use the RCU variant list_for_each_entry_rcu in any of the
called functions.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Thu, 15 Jun 2017 22:09:21 +0000 (00:09 +0200)]
btrfs: switch to RCU for device traversal in btrfs_ioctl_fs_info
We don't need to use the mutex as we do not modify the devices nor the
list itself and just read information about device counts.
Move copying fsid out of the protected section, not applicable to RCU
same as the rest of the retrieved information.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Thu, 15 Jun 2017 22:09:21 +0000 (00:09 +0200)]
btrfs: switch to RCU for device traversal in btrfs_ioctl_dev_info
We don't need to use the mutex as we do not modify the devices nor the
list itself and just read some information:
does not change during device lifetime:
- devid
- uuid
- name (ie. the path)
may change in parallel to the ioctl call, but can lead only to reporting
inacurracy:
- bytes_used
- total_bytes
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 19 Jun 2017 14:55:35 +0000 (16:55 +0200)]
btrfs: simplify btrfs_close_bdev
Split the conditions a bit.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Fri, 16 Jun 2017 20:30:00 +0000 (22:30 +0200)]
btrfs: document device locking
Overview of the main locks protecting various device-related structures.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 30 Oct 2017 18:29:46 +0000 (19:29 +0100)]
btrfs: simplify exit paths in btrfs_init_new_device
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 30 Oct 2017 17:55:47 +0000 (18:55 +0100)]
btrfs: use free_device where opencoded
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 30 Oct 2017 17:10:25 +0000 (18:10 +0100)]
btrfs: introduce free_device helper
A helper to free a device and all it's dynamically allocated members,
like the rcu_string name or flush_bio. This is going to replace all
open coded places.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Tue, 6 Jun 2017 15:08:23 +0000 (17:08 +0200)]
btrfs: rename device free rcu helper to free_device_rcu
Make it clear that it is an RCU helper, we want to use the name
free_device for a wrapper freeing all device members.
Signed-off-by: David Sterba <dsterba@suse.com>
Liu Bo [Wed, 1 Nov 2017 23:19:27 +0000 (17:19 -0600)]
Btrfs: document rules about bio async submit
These rules have been hidden in several if-else and are not
straightforward to follow, for example, dio submit hook's nocsum case
has a bug , i.e. doing async submit instead of sync submit, which has
been fixed recently.
This is documenting the rules for reference.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 7 Nov 2017 09:22:54 +0000 (11:22 +0200)]
btrfs: Reduce scope of delayed_rsv->lock in may_commit_trans
After commit
996478ca9c460886ac1 ("btrfs: change how we decide to commit
transactions during flushing") there is no need to hold the delayed_rsv
during the percpu_counter_compare call since we get the byte's snapshot
earlier. So hold the lock only while reading delayed_rsv.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Liu Bo [Thu, 2 Nov 2017 23:21:50 +0000 (17:21 -0600)]
Btrfs: add __init macro to btrfs init functions
Adding __init macro gives kernel a hint that this function is only used
during the initialization phase and its memory resources can be freed up
after.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Mon, 6 Nov 2017 08:36:15 +0000 (16:36 +0800)]
btrfs: rename btrfs_add_device to btrfs_add_dev_item
Function btrfs_add_device() is adding the device item so rename to
reflect that in the function. Similarly we have btrfs_rm_dev_item().
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Tue, 31 Oct 2017 06:08:16 +0000 (14:08 +0800)]
btrfs: Don't generate UUID for non-fs tree
btrfs_create_tree() will unconditionally generate UUID for any root.
So for quota tree and data reloc tree created by kernel, they will have
unique UUIDs.
However UUID in root item is only referred by UUID tree, which only
records UUID for fs trees. This makes unique UUIDs for quota/data reloc
tree meaningless.
Leave the UUID as zero for non-fs tree, making btrfs-debug-tree output
less confusing.
Reported-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Mon, 6 Nov 2017 02:28:00 +0000 (10:28 +0800)]
btrfs: move volume_mutex into the btrfs_rm_device()
A cleanup patch no functional change, we hold volume_mutex before
calling btrfs_rm_device, so move it into the function itself.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 1 Nov 2017 09:36:05 +0000 (11:36 +0200)]
btrfs: Use locked_end rather than open coding it
Right before we go into this loop locked_end is set to alloc_end - 1 and
is being used in nearby functions, no need to have exceptions. This just
makes the code consistent, no functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Wed, 1 Nov 2017 09:32:18 +0000 (11:32 +0200)]
btrfs: Move loop termination condition in while()
Fallocating a file in btrfs goes through several stages. The one before
actually inserting the fallocated extents is to create a qgroup
reservation, covering the desired range. To this end there is a loop in
btrfs_fallocate which checks to see if there are holes in the fallocated
range or !PREALLOC extents past EOF and if so create qgroup reservations
for them. Unfortunately, the main condition of the loop is burried right
at the end of its body rather than in the actual while statement which
makes it non-obvious. Fix this by moving the condition in the while
statement where it belongs. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Liu Bo [Tue, 10 Oct 2017 21:51:02 +0000 (15:51 -0600)]
Btrfs: remove rcu_barrier in btrfs_close_devices
It was introduced because btrfs used to do blkdev_put in a deferred
work, now that btrfs has blkdev_put in place, this rcu_barrier can be
removed.
modprobe -r btrfs will do btrfs_cleanup_fs_uuids(), where it cleanup
every %fs_devices on the list, but when we do btrfs_close_devices(), we
have replaced the devices on the list with dummy ones which only have
the same name and uuid, so modprobe -r btrfs will free those instead of
what we were using, this change won't cause a problem for it.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ copied 2nd paragraph from mailinglist discussion ]
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Mon, 23 Oct 2017 10:51:49 +0000 (13:51 +0300)]
btrfs: Move checks from btrfs_wq_run_delayed_node to btrfs_balance_delayed_items
btrfs_balance_delayed_items is the sole caller of
btrfs_wq_run_delayed_node and already includes one of the checks whether
the delayed inodes should be run. On the other hand
btrfs_wq_run_delayed_node duplicates that check and performs an
additional one for wq congestion.
Let's remove the duplicate check and move the congestion one in
btrfs_balance_delayed_items, leaving btrfs_wq_run_delayed_node to only
care about setting up the wq run. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Mon, 23 Oct 2017 10:51:48 +0000 (13:51 +0300)]
btrfs: Make btrfs_async_run_delayed_root use a loop rather than multiple labels
Currently btrfs_async_run_delayed_root's implementation uses 3 goto
labels to mimic the functionality of a simple do {} while loop. Refactor
the function to use a do {} while construct, making intention clear and
code easier to follow. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 24 Oct 2017 08:50:39 +0000 (11:50 +0300)]
btrfs: Remove redundant mirror_num arg
The following callpath is always invoked with mirror_num set to 0, so
let's remove it as an argument and directly pass 0 to __do_redpage. No
functional change.
extent_readpages
__extent_readpages
__do_contiguous_readpages
__do_readpage
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Fri, 20 Oct 2017 15:10:59 +0000 (18:10 +0300)]
btrfs: Remove unused function
It's sole callsite was removed in a previous patch so just nuke it for good.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Fri, 20 Oct 2017 15:10:58 +0000 (18:10 +0300)]
btrfs: Remove redundant memory barrier in dev stats
As per atomic_t.txt documentation :
- RMW operations that have a return value are fully ordered;
atomic_xchg is one such operation so it already includes everything it
needs w.r.t memory ordering and add a comment to be more explicit about
that.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Tue, 24 Oct 2017 10:47:37 +0000 (13:47 +0300)]
btrfs: Fix memory barriers usage with device stats counters
Commit
addc3fa74e5b ("Btrfs: Fix the problem that the dirty flag of dev
stats is cleared") reworked the way device stats changes are tracked. A
new atomic dev_stats_ccnt counter was introduced which is incremented
every time any of the device stats counters are changed. This serves as
a flag whether there are any pending stats changes. However, this patch
only partially implemented the correct memory barriers necessary:
- It only ordered the stores to the counters but not the reads e.g.
btrfs_run_dev_stats
- It completely omitted any comments documenting the intended design and
how the memory barriers pair with each-other
This patch provides the necessary comments as well as adds a missing
smp_rmb in btrfs_run_dev_stats. Furthermore since dev_stats_cnt is only
a snapshot at best there was no point in reading the counter twice -
once in btrfs_dev_stats_dirty and then again when assigning stats_cnt.
Just collapse both reads into 1.
Fixes: addc3fa74e5b ("Btrfs: Fix the problem that the dirty flag of dev stats is cleared")
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Anand Jain [Fri, 20 Oct 2017 17:45:33 +0000 (01:45 +0800)]
btrfs: clean up btrfs_dev_stat_inc usage
btrfs_end_bio() is using btrfs_dev_stat_inc() and then
btrfs_dev_stat_print_on_error() separately instead use
btrfs_dev_stat_inc_and_print() directly.
As of now there isn't any bio in btrfs which is - a non-empty write and
also the REQ_PREFLUSH flag is set. So in actual the condition
if (bio->bi_opf & REQ_PREFLUSH)
is never true in btrfs_end_bio(), and so there won't be any redundant
error log by using btrfs_dev_stat_inc_and_print() separately one for
write and another for flush.
This consolidation will help to add the device critical error handles in
the function btrfs_dev_stat_inc_and_print() and which can be renamed as
needed.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Liu Bo [Tue, 24 Oct 2017 05:02:54 +0000 (23:02 -0600)]
Btrfs: free btrfs_device in place
It's pointless to defer it to a kthread helper as we're not under a
special context.
For reference, commit
1f78160ce1b1 ("Btrfs: using rcu lock in the reader
side of devices list") introduced RCU freeing for device structures.
Originally the blkdev_put was called from free_device and rcu_barrier had
to be called. This is no longer required, bdev and our device structures
are now freed separately.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ enhance changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
Liu Bo [Fri, 20 Oct 2017 23:53:41 +0000 (17:53 -0600)]
Btrfs: remove redundant btrfs_balance_delayed_items
In functions like btrfs_create(), we run both
btrfs_balance_delayed_items() and btrfs_btree_balance_dirty() after
the operation, but btrfs_btree_balance_dirty() is surely going to run
btrfs_balance_delayed_items().
This keeps only btrfs_btree_balance_dirty().
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Linus Torvalds [Sun, 21 Jan 2018 21:51:26 +0000 (13:51 -0800)]
Linux 4.15-rc9
Linus Torvalds [Sun, 21 Jan 2018 18:48:35 +0000 (10:48 -0800)]
Merge branch 'x86-pti-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 pti fixes from Thomas Gleixner:
"A small set of fixes for the meltdown/spectre mitigations:
- Make kprobes aware of retpolines to prevent probes in the retpoline
thunks.
- Make the machine check exception speculation protected. MCE used to
issue an indirect call directly from the ASM entry code. Convert
that to a direct call into a C-function and issue the indirect call
from there so the compiler can add the retpoline protection,
- Make the vmexit_fill_RSB() assembly less stupid
- Fix a typo in the PTI documentation"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/retpoline: Optimize inline assembler for vmexit_fill_RSB
x86/pti: Document fix wrong index
kprobes/x86: Disable optimizing on the function jumps to indirect thunk
kprobes/x86: Blacklist indirect thunk functions for kprobes
retpoline: Introduce start/end markers of indirect thunk
x86/mce: Make machine check speculation protected
Linus Torvalds [Sun, 21 Jan 2018 18:41:48 +0000 (10:41 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 kexec fix from Thomas Gleixner:
"A single fix for the WBINVD issue introduced by the SME support which
causes kexec fails on non AMD/SME capable CPUs. Issue WBINVD only when
the CPU has SME and avoid doing so in a loop"
[ Side note: this patch fixes the problem, but it isn't entirely clear
why it is required. The wbinvd should just work regardless, but there
seems to be some system - as opposed to CPU - issue, since the wbinvd
causes more problems later in the shutdown sequence, but wbinvd
instructions while the system is still active are not problematic.
Possibly some SMI or pending machine check issue on the affected system ]
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()
Linus Torvalds [Sun, 21 Jan 2018 18:39:58 +0000 (10:39 -0800)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"A single fix for the new matrix allocator to prevent vector exhaustion
by certain network drivers which allocate gazillions of unused vectors
which cannot be put into reservation mode due to MSI and the lack of
MSI entry masking.
The fix/workaround is to spread the vectors across CPUs by searching
the supplied target CPU mask for the CPU with the smallest number of
allocated vectors"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irq/matrix: Spread interrupts on allocation
Linus Torvalds [Sun, 21 Jan 2018 04:12:47 +0000 (20:12 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mattst88/alpha
Pull alpha fixes from Matt Turner:
"A build fix and a regression fix"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
alpha/PCI: Fix noname IRQ level detection
alpha: extend memset16 to EV6 optimised routines
Laura Abbott [Sun, 21 Jan 2018 01:14:02 +0000 (17:14 -0800)]
x86: Use __nostackprotect for sme_encrypt_kernel
Commit
bacf6b499e11 ("x86/mm: Use a struct to reduce parameters for SME
PGD mapping") moved some parameters into a structure.
The structure was large enough to trigger the stack protection canary in
sme_encrypt_kernel which doesn't work this early, causing reboots.
Mark sme_encrypt_kernel appropriately to not use the canary.
Fixes: bacf6b499e11 ("x86/mm: Use a struct to reduce parameters for SME PGD mapping")
Signed-off-by: Laura Abbott <labbott@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lorenzo Pieralisi [Tue, 16 Jan 2018 11:52:59 +0000 (11:52 +0000)]
alpha/PCI: Fix noname IRQ level detection
The conversion of the alpha architecture PCI host bridge legacy IRQ
mapping/swizzling to the new PCI host bridge map/swizzle hooks carried
out through:
commit
0e4c2eeb758a ("alpha/PCI: Replace pci_fixup_irqs() call with
host bridge IRQ mapping hooks")
implies that IRQ for devices are now allocated through pci_assign_irq()
function in pci_device_probe() that is called when a driver matching a
device is found in order to probe the device through the device driver.
Alpha noname platforms required IRQ level programming to be executed
in sio_fixup_irq_levels(), that is called in noname_init_pci(), a
platform hook called within a subsys_initcall.
In noname_init_pci(), present IRQs are detected through
sio_collect_irq_levels() that check the struct pci_dev->irq number
to detect if an IRQ has been allocated for the device.
By the time sio_collect_irq_levels() is called, some devices may still
have not a matching driver loaded to match them (eg loadable module)
therefore their IRQ allocation is still pending - which means that
sio_collect_irq_levels() does not programme the correct IRQ level for
those devices, causing their IRQ handling to be broken when the device
driver is actually loaded and the device is probed.
Fix the issue by adding code in the noname map_irq() function
(noname_map_irq()) that, whilst mapping/swizzling the IRQ line, it also
ensures that the correct IRQ level programming is executed at platform
level, fixing the issue.
Fixes: 0e4c2eeb758a ("alpha/PCI: Replace pci_fixup_irqs() call with
host bridge IRQ mapping hooks")
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: stable@vger.kernel.org # 4.14
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Meelis Roos <mroos@linux.ee>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Linus Torvalds [Sat, 20 Jan 2018 19:41:09 +0000 (11:41 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"ARM:
- fix incorrect huge page mappings on systems using the contiguous
hint for hugetlbfs
- support alternative GICv4 init sequence
- correctly implement the ARM SMCC for HVC and SMC handling
PPC:
- add KVM IOCTL for reporting vulnerability and workaround status
s390:
- provide userspace interface for branch prediction changes in
firmware
x86:
- use correct macros for bits"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: s390: wire up bpb feature
KVM: PPC: Book3S: Provide information about hardware/firmware CVE workarounds
KVM/x86: Fix wrong macro references of X86_CR0_PG_BIT and X86_CR4_PAE_BIT in kvm_valid_sregs()
arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls
KVM: arm64: Fix GICv4 init when called from vgic_its_create
KVM: arm/arm64: Check pagesize when allocating a hugepage at Stage 2
Linus Torvalds [Sat, 20 Jan 2018 19:37:00 +0000 (11:37 -0800)]
Merge tag 'mips_fixes_4.15_2' of git://git./linux/kernel/git/jhogan/mips
Pull MIPS fixes from James Hogan:
"Some final MIPS fixes for 4.15, including important build fixes and a
MAINTAINERS update:
- Add myself as MIPS co-maintainer.
- Fix various all*config build failures (particularly as a result of
switching the default MIPS platform to the "generic" platform).
- Fix GCC7 build failures (duplicate const and questionable calls to
missing __multi3 intrinsic on mips64r6).
- Fix warnings when CPU Idle is enabled (4.14).
- Fix AR7 serial output (since 3.17).
- Fix ralink platform_get_irq error checking (since 3.12)"
* tag 'mips_fixes_4.15_2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips:
MAINTAINERS: Add James as MIPS co-maintainer
MIPS: Fix undefined reference to physical_memsize
MIPS: Implement __multi3 for GCC7 MIPS64r6 builds
MIPS: mm: Fix duplicate "const" on insn_table_MM
MIPS: CM: Drop WARN_ON(vp != 0)
MIPS: ralink: Fix platform_get_irq's error checking
MIPS: Fix CPS SMP NS16550 UART defaults
MIPS: BCM47XX Avoid compile error with MIPS allnoconfig
MIPS: RB532: Avoid undefined mac_pton without GENERIC_NET_UTILS
MIPS: RB532: Avoid undefined early_serial_setup() without SERIAL_8250_CONSOLE
MIPS: ath25: Avoid undefined early_serial_setup() without SERIAL_8250_CONSOLE
MIPS: AR7: ensure the port type's FCR value is used
Christian Borntraeger [Wed, 17 Jan 2018 13:44:34 +0000 (14:44 +0100)]
KVM: s390: wire up bpb feature
The new firmware interfaces for branch prediction behaviour changes
are transparently available for the guest. Nevertheless, there is
new state attached that should be migrated and properly resetted.
Provide a mechanism for handling reset, migration and VSIE.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[Changed capability number to 152. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Radim Krčmář [Sat, 20 Jan 2018 16:29:00 +0000 (17:29 +0100)]
Merge tag 'kvm-ppc-cve-4.15-2' of git://git./linux/kernel/git/paulus/powerpc
Add PPC KVM ioctl to report vulnerability and workaround status to userspace.
Linus Torvalds [Fri, 19 Jan 2018 23:20:00 +0000 (15:20 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
"One fix for SAS attached SATA CD-ROMs. It turns out that the libata
handling of CD devices relies on the SCSI error handler, so disable
async aborts (which don't start the error handler) for these devices"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: libsas: Disable asynchronous aborts for SATA devices
Linus Torvalds [Fri, 19 Jan 2018 23:16:49 +0000 (15:16 -0800)]
Merge tag 'for-4.15/dm-fixes-2' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
"All fixes marked for stable:
- Fix DM thinp btree corruption seen when inserting a new key/value
pair into a full root node.
- Fix DM thinp btree removal deadlock due to artificially low number
of allowed concurrent locks allowed.
- Fix possible DM crypt corruption if kernel keyring service is used.
Only affects ciphers using following IVs: essiv, lmk and tcw.
- Two DM crypt device initialization error checking fixes.
- Fix DM integrity to allow use of async ciphers that require DMA"
* tag 'for-4.15/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm crypt: fix error return code in crypt_ctr()
dm crypt: wipe kernel key copy after IV initialization
dm integrity: don't store cipher request on the stack
dm crypt: fix crash by adding missing check for auth key size
dm btree: fix serious bug in btree_split_beneath()
dm thin metadata: THIN_MAX_CONCURRENT_LOCKS should be 6
Linus Torvalds [Fri, 19 Jan 2018 19:38:19 +0000 (11:38 -0800)]
Merge tag 'trace-v4.15-rc4-3' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Two more small fixes
- The conversion of enums into their actual numbers to display in the
event format file had an off-by-one bug, that could cause an enum
not to be converted, and break user space parsing tools.
- A fix to a previous fix to bring back the context recursion checks.
The interrupt case checks for NMI, IRQ and softirq, but the softirq
returned the same number regardless if it was set or not, although
the logic would force it to be set if it were hit"
* tag 'trace-v4.15-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix converting enum's from the map in trace_event_eval_update()
ring-buffer: Fix duplicate results in mapping context to bits in recursive lock
Linus Torvalds [Fri, 19 Jan 2018 19:36:09 +0000 (11:36 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- a fix for use-after-free in Synaptics RMI4 driver
- correction to multitouch contact tracking on certain ALPS touchpads
(which got broken when we tried to fix the 2-finger scrolling)
- touchpad on Lenovo T640p is switched over to SMbus/RMI
- a few device node refcount fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: synaptics-rmi4 - prevent UAF reported by KASAN
Input: ALPS - fix multi-touch decoding on SS4 plus touchpads
Input: synaptics - Lenovo Thinkpad T460p devices should use RMI
Input: of_touchscreen - add MODULE_LICENSE
Input: 88pm860x-ts - fix child-node lookup
Input: twl6040-vibra - fix child-node lookup
Input: twl4030-vibra - fix sibling-node lookup
Linus Torvalds [Fri, 19 Jan 2018 19:30:06 +0000 (11:30 -0800)]
Merge branch 'i2c/for-current-fixed' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Two bugfixes for the I2C core: Lixing Wang fixed a refcounting problem
with DT nodes. Jeremy Compostella fixed a buffer overflow possibility
when using a 'don't use' ioctl interface directly"
* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: core-smbus: prevent stack corruption on read I2C_BLOCK_DATA
i2c: core: decrease reference count of device node in i2c_unregister_device
Linus Torvalds [Fri, 19 Jan 2018 19:26:59 +0000 (11:26 -0800)]
Merge branch 'for-4.15-fixes' of git://git./linux/kernel/git/tj/libata
Pull libata fixlet from Tejun Heo:
"This just adds one more entry for liteon optical drives to the device
blacklist for large IOs.
The change is very low risk"
* 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
libata: apply MAX_SEC_1024 to all LITEON EP1 series devices
Linus Torvalds [Fri, 19 Jan 2018 19:25:17 +0000 (11:25 -0800)]
Merge branch 'for-4.15-fixes' of git://git./linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
"cgroup.threads should be delegatable (ie. a container should be able
to write to it from inside) but was missing the flag.
The change is very low risk"
* 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: make cgroup.threads delegatable
Linus Torvalds [Fri, 19 Jan 2018 19:23:39 +0000 (11:23 -0800)]
Merge branch 'for-4.15-fixes' of git://git./linux/kernel/git/tj/wq
Pull workqueue fixlet from Tejun Heo:
"One patch to add touch_nmi_watchdog() while dumping workqueue debug
messages to avoid triggering the lockup detector spuriously.
The change is very low risk"
* 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: avoid hard lockups in show_workqueue_state()
Linus Torvalds [Fri, 19 Jan 2018 19:21:31 +0000 (11:21 -0800)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"We have various small DT fixes, and one important regression fix:
The recent device tree bugfixes that were intended to address issues
that 'dtc' started warning about in 4.15 fixed various USB PHY device
nodes, but it turns out that we had code that depended on those nodes
being incorrect and the probe failing with a particular error code.
With the workaround we can also deal with correct device nodes.
The DT fixes include:
- Allwinner A10 and A20 had the display pipeline set up incorrectly
(introduced in v4.15)
- The Altera PMU lacked an interrupt-parent (never worked)
- Pin muxing on the Openblocks A7 (never worked)
- Clocks might get set up wrong on Armada 7K/8K (4.15 regression)
We now have additional device tree patches to address all the
remaining warnings introduced in 4.15, but decided to queue them for
4.16 instead, to avoid risking another regression like the USB PHY
thing mentioned above.
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
phy: work around 'phys' references to usb-nop-xceiv devices
ARM: sunxi_defconfig: Enable CMA
arm64: dts: socfpga: add missing interrupt-parent
ARM: dts: sun[47]i: Fix display backend 1 output to TCON0 remote endpoint
ARM64: dts: marvell: armada-cp110: Fix clock resources for various node
ARM: dts: da850-lcdk: Remove leading 0x and 0s from unit address
ARM: dts: kirkwood: fix pin-muxing of MPP7 on OpenBlocks A7
Linus Torvalds [Fri, 19 Jan 2018 19:19:11 +0000 (11:19 -0800)]
Merge tag 'powerpc-4.15-8' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"More than we'd like after rc8, but nothing very alarming either, just
tying up loose ends before the release:
Since we changed powernv to use cpufreq_get() from show_cpuinfo(), we
see warnings with PREEMPT enabled. But the preempt_disable() in
show_cpuinfo() doesn't actually prevent CPU hotplug as it suggests, so
remove it.
Two updates to the recently merged RFI flush code. Wire up the generic
sysfs file to report the status, and add a debugfs file to allow
enabling/disabling it at runtime.
Two updates to xmon, one to add the RFI flush related fields to the
paca dump, and another to not use hashed pointers in the paca dump.
And one minor fix to add a missing include of linux/types.h in
asm/hvcall.h, not seen to break the build in upstream, but correct
anyway.
Thanks to: Benjamin Herrenschmidt, Michal Suchanek, Nicholas Piggin"
* tag 'powerpc-4.15-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries: include linux/types.h in asm/hvcall.h
powerpc/64s: Allow control of RFI flush via debugfs
powerpc/64s: Wire up cpu_show_meltdown()
powerpc: Don't preempt_disable() in show_cpuinfo()
powerpc/xmon: Don't print hashed pointers in paca dump
powerpc/xmon: Add RFI flush related fields to paca dump
Linus Torvalds [Fri, 19 Jan 2018 19:16:01 +0000 (11:16 -0800)]
Merge tag 'drm-fixes-for-v4.15-rc9' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Nouveau, i915, vmwgfx and sun4i regression fixes.
The i915 change fixes a display corruption problem introduced in 4.15,
the nouveau changes are for regressions in 4.15, one of the vmwgfx
fixes goes back a little further, the other is a 4.15 regression fix,
the 3 sun4i changes fix blank HDMI output on those devices"
* tag 'drm-fixes-for-v4.15-rc9' of git://people.freedesktop.org/~airlied/linux:
drm/nouveau/mmu/mcp77: fix regressions in stolen memory handling
drm/nouveau/bar/gk20a: Avoid bar teardown during init
drm/nouveau/drm/nouveau: Pass the proper arguments to nvif_object_map_handle()
drm/vmwgfx: fix memory corruption with legacy/sou connectors
drm/vmwgfx: Fix a boot time warning
drm/i915: Fix deadlock in i830_disable_pipe()
drm/i915: Redo plane sanitation during readout
drm/i915: Add .get_hw_state() method for planes
drm/sun4i: hdmi: Add missing rate halving check in sun4i_tmds_determine_rate
drm/sun4i: hdmi: Fix incorrect assignment in sun4i_tmds_determine_rate
drm/sun4i: hdmi: Check for unset best_parent in sun4i_tmds_determine_rate
Linus Torvalds [Fri, 19 Jan 2018 18:56:18 +0000 (10:56 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"6 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
sparse doesn't support struct randomization
proc: fix coredump vs read /proc/*/stat race
scripts/gdb/linux/tasks.py: fix get_thread_info
scripts/decodecode: fix decoding for AArch64 (arm64) instructions
mm/page_owner.c: remove drain_all_pages from init_early_allocated_pages
mm/memory.c: release locked page in do_swap_page()
Matthew Wilcox [Thu, 18 Jan 2018 21:52:17 +0000 (13:52 -0800)]
ia64: Rewrite atomic_add and atomic_sub
Force __builtin_constant_p to evaluate whether the argument to atomic_add
& atomic_sub is constant in the front-end before optimisations which
can lead GCC to output a call to __bad_increment_for_ia64_fetch_and_add().
See GCC bugzilla 83653.
Signed-off-by: Jakub Jelinek <jakub@redhat.com>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Fri, 19 Jan 2018 00:34:08 +0000 (16:34 -0800)]
sparse doesn't support struct randomization
Without this patch, I drown in a sea of unknown attribute warnings
Link: http://lkml.kernel.org/r/20180117024539.27354-1-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Fri, 19 Jan 2018 00:34:05 +0000 (16:34 -0800)]
proc: fix coredump vs read /proc/*/stat race
do_task_stat() accesses IP and SP of a task without bumping reference
count of a stack (which became an entity with independent lifetime at
some point).
Steps to reproduce:
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <unistd.h>
#include <sys/wait.h>
int main(void)
{
setrlimit(RLIMIT_CORE, &(struct rlimit){});
while (1) {
char buf[64];
char buf2[4096];
pid_t pid;
int fd;
pid = fork();
if (pid == 0) {
*(volatile int *)0 = 0;
}
snprintf(buf, sizeof(buf), "/proc/%u/stat", pid);
fd = open(buf, O_RDONLY);
read(fd, buf2, sizeof(buf2));
close(fd);
waitpid(pid, NULL, 0);
}
return 0;
}
BUG: unable to handle kernel paging request at
0000000000003fd8
IP: do_task_stat+0x8b4/0xaf0
PGD
800000003d73e067 P4D
800000003d73e067 PUD
3d558067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 1417 Comm: a.out Not tainted 4.15.0-rc8-dirty #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014
RIP: 0010:do_task_stat+0x8b4/0xaf0
Call Trace:
proc_single_show+0x43/0x70
seq_read+0xe6/0x3b0
__vfs_read+0x1e/0x120
vfs_read+0x84/0x110
SyS_read+0x3d/0xa0
entry_SYSCALL_64_fastpath+0x13/0x6c
RIP: 0033:0x7f4d7928cba0
RSP: 002b:
00007ffddb245158 EFLAGS:
00000246
Code: 03 b7 a0 01 00 00 4c 8b 4c 24 70 4c 8b 44 24 78 4c 89 74 24 18 e9 91 f9 ff ff f6 45 4d 02 0f 84 fd f7 ff ff 48 8b 45 40 48 89 ef <48> 8b 80 d8 3f 00 00 48 89 44 24 20 e8 9b 97 eb ff 48 89 44 24
RIP: do_task_stat+0x8b4/0xaf0 RSP:
ffffc90000607cc8
CR2:
0000000000003fd8
John Ogness said: for my tests I added an else case to verify that the
race is hit and correctly mitigated.
Link: http://lkml.kernel.org/r/20180116175054.GA11513@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: "Kohli, Gaurav" <gkohli@codeaurora.org>
Tested-by: John Ogness <john.ogness@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Xi Kangjie [Fri, 19 Jan 2018 00:34:00 +0000 (16:34 -0800)]
scripts/gdb/linux/tasks.py: fix get_thread_info
Since kernel 4.9, the thread_info has been moved into task_struct, no
longer locates at the bottom of kernel stack.
See commits
c65eacbe290b ("sched/core: Allow putting thread_info into
task_struct") and
15f4eae70d36 ("x86: Move thread_info into
task_struct").
Before fix:
(gdb) set $current = $lx_current()
(gdb) p $lx_thread_info($current)
$1 = {flags =
1470918301}
(gdb) p $current.thread_info
$2 = {flags =
2147483648}
After fix:
(gdb) p $lx_thread_info($current)
$1 = {flags =
2147483648}
(gdb) p $current.thread_info
$2 = {flags =
2147483648}
Link: http://lkml.kernel.org/r/20180118210159.17223-1-imxikangjie@gmail.com
Fixes: 15f4eae70d36 ("x86: Move thread_info into task_struct")
Signed-off-by: Xi Kangjie <imxikangjie@gmail.com>
Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Kieran Bingham <kbingham@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Will Deacon [Fri, 19 Jan 2018 00:33:57 +0000 (16:33 -0800)]
scripts/decodecode: fix decoding for AArch64 (arm64) instructions
There are a couple of problems with the decodecode script and arm64:
1. AArch64 objdump refuses to disassemble .4byte directives as instructions,
insisting that they are data values and displaying them as:
a94153f3 .word 0xa94153f3 <-- trapping instruction
This is resolved by using the .inst directive instead.
2. Disassembly of branch instructions attempts to provide the target as
an offset from a symbol, e.g.:
0:
34000082 cbz w2, 10 <.text+0x10>
however this falls foul of the grep -v, which matches lines containing
".text" and ends up removing all branch instructions from the dump.
This patch resolves both issues by using the .inst directive for 4-byte
quantities on arm64 and stripping the resulting binaries (as is done on
arm already) to remove the mapping symbols.
Link: http://lkml.kernel.org/r/1506596147-23630-1-git-send-email-will.deacon@arm.com
Signed-off-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>