Jaegeuk Kim [Thu, 9 Feb 2017 18:38:09 +0000 (10:38 -0800)]
f2fs: add bitmaps for empty or full NAT blocks
This patches adds bitmaps to represent empty or full NAT blocks containing
free nid entries.
If we can find valid crc|cp_ver in the last block of checkpoint pack, we'll
use these bitmaps when building free nids. In order to avoid checkpointing
burden, up-to-date bitmaps will be flushed only during umount time. So,
normally we can get this gain, but when power-cut happens, we rely on fsck.f2fs
which recovers this bitmap again.
After this patch, we build free nids from nid #0 at mount time to make more
full NAT blocks, but in runtime, we check empty NAT blocks to load free nids
without loading any NAT pages from disk.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Yunlei He [Thu, 23 Feb 2017 11:39:59 +0000 (19:39 +0800)]
f2fs: replace rw semaphore extent_tree_lock with mutex lock
This patch replace rw semaphore extent_tree_lock with mutex lock
for no read cases with this lock.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Kinglong Mee [Thu, 23 Feb 2017 11:55:05 +0000 (19:55 +0800)]
f2fs: avoid m_flags overlay when allocating more data blocks
When more than one data blocks are allocated, the F2FS_MAP_UNWRITTEN/MAPPED
flags will be overlapped by F2FS_MAP_NEW at the later times.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Hou Pengyang [Thu, 23 Feb 2017 09:18:06 +0000 (09:18 +0000)]
f2fs: remove unsafe bitmap checking
proc A: proc B:
- writeback_sb_inodes
- __writeback_single_inode
- do_writepages
- f2fs_write_node_pages
- f2fs_balance_fs_bg - write_checkpoint
- build_free_nids - flush_nat_entries
- __build_free_nids - __flush_nat_entry_set
- ra_meta_pages - get_next_nat_page
- current_nat_addr - set_to_next_nat
[do nat_bitmap checking] - f2fs_change_bit
For proc A, nat_bitmap and nat_bitmap_mir would be compared without lock_op and
nm_i->nat_tree_lock, while proc B is changing nat_bitmap/nat_bitmap_ver in cp.
So it is normal for nat_bitmap/nat_bitmap diffrence under such scenario.
This patch fix this by removing the monitoring point.
[Fix:
599a09b f2fs: check in-memory nat version bitmap]
Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Hou Pengyang [Thu, 23 Feb 2017 09:18:05 +0000 (09:18 +0000)]
f2fs: init local extent_info to avoid stale stack info in tp
To avoid such stale(fops, blk, len) info in f2fs_lookup_extent_tree_end tp
dio-23095 [005] ...1 17878.856859: f2fs_lookup_extent_tree_end:
dev = (259,30), ino = 856, pgofs = 0,
ext_info(fofs:
3441207040, blk:
4294967232, len:
3481143808)
Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Yunlong Song [Tue, 21 Feb 2017 12:43:48 +0000 (20:43 +0800)]
f2fs: remove unnecessary condition check for write_checkpoint in f2fs_gc
Since has_not_enough_free_secs(sbi, 0, 0) must be true if has_not_enough_
free_secs(sbi, sec_freed, 0) is true, write_checkpoint is sure to execute in
both conditions.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 23 Feb 2017 04:18:35 +0000 (20:18 -0800)]
f2fs: check discard alignment only for SEQWRITE zones
For converntional zones, we don't need to align discard commands to exact zone
size.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 23 Feb 2017 03:58:23 +0000 (19:58 -0800)]
f2fs: wait for discard completion after submission
We don't need to wait for each discard commands when unmounting the image.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 23 Feb 2017 03:53:07 +0000 (19:53 -0800)]
f2fs: much larger batched trim_fs job
We have a kernel thread to issue discard commands, so we can increase the
number of batched discard sections. By default, now it becomes 4GB range.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 23 Feb 2017 03:10:35 +0000 (19:10 -0800)]
f2fs: avoid very large discard command
This patch adds MAX_DISCARD_BLOCKS() to avoid issuing too much large single
discard command.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 23 Feb 2017 01:02:32 +0000 (17:02 -0800)]
f2fs: do SSR for node segments more aggresively
This patch gives more SSR chances for node blocks.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 23 Feb 2017 01:10:18 +0000 (17:10 -0800)]
f2fs: find data segments across all the types
Previously, if type is CURSEG_HOT_DATA, we only check CURSEG_HOT_DATA only.
This patch fixes to search all the different types for SSR.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 23 Feb 2017 00:39:11 +0000 (16:39 -0800)]
f2fs: do SSR in higher priority
Let's check SSR in prior to LFS allocation.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Yunlong Song [Wed, 22 Feb 2017 12:50:49 +0000 (20:50 +0800)]
f2fs: do SSR for data when there is enough free space
In allocate_segment_by_default(), need_SSR() already detected it's time to do
SSR. So, let's try to find victims for data segments more aggressively in time.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Hou Pengyang [Wed, 22 Feb 2017 10:28:59 +0000 (10:28 +0000)]
f2fs: node segment is prior to data segment selected victim
As data segment gc may lead dnode dirty, so the greedy cost for data segment
should be valid blocks * 2, that is data segment is prior to node segment.
Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Yunlong Song [Tue, 21 Feb 2017 08:59:26 +0000 (16:59 +0800)]
f2fs: put allocate_segment after refresh_sit_entry
SIT information should be updated before segment allocation, since SSR needs
latest valid block information. Current code does not update the old_blkaddr
info in sit_entry, so adjust the allocate_segment to its proper location. Commit
5e443818fa0b2a2845561ee25bec181424fb2889 ("f2fs: handle dirty segments inside
refresh_sit_entry") puts it into wrong location.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Hou Pengyang [Thu, 16 Feb 2017 12:34:31 +0000 (12:34 +0000)]
f2fs: add ovp valid_blocks check for bg gc victim to fg_gc
For foreground gc, greedy algorithm should be adapted, which makes
this formula work well:
(2 * (100 / config.overprovision + 1) + 6)
But currently, we fg_gc have a prior to select bg_gc victim segments to gc
first, these victims are selected by cost-benefit algorithm, we can't guarantee
such segments have the small valid blocks, which may destroy the f2fs rule, on
the worstest case, would consume all the free segments.
This patch fix this by add a filter in check_bg_victims, if segment's has # of
valid blocks over overprovision ratio, skip such segments.
Cc: <stable@vger.kernel.org>
Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 17 Feb 2017 17:55:55 +0000 (09:55 -0800)]
f2fs: do not wait for writeback in write_begin
Otherwise we can get livelock like below.
[79880.428136] dbench D 0 18405 18404 0x00000000
[79880.428139] Call Trace:
[79880.428142] __schedule+0x219/0x6b0
[79880.428144] schedule+0x36/0x80
[79880.428147] schedule_timeout+0x243/0x2e0
[79880.428152] ? update_sd_lb_stats+0x16b/0x5f0
[79880.428155] ? ktime_get+0x3c/0xb0
[79880.428157] io_schedule_timeout+0xa6/0x110
[79880.428161] __lock_page+0xf7/0x130
[79880.428164] ? unlock_page+0x30/0x30
[79880.428167] pagecache_get_page+0x16b/0x250
[79880.428171] grab_cache_page_write_begin+0x20/0x40
[79880.428182] f2fs_write_begin+0xa2/0xdb0 [f2fs]
[79880.428192] ? f2fs_mark_inode_dirty_sync+0x16/0x30 [f2fs]
[79880.428197] ? kmem_cache_free+0x79/0x200
[79880.428203] ? __mark_inode_dirty+0x17f/0x360
[79880.428206] generic_perform_write+0xbb/0x190
[79880.428213] ? file_update_time+0xa4/0xf0
[79880.428217] __generic_file_write_iter+0x19b/0x1e0
[79880.428226] f2fs_file_write_iter+0x9c/0x180 [f2fs]
[79880.428231] __vfs_write+0xc5/0x140
[79880.428235] vfs_write+0xb2/0x1b0
[79880.428238] SyS_write+0x46/0xa0
[79880.428242] entry_SYSCALL_64_fastpath+0x1e/0xad
Fixes: cae96a5c8ab6 ("f2fs: check io submission more precisely")
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Yunlei He [Fri, 17 Feb 2017 09:16:38 +0000 (17:16 +0800)]
f2fs: replace __get_victim by dirty_segments in FG_GC
In FG_GC process, it will search victim section twice. This will
cause some dirty section with less valid blocks skip garbage
collection.
section # 26425 : valid blocks # 3
142.037567: get_victim_by_default: victim 26425 : valid blocks # 3
142.037585: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26425 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 244
142.039494: f2fs_get_victim: dev = (259,30), type = Hot DATA, policy = (Background GC, SSR-mode, Greedy), victim = 19022 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 24
142.070247: new_curseg: Debug: alloc new segment 26746
142.244341: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26054 ofs_unit = 1, pre_victim_secno = 26054, prefree = 0, free = 243
142.254475: do_garbage_collect: Debug: FG_GC, seg_freed = 1
142.293131: f2fs_get_victim: dev = (259,30), type = Warm DATA, policy = (Background GC, SSR-mode, Greedy), victim = 23466 ofs_unit = 1, pre_victim_secno = -1, prefree = 0, free = 244
142.319001: f2fs_get_victim: dev = (259,30), type = Warm DATA, policy = (Background GC, SSR-mode, Greedy), victim = 23467 ofs_unit = 1, pre_victim_secno = -1, prefree = 0, free = 244
142.368879: get_victim_by_default: victim 26425 : valid blocks # 3
142.368894: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26425 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 244
142.378127: f2fs_get_victim: dev = (259,30), type = Hot DATA, policy = (Background GC, SSR-mode, Greedy), victim = 19612 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 24
142.416917: new_curseg: Debug: alloc new segment 26054
142.656794: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 25404 ofs_unit = 1, pre_victim_secno = 25404, prefree = 0, free = 243
142.662139: do_garbage_collect: Debug: FG_GC, seg_freed = 1
142.684159: new_curseg: Debug: alloc new segment 25197
142.685059: get_victim_by_default: victim 26425 : valid blocks # 3
142.685079: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26425 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 243
142.701427: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26238 ofs_unit = 1, pre_victim_secno = 26238, prefree = 0, free = 243
142.707105: do_garbage_collect: Debug: FG_GC, seg_freed = 1
142.802444: f2fs_get_victim: dev = (259,30), type = Warm DATA, policy = (Background GC, SSR-mode, Greedy), victim = 23473 ofs_unit = 1, pre_victim_secno = -1, prefree = 0, free = 244
142.804422: get_victim_by_default: victim 26425 : valid blocks # 3
142.804443: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26425 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 244
142.851567: f2fs_get_victim: dev = (259,30), type = Hot DATA, policy = (Background GC, SSR-mode, Greedy), victim = 19092 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 24
142.865014: new_curseg: Debug: alloc new segment 26238
143.082245: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26307 ofs_unit = 1, pre_victim_secno = 26307, prefree = 0, free = 244
143.088252: do_garbage_collect: Debug: FG_GC, seg_freed = 1
143.128307: new_curseg: Debug: alloc new segment 25404
143.181846: get_victim_by_default: victim 26425 : valid blocks # 3
143.181872: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26425 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 244
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 16 Feb 2017 18:35:41 +0000 (10:35 -0800)]
f2fs: trace victim's cost selectecd by f2fs_gc
This patch adds min_cost of each victims.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 14 Feb 2017 17:54:37 +0000 (09:54 -0800)]
f2fs: fix multiple f2fs_add_link() calls having same name
It turns out a stakable filesystem like sdcardfs in AOSP can trigger multiple
vfs_create() to lower filesystem. In that case, f2fs will add multiple dentries
having same name which breaks filesystem consistency.
Until upper layer fixes, let's work around by f2fs, which shows actually not
much performance regression.
Cc: <stable@vger.kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 15 Feb 2017 19:14:06 +0000 (11:14 -0800)]
f2fs: show actual device info in tracepoints
This patch shows actual device information in the tracepoints.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 15 Feb 2017 03:32:51 +0000 (19:32 -0800)]
f2fs: use SSR for warm node as well
We have had node chains, but haven't used it so far due to stale node blocks.
Now, we have crc|cp_ver in node footer and give random cp_ver at format time,
we can start to use it again.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 8 Feb 2017 09:39:44 +0000 (17:39 +0800)]
f2fs: enable inline_xattr by default
In android, since SElinux is enable, security policy will be appliedd for
each file, it stores in inode as an xattr entry, so it will take one 4k
size node block additionally for each file.
Let's enable inline_xattr by default in order to save storage space.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 15 Feb 2017 02:34:45 +0000 (10:34 +0800)]
f2fs: introduce noinline_xattr mount option
This patch introduces new mount option 'noinline_xattr', so we can disable
inline xattr functionality which is already set as a default mount option.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 14 Feb 2017 01:02:44 +0000 (17:02 -0800)]
f2fs: avoid reading NAT page by get_node_info
We've not seen this buggy case for a long time, so it's time to avoid this
unnecessary get_node_info() call which reading NAT page to cache nat entry.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sat, 11 Feb 2017 18:46:44 +0000 (10:46 -0800)]
f2fs: remove build_free_nids() during checkpoint
Let's avoid build_free_nids() in checkpoint path.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 8 Feb 2017 09:39:45 +0000 (17:39 +0800)]
f2fs: change recovery policy of xattr node block
Currently, if we call fsync after updating the xattr date belongs to the
file, f2fs needs to trigger checkpoint to keep xattr data consistent. But,
this policy cause low performance as checkpoint will block most foreground
operations and cause unneeded and unrelated IOs around checkpoint.
This patch will reuse regular file recovery policy for xattr node block,
so, we change to write xattr node block tagged with fsync flag to warm
area instead of cold area, and during recovery, we search warm node chain
for fsynced xattr block, and do the recovery.
So, for below application IO pattern, performance can be improved
obviously:
- touch file
- create/update/delete xattr entry in file
- fsync file
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Bhumika Goyal [Sat, 11 Feb 2017 10:20:46 +0000 (15:50 +0530)]
f2fs: super: constify fscrypt_operations structure
Declare fscrypt_operations structure as const as it is only stored in
the s_cop field of a super_block structure. This field is of type const,
so fscrypt_operations structure having this property can be made const
too.
File size before: fs/f2fs/super.o
text data bss dec hex filename
54131 31355 184 85670 14ea6 fs/f2fs/super.o
File size after: fs/f2fs/super.o
text data bss dec hex filename
54227 31259 184 85670 14ea6 fs/f2fs/super.o
Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 9 Feb 2017 21:25:35 +0000 (13:25 -0800)]
f2fs: show checkpoint version at mount time
If we mounted f2fs successfully, let's show current checkpoint version.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Tiezhu Yang [Tue, 7 Feb 2017 21:08:01 +0000 (05:08 +0800)]
f2fs: fix a typo in f2fs.txt
There is a typo "f2f2" in f2fs.txt, this patch fixes it.
Signed-off-by: Tiezhu Yang <kernelpatch@126.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Mon, 6 Feb 2017 21:57:58 +0000 (13:57 -0800)]
f2fs: remove preflush for nobarrier case
This patch removes REQ_PREFLUSH in the nobarrier case.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 2 Feb 2017 00:51:22 +0000 (16:51 -0800)]
f2fs: check last page index in cached bio to decide submission
If the cached bio has the last page's index, then we need to submit it.
Otherwise, we don't need to submit it and can wait for further IO merges.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sat, 4 Feb 2017 01:44:04 +0000 (17:44 -0800)]
f2fs: check io submission more precisely
This patch check IO submission more precisely than previous rough check.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sat, 4 Feb 2017 01:18:00 +0000 (17:18 -0800)]
f2fs: call internal __write_data_page directly
This patch introduces __write_data_page to call it by f2fs_write_cache_pages
directly..
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 3 Feb 2017 02:18:06 +0000 (18:18 -0800)]
f2fs: avoid out-of-order execution of atomic writes
We need to flush data writes before flushing last node block writes by using
FUA with PREFLUSH. We don't need to guarantee precedent node writes since if
those are not written, we can't reach to the last node block when scanning
node block chain during roll-forward recovery.
Afterwards f2fs_wait_on_page_writeback guarantees all the IO submission to
disk, which builds a valid node block chain.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 3 Feb 2017 02:27:17 +0000 (18:27 -0800)]
f2fs: move write_node_page above fsync_node_pages
This patch just moves write_node_page and introduces an inner function.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 3 Feb 2017 00:40:55 +0000 (16:40 -0800)]
f2fs: move flush tracepoint
This patch moves the tracepoint location for flush command.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 1 Feb 2017 23:40:11 +0000 (15:40 -0800)]
f2fs: show # of APPEND and UPDATE inodes
This patch shows cached # of APPEND and UPDATE inode entries.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
DongOh Shin [Mon, 30 Jan 2017 18:55:18 +0000 (10:55 -0800)]
f2fs: fix 446 coding style warnings in f2fs.h
1) Nine coding style warnings below have been resolved:
"Missing a blank line after declarations"
2) 435 coding style warnings below have been resolved:
"function definition argument 'x' should also have an identifier name"
3) Two coding style warnings below have been resolved:
"macros should not use a trailing semicolon"
Signed-off-by: DongOh Shin <doscode.kr@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
DongOh Shin [Mon, 30 Jan 2017 18:55:17 +0000 (10:55 -0800)]
f2fs: fix 3 coding style errors in f2fs.h
Two coding style errors below have been resolved:
"Macros with complex values should be enclosed in parentheses"
And a coding style error below has been resolved:
"space prohibited before that ',' (ctx:WxW)"
Signed-off-by: DongOh Shin <doscode.kr@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Sun, 29 Jan 2017 05:27:02 +0000 (14:27 +0900)]
f2fs: declare missing static function
We missed two functions declared as static functions.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Kaixu Xia [Fri, 27 Jan 2017 01:35:37 +0000 (09:35 +0800)]
f2fs: show the fault injection mount option
This patch shows the fault injection mount option in
f2fs_show_options().
Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 25 Jan 2017 02:52:40 +0000 (10:52 +0800)]
f2fs: fix null pointer dereference when issuing flush in ->fsync
We only allocate flush merge control structure sbi::sm_info::fcc_info when
flush_merge option is on, but in f2fs_issue_flush we still try to access
member of the control structure without that option, it incurs panic as
show below, fix it.
Call Trace:
__remove_ino_entry+0xa9/0xc0 [f2fs]
f2fs_do_sync_file.isra.27+0x214/0x6d0 [f2fs]
f2fs_sync_file+0x18/0x20 [f2fs]
vfs_fsync_range+0x3d/0xb0
__do_page_fault+0x261/0x4d0
do_fsync+0x3d/0x70
SyS_fsync+0x10/0x20
do_syscall_64+0x6e/0x180
entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f18ce260de0
RSP: 002b:
00007ffdd4589258 EFLAGS:
00000246 ORIG_RAX:
000000000000004a
RAX:
ffffffffffffffda RBX:
0000000000000001 RCX:
00007f18ce260de0
RDX:
0000000000000006 RSI:
00000000016c0360 RDI:
0000000000000003
RBP:
00000000016c0360 R08:
000000000000ffff R09:
000000000000001f
R10:
00007ffdd4589020 R11:
0000000000000246 R12:
00000000016c0100
R13:
0000000000000000 R14:
00000000016c1f00 R15:
00000000016c0100
Code: fb 81 e3 00 08 00 00 48 89 45 a0 0f 1f 44 00 00 31 c0 85 db 75 27 41 81 e7 00 04 00 00 74 0c 41 8b 45 20 85 c0 0f 85 81 00 00 00 <f0> 41 ff 45 20 4c 89 e7 e8 f8 e9 ff ff f0 41 ff 4d 20 48 83 c4
RIP: f2fs_issue_flush+0x5b/0x170 [f2fs] RSP:
ffffc90003b5fd78
CR2:
0000000000000020
---[ end trace
a09314c24f037648 ]---
Reported-by: Shuoran Liu <liushuoran@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Wed, 25 Jan 2017 02:52:39 +0000 (10:52 +0800)]
f2fs: fix to avoid overflow when left shifting page offset
We use following method to calculate size with current page index:
size = index << PAGE_SHIFT
If type of index has only 32-bits size, left shifting will incur overflow,
which makes result incorrect.
So let's cast index with 64-bits type to avoid such issue.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 24 Jan 2017 12:39:51 +0000 (20:39 +0800)]
f2fs: enhance lookup xattr
Previously, in getxattr we will load all entries both in inline xattr and
xattr node block, and then do the lookup in all entries, but our lookup
flow shows low efficiency, since if we can lookup and hit in inline xattr
of inode page cache first, we don't need to load and lookup xattr node
block, which can obviously save cpu time and IO latency.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: initialize NULL to avoid warning]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Masanari Iida [Tue, 24 Jan 2017 03:47:55 +0000 (12:47 +0900)]
Doc: f2fs: Fix typo in Documentation/filesystems/f2fs.txt
This patch fix a typo in f2fs.txt
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Wei Fang [Sun, 22 Jan 2017 04:21:02 +0000 (12:21 +0800)]
f2fs: fix a dead loop in f2fs_fiemap()
A dead loop can be triggered in f2fs_fiemap() using the test case
as below:
...
fd = open();
fallocate(fd, 0, 0,
4294967296);
ioctl(fd, FS_IOC_FIEMAP, fiemap_buf);
...
It's caused by an overflow in __get_data_block():
...
bh->b_size = map.m_len << inode->i_blkbits;
...
map.m_len is an unsigned int, and bh->b_size is a size_t which is 64 bits
on 64 bits archtecture, type conversion from an unsigned int to a size_t
will result in an overflow.
In the above-mentioned case, bh->b_size will be zero, and f2fs_fiemap()
will call get_data_block() at block 0 again an again.
Fix this by adding a force conversion before left shift.
Signed-off-by: Wei Fang <fangwei1@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 13 Jan 2017 21:12:29 +0000 (13:12 -0800)]
f2fs: do not preallocate blocks which has wrong buffer
Sheng Yong reports needless preallocation if write(small_buffer, large_size)
is called.
In that case, f2fs preallocates large_size, but vfs returns early due to
small_buffer size. Let's detect it before preallocation phase in f2fs.
Reported-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 11 Jan 2017 18:20:04 +0000 (10:20 -0800)]
f2fs: show # of on-going flush and discard bios
This patch adds stat information for flush and discard commands.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Tue, 10 Jan 2017 04:32:07 +0000 (20:32 -0800)]
f2fs: add a kernel thread to issue discard commands asynchronously
This patch adds a kernel thread to issue discard commands.
It proposes three states, D_PREP, D_SUBMIT, and D_DONE to identify current
bio status.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 11 Jan 2017 22:40:24 +0000 (14:40 -0800)]
f2fs: factor out discard command info into discard_cmd_control
This patch adds discard_cmd_control with the existing discarding controls.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 11 Jan 2017 18:21:15 +0000 (10:21 -0800)]
f2fs: reorganize stat information
This patch modifies stat information more clearly.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Mon, 9 Jan 2017 22:13:03 +0000 (14:13 -0800)]
f2fs: clean up flush/discard command namings
This patch simply cleans up the names for flush/discard commands.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Sat, 7 Jan 2017 10:52:34 +0000 (18:52 +0800)]
f2fs: check in-memory sit version bitmap
This patch adds a mirror for sit version bitmap, and use it to detect
in-memory bitmap corruption which may be caused by bit-transition of
cache or memory overflow.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Sat, 7 Jan 2017 10:52:01 +0000 (18:52 +0800)]
f2fs: check in-memory nat version bitmap
This patch adds a mirror for nat version bitmap, and use it to detect
in-memory bitmap corruption which may be caused by bit-transition of
cache or memory overflow.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Sat, 7 Jan 2017 10:51:01 +0000 (18:51 +0800)]
f2fs: check in-memory block bitmap
This patch adds a mirror for valid block bitmap, and use it to detect
in-memory bitmap corruption which may be caused by bit-transition of
cache or memory overflow.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Sat, 7 Jan 2017 10:50:26 +0000 (18:50 +0800)]
f2fs: introduce FI_ATOMIC_COMMIT
This patch introduces a new flag to indicate inode status of doing atomic
write committing, so that, we can keep atomic write status for inode
during atomic committing, then we can skip GCing pages of atomic write inode,
that avoids random GCed datas being mixed with current transaction, so
isolation of transaction can be kept.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Sat, 7 Jan 2017 10:49:42 +0000 (18:49 +0800)]
f2fs: clean up with list_{first, last}_entry
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 30 Dec 2016 06:06:15 +0000 (22:06 -0800)]
f2fs: return fs_trim if there is no candidate
If there is no candidate to submit discard command during f2fs_trim_fs, let's
return without checkpoint.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Fri, 30 Dec 2016 00:58:54 +0000 (16:58 -0800)]
f2fs: avoid needless checkpoint in f2fs_trim_fs
The f2fs_trim_fs() doesn't need to do checkpoint if there are newly allocated
data blocks only which didn't change the critical checkpoint data such as nat
and sit entries.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 29 Dec 2016 22:07:53 +0000 (14:07 -0800)]
f2fs: relax async discard commands more
This patch relaxes async discard commands to avoid waiting its end_io during
checkpoint.
Instead of waiting them during checkpoint, it will be done when actually reusing
them.
Test on initial partition of nvme drive.
# time fstrim /mnt/test
Before : 6.158s
After : 4.822s
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 4 Jan 2017 01:19:30 +0000 (17:19 -0800)]
f2fs: drop exist_data for inline_data when truncated to 0
A test program gets the SEEK_DATA with two values between
a new created file and the exist file on f2fs filesystem.
F2FS filesystem, (the first "test1" is a new file)
SEEK_DATA size != 0 (offset = 8192)
SEEK_DATA size != 0 (offset = 4096)
PNFS filesystem, (the first "test1" is a new file)
SEEK_DATA size != 0 (offset = 4096)
SEEK_DATA size != 0 (offset = 4096)
int main(int argc, char **argv)
{
char *filename = argv[1];
int offset = 1, i = 0, fd = -1;
if (argc < 2) {
printf("Usage: %s f2fsfilename\n", argv[0]);
return -1;
}
/*
if (!access(filename, F_OK) || errno != ENOENT) {
printf("Needs a new file for test, %m\n");
return -1;
}*/
fd = open(filename, O_RDWR | O_CREAT, 0777);
if (fd < 0) {
printf("Create test file %s failed, %m\n", filename);
return -1;
}
for (i = 0; i < 20; i++) {
offset = 1 << i;
ftruncate(fd, 0);
lseek(fd, offset, SEEK_SET);
write(fd, "test", 5);
/* Get the alloc size by seek data equal zero*/
if (lseek(fd, 0, SEEK_DATA)) {
printf("SEEK_DATA size != 0 (offset = %d)\n", offset);
break;
}
}
close(fd);
return 0;
}
Reported-and-Tested-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 29 Dec 2016 01:31:15 +0000 (17:31 -0800)]
f2fs: don't allow encrypted operations without keys
This patch fixes the renaming bug on encrypted filenames, which was pointed by
(ext4: don't allow encrypted operations without keys)
Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 28 Dec 2016 21:55:09 +0000 (13:55 -0800)]
f2fs: show the max number of atomic operations
This patch adds to show the max number of atomic operations which are
conducting concurrently.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 22 Dec 2016 01:09:19 +0000 (17:09 -0800)]
f2fs: get io size bit from mount option
This patch adds to set io_size_bits from mount option.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 14 Dec 2016 18:12:56 +0000 (10:12 -0800)]
f2fs: support IO alignment for DATA and NODE writes
This patch implements IO alignment by filling dummy blocks in DATA and NODE
write bios. If we can guarantee, for example, 32KB or 64KB for such the IOs,
we can eliminate underlying dummy page problem which FTL conducts in order to
close MLC or TLC partial written pages.
Note that,
- it requires "-o mode=lfs".
- IO size should be power of 2, not exceed BIO_MAX_PAGES, 256.
- read IO is still 4KB.
- do checkpoint at fsync, if dummy NODE page was written.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 21 Dec 2016 20:13:03 +0000 (12:13 -0800)]
f2fs: add submit_bio tracepoint
This patch adds final submit_bio() tracepoint.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 21 Dec 2016 22:26:39 +0000 (14:26 -0800)]
f2fs: fix wrong tracepoints for op and op_flags
This patch fixes wrong tracepoints in terms of op and op_flags.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Wed, 21 Dec 2016 19:51:32 +0000 (11:51 -0800)]
f2fs: reassign new segment for mode=lfs
Otherwise we can remain wrong curseg->next_blkoff, resulting in fsck failure.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Yunlei He [Thu, 22 Dec 2016 03:46:24 +0000 (11:46 +0800)]
f2fs: fix a missing discard prefree segments
If userspace issue a fstrim with a range not involve prefree segments,
it will reuse these segments without discard. This patch fix it.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Geliang Tang [Tue, 20 Dec 2016 13:57:42 +0000 (21:57 +0800)]
f2fs: use rb_entry_safe
Use rb_entry_safe() instead of open-coding it.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Yunlei He [Tue, 20 Dec 2016 03:11:35 +0000 (11:11 +0800)]
f2fs: add a case of no need to read a page in write begin
If the range we write cover the whole valid data in the last page,
we do not need to read it.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
[Jaegeuk Kim: nullify the remaining area (fix: xfstests/f2fs/001)]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Yunlei He [Mon, 19 Dec 2016 12:10:48 +0000 (20:10 +0800)]
f2fs: fix a problem of using memory after free
This patch fix a problem of using memory after free
in function __try_merge_extent_node.
Fixes: 0f825ee6e873 ("f2fs: add new interfaces for extent tree")
Cc: <stable@vger.kernel.org>
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Dan Carpenter [Fri, 16 Dec 2016 08:18:15 +0000 (11:18 +0300)]
f2fs: remove unneeded condition
We checked that "inode" is not an error pointer earlier so there is
no need to check again here.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Tue, 13 Dec 2016 10:54:59 +0000 (18:54 +0800)]
f2fs: don't cache nat entry if out of memory
If we run out of memory, in cache_nat_entry, it's better to avoid loop
for allocating memory to cache nat entry, so in low memory scenario, for
read path of node block, I expect this can avoid unneeded latency.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Yunlei He [Tue, 13 Dec 2016 09:23:37 +0000 (17:23 +0800)]
f2fs: remove unused values in recover_fsync_data
This patch remove unused values in function recover_fsync_data
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Linus Torvalds [Sat, 28 Jan 2017 23:09:23 +0000 (15:09 -0800)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Two I2C driver bugfixes.
The 'VLLS mode support' patch should have been entitled 'reconfigure
pinctrl after suspend' to make the bugfix more clear. Sorry, I missed
that, yet didn't want to rebase"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: imx-lpi2c: add VLLS mode support
i2c: i2c-cadence: Initialize configuration before probing devices
Linus Torvalds [Sat, 28 Jan 2017 19:50:17 +0000 (11:50 -0800)]
Merge tag 'nfs-for-4.10-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
"Stable patches:
- NFSv4.1: Fix a deadlock in layoutget
- NFSv4 must not bump sequence ids on NFS4ERR_MOVED errors
- NFSv4 Fix a regression with OPEN EXCLUSIVE4 mode
- Fix a memory leak when removing the SUNRPC module
Bugfixes:
- Fix a reference leak in _pnfs_return_layout"
* tag 'nfs-for-4.10-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
pNFS: Fix a reference leak in _pnfs_return_layout
nfs: Fix "Don't increment lock sequence ID after NFS4ERR_MOVED"
SUNRPC: cleanup ida information when removing sunrpc module
NFSv4.0: always send mode in SETATTR after EXCLUSIVE4
nfs: Don't increment lock sequence ID after NFS4ERR_MOVED
NFSv4.1: Fix a deadlock in layoutget
Linus Torvalds [Sat, 28 Jan 2017 19:09:04 +0000 (11:09 -0800)]
Merge tag 'md/4.10-rc6' of git://git./linux/kernel/git/shli/md
Pull MD fixes from Shaohua Li:
"This fixes several corner cases for raid5 cache, which is merged into
this cycle"
* tag 'md/4.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
md/r5cache: disable write back for degraded array
md/r5cache: shift complex rmw from read path to write path
md/r5cache: flush data only stripes in r5l_recovery_log()
md/raid5: move comment of fetch_block to right location
md/r5cache: read data into orig_page for prexor of cached data
md/raid5-cache: delete meaningless code
Linus Torvalds [Sat, 28 Jan 2017 19:06:42 +0000 (11:06 -0800)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"Fix kernel panic on ACPI-based systems where CPU capacity description
is not currently handled"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: skip register_cpufreq_notifier on ACPI-based systems
Linus Torvalds [Sat, 28 Jan 2017 19:00:08 +0000 (11:00 -0800)]
Merge tag 'arc-4.10-rc6' of git://git./linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
"Hopefully last set of changes for ARC for 4.10:
- fix for unaligned access emulation corner case
- fix for udelay loop inline asm regression
- fix irq affinity finally for AXS103 board [Yuriy]
- final fixes for setting IO-coherency sanely in SMP"
* tag 'arc-4.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: [arcompact] handle unaligned access delay slot corner case
ARCv2: smp-boot: wake_flag polling by non-Masters needs to be uncached
ARC: smp-boot: Decouple Non masters waiting API from jump to entry point
ARCv2: MCIP: update the BCR per current changes
ARC: udelay: fix inline assembler by adding LP_COUNT to clobber list
ARCv2: MCIP: Deprecate setting of affinity in Device Tree
Linus Torvalds [Fri, 27 Jan 2017 20:54:16 +0000 (12:54 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) GTP fixes from Andreas Schultz (missing genl module alias, clear IP
DF on transmit).
2) Netfilter needs to reflect the fwmark when sending resets, from Pau
Espin Pedrol.
3) nftable dump OOPS fix from Liping Zhang.
4) Fix erroneous setting of VIRTIO_NET_HDR_F_DATA_VALID on transmit,
from Rolf Neugebauer.
5) Fix build error of ipt_CLUSTERIP when procfs is disabled, from Arnd
Bergmann.
6) Fix regression in handling of NETIF_F_SG in harmonize_features(),
from Eric Dumazet.
7) Fix RTNL deadlock wrt. lwtunnel module loading, from David Ahern.
8) tcp_fastopen_create_child() needs to setup tp->max_window, from
Alexey Kodanev.
9) Missing kmemdup() failure check in ipv6 segment routing code, from
Eric Dumazet.
10) Don't execute unix_bind() under the bindlock, otherwise we deadlock
with splice. From WANG Cong.
11) ip6_tnl_parse_tlv_enc_lim() potentially reallocates the skb buffer,
therefore callers must reload cached header pointers into that skb.
Fix from Eric Dumazet.
12) Fix various bugs in legacy IRQ fallback handling in alx driver, from
Tobias Regnery.
13) Do not allow lwtunnel drivers to be unloaded while they are
referenced by active instances, from Robert Shearman.
14) Fix truncated PHY LED trigger names, from Geert Uytterhoeven.
15) Fix a few regressions from virtio_net XDP support, from John
Fastabend and Jakub Kicinski.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (102 commits)
ISDN: eicon: silence misleading array-bounds warning
net: phy: micrel: add support for KSZ8795
gtp: fix cross netns recv on gtp socket
gtp: clear DF bit on GTP packet tx
gtp: add genl family modules alias
tcp: don't annotate mark on control socket from tcp_v6_send_response()
ravb: unmap descriptors when freeing rings
virtio_net: reject XDP programs using header adjustment
virtio_net: use dev_kfree_skb for small buffer XDP receive
r8152: check rx after napi is enabled
r8152: re-schedule napi for tx
r8152: avoid start_xmit to schedule napi when napi is disabled
r8152: avoid start_xmit to call napi_schedule during autosuspend
net: dsa: Bring back device detaching in dsa_slave_suspend()
net: phy: leds: Fix truncated LED trigger names
net: phy: leds: Break dependency of phy.h on phy_led_triggers.h
net: phy: leds: Clear phy_num_led_triggers on failure to avoid crash
net-next: ethernet: mediatek: change the compatible string
Documentation: devicetree: change the mediatek ethernet compatible string
bnxt_en: Fix RTNL lock usage on bnxt_get_port_module_status().
...
Linus Torvalds [Fri, 27 Jan 2017 20:44:32 +0000 (12:44 -0800)]
Merge tag 'xfs-for-linus-4.10-rc6-5' of git://git./fs/xfs/xfs-linux
Pull xfs uodates from Darrick Wong:
"I have some more fixes this week: better input validation, corruption
avoidance, build fixes, memory leak fixes, and a couple from Christoph
to avoid an ENOSPC failure.
Summary:
- Fix race conditions in the CoW code
- Fix some incorrect input validation checks
- Avoid crashing fs by running out of space when freeing inodes
- Fix toctou race wrt whether or not an inode has an attr
- Fix build error on arm
- Fix page refcount corruption when readahead fails
- Don't corrupt userspace in the bmap ioctl"
* tag 'xfs-for-linus-4.10-rc6-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: prevent quotacheck from overloading inode lru
xfs: fix bmv_count confusion w/ shared extents
xfs: clear _XBF_PAGES from buffers when readahead page
xfs: extsize hints are not unlikely in xfs_bmap_btalloc
xfs: remove racy hasattr check from attr ops
xfs: use per-AG reservations for the finobt
xfs: only update mount/resv fields on success in __xfs_ag_resv_init
xfs: verify dirblocklog correctly
xfs: fix COW writeback race
Linus Torvalds [Fri, 27 Jan 2017 20:41:46 +0000 (12:41 -0800)]
Merge branch 'for-linus-4.10' of git://git./linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
"Some fixes that we've collected from the list.
We still have one more pending to nail down a regression in lzo
compression, but I wanted to get this batch out the door"
* 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: remove ->{get, set}_acl() from btrfs_dir_ro_inode_operations
Btrfs: disable xattr operations on subvolume directories
Btrfs: remove old tree_root case in btrfs_read_locked_inode()
Btrfs: fix truncate down when no_holes feature is enabled
Btrfs: Fix deadlock between direct IO and fast fsync
btrfs: fix false enospc error when truncating heavily reflinked file
Linus Torvalds [Fri, 27 Jan 2017 20:36:39 +0000 (12:36 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A set of fixes for this series. This contains:
- Set of fixes for the nvme target code
- A revert of patch from this merge window, causing a regression with
WRITE_SAME on iSCSI targets at least.
- A fix for a use-after-free in the new O_DIRECT bdev code.
- Two fixes for the xen-blkfront driver"
* 'for-linus' of git://git.kernel.dk/linux-block:
Revert "sd: remove __data_len hack for WRITE SAME"
nvme-fc: use blk_rq_nr_phys_segments
nvmet-rdma: Fix missing dma sync to nvme data structures
nvmet: Call fatal_error from keep-alive timout expiration
nvmet: cancel fatal error and flush async work before free controller
nvmet: delete controllers deletion upon subsystem release
nvmet_fc: correct logic in disconnect queue LS handling
block: fix use after free in __blkdev_direct_IO
xen-blkfront: correct maximum segment accounting
xen-blkfront: feature flags handling adjustments
Linus Torvalds [Fri, 27 Jan 2017 20:29:30 +0000 (12:29 -0800)]
Merge tag 'for-linus' of git://git./linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
"Second round of -rc fixes for 4.10.
This -rc cycle has been slow for the rdma subsystem. I had already
sent you the first batch before the Holiday break. After that, we kept
only getting a few here or there. Up until this week, when I got a
drop of 13 to one driver (qedr). So, here's the -rc patches I have. I
currently have none held in reserve, so unless something new comes in,
this is it until the next merge window opens.
Summary:
- series of iw_cxgb4 fixes to make it work with the drain cq API
- one or two patches each to: srp, iser, cxgb3, vmw_pvrdma, umem,
rxe, and ipoib
- one big series (13 patches) for the new qedr driver"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (27 commits)
RDMA/cma: Fix unknown symbol when CONFIG_IPV6 is not enabled
IB/rxe: Prevent from completer to operate on non valid QP
IB/rxe: Fix rxe dev insertion to rxe_dev_list
IB/umem: Release pid in error and ODP flow
RDMA/qedr: Dispatch port active event from qedr_add
RDMA/qedr: Fix and simplify memory leak in PD alloc
RDMA/qedr: Fix RDMA CM loopback
RDMA/qedr: Fix formatting
RDMA/qedr: Mark three functions as static
RDMA/qedr: Don't reset QP when queues aren't flushed
RDMA/qedr: Don't spam dmesg if QP is in error state
RDMA/qedr: Remove CQ spinlock from CM completion handlers
RDMA/qedr: Return max inline data in QP query result
RDMA/qedr: Return success when not changing QP state
RDMA/qedr: Add uapi header qedr-abi.h
RDMA/qedr: Fix MTU returned from QP query
RDMA/core: Add the function ib_mtu_int_to_enum
IB/vmw_pvrdma: Fix incorrect cleanup on pvrdma_pci_probe error path
IB/vmw_pvrdma: Don't leak info from alloc_ucontext
IB/cxgb3: fix misspelling in header guard
...
Linus Torvalds [Fri, 27 Jan 2017 20:25:26 +0000 (12:25 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"Another two bug fixes:
- ptrace partial write information leak
- a guest page hinting regression introduced with v4.6"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/mm: Fix cmma unused transfer from pgste into pte
s390/ptrace: Preserve previous registers for short regset write
Linus Torvalds [Fri, 27 Jan 2017 20:17:07 +0000 (12:17 -0800)]
Merge branch 'stable/for-linus-4.10' of git://git./linux/kernel/git/konrad/swiotlb
Pull swiotlb fix from Konrad Rzeszutek Wilk:
"An ARM fix in the Xen SWIOTLB - mainly the translation of physical to
bus addresses was done just a tad too late"
* 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
swiotlb-xen: update dev_addr after swapping pages
Linus Torvalds [Fri, 27 Jan 2017 20:10:58 +0000 (12:10 -0800)]
Merge tag 'vfio-v4.10-rc6' of git://github.com/awilliam/linux-vfio
Pull VFIO fix from Alex Williamson:
"mdev IOMMU groups are not yet compatible with the powerpc SPAPR IOMMU
backend, detect and fail group attach (Greg Kurz)"
* tag 'vfio-v4.10-rc6' of git://github.com/awilliam/linux-vfio:
vfio/spapr: fail tce_iommu_attach_group() when iommu_data is null
Jack Morgenstein [Sun, 15 Jan 2017 18:15:00 +0000 (20:15 +0200)]
RDMA/cma: Fix unknown symbol when CONFIG_IPV6 is not enabled
If IPV6 has not been enabled in the underlying kernel, we must avoid
calling IPV6 procedures in rdma_cm.ko.
This requires using "IS_ENABLED(CONFIG_IPV6)" in "if" statements
surrounding any code which calls external IPV6 procedures.
In the instance fixed here, procedure cma_bind_addr() called
ipv6_addr_type() -- which resulted in calling external procedure
__ipv6_addr_type().
Fixes: 6c26a77124ff ("RDMA/cma: fix IPv6 address resolution")
Cc: <stable@vger.kernel.org> # v4.2+
Cc: Spencer Baugh <sbaugh@catern.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jens Axboe [Fri, 27 Jan 2017 18:56:06 +0000 (11:56 -0700)]
Merge branch 'stable/for-jens-4.10' of git://git./linux/kernel/git/konrad/xen into for-linus
Konrad writes:
Please pull in your 'for-linus' branch two little fixes for Xen
block front:
One fix is for handling the XEN_PAGE_SIZE != PAGE_SIZE (4KB vs 64KB
on ARM for example) mishandling while the other is fixing
the accounting for the configuration changes.
Vineet Gupta [Fri, 27 Jan 2017 18:45:27 +0000 (10:45 -0800)]
ARC: [arcompact] handle unaligned access delay slot corner case
After emulating an unaligned access in delay slot of a branch, we
pretend as the delay slot never happened - so return back to actual
branch target (or next PC if branch was not taken).
Curently we did this by handling STATUS32.DE, we also need to clear the
BTA.T bit, which is disregarded when returning from original misaligned
exception, but could cause weirdness if it took the interrupt return
path (in case interrupt was acive too)
One ARC700 customer ran into this when enabling unaligned access fixup
for kernel mode accesses as well
Cc: stable@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Linus Torvalds [Fri, 27 Jan 2017 18:29:33 +0000 (10:29 -0800)]
Merge tag 'media/v4.10-2' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- fix a regression on tvp5150 causing failures at input selection and
image glitches
- CEC was moved out of staging for v4.10. Fix some bugs on it while not
too late
- fix a regression on pctv452e caused by VM stack changes
- fix suspend issued with smiapp
- fix a regression on cobalt driver
- fix some warnings and Kconfig issues with some random configs.
* tag 'media/v4.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] s5k4ecgx: select CRC32 helper
[media] dvb: avoid warning in dvb_net
[media] v4l: tvp5150: Don't override output pinmuxing at stream on/off time
[media] v4l: tvp5150: Fix comment regarding output pin muxing
[media] v4l: tvp5150: Reset device at probe time, not in get/set format handlers
[media] pctv452e: move buffer to heap, no mutex
[media] media/cobalt: use pci_irq_allocate_vectors
[media] cec: fix race between configuring and unconfiguring
[media] cec: move cec_report_phys_addr into cec_config_thread_func
[media] cec: replace cec_report_features by cec_fill_msg_report_features
[media] cec: update log_addr[] before finishing configuration
[media] cec: CEC_MSG_GIVE_FEATURES should abort for CEC version < 2
[media] cec: when canceling a message, don't overwrite old status info
[media] cec: fix report_current_latency
[media] smiapp: Make suspend and resume functions __maybe_unused
[media] smiapp: Implement power-on and power-off sequences without runtime PM
Linus Torvalds [Fri, 27 Jan 2017 18:25:31 +0000 (10:25 -0800)]
Merge tag 'mmc-v4.10-rc5' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fix from Ulf Hansson:
"MMC host: fix runtime PM resume path in dw_mmc"
* tag 'mmc-v4.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: dw_mmc: force setup bus if active slots exist
Linus Torvalds [Fri, 27 Jan 2017 18:22:00 +0000 (10:22 -0800)]
Merge branch 'for-rc' of git://git./linux/kernel/git/rzhang/linux
Pull thermal management fix from Zhang Rui:
"A single revert from a recently introduced problem.
Specifics:
Commit
7611fb68062f ("thermal: thermal_hwmon: Convert to
hwmon_device_register_with_info()"), which was introduced in 4.10-rc5,
uses new hwmon API. But this breaks some soc thermal driver because
the new hwmon API has a strict rule for the hwmon device name. Revert
the offending commit as a quick solution for 4.10"
* 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
Revert "thermal: thermal_hwmon: Convert to hwmon_device_register_with_info()"
Brian Foster [Thu, 26 Jan 2017 21:18:09 +0000 (13:18 -0800)]
xfs: prevent quotacheck from overloading inode lru
Quotacheck runs at mount time in situations where quota accounting must
be recalculated. In doing so, it uses bulkstat to visit every inode in
the filesystem. Historically, every inode processed during quotacheck
was released and immediately tagged for reclaim because quotacheck runs
before the superblock is marked active by the VFS. In other words,
the final iput() lead to an immediate ->destroy_inode() call, which
allowed the XFS background reclaim worker to start reclaiming inodes.
Commit
17c12bcd3 ("xfs: when replaying bmap operations, don't let
unlinked inodes get reaped") marks the XFS superblock active sooner as
part of the mount process to support caching inodes processed during log
recovery. This occurs before quotacheck and thus means all inodes
processed by quotacheck are inserted to the LRU on release. The
s_umount lock is held until the mount has completed and thus prevents
the shrinkers from operating on the sb. This means that quotacheck can
excessively populate the inode LRU and lead to OOM conditions on systems
without sufficient RAM.
Update the quotacheck bulkstat handler to set XFS_IGET_DONTCACHE on
inodes processed by quotacheck. This causes ->drop_inode() to return 1
and in turn causes iput_final() to evict the inode. This preserves the
original quotacheck behavior and prevents it from overloading the LRU
and running out of memory.
CC: stable@vger.kernel.org # v4.9
Reported-by: Martin Svec <martin.svec@zoner.cz>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Arnd Bergmann [Fri, 27 Jan 2017 12:32:14 +0000 (13:32 +0100)]
ISDN: eicon: silence misleading array-bounds warning
With some gcc versions, we get a warning about the eicon driver,
and that currently shows up as the only remaining warning in one
of the build bots:
In file included from ../drivers/isdn/hardware/eicon/message.c:30:0:
eicon/message.c: In function 'mixer_notify_update':
eicon/platform.h:333:18: warning: array subscript is above array bounds [-Warray-bounds]
The code is easily changed to open-code the unusual PUT_WORD() line
causing this to avoid the warning.
Cc: stable@vger.kernel.org
Link: http://arm-soc.lixom.net/buildlogs/stable-rc/v4.4.45/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sean Nyekjaer [Fri, 27 Jan 2017 07:46:23 +0000 (08:46 +0100)]
net: phy: micrel: add support for KSZ8795
This is adds support for the PHYs in the KSZ8795 5port managed switch.
It will allow to detect the link between the switch and the soc
and uses the same read_status functions as the KSZ8873MLL switch.
Signed-off-by: Sean Nyekjaer <sean.nyekjaer@prevas.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 27 Jan 2017 15:39:10 +0000 (10:39 -0500)]
Merge branch 'gtp-fixes'
Andreas Schultz says:
====================
various gtp fixes
I'm sorry for the compile error mess up in the last version.
It's no excuse for not test compiling, but the hunks got lost in
a rebase.
This is the part of the previous "simple gtp improvements" series
that Pablo indicated should go into net.
The addition of the module alias fixes genl family autoloading,
clearing the DF bit fixes a protocol violation in regard to the
specification and the netns comparison fixes a corner case of
cross netns recv.
v2->v3: fix compiler error introduced in rebase
====================
Signed-off-by: David S. Miller <davem@davemloft.net>