git.openwrt.org Git - openwrt/staging/blogic.git/commit

Btrfs: skip writeback of last page when truncating file to same size

When we truncate a file to the same size and that size is not aligned
with the sector size, we end up triggering writeback (and wait for it to
complete) of the last page. This is unncessary as we can not have delayed
allocation beyond the inode's i_size and the goal of truncating a file
to its own size is to discard prealloc extents (allocated via the
fallocate(2) system call). Besides the unnecessary IO start and wait, it
also breaks the oppurtunity for larger contiguous extents on disk, as
before the last dirty page there might be other dirty pages.

This scenario is probably not very common in general, however it is
common for btrfs receive implementations because currently the send
stream always issues a truncate operation for each processed inode as
the last operation for that inode (this truncate operation is not
always needed and the send implementation will be addressed to avoid
them).

So improve this by not starting and waiting for writeback of the inode's
last page when we are truncating to exactly the same size.

The following script was used to quickly measure the time a receive
operation takes:

$ cat test_send.sh
#!/bin/bash

SRC_DEV=/dev/sdc
DST_DEV=/dev/sdd
SRC_MNT=/mnt/sdc
DST_MNT=/mnt/sdd

mkfs.btrfs -f $SRC_DEV >/dev/null
mkfs.btrfs -f $DST_DEV >/dev/null
mount $SRC_DEV $SRC_MNT
mount $DST_DEV $DST_MNT

echo "Creating source filesystem"
for ((t = 0; t < 10; t++)); do
     (
         for ((i = 1; i <= 20000; i++)); do
             xfs_io -f -c "pwrite -S 0xab 0 5000" \
                $SRC_MNT/file_$i > /dev/null
         done
     ) &
     worker_pids[$t]=$!
done
wait ${worker_pids[@]}

echo "Creating and sending snapshot"
btrfs subvolume snapshot -r $SRC_MNT $SRC_MNT/snap1 >/dev/null
/usr/bin/time -f "send took %e seconds"    \
     btrfs send -f $SRC_MNT/send_file $SRC_MNT/snap1
/usr/bin/time -f "receive took %e seconds" \
     btrfs receive -f $SRC_MNT/send_file $DST_MNT

umount $SRC_MNT
umount $DST_MNT

The results for 5 runs were the following:

* Without this change

average receive time was 26.49 seconds
standard deviation of 2.53 seconds

* With this change

average receive time was 12.51 seconds
standard deviation of 0.32 seconds

Reported-by: Robbie Ko <robbieko@synology.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

author	Filipe Manana <fdmanana@suse.com>
	Tue, 6 Feb 2018 20:40:31 +0000 (20:40 +0000)
committer	David Sterba <dsterba@suse.com>
	Mon, 26 Mar 2018 13:09:40 +0000 (15:09 +0200)
commit	213e8c5520ed1ecc5401a3a0f716b51a7318bda9
tree	b13cbe9e6ed1020ecdb40f531055dbb3c3a679be	tree \| snapshot
parent	ed5d5f37e653b606c93b2d5f1cdd155be6fefce0	commit \| diff