Vivek Goyal [Fri, 11 May 2018 15:49:33 +0000 (11:49 -0400)]
ovl: Enable metadata only feature
All the bits are in patches before this. So it is time to enable the
metadata only copy up feature.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Vivek Goyal [Fri, 11 May 2018 15:49:33 +0000 (11:49 -0400)]
ovl: Do not do metacopy only for ioctl modifying file attr
ovl_copy_up() by default will only do metadata only copy up (if enabled).
That means when ovl_real_ioctl() calls ovl_real_file(), it will still get
the lower file (as ovl_real_file() opens data file and not metacopy). And
that means "chattr +i" will end up modifying lower inode.
There seem to be two ways to solve this.
A. Open metacopy file in ovl_real_ioctl() and do operations on that
B. Force full copy up when FS_IOC_SETFLAGS is called.
I am resorting to option B for now as it feels a little safer. If
there are performance issues due to this, we can revisit it.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Vivek Goyal [Fri, 11 May 2018 15:49:33 +0000 (11:49 -0400)]
ovl: Do not do metadata only copy-up for truncate operation
truncate should copy up the full file (and not do a metadata-only copy up),
otherwise it will be broken. For example, use truncate to increase the size
of a file so that any read beyond the existing size returns null bytes. If
we don't copy up the full file, then we end up opening the lower file, and a
read from it only reads up to the old size (and not the new size after
truncate). Hence, to avoid such situations, copy up data as well when the
file size changes.
So far it was being done by d_real(O_WRONLY) call in truncate() path. Now
that patch has been reverted. So force full copy up in ovl_setattr() if
size of file is changing.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
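The truncate semantics that the commit above has to preserve can be checked
from userspace on any filesystem. The following is a minimal sketch, assuming
a POSIX system; extended_tail_is_zero() is a hypothetical helper written for
illustration and is not overlayfs code. It grows a file with ftruncate() and
verifies that the bytes past the old size read back as zero, which is what a
read through a metacopy-only upper would get wrong.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not overlayfs code): write old_size bytes of 'x' to
 * a temporary file, grow it to new_size with ftruncate(), and report
 * whether every byte past the old size reads back as zero.
 */
static bool extended_tail_is_zero(off_t old_size, off_t new_size)
{
	FILE *fp;
	bool ok = true;
	off_t off;
	int fd;

	assert(old_size >= 0 && new_size >= old_size);

	fp = tmpfile();
	if (!fp)
		return false;
	fd = fileno(fp);

	for (off = 0; ok && off < old_size; off++)
		ok = pwrite(fd, "x", 1, off) == 1;

	/* Grow the file; the extended area must read as zero bytes. */
	ok = ok && ftruncate(fd, new_size) == 0;

	for (off = old_size; ok && off < new_size; off++) {
		char c = 1;

		ok = pread(fd, &c, 1, off) == 1 && c == 0;
	}

	fclose(fp);
	return ok;
}
```

Running the same check against a file in a metacopy=on overlay mount is one
way to exercise the full copy up on size change.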
Vivek Goyal [Fri, 11 May 2018 15:49:33 +0000 (11:49 -0400)]
ovl: add helper to force data copy-up
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Vivek Goyal [Fri, 11 May 2018 15:49:32 +0000 (11:49 -0400)]
ovl: Check redirect on index as well
Right now we seem to check redirect only if upperdentry is found. But it
is possible that there is no upperdentry but later we find an index.
We need to check redirect on index as well and set it in
ovl_inode->redirect. Otherwise link code can assume that dentry does not
have redirect and place a new one which breaks things. In my testing
overlay/033 test started failing in xfstests. Following are the details.
For example, do the following.
$ mkdir lower upper work merged
- Make lower dir with 4 links.
$ echo "foo" > lower/l0.txt
$ ln lower/l0.txt lower/l1.txt
$ ln lower/l0.txt lower/l2.txt
$ ln lower/l0.txt lower/l3.txt
- Mount with index on and metacopy on.
$ mount -t overlay -o lowerdir=lower,upperdir=upper,workdir=work,\
index=on,metacopy=on none merged
- Link lower
$ ln merged/l0.txt merged/l4.txt
(This will do a metadata copy up of l0.txt and put an absolute redirect
/l0.txt)
$ echo 2 > /proc/sys/vm/drop_caches
$ ls merged/l1.txt
(Now l1.txt will be looked up. There is no upper dentry but there is
lower dentry and index will be found. We don't check for redirect on
index, hence ovl_inode->redirect will be NULL.)
- Link Upper
$ ln merged/l4.txt merged/l5.txt
(Lookup of l4.txt will use inode from l1.txt lookup which is still in
cache. It has ovl_inode->redirect NULL, hence link will put a new
redirect and replace /l0.txt with /l4.txt.)
- Drop caches.
$ echo 2 > /proc/sys/vm/drop_caches
- List l0.txt and it returns -ESTALE
$ ls merged/l0.txt
(It returns stale because, we found a metacopy of l0.txt in upper and it
has redirect l4.txt but there is no file named l4.txt in lower layer.
So lower data copy is not found and -ESTALE is returned.)
So problem here is that we did not process redirect on index. Check
redirect on index as well and then problem is fixed.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Vivek Goyal [Fri, 11 May 2018 15:49:32 +0000 (11:49 -0400)]
ovl: Set redirect on upper inode when it is linked
When we create a hardlink to a metacopy upper file, first set the redirect on
that inode. Path based lookup will not work with newly created link and
redirect will solve that issue.
Also use absolute redirect as two hardlinks could be in different
directories and relative redirect will not work.
I have not put any additional locking around setting redirects while
introducing redirects for non-dir files. For now it feels like existing
locking is sufficient. If that's not the case, we will have to add more
locking. Following is my rationale for why I think the current locking
seems ok.
Basic problem for non-dir files is that more than one dentry could be
pointing to same inode and in theory only relying on dentry based locks
(d->d_lock) did not seem sufficient.
We set redirect upon rename and upon link creation. In both the paths for
non-dir file, VFS locks both source and target inodes (->i_rwsem). That
means vfs rename and link operations on same source and target can't be
happening in parallel (Even if there are multiple dentries pointing to same
inode). So that probably means that at a time on an inode, only one call
of ovl_set_redirect() could be working and we don't need additional locking
in ovl_set_redirect().
ovl_inode->redirect is initialized only when inode is created new. That
means it should not race with any other path and setting
ovl_inode->redirect should be fine.
Reading of ovl_inode->redirect happens in ovl_get_redirect() path. And
this is called only in ovl_set_redirect(). And ovl_set_redirect() already
seemed to be protected using ->i_rwsem. That means ovl_set_redirect() and
ovl_get_redirect() on source/target inode should not make progress in
parallel and are mutually exclusive. Hence no additional locking required.
Now, only case where ovl_set_redirect() and ovl_get_redirect() could race
seems to be case of absolute redirects where ovl_get_redirect() has to
travel up the tree. In that case we already take d->d_lock and that should
be sufficient as directories will not have multiple dentries pointing to
same inode.
So given VFS locking and current usage of redirect, current locking around
redirect seems to be ok for non-dir as well. Once we have the logic to
remove redirect when metacopy file gets copied up, then we probably will
need additional locking.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Vivek Goyal [Fri, 11 May 2018 15:49:32 +0000 (11:49 -0400)]
ovl: Set redirect on metacopy files upon rename
Set redirect on metacopy files upon rename. This will help find data
dentry in lower dirs.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Vivek Goyal [Fri, 11 May 2018 15:49:32 +0000 (11:49 -0400)]
ovl: Do not set dentry type ORIGIN for broken hardlinks
If a dentry has copy up origin, we set flag OVL_PATH_ORIGIN. So far this
decision was easy that we had to check only for oe->numlower and if it is
non-zero, we knew there is copy up origin. (For non-dir we installed
origin dentry in lowerstack[0]).
But we don't create ORIGIN xattr for broken hardlinks (index=off). And with
metacopy feature it is possible that we will install lowerstack[0] but
ORIGIN xattr is not there. It is data dentry of upper metacopy dentry
which has been found using regular name based lookup or using REDIRECT. So
with addition of this new case, just presence of oe->numlower is not
sufficient to guarantee that ORIGIN xattr is present.
So to differentiate between two cases, look at OVL_CONST_INO flag. If this
flag is set and upperdentry is there, that means it can be marked as type
ORIGIN. OVL_CONST_INO is not set if lower hardlink is broken or will be
broken over copy up.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Vivek Goyal [Fri, 11 May 2018 15:49:32 +0000 (11:49 -0400)]
ovl: Add an inode flag OVL_CONST_INO
Add an ovl_inode flag OVL_CONST_INO. This flag signifies if inode number
will remain constant over copy up or not. This flag does not get updated
over copy up and remains unmodified after setting once.
Next patch in the series will make use of this flag. It will basically
figure out if dentry is of type ORIGIN or not. And this can be derived by
this flag.
ORIGIN = (upperdentry && ovl_test_flag(OVL_CONST_INO, inode)).
Suggested-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
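The ORIGIN rule quoted in the commit above can be modelled in a few lines. A
minimal userspace sketch, assuming a simplified inode: only the flag name
OVL_CONST_INO and the formula come from the commit message; struct
model_inode and the model_* functions are hypothetical stand-ins for the
kernel structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; only the flag name comes from the commit. */
enum model_flag { MODEL_OVL_CONST_INO = 1u << 0 };

struct model_inode {
	unsigned int flags;
	bool has_upper;		/* stands in for "upperdentry != NULL" */
};

static bool model_test_flag(unsigned int flag, const struct model_inode *inode)
{
	return (inode->flags & flag) != 0;
}

/* ORIGIN = (upperdentry && ovl_test_flag(OVL_CONST_INO, inode)) */
static bool model_is_origin(const struct model_inode *inode)
{
	assert(inode);
	return inode->has_upper &&
	       model_test_flag(MODEL_OVL_CONST_INO, inode);
}
```

In this model a broken hardlink is simply an inode created without
MODEL_OVL_CONST_INO, so it is never reported as ORIGIN even when an upper
exists.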
Vivek Goyal [Fri, 11 May 2018 15:49:32 +0000 (11:49 -0400)]
ovl: Treat metacopy dentries as type OVL_PATH_MERGE
Right now OVL_PATH_MERGE is used only for merged directories. But
conceptually, a metacopy dentry (backed by a lower data dentry) is a merged
entity as well.
So mark metacopy dentries as OVL_PATH_MERGE and ovl_rename() makes use of
this property later to set redirect on a metacopy file.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Vivek Goyal [Fri, 11 May 2018 15:49:31 +0000 (11:49 -0400)]
ovl: Check redirects for metacopy files
Right now we rely on path based lookup for data origin of metacopy upper.
This will work only if upper has not been renamed. We solved this problem
already for merged directories using redirect. Use same logic for metacopy
files.
This patch just goes on to check redirects for metacopy files.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Vivek Goyal [Fri, 11 May 2018 15:49:31 +0000 (11:49 -0400)]
ovl: Move some dir related ovl_lookup_single() code in else block
Move some directory related code in else block. This is pure code
reorganization and no functionality change.
Next patch enables redirect processing on metacopy files and needs this
change. By keeping non-functional changes in a separate patch, next patch
looks much smaller and cleaner.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Vivek Goyal [Fri, 11 May 2018 15:49:31 +0000 (11:49 -0400)]
ovl: Do not expose metacopy only dentry from d_real()
Metacopy dentry/inode is internal to overlay and is never exposed outside
of it. Exception is metacopy upper file used for fsync(). Modify d_real()
to look for dentries/inode which have data, but also allow matching upper
inode without data for the fsync case.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Vivek Goyal [Fri, 11 May 2018 15:49:31 +0000 (11:49 -0400)]
ovl: Open file with data except for the case of fsync
ovl_open() should open file which contains data and not open metacopy
inode. With the introduction of metacopy inodes, with current
implementation we will end up opening metacopy inode as well.
But there can be certain circumstances like ovl_fsync() where we want to
allow opening a metacopy inode instead.
Hence, change ovl_open_realfile() and add extra parameter which
specifies whether to allow opening metacopy inode or not. If this
parameter is false, we look for data inode and open that.
This should allow covering both the cases.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
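The effect of the extra parameter described in the commit above can be
sketched as follows. This is a simplified userspace model, assuming a single
lower layer; struct model_ovl_inode and model_pick_real() are hypothetical
and only illustrate the selection rule, not the actual
ovl_open_realfile() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an overlay inode with at most one lower layer. */
struct model_ovl_inode {
	const char *upper;	/* upper inode, or NULL if not copied up */
	const char *lowerdata;	/* lower inode that holds the data */
	bool upper_has_data;	/* false while upper is metacopy only */
};

/*
 * Pick the inode to open. With allow_meta == false (the normal open
 * path) a metacopy-only upper is skipped in favour of the data inode;
 * with allow_meta == true (the fsync case) the upper is used as is.
 */
static const char *model_pick_real(const struct model_ovl_inode *oi,
				   bool allow_meta)
{
	assert(oi);
	if (oi->upper && (allow_meta || oi->upper_has_data))
		return oi->upper;
	return oi->lowerdata;
}
```

Once the data has been copied up (upper_has_data is true), both callers get
the upper inode and the parameter no longer makes a difference.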
Vivek Goyal [Fri, 11 May 2018 15:49:31 +0000 (11:49 -0400)]
ovl: Add helper ovl_inode_realdata()
Add a helper to retrieve real data inode associated with overlay inode.
This helper will ignore all metacopy inodes and will return only the real
inode which has data.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Vivek Goyal [Fri, 11 May 2018 15:49:30 +0000 (11:49 -0400)]
ovl: Store lower data inode in ovl_inode
Right now ovl_inode stores inode pointer for lower inode. This helps with
quickly getting lower inode given overlay inode (ovl_inode_lower()).
Now with metadata only copy-up, we can have metacopy inode in middle layer
as well and inode containing data can be different from ->lower. I need to
be able to open the real file in ovl_open_realfile() and for that I need to
quickly find the lower data inode.
Hence store lower data inode also in ovl_inode. Also provide a helper
ovl_inode_lowerdata() to access this field.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Vivek Goyal [Fri, 11 May 2018 15:49:30 +0000 (11:49 -0400)]
ovl: Fix ovl_getattr() to get number of blocks from lower
If an inode has been copied up metadata only, then we need to query the
number of blocks from lower and fill up the stat->st_blocks.
We need to be careful about races where we are doing stat on one cpu and
data copy up is taking place on other cpu. We want to return
stat->st_blocks either from lower or stable upper and not something in
between. Hence, ovl_has_upperdata() is called first to figure out whether
block reporting will take place from lower or upper.
We now support metacopy dentries in middle layer. That means number of
blocks reporting needs to come from lowest data dentry and this could be
different from lower dentry. Hence we end up making a separate
vfs_getattr() call for metacopy dentries to get number of blocks.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Vivek Goyal [Fri, 11 May 2018 15:49:30 +0000 (11:49 -0400)]
ovl: Add helper ovl_dentry_lowerdata() to get lower data dentry
Now we have the notion of data dentry and metacopy dentry.
ovl_dentry_lower() will return uppermost lower dentry, but it could be
either data or metacopy dentry. Now we support metacopy dentries in lower
layers so it is possible that lowerstack[0] is metacopy dentry while
lowerstack[1] is actual data dentry.
So add a helper which returns the lowest dentry, which is supposed to be
data dentry.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
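The difference between the uppermost lower dentry and the lower data dentry
described in the commit above can be illustrated with a small model. This is
a sketch under stated assumptions: struct model_entry, its fixed-size
lowerstack array and the model_* functions are hypothetical; only the idea
that lowerstack[0] may be a metacopy while the last entry holds the data
comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_LOWER 4

/* Hypothetical model of an overlay entry's lower stack. */
struct model_entry {
	size_t numlower;
	const char *lowerstack[MODEL_MAX_LOWER]; /* [0] is the uppermost lower */
};

/* Uppermost lower dentry: may be a data or a metacopy dentry. */
static const char *model_dentry_lower(const struct model_entry *oe)
{
	return oe->numlower ? oe->lowerstack[0] : NULL;
}

/* Lowest dentry in the stack: the one expected to hold the data. */
static const char *model_dentry_lowerdata(const struct model_entry *oe)
{
	assert(oe->numlower <= MODEL_MAX_LOWER);
	return oe->numlower ? oe->lowerstack[oe->numlower - 1] : NULL;
}
```

With a single lower layer both functions return the same entry, which
matches the situation before metacopy dentries were allowed in lower layers.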
Vivek Goyal [Fri, 11 May 2018 15:49:30 +0000 (11:49 -0400)]
ovl: Copy up meta inode data from lowest data inode
So far lower could not be a meta inode. So whenever it was time to copy up
data of a meta inode, we could copy it up from top most lower dentry.
But now lower itself can be a metacopy inode. That means data copy up
needs to take place from a data inode in metacopy inode chain. Find lower
data inode in the chain and use that for data copy up.
Introduced a helper called ovl_path_lowerdata() to find the lower data
inode chain.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Vivek Goyal [Fri, 11 May 2018 15:49:28 +0000 (11:49 -0400)]
ovl: Modify ovl_lookup() and friends to lookup metacopy dentry
This patch modifies ovl_lookup() and friends to lookup metacopy dentries.
It also allows for presence of metacopy dentries in lower layer.
During lookup, check for presence of OVL_XATTR_METACOPY and if not present,
set OVL_UPPERDATA bit in flags.
We don't support metacopy feature with nfs_export. So in nfs_export code,
we set the OVL_UPPERDATA flag unconditionally if upper inode exists.
Do not follow metacopy origin if we find a metacopy only inode and metacopy
feature is not enabled for that mount. Like redirect, this can have
security implications where an attacker could hand craft upper and try to
gain access to file on lower which it should not have to begin with.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Vivek Goyal [Fri, 11 May 2018 15:49:28 +0000 (11:49 -0400)]
ovl: Use out_err instead of out_nomem
Right now we use goto out_nomem which assumes error code is -ENOMEM. But
there are other errors returned like -ESTALE as well. So instead of
out_nomem, use out_err which will do ERR_PTR(err). That way one can put
error code in err and jump to out_err.
This is just code reorganization and no change of functionality.
I am about to add more code and this organization helps laying more code
and error paths on top of it.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
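The single error label described in the commit above is a common C pattern.
A minimal userspace sketch, assuming a made-up lookup function:
model_lookup() and its simulate_stale parameter are hypothetical, and the
function returns a negative errno value where the kernel code would return
ERR_PTR(err).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a lookup that can fail in more than one way.
 * Each failure site stores its own code in err and jumps to one shared
 * label, instead of a label that hard-codes -ENOMEM.
 */
static int model_lookup(int simulate_stale, int **out)
{
	int *entry = NULL;
	int err;

	assert(out);

	err = -ENOMEM;
	entry = malloc(sizeof(*entry));
	if (!entry)
		goto out_err;

	err = -ESTALE;
	if (simulate_stale)
		goto out_err;

	*entry = 42;
	*out = entry;
	return 0;

out_err:
	free(entry);	/* free(NULL) is a no-op */
	*out = NULL;
	return err;	/* the kernel code would return ERR_PTR(err) here */
}
```

Adding another failure mode then only needs one more "err = ...; goto
out_err;" pair, which is the property the commit relies on for later
patches.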
Vivek Goyal [Fri, 11 May 2018 15:49:28 +0000 (11:49 -0400)]
ovl: A new xattr OVL_XATTR_METACOPY for file on upper
Now we will have the capability to have upper inodes which might be only
metadata copy up and data is still on lower inode. So add a new xattr
OVL_XATTR_METACOPY to distinguish between two cases.
Presence of OVL_XATTR_METACOPY reflects that file has been copied up
metadata only and data will be copied up later from lower origin. So
this xattr is set when a metadata copy takes place and cleared when data
copy takes place.
We also use a bit in ovl_inode->flags to cache OVL_UPPERDATA which reflects
whether ovl inode has data or not (as opposed to metadata only copy up).
If a file is copied up metadata only and later when same file is opened for
WRITE, then data copy up takes place. We copy up data, remove METACOPY
xattr and then set the UPPERDATA flag in ovl_inode->flags. While all these
operations happen with oi->lock held, read side of oi->flags can be
lockless. That is another thread on another cpu can check if UPPERDATA
flag is set or not.
So this gives us an ordering requirement w.r.t UPPERDATA flag. That is, if
another cpu sees UPPERDATA flag set, then it should be guaranteed that
effects of data copy up and remove xattr operations are also visible.
For example.
CPU1                                    CPU2
ovl_open()                              acquire(oi->lock)
 ovl_open_maybe_copy_up()                ovl_copy_up_data()
  ovl_open_need_copy_up()                vfs_removexattr()
   ovl_already_copied_up()
    ovl_dentry_needs_data_copy_up()      ovl_set_flag(OVL_UPPERDATA)
     ovl_test_flag(OVL_UPPERDATA)       release(oi->lock)
Say CPU2 is copying up data and in the end sets UPPERDATA flag. But if
CPU1 perceives the effects of setting UPPERDATA flag but not the effects of
preceding operations (ex. upper that is not fully copied up), it will be a
problem.
Hence this patch introduces smp_wmb() on setting UPPERDATA flag operation
and smp_rmb() on UPPERDATA flag test operation.
Maybe some other lock or barrier is already covering it. But I am not sure
what that is and whether it is obvious enough that we will not break it in
future. Hence I am trying to be safe here and introducing barriers
explicitly for UPPERDATA flag/bit.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
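The ordering requirement in the commit above can be illustrated in userspace
with C11 atomics. Note that this sketch swaps in a release store and an
acquire load as the closest portable analogue; it does not use the kernel's
smp_wmb()/smp_rmb(), and struct model_inode and the model_* functions are
hypothetical. The writer is assumed to hold a lock that is not modelled
here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the state touched by a data copy up. */
struct model_inode {
	int data_copied;	/* stands in for the copied-up file data */
	int metacopy_xattr;	/* 1 while the METACOPY xattr is present */
	atomic_bool upperdata;	/* models the OVL_UPPERDATA bit */
};

static void model_inode_init(struct model_inode *oi)
{
	oi->data_copied = 0;
	oi->metacopy_xattr = 1;
	atomic_init(&oi->upperdata, false);
}

/* Writer side: copy data, drop the xattr, then publish the flag. */
static void model_copy_up_data(struct model_inode *oi)
{
	oi->data_copied = 1;
	oi->metacopy_xattr = 0;
	/* Release: the two writes above become visible before the flag. */
	atomic_store_explicit(&oi->upperdata, true, memory_order_release);
}

/* Lockless reader side. */
static bool model_has_upperdata(struct model_inode *oi)
{
	/* Acquire: pairs with the release store in the writer. */
	if (!atomic_load_explicit(&oi->upperdata, memory_order_acquire))
		return false;
	/* Seeing the flag guarantees the earlier writes are visible too. */
	assert(oi->data_copied == 1 && oi->metacopy_xattr == 0);
	return true;
}
```

The assertion in the reader spells out the guarantee: a CPU that observes
the flag must also observe the completed copy up and the removed xattr.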
Vivek Goyal [Fri, 11 May 2018 15:49:28 +0000 (11:49 -0400)]
ovl: Add helper ovl_already_copied_up()
There are a couple of places where we need to know if file is already copied
up (in lockless manner). Right now it's open coded and there are only two
conditions to check. Soon this patch series will introduce another
condition to check and Amir wants to introduce one more. So introduce a
helper instead to check this so that code is easier to read.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Vivek Goyal [Fri, 11 May 2018 15:49:27 +0000 (11:49 -0400)]
ovl: Copy up only metadata during copy up where it makes sense
If it makes sense to copy up only metadata during copy up, do it. This is
done for regular files which are not opened for WRITE.
Right now ->metacopy is set to 0 always. Last patch in the series will
remove the hard coded statement and enable metacopy feature.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
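The rule stated in the commit above (metadata-only copy up for regular files
that are not opened for WRITE) can be written as a small predicate. A
minimal userspace sketch: model_need_meta_copy_up() and its parameters are
hypothetical and only encode the conditions named in the commit message; the
real kernel check may consider more.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical predicate: should this copy up be metadata only?
 * open_flags is 0 for copy ups that are not triggered by open().
 */
static bool model_need_meta_copy_up(bool metacopy_enabled, mode_t mode,
				    int open_flags)
{
	int acc = open_flags & O_ACCMODE;

	assert(acc == O_RDONLY || acc == O_WRONLY || acc == O_RDWR);

	if (!metacopy_enabled)
		return false;	/* feature switched off for this mount */
	if (!S_ISREG(mode))
		return false;	/* only regular files qualify */
	if (acc == O_WRONLY || acc == O_RDWR)
		return false;	/* opened for WRITE: data is needed now */
	return true;
}
```

With metacopy_enabled hard-coded to false, as ->metacopy is at this point in
the series, the predicate never selects a metadata-only copy up.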
Vivek Goyal [Fri, 11 May 2018 15:49:27 +0000 (11:49 -0400)]
ovl: During copy up, first copy up metadata and then data
Just a little re-ordering of code. This helps with next patch where after
copying up metadata, we skip data copying step, if needed.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Vivek Goyal [Fri, 11 May 2018 15:49:27 +0000 (11:49 -0400)]
ovl: Provide a mount option metacopy=on/off for metadata copyup
By default metadata only copy up is disabled. Provide a mount option so
that users can choose one way or other.
Also provide a kernel config and module option to enable/disable metacopy
feature.
metacopy feature requires redirect_dir=on when upper is present.
Otherwise, it requires redirect_dir=follow at least.
As of now, metacopy does not work with nfs_export=on. So if both
metacopy=on and nfs_export=on then nfs_export is disabled.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
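The option dependencies listed in the commit above can be sketched as a
small resolution step. This is a hypothetical userspace model: struct
model_config and model_resolve() are not kernel names, only three
redirect_dir modes are modelled, and the fallback to metacopy=off when the
redirect_dir requirement is not met is an assumption, since the commit
message does not say what happens in that case.

```c
#include <assert.h>
#include <stdbool.h>

/* Ordered so that a larger value is "at least" the smaller one. */
enum model_redirect { REDIRECT_OFF, REDIRECT_FOLLOW, REDIRECT_ON };

/* Hypothetical model of the relevant mount configuration. */
struct model_config {
	bool has_upper;
	enum model_redirect redirect;
	bool metacopy;
	bool nfs_export;
};

static void model_resolve(struct model_config *c)
{
	enum model_redirect need;

	assert(c);
	need = c->has_upper ? REDIRECT_ON : REDIRECT_FOLLOW;

	/* ASSUMPTION: an unmet redirect_dir requirement disables metacopy. */
	if (c->metacopy && c->redirect < need)
		c->metacopy = false;

	/* From the commit: metacopy=on plus nfs_export=on disables export. */
	if (c->metacopy && c->nfs_export)
		c->nfs_export = false;
}
```

For example, a mount with an upper layer, redirect_dir=on, metacopy=on and
nfs_export=on ends up with metacopy still on and nfs_export off in this
model.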
Vivek Goyal [Fri, 11 May 2018 15:49:27 +0000 (11:49 -0400)]
ovl: Move the copy up helpers to copy_up.c
Right now two copy up helpers are in inode.c. Amir suggested it might be
better to move these to copy_up.c.
There will be one more related function which will come in a later patch.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Vivek Goyal [Fri, 11 May 2018 15:49:27 +0000 (11:49 -0400)]
ovl: Initialize ovl_inode->redirect in ovl_get_inode()
ovl_inode->redirect is an inode property and should be initialized in
ovl_get_inode() only when we are adding a new inode to cache. If inode is
already in cache, it is already initialized and we should not be touching
ovl_inode->redirect field.
As of now this is not a problem as redirects are used only for directories
which don't share inode. But soon I want to use redirects for regular
files also and there it can become an issue.
Hence, move ->redirect initialization in ovl_get_inode().
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:44 +0000 (15:44 +0200)]
ovl: fix documentation of non-standard behavior
We can now drop description of the ro/rw inconsistency from the
documentation.
Also clarify that fully standards-compliant behavior can now be enabled
with kernel/module/mount options.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:44 +0000 (15:44 +0200)]
ovl: obsolete "check_copy_up" module option
This was provided for debugging the ro/rw inconsistency. The inconsistency
is now gone so this option is obsolete.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:44 +0000 (15:44 +0200)]
vfs: remove open_flags from d_real()
Opening regular files on overlayfs is now handled via ovl_open(). Remove
the now unused "open_flags" argument from d_op->d_real() and the d_real()
helper.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:44 +0000 (15:44 +0200)]
Revert "fsnotify: support overlayfs"
This reverts commit
f3fbbb079263bd29ae592478de6808db7e708267.
Overlayfs now works correctly without adding hacks to fsnotify.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:43 +0000 (15:44 +0200)]
Partially revert "locks: fix file locking on overlayfs"
This partially reverts commit
c568d68341be7030f5647def68851e469b21ca11.
Overlayfs files will now automatically get the correct locks, no need to
hack overlay support in VFS.
It is a partial revert, because it leaves the locks_inode() calls in place
and defines locks_inode() to file_inode(). We could revert those as well,
but it would be unnecessary code churn and it makes sense to document that
we are getting the inode for locking purposes.
Don't revert MS_NOREMOTELOCK yet since that has been part of the userspace
API for some time (though not in a useful way). Will try to remove
internal flags later when the dust around the new mount API settles.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:43 +0000 (15:44 +0200)]
Revert "vfs: do get_write_access() on upper layer of overlayfs"
This reverts commit
4d0c5ba2ff79ef9f5188998b29fd28fcb05f3667.
We now get write access on both overlay and underlying layers so this patch
is no longer needed for correct operation.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:43 +0000 (15:44 +0200)]
Revert "vfs: add flags to d_real()"
This reverts commit
495e642939114478a5237a7d91661ba93b76f15a.
No users of the "flags" argument of d_real() remain.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:43 +0000 (15:44 +0200)]
Revert "vfs: update ovl inode before relatime check"
This reverts commit
598e3c8f72f5b77c84d2cb26cfd936ffb3cfdbaa.
Overlayfs no longer relies on the vfs for correct atime handling.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:43 +0000 (15:44 +0200)]
Revert "ovl: fix relatime for directories"
This reverts commit
cd91304e7190b4c4802f8e413ab2214b233e0260.
Overlayfs no longer relies on the vfs for correct atime handling.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:43 +0000 (15:44 +0200)]
vfs: fix freeze protection in mnt_want_write_file() for overlayfs
The underlying real file used by overlayfs still contains the overlay path.
This results in mnt_want_write_file() calls by the filesystem getting
freeze protection on the wrong inode (the overlayfs one instead of the real
one).
Fix by using file_inode(file)->i_sb instead of file->f_path.mnt->mnt_sb.
Reported-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:43 +0000 (15:44 +0200)]
Revert "ovl: don't allow writing ioctl on lower layer"
This reverts commit
7c6893e3c9abf6a9676e060a1e35e5caca673d57.
Overlayfs no longer relies on the vfs for checking writability of files.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:43 +0000 (15:44 +0200)]
Revert "ovl: fix may_write_real() for overlayfs directories"
This reverts commit
954c736f865d6c0c68ae4263a2f3502ee7c447a3.
Overlayfs no longer relies on the vfs for checking writability of files.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 6 Jul 2018 21:57:06 +0000 (23:57 +0200)]
vfs: don't open real
Let overlayfs do its thing when opening a file.
This enables stacking and fixes the corner case when a file is opened for
read, modified through a writable open, and data is read from the read-only
file. After this patch the read-only open will not return stale data even
in this case.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:42 +0000 (15:44 +0200)]
ovl: add reflink/copyfile/dedup support
Since set of arguments are so similar, handle in a common helper.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:42 +0000 (15:44 +0200)]
ovl: add O_DIRECT support
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:42 +0000 (15:44 +0200)]
ovl: add ovl_fiemap()
Implement stacked fiemap().
Need to split inode operations for regular file (which has fiemap) and
special file (which doesn't have fiemap).
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:42 +0000 (15:44 +0200)]
ovl: add lsattr/chattr support
Implement FS_IOC_GETFLAGS and FS_IOC_SETFLAGS.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:42 +0000 (15:44 +0200)]
ovl: add ovl_fallocate()
Implement stacked fallocate.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:42 +0000 (15:44 +0200)]
ovl: add ovl_mmap()
Implement stacked mmap.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:42 +0000 (15:44 +0200)]
ovl: add ovl_fsync()
Implement stacked fsync().
Don't sync if lower (noticed by Amir Goldstein).
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:41 +0000 (15:44 +0200)]
ovl: add ovl_write_iter()
Implement stacked writes.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:41 +0000 (15:44 +0200)]
ovl: add ovl_read_iter()
Implement stacked reading.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:41 +0000 (15:44 +0200)]
ovl: add helper to return real file
In the common case we can just use the real file cached in
file->private_data. There are two exceptions:
1) File has been copied up since open: in this unlikely corner case just
use a throwaway real file for the operation. If ever this becomes a
performance problem (very unlikely, since overlayfs has been doing mostly fine
without correctly handling this case at all), then we can deal with that by
updating the cached real file.
2) File's f_flags have changed since open: no need to reopen the cached
real file, we can just change the flags there as well.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
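The common case and the two exceptions listed in the commit above can be
sketched as follows. This is a hypothetical userspace model: struct
model_realfile, struct model_file and model_real_file() are illustrative
names, inodes are reduced to integer ids, and opening a throwaway real file
is reduced to filling in a caller-provided structure.

```c
#include <assert.h>
#include <stdbool.h>

struct model_realfile {
	int inode_id;		/* which real inode this file refers to */
	unsigned int flags;
};

struct model_file {
	struct model_realfile cached;	/* models file->private_data */
	unsigned int f_flags;		/* current flags of the overlay file */
};

/*
 * Fill *real with the real file to use for one operation. Returns true
 * if a throwaway file was produced, which the caller would have to
 * release after the operation.
 */
static bool model_real_file(struct model_file *file, int current_real_inode,
			    struct model_realfile *real)
{
	assert(file && real);

	if (file->cached.inode_id != current_real_inode) {
		/* Exception 1: copied up since open, use a throwaway file. */
		real->inode_id = current_real_inode;
		real->flags = file->f_flags;
		return true;
	}

	/* Exception 2: flags changed since open, update them in place. */
	if (file->cached.flags != file->f_flags)
		file->cached.flags = file->f_flags;

	*real = file->cached;	/* common case: reuse the cached file */
	return false;
}
```

Note that the throwaway path leaves the cached real file untouched, which
mirrors the commit's remark that updating the cache is left for later if it
ever matters for performance.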
Miklos Szeredi [Wed, 18 Jul 2018 13:44:41 +0000 (15:44 +0200)]
ovl: stack file ops
Implement file operations on a regular overlay file. The underlying file
is opened separately and cached in ->private_data.
It might be worth making an exception for such files when accounting in
nr_file to conform to userspace expectations. We are only adding a small
overhead (248 bytes for the struct file) since the real inode and dentry are
pinned by overlayfs anyway.
This patch doesn't have any effect, since the vfs will use d_real() to find
the real underlying file to open. The patch at the end of the series will
actually enable this functionality.
AV: make it use open_with_fake_path(), don't mess with override_creds
SzM: still need to mess with override_creds() until no fs uses
current_cred() in their open method.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:41 +0000 (15:44 +0200)]
ovl: deal with overlay files in ovl_d_real()
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:41 +0000 (15:44 +0200)]
ovl: copy up file size as well
Copy i_size of the underlying inode to the overlay inode in ovl_copyattr().
This is in preparation for stacking I/O operations on overlay files.
This patch shouldn't have any observable effect.
Remove stale comment from ovl_setattr() [spotted by Vivek Goyal].
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:41 +0000 (15:44 +0200)]
Revert "Revert "ovl: get_write_access() in truncate""
This reverts commit
31c3a7069593b072bd57192b63b62f9a7e994e9a.
Re-add functionality dealing with i_writecount on truncate to overlayfs.
This patch shouldn't have any observable effects, since we just re-assert
the writecount that vfs_truncate() already got for us.
This is in preparation for moving overlay functionality out of the VFS.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:41 +0000 (15:44 +0200)]
ovl: copy up inode flags
On inode creation copy certain inode flags from the underlying real inode
to the overlay inode.
This is in preparation for moving overlay functionality out of the VFS.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:40 +0000 (15:44 +0200)]
ovl: copy up times
Copy up mtime and ctime to overlay inode after times in real object are
modified. Be careful not to dirty cachelines when not necessary.
This is in preparation for moving overlay functionality out of the VFS.
This patch shouldn't have any observable effect.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
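The "do not dirty cachelines" remark above amounts to comparing before
storing. A minimal userspace sketch of that idea follows; model_time,
model_inode and model_copy_times() are hypothetical names and this is not
the overlayfs implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical timestamp and inode types for illustration only. */
struct model_time {
	long long sec;
	long nsec;
};

struct model_inode {
	struct model_time mtime;
	struct model_time ctime;
};

static bool model_time_equal(const struct model_time *a,
			     const struct model_time *b)
{
	return a->sec == b->sec && a->nsec == b->nsec;
}

/*
 * Copy mtime and ctime from the real inode to the overlay inode, writing a
 * field only when its value differs. Returns the number of fields written,
 * so 0 means the overlay inode's memory was left untouched.
 */
static int model_copy_times(struct model_inode *ovl,
			    const struct model_inode *real)
{
	int stores = 0;

	assert(ovl != NULL && real != NULL);
	if (!model_time_equal(&ovl->mtime, &real->mtime)) {
		ovl->mtime = real->mtime;
		stores++;
	}
	if (!model_time_equal(&ovl->ctime, &real->ctime)) {
		ovl->ctime = real->ctime;
		stores++;
	}
	return stores;
}
```

Reads of an unchanged value leave the cacheline shared between CPUs; an
unconditional store would force it exclusive even when nothing changed.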
Miklos Szeredi [Wed, 18 Jul 2018 13:44:40 +0000 (15:44 +0200)]
vfs: export vfs_dedupe_file_range_one() to modules
This is needed by the stacked dedupe implementation in overlayfs.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:40 +0000 (15:44 +0200)]
vfs: export vfs_ioctl() to modules
This is needed by the stacked ioctl implementation in overlayfs.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 18 Jul 2018 13:44:40 +0000 (15:44 +0200)]
vfs: make open_with_fake_path() not contribute to nr_files
Stacking file operations in overlay will store an extra open file for each
overlay file opened.
The overhead is just that of "struct file" which is about 256 bytes, because
overlay already pins an extra dentry and inode when the file is open, which
add up to a much larger overhead.
For fear of breaking working setups, don't start accounting the extra file.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Wed, 18 Jul 2018 13:39:29 +0000 (15:39 +0200)]
Merge branch 'dedupe-cleanup' into overlayfs-next
Following series for stacking overlay files depends on this mini series.
Miklos Szeredi [Wed, 18 Jul 2018 08:46:05 +0000 (10:46 +0200)]
Merge branch 'for-ovl' of git://git./linux/kernel/git/viro/vfs into overlayfs-next
This gives us the open_with_fake_path() helper that is needed for stacked
open files in overlay and mmap in particular.
Amir Goldstein [Tue, 17 Jul 2018 13:05:38 +0000 (16:05 +0300)]
ovl: fix wrong use of impure dir cache in ovl_iterate()
Only upper dir can be impure, but if we are in the middle of
iterating a lower real dir, dir could be copied up and marked
impure. We only want the impure cache if we started iterating
a real upper dir to begin with.
Aditya Kali reported that the following reproducer hits the
WARN_ON(!cache->refcount) in ovl_get_cache():
docker run --rm drupal:8.5.4-fpm-alpine \
sh -c 'cd /var/www/html/vendor/symfony && \
chown -R www-data:www-data . && ls -l .'
Reported-by: Aditya Kali <adityakali@google.com>
Tested-by: Aditya Kali <adityakali@google.com>
Fixes: 4edb83bb1041 ('ovl: constant d_ino for non-merge dirs')
Cc: <stable@vger.kernel.org> # v4.14
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Al Viro [Thu, 12 Jul 2018 15:18:42 +0000 (11:18 -0400)]
new helper: open_with_fake_path()
open a file by given inode, faking ->f_path. Use with shitloads
of caution - at the very least you'd damn better make sure that
some dentry alias of that inode is pinned down by the path in
question. Again, this is no general-purpose interface and I hope
it will eventually go away. Right now overlayfs wants something
like that, but nothing else should.
Any out-of-tree code with bright idea of using this one *will*
eventually get hurt, with zero notice and great delight on my part.
I refuse to use EXPORT_SYMBOL_GPL(), especially in situations when
it's really EXPORT_SYMBOL_DONT_USE_IT(), but don't take that export
as "you are welcome to use it".
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 8 Jun 2018 17:01:49 +0000 (13:01 -0400)]
now we can fold open_check_o_direct() into do_dentry_open()
These checks are better off in do_dentry_open(); the reason we couldn't
put them there used to be that callers couldn't tell what kind of cleanup
would do_dentry_open() failure call for. Now that we have FMODE_OPENED,
cleanup is the same in all cases - it's simply fput(). So let's fold
that into do_dentry_open(), as Christoph's patch tried to.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 8 Jun 2018 16:56:55 +0000 (12:56 -0400)]
lift fput() on late failures into path_openat()
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 9 Jul 2018 15:14:39 +0000 (11:14 -0400)]
fold put_filp() into fput()
Just check FMODE_OPENED in __fput() and be done with that...
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 9 Jul 2018 06:35:08 +0000 (02:35 -0400)]
introduce FMODE_OPENED
basically, "is that instance set up enough for regular fput(), or
do we want put_filp() for that one".
NOTE: the only alloc_file() caller that could be followed by put_filp()
is in arch/ia64/kernel/perfmon.c, which is (Kconfig-level) broken.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
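The role of the flag can be sketched in userspace as a single put path that
checks an "opened" bit, which is where the "fold put_filp() into fput()"
change listed above ends up. Everything here is a hypothetical model
(model_file, MODEL_FMODE_OPENED and its bit value, the release counter); it
is not the kernel's struct file or __fput().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bit value; the kernel's FMODE_OPENED value is not assumed. */
#define MODEL_FMODE_OPENED 0x1u

struct model_file {
	unsigned int f_mode;
	int *release_calls;	/* bumped where ->release() would run */
};

/* Allocate a file that has not been opened yet. */
static struct model_file *model_alloc_file(int *release_calls)
{
	struct model_file *f = calloc(1, sizeof(*f));

	if (f)
		f->release_calls = release_calls;
	return f;
}

/* Mark the file as fully set up, as a successful open would. */
static void model_mark_opened(struct model_file *f)
{
	assert(f != NULL);
	f->f_mode |= MODEL_FMODE_OPENED;
}

/*
 * Single cleanup path: run the release step only for files that were
 * actually opened, then free the structure in either case.
 */
static void model_fput(struct model_file *f)
{
	assert(f != NULL);
	if (f->f_mode & MODEL_FMODE_OPENED)
		(*f->release_calls)++;
	free(f);
}
```

With the state recorded in the file itself, callers no longer need to know
which of two cleanup functions matches the point at which an open failed.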
Al Viro [Tue, 10 Jul 2018 18:13:18 +0000 (14:13 -0400)]
->file_open(): lose cred argument
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 10 Jul 2018 17:25:29 +0000 (13:25 -0400)]
security_file_open(): lose cred argument
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 10 Jul 2018 17:22:28 +0000 (13:22 -0400)]
get rid of cred argument of vfs_open() and do_dentry_open()
always equal to ->f_cred
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 11 Jul 2018 19:00:04 +0000 (15:00 -0400)]
pass ->f_flags value to alloc_empty_file()
... and have it set the f_flags-derived part of ->f_mode.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 10 Jul 2018 17:12:05 +0000 (13:12 -0400)]
pass creds to get_empty_filp(), make sure dentry_open() passes the right creds
... and rename get_empty_filp() to alloc_empty_file().
dentry_open() gets creds as argument, but the only thing that sees those is
security_file_open() - file->f_cred still ends up with current_cred(). For
almost all callers it's the same thing, but there are several broken cases.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 11 Jul 2018 18:19:04 +0000 (14:19 -0400)]
alloc_file(): switch to passing O_... flags instead of FMODE_... mode
... so that it could set both ->f_flags and ->f_mode, without callers
having to set ->f_flags manually.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 9 Jul 2018 01:45:07 +0000 (21:45 -0400)]
make sure do_dentry_open() won't return positive as an error
->open() instances really, really should not be doing that. There's
a lot of places e.g. around atomic_open() that could be confused by that,
so let's catch that early.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
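The check described above can be modelled in userspace as follows. This is
a sketch under assumptions: model_do_open() and the three sample callbacks
are hypothetical, and mapping a positive return to -EINVAL is an
illustrative choice (a kernel version would presumably also warn).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical open callback type: 0 on success, negative errno on error. */
typedef int (*model_open_fn)(void);

/* Sample callbacks: well-behaved success, well-behaved failure, buggy. */
static int model_open_ok(void)     { return 0; }
static int model_open_enoent(void) { return -ENOENT; }
static int model_open_buggy(void)  { return 1; }	/* positive: a bug */

/*
 * Call the open callback and make sure no positive value escapes as an
 * "error"; callers only ever see 0 or a negative errno.
 */
static int model_do_open(model_open_fn open_fn)
{
	int error;

	assert(open_fn != NULL);
	error = open_fn();
	if (error > 0)
		error = -EINVAL;	/* assumption: replacement error code */
	return error;
}
```

Catching the bad value at this one choke point means later code can rely
on the usual "negative means failure" convention.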
Al Viro [Mon, 9 Jul 2018 06:29:58 +0000 (02:29 -0400)]
create_pipe_files(): use fput() if allocation of the second file fails
... just use put_pipe_info() to get the pipe->files down to 1 and let
fput()-called pipe_release() do freeing.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 8 Jun 2018 15:19:32 +0000 (11:19 -0400)]
turn filp_clone_open() into inline wrapper for dentry_open()
it's exactly the same thing as
dentry_open(&file->f_path, file->f_flags, file->f_cred)
... and rename it to file_clone_open(), while we are at it.
'filp' naming convention is bogus; sure, it's "file pointer",
but we generally don't do that kind of Hungarian notation.
Some of the instances have too many callers to touch, but this
one has only two, so let's sanitize it while we can...
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 9 Jul 2018 15:24:21 +0000 (11:24 -0400)]
fold security_file_free() into file_free()
... and the call of file_free() in case of security_file_alloc() failure
in get_empty_filp() should be simply file_free_rcu() - no point in
rcu-delays there, anyway.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 17 Jun 2018 16:38:17 +0000 (12:38 -0400)]
ocxlflash_getfile(): fix double-iput() on alloc_file() failures
Cc: stable@vger.kernel.org
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 9 Jun 2018 13:43:13 +0000 (09:43 -0400)]
cxl_getfile(): fix double-iput() on alloc_file() failures
Doing iput() after path_put() is wrong.
Cc: stable@vger.kernel.org
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 8 Jun 2018 15:17:54 +0000 (11:17 -0400)]
drm_mode_create_lease_ioctl(): fix open-coded filp_clone_open()
Failure of ->open() should *not* be followed by fput(). Fixed by
using filp_clone_open(), which gets the cleanups right.
Cc: stable@vger.kernel.org
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Miklos Szeredi [Fri, 6 Jul 2018 21:57:03 +0000 (23:57 +0200)]
vfs: dedupe: extract helper for a single dedup
Extract vfs_dedupe_file_range_one() helper to deal with a single dedup
request.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Miklos Szeredi [Fri, 6 Jul 2018 21:57:03 +0000 (23:57 +0200)]
vfs: dedupe: rationalize args
Clean up f_op->dedupe_file_range() interface.
1) Use loff_t for offsets and length instead of u64
2) Order the arguments the same way as {copy|clone}_file_range().
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Miklos Szeredi [Fri, 6 Jul 2018 21:57:03 +0000 (23:57 +0200)]
vfs: dedupe: return int
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Miklos Szeredi [Fri, 6 Jul 2018 21:57:02 +0000 (23:57 +0200)]
vfs: limit size of dedupe
Suggested-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Chunyu Hu [Sat, 9 Jun 2018 19:51:24 +0000 (03:51 +0800)]
proc: add proc_seq_release
kmemleak reported a memory leak when reading proc files. After adding
some debug lines, we found that proc_seq_fops is using seq_release as
release handler, which won't handle the free of 'private' field of
seq_file, while in fact the open handler proc_seq_open could create
the private data with __seq_open_private when state_size is greater
than zero. So after reading files created with proc_create_seq_private,
such as /proc/timer_list and /proc/vmallocinfo, the private mem of a
seq_file is not freed. Fix it by adding the paired proc_seq_release
as the default release handler of proc_seq_fops instead of seq_release.
Fixes: 44414d82cfe0 ("proc: introduce proc_create_seq_private")
Reviewed-by: Christoph Hellwig <hch@lst.de>
CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
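The leak and its fix come down to pairing the release handler with what
the open handler allocated. A minimal userspace model of that pairing is
sketched below; model_seq_file, model_seq_open() and model_seq_release()
are hypothetical names, not the procfs or seq_file API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a seq_file with a private data pointer. */
struct model_seq_file {
	void *private_data;
};

/*
 * Open: allocate zeroed private state only when state_size is non-zero.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int model_seq_open(struct model_seq_file *m, size_t state_size)
{
	assert(m != NULL);
	m->private_data = NULL;
	if (state_size > 0) {
		m->private_data = calloc(1, state_size);
		if (!m->private_data)
			return -1;
	}
	return 0;
}

/*
 * Release: free whatever open allocated. free(NULL) is a no-op, so the
 * same handler is correct for files opened with state_size == 0.
 */
static void model_seq_release(struct model_seq_file *m)
{
	assert(m != NULL);
	free(m->private_data);
	m->private_data = NULL;
}
```

A release handler that skipped the free() would behave correctly only for
the state_size == 0 case, which matches the leak described above.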
Linus Torvalds [Sat, 16 Jun 2018 23:04:49 +0000 (08:04 +0900)]
Linux 4.18-rc1
Linus Torvalds [Sat, 16 Jun 2018 20:37:55 +0000 (05:37 +0900)]
Merge tag 'for-linus-20180616' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A collection of fixes that should go into -rc1. This contains:
- bsg_open vs bsg_unregister race fix (Anatoliy)
- NVMe pull request from Christoph, with fixes for regressions in
this window, FC connect/reconnect path code unification, and a
trace point addition.
- timeout fix (Christoph)
- remove a few unused functions (Christoph)
- blk-mq tag_set reinit fix (Roman)"
* tag 'for-linus-20180616' of git://git.kernel.dk/linux-block:
bsg: fix race of bsg_open and bsg_unregister
block: remov blk_queue_invalidate_tags
nvme-fabrics: fix and refine state checks in __nvmf_check_ready
nvme-fabrics: handle the admin-only case properly in nvmf_check_ready
nvme-fabrics: refactor queue ready check
blk-mq: remove blk_mq_tagset_iter
nvme: remove nvme_reinit_tagset
nvme-fc: fix nulling of queue data on reconnect
nvme-fc: remove reinit_request routine
blk-mq: don't time out requests again that are in the timeout handler
nvme-fc: change controllers first connect to use reconnect path
nvme: don't rely on the changed namespace list log
nvmet: free smart-log buffer after use
nvme-rdma: fix error flow during mapping request data
nvme: add bio remapping tracepoint
nvme: fix NULL pointer dereference in nvme_init_subsystem
blk-mq: reinit q->tag_set_list entry only after grace period
Linus Torvalds [Sat, 16 Jun 2018 20:25:18 +0000 (05:25 +0900)]
Merge tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimental
Pull documentation fixes from Mauro Carvalho Chehab:
"This solves a series of broken links for files under Documentation,
and improves a script meant to detect such broken links (see
scripts/documentation-file-ref-check).
The changes on this series are:
- can.rst: fix a footnote reference;
- crypto_engine.rst: Fix two parsing warnings;
- Fix a lot of broken references to Documentation/*;
- improve the scripts/documentation-file-ref-check script, in order
to help detecting/fixing broken references, preventing
false-positives.
After this patch series, only 33 broken references to doc files are
detected by scripts/documentation-file-ref-check"
* tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimental: (26 commits)
fix a series of Documentation/ broken file name references
Documentation: rstFlatTable.py: fix a broken reference
ABI: sysfs-devices-system-cpu: remove a broken reference
devicetree: fix a series of wrong file references
devicetree: fix name of pinctrl-bindings.txt
devicetree: fix some bindings file names
MAINTAINERS: fix location of DT npcm files
MAINTAINERS: fix location of some display DT bindings
kernel-parameters.txt: fix pointers to sound parameters
bindings: nvmem/zii: Fix location of nvmem.txt
docs: Fix more broken references
scripts/documentation-file-ref-check: check tools/*/Documentation
scripts/documentation-file-ref-check: get rid of false-positives
scripts/documentation-file-ref-check: hint: dash or underline
scripts/documentation-file-ref-check: add a fix logic for DT
scripts/documentation-file-ref-check: accept more wildcards at filenames
scripts/documentation-file-ref-check: fix help message
media: max2175: fix location of driver's companion documentation
media: v4l: fix broken video4linux docs locations
media: dvb: point to the location of the old README.dvb-usb file
...
Linus Torvalds [Sat, 16 Jun 2018 20:06:18 +0000 (05:06 +0900)]
Merge tag 'fsnotify_for_v4.18-rc1' of git://git./linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
"fsnotify cleanups unifying handling of different watch types.
This is the shortened fsnotify series from Amir with the last five
patches pulled out. Amir has modified those patches to not change
struct inode but obviously it's too late for those to go into this
merge window"
* tag 'fsnotify_for_v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: add fsnotify_add_inode_mark() wrappers
fanotify: generalize fanotify_should_send_event()
fsnotify: generalize send_to_group()
fsnotify: generalize iteration of marks by object type
fsnotify: introduce marks iteration helpers
fsnotify: remove redundant arguments to handle_event()
fsnotify: use type id to identify connector object type
Linus Torvalds [Sat, 16 Jun 2018 20:00:24 +0000 (05:00 +0900)]
Merge tag 'fbdev-v4.18' of git://github.com/bzolnier/linux
Pull fbdev updates from Bartlomiej Zolnierkiewicz:
"There is nothing really major here, few small fixes, some cleanups and
dead drivers removal:
- mark omapfb drivers as orphans in MAINTAINERS file (Tomi Valkeinen)
- add missing module license tags to omap/omapfb driver (Arnd
Bergmann)
- add missing GPIOLIB dependency to omap2/omapfb driver (Arnd
Bergmann)
- convert savagefb, aty128fb & radeonfb drivers to use msleep & co.
(Jia-Ju Bai)
- allow COMPILE_TEST build for viafb driver (media part was reviewed
by media subsystem Maintainer)
- remove unused MERAM support from sh_mobile_lcdcfb and shmob-drm
drivers (drm parts were acked by shmob-drm driver Maintainer)
- remove unused auo_k190xfb drivers
- misc cleanups (Souptick Joarder, Wolfram Sang, Markus Elfring, Andy
Shevchenko, Colin Ian King)"
* tag 'fbdev-v4.18' of git://github.com/bzolnier/linux: (26 commits)
fb_omap2: add gpiolib dependency
video/omap: add module license tags
MAINTAINERS: make omapfb orphan
video: fbdev: pxafb: match_string() conversion fixup
video: fbdev: nvidia: fix spelling mistake: "scaleing" -> "scaling"
video: fbdev: fix spelling mistake: "frambuffer" -> "framebuffer"
video: fbdev: pxafb: Convert to use match_string() helper
video: fbdev: via: allow COMPILE_TEST build
video: fbdev: remove unused sh_mobile_meram driver
drm: shmobile: remove unused MERAM support
video: fbdev: sh_mobile_lcdcfb: remove unused MERAM support
video: fbdev: remove unused auo_k190xfb drivers
video: omap: Improve a size determination in omapfb_do_probe()
video: sm501fb: Improve a size determination in sm501fb_probe()
video: fbdev-MMP: Improve a size determination in path_init()
video: fbdev-MMP: Delete an error message for a failed memory allocation in two functions
video: auo_k190x: Delete an error message for a failed memory allocation in auok190x_common_probe()
video: sh_mobile_lcdcfb: Delete an error message for a failed memory allocation in two functions
video: sh_mobile_meram: Delete an error message for a failed memory allocation in sh_mobile_meram_probe()
video: fbdev: sh_mobile_meram: Drop SUPERH platform dependency
...
Linus Torvalds [Sat, 16 Jun 2018 07:32:04 +0000 (16:32 +0900)]
Merge branch 'afs-proc' of git://git./linux/kernel/git/viro/vfs
Pull AFS updates from Al Viro:
"Assorted AFS stuff - ended up in vfs.git since most of that consists
of David's AFS-related followups to Christoph's procfs series"
* 'afs-proc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
afs: Optimise callback breaking by not repeating volume lookup
afs: Display manually added cells in dynamic root mount
afs: Enable IPv6 DNS lookups
afs: Show all of a server's addresses in /proc/fs/afs/servers
afs: Handle CONFIG_PROC_FS=n
proc: Make inline name size calculation automatic
afs: Implement network namespacing
afs: Mark afs_net::ws_cell as __rcu and set using rcu functions
afs: Fix a Sparse warning in xdr_decode_AFSFetchStatus()
proc: Add a way to make network proc files writable
afs: Rearrange fs/afs/proc.c to remove remaining predeclarations.
afs: Rearrange fs/afs/proc.c to move the show routines up
afs: Rearrange fs/afs/proc.c by moving fops and open functions down
afs: Move /proc management functions to the end of the file
Linus Torvalds [Sat, 16 Jun 2018 07:21:50 +0000 (16:21 +0900)]
Merge branch 'work.compat' of git://git./linux/kernel/git/viro/vfs
Pull compat updates from Al Viro:
"Some biarch patches - getting rid of assorted (mis)uses of
compat_alloc_user_space().
Not much in that area this cycle..."
* 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
orangefs: simplify compat ioctl handling
signalfd: lift sigmask copyin and size checks to callers of do_signalfd4()
vmsplice(): lift importing iovec into vmsplice(2) and compat counterpart
Linus Torvalds [Sat, 16 Jun 2018 07:11:40 +0000 (16:11 +0900)]
Merge branch 'work.aio' of git://git./linux/kernel/git/viro/vfs
Pull aio fixes from Al Viro:
"Assorted AIO followups and fixes"
* 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
eventpoll: switch to ->poll_mask
aio: only return events requested in poll_mask() for IOCB_CMD_POLL
eventfd: only return events requested in poll_mask()
aio: mark __aio_sigset::sigmask const
Linus Torvalds [Fri, 15 Jun 2018 22:39:34 +0000 (07:39 +0900)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Various netfilter fixlets from Pablo and the netfilter team.
2) Fix regression in IPVS caused by lack of PMTU exceptions on local
routes in ipv6, from Julian Anastasov.
3) Check pskb_trim_rcsum for failure in DSA, from Zhouyang Jia.
4) Don't crash on poll in TLS, from Daniel Borkmann.
5) Revert SO_REUSE{ADDR,PORT} change, it regresses various things
including Avahi mDNS. From Bart Van Assche.
6) Missing of_node_put in qcom/emac driver, from Yue Haibing.
7) We lack checking of the TCP checksum in one special case during SYN
receive, from Frank van der Linden.
8) Fix module init error paths of mac80211 hwsim, from Johannes Berg.
9) Handle 802.1ad properly in stmmac driver, from Elad Nachman.
10) Must grab HW caps before doing quirk checks in stmmac driver, from
Jose Abreu.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (81 commits)
net: stmmac: Run HWIF Quirks after getting HW caps
neighbour: skip NTF_EXT_LEARNED entries during forced gc
net: cxgb3: add error handling for sysfs_create_group
tls: fix waitall behavior in tls_sw_recvmsg
tls: fix use-after-free in tls_push_record
l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()
l2tp: reject creation of non-PPP sessions on L2TPv2 tunnels
mlxsw: spectrum_switchdev: Fix port_vlan refcounting
mlxsw: spectrum_router: Align with new route replace logic
mlxsw: spectrum_router: Allow appending to dev-only routes
ipv6: Only emit append events for appended routes
stmmac: added support for 802.1ad vlan stripping
cfg80211: fix rcu in cfg80211_unregister_wdev
mac80211: Move up init of TXQs
mac80211_hwsim: fix module init error paths
cfg80211: initialize sinfo in cfg80211_get_station
nl80211: fix some kernel doc tag mistakes
hv_netvsc: Fix the variable sizes in ipsecv2 and rsc offload
rds: avoid unenecessary cong_update in loop transport
l2tp: clean up stale tunnel or session in pppol2tp_connect's error path
...
Linus Torvalds [Fri, 15 Jun 2018 22:36:39 +0000 (07:36 +0900)]
Merge tag 'modules-for-v4.18' of git://git./linux/kernel/git/jeyu/linux
Pull module updates from Jessica Yu:
"Minor code cleanup and also allow sig_enforce param to be shown in
sysfs with CONFIG_MODULE_SIG_FORCE"
* tag 'modules-for-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module: Allow to always show the status of modsign
module: Do not access sig_enforce directly
Linus Torvalds [Fri, 15 Jun 2018 21:50:51 +0000 (06:50 +0900)]
Merge branch 'for-linus-4.18-rc1' of git://git./linux/kernel/git/rw/uml
Pull uml updates from Richard Weinberger:
"Minor updates for UML:
- fixes for our new vector network driver by Anton
- initcall cleanup by Alexander
- We have a new mailing list, sourceforge.net sucks"
* 'for-linus-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: Fix raw interface options
um: Fix initialization of vector queues
um: remove uml initcalls
um: Update mailing list address
Linus Torvalds [Fri, 15 Jun 2018 21:42:43 +0000 (06:42 +0900)]
Merge tag 'riscv-for-linus-4.18-merge_window' of git://git./linux/kernel/git/palmer/riscv-linux
Pull RISC-V updates from Palmer Dabbelt:
"This contains some small RISC-V updates I'd like to target for 4.18.
They are all fairly small this time. Here's a short summary, there's
more info in the commits/merges:
- a fix to __clear_user to respect the passed arguments.
- enough support for the perf subsystem to work with RISC-V's ISA
defined performance counters.
- support for sparse and cleanups suggested by it.
- support for R_RISCV_32 (a relocation, not the 32-bit ISA).
- some MAINTAINERS cleanups.
- the addition of CONFIG_HVC_RISCV_SBI to our defconfig, as it's
always present.
I've given these a simple build+boot test"
* tag 'riscv-for-linus-4.18-merge_window' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
RISC-V: Add CONFIG_HVC_RISCV_SBI=y to defconfig
RISC-V: Handle R_RISCV_32 in modules
riscv/ftrace: Export _mcount when DYNAMIC_FTRACE isn't set
riscv: add riscv-specific predefines to CHECKFLAGS
riscv: split the declaration of __copy_user
riscv: no __user for probe_kernel_address()
riscv: use NULL instead of a plain 0
perf: riscv: Add Document for Future Porting Guide
perf: riscv: preliminary RISC-V support
MAINTAINERS: Update Albert's email, he's back at Berkeley
MAINTAINERS: Add myself as a maintainer for SiFive's drivers
riscv: Fix the bug in memory access fixup code
Linus Torvalds [Fri, 15 Jun 2018 21:37:04 +0000 (06:37 +0900)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull more kvm updates from Paolo Bonzini:
"Mostly the PPC part of the release, but also switching to Arnd's fix
for the hyperv config issue and a typo fix.
Main PPC changes:
- reimplement the MMIO instruction emulation
- transactional memory support for PR KVM
- improve radix page table handling"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (63 commits)
KVM: x86: VMX: redo fix for link error without CONFIG_HYPERV
KVM: x86: fix typo at kvm_arch_hardware_setup comment
KVM: PPC: Book3S PR: Fix failure status setting in tabort. emulation
KVM: PPC: Book3S PR: Enable use on POWER9 bare-metal hosts in HPT mode
KVM: PPC: Book3S PR: Don't let PAPR guest set MSR hypervisor bit
KVM: PPC: Book3S PR: Fix failure status setting in treclaim. emulation
KVM: PPC: Book3S PR: Fix MSR setting when delivering interrupts
KVM: PPC: Book3S PR: Handle additional interrupt types
KVM: PPC: Book3S PR: Enable kvmppc_get/set_one_reg_pr() for HTM registers
KVM: PPC: Book3S: Remove load/put vcpu for KVM_GET_REGS/KVM_SET_REGS
KVM: PPC: Remove load/put vcpu for KVM_GET/SET_ONE_REG ioctl
KVM: PPC: Move vcpu_load/vcpu_put down to each ioctl case in kvm_arch_vcpu_ioctl
KVM: PPC: Book3S PR: Enable HTM for PR KVM for KVM_CHECK_EXTENSION ioctl
KVM: PPC: Book3S PR: Support TAR handling for PR KVM HTM
KVM: PPC: Book3S PR: Add guard code to prevent returning to guest with PR=0 and Transactional state
KVM: PPC: Book3S PR: Add emulation for tabort. in privileged state
KVM: PPC: Book3S PR: Add emulation for trechkpt.
KVM: PPC: Book3S PR: Add emulation for treclaim.
KVM: PPC: Book3S PR: Restore NV regs after emulating mfspr from TM SPRs
KVM: PPC: Book3S PR: Always fail transactions in guest privileged state
...
Linus Torvalds [Fri, 15 Jun 2018 21:35:02 +0000 (06:35 +0900)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
"virtio, vhost: features, fixes
- PCI virtual function support for virtio
- DMA barriers for virtio strong barriers
- bugfixes"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio: update the comments for transport features
virtio_pci: support enabling VFs
vhost: fix info leak due to uninitialized memory
virtio_ring: switch to dma_XX barriers for rpmsg