Steve French [Fri, 13 Sep 2019 21:47:31 +0000 (16:47 -0500)]
smb3: fix potential null dereference in decrypt offload
commit
a091c5f67c99 ("smb3: allow parallelizing decryption of reads")
had a potential null dereference
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Steve French [Thu, 12 Sep 2019 22:52:54 +0000 (17:52 -0500)]
smb3: fix unmount hang in open_shroot
An earlier patch "CIFS: fix deadlock in cached root handling"
did not completely address the deadlock in open_shroot. This
patch addresses the deadlock.
In testing the recent patch:
smb3: improve handling of share deleted (and share recreated)
we were able to reproduce the open_shroot deadlock to one
of the target servers in unmount in a delete share scenario.
Fixes: 7e5a70ad88b1e ("CIFS: fix deadlock in cached root handling")
This is version 2 of this patch. An earlier version of this
patch "smb3: fix unmount hang in open_shroot" had a problem
found by Dan.
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Aurelien Aptel <aaptel@suse.com>
CC: Stable <stable@vger.kernel.org>
Steve French [Thu, 12 Sep 2019 02:46:20 +0000 (21:46 -0500)]
smb3: allow disabling requesting leases
In some cases to work around server bugs or performance
problems it can be helpful to be able to disable requesting
SMB2.1/SMB3 leases on a particular mount (not to all servers
and all shares we are mounted to). Add new mount parm
"nolease" which turns off requesting leases on directory
or file opens. Currently the only way to disable leases is
globally through a module load parameter. This is more
granular.
Suggested-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Steve French [Wed, 11 Sep 2019 05:07:36 +0000 (00:07 -0500)]
smb3: improve handling of share deleted (and share recreated)
When a share is deleted, returning EIO is confusing and no useful
information is logged. Improve the handling of this case by
at least logging a better error for this (and also mapping the error
differently to EREMCHG). See e.g. the new messages that would be logged:
[55243.639530] server share \\192.168.1.219\scratch deleted
[55243.642568] CIFS VFS: \\192.168.1.219\scratch BAD_NETWORK_NAME: \\192.168.1.219\scratch
In addition for the case where a share is deleted and then recreated
with the same name, have now fixed that so it works. This is sometimes
done for example, because the admin had to move a share to a different,
bigger local drive when a share is running low on space.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Steve French [Tue, 10 Sep 2019 03:57:11 +0000 (22:57 -0500)]
smb3: display max smb3 requests in flight at any one time
Displayed in /proc/fs/cifs/Stats once for each
socket we are connected to.
This allows us to find out what the maximum number of
requests that had been in flight (at any one time). Note that
/proc/fs/cifs/Stats can be reset if you want to look for
maximum over a small period of time.
Sample output (immediately after mount):
Resources in use
CIFS Session: 1
Share (unique mount targets): 2
SMB Request/Response Buffer: 1 Pool size: 5
SMB Small Req/Resp Buffer: 1 Pool size: 30
Operations (MIDs): 0
0 session 0 share reconnects
Total vfs operations: 5 maximum at one time: 2
Max requests in flight: 2
1) \\localhost\scratch
SMBs: 18
Bytes read: 0 Bytes written: 0
...
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Steve French [Mon, 9 Sep 2019 18:30:15 +0000 (13:30 -0500)]
smb3: only offload decryption of read responses if multiple requests
No point in offloading read decryption if no other requests on the
wire
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Ronnie Sahlberg [Mon, 9 Sep 2019 05:30:00 +0000 (15:30 +1000)]
cifs: add a helper to find an existing readable handle to a file
and convert smb2_query_path_info() to use it.
This will eliminate the need for a SMB2_Create when we already have an
open handle that can be used. This will also prevent a oplock break
in case the other handle holds a lease.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Steve French [Mon, 9 Sep 2019 04:22:02 +0000 (23:22 -0500)]
smb3: enable offload of decryption of large reads via mount option
Disable offload of the decryption of encrypted read responses
by default (equivalent to setting this new mount option "esize=0").
Allow setting the minimum encrypted read response size that we
will choose to offload to a worker thread - it is now configurable
via on a new mount option "esize="
Depending on which encryption mechanism (GCM vs. CCM) and
the number of reads that will be issued in parallel and the
performance of the network and CPU on the client, it may make
sense to enable this since it can provide substantial benefit when
multiple large reads are in flight at the same time.
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Steve French [Sat, 7 Sep 2019 06:09:49 +0000 (01:09 -0500)]
smb3: allow parallelizing decryption of reads
decrypting large reads on encrypted shares can be slow (e.g. adding
multiple milliseconds per-read on non-GCM capable servers or
when mounting with dialects prior to SMB3.1.1) - allow parallelizing
of read decryption by launching worker threads.
Testing to Samba on localhost showed 25% improvement.
Testing to remote server showed very large improvement when
doing more than one 'cp' command was called at one time.
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Ronnie Sahlberg [Wed, 4 Sep 2019 02:32:41 +0000 (12:32 +1000)]
cifs: add a debug macro that prints \\server\share for errors
Where we have a tcon available we can log \\server\share as part
of the message. Only do this for the VFS log level.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Steve French [Thu, 5 Sep 2019 04:07:52 +0000 (23:07 -0500)]
smb3: fix signing verification of large reads
Code cleanup in the 5.1 kernel changed the array
passed into signing verification on large reads leading
to warning messages being logged when copying files to local
systems from remote.
SMB signature verification returned error = -5
This changeset fixes verification of SMB3 signatures of large
reads.
Suggested-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Steve French [Wed, 4 Sep 2019 02:18:49 +0000 (21:18 -0500)]
smb3: allow skipping signature verification for perf sensitive configurations
Add new mount option "signloosely" which enables signing but skips the
sometimes expensive signing checks in the responses (signatures are
calculated and sent correctly in the SMB2/SMB3 requests even with this
mount option but skipped in the responses). Although weaker for security
(and also data integrity in case a packet were corrupted), this can provide
enough of a performance benefit (calculating the signature to verify a
packet can be expensive especially for large packets) to be useful in
some cases.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Steve French [Tue, 3 Sep 2019 23:35:42 +0000 (18:35 -0500)]
smb3: add dynamic tracepoints for flush and close
We only had dynamic tracepoints on errors in flush
and close, but may be helpful to trace enter
and non-error exits for those. Sample trace examples
(excerpts) from "cp" and "dd" show two of the new
tracepoints.
cp-22823 [002] .... 123439.179701: smb3_enter: _cifsFileInfo_put: xid=10
cp-22823 [002] .... 123439.179705: smb3_close_enter: xid=10 sid=0x98871327 tid=0xfcd585ff fid=0xc7f84682
cp-22823 [002] .... 123439.179711: smb3_cmd_enter: sid=0x98871327 tid=0xfcd585ff cmd=6 mid=43
cp-22823 [002] .... 123439.180175: smb3_cmd_done: sid=0x98871327 tid=0xfcd585ff cmd=6 mid=43
cp-22823 [002] .... 123439.180179: smb3_close_done: xid=10 sid=0x98871327 tid=0xfcd585ff fid=0xc7f84682
dd-22981 [003] .... 123696.946011: smb3_flush_enter: xid=24 sid=0x98871327 tid=0xfcd585ff fid=0x1917736f
dd-22981 [003] .... 123696.946013: smb3_cmd_enter: sid=0x98871327 tid=0xfcd585ff cmd=7 mid=123
dd-22981 [003] .... 123696.956639: smb3_cmd_done: sid=0x98871327 tid=0x0 cmd=7 mid=123
dd-22981 [003] .... 123696.956644: smb3_flush_done: xid=24 sid=0x98871327 tid=0xfcd585ff fid=0x1917736f
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Steve French [Tue, 3 Sep 2019 22:49:46 +0000 (17:49 -0500)]
smb3: log warning if CSC policy conflicts with cache mount option
If the server config (e.g. Samba smb.conf "csc policy = disable)
for the share indicates that the share should not be cached, log
a warning message if forced client side caching ("cache=ro" or
"cache=singleclient") is requested on mount.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Steve French [Fri, 30 Aug 2019 07:12:41 +0000 (02:12 -0500)]
smb3: add mount option to allow RW caching of share accessed by only 1 client
If a share is known to be only to be accessed by one client, we
can aggressively cache writes not just reads to it.
Add "cache=" option (cache=singleclient) for mounting read write shares
(that will not be read or written to from other clients while we have
it mounted) in order to improve performance.
Signed-off-by: Steve French <stfrench@microsoft.com>
Steve French [Fri, 30 Aug 2019 03:33:38 +0000 (22:33 -0500)]
smb3: add some more descriptive messages about share when mounting cache=ro
Add some additional logging so the user can see if the share they
mounted with cache=ro is considered read only by the server
CIFS: Attempting to mount //localhost/test
CIFS VFS: mounting share with read only caching. Ensure that the share will not be modified while in use.
CIFS VFS: read only mount of RW share
CIFS: Attempting to mount //localhost/test-ro
CIFS VFS: mounting share with read only caching. Ensure that the share will not be modified while in use.
CIFS VFS: mounted to read only share
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Steve French [Wed, 28 Aug 2019 04:58:54 +0000 (23:58 -0500)]
smb3: add mount option to allow forced caching of read only share
If a share is immutable (at least for the period that it will
be mounted) it would be helpful to not have to revalidate
dentries repeatedly that we know can not be changed remotely.
Add "cache=" option (cache=ro) for mounting read only shares
in order to improve performance in cases in which we know that
the share will not be changing while it is in use.
Signed-off-by: Steve French <stfrench@microsoft.com>
Colin Ian King [Mon, 2 Sep 2019 15:10:59 +0000 (16:10 +0100)]
cifs: fix dereference on ses before it is null checked
The assignment of pointer server dereferences pointer ses, however,
this dereference occurs before ses is null checked and hence we
have a potential null pointer dereference. Fix this by only
dereferencing ses after it has been null checked.
Addresses-Coverity: ("Dereference before null check")
Fixes: 2808c6639104 ("cifs: add new debugging macro cifs_server_dbg")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Ronnie Sahlberg [Wed, 28 Aug 2019 07:15:35 +0000 (17:15 +1000)]
cifs: add new debugging macro cifs_server_dbg
which can be used from contexts where we have a TCP_Server_Info *server.
This new macro will prepend the debugging string with "Server:<servername> "
which will help when debugging issues on hosts with many cifs connections
to several different servers.
Convert a bunch of cifs_dbg(VFS) calls to cifs_server_dbg(VFS)
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Ronnie Sahlberg [Thu, 29 Aug 2019 23:53:56 +0000 (09:53 +1000)]
cifs: use existing handle for compound_op(OP_SET_INFO) when possible
If we already have a writable handle for a path we want to set the
attributes for then use that instead of a create/set-info/close compound.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Ronnie Sahlberg [Thu, 29 Aug 2019 22:25:46 +0000 (08:25 +1000)]
cifs: create a helper to find a writeable handle by path name
rename() takes a path for old_file and in SMB2 we used to just create
a compound for create(old_path)/rename/close().
If we already have a writable handle we can avoid the create() and close()
altogether and just use the existing handle.
For this situation, as we avoid doing the create()
we also avoid triggering an oplock break for the existing handle.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
YueHaibing [Fri, 23 Aug 2019 12:15:35 +0000 (20:15 +0800)]
cifs: remove set but not used variables
Fixes gcc '-Wunused-but-set-variable' warning:
fs/cifs/file.c: In function cifs_lock:
fs/cifs/file.c:1696:24: warning: variable cinode set but not used [-Wunused-but-set-variable]
fs/cifs/file.c: In function cifs_write:
fs/cifs/file.c:1765:23: warning: variable cifs_sb set but not used [-Wunused-but-set-variable]
fs/cifs/file.c: In function collect_uncached_read_data:
fs/cifs/file.c:3578:20: warning: variable tcon set but not used [-Wunused-but-set-variable]
'cinode' is never used since introduced by
commit
03776f4516bc ("CIFS: Simplify byte range locking code")
'cifs_sb' is not used since commit
cb7e9eabb2b5 ("CIFS: Use
multicredits for SMB 2.1/3 writes").
'tcon' is not used since commit
d26e2903fc10 ("smb3: fix bytes_read statistics")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Steve French [Mon, 5 Aug 2019 22:07:26 +0000 (17:07 -0500)]
smb3: Incorrect size for netname negotiate context
It is not null terminated (length was off by two).
Also see similar change to Samba:
https://gitlab.com/samba-team/samba/merge_requests/666
Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
zhengbin [Tue, 20 Aug 2019 14:00:47 +0000 (22:00 +0800)]
cifs: remove unused variable
In smb3_punch_hole, variable cifsi set but not used, remove it.
In cifs_lock, variable netfid set but not used, remove it.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Colin Ian King [Wed, 31 Jul 2019 09:05:26 +0000 (10:05 +0100)]
cifs: remove redundant assignment to variable rc
Variable rc is being initialized with a value that is never read
and rc is being re-assigned a little later on. The assignment is
redundant and hence can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Steve French [Thu, 25 Jul 2019 23:19:42 +0000 (18:19 -0500)]
smb3: add missing flag definitions
SMB3 and 3.1.1 added two additional flags including
the priority mask. Add them to our protocol definitions
in smb2pdu.h. See MS-SMB2 2.2.1.2
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Ronnie Sahlberg [Thu, 25 Jul 2019 03:08:43 +0000 (13:08 +1000)]
cifs: add passthrough for smb2 setinfo
Add support to send smb2 set-info commands from userspace.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
Ronnie Sahlberg [Tue, 16 Jul 2019 05:07:08 +0000 (15:07 +1000)]
cifs: prepare SMB2_Flush to be usable in compounds
Create smb2_flush_init() and smb2_flush_free() so we can use the flush command
in compounds.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Steve French [Fri, 19 Jul 2019 08:15:55 +0000 (08:15 +0000)]
cifs: allow chmod to set mode bits using special sid
When mounting with "modefromsid" set mode bits (chmod) by
adding ACE with special SID (S-1-5-88-3-<mode>) to the ACL.
Subsequent patch will fix setting default mode on file
create and mkdir.
See See e.g.
https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2008-R2-and-2008/hh509017(v=ws.10)
Signed-off-by: Steve French <stfrench@microsoft.com>
Steve French [Fri, 19 Jul 2019 06:30:07 +0000 (06:30 +0000)]
cifs: get mode bits from special sid on stat
When mounting with "modefromsid" retrieve mode bits from
special SID (S-1-5-88-3) on stat. Subsequent patch will fix
setattr (chmod) to save mode bits in S-1-5-88-3-<mode>
Note that when an ACE matching S-1-5-88-3 is not found, we
default the mode to an approximation based on the owner, group
and everyone permissions (as with the "cifsacl" mount option).
See See e.g.
https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2008-R2-and-2008/hh509017(v=ws.10)
Signed-off-by: Steve French <stfrench@microsoft.com>
Colin Ian King [Tue, 23 Jul 2019 15:09:19 +0000 (16:09 +0100)]
fs: cifs: cifsssmb: remove redundant assignment to variable ret
The variable ret is being initialized however this is never read
and later it is being reassigned to a new value. The initialization
is redundant and hence can be removed.
Addresses-Coverity: ("Unused Value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Ronnie Sahlberg [Wed, 24 Jul 2019 01:43:49 +0000 (11:43 +1000)]
cifs: fix a comment for the timeouts when sending echos
Clarify a trivial comment
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Linus Torvalds [Sun, 15 Sep 2019 21:19:32 +0000 (14:19 -0700)]
Linux 5.3
Linus Torvalds [Sun, 15 Sep 2019 19:32:03 +0000 (12:32 -0700)]
Revert "ext4: make __ext4_get_inode_loc plug"
This reverts commit
b03755ad6f33b7b8cd7312a3596a2dbf496de6e7.
This is sad, and done for all the wrong reasons. Because that commit is
good, and does exactly what it says: avoids a lot of small disk requests
for the inode table read-ahead.
However, it turns out that it causes an entirely unrelated problem: the
getrandom() system call was introduced back in 2014 by commit
c6e9d6f38894 ("random: introduce getrandom(2) system call"), and people
use it as a convenient source of good random numbers.
But part of the current semantics for getrandom() is that it waits for
the entropy pool to fill at least partially (unlike /dev/urandom). And
at least ArchLinux apparently has a systemd that uses getrandom() at
boot time, and the improvements in IO patterns means that existing
installations suddenly start hanging, waiting for entropy that will
never happen.
It seems to be an unlucky combination of not _quite_ enough entropy,
together with a particular systemd version and configuration. Lennart
says that the systemd-random-seed process (which is what does this early
access) is supposed to not block any other boot activity, but sadly that
doesn't actually seem to be the case (possibly due bogus dependencies on
cryptsetup for encrypted swapspace).
The correct fix is to fix getrandom() to not block when it's not
appropriate, but that fix is going to take a lot more discussion. Do we
just make it act like /dev/urandom by default, and add a new flag for
"wait for entropy"? Do we add a boot-time option? Or do we just limit
the amount of time it will wait for entropy?
So in the meantime, we do the revert to give us time to discuss the
eventual fix for the fundamental problem, at which point we can re-apply
the ext4 inode table access optimization.
Reported-by: Ahmed S. Darwish <darwish.07@gmail.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Alexander E. Patrakov <patrakov@gmail.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 14 Sep 2019 23:07:40 +0000 (16:07 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"The main change here is a revert of reverts. We recently simplified
some code that was thought unnecessary; however, since then KVM has
grown quite a few cond_resched()s and for that reason the simplified
code is prone to livelocks---one CPUs tries to empty a list of guest
page tables while the others keep adding to them. This adds back the
generation-based zapping of guest page tables, which was not
unnecessary after all.
On top of this, there is a fix for a kernel memory leak and a couple
of s390 fixlets as well"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot
KVM: x86: work around leak of uninitialized stack contents
KVM: nVMX: handle page fault in vmread
KVM: s390: Do not leak kernel stack data in the KVM_S390_INTERRUPT ioctl
KVM: s390: kvm_s390_vm_start_migration: check dirty_bitmap before using it as target for memset()
Linus Torvalds [Sat, 14 Sep 2019 23:02:49 +0000 (16:02 -0700)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost
Pull virtio fix from Michael Tsirkin:
"A last minute revert
The 32-bit build got broken by the latest defence in depth patch.
Revert and we'll try again in the next cycle"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
Revert "vhost: block speculation of translated descriptors"
Linus Torvalds [Sat, 14 Sep 2019 22:58:02 +0000 (15:58 -0700)]
Merge tag 'riscv/for-v5.3' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fix from Paul Walmsley:
"Last week, Palmer and I learned that there was an error in the RISC-V
kernel image header format that could make it less compatible with the
ARM64 kernel image header format. I had missed this error during my
original reviews of the patch.
The kernel image header format is an interface that impacts
bootloaders, QEMU, and other user tools. Those packages must be
updated to align with whatever is merged in the kernel. We would like
to avoid proliferating these image formats by keeping the RISC-V
header as close as possible to the existing ARM64 header. Since the
arch/riscv patch that adds support for the image header was merged
with our v5.3-rc1 pull request as commit
0f327f2aaad6a ("RISC-V: Add
an Image header that boot loader can parse."), we think it wise to try
to fix this error before v5.3 is released.
The fix itself should be backwards-compatible with any project that
has already merged support for premature versions of this interface.
It primarily involves ensuring that the RISC-V image header has
something useful in the same field as the ARM64 image header"
* tag 'riscv/for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: modify the Image header to improve compatibility with the ARM64 header
Michael S. Tsirkin [Sat, 14 Sep 2019 19:21:51 +0000 (15:21 -0400)]
Revert "vhost: block speculation of translated descriptors"
This reverts commit
a89db445fbd7f1f8457b03759aa7343fa530ef6b.
I was hasty to include this patch, and it breaks the build on 32 bit.
Defence in depth is good but let's do it properly.
Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Linus Torvalds [Sat, 14 Sep 2019 19:20:38 +0000 (12:20 -0700)]
Merge git://git./linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
1) Don't corrupt xfrm_interface parms before validation, from Nicolas
Dichtel.
2) Revert use of usb-wakeup in btusb, from Mario Limonciello.
3) Block ipv6 packets in bridge netfilter if ipv6 is disabled, from
Leonardo Bras.
4) IPS_OFFLOAD not honored in ctnetlink, from Pablo Neira Ayuso.
5) Missing ULP check in sock_map, from John Fastabend.
6) Fix receive statistic handling in forcedeth, from Zhu Yanjun.
7) Fix length of SKB allocated in 6pack driver, from Christophe
JAILLET.
8) ip6_route_info_create() returns an error pointer, not NULL. From
Maciej Żenczykowski.
9) Only add RDS sock to the hashes after rs_transport is set, from
Ka-Cheong Poon.
10) Don't double clean TX descriptors in ixgbe, from Ilya Maximets.
11) Presence of transmit IPSEC offload in an SKB is not tested for
correctly in ixgbe and ixgbevf. From Steffen Klassert and Jeff
Kirsher.
12) Need rcu_barrier() when register_netdevice() takes one of the
notifier based failure paths, from Subash Abhinov Kasiviswanathan.
13) Fix leak in sctp_do_bind(), from Mao Wenan.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits)
cdc_ether: fix rndis support for Mediatek based smartphones
sctp: destroy bucket if failed to bind addr
sctp: remove redundant assignment when call sctp_get_port_local
sctp: change return type of sctp_get_port_local
ixgbevf: Fix secpath usage for IPsec Tx offload
sctp: Fix the link time qualifier of 'sctp_ctrlsock_exit()'
ixgbe: Fix secpath usage for IPsec TX offload.
net: qrtr: fix memort leak in qrtr_tun_write_iter
net: Fix null de-reference of device refcount
ipv6: Fix the link time qualifier of 'ping_v6_proc_exit_net()'
tun: fix use-after-free when register netdev failed
tcp: fix tcp_ecn_withdraw_cwr() to clear TCP_ECN_QUEUE_CWR
ixgbe: fix double clean of Tx descriptors with xdp
ixgbe: Prevent u8 wrapping of ITR value to something less than 10us
mlx4: fix spelling mistake "veify" -> "verify"
net: hns3: fix spelling mistake "undeflow" -> "underflow"
net: lmc: fix spelling mistake "runnin" -> "running"
NFC: st95hf: fix spelling mistake "receieve" -> "receive"
net/rds: An rds_sock is added too early to the hash table
mac80211: Do not send Layer 2 Update frame before authorization
...
Linus Torvalds [Sat, 14 Sep 2019 19:08:19 +0000 (12:08 -0700)]
Merge tag 'mmc-v5.3-rc8' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
- tmio: Fixup runtime PM management during probe and remove
- sdhci-pci-o2micro: Fix eMMC initialization for an AMD SoC
- bcm2835: Prevent lockups when terminating work
* tag 'mmc-v5.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: tmio: Fixup runtime PM management during remove
mmc: tmio: Fixup runtime PM management during probe
Revert "mmc: tmio: move runtime PM enablement to the driver implementations"
Revert "mmc: sdhci: Remove unneeded quirk2 flag of O2 SD host controller"
Revert "mmc: bcm2835: Terminate timeout work synchronously"
Linus Torvalds [Sat, 14 Sep 2019 18:54:57 +0000 (11:54 -0700)]
Merge tag 'drm-fixes-2019-09-13' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"From the maintainer summit, just some last minute fixes for final:
lima:
- fix gem_wait ioctl
core:
- constify modes list
i915:
- DP MST high color depth regression
- GPU hangs on vulkan compute workloads"
* tag 'drm-fixes-2019-09-13' of git://anongit.freedesktop.org/drm/drm:
drm/lima: fix lima_gem_wait() return value
drm/i915: Restore relaxed padding (OCL_OOB_SUPPRES_ENABLE) for skl+
drm/i915: Limit MST to <= 8bpc once again
drm/modes: Make the whitelist more const
Paolo Bonzini [Sat, 14 Sep 2019 07:25:30 +0000 (09:25 +0200)]
Merge tag 'kvm-s390-master-5.3-1' of git://git./linux/kernel/git/kvms390/linux into kvm-master
KVM: s390: Fixes for 5.3
- prevent a user triggerable oops in the migration code
- do not leak kernel stack content
Sean Christopherson [Fri, 13 Sep 2019 02:46:02 +0000 (19:46 -0700)]
KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot
James Harvey reported a livelock that was introduced by commit
d012a06ab1d23 ("Revert "KVM: x86/mmu: Zap only the relevant pages when
removing a memslot"").
The livelock occurs because kvm_mmu_zap_all() as it exists today will
voluntarily reschedule and drop KVM's mmu_lock, which allows other vCPUs
to add shadow pages. With enough vCPUs, kvm_mmu_zap_all() can get stuck
in an infinite loop as it can never zap all pages before observing lock
contention or the need to reschedule. The equivalent of kvm_mmu_zap_all()
that was in use at the time of the reverted commit (
4e103134b8623, "KVM:
x86/mmu: Zap only the relevant pages when removing a memslot") employed
a fast invalidate mechanism and was not susceptible to the above livelock.
There are three ways to fix the livelock:
- Reverting the revert (commit
d012a06ab1d23) is not a viable option as
the revert is needed to fix a regression that occurs when the guest has
one or more assigned devices. It's unlikely we'll root cause the device
assignment regression soon enough to fix the regression timely.
- Remove the conditional reschedule from kvm_mmu_zap_all(). However, although
removing the reschedule would be a smaller code change, it's less safe
in the sense that the resulting kvm_mmu_zap_all() hasn't been used in
the wild for flushing memslots since the fast invalidate mechanism was
introduced by commit
6ca18b6950f8d ("KVM: x86: use the fast way to
invalidate all pages"), back in 2013.
- Reintroduce the fast invalidate mechanism and use it when zapping shadow
pages in response to a memslot being deleted/moved, which is what this
patch does.
For all intents and purposes, this is a revert of commit
ea145aacf4ae8
("Revert "KVM: MMU: fast invalidate all pages"") and a partial revert of
commit
7390de1e99a70 ("Revert "KVM: x86: use the fast way to invalidate
all pages""), i.e. restores the behavior of commit
5304b8d37c2a5 ("KVM:
MMU: fast invalidate all pages") and commit
6ca18b6950f8d ("KVM: x86:
use the fast way to invalidate all pages") respectively.
Fixes: d012a06ab1d23 ("Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot"")
Reported-by: James Harvey <jamespharvey20@gmail.com>
Cc: Alex Willamson <alex.williamson@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fuqian Huang [Thu, 12 Sep 2019 04:18:17 +0000 (12:18 +0800)]
KVM: x86: work around leak of uninitialized stack contents
Emulation of VMPTRST can incorrectly inject a page fault
when passed an operand that points to an MMIO address.
The page fault will use uninitialized kernel stack memory
as the CR2 and error code.
The right behavior would be to abort the VM with a KVM_EXIT_INTERNAL_ERROR
exit to userspace; however, it is not an easy fix, so for now just ensure
that the error code and CR2 are zero.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Cc: stable@vger.kernel.org
[add comment]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Fri, 13 Sep 2019 22:26:27 +0000 (00:26 +0200)]
KVM: nVMX: handle page fault in vmread
The implementation of vmread to memory is still incomplete, as it
lacks the ability to do vmread to I/O memory just like vmptrst.
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paul Walmsley [Sat, 14 Sep 2019 01:35:50 +0000 (18:35 -0700)]
riscv: modify the Image header to improve compatibility with the ARM64 header
Part of the intention during the definition of the RISC-V kernel image
header was to lay the groundwork for a future merge with the ARM64
image header. One error during my original review was not noticing
that the RISC-V header's "magic" field was at a different size and
position than the ARM64's "magic" field. If the existing ARM64 Image
header parsing code were to attempt to parse an existing RISC-V kernel
image header format, it would see a magic number 0. This is
undesirable, since it's our intention to align as closely as possible
with the ARM64 header format. Another problem was that the original
"res3" field was not being initialized correctly to zero.
Address these issues by creating a 32-bit "magic2" field in the RISC-V
header which matches the ARM64 "magic" field. RISC-V binaries will
store "RSC\x05" in this field. The intention is that the use of the
existing 64-bit "magic" field in the RISC-V header will be deprecated
over time. Increment the minor version number of the file format to
indicate this change, and update the documentation accordingly. Fix
the assembler directives in head.S to ensure that reserved fields are
properly zero-initialized.
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Reported-by: Palmer Dabbelt <palmer@sifive.com>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Atish Patra <atish.patra@wdc.com>
Cc: Karsten Merker <merker@debian.org>
Link: https://lore.kernel.org/linux-riscv/194c2f10c9806720623430dbf0cc59a965e50448.camel@wdc.com/T/#u
Link: https://lore.kernel.org/linux-riscv/mhng-755b14c4-8f35-4079-a7ff-e421fd1b02bc@palmer-si-x1e/T/#t
Bjørn Mork [Thu, 12 Sep 2019 08:42:00 +0000 (10:42 +0200)]
cdc_ether: fix rndis support for Mediatek based smartphones
A Mediatek based smartphone owner reports problems with USB
tethering in Linux. The verbose USB listing shows a rndis_host
interface pair (e0/01/03 + 10/00/00), but the driver fails to
bind with
[ 355.960428] usb 1-4: bad CDC descriptors
The problem is a failsafe test intended to filter out ACM serial
functions using the same 02/02/ff class/subclass/protocol as RNDIS.
The serial functions are recognized by their non-zero bmCapabilities.
No RNDIS function with non-zero bmCapabilities were known at the time
this failsafe was added. But it turns out that some Wireless class
RNDIS functions are using the bmCapabilities field. These functions
are uniquely identified as RNDIS by their class/subclass/protocol, so
the failing test can safely be disabled. The same applies to the two
types of Misc class RNDIS functions.
Applying the failsafe to Communication class functions only retains
the original functionality, and fixes the problem for the Mediatek based
smartphone.
Tow examples of CDC functional descriptors with non-zero bmCapabilities
from Wireless class RNDIS functions are:
0e8d:000a Mediatek Crosscall Spider X5 3G Phone
CDC Header:
bcdCDC 1.10
CDC ACM:
bmCapabilities 0x0f
connection notifications
sends break
line coding and serial state
get/set/clear comm features
CDC Union:
bMasterInterface 0
bSlaveInterface 1
CDC Call Management:
bmCapabilities 0x03
call management
use DataInterface
bDataInterface 1
and
19d2:1023 ZTE K4201-z
CDC Header:
bcdCDC 1.10
CDC ACM:
bmCapabilities 0x02
line coding and serial state
CDC Call Management:
bmCapabilities 0x03
call management
use DataInterface
bDataInterface 1
CDC Union:
bMasterInterface 0
bSlaveInterface 1
The Mediatek example is believed to apply to most smartphones with
Mediatek firmware. The ZTE example is most likely also part of a larger
family of devices/firmwares.
Suggested-by: Lars Melin <larsm17@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 13 Sep 2019 20:06:20 +0000 (22:06 +0200)]
Merge branch 'sctp_do_bind-leak'
Mao Wenan says:
====================
fix memory leak for sctp_do_bind
First two patches are to do cleanup, remove redundant assignment,
and change return type of sctp_get_port_local.
Third patch is to fix memory leak for sctp_do_bind if failed
to bind address.
v2: add one patch to change return type of sctp_get_port_local.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mao Wenan [Thu, 12 Sep 2019 04:02:19 +0000 (12:02 +0800)]
sctp: destroy bucket if failed to bind addr
There is one memory leak bug report:
BUG: memory leak
unreferenced object 0xffff8881dc4c5ec0 (size 40):
comm "syz-executor.0", pid 5673, jiffies
4298198457 (age 27.578s)
hex dump (first 32 bytes):
02 00 00 00 81 88 ff ff 00 00 00 00 00 00 00 00 ................
f8 63 3d c1 81 88 ff ff 00 00 00 00 00 00 00 00 .c=.............
backtrace:
[<
0000000072006339>] sctp_get_port_local+0x2a1/0xa00 [sctp]
[<
00000000c7b379ec>] sctp_do_bind+0x176/0x2c0 [sctp]
[<
000000005be274a2>] sctp_bind+0x5a/0x80 [sctp]
[<
00000000b66b4044>] inet6_bind+0x59/0xd0 [ipv6]
[<
00000000c68c7f42>] __sys_bind+0x120/0x1f0 net/socket.c:1647
[<
000000004513635b>] __do_sys_bind net/socket.c:1658 [inline]
[<
000000004513635b>] __se_sys_bind net/socket.c:1656 [inline]
[<
000000004513635b>] __x64_sys_bind+0x3e/0x50 net/socket.c:1656
[<
0000000061f2501e>] do_syscall_64+0x72/0x2e0 arch/x86/entry/common.c:296
[<
0000000003d1e05e>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
This is because in sctp_do_bind, if sctp_get_port_local is to
create hash bucket successfully, and sctp_add_bind_addr failed
to bind address, e.g return -ENOMEM, so memory leak found, it
needs to destroy allocated bucket.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mao Wenan [Thu, 12 Sep 2019 04:02:18 +0000 (12:02 +0800)]
sctp: remove redundant assignment when call sctp_get_port_local
There are more parentheses in if clause when call sctp_get_port_local
in sctp_do_bind, and redundant assignment to 'ret'. This patch is to
do cleanup.
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mao Wenan [Thu, 12 Sep 2019 04:02:17 +0000 (12:02 +0800)]
sctp: change return type of sctp_get_port_local
Currently sctp_get_port_local() returns a long
which is either 0,1 or a pointer casted to long.
It's neither of the callers use the return value since
commit
62208f12451f ("net: sctp: simplify sctp_get_port").
Now two callers are sctp_get_port and sctp_do_bind,
they actually assumend a casted to an int was the same as
a pointer casted to a long, and they don't save the return
value just check whether it is zero or non-zero, so
it would better change return type from long to int for
sctp_get_port_local.
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher [Thu, 12 Sep 2019 19:07:34 +0000 (12:07 -0700)]
ixgbevf: Fix secpath usage for IPsec Tx offload
Port the same fix for ixgbe to ixgbevf.
The ixgbevf driver currently does IPsec Tx offloading
based on an existing secpath. However, the secpath
can also come from the Rx side, in this case it is
misinterpreted for Tx offload and the packets are
dropped with a "bad sa_idx" error. Fix this by using
the xfrm_offload() function to test for Tx offload.
CC: Shannon Nelson <snelson@pensando.io>
Fixes: 7f68d4306701 ("ixgbevf: enable VF IPsec offload operations")
Reported-by: Jonathan Tooker <jonathan@reliablehosting.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ulf Hansson [Fri, 13 Sep 2019 09:20:22 +0000 (11:20 +0200)]
mmc: tmio: Fixup runtime PM management during remove
Accessing the device when it may be runtime suspended is a bug, which is
the case in tmio_mmc_host_remove(). Let's fix the behaviour.
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Ulf Hansson [Fri, 13 Sep 2019 09:19:26 +0000 (11:19 +0200)]
mmc: tmio: Fixup runtime PM management during probe
The tmio_mmc_host_probe() calls pm_runtime_set_active() to update the
runtime PM status of the device, as to make it reflect the current status
of the HW. This works fine for most cases, but unfortunate not for all.
Especially, there is a generic problem when the device has a genpd attached
and that genpd have the ->start|stop() callbacks assigned.
More precisely, if the driver calls pm_runtime_set_active() during
->probe(), genpd does not get to invoke the ->start() callback for it,
which means the HW isn't really fully powered on. Furthermore, in the next
phase, when the device becomes runtime suspended, genpd will invoke the
->stop() callback for it, potentially leading to usage count imbalance
problems, depending on what's implemented behind the callbacks of course.
To fix this problem, convert to call pm_runtime_get_sync() from
tmio_mmc_host_probe() rather than pm_runtime_set_active(). Additionally, to
avoid bumping usage counters and unnecessary re-initializing the HW the
first time the tmio driver's ->runtime_resume() callback is called,
introduce a state flag to keeping track of this.
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Ulf Hansson [Fri, 13 Sep 2019 08:03:15 +0000 (10:03 +0200)]
Revert "mmc: tmio: move runtime PM enablement to the driver implementations"
This reverts commit
7ff213193310ef8d0ee5f04f79d791210787ac2c.
It turns out that the above commit introduces other problems. For example,
calling pm_runtime_set_active() must not be done prior calling
pm_runtime_enable() as that makes it fail. This leads to additional
problems, such as clock enables being wrongly balanced.
Rather than fixing the problem on top, let's start over by doing a revert.
Fixes: 7ff213193310 ("mmc: tmio: move runtime PM enablement to the driver implementations")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Linus Torvalds [Fri, 13 Sep 2019 08:52:01 +0000 (09:52 +0100)]
Merge branch 'for-5.3-fixes' of git://git./linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
"Roman found and fixed a bug in the cgroup2 freezer which allows new
child cgroup to escape frozen state"
* 'for-5.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: freezer: fix frozen state inheritance
kselftests: cgroup: add freezer mkdir test
Linus Torvalds [Fri, 13 Sep 2019 08:48:47 +0000 (09:48 +0100)]
Merge tag 'for-5.3-rc8-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Here are two fixes, one of them urgent fixing a bug introduced in 5.2
and reported by many users. It took time to identify the root cause,
catching the 5.3 release is higly desired also to push the fix to 5.2
stable tree.
The bug is a mess up of return values after adding proper error
handling and honestly the kind of bug that can cause sleeping
disorders until it's caught. My appologies to everybody who was
affected.
Summary of what could happen:
1) either a hang when committing a transaction, if this happens
there's no risk of corruption, still the hang is very inconvenient
and can't be resolved without a reboot
2) writeback for some btree nodes may never be started and we end up
committing a transaction without noticing that, this is really
serious and that will lead to the "parent transid verify failed"
messages"
* tag 'for-5.3-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: fix unwritten extent buffers and hangs on future writeback attempts
Btrfs: fix assertion failure during fsync and use of stale transaction
Roman Gushchin [Thu, 12 Sep 2019 17:56:45 +0000 (10:56 -0700)]
cgroup: freezer: fix frozen state inheritance
If a new child cgroup is created in the frozen cgroup hierarchy
(one or more of ancestor cgroups is frozen), the CGRP_FREEZE cgroup
flag should be set. Otherwise if a process will be attached to the
child cgroup, it won't become frozen.
The problem can be reproduced with the test_cgfreezer_mkdir test.
This is the output before this patch:
~/test_freezer
ok 1 test_cgfreezer_simple
ok 2 test_cgfreezer_tree
ok 3 test_cgfreezer_forkbomb
Cgroup /sys/fs/cgroup/cg_test_mkdir_A/cg_test_mkdir_B isn't frozen
not ok 4 test_cgfreezer_mkdir
ok 5 test_cgfreezer_rmdir
ok 6 test_cgfreezer_migrate
ok 7 test_cgfreezer_ptrace
ok 8 test_cgfreezer_stopped
ok 9 test_cgfreezer_ptraced
ok 10 test_cgfreezer_vfork
And with this patch:
~/test_freezer
ok 1 test_cgfreezer_simple
ok 2 test_cgfreezer_tree
ok 3 test_cgfreezer_forkbomb
ok 4 test_cgfreezer_mkdir
ok 5 test_cgfreezer_rmdir
ok 6 test_cgfreezer_migrate
ok 7 test_cgfreezer_ptrace
ok 8 test_cgfreezer_stopped
ok 9 test_cgfreezer_ptraced
ok 10 test_cgfreezer_vfork
Reported-by: Mark Crossen <mcrossen@fb.com>
Signed-off-by: Roman Gushchin <guro@fb.com>
Fixes: 76f969e8948d ("cgroup: cgroup v2 freezer")
Cc: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Tejun Heo <tj@kernel.org>
Roman Gushchin [Thu, 12 Sep 2019 17:56:44 +0000 (10:56 -0700)]
kselftests: cgroup: add freezer mkdir test
Add a new cgroup freezer selftest, which checks that if a cgroup is
frozen, their new child cgroups will properly inherit the frozen
state.
It creates a parent cgroup, freezes it, creates a child cgroup
and populates it with a dummy process. Then it checks that both
parent and child cgroup are frozen.
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Chris Wilson [Thu, 12 Sep 2019 12:56:34 +0000 (13:56 +0100)]
Revert "drm/i915/userptr: Acquire the page lock around set_page_dirty()"
The userptr put_pages can be called from inside try_to_unmap, and so
enters with the page lock held on one of the object's backing pages. We
cannot take the page lock ourselves for fear of recursion.
Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reported-by: Martin Wilck <Martin.Wilck@suse.com>
Reported-by: Leo Kraav <leho@kraav.com>
Fixes: aa56a292ce62 ("drm/i915/userptr: Acquire the page lock around set_page_dirty()")
References: https://bugzilla.kernel.org/show_bug.cgi?id=203317
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 12 Sep 2019 13:50:14 +0000 (14:50 +0100)]
Merge tag 'for-linus-
20190912' of gitolite.pub/scm/linux/kernel/git/brauner/linux
Pull clone3 fix from Christian Brauner:
"This is a last-minute bugfix for clone3() that should go in before we
release 5.3 with clone3().
clone3() did not verify that the exit_signal argument was set to a
valid signal. This can be used to cause a crash by specifying a signal
greater than NSIG. e.g. -1.
The commit from Eugene adds a check to copy_clone_args_from_user() to
verify that the exit signal is limited by CSIGNAL as with legacy
clone() and that the signal is valid. With this we don't get the
legacy clone behavior were an invalid signal could be handed down and
would only be detected and then ignored in do_notify_parent(). Users
of clone3() will now get a proper error right when they pass an
invalid exit signal. Note, that this is not a change in user-visible
behavior since no kernel with clone3() has been released yet"
* tag 'for-linus-
20190912' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux:
fork: block invalid exit signals with clone3()
Linus Torvalds [Thu, 12 Sep 2019 13:47:35 +0000 (14:47 +0100)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"A KVM guest fix, and a kdump kernel relocation errors fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/timer: Force PIT initialization when !X86_FEATURE_ARAT
x86/purgatory: Change compiler flags from -mcmodel=kernel to -mcmodel=large to fix kexec relocation errors
Dave Airlie [Thu, 12 Sep 2019 13:14:29 +0000 (23:14 +1000)]
Merge tag 'drm-misc-fixes-2019-09-12' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v5.3 final:
- Constify modes whitelist harder.
- Fix lima driver gem_wait ioctl.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/99e52e7a-d4ce-6a2c-0501-bc559a710955@linux.intel.com
Dave Airlie [Thu, 12 Sep 2019 13:11:36 +0000 (23:11 +1000)]
Merge tag 'drm-intel-fixes-2019-09-11' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
Final drm/i915 fixes for v5.3:
- Fox DP MST high color depth regression
- Fix GPU hangs on Vulkan compute workloads
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/877e6e27qm.fsf@intel.com
Eugene Syromiatnikov [Wed, 11 Sep 2019 17:45:40 +0000 (18:45 +0100)]
fork: block invalid exit signals with clone3()
Previously, higher 32 bits of exit_signal fields were lost when copied
to the kernel args structure (that uses int as a type for the respective
field). Moreover, as Oleg has noted, exit_signal is used unchecked, so
it has to be checked for sanity before use; for the legacy syscalls,
applying CSIGNAL mask guarantees that it is at least non-negative;
however, there's no such thing is done in clone3() code path, and that
can break at least thread_group_leader.
This commit adds a check to copy_clone_args_from_user() to verify that
the exit signal is limited by CSIGNAL as with legacy clone() and that
the signal is valid. With this we don't get the legacy clone behavior
were an invalid signal could be handed down and would only be detected
and ignored in do_notify_parent(). Users of clone3() will now get a
proper error when they pass an invalid exit signal. Note, that this is
not user-visible behavior since no kernel with clone3() has been
released yet.
The following program will cause a splat on a non-fixed clone3() version
and will fail correctly on a fixed version:
#define _GNU_SOURCE
#include <linux/sched.h>
#include <linux/types.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
pid_t pid = -1;
struct clone_args args = {0};
args.exit_signal = -1;
pid = syscall(__NR_clone3, &args, sizeof(struct clone_args));
if (pid < 0)
exit(EXIT_FAILURE);
if (pid == 0)
exit(EXIT_SUCCESS);
wait(NULL);
exit(EXIT_SUCCESS);
}
Fixes: 7f192e3cd316 ("fork: add clone3")
Reported-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Link: https://lore.kernel.org/r/4b38fa4ce420b119a4c6345f42fe3cec2de9b0b5.1568223594.git.esyr@redhat.com
[christian.brauner@ubuntu.com: simplify check and rework commit message]
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Thomas Huth [Thu, 12 Sep 2019 11:54:38 +0000 (13:54 +0200)]
KVM: s390: Do not leak kernel stack data in the KVM_S390_INTERRUPT ioctl
When the userspace program runs the KVM_S390_INTERRUPT ioctl to inject
an interrupt, we convert them from the legacy struct kvm_s390_interrupt
to the new struct kvm_s390_irq via the s390int_to_s390irq() function.
However, this function does not take care of all types of interrupts
that we can inject into the guest later (see do_inject_vcpu()). Since we
do not clear out the s390irq values before calling s390int_to_s390irq(),
there is a chance that we copy random data from the kernel stack which
could be leaked to the userspace later.
Specifically, the problem exists with the KVM_S390_INT_PFAULT_INIT
interrupt: s390int_to_s390irq() does not handle it, and the function
__inject_pfault_init() later copies irq->u.ext which contains the
random kernel stack data. This data can then be leaked either to
the guest memory in __deliver_pfault_init(), or the userspace might
retrieve it directly with the KVM_S390_GET_IRQ_STATE ioctl.
Fix it by handling that interrupt type in s390int_to_s390irq(), too,
and by making sure that the s390irq struct is properly pre-initialized.
And while we're at it, make sure that s390int_to_s390irq() now
directly returns -EINVAL for unknown interrupt types, so that we
immediately get a proper error code in case we add more interrupt
types to do_inject_vcpu() without updating s390int_to_s390irq()
sometime in the future.
Cc: stable@vger.kernel.org
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Link: https://lore.kernel.org/kvm/20190912115438.25761-1-thuth@redhat.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Christophe JAILLET [Wed, 11 Sep 2019 16:02:39 +0000 (18:02 +0200)]
sctp: Fix the link time qualifier of 'sctp_ctrlsock_exit()'
The '.exit' functions from 'pernet_operations' structure should be marked
as __net_exit, not __net_init.
Fixes: 8e2d61e0aed2 ("sctp: fix race on protocol/netns initialization")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert [Thu, 12 Sep 2019 11:01:44 +0000 (13:01 +0200)]
ixgbe: Fix secpath usage for IPsec TX offload.
The ixgbe driver currently does IPsec TX offloading
based on an existing secpath. However, the secpath
can also come from the RX side, in this case it is
misinterpreted for TX offload and the packets are
dropped with a "bad sa_idx" error. Fix this by using
the xfrm_offload() function to test for TX offload.
Fixes: 592594704761 ("ixgbe: process the Tx ipsec offload")
Reported-by: Michael Marley <michael@michaelmarley.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Filipe Manana [Wed, 11 Sep 2019 16:42:00 +0000 (17:42 +0100)]
Btrfs: fix unwritten extent buffers and hangs on future writeback attempts
The lock_extent_buffer_io() returns 1 to the caller to tell it everything
went fine and the callers needs to start writeback for the extent buffer
(submit a bio, etc), 0 to tell the caller everything went fine but it does
not need to start writeback for the extent buffer, and a negative value if
some error happened.
When it's about to return 1 it tries to lock all pages, and if a try lock
on a page fails, and we didn't flush any existing bio in our "epd", it
calls flush_write_bio(epd) and overwrites the return value of 1 to 0 or
an error. The page might have been locked elsewhere, not with the goal
of starting writeback of the extent buffer, and even by some code other
than btrfs, like page migration for example, so it does not mean the
writeback of the extent buffer was already started by some other task,
so returning a 0 tells the caller (btree_write_cache_pages()) to not
start writeback for the extent buffer. Note that epd might currently have
either no bio, so flush_write_bio() returns 0 (success) or it might have
a bio for another extent buffer with a lower index (logical address).
Since we return 0 with the EXTENT_BUFFER_WRITEBACK bit set on the
extent buffer and writeback is never started for the extent buffer,
future attempts to writeback the extent buffer will hang forever waiting
on that bit to be cleared, since it can only be cleared after writeback
completes. Such hang is reported with a trace like the following:
[49887.347053] INFO: task btrfs-transacti:1752 blocked for more than 122 seconds.
[49887.347059] Not tainted 5.2.13-gentoo #2
[49887.347060] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[49887.347062] btrfs-transacti D 0 1752 2 0x80004000
[49887.347064] Call Trace:
[49887.347069] ? __schedule+0x265/0x830
[49887.347071] ? bit_wait+0x50/0x50
[49887.347072] ? bit_wait+0x50/0x50
[49887.347074] schedule+0x24/0x90
[49887.347075] io_schedule+0x3c/0x60
[49887.347077] bit_wait_io+0x8/0x50
[49887.347079] __wait_on_bit+0x6c/0x80
[49887.347081] ? __lock_release.isra.29+0x155/0x2d0
[49887.347083] out_of_line_wait_on_bit+0x7b/0x80
[49887.347084] ? var_wake_function+0x20/0x20
[49887.347087] lock_extent_buffer_for_io+0x28c/0x390
[49887.347089] btree_write_cache_pages+0x18e/0x340
[49887.347091] do_writepages+0x29/0xb0
[49887.347093] ? kmem_cache_free+0x132/0x160
[49887.347095] ? convert_extent_bit+0x544/0x680
[49887.347097] filemap_fdatawrite_range+0x70/0x90
[49887.347099] btrfs_write_marked_extents+0x53/0x120
[49887.347100] btrfs_write_and_wait_transaction.isra.4+0x38/0xa0
[49887.347102] btrfs_commit_transaction+0x6bb/0x990
[49887.347103] ? start_transaction+0x33e/0x500
[49887.347105] transaction_kthread+0x139/0x15c
So fix this by not overwriting the return value (ret) with the result
from flush_write_bio(). We also need to clear the EXTENT_BUFFER_WRITEBACK
bit in case flush_write_bio() returns an error, otherwise it will hang
any future attempts to writeback the extent buffer, and undo all work
done before (set back EXTENT_BUFFER_DIRTY, etc).
This is a regression introduced in the 5.2 kernel.
Fixes: 2e3c25136adfb ("btrfs: extent_io: add proper error handling to lock_extent_buffer_for_io()")
Fixes: f4340622e0226 ("btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up")
Reported-by: Zdenek Sojka <zsojka@seznam.cz>
Link: https://lore.kernel.org/linux-btrfs/GpO.2yos.3WGDOLpx6t%7D.1TUDYM@seznam.cz/T/#u
Reported-by: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Link: https://lore.kernel.org/linux-btrfs/5c4688ac-10a7-fb07-70e8-c5d31a3fbb38@profihost.ag/T/#t
Reported-by: Drazen Kacar <drazen.kacar@oradian.com>
Link: https://lore.kernel.org/linux-btrfs/DB8PR03MB562876ECE2319B3E579590F799C80@DB8PR03MB5628.eurprd03.prod.outlook.com/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204377
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Tue, 10 Sep 2019 14:26:49 +0000 (15:26 +0100)]
Btrfs: fix assertion failure during fsync and use of stale transaction
Sometimes when fsync'ing a file we need to log that other inodes exist and
when we need to do that we acquire a reference on the inodes and then drop
that reference using iput() after logging them.
That generally is not a problem except if we end up doing the final iput()
(dropping the last reference) on the inode and that inode has a link count
of 0, which can happen in a very short time window if the logging path
gets a reference on the inode while it's being unlinked.
In that case we end up getting the eviction callback, btrfs_evict_inode(),
invoked through the iput() call chain which needs to drop all of the
inode's items from its subvolume btree, and in order to do that, it needs
to join a transaction at the helper function evict_refill_and_join().
However because the task previously started a transaction at the fsync
handler, btrfs_sync_file(), it has current->journal_info already pointing
to a transaction handle and therefore evict_refill_and_join() will get
that transaction handle from btrfs_join_transaction(). From this point on,
two different problems can happen:
1) evict_refill_and_join() will often change the transaction handle's
block reserve (->block_rsv) and set its ->bytes_reserved field to a
value greater than 0. If evict_refill_and_join() never commits the
transaction, the eviction handler ends up decreasing the reference
count (->use_count) of the transaction handle through the call to
btrfs_end_transaction(), and after that point we have a transaction
handle with a NULL ->block_rsv (which is the value prior to the
transaction join from evict_refill_and_join()) and a ->bytes_reserved
value greater than 0. If after the eviction/iput completes the inode
logging path hits an error or it decides that it must fallback to a
transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
non-zero value from btrfs_log_dentry_safe(), and because of that
non-zero value it tries to commit the transaction using a handle with
a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
the transaction commit hit an assertion failure at
btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
the ->block_rsv is NULL. The produced stack trace for that is like the
following:
[192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
[192922.917553] ------------[ cut here ]------------
[192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
[192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
[192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G W 5.1.4-btrfs-next-47 #1
[192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
(...)
[192922.920925] RSP: 0018:
ffffaebdc8a27da8 EFLAGS:
00010286
[192922.921315] RAX:
0000000000000051 RBX:
ffff95c9c16a41c0 RCX:
0000000000000000
[192922.921692] RDX:
0000000000000000 RSI:
ffff95cab6b16838 RDI:
ffff95cab6b16838
[192922.922066] RBP:
ffff95c9c16a41c0 R08:
0000000000000000 R09:
0000000000000000
[192922.922442] R10:
ffffaebdc8a27e70 R11:
0000000000000000 R12:
ffff95ca731a0980
[192922.922820] R13:
0000000000000000 R14:
ffff95ca84c73338 R15:
ffff95ca731a0ea8
[192922.923200] FS:
00007f337eda4e80(0000) GS:
ffff95cab6b00000(0000) knlGS:
0000000000000000
[192922.923579] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[192922.923948] CR2:
00007f337edad000 CR3:
00000001e00f6002 CR4:
00000000003606e0
[192922.924329] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[192922.924711] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[192922.925105] Call Trace:
[192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
[192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
[192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs]
[192922.926731] do_fsync+0x38/0x60
[192922.927138] __x64_sys_fdatasync+0x13/0x20
[192922.927543] do_syscall_64+0x60/0x1c0
[192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe
(...)
[192922.934077] ---[ end trace
f00808b12068168f ]---
2) If evict_refill_and_join() decides to commit the transaction, it will
be able to do it, since the nested transaction join only increments the
transaction handle's ->use_count reference counter and it does not
prevent the transaction from getting committed. This means that after
eviction completes, the fsync logging path will be using a transaction
handle that refers to an already committed transaction. What happens
when using such a stale transaction can be unpredictable, we are at
least having a use-after-free on the transaction handle itself, since
the transaction commit will call kmem_cache_free() against the handle
regardless of its ->use_count value, or we can end up silently losing
all the updates to the log tree after that iput() in the logging path,
or using a transaction handle that in the meanwhile was allocated to
another task for a new transaction, etc, pretty much unpredictable
what can happen.
In order to fix both of them, instead of using iput() during logging, use
btrfs_add_delayed_iput(), so that the logging path of fsync never drops
the last reference on an inode, that step is offloaded to a safe context
(usually the cleaner kthread).
The assertion failure issue was sporadically triggered by the test case
generic/475 from fstests, which loads the dm error target while fsstress
is running, which lead to fsync failing while logging inodes with -EIO
errors and then trying later to commit the transaction, triggering the
assertion failure.
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Igor Mammedov [Wed, 11 Sep 2019 07:52:18 +0000 (03:52 -0400)]
KVM: s390: kvm_s390_vm_start_migration: check dirty_bitmap before using it as target for memset()
If userspace doesn't set KVM_MEM_LOG_DIRTY_PAGES on memslot before calling
kvm_s390_vm_start_migration(), kernel will oops with:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address:
0000000000000000 TEID:
0000000000000483
Fault in home space mode while using kernel ASCE.
AS:
0000000002a2000b R2:
00000001bff8c00b R3:
00000001bff88007 S:
00000001bff91000 P:
000000000000003d
Oops: 0004 ilc:2 [#1] SMP
...
Call Trace:
([<
001fffff804ec552>] kvm_s390_vm_set_attr+0x347a/0x3828 [kvm])
[<
001fffff804ecfc0>] kvm_arch_vm_ioctl+0x6c0/0x1998 [kvm]
[<
001fffff804b67e4>] kvm_vm_ioctl+0x51c/0x11a8 [kvm]
[<
00000000008ba572>] do_vfs_ioctl+0x1d2/0xe58
[<
00000000008bb284>] ksys_ioctl+0x8c/0xb8
[<
00000000008bb2e2>] sys_ioctl+0x32/0x40
[<
000000000175552c>] system_call+0x2b8/0x2d8
INFO: lockdep is turned off.
Last Breaking-Event-Address:
[<
0000000000dbaf60>] __memset+0xc/0xa0
due to ms->dirty_bitmap being NULL, which might crash the host.
Make sure that ms->dirty_bitmap is set before using it or
return -EINVAL otherwise.
Cc: <stable@vger.kernel.org>
Fixes: afdad61615cc ("KVM: s390: Fix storage attributes migration with memory slots")
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Link: https://lore.kernel.org/kvm/20190911075218.29153-1-imammedo@redhat.com/
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Navid Emamdoost [Wed, 11 Sep 2019 15:09:02 +0000 (10:09 -0500)]
net: qrtr: fix memort leak in qrtr_tun_write_iter
In qrtr_tun_write_iter the allocated kbuf should be release in case of
error or success return.
v2 Update: Thanks to David Miller for pointing out the release on success
path as well.
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subash Abhinov Kasiviswanathan [Tue, 10 Sep 2019 20:02:57 +0000 (14:02 -0600)]
net: Fix null de-reference of device refcount
In event of failure during register_netdevice, free_netdev is
invoked immediately. free_netdev assumes that all the netdevice
refcounts have been dropped prior to it being called and as a
result frees and clears out the refcount pointer.
However, this is not necessarily true as some of the operations
in the NETDEV_UNREGISTER notifier handlers queue RCU callbacks for
invocation after a grace period. The IPv4 callback in_dev_rcu_put
tries to access the refcount after free_netdev is called which
leads to a null de-reference-
44837.761523: <6> Unable to handle kernel paging request at
virtual address
0000004a88287000
44837.761651: <2> pc : in_dev_finish_destroy+0x4c/0xc8
44837.761654: <2> lr : in_dev_finish_destroy+0x2c/0xc8
44837.762393: <2> Call trace:
44837.762398: <2> in_dev_finish_destroy+0x4c/0xc8
44837.762404: <2> in_dev_rcu_put+0x24/0x30
44837.762412: <2> rcu_nocb_kthread+0x43c/0x468
44837.762418: <2> kthread+0x118/0x128
44837.762424: <2> ret_from_fork+0x10/0x1c
Fix this by waiting for the completion of the call_rcu() in
case of register_netdevice errors.
Fixes: 93ee31f14f6f ("[NET]: Fix free_netdev on register_netdev failure.")
Cc: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christophe JAILLET [Tue, 10 Sep 2019 11:29:59 +0000 (13:29 +0200)]
ipv6: Fix the link time qualifier of 'ping_v6_proc_exit_net()'
The '.exit' functions from 'pernet_operations' structure should be marked
as __net_exit, not __net_init.
Fixes: d862e5461423 ("net: ipv6: Implement /proc/net/icmp6.")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Yingliang [Tue, 10 Sep 2019 10:56:57 +0000 (18:56 +0800)]
tun: fix use-after-free when register netdev failed
I got a UAF repport in tun driver when doing fuzzy test:
[ 466.269490] ==================================================================
[ 466.271792] BUG: KASAN: use-after-free in tun_chr_read_iter+0x2ca/0x2d0
[ 466.271806] Read of size 8 at addr
ffff888372139250 by task tun-test/2699
[ 466.271810]
[ 466.271824] CPU: 1 PID: 2699 Comm: tun-test Not tainted
5.3.0-rc1-00001-g5a9433db2614-dirty #427
[ 466.271833] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 466.271838] Call Trace:
[ 466.271858] dump_stack+0xca/0x13e
[ 466.271871] ? tun_chr_read_iter+0x2ca/0x2d0
[ 466.271890] print_address_description+0x79/0x440
[ 466.271906] ? vprintk_func+0x5e/0xf0
[ 466.271920] ? tun_chr_read_iter+0x2ca/0x2d0
[ 466.271935] __kasan_report+0x15c/0x1df
[ 466.271958] ? tun_chr_read_iter+0x2ca/0x2d0
[ 466.271976] kasan_report+0xe/0x20
[ 466.271987] tun_chr_read_iter+0x2ca/0x2d0
[ 466.272013] do_iter_readv_writev+0x4b7/0x740
[ 466.272032] ? default_llseek+0x2d0/0x2d0
[ 466.272072] do_iter_read+0x1c5/0x5e0
[ 466.272110] vfs_readv+0x108/0x180
[ 466.299007] ? compat_rw_copy_check_uvector+0x440/0x440
[ 466.299020] ? fsnotify+0x888/0xd50
[ 466.299040] ? __fsnotify_parent+0xd0/0x350
[ 466.299064] ? fsnotify_first_mark+0x1e0/0x1e0
[ 466.304548] ? vfs_write+0x264/0x510
[ 466.304569] ? ksys_write+0x101/0x210
[ 466.304591] ? do_preadv+0x116/0x1a0
[ 466.304609] do_preadv+0x116/0x1a0
[ 466.309829] do_syscall_64+0xc8/0x600
[ 466.309849] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 466.309861] RIP: 0033:0x4560f9
[ 466.309875] Code: 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
[ 466.309889] RSP: 002b:
00007ffffa5166e8 EFLAGS:
00000206 ORIG_RAX:
0000000000000127
[ 466.322992] RAX:
ffffffffffffffda RBX:
0000000000400460 RCX:
00000000004560f9
[ 466.322999] RDX:
0000000000000003 RSI:
00000000200008c0 RDI:
0000000000000003
[ 466.323007] RBP:
00007ffffa516700 R08:
0000000000000004 R09:
0000000000000000
[ 466.323014] R10:
0000000000000000 R11:
0000000000000206 R12:
000000000040cb10
[ 466.323021] R13:
0000000000000000 R14:
00000000006d7018 R15:
0000000000000000
[ 466.323057]
[ 466.323064] Allocated by task 2605:
[ 466.335165] save_stack+0x19/0x80
[ 466.336240] __kasan_kmalloc.constprop.8+0xa0/0xd0
[ 466.337755] kmem_cache_alloc+0xe8/0x320
[ 466.339050] getname_flags+0xca/0x560
[ 466.340229] user_path_at_empty+0x2c/0x50
[ 466.341508] vfs_statx+0xe6/0x190
[ 466.342619] __do_sys_newstat+0x81/0x100
[ 466.343908] do_syscall_64+0xc8/0x600
[ 466.345303] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 466.347034]
[ 466.347517] Freed by task 2605:
[ 466.348471] save_stack+0x19/0x80
[ 466.349476] __kasan_slab_free+0x12e/0x180
[ 466.350726] kmem_cache_free+0xc8/0x430
[ 466.351874] putname+0xe2/0x120
[ 466.352921] filename_lookup+0x257/0x3e0
[ 466.354319] vfs_statx+0xe6/0x190
[ 466.355498] __do_sys_newstat+0x81/0x100
[ 466.356889] do_syscall_64+0xc8/0x600
[ 466.358037] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 466.359567]
[ 466.360050] The buggy address belongs to the object at
ffff888372139100
[ 466.360050] which belongs to the cache names_cache of size 4096
[ 466.363735] The buggy address is located 336 bytes inside of
[ 466.363735] 4096-byte region [
ffff888372139100,
ffff88837213a100)
[ 466.367179] The buggy address belongs to the page:
[ 466.368604] page:
ffffea000dc84e00 refcount:1 mapcount:0 mapping:
ffff8883df1b4f00 index:0x0 compound_mapcount: 0
[ 466.371582] flags: 0x2fffff80010200(slab|head)
[ 466.372910] raw:
002fffff80010200 dead000000000100 dead000000000122 ffff8883df1b4f00
[ 466.375209] raw:
0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 466.377778] page dumped because: kasan: bad access detected
[ 466.379730]
[ 466.380288] Memory state around the buggy address:
[ 466.381844]
ffff888372139100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 466.384009]
ffff888372139180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 466.386131] >
ffff888372139200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 466.388257] ^
[ 466.390234]
ffff888372139280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 466.392512]
ffff888372139300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 466.394667] ==================================================================
tun_chr_read_iter() accessed the memory which freed by free_netdev()
called by tun_set_iff():
CPUA CPUB
tun_set_iff()
alloc_netdev_mqs()
tun_attach()
tun_chr_read_iter()
tun_get()
tun_do_read()
tun_ring_recv()
register_netdevice() <-- inject error
goto err_detach
tun_detach_all() <-- set RCV_SHUTDOWN
free_netdev() <-- called from
err_free_dev path
netdev_freemem() <-- free the memory
without check refcount
(In this path, the refcount cannot prevent
freeing the memory of dev, and the memory
will be used by dev_put() called by
tun_chr_read_iter() on CPUB.)
(Break from tun_ring_recv(),
because RCV_SHUTDOWN is set)
tun_put()
dev_put() <-- use the memory
freed by netdev_freemem()
Put the publishing of tfile->tun after register_netdevice(),
so tun_get() won't get the tun pointer that freed by
err_detach path if register_netdevice() failed.
Fixes: eb0fb363f920 ("tuntap: attach queue 0 before registering netdevice")
Reported-by: Hulk Robot <hulkci@huawei.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 12 Sep 2019 10:07:31 +0000 (11:07 +0100)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
"Last minute bugfixes.
A couple of security things.
And an error handling bugfix that is never encountered by most people,
but that also makes it kind of safe to push at the last minute, and it
helps push the fix to stable a bit sooner"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost: make sure log_num < in_num
vhost: block speculation of translated descriptors
virtio_ring: fix unmap of indirect descriptors
Linus Torvalds [Thu, 12 Sep 2019 10:04:50 +0000 (11:04 +0100)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fix from Ingo Molnar:
"Fix an initialization bug in the hw-breakpoints, which triggered on
the ARM platform"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/hw_breakpoint: Fix arch_hw_breakpoint use-before-initialization
Linus Torvalds [Thu, 12 Sep 2019 10:02:00 +0000 (11:02 +0100)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fix from Ingo Molnar:
"Fix a race in the IRQ resend mechanism, which can result in a NULL
dereference crash"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Prevent NULL pointer dereference in resend_irqs()
Linus Torvalds [Thu, 12 Sep 2019 09:58:47 +0000 (10:58 +0100)]
Merge tag 'pinctrl-v5.3-3' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pin control fix from Linus Walleij:
"Hopefully last pin control fix: a single patch for some Aspeed
problems. The BMCs are much happier now"
* tag 'pinctrl-v5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: aspeed: Fix spurious mux failures on the AST2500
Linus Torvalds [Thu, 12 Sep 2019 08:53:38 +0000 (09:53 +0100)]
Merge tag 'gpio-v5.3-6' of git://git./linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"I don't really like to send so many fixes at the very last minute, but
the bug-sport activity is unpredictable.
Four fixes, three are -stable material that will go everywhere, one is
for the current cycle:
- An ACPI DSDT error fixup of the type we always see and Hans
invariably gets to fix.
- A OF quirk fix for the current release (v5.3)
- Some consistency checks on the userspace ABI.
- A memory leak"
* tag 'gpio-v5.3-6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpiolib: acpi: Add gpiolib_acpi_run_edge_events_on_boot option and blacklist
gpiolib: of: fix fallback quirks handling
gpio: fix line flag validation in lineevent_create
gpio: fix line flag validation in linehandle_create
gpio: mockup: add missing single_release()
Andrew Jeffery [Thu, 29 Aug 2019 07:17:38 +0000 (16:47 +0930)]
pinctrl: aspeed: Fix spurious mux failures on the AST2500
Commit
674fa8daa8c9 ("pinctrl: aspeed-g5: Delay acquisition of regmaps")
was determined to be a partial fix to the problem of acquiring the LPC
Host Controller and GFX regmaps: The AST2500 pin controller may need to
fetch syscon regmaps during expression evaluation as well as when
setting mux state. For example, this case is hit by attempting to export
pins exposing the LPC Host Controller as GPIOs.
An optional eval() hook is added to the Aspeed pinmux operation struct
and called from aspeed_sig_expr_eval() if the pointer is set by the
SoC-specific driver. This enables the AST2500 to perform the custom
action of acquiring its regmap dependencies as required.
John Wang tested the fix on an Inspur FP5280G2 machine (AST2500-based)
where the issue was found, and I've booted the fix on Witherspoon
(AST2500) and Palmetto (AST2400) machines, and poked at relevant pins
under QEMU by forcing mux configurations via devmem before exporting
GPIOs to exercise the driver.
Fixes: 7d29ed88acbb ("pinctrl: aspeed: Read and write bits in LPC and GFX controllers")
Fixes: 674fa8daa8c9 ("pinctrl: aspeed-g5: Delay acquisition of regmaps")
Reported-by: John Wang <wangzqbj@inspur.com>
Tested-by: John Wang <wangzqbj@inspur.com>
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Link: https://lore.kernel.org/r/20190829071738.2523-1-andrew@aj.id.au
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
David S. Miller [Wed, 11 Sep 2019 23:05:52 +0000 (00:05 +0100)]
Merge branch '10GbE' of git://git./linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2019-09-11
This series contains fixes to ixgbe.
Alex fixes up the adaptive ITR scheme for ixgbe which could result in a
value that was either 0 or something less than 10 which was causing
issues with hardware features, like RSC, that do not function well with
ITR values that low.
Ilya Maximets fixes the ixgbe driver to limit the number of transmit
descriptors to clean by the number of transmit descriptors used in the
transmit ring, so that the driver does not try to "double" clean the
same descriptors.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Neal Cardwell [Mon, 9 Sep 2019 20:56:02 +0000 (16:56 -0400)]
tcp: fix tcp_ecn_withdraw_cwr() to clear TCP_ECN_QUEUE_CWR
Fix tcp_ecn_withdraw_cwr() to clear the correct bit:
TCP_ECN_QUEUE_CWR.
Rationale: basically, TCP_ECN_DEMAND_CWR is a bit that is purely about
the behavior of data receivers, and deciding whether to reflect
incoming IP ECN CE marks as outgoing TCP th->ece marks. The
TCP_ECN_QUEUE_CWR bit is purely about the behavior of data senders,
and deciding whether to send CWR. The tcp_ecn_withdraw_cwr() function
is only called from tcp_undo_cwnd_reduction() by data senders during
an undo, so it should zero the sender-side state,
TCP_ECN_QUEUE_CWR. It does not make sense to stop the reflection of
incoming CE bits on incoming data packets just because outgoing
packets were spuriously retransmitted.
The bug has been reproduced with packetdrill to manifest in a scenario
with RFC3168 ECN, with an incoming data packet with CE bit set and
carrying a TCP timestamp value that causes cwnd undo. Before this fix,
the IP CE bit was ignored and not reflected in the TCP ECE header bit,
and sender sent a TCP CWR ('W') bit on the next outgoing data packet,
even though the cwnd reduction had been undone. After this fix, the
sender properly reflects the CE bit and does not set the W bit.
Note: the bug actually predates 2005 git history; this Fixes footer is
chosen to be the oldest SHA1 I have tested (from Sep 2007) for which
the patch applies cleanly (since before this commit the code was in a
.h file).
Fixes: bdf1ee5d3bd3 ("[TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
yongduan [Wed, 11 Sep 2019 09:44:24 +0000 (17:44 +0800)]
vhost: make sure log_num < in_num
The code assumes log_num < in_num everywhere, and that is true as long as
in_num is incremented by descriptor iov count, and log_num by 1. However
this breaks if there's a zero sized descriptor.
As a result, if a malicious guest creates a vring desc with desc.len = 0,
it may cause the host kernel to crash by overflowing the log array. This
bug can be triggered during the VM migration.
There's no need to log when desc.len = 0, so just don't increment log_num
in this case.
Fixes: 3a4d5c94e959 ("vhost_net: a kernel-level virtio server")
Cc: stable@vger.kernel.org
Reviewed-by: Lidong Chen <lidongchen@tencent.com>
Signed-off-by: ruippan <ruippan@tencent.com>
Signed-off-by: yongduan <yongduan@tencent.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Michael S. Tsirkin [Sun, 8 Sep 2019 11:04:08 +0000 (07:04 -0400)]
vhost: block speculation of translated descriptors
iovec addresses coming from vhost are assumed to be
pre-validated, but in fact can be speculated to a value
out of range.
Userspace address are later validated with array_index_nospec so we can
be sure kernel info does not leak through these addresses, but vhost
must also not leak userspace info outside the allowed memory table to
guests.
Following the defence in depth principle, make sure
the address is not validated out of node range.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: stable@vger.kernel.org
Acked-by: Jason Wang <jasowang@redhat.com>
Tested-by: Jason Wang <jasowang@redhat.com>
Ilya Maximets [Thu, 22 Aug 2019 17:12:37 +0000 (20:12 +0300)]
ixgbe: fix double clean of Tx descriptors with xdp
Tx code doesn't clear the descriptors' status after cleaning.
So, if the budget is larger than number of used elems in a ring, some
descriptors will be accounted twice and xsk_umem_complete_tx will move
prod_tail far beyond the prod_head breaking the completion queue ring.
Fix that by limiting the number of descriptors to clean by the number
of used descriptors in the Tx ring.
'ixgbe_clean_xdp_tx_irq()' function refactored to look more like
'ixgbe_xsk_clean_tx_ring()' since we're allowed to directly use
'next_to_clean' and 'next_to_use' indexes.
CC: stable@vger.kernel.org
Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support")
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Tested-by: William Tu <u9012063@gmail.com>
Tested-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Wed, 4 Sep 2019 15:07:11 +0000 (08:07 -0700)]
ixgbe: Prevent u8 wrapping of ITR value to something less than 10us
There were a couple cases where the ITR value generated via the adaptive
ITR scheme could exceed 126. This resulted in the value becoming either 0
or something less than 10. Switching back and forth between a value less
than 10 and a value greater than 10 can cause issues as certain hardware
features such as RSC to not function well when the ITR value has dropped
that low.
CC: stable@vger.kernel.org
Fixes: b4ded8327fea ("ixgbe: Update adaptive ITR algorithm")
Reported-by: Gregg Leventhal <gleventhal@janestreet.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Colin Ian King [Wed, 11 Sep 2019 14:18:11 +0000 (15:18 +0100)]
mlx4: fix spelling mistake "veify" -> "verify"
There is a spelling mistake in a mlx4_err error message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Wed, 11 Sep 2019 14:08:16 +0000 (15:08 +0100)]
net: hns3: fix spelling mistake "undeflow" -> "underflow"
There is a spelling mistake in a .msg literal string. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Wed, 11 Sep 2019 11:37:34 +0000 (12:37 +0100)]
net: lmc: fix spelling mistake "runnin" -> "running"
There is a spelling mistake in the lmc_trace message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Wed, 11 Sep 2019 10:38:48 +0000 (11:38 +0100)]
NFC: st95hf: fix spelling mistake "receieve" -> "receive"
There is a spelling mistake in a dev_err message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ka-Cheong Poon [Wed, 11 Sep 2019 09:58:05 +0000 (02:58 -0700)]
net/rds: An rds_sock is added too early to the hash table
In rds_bind(), an rds_sock is added to the RDS bind hash table before
rs_transport is set. This means that the socket can be found by the
receive code path when rs_transport is NULL. And the receive code
path de-references rs_transport for congestion update check. This can
cause a panic. An rds_sock should not be added to the bind hash table
before all the needed fields are set.
Reported-by: syzbot+4b4f8163c2e246df3c4c@syzkaller.appspotmail.com
Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jouni Malinen [Wed, 11 Sep 2019 13:03:05 +0000 (16:03 +0300)]
mac80211: Do not send Layer 2 Update frame before authorization
The Layer 2 Update frame is used to update bridges when a station roams
to another AP even if that STA does not transmit any frames after the
reassociation. This behavior was described in IEEE Std 802.11F-2003 as
something that would happen based on MLME-ASSOCIATE.indication, i.e.,
before completing 4-way handshake. However, this IEEE trial-use
recommended practice document was published before RSN (IEEE Std
802.11i-2004) and as such, did not consider RSN use cases. Furthermore,
IEEE Std 802.11F-2003 was withdrawn in 2006 and as such, has not been
maintained amd should not be used anymore.
Sending out the Layer 2 Update frame immediately after association is
fine for open networks (and also when using SAE, FT protocol, or FILS
authentication when the station is actually authenticated by the time
association completes). However, it is not appropriate for cases where
RSN is used with PSK or EAP authentication since the station is actually
fully authenticated only once the 4-way handshake completes after
authentication and attackers might be able to use the unauthenticated
triggering of Layer 2 Update frame transmission to disrupt bridge
behavior.
Fix this by postponing transmission of the Layer 2 Update frame from
station entry addition to the point when the station entry is marked
authorized. Similarly, send out the VLAN binding update only if the STA
entry has already been authorized.
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Drake [Thu, 5 Sep 2019 04:55:57 +0000 (12:55 +0800)]
Revert "mmc: sdhci: Remove unneeded quirk2 flag of O2 SD host controller"
This reverts commit
414126f9e5abf1973c661d24229543a9458fa8ce.
This commit broke eMMC storage access on a new consumer MiniPC based on
AMD SoC, which has eMMC connected to:
02:00.0 SD Host controller: O2 Micro, Inc. Device 8620 (rev 01) (prog-if 01)
Subsystem: O2 Micro, Inc. Device 0002
During probe, several errors are seen including:
mmc1: Got data interrupt 0x02000000 even though no data operation was in progress.
mmc1: Timeout waiting for hardware interrupt.
mmc1: error -110 whilst initialising MMC card
Reverting this commit allows the eMMC storage to be detected & usable
again.
Signed-off-by: Daniel Drake <drake@endlessm.com>
Fixes: 414126f9e5ab ("mmc: sdhci: Remove unneeded quirk2 flag of O2 SD host
controller")
Cc: stable@vger.kernel.org # v5.1+
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Stefan Wahren [Sun, 8 Sep 2019 07:45:52 +0000 (09:45 +0200)]
Revert "mmc: bcm2835: Terminate timeout work synchronously"
The commit
37fefadee8bb ("mmc: bcm2835: Terminate timeout work
synchronously") causes lockups in case of hardware timeouts due the
timeout work also calling cancel_delayed_work_sync() on its own.
So revert it.
Fixes: 37fefadee8bb ("mmc: bcm2835: Terminate timeout work synchronously")
Cc: stable@vger.kernel.org
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Hans de Goede [Tue, 27 Aug 2019 20:28:35 +0000 (22:28 +0200)]
gpiolib: acpi: Add gpiolib_acpi_run_edge_events_on_boot option and blacklist
Another day; another DSDT bug we need to workaround...
Since commit
ca876c7483b6 ("gpiolib-acpi: make sure we trigger edge events
at least once on boot") we call _AEI edge handlers at boot.
In some rare cases this causes problems. One example of this is the Minix
Neo Z83-4 mini PC, this device has a clear DSDT bug where it has some copy
and pasted code for dealing with Micro USB-B connector host/device role
switching, while the mini PC does not even have a micro-USB connector.
This code, which should not be there, messes with the DDC data pin from
the HDMI connector (switching it to GPIO mode) breaking HDMI support.
To avoid problems like this, this commit adds a new
gpiolib_acpi.run_edge_events_on_boot kernel commandline option, which
allows disabling the running of _AEI edge event handlers at boot.
The default value is -1/auto which uses a DMI based blacklist, the initial
version of this blacklist contains the Neo Z83-4 fixing the HDMI breakage.
Cc: stable@vger.kernel.org
Cc: Daniel Drake <drake@endlessm.com>
Cc: Ian W MORRISON <ianwmorrison@gmail.com>
Reported-by: Ian W MORRISON <ianwmorrison@gmail.com>
Suggested-by: Ian W MORRISON <ianwmorrison@gmail.com>
Fixes: ca876c7483b6 ("gpiolib-acpi: make sure we trigger edge events at least once on boot")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20190827202835.213456-1-hdegoede@redhat.com
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Ian W MORRISON <ianwmorrison@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Randy Dunlap [Mon, 9 Sep 2019 21:54:21 +0000 (14:54 -0700)]
lib/Kconfig: fix OBJAGG in lib/ menu structure
Keep the "Library routines" menu intact by moving OBJAGG into it.
Otherwise OBJAGG is displayed/presented as an orphan in the
various config menus.
Fixes: 0a020d416d0a ("lib: introduce initial implementation of object aggregation manager")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Ido Schimmel <idosch@mellanox.com>
Cc: David S. Miller <davem@davemloft.net>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mao Wenan [Wed, 11 Sep 2019 01:36:23 +0000 (09:36 +0800)]
net: sonic: replace dev_kfree_skb in sonic_send_packet
sonic_send_packet will be processed in irq or non-irq
context, so it would better use dev_kfree_skb_any
instead of dev_kfree_skb.
Fixes: d9fb9f384292 ("*sonic/natsemi/ns83829: Move the National Semi-conductor drivers")
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Navid Emamdoost [Tue, 10 Sep 2019 23:01:40 +0000 (18:01 -0500)]
wimax: i2400: fix memory leak
In i2400m_op_rfkill_sw_toggle cmd buffer should be released along with
skb response.
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Mon, 9 Sep 2019 07:33:29 +0000 (15:33 +0800)]
sctp: fix the missing put_user when dumping transport thresholds
This issue causes SCTP_PEER_ADDR_THLDS sockopt not to be able to dump
a transport thresholds info.
Fix it by adding 'goto' put_user in sctp_getsockopt_paddr_thresholds.
Fixes: 8add543e369d ("sctp: add SCTP_FUTURE_ASSOC for SCTP_PEER_ADDR_THLDS sockopt")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>