Eric Biggers [Thu, 12 Apr 2018 15:48:09 +0000 (11:48 -0400)]
ext4: prevent right-shifting extents beyond EXT_MAX_BLOCKS
During the "insert range" fallocate operation, extents starting at the
range offset are shifted "right" (to a higher file offset) by the range
length. But, as shown by syzbot, it's not validated that this doesn't
cause extents to be shifted beyond EXT_MAX_BLOCKS. In that case
->ee_block can wrap around, corrupting the extent tree.
Fix it by returning an error if the space between the end of the last
extent and EXT4_MAX_BLOCKS is smaller than the range being inserted.
This bug can be reproduced by running the following commands when the
current directory is on an ext4 filesystem with a 4k block size:
fallocate -l 8192 file
fallocate --keep-size -o 0xfffffffe000 -l 4096 -n file
fallocate --insert-range -l 8192 file
Then after unmounting the filesystem, e2fsck reports corruption.
Reported-by: syzbot+06c885be0edcdaeab40c@syzkaller.appspotmail.com
Fixes: 331573febb6a ("ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate")
Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Mon, 2 Apr 2018 03:21:03 +0000 (23:21 -0400)]
ext4: force revalidation of directory pointer after seekdir(2)
A malicious user could force the directory pointer to be in an invalid
spot by using seekdir(2). Use the mechanism we already have to notice
if the directory has changed since the last time we called
ext4_readdir() to force a revalidation of the pointer.
Reported-by: syzbot+1236ce66f79263e8a862@syzkaller.appspotmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Theodore Ts'o [Sat, 31 Mar 2018 00:04:11 +0000 (20:04 -0400)]
ext4: add extra checks to ext4_xattr_block_get()
Add explicit checks in ext4_xattr_block_get() just in case the
e_value_offs and e_value_size fields in the the xattr block are
corrupted in memory after the buffer_verified bit is set on the xattr
block.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Theodore Ts'o [Sat, 31 Mar 2018 00:00:56 +0000 (20:00 -0400)]
ext4: add bounds checking to ext4_xattr_find_entry()
Add some paranoia checks to make sure we don't stray beyond the end of
the valid memory region containing ext4 xattr entries while we are
scanning for a match.
Also rename the function to xattr_find_entry() since it is static and
thus only used in fs/ext4/xattr.c
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Theodore Ts'o [Fri, 30 Mar 2018 19:42:25 +0000 (15:42 -0400)]
ext4: move call to ext4_error() into ext4_xattr_check_block()
Refactor the call to EXT4_ERROR_INODE() into ext4_xattr_check_block().
This simplifies the code, and fixes a problem where not all callers of
ext4_xattr_check_block() were not resulting in ext4_error() getting
called when the xattr block is corrupted.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Tyson Nottingham [Fri, 30 Mar 2018 04:56:10 +0000 (00:56 -0400)]
ext4: don't show data=<mode> option if defaulted
Previously, mount -l would show data=<mode> even if the ext4 default
journaling mode was being used. Change this to be consistent with the
rest of the options.
Ext4 already did the right thing when the journaling mode being used
matched the one specified in the superblock's default mount options. The
reason it failed to do the right thing for the ext4 defaults is that,
when set, they were never included in sbi->s_def_mount_opt (unlike the
superblock's defaults, which were).
Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Tyson Nottingham [Fri, 30 Mar 2018 04:53:33 +0000 (00:53 -0400)]
ext4: omit init_itable=n in procfs when disabled
Don't show init_itable=n in /proc/fs/ext4/<dev>/options when filesystem
is mounted with noinit_itable.
Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Tyson Nottingham [Fri, 30 Mar 2018 04:51:10 +0000 (00:51 -0400)]
ext4: show more binary mount options in procfs
Previously, /proc/fs/ext4/<dev>/options would only show binary options
if they were set (1 in the options bit mask). E.g. it would show "grpid"
if it was set, but it would not show "nogrpid" if grpid was not set.
This seems sensible, but when an option is absent from the file, it can
be hard for the unfamiliar to know what is being used. E.g. if there
isn't a (no)grpid entry, nogrpid is in effect. But if there isn't a
(no)auto_da_alloc entry, auto_da_alloc is in effect. If there isn't a
(minixdf|bsddf) entry, it turns out bsddf is in effect. It all depends
on how the option is implemented.
It's clearer to be explicit, so print the corresponding option
regardless of whether it means a 1 or a 0 in the bit mask.
Note that options which do not have an explicit disable option aren't
indicated as being disabled even with this change (e.g. dax).
Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Tyson Nottingham [Fri, 30 Mar 2018 04:41:34 +0000 (00:41 -0400)]
ext4: simplify kobject usage
Replace kset with generic kobject provided by kobject_create_and_add(),
since the latter is sufficient.
Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Tyson Nottingham [Fri, 30 Mar 2018 04:13:10 +0000 (00:13 -0400)]
ext4: remove unused parameters in sysfs code
Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Tyson Nottingham [Fri, 30 Mar 2018 04:03:38 +0000 (00:03 -0400)]
ext4: null out kobject* during sysfs cleanup
Make cleanup of ext4_feat kobject consistent with similar objects.
Signed-off-by: Tyson Nottingham <tgnottingham@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Fri, 30 Mar 2018 02:10:35 +0000 (22:10 -0400)]
ext4: don't allow r/w mounts if metadata blocks overlap the superblock
If some metadata block, such as an allocation bitmap, overlaps the
superblock, it's very likely that if the file system is mounted
read/write, the results will not be pretty. So disallow r/w mounts
for file systems corrupted in this particular way.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Theodore Ts'o [Fri, 30 Mar 2018 02:10:31 +0000 (22:10 -0400)]
ext4: always initialize the crc32c checksum driver
The extended attribute code now uses the crc32c checksum for hashing
purposes, so we should just always always initialize it. We also want
to prevent NULL pointer dereferences if one of the metadata checksum
features is enabled after the file sytsem is originally mounted.
This issue has been assigned CVE-2018-1094.
https://bugzilla.kernel.org/show_bug.cgi?id=199183
https://bugzilla.redhat.com/show_bug.cgi?id=
1560788
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Theodore Ts'o [Fri, 30 Mar 2018 01:56:09 +0000 (21:56 -0400)]
ext4: fail ext4_iget for root directory if unallocated
If the root directory has an i_links_count of zero, then when the file
system is mounted, then when ext4_fill_super() notices the problem and
tries to call iput() the root directory in the error return path,
ext4_evict_inode() will try to free the inode on disk, before all of
the file system structures are set up, and this will result in an OOPS
caused by a NULL pointer dereference.
This issue has been assigned CVE-2018-1092.
https://bugzilla.kernel.org/show_bug.cgi?id=199179
https://bugzilla.redhat.com/show_bug.cgi?id=
1560777
Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Eric Biggers [Thu, 29 Mar 2018 18:31:42 +0000 (14:31 -0400)]
ext4: limit xattr size to INT_MAX
ext4 isn't validating the sizes of xattrs where the value of the xattr
is stored in an external inode. This is problematic because
->e_value_size is a u32, but ext4_xattr_get() returns an int. A very
large size is misinterpreted as an error code, which ext4_get_acl()
translates into a bogus ERR_PTR() for which IS_ERR() returns false,
causing a crash.
Fix this by validating that all xattrs are <= INT_MAX bytes.
This issue has been assigned CVE-2018-1095.
https://bugzilla.kernel.org/show_bug.cgi?id=199185
https://bugzilla.redhat.com/show_bug.cgi?id=
1560793
Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Fixes: e50e5129f384 ("ext4: xattr-in-inode support")
Theodore Ts'o [Tue, 27 Mar 2018 03:54:10 +0000 (23:54 -0400)]
ext4: add validity checks for bitmap block numbers
An privileged attacker can cause a crash by mounting a crafted ext4
image which triggers a out-of-bounds read in the function
ext4_valid_block_bitmap() in fs/ext4/balloc.c.
This issue has been assigned CVE-2018-1093.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=199181
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1560782
Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
zhenwei.pi [Mon, 26 Mar 2018 05:44:03 +0000 (01:44 -0400)]
ext4: fix comments in ext4_swap_extents()
"mark_unwritten" in comment and "unwritten" in the function arguments
is mismatched.
Signed-off-by: zhenwei.pi <zhenwei.pi@youruncloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Goldwyn Rodrigues [Mon, 26 Mar 2018 05:32:50 +0000 (01:32 -0400)]
ext4: use generic_writepages instead of __writepage/write_cache_pages
Code cleanup. Instead of writing an internal static function, use the
available generic_writepages().
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Eric Sandeen [Thu, 22 Mar 2018 15:59:00 +0000 (11:59 -0400)]
ext4: don't complain about incorrect features when probing
If mount is auto-probing for filesystem type, it will try various
filesystems in order, with the MS_SILENT flag set. We get
that flag as the silent arg to ext4_fill_super.
If we're probing (silent==1) then don't complain about feature
incompatibilities that are found if it looks like it's actually
a different valid extN type - failed probes should be silent
in this case.
If the on-disk features are unknown even to ext4, then complain.
Reported-by: Joakim Tjernlund <Joakim.Tjernlund@infinera.com>
Tested-by: Joakim Tjernlund <Joakim.Tjernlund@infinera.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Nikolay Borisov [Thu, 22 Mar 2018 15:52:10 +0000 (11:52 -0400)]
ext4: remove EXT4_STATE_DIOREAD_LOCK flag
Commit
16c54688592c ("ext4: Allow parallel DIO reads") reworked the way
locking happens around parallel dio reads. This resulted in obviating
the need for EXT4_STATE_DIOREAD_LOCK flag and accompanying logic.
Currently this amounts to dead code so let's remove it. No functional
changes
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Jiri Slaby [Thu, 22 Mar 2018 15:50:26 +0000 (11:50 -0400)]
ext4: fix offset overflow on 32-bit archs in ext4_iomap_begin()
ext4_iomap_begin() has a bug where offset returned in the iomap
structure will be truncated to unsigned long size. On 64-bit
architectures this is fine but on 32-bit architectures obviously not.
Not many places actually use the offset stored in the iomap structure
but one of visible failures is in SEEK_HOLE / SEEK_DATA implementation.
If we create a file like:
dd if=/dev/urandom of=file bs=1k seek=8m count=1
then
lseek64("file", 0x100000000ULL, SEEK_DATA)
wrongly returns 0x100000000 on unfixed kernel while it should return
0x200000000. Avoid the overflow by proper type cast.
Fixes: 545052e9e35a ("ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org # v4.15
Eryu Guan [Thu, 22 Mar 2018 15:44:59 +0000 (11:44 -0400)]
ext4: update i_disksize if direct write past ondisk size
Currently in ext4 direct write path, we update i_disksize only when
new eof is greater than i_size, and don't update it even when new
eof is greater than i_disksize but less than i_size. This doesn't
work well with delalloc buffer write, which updates i_size and
i_disksize only when delalloc blocks are resolved (at writeback
time), the i_disksize from direct write can be lost if a previous
buffer write succeeded at write time but failed at writeback time,
then results in corrupted ondisk inode size.
Consider this case, first buffer write 4k data to a new file at
offset 16k with delayed allocation, then direct write 4k data to the
same file at offset 4k before delalloc blocks are resolved, which
doesn't update i_disksize because it writes within i_size(20k), but
the extent tree metadata has been committed in journal. Then
writeback of the delalloc blocks fails (due to device error etc.),
and i_size/i_disksize from buffer write can't be written to disk
(still zero). A subsequent umount/mount cycle recovers journal and
writes extent tree metadata from direct write to disk, but with
i_disksize being zero.
Fix it by updating i_disksize too in direct write path when new eof
is greater than i_disksize but less than i_size, so i_disksize is
always consistent with direct write.
This fixes occasional i_size corruption in fstests generic/475.
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Eryu Guan [Thu, 22 Mar 2018 15:41:25 +0000 (11:41 -0400)]
ext4: protect i_disksize update by i_data_sem in direct write path
i_disksize update should be protected by i_data_sem, by either taking
the lock explicitly or by using ext4_update_i_disksize() helper. But the
i_disksize updates in ext4_direct_IO_write() are not protected at all,
which may be racing with i_disksize updates in writeback path in
delalloc buffer write path.
This is found by code inspection, and I didn't hit any i_disksize
corruption due to this bug. Thanks to Jan Kara for catching this bug and
suggesting the fix!
Reported-by: Jan Kara <jack@suse.cz>
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Theodore Ts'o [Mon, 19 Feb 2018 19:16:47 +0000 (14:16 -0500)]
ext4: don't update checksum of new initialized bitmaps
When reading the inode or block allocation bitmap, if the bitmap needs
to be initialized, do not update the checksum in the block group
descriptor. That's because we're not set up to journal those changes.
Instead, just set the verified bit on the bitmap block, so that it's
not necessary to validate the checksum.
When a block or inode allocation actually happens, at that point the
checksum will be calculated, and update of the bg descriptor block
will be properly journalled.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Theodore Ts'o [Mon, 19 Feb 2018 17:22:53 +0000 (12:22 -0500)]
jbd2: if the journal is aborted then don't allow update of the log tail
This updates the jbd2 superblock unnecessarily, and on an abort we
shouldn't truncate the log.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Theodore Ts'o [Mon, 19 Feb 2018 04:45:18 +0000 (23:45 -0500)]
ext4: pass -ESHUTDOWN code to jbd2 layer
Previously the jbd2 layer assumed that a file system check would be
required after a journal abort. In the case of the deliberate file
system shutdown, this should not be necessary. Allow the jbd2 layer
to distinguish between these two cases by using the ESHUTDOWN errno.
Also add proper locking to __journal_abort_soft().
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Theodore Ts'o [Mon, 19 Feb 2018 04:16:28 +0000 (23:16 -0500)]
ext4: eliminate sleep from shutdown ioctl
The msleep() when processing EXT4_GOING_FLAGS_NOLOGFLUSH was a hack to
avoid some races (that are now fixed), but in fact it introduced its
own race.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Theodore Ts'o [Mon, 19 Feb 2018 03:07:36 +0000 (22:07 -0500)]
ext4: shutdown should not prevent get_write_access
The ext4 forced shutdown flag needs to prevent new handles from being
started, but it needs to allow existing handles to complete. So the
forced shutdown flag should not force ext4_journal_get_write_access to
fail.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Theodore Ts'o [Mon, 19 Feb 2018 02:33:13 +0000 (21:33 -0500)]
jbd2: clarify bad journal block checksum message
There were two error messages emitted by jbd2, one for a bad checksum
for a jbd2 descriptor block, and one for a bad checksum for a jbd2
data block. Change the data block checksum error so that the two can
be disambiguated.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Mon, 19 Feb 2018 01:53:23 +0000 (20:53 -0500)]
ext4: add tracepoints for shutdown and file system errors
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Linus Torvalds [Sun, 11 Feb 2018 23:04:29 +0000 (15:04 -0800)]
Linux 4.16-rc1
Al Viro [Thu, 1 Feb 2018 20:13:18 +0000 (15:13 -0500)]
unify {de,}mangle_poll(), get rid of kernel-side POLL...
except, again, POLLFREE and POLL_BUSY_LOOP.
With this, we finally get to the promised end result:
- POLL{IN,OUT,...} are plain integers and *not* in __poll_t, so any
stray instances of ->poll() still using those will be caught by
sparse.
- eventpoll.c and select.c warning-free wrt __poll_t
- no more kernel-side definitions of POLL... - userland ones are
visible through the entire kernel (and used pretty much only for
mangle/demangle)
- same behavior as after the first series (i.e. sparc et.al. epoll(2)
working correctly).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 11 Feb 2018 22:34:03 +0000 (14:34 -0800)]
vfs: do bulk POLL* -> EPOLL* replacement
This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:
for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
done
with de-mangling cleanups yet to come.
NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do. But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.
The next patch from Al will sort out the final differences, and we
should be all done.
Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 11 Feb 2018 21:57:19 +0000 (13:57 -0800)]
Merge branch 'work.poll2' of git://git./linux/kernel/git/viro/vfs
Pull more poll annotation updates from Al Viro:
"This is preparation to solving the problems you've mentioned in the
original poll series.
After this series, the kernel is ready for running
for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
done
as a for bulk search-and-replace.
After that, the kernel is ready to apply the patch to unify
{de,}mangle_poll(), and then get rid of kernel-side POLL... uses
entirely, and we should be all done with that stuff.
Basically, that's what you suggested wrt KPOLL..., except that we can
use EPOLL... instead - they already are arch-independent (and equal to
what is currently kernel-side POLL...).
After the preparations (in this series) switch to returning EPOLL...
from ->poll() instances is completely mechanical and kernel-side
POLL... can go away. The last step (killing kernel-side POLL... and
unifying {de,}mangle_poll() has to be done after the
search-and-replace job, since we need userland-side POLL... for
unified {de,}mangle_poll(), thus the cherry-pick at the last step.
After that we will have:
- POLL{IN,OUT,...} *not* in __poll_t, so any stray instances of
->poll() still using those will be caught by sparse.
- eventpoll.c and select.c warning-free wrt __poll_t
- no more kernel-side definitions of POLL... - userland ones are
visible through the entire kernel (and used pretty much only for
mangle/demangle)
- same behavior as after the first series (i.e. sparc et.al. epoll(2)
working correctly)"
* 'work.poll2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
annotate ep_scan_ready_list()
ep_send_events_proc(): return result via esed->res
preparation to switching ->poll() to returning EPOLL...
add EPOLLNVAL, annotate EPOLL... and event_poll->event
use linux/poll.h instead of asm/poll.h
xen: fix poll misannotation
smc: missing poll annotations
Linus Torvalds [Sun, 11 Feb 2018 21:54:52 +0000 (13:54 -0800)]
Merge tag 'xtensa-
20180211' of git://github.com/jcmvbkbc/linux-xtensa
Pull xtense fix from Max Filippov:
"Build fix for xtensa architecture with KASAN enabled"
* tag 'xtensa-
20180211' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: fix build with KASAN
Linus Torvalds [Sun, 11 Feb 2018 21:52:32 +0000 (13:52 -0800)]
Merge tag 'nios2-v4.16-rc1' of git://git./linux/kernel/git/lftan/nios2
Pull nios2 update from Ley Foon Tan:
- clean up old Kconfig options from defconfig
- remove leading 0x and 0s from bindings notation in dts files
* tag 'nios2-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
nios2: defconfig: Cleanup from old Kconfig options
nios2: dts: Remove leading 0x and 0s from bindings notation
Max Filippov [Sun, 11 Feb 2018 09:07:54 +0000 (01:07 -0800)]
xtensa: fix build with KASAN
The commit
917538e212a2 ("kasan: clean up KASAN_SHADOW_SCALE_SHIFT
usage") removed KASAN_SHADOW_SCALE_SHIFT definition from
include/linux/kasan.h and added it to architecture-specific headers,
except for xtensa. This broke the xtensa build with KASAN enabled.
Define KASAN_SHADOW_SCALE_SHIFT in arch/xtensa/include/asm/kasan.h
Reported by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 917538e212a2 ("kasan: clean up KASAN_SHADOW_SCALE_SHIFT usage")
Acked-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Krzysztof Kozlowski [Sun, 11 Feb 2018 15:01:17 +0000 (23:01 +0800)]
nios2: defconfig: Cleanup from old Kconfig options
Remove old, dead Kconfig option INET_LRO. It is gone since
commit
7bbf3cae65b6 ("ipv4: Remove inet_lro library").
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Ley Foon Tan <ley.foon.tan@intel.com>
Mathieu Malaterre [Sun, 11 Feb 2018 14:59:18 +0000 (22:59 +0800)]
nios2: dts: Remove leading 0x and 0s from bindings notation
Improve the DTS files by removing all the leading "0x" and zeros to fix the
following dtc warnings:
Warning (unit_address_format): Node /XXX unit name should not have leading "0x"
and
Warning (unit_address_format): Node /XXX unit name should not have leading 0s
Converted using the following command:
find . -type f \( -iname *.dts -o -iname *.dtsi \) -exec sed -E -i -e "s/@0x([0-9a-fA-F\.]+)\s?\{/@\L\1 \{/g" -e "s/@0+([0-9a-fA-F\.]+)\s?\{/@\L\1 \{/g" {} +
For simplicity, two sed expressions were used to solve each warnings separately.
To make the regex expression more robust a few other issues were resolved,
namely setting unit-address to lower case, and adding a whitespace before the
the opening curly brace:
https://elinux.org/Device_Tree_Linux#Linux_conventions
This is a follow up to commit
4c9847b7375a ("dt-bindings: Remove leading 0x from bindings notation")
Reported-by: David Daney <ddaney@caviumnetworks.com>
Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Acked-by: Ley Foon Tan <ley.foon.tan@intel.com>
Linus Torvalds [Sat, 10 Feb 2018 22:08:26 +0000 (14:08 -0800)]
Merge tag 'pci-v4.16-fixes-1' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fix from Bjorn Helgaas:
"Fix a POWER9/powernv INTx regression from the merge window (Alexey
Kardashevskiy)"
* tag 'pci-v4.16-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
powerpc/pci: Fix broken INTx configuration via OF
Linus Torvalds [Sat, 10 Feb 2018 22:05:11 +0000 (14:05 -0800)]
Merge tag 'for-linus-
20180210' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A few fixes to round off the merge window on the block side:
- a set of bcache fixes by way of Michael Lyle, from the usual bcache
suspects.
- add a simple-to-hook-into function for bpf EIO error injection.
- fix blk-wbt that mischarectized flushes as reads. Improve the logic
so that flushes and writes are accounted as writes, and only reads
as reads. From me.
- fix requeue crash in BFQ, from Paolo"
* tag 'for-linus-
20180210' of git://git.kernel.dk/linux-block:
block, bfq: add requeue-request hook
bcache: fix for data collapse after re-attaching an attached device
bcache: return attach error when no cache set exist
bcache: set writeback_rate_update_seconds in range [1, 60] seconds
bcache: fix for allocator and register thread race
bcache: set error_limit correctly
bcache: properly set task state in bch_writeback_thread()
bcache: fix high CPU occupancy during journal
bcache: add journal statistic
block: Add should_fail_bio() for bpf error injection
blk-wbt: account flush requests correctly
Linus Torvalds [Sat, 10 Feb 2018 21:55:33 +0000 (13:55 -0800)]
Merge tag 'platform-drivers-x86-v4.16-3' of git://github.com/dvhart/linux-pdx86
Pull x86 platform driver updates from Darren Hart:
"Mellanox fixes and new system type support.
Mostly data for new system types with a correction and an
uninitialized variable fix"
[ Pulling from github because git.infradead.org currently seems to be
down for some reason, but Darren had a backup location - Linus ]
* tag 'platform-drivers-x86-v4.16-3' of git://github.com/dvhart/linux-pdx86:
platform/x86: mlx-platform: Add support for new 200G IB and Ethernet systems
platform/x86: mlx-platform: Add support for new msn201x system type
platform/x86: mlx-platform: Add support for new msn274x system type
platform/x86: mlx-platform: Fix power cable setting for msn21xx family
platform/x86: mlx-platform: Add define for the negative bus
platform/x86: mlx-platform: Use defines for bus assignment
platform/mellanox: mlxreg-hotplug: Fix uninitialized variable
Linus Torvalds [Sat, 10 Feb 2018 21:50:23 +0000 (13:50 -0800)]
Merge tag 'chrome-platform-for-linus-4.16' of git://git./linux/kernel/git/bleung/chrome-platform
Pull chrome platform updates from Benson Leung:
- move cros_ec_dev to drivers/mfd
- other small maintenance fixes
[ The cros_ec_dev movement came in earlier through the MFD tree - Linus ]
* tag 'chrome-platform-for-linus-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform:
platform/chrome: Use proper protocol transfer function
platform/chrome: cros_ec_lpc: Add support for Google Glimmer
platform/chrome: cros_ec_lpc: Register the driver if ACPI entry is missing.
platform/chrome: cros_ec_lpc: remove redundant pointer request
cros_ec: fix nul-termination for firmware build info
platform/chrome: chromeos_laptop: make chromeos_laptop const
Linus Torvalds [Sat, 10 Feb 2018 21:16:35 +0000 (13:16 -0800)]
Merge tag 'kvm-4.16-1' of git://git./virt/kvm/kvm
Pull KVM updates from Radim Krčmář:
"ARM:
- icache invalidation optimizations, improving VM startup time
- support for forwarded level-triggered interrupts, improving
performance for timers and passthrough platform devices
- a small fix for power-management notifiers, and some cosmetic
changes
PPC:
- add MMIO emulation for vector loads and stores
- allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without
requiring the complex thread synchronization of older CPU versions
- improve the handling of escalation interrupts with the XIVE
interrupt controller
- support decrement register migration
- various cleanups and bugfixes.
s390:
- Cornelia Huck passed maintainership to Janosch Frank
- exitless interrupts for emulated devices
- cleanup of cpuflag handling
- kvm_stat counter improvements
- VSIE improvements
- mm cleanup
x86:
- hypervisor part of SEV
- UMIP, RDPID, and MSR_SMI_COUNT emulation
- paravirtualized TLB shootdown using the new KVM_VCPU_PREEMPTED bit
- allow guests to see TOPOEXT, GFNI, VAES, VPCLMULQDQ, and more
AVX512 features
- show vcpu id in its anonymous inode name
- many fixes and cleanups
- per-VCPU MSR bitmaps (already merged through x86/pti branch)
- stable KVM clock when nesting on Hyper-V (merged through
x86/hyperv)"
* tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (197 commits)
KVM: PPC: Book3S: Add MMIO emulation for VMX instructions
KVM: PPC: Book3S HV: Branch inside feature section
KVM: PPC: Book3S HV: Make HPT resizing work on POWER9
KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing code
KVM: PPC: Book3S PR: Fix broken select due to misspelling
KVM: x86: don't forget vcpu_put() in kvm_arch_vcpu_ioctl_set_sregs()
KVM: PPC: Book3S PR: Fix svcpu copying with preemption enabled
KVM: PPC: Book3S HV: Drop locks before reading guest memory
kvm: x86: remove efer_reload entry in kvm_vcpu_stat
KVM: x86: AMD Processor Topology Information
x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested
kvm: embed vcpu id to dentry of vcpu anon inode
kvm: Map PFN-type memory regions as writable (if possible)
x86/kvm: Make it compile on 32bit and with HYPYERVISOR_GUEST=n
KVM: arm/arm64: Fixup userspace irqchip static key optimization
KVM: arm/arm64: Fix userspace_irqchip_in_use counting
KVM: arm/arm64: Fix incorrect timer_is_pending logic
MAINTAINERS: update KVM/s390 maintainers
MAINTAINERS: add Halil as additional vfio-ccw maintainer
MAINTAINERS: add David as a reviewer for KVM/s390
...
Alexey Kardashevskiy [Fri, 9 Feb 2018 06:23:58 +0000 (17:23 +1100)]
powerpc/pci: Fix broken INTx configuration via OF
59f47eff03a0 ("powerpc/pci: Use of_irq_parse_and_map_pci() helper")
replaced of_irq_parse_pci() + irq_create_of_mapping() with
of_irq_parse_and_map_pci(), but neglected to capture the virq
returned by irq_create_of_mapping(), so virq remained zero, which
caused INTx configuration to fail.
Save the virq value returned by of_irq_parse_and_map_pci() and correct
the virq declaration to match the of_irq_parse_and_map_pci() signature.
Fixes: 59f47eff03a0 "powerpc/pci: Use of_irq_parse_and_map_pci() helper"
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Linus Torvalds [Sat, 10 Feb 2018 03:32:41 +0000 (19:32 -0800)]
Merge tag 'kbuild-v4.16-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada:
"Makefile changes:
- enable unused-variable warning that was wrongly disabled for clang
Kconfig changes:
- warn about blank 'help' and fix existing instances
- fix 'choice' behavior to not write out invisible symbols
- fix misc weirdness
Coccinell changes:
- fix false positive of free after managed memory alloc detection
- improve performance of NULL dereference detection"
* tag 'kbuild-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (21 commits)
kconfig: remove const qualifier from sym_expand_string_value()
kconfig: add xrealloc() helper
kconfig: send error messages to stderr
kconfig: echo stdin to stdout if either is redirected
kconfig: remove check_stdin()
kconfig: remove 'config*' pattern from .gitignnore
kconfig: show '?' prompt even if no help text is available
kconfig: do not write choice values when their dependency becomes n
coccinelle: deref_null: avoid useless computation
coccinelle: devm_free: reduce false positives
kbuild: clang: disable unused variable warnings only when constant
kconfig: Warn if help text is blank
nios2: kconfig: Remove blank help text
arm: vt8500: kconfig: Remove blank help text
MIPS: kconfig: Remove blank help text
MIPS: BCM63XX: kconfig: Remove blank help text
lib/Kconfig.debug: Remove blank help text
Staging: rtl8192e: kconfig: Remove blank help text
Staging: rtl8192u: kconfig: Remove blank help text
mmc: kconfig: Remove blank help text
...
Al Viro [Sat, 10 Feb 2018 01:35:16 +0000 (01:35 +0000)]
mconsole_proc(): don't mess with file->f_pos
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 10 Feb 2018 03:22:17 +0000 (19:22 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull misc vfs fixes from Al Viro.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
seq_file: fix incomplete reset on read from zero offset
kernfs: fix regression in kernfs_fop_write caused by wrong type
Masahiro Yamada [Thu, 8 Feb 2018 16:19:08 +0000 (01:19 +0900)]
kconfig: remove const qualifier from sym_expand_string_value()
This function returns realloc'ed memory, so the returned pointer
must be passed to free() when done. So, 'const' qualifier is odd.
It is allowed to modify the expanded string.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Masahiro Yamada [Thu, 8 Feb 2018 16:19:07 +0000 (01:19 +0900)]
kconfig: add xrealloc() helper
We already have xmalloc(), xcalloc(). Add xrealloc() as well
to save tedious error handling.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Vadim Pasternak [Fri, 9 Feb 2018 23:59:32 +0000 (23:59 +0000)]
platform/x86: mlx-platform: Add support for new 200G IB and Ethernet systems
It adds support for new Mellanox system types of basic classes qmb7, sn34,
sn37, containing systems QMB700 (40x200GbE InfiniBand switch), SN3700
(32x200GbE and 16x400GbE Ethernet switch) and SN3410 (6x400GbE plus
48x50GbE Ethernet switch). These are the Top of the Rack systems, equipped
with Mellanox COM-Express carrier board and switch board with Mellanox
Quantum device, which supports InfiniBand switching with 40X200G ports and
line rate of up to HDR speed or with Mellanox Spectrum-2 device, which
supports Ethernet switching with 32X200G ports line rate of up to HDR
speed.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
Vadim Pasternak [Fri, 9 Feb 2018 23:59:31 +0000 (23:59 +0000)]
platform/x86: mlx-platform: Add support for new msn201x system type
It adds support for new Mellanox system types of basic half unit size
class msn201x, containing system MSN2010 (18x10GbE plus 4x4x25GbE) half
and its derivatives. This is the Top of the Rack system, equipped with
Mellanox Small Form Factor carrier board and switch board with Mellanox
Spectrum device, which supports Ethernet switching with 32X100G ports line
rate of up to EDR speed.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
Vadim Pasternak [Fri, 9 Feb 2018 23:59:30 +0000 (23:59 +0000)]
platform/x86: mlx-platform: Add support for new msn274x system type
It adds support for new Mellanox system types of basic class msn274x,
containing system MSN2740 (32x100GbE Ethernet switch with cost reduction)
and its derivatives. These are the Top of the Rack system, equipped with
Mellanox Small Form Factor carrier board and switch board with Mellanox
Spectrum device, which supports Ethernet switching with 32X100G ports line
rate of up to EDR speed.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
Linus Torvalds [Fri, 9 Feb 2018 23:34:18 +0000 (15:34 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Make allocations less aggressive in x_tables, from Minchal Hocko.
2) Fix netfilter flowtable Kconfig deps, from Pablo Neira Ayuso.
3) Fix connection loss problems in rtlwifi, from Larry Finger.
4) Correct DRAM dump length for some chips in ath10k driver, from Yu
Wang.
5) Fix ABORT handling in rxrpc, from David Howells.
6) Add SPDX tags to Sun networking drivers, from Shannon Nelson.
7) Some ipv6 onlink handling fixes, from David Ahern.
8) Netem packet scheduler interval calcualtion fix from Md. Islam.
9) Don't put crypto buffers on-stack in rxrpc, from David Howells.
10) Fix handling of error non-delivery status in netlink multicast
delivery over multiple namespaces, from Nicolas Dichtel.
11) Missing xdp flush in tuntap driver, from Jason Wang.
12) Synchonize RDS protocol netns/module teardown with rds object
management, from Sowini Varadhan.
13) Add nospec annotations to mpls, from Dan Williams.
14) Fix SKB truesize handling in TIPC, from Hoang Le.
15) Interrupt masking fixes in stammc from Niklas Cassel.
16) Don't allow ptr_ring objects to be sized outside of kmalloc's
limits, from Jason Wang.
17) Don't allow SCTP chunks to be built which will have a length
exceeding the chunk header's 16-bit length field, from Alexey
Kodanev.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (82 commits)
ibmvnic: Remove skb->protocol checks in ibmvnic_xmit
bpf: fix rlimit in reuseport net selftest
sctp: verify size of a new chunk in _sctp_make_chunk()
s390/qeth: fix SETIP command handling
s390/qeth: fix underestimated count of buffer elements
ptr_ring: try vmalloc() when kmalloc() fails
ptr_ring: fail early if queue occupies more than KMALLOC_MAX_SIZE
net: stmmac: remove redundant enable of PMT irq
net: stmmac: rename GMAC_INT_DEFAULT_MASK for dwmac4
net: stmmac: discard disabled flags in interrupt status register
ibmvnic: Reset long term map ID counter
tools/libbpf: handle issues with bpf ELF objects containing .eh_frames
selftests/bpf: add selftest that use test_libbpf_open
selftests/bpf: add test program for loading BPF ELF files
tools/libbpf: improve the pr_debug statements to contain section numbers
bpf: Sync kernel ABI header with tooling header for bpf_common.h
net: phy: fix phy_start to consider PHY_IGNORE_INTERRUPT
net: thunder: change q_len's type to handle max ring size
tipc: fix skb truesize/datasize ratio control
net/sched: cls_u32: fix cls_u32 on filter replace
...
Linus Torvalds [Fri, 9 Feb 2018 22:55:30 +0000 (14:55 -0800)]
Merge tag 'nfs-for-4.16-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull more NFS client updates from Trond Myklebust:
"A few bugfixes and some small sunrpc latency/performance improvements
before the merge window closes:
Stable fixes:
- fix an incorrect calculation of the RDMA send scatter gather
element limit
- fix an Oops when attempting to free resources after RDMA device
removal
Bugfixes:
- SUNRPC: Ensure we always release the TCP socket in a timely fashion
when the connection is shut down.
- SUNRPC: Don't call __UDPX_INC_STATS() from a preemptible context
Latency/Performance:
- SUNRPC: Queue latency sensitive socket tasks to the less contended
xprtiod queue
- SUNRPC: Make the xprtiod workqueue unbounded.
- SUNRPC: Make the rpciod workqueue unbounded"
* tag 'nfs-for-4.16-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: Don't call __UDPX_INC_STATS() from a preemptible context
fix parallelism for rpc tasks
Make the xprtiod workqueue unbounded.
SUNRPC: Queue latency-sensitive socket tasks to xprtiod
SUNRPC: Ensure we always close the socket after a connection shuts down
xprtrdma: Fix BUG after a device removal
xprtrdma: Fix calculation of ri_max_send_sges
Linus Torvalds [Fri, 9 Feb 2018 22:49:46 +0000 (14:49 -0800)]
Merge branch 'for-next' of git://git./linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
"The highlights include:
- numerous target-core-user improvements related to queue full and
timeout handling. (MNC)
- prevent target-core-user corruption when invalid data page is
requested. (MNC)
- add target-core device action configfs attributes to allow
user-space to trigger events separate from existing attributes
exposed to end-users. (MNC)
- fix iscsi-target NULL pointer dereference 4.6+ regression in CHAP
error path. (David Disseldorp)
- avoid target-core backend UNMAP callbacks if range is zero. (Andrei
Vagin)
- fix a iscsi-target 4.14+ regression related multiple PDU logins,
that was exposed due to removal of TCP prequeue support. (Florian
Westphal + MNC)
Also, there is a iser-target bug still being worked on for post -rc1
code to address a long standing issue resulting in persistent
ib_post_send() failures, for RNICs with small max_send_sge"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (36 commits)
iscsi-target: make sure to wake up sleeping login worker
tcmu: Fix trailing semicolon
tcmu: fix cmd user after free
target: fix destroy device in target_configure_device
tcmu: allow userspace to reset ring
target core: add device action configfs files
tcmu: fix error return code in tcmu_configure_device()
target_core_user: add cmd id to broken ring message
target: add SAM_STAT_BUSY sense reason
tcmu: prevent corruption when invalid data page requested
target: don't call an unmap callback if a range length is zero
target/iscsi: avoid NULL dereference in CHAP auth error path
cxgbit: call neigh_event_send() to update MAC address
target: tcm_loop: Use seq_puts() in tcm_loop_show_info()
target: tcm_loop: Delete an unnecessary return statement in tcm_loop_submission_work()
target: tcm_loop: Delete two unnecessary variable initialisations in tcm_loop_issue_tmr()
target: tcm_loop: Combine substrings for 26 messages
target: tcm_loop: Improve a size determination in two functions
target: tcm_loop: Delete an error message for a failed memory allocation in four functions
sbp-target: Delete an error message for a failed memory allocation in three functions
...
Linus Torvalds [Fri, 9 Feb 2018 22:47:09 +0000 (14:47 -0800)]
Merge tag 'trace-v4.16-rc1' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Al Viro discovered some breakage with the parsing of the
set_ftrace_filter as well as the removing of function probes.
This fixes the code with Al's suggestions. I also added a few
selftests to test the broken cases such that they wont happen
again"
* tag 'trace-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
selftests/ftrace: Add more tests for removing of function probes
selftests/ftrace: Add some missing glob checks
selftests/ftrace: Have reset_ftrace_filter handle multiple instances
selftests/ftrace: Have reset_ftrace_filter handle modules
tracing: Fix parsing of globs with a wildcard at the beginning
ftrace: Remove incorrect setting of glob search field
Linus Torvalds [Fri, 9 Feb 2018 22:42:57 +0000 (14:42 -0800)]
Merge tag '4.16-minor-rc-SMB3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"There are a couple additional security fixes that are still being
tested that are not in this set."
* tag '4.16-minor-rc-SMB3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
Add missing structs and defines from recent SMB3.1.1 documentation
address lock imbalance warnings in smbdirect.c
cifs: silence compiler warnings showing up with gcc-8.0.0
Add some missing debug fields in server and tcon structs
Linus Torvalds [Fri, 9 Feb 2018 22:40:16 +0000 (14:40 -0800)]
Merge tag 'fbdev-v4.16-fix' of git://github.com/bzolnier/linux
Pull fbdev fix from Bartlomiej Zolnierkiewicz:
"Fix building of the omapfb driver (Tomi Valkeinen)"
* tag 'fbdev-v4.16-fix' of git://github.com/bzolnier/linux:
video: omapfb: fix missing #includes
Radim Krčmář [Fri, 9 Feb 2018 20:36:57 +0000 (21:36 +0100)]
Merge tag 'kvm-ppc-next-4.16-2' of git://git./linux/kernel/git/paulus/powerpc
Second PPC KVM update for 4.16
Seven fixes that are either trivial or that address bugs that people
are actually hitting. The main ones are:
- Drop spinlocks before reading guest memory
- Fix a bug causing corruption of VCPU state in PR KVM with preemption
enabled
- Make HPT resizing work on POWER9
- Add MMIO emulation for vector loads and stores, because guests now
use these instructions in memcpy and similar routines.
Radim Krčmář [Fri, 2 Feb 2018 17:26:58 +0000 (18:26 +0100)]
Merge branch 'msr-bitmaps' of git://git./virt/kvm/kvm
This topic branch allocates separate MSR bitmaps for each VCPU.
This is required for the IBRS enablement to choose, on a per-VM
basis, whether to intercept the SPEC_CTRL and PRED_CMD MSRs;
the IBRS enablement comes in through the tip tree.
John Allen [Fri, 9 Feb 2018 19:19:46 +0000 (13:19 -0600)]
ibmvnic: Remove skb->protocol checks in ibmvnic_xmit
Having these checks in ibmvnic_xmit causes problems with VLAN
tagging and balance-alb/tlb bonding modes. The restriction they
imposed can be removed.
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 9 Feb 2018 13:49:44 +0000 (14:49 +0100)]
bpf: fix rlimit in reuseport net selftest
Fix two issues in the reuseport_bpf selftests that were
reported by Linaro CI:
[...]
+ ./reuseport_bpf
---- IPv4 UDP ----
Testing EBPF mod 10...
Reprograming, testing mod 5...
./reuseport_bpf: ebpf error. log:
0: (bf) r6 = r1
1: (20) r0 = *(u32 *)skb[0]
2: (97) r0 %= 10
3: (95) exit
processed 4 insns
: Operation not permitted
+ echo FAIL
[...]
---- IPv4 TCP ----
Testing EBPF mod 10...
./reuseport_bpf: failed to bind send socket: Address already in use
+ echo FAIL
[...]
For the former adjust rlimit since this was the cause of
failure for loading the BPF prog, and for the latter add
SO_REUSEADDR.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Link: https://bugs.linaro.org/show_bug.cgi?id=3502
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Kodanev [Fri, 9 Feb 2018 14:35:23 +0000 (17:35 +0300)]
sctp: verify size of a new chunk in _sctp_make_chunk()
When SCTP makes INIT or INIT_ACK packet the total chunk length
can exceed SCTP_MAX_CHUNK_LEN which leads to kernel panic when
transmitting these packets, e.g. the crash on sending INIT_ACK:
[ 597.804948] skbuff: skb_over_panic: text:
00000000ffae06e4 len:120168
put:120156 head:
000000007aa47635 data:
00000000d991c2de
tail:0x1d640 end:0xfec0 dev:<NULL>
...
[ 597.976970] ------------[ cut here ]------------
[ 598.033408] kernel BUG at net/core/skbuff.c:104!
[ 600.314841] Call Trace:
[ 600.345829] <IRQ>
[ 600.371639] ? sctp_packet_transmit+0x2095/0x26d0 [sctp]
[ 600.436934] skb_put+0x16c/0x200
[ 600.477295] sctp_packet_transmit+0x2095/0x26d0 [sctp]
[ 600.540630] ? sctp_packet_config+0x890/0x890 [sctp]
[ 600.601781] ? __sctp_packet_append_chunk+0x3b4/0xd00 [sctp]
[ 600.671356] ? sctp_cmp_addr_exact+0x3f/0x90 [sctp]
[ 600.731482] sctp_outq_flush+0x663/0x30d0 [sctp]
[ 600.788565] ? sctp_make_init+0xbf0/0xbf0 [sctp]
[ 600.845555] ? sctp_check_transmitted+0x18f0/0x18f0 [sctp]
[ 600.912945] ? sctp_outq_tail+0x631/0x9d0 [sctp]
[ 600.969936] sctp_cmd_interpreter.isra.22+0x3be1/0x5cb0 [sctp]
[ 601.041593] ? sctp_sf_do_5_1B_init+0x85f/0xc30 [sctp]
[ 601.104837] ? sctp_generate_t1_cookie_event+0x20/0x20 [sctp]
[ 601.175436] ? sctp_eat_data+0x1710/0x1710 [sctp]
[ 601.233575] sctp_do_sm+0x182/0x560 [sctp]
[ 601.284328] ? sctp_has_association+0x70/0x70 [sctp]
[ 601.345586] ? sctp_rcv+0xef4/0x32f0 [sctp]
[ 601.397478] ? sctp6_rcv+0xa/0x20 [sctp]
...
Here the chunk size for INIT_ACK packet becomes too big, mostly
because of the state cookie (INIT packet has large size with
many address parameters), plus additional server parameters.
Later this chunk causes the panic in skb_put_data():
skb_packet_transmit()
sctp_packet_pack()
skb_put_data(nskb, chunk->skb->data, chunk->skb->len);
'nskb' (head skb) was previously allocated with packet->size
from u16 'chunk->chunk_hdr->length'.
As suggested by Marcelo we should check the chunk's length in
_sctp_make_chunk() before trying to allocate skb for it and
discard a chunk if its size bigger than SCTP_MAX_CHUNK_LEN.
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leinter@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 9 Feb 2018 19:30:23 +0000 (14:30 -0500)]
Merge branch 's390-qeth-fixes'
Julian Wiedmann says:
====================
s390/qeth: fixes 2018-02-09
please apply the following two qeth patches for 4.16 and stable.
One restricts a command quirk to the intended commandd type,
while the other fixes an off-by-one during data transmission
that can cause qeth to build malformed buffer descriptors.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Fri, 9 Feb 2018 10:03:50 +0000 (11:03 +0100)]
s390/qeth: fix SETIP command handling
send_control_data() applies some special handling to SETIP v4 IPA
commands. But current code parses *all* command types for the SETIP
command code. Limit the command code check to IPA commands.
Fixes: 5b54e16f1a54 ("qeth: do not spin for SETIP ip assist command")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Fri, 9 Feb 2018 10:03:49 +0000 (11:03 +0100)]
s390/qeth: fix underestimated count of buffer elements
For a memory range/skb where the last byte falls onto a page boundary
(ie. 'end' is of the form xxx...xxx001), the PFN_UP() part of the
calculation currently doesn't round up to the next PFN due to an
off-by-one error.
Thus qeth believes that the skb occupies one page less than it
actually does, and may select a IO buffer that doesn't have enough spare
buffer elements to fit all of the skb's data.
HW detects this as a malformed buffer descriptor, and raises an
exception which then triggers device recovery.
Fixes: 2863c61334aa ("qeth: refactor calculation of SBALE count")
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Fri, 9 Feb 2018 09:45:50 +0000 (17:45 +0800)]
ptr_ring: try vmalloc() when kmalloc() fails
This patch switch to use kvmalloc_array() for using a vmalloc()
fallback to help in case kmalloc() fails.
Reported-by: syzbot+e4d4f9ddd4295539735d@syzkaller.appspotmail.com
Fixes: 2e0ab8ca83c12 ("ptr_ring: array based FIFO for pointers")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Fri, 9 Feb 2018 09:45:49 +0000 (17:45 +0800)]
ptr_ring: fail early if queue occupies more than KMALLOC_MAX_SIZE
To avoid slab to warn about exceeded size, fail early if queue
occupies more than KMALLOC_MAX_SIZE.
Reported-by: syzbot+e4d4f9ddd4295539735d@syzkaller.appspotmail.com
Fixes: 2e0ab8ca83c12 ("ptr_ring: array based FIFO for pointers")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 9 Feb 2018 19:23:04 +0000 (14:23 -0500)]
Merge branch 'stmmac-irq-fixes-cleanups'
Niklas Cassel says:
====================
stmmac irq fixes/cleanups
A couple of small stmmac irq fixes/cleanups.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Niklas Cassel [Fri, 9 Feb 2018 16:22:47 +0000 (17:22 +0100)]
net: stmmac: remove redundant enable of PMT irq
For dwmac4, GMAC_INT_DEFAULT_ENABLE already includes
GMAC_INT_PMT_EN, so it is redundant to check if hw->pmt
is set, and if so, setting the bit again.
For dwmac1000, GMAC_INT_DEFAULT_MASK does not include
GMAC_INT_DISABLE_PMT, so it is redundant to check if
hw->pmt is set, and if so, clearing an already cleared bit.
Improve code readability by removing this redundant code.
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Niklas Cassel [Fri, 9 Feb 2018 16:22:46 +0000 (17:22 +0100)]
net: stmmac: rename GMAC_INT_DEFAULT_MASK for dwmac4
GMAC_INT_DEFAULT_MASK is written to the interrupt enable register.
In previous versions of the IP (e.g. dwmac1000), this register was
instead an interrupt mask register.
To improve clarity and reflect reality, rename GMAC_INT_DEFAULT_MASK
to GMAC_INT_DEFAULT_ENABLE.
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Niklas Cassel [Fri, 9 Feb 2018 16:22:45 +0000 (17:22 +0100)]
net: stmmac: discard disabled flags in interrupt status register
The interrupt status register in both dwmac1000 and dwmac4 ignores
interrupt enable (for dwmac4) / interrupt mask (for dwmac1000).
Therefore, if we want to check only the bits that can actually trigger
an irq, we have to filter the interrupt status register manually.
Commit
0a764db10337 ("stmmac: Discard masked flags in interrupt status
register") fixed this for dwmac1000. Fix the same issue for dwmac4.
Just like commit
0a764db10337 ("stmmac: Discard masked flags in
interrupt status register"), this makes sure that we do not get
spurious link up/link down prints.
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Fri, 9 Feb 2018 17:41:09 +0000 (11:41 -0600)]
ibmvnic: Reset long term map ID counter
When allocating RX or TX buffer pools, the driver needs to provide a
unique mapping ID to firmware for each pool. This value is assigned
using a counter which is incremented after a new pool is created. The
ID can be an integer ranging from 1-255. When migrating to a device
that requests a different number of queues, this value was not being
reset properly. As a result, after enough migrations, the counter
exceeded the upper bound and pool creation failed. This is fixed by
resetting the counter to one in this case.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 9 Feb 2018 19:05:10 +0000 (14:05 -0500)]
Merge git://git./pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2018-02-09
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Two fixes for BPF sockmap in order to break up circular map references
from programs attached to sockmap, and detaching related sockets in
case of socket close() event. For the latter we get rid of the
smap_state_change() and plug into ULP infrastructure, which will later
also be used for additional features anyway such as TX hooks. For the
second issue, dependency chain is broken up via map release callback
to free parse/verdict programs, all from John.
2) Fix a libbpf relocation issue that was found while implementing XDP
support for Suricata project. Issue was that when clang was invoked
with default target instead of bpf target, then various other e.g.
debugging relevant sections are added to the ELF file that contained
relocation entries pointing to non-BPF related sections which libbpf
trips over instead of skipping them. Test cases for libbpf are added
as well, from Jesper.
3) Various misc fixes for bpftool and one for libbpf: a small addition
to libbpf to make sure it recognizes all standard section prefixes.
Then, the Makefile in bpftool/Documentation is improved to explicitly
check for rst2man being installed on the system as we otherwise risk
installing empty man pages; the man page for bpftool-map is corrected
and a set of missing bash completions added in order to avoid shipping
bpftool where the completions are only partially working, from Quentin.
4) Fix applying the relocation to immediate load instructions in the
nfp JIT which were missing a shift, from Jakub.
5) Two fixes for the BPF kernel selftests: handle CONFIG_BPF_JIT_ALWAYS_ON=y
gracefully in test_bpf.ko module and mark them as FLAG_EXPECTED_FAIL
in this case; and explicitly delete the veth devices in the two tests
test_xdp_{meta,redirect}.sh before dismantling the netnses as when
selftests are run in batch mode, then workqueue to handle destruction
might not have finished yet and thus veth creation in next test under
same dev name would fail, from Yonghong.
6) Fix test_kmod.sh to check the test_bpf.ko module path before performing
an insmod, and fallback to modprobe. Especially the latter is useful
when having a device under test that has the modules installed instead,
from Naresh.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 9 Feb 2018 18:07:39 +0000 (10:07 -0800)]
Merge tag 'for-linus-4.16-rc1-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Only five small fixes for issues when running under Xen"
* tag 'for-linus-4.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: Fix {set,clear}_foreign_p2m_mapping on autotranslating guests
pvcalls-back: do not return error on inet_accept EAGAIN
xen-netfront: Fix race between device setup and open
xen/grant-table: Use put_page instead of free_page
x86/xen: init %gs very early to avoid page faults with stack protector
Linus Torvalds [Fri, 9 Feb 2018 17:58:37 +0000 (09:58 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:
"The main thing in this merge is the defense for the Spectre
vulnerabilities. But there are other updates as well, the changes in
more detail:
- An s390 specific implementation of the array_index_mask_nospec
function to the defense against spectre v1
- Two patches to utilize the new PPA-12/PPA-13 instructions to run
the kernel and/or user space with reduced branch predicton.
- The s390 variant of the 'retpoline' spectre v2 defense called
'expoline'. There is no return instruction for s390, instead an
indirect branch is used for function return
The s390 defense mechanism for indirect branches works by using an
execute-type instruction with the indirect branch as the target of
the execute. In effect that turns off the prediction for the
indirect branch.
- Scrub registers in entry.S that contain user controlled values to
prevent the speculative use of these values.
- Re-add the second parameter for the s390 specific runtime
instrumentation system call and move the header file to uapi. The
second parameter will continue to do nothing but older kernel
versions only accepted valid real-time signal numbers. The details
will be documented in the man-page for the system call.
- Corrections and improvements for the s390 specific documentation
- Add a line to /proc/sysinfo to display the CPU model dependent
license-internal-code identifier
- A header file include fix for eadm.
- An error message fix in the kprobes code.
- The removal of an outdated ARCH_xxx select statement"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/kconfig: Remove ARCH_WANTS_PROT_NUMA_PROT_NONE select
s390: introduce execute-trampolines for branches
s390: run user space and KVM guests with modified branch prediction
s390: add options to change branch prediction behaviour for the kernel
s390/alternative: use a copy of the facility bit mask
s390: add optimized array_index_mask_nospec
s390: scrub registers on kernel entry and KVM exit
s390/cio: fix kernel-doc usage
s390/runtime_instrumentation: re-add signum system call parameter
s390/cpum_cf: correct counter number of LAST_HOST_TRANSLATIONS
s390/kprobes: Fix %p uses in error messages
s390/runtime instrumentation: provide uapi header file
s390/sysinfo: add and display licensed internal code identifier
s390/docs: reword airq section
s390/docs: mention subchannel types
s390/cmf: fix kerneldoc
s390/eadm: fix CONFIG_BLOCK include dependency
Linus Torvalds [Fri, 9 Feb 2018 17:44:25 +0000 (09:44 -0800)]
Merge tag 'acpi-part2-4.16-rc1' of git://git./linux/kernel/git/rafael/linux-pm
Pull more ACPI updates from Rafael Wysocki:
"These are mostly fixes and cleanups, a few new quirks, a couple of
updates related to the handling of ACPI tables and ACPICA copyrights
refreshment.
Specifics:
- Update the ACPICA kernel code to upstream revision
20180105
including:
* Assorted fixes (Jung-uk Kim)
* Support for X32 ABI compilation (Anuj Mittal)
* Update of ACPICA copyrights to 2018 (Bob Moore)
- Prepare for future modifications to avoid executing the _STA
control method too early (Hans de Goede)
- Make the processor performance control library code ignore _PPC
notifications if they cannot be handled and fix up the C1 idle
state definition when it is used as a fallback state (Chen Yu,
Yazen Ghannam)
- Make it possible to use the SPCR table on x86 and to replace the
original IORT table with a new one from initrd (Prarit Bhargava,
Shunyong Yang)
- Add battery-related quirks for Asus UX360UA and UX410UAK and add
quirks for table parsing on Dell XPS 9570 and Precision M5530 (Kai
Heng Feng)
- Address static checker warnings in the CPPC code (Gustavo Silva)
- Avoid printing a raw pointer to the kernel log in the smart battery
driver (Greg Kroah-Hartman)"
* tag 'acpi-part2-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: sbshc: remove raw pointer from printk() message
ACPI: SPCR: Make SPCR available to x86
ACPI / CPPC: Use 64-bit arithmetic instead of 32-bit
ACPI / tables: Add IORT to injectable table list
ACPI / bus: Parse tables as term_list for Dell XPS 9570 and Precision M5530
ACPICA: Update version to
20180105
ACPICA: All acpica: Update copyrights to 2018
ACPI / processor: Set default C1 idle state description
ACPI / battery: Add quirk for Asus UX360UA and UX410UAK
ACPI: processor_perflib: Do not send _PPC change notification if not ready
ACPI / scan: Use acpi_bus_get_status() to initialize ACPI_TYPE_DEVICE devs
ACPI / bus: Do not call _STA on battery devices with unmet dependencies
PCI: acpiphp_ibm: prepare for acpi_get_object_info() no longer returning status
ACPI: export acpi_bus_get_status_handle()
ACPICA: Add a missing pair of parentheses
ACPICA: Prefer ACPI_TO_POINTER() over ACPI_ADD_PTR()
ACPICA: Avoid NULL pointer arithmetic
ACPICA: Linux: add support for X32 ABI compilation
ACPI / video: Use true for boolean value
Linus Torvalds [Fri, 9 Feb 2018 17:40:33 +0000 (09:40 -0800)]
Merge tag 'pm-part2-4.16-rc1' of git://git./linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These are mostly fixes and cleanups and removal of the no longer
needed at32ap-cpufreq driver.
Specifics:
- Drop the at32ap-cpufreq driver which is useless after the removal
of the corresponding arch (Corentin LABBE).
- Fix a regression from the 4.14 cycle in the APM idle driver by
making it initialize the polling state properly (Rafael Wysocki).
- Fix a crash on failing system suspend due to a missing check in the
cpufreq core (Bo Yan).
- Make the intel_pstate driver initialize the hardware-managed
P-state control (HWP) feature on CPU0 upon resume from system
suspend if HWP had been enabled before the system was suspended
(Chen Yu).
- Fix up the SCPI cpufreq driver after recent changes (Sudeep Holla,
Wei Yongjun).
- Avoid pointer subtractions during frequency table walks in cpufreq
(Dominik Brodowski).
- Avoid the check for ProcFeedback in ST/CZ in the cpufreq driver for
AMD processors and add a MODULE_ALIAS for cpufreq on ARM IMX (Akshu
Agrawal, Nicolas Chauvet).
- Fix the prototype of swsusp_arch_resume() on x86 (Arnd Bergmann).
- Fix up the parsing of power domains DT data (Ulf Hansson)"
* tag 'pm-part2-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
arm: imx: Add MODULE_ALIAS for cpufreq
cpufreq: Add and use cpufreq_for_each_{valid_,}entry_idx()
cpufreq: intel_pstate: Enable HWP during system resume on CPU0
cpufreq: scpi: fix error return code in scpi_cpufreq_init()
x86: hibernate: fix swsusp_arch_resume() prototype
PM / domains: Fix up domain-idle-states OF parsing
cpufreq: scpi: fix static checker warning cdev isn't an ERR_PTR
cpufreq: remove at32ap-cpufreq
cpufreq: AMD: Ignore the check for ProcFeedback in ST/CZ
x86: PM: Make APM idle driver initialize polling state
cpufreq: Skip cpufreq resume if it's not suspended
Trond Myklebust [Fri, 9 Feb 2018 14:39:42 +0000 (09:39 -0500)]
SUNRPC: Don't call __UDPX_INC_STATS() from a preemptible context
Calling __UDPX_INC_STATS() from a preemptible context leads to a
warning of the form:
BUG: using __this_cpu_add() in preemptible [
00000000] code: kworker/u5:0/31
caller is xs_udp_data_receive_workfn+0x194/0x270
CPU: 1 PID: 31 Comm: kworker/u5:0 Not tainted
4.15.0-rc8-00076-g90ea9f1 #2
Workqueue: xprtiod xs_udp_data_receive_workfn
Call Trace:
dump_stack+0x85/0xc1
check_preemption_disabled+0xce/0xe0
xs_udp_data_receive_workfn+0x194/0x270
process_one_work+0x318/0x620
worker_thread+0x20a/0x390
? process_one_work+0x620/0x620
kthread+0x120/0x130
? __kthread_bind_mask+0x60/0x60
ret_from_fork+0x24/0x30
Since we're taking a spinlock in those functions anyway, let's fix the
issue by moving the call so that it occurs under the spinlock.
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tomi Valkeinen [Fri, 9 Feb 2018 13:43:49 +0000 (14:43 +0100)]
video: omapfb: fix missing #includes
The omapfb driver fails to build after commit
23c35f48f5fb
("pinctrl: remove include file from <linux/device.h>") because it
relies on the <linux/pinctrl/consumer.h> and <linux/seq_file.h>
being pulled in by the <linux/device.h> header implicitly.
Include these headers explicitly to avoid the build failures.
Fixes: 23c35f48f5fb ("pinctrl: remove include file from <linux/device.h>")
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Tested-by: Tony Lindgren <tony@atomide.com>
[b.zolnierkie: fix include order and patch description]
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Jose Ricardo Ziviani [Sat, 3 Feb 2018 20:24:26 +0000 (18:24 -0200)]
KVM: PPC: Book3S: Add MMIO emulation for VMX instructions
This patch provides the MMIO load/store vector indexed
X-Form emulation.
Instructions implemented:
lvx: the quadword in storage addressed by the result of EA &
0xffff_ffff_ffff_fff0 is loaded into VRT.
stvx: the contents of VRS are stored into the quadword in storage
addressed by the result of EA & 0xffff_ffff_ffff_fff0.
Reported-by: Gopesh Kumar Chaudhary <gopchaud@in.ibm.com>
Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Alexander Graf [Thu, 8 Feb 2018 17:38:53 +0000 (18:38 +0100)]
KVM: PPC: Book3S HV: Branch inside feature section
We ended up with code that did a conditional branch inside a feature
section to code outside of the feature section. Depending on how the
object file gets organized, that might mean we exceed the 14bit
relocation limit for conditional branches:
arch/powerpc/kvm/built-in.o:arch/powerpc/kvm/book3s_hv_rmhandlers.S:416:(__ftr_alt_97+0x8): relocation truncated to fit: R_PPC64_REL14 against `.text'+1ca4
So instead of doing a conditional branch outside of the feature section,
let's just jump at the end of the same, making the branch very short.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
David Gibson [Fri, 2 Feb 2018 03:29:08 +0000 (14:29 +1100)]
KVM: PPC: Book3S HV: Make HPT resizing work on POWER9
This adds code to enable the HPT resizing code to work on POWER9,
which uses a slightly modified HPT entry format compared to POWER8.
On POWER9, we convert HPTEs read from the HPT from the new format to
the old format so that the rest of the HPT resizing code can work as
before. HPTEs written to the new HPT are converted to the new format
as the last step before writing them into the new HPT.
This takes out the checks added by commit
bcd3bb63dbc8 ("KVM: PPC:
Book3S HV: Disable HPT resizing on POWER9 for now", 2017-02-18),
now that HPT resizing works on POWER9.
On POWER9, when we pivot to the new HPT, we now call
kvmppc_setup_partition_table() to update the partition table in order
to make the hardware use the new HPT.
[paulus@ozlabs.org - added kvmppc_setup_partition_table() call,
wrote commit message.]
Tested-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Paul Mackerras [Wed, 7 Feb 2018 08:49:54 +0000 (19:49 +1100)]
KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing code
This fixes the computation of the HPTE index to use when the HPT
resizing code encounters a bolted HPTE which is stored in its
secondary HPTE group. The code inverts the HPTE group number, which
is correct, but doesn't then mask it with new_hash_mask. As a result,
new_pteg will be effectively negative, resulting in new_hptep
pointing before the new HPT, which will corrupt memory.
In addition, this removes two BUG_ON statements. The condition that
the BUG_ONs were testing -- that we have computed the hash value
incorrectly -- has never been observed in testing, and if it did
occur, would only affect the guest, not the host. Given that
BUG_ON should only be used in conditions where the kernel (i.e.
the host kernel, in this case) can't possibly continue execution,
it is not appropriate here.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Vadim Pasternak [Fri, 2 Feb 2018 08:45:47 +0000 (08:45 +0000)]
platform/x86: mlx-platform: Fix power cable setting for msn21xx family
Add dedicated structure with power cable setting for Mellanox msn21xx
family. These systems do not have a physical device for the power unit
controller. When the power cable is inserted or removed, the relevant
interrupt signal is handled, the status is updated, but no device is
associated with the signal.
Add definition for interrupt low aggregation signal. On system from
msn21xx family, low aggregation mask should be removed in order to allow
signal to hit CPU.
Fixes: 6613d18e9038 ("platform/x86: mlx-platform: Move module from arch/x86")
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
Vadim Pasternak [Fri, 2 Feb 2018 08:45:46 +0000 (08:45 +0000)]
platform/x86: mlx-platform: Add define for the negative bus
Add define for the negative bus ID in order to use it in case no hotplug
device is associated with the hotplug interrupt signal. In this case,
the signal will be handled by the mlxreg-hotplug driver, but no device
will be associated with the signal.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
Vadim Pasternak [Fri, 2 Feb 2018 08:45:45 +0000 (08:45 +0000)]
platform/x86: mlx-platform: Use defines for bus assignment
Add defines for the bus IDs, used for hotplug device topology to improve
code readability. Defines added for FAN and power units.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
Geert Uytterhoeven [Wed, 7 Feb 2018 09:12:04 +0000 (10:12 +0100)]
platform/mellanox: mlxreg-hotplug: Fix uninitialized variable
With gcc-4.1.2:
drivers/platform/mellanox/mlxreg-hotplug.c: In function ‘mlxreg_hotplug_health_work_helper’:
drivers/platform/mellanox/mlxreg-hotplug.c:347: warning: ‘ret’ is used uninitialized in this function
Indeed, if mlxreg_core_item.count is zero, ret is used uninitialized.
While this is unlikely to happen (it is set to ARRAY_SIZE(...) in x86
board files), this is done in another source file, so fix this by
preinitializing ret to zero.
Fixes: c6acad68eb2dbffd ("platform/mellanox: mlxreg-hotplug: Modify to use a regmap interface")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
Daniel Borkmann [Thu, 8 Feb 2018 23:26:18 +0000 (00:26 +0100)]
Merge branch 'bpf-libbpf-relo-fix-and-tests'
Jesper Dangaard Brouer says:
====================
While playing with using libbpf for the Suricata project, we had
issues LLVM >= 4.0.1 generating ELF files that could not be loaded
with libbpf (tools/lib/bpf/).
During the troubleshooting phase, I wrote a test program and improved
the debugging output in libbpf. I turned this into a selftests
program, and it also serves as a code example for libbpf in itself.
I discovered that there are at least three ELF load issues with
libbpf. I left them as TODO comments in (tools/testing/selftests/bpf)
test_libbpf.sh. I've only fixed the load issue with eh_frames, and
other types of relo-section that does not have exec flags. We can
work on the other issues later.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jesper Dangaard Brouer [Thu, 8 Feb 2018 11:48:32 +0000 (12:48 +0100)]
tools/libbpf: handle issues with bpf ELF objects containing .eh_frames
V3: More generic skipping of relo-section (suggested by Daniel)
If clang >= 4.0.1 is missing the option '-target bpf', it will cause
llc/llvm to create two ELF sections for "Exception Frames", with
section names '.eh_frame' and '.rel.eh_frame'.
The BPF ELF loader library libbpf fails when loading files with these
sections. The other in-kernel BPF ELF loader in samples/bpf/bpf_load.c,
handle this gracefully. And iproute2 loader also seems to work with these
"eh" sections.
The issue in libbpf is caused by bpf_object__elf_collect() skipping
some sections, and later when performing relocation it will be
pointing to a skipped section, as these sections cannot be found by
bpf_object__find_prog_by_idx() in bpf_object__collect_reloc().
This is a general issue that also occurs for other sections, like
debug sections which are also skipped and can have relo section.
As suggested by Daniel. To avoid keeping state about all skipped
sections, instead perform a direct qlookup in the ELF object. Lookup
the section that the relo-section points to and check if it contains
executable machine instructions (denoted by the sh_flags
SHF_EXECINSTR). Use this check to also skip irrelevant relo-sections.
Note, for samples/bpf/ the '-target bpf' parameter to clang cannot be used
due to incompatibility with asm embedded headers, that some of the samples
include. This is explained in more details by Yonghong Song in bpf_devel_QA.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jesper Dangaard Brouer [Thu, 8 Feb 2018 11:48:27 +0000 (12:48 +0100)]
selftests/bpf: add selftest that use test_libbpf_open
This script test_libbpf.sh will be part of the 'make run_tests'
invocation, but can also be invoked manually in this directory,
and a verbose mode can be enabled via setting the environment
variable $VERBOSE like:
$ VERBOSE=yes ./test_libbpf.sh
The script contains some tests that are commented out, as they
currently fail. They are reminders about what we need to improve
for the libbpf loader library.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jesper Dangaard Brouer [Thu, 8 Feb 2018 11:48:22 +0000 (12:48 +0100)]
selftests/bpf: add test program for loading BPF ELF files
V2: Moved program into selftests/bpf from tools/libbpf
This program can be used on its own for testing/debugging if a
BPF ELF-object file can be loaded with libbpf (from tools/lib/bpf).
If something is wrong with the ELF object, the program have
a --debug mode that will display the ELF sections and especially
the skipped sections. This allows for quickly identifying the
problematic ELF section number, which can be corrolated with the
readelf tool.
The program signal error via return codes, and also have
a --quiet mode, which is practical for use in scripts like
selftests/bpf.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jesper Dangaard Brouer [Thu, 8 Feb 2018 11:48:17 +0000 (12:48 +0100)]
tools/libbpf: improve the pr_debug statements to contain section numbers
While debugging a bpf ELF loading issue, I needed to correlate the
ELF section number with the failed relocation section reference.
Thus, add section numbers/index to the pr_debug.
In debug mode, also print section that were skipped. This helped
me identify that a section (.eh_frame) was skipped, and this was
the reason the relocation section (.rel.eh_frame) could not find
that section number.
The section numbers corresponds to the readelf tools Section Headers [Nr].
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jesper Dangaard Brouer [Thu, 8 Feb 2018 11:48:12 +0000 (12:48 +0100)]
bpf: Sync kernel ABI header with tooling header for bpf_common.h
I recently fixed up a lot of commits that forgot to keep the tooling
headers in sync. And then I forgot to do the same thing in commit
cb5f7334d479 ("bpf: add comments to BPF ld/ldx sizes"). Let correct
that before people notice ;-).
Lawrence did partly fix/sync this for bpf.h in commit
d6d4f60c3a09
("bpf: add selftest for tcpbpf").
Fixes: cb5f7334d479 ("bpf: add comments to BPF ld/ldx sizes")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Linus Torvalds [Thu, 8 Feb 2018 23:18:32 +0000 (15:18 -0800)]
Merge tag 'nfsd-4.16' of git://linux-nfs.org/~bfields/linux
Pull nfsd update from Bruce Fields:
"A fairly small update this time around. Some cleanup, RDMA fixes,
overlayfs fixes, and a fix for an NFSv4 state bug.
The bigger deal for nfsd this time around was Jeff Layton's
already-merged i_version patches"
* tag 'nfsd-4.16' of git://linux-nfs.org/~bfields/linux:
svcrdma: Fix Read chunk round-up
NFSD: hide unused svcxdr_dupstr()
nfsd: store stat times in fill_pre_wcc() instead of inode times
nfsd: encode stat->mtime for getattr instead of inode->i_mtime
nfsd: return RESOURCE not GARBAGE_ARGS on too many ops
nfsd4: don't set lock stateid's sc_type to CLOSED
nfsd: Detect unhashed stids in nfsd4_verify_open_stid()
sunrpc: remove dead code in svc_sock_setbufsize
svcrdma: Post Receives in the Receive completion handler
nfsd4: permit layoutget of executable-only files
lockd: convert nlm_rqst.a_count from atomic_t to refcount_t
lockd: convert nlm_lockowner.count from atomic_t to refcount_t
lockd: convert nsm_handle.sm_count from atomic_t to refcount_t
Linus Torvalds [Thu, 8 Feb 2018 22:39:29 +0000 (14:39 -0800)]
Merge branch 'idr-2018-02-06' of git://git.infradead.org/users/willy/linux-dax
Pull idr updates from Matthew Wilcox:
- test-suite improvements
- replace the extended API by improving the normal API
- performance improvement for IDRs which are 1-based rather than
0-based
- add documentation
* 'idr-2018-02-06' of git://git.infradead.org/users/willy/linux-dax:
idr: Add documentation
idr: Make 1-based IDRs more efficient
idr: Warn if old iterators see large IDs
idr: Rename idr_for_each_entry_ext
idr: Remove idr_alloc_ext
cls_u32: Convert to idr_alloc_u32
cls_u32: Reinstate cyclic allocation
cls_flower: Convert to idr_alloc_u32
cls_bpf: Convert to use idr_alloc_u32
cls_basic: Convert to use idr_alloc_u32
cls_api: Convert to idr_alloc_u32
net sched actions: Convert to use idr_alloc_u32
idr: Add idr_alloc_u32 helper
idr: Delete idr_find_ext function
idr: Delete idr_replace_ext function
idr: Delete idr_remove_ext function
IDR test suite: Check handling negative end correctly
idr test suite: Fix ida_test_random()
radix tree test suite: Remove ARRAY_SIZE
Linus Torvalds [Thu, 8 Feb 2018 22:37:32 +0000 (14:37 -0800)]
Merge tag 'gcc-plugins-v4.16-rc1' of git://git./linux/kernel/git/kees/linux
Pull gcc plugins updates from Kees Cook:
- update includes for gcc 8 (Valdis Kletnieks)
- update initializers for gcc 8
* tag 'gcc-plugins-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
gcc-plugins: Use dynamic initializers
gcc-plugins: Add include required by GCC release 8
Olga Kornievskaia [Thu, 29 Jun 2017 13:25:36 +0000 (09:25 -0400)]
fix parallelism for rpc tasks
Hi folks,
On a multi-core machine, is it expected that we can have parallel RPCs
handled by each of the per-core workqueue?
In testing a read workload, observing via "top" command that a single
"kworker" thread is running servicing the requests (no parallelism).
It's more prominent while doing these operations over krb5p mount.
What has been suggested by Bruce is to try this and in my testing I
see then the read workload spread among all the kworker threads.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Heiner Kallweit [Thu, 8 Feb 2018 20:01:48 +0000 (21:01 +0100)]
net: phy: fix phy_start to consider PHY_IGNORE_INTERRUPT
This condition wasn't adjusted when PHY_IGNORE_INTERRUPT (-2) was added
long ago. In case of PHY_IGNORE_INTERRUPT the MAC interrupt indicates
also PHY state changes and we should do what the symbol says.
Fixes: 84a527a41f38 ("net: phylib: fix interrupts re-enablement in phy_start")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>