Linus Torvalds [Sun, 17 Feb 2019 01:33:39 +0000 (17:33 -0800)]
Merge tag 'nfs-for-5.0-4' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull more NFS client fixes from Anna Schumaker:
"Three fixes this time.
Nicolas's is for xprtrdma completion vector allocation on single-core
systems. Greg's adds an error check when allocating a debugfs dentry.
And Ben's is an additional fix for nfs_page_async_flush() to prevent
pages from accidentally getting truncated.
Summary:
- Make sure Send CQ is allocated on an existing compvec
- Properly check debugfs dentry before using it
- Don't use page_file_mapping() after removing a page"
* tag 'nfs-for-5.0-4' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS: Don't use page_file_mapping after removing the page
rpc: properly check debugfs dentry before using it
xprtrdma: Make sure Send CQ is allocated on an existing compvec
Linus Torvalds [Sun, 17 Feb 2019 01:31:36 +0000 (17:31 -0800)]
Merge tag 'auxdisplay-for-linus-v5.0-rc7' of git://github.com/ojeda/linux
Pull auxdisplay fix from Miguel Ojeda:
"Fix potential user-after-free on ht16k33 module unload. Reported by
Sven Van Asbroeck"
* tag 'auxdisplay-for-linus-v5.0-rc7' of git://github.com/ojeda/linux:
auxdisplay: ht16k33: fix potential user-after-free on module unload
Linus Torvalds [Sat, 16 Feb 2019 18:28:05 +0000 (10:28 -0800)]
Merge tag 'compiler-attributes-for-linus-v5.0-rc7' of git://github.com/ojeda/linux
Pull compiler attributes fixes from Miguel Ojeda:
"Clean the new GCC 9 -Wmissing-attributes warnings
The upcoming GCC 9 release extends the -Wmissing-attributes warnings
(enabled by -Wall) to C and aliases: it warns when particular function
attributes are missing in the aliases but not in their target, e.g.:
void __cold f(void) {}
void __alias("f") g(void);
diagnoses:
warning: 'g' specifies less restrictive attribute than
its target 'f': 'cold' [-Wmissing-attributes]
These patch series clean these new warnings. Most of them are caused
by the module_init/exit macros"
Link: https://lore.kernel.org/lkml/20190125104353.2791-1-labbott@redhat.com/
* tag 'compiler-attributes-for-linus-v5.0-rc7' of git://github.com/ojeda/linux:
include/linux/module.h: copy __init/__exit attrs to init/cleanup_module
Compiler Attributes: add support for __copy (gcc >= 9)
lib/crc32.c: mark crc32_le_base/__crc32c_le_base aliases as __pure
Linus Torvalds [Fri, 15 Feb 2019 21:36:43 +0000 (13:36 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two fairly small fixes: the qla one is a panic inducing use after free
and the entropy fix may seem minor but it has had huge userspace
impact thanks to an unrelated change in openssl that causes sshd to
refuse logins until it has enough entropy for the session keys, which
causes tens of minutes delay before the affected systems allow logins
after reboot"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla2xxx: Fix panic from use after free in qla2x00_async_tm_cmd
scsi: sd: fix entropy gathering for most rotational disks
Miguel Ojeda [Sat, 19 Jan 2019 19:59:34 +0000 (20:59 +0100)]
include/linux/module.h: copy __init/__exit attrs to init/cleanup_module
The upcoming GCC 9 release extends the -Wmissing-attributes warnings
(enabled by -Wall) to C and aliases: it warns when particular function
attributes are missing in the aliases but not in their target.
In particular, it triggers for all the init/cleanup_module
aliases in the kernel (defined by the module_init/exit macros),
ending up being very noisy.
These aliases point to the __init/__exit functions of a module,
which are defined as __cold (among other attributes). However,
the aliases themselves do not have the __cold attribute.
Since the compiler behaves differently when compiling a __cold
function as well as when compiling paths leading to calls
to __cold functions, the warning is trying to point out
the possibly-forgotten attribute in the alias.
In order to keep the warning enabled, we decided to silence
this case. Ideally, we would mark the aliases directly
as __init/__exit. However, there are currently around 132 modules
in the kernel which are missing __init/__exit in their init/cleanup
functions (either because they are missing, or for other reasons,
e.g. the functions being called from somewhere else); and
a section mismatch is a hard error.
A conservative alternative was to mark the aliases as __cold only.
However, since we would like to eventually enforce __init/__exit
to be always marked, we chose to use the new __copy function
attribute (introduced by GCC 9 as well to deal with this).
With it, we copy the attributes used by the target functions
into the aliases. This way, functions that were not marked
as __init/__exit won't have their aliases marked either,
and therefore there won't be a section mismatch.
Note that the warning would go away marking either the extern
declaration, the definition, or both. However, we only mark
the definition of the alias, since we do not want callers
(which only see the declaration) to be compiled as if the function
was __cold (and therefore the paths leading to those calls
would be assumed to be unlikely).
Link: https://lore.kernel.org/lkml/20190123173707.GA16603@gmail.com/
Link: https://lore.kernel.org/lkml/20190206175627.GA20399@gmail.com/
Suggested-by: Martin Sebor <msebor@gcc.gnu.org>
Acked-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Miguel Ojeda [Fri, 8 Feb 2019 22:51:05 +0000 (23:51 +0100)]
Compiler Attributes: add support for __copy (gcc >= 9)
From the GCC manual:
copy
copy(function)
The copy attribute applies the set of attributes with which function
has been declared to the declaration of the function to which
the attribute is applied. The attribute is designed for libraries
that define aliases or function resolvers that are expected
to specify the same set of attributes as their targets. The copy
attribute can be used with functions, variables, or types. However,
the kind of symbol to which the attribute is applied (either
function or variable) must match the kind of symbol to which
the argument refers. The copy attribute copies only syntactic and
semantic attributes but not attributes that affect a symbol’s
linkage or visibility such as alias, visibility, or weak.
The deprecated attribute is also not copied.
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html
The upcoming GCC 9 release extends the -Wmissing-attributes warnings
(enabled by -Wall) to C and aliases: it warns when particular function
attributes are missing in the aliases but not in their target, e.g.:
void __cold f(void) {}
void __alias("f") g(void);
diagnoses:
warning: 'g' specifies less restrictive attribute than
its target 'f': 'cold' [-Wmissing-attributes]
Using __copy(f) we can copy the __cold attribute from f to g:
void __cold f(void) {}
void __copy(f) __alias("f") g(void);
This attribute is most useful to deal with situations where an alias
is declared but we don't know the exact attributes the target has.
For instance, in the kernel, the widely used module_init/exit macros
define the init/cleanup_module aliases, but those cannot be marked
always as __init/__exit since some modules do not have their
functions marked as such.
Suggested-by: Martin Sebor <msebor@gcc.gnu.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Miguel Ojeda [Thu, 24 Jan 2019 14:59:11 +0000 (15:59 +0100)]
lib/crc32.c: mark crc32_le_base/__crc32c_le_base aliases as __pure
The upcoming GCC 9 release extends the -Wmissing-attributes warnings
(enabled by -Wall) to C and aliases: it warns when particular function
attributes are missing in the aliases but not in their target.
In particular, it triggers here because crc32_le_base/__crc32c_le_base
aren't __pure while their target crc32_le/__crc32c_le are.
These aliases are used by architectures as a fallback in accelerated
versions of CRC32. See commit
9784d82db3eb ("lib/crc32: make core crc32()
routines weak so they can be overridden").
Therefore, being fallbacks, it is likely that even if the aliases
were called from C, there wouldn't be any optimizations possible.
Currently, the only user is arm64, which calls this from asm.
Still, marking the aliases as __pure makes sense and is a good idea
for documentation purposes and possible future optimizations,
which also silences the warning.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Miguel Ojeda [Fri, 8 Feb 2019 23:38:45 +0000 (00:38 +0100)]
auxdisplay: ht16k33: fix potential user-after-free on module unload
On module unload/remove, we need to ensure that work does not run
after we have freed resources. Concretely, cancel_delayed_work()
may return while the callback function is still running.
From kernel/workqueue.c:
The work callback function may still be running on return,
unless it returns true and the work doesn't re-arm itself.
Explicitly flush or use cancel_delayed_work_sync() to wait on it.
Link: https://lore.kernel.org/lkml/20190204220952.30761-1-TheSven73@googlemail.com/
Reported-by: Sven Van Asbroeck <thesven73@gmail.com>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Sven Van Asbroeck <TheSven73@gmail.com>
Acked-by: Robin van der Gracht <robin@protonic.nl>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Linus Torvalds [Fri, 15 Feb 2019 17:12:28 +0000 (09:12 -0800)]
Merge tag 'for-linus-
20190215' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Ensure we insert into the hctx dispatch list, if a request is marked
as DONTPREP (Jianchao)
- NVMe pull request, single missing unlock on error fix (Keith)
- MD pull request, single fix for a potentially data corrupting issue
(Nate)
- Floppy check_events regression fix (Yufen)
* tag 'for-linus-
20190215' of git://git.kernel.dk/linux-block:
md/raid1: don't clear bitmap bits on interrupted recovery.
floppy: check_events callback should not return a negative number
nvme-pci: add missing unlock for reset error
blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue
Linus Torvalds [Fri, 15 Feb 2019 16:50:48 +0000 (08:50 -0800)]
Merge tag 'for-5.0/dm-fixes-3' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix bug in DM crypt's sizing of its block integrity tag space,
resulting in less memory use when DM crypt layers on DM integrity.
- Fix a long-standing DM thinp crash consistency bug that was due to
improper handling of FUA. This issue is specific to writes that fill
an entire thinp block which needs to be allocated.
* tag 'for-5.0/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm thin: fix bug where bio that overwrites thin block ignores FUA
dm crypt: don't overallocate the integrity tag space
Linus Torvalds [Fri, 15 Feb 2019 16:45:28 +0000 (08:45 -0800)]
Merge tag 'mmc-v5.0-rc5' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"A couple of MMC fixes intended for v5.0-rc7.
MMC core:
- Fix deadlock bug for block I/O requests
MMC host:
- sunxi: Disable broken HS-DDR mode for H5 by default
- sunxi: Avoid unsupported speed modes declared via DT
- meson-gx: Restore interrupt name"
* tag 'mmc-v5.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: meson-gx: fix interrupt name
mmc: block: handle complete_work on separate workqueue
mmc: sunxi: Filter out unsupported modes declared in the device tree
mmc: sunxi: Disable HS-DDR mode for H5 eMMC controller by default
Linus Torvalds [Fri, 15 Feb 2019 16:20:33 +0000 (08:20 -0800)]
Merge tag 'drm-fixes-2019-02-15-1' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Usual pull request, little larger than I'd like but nothing too
strange in it. Willy found an bug in the lease ioctl calculations, but
it's a drm master only ioctl which makes it harder to mess with.
i915:
- combo phy programming fix
- opregion version check fix for VBT RVDA lookup
- gem mmap ioctl race fix
- fbdev hpd during suspend fix
- array size bounds check fix in pmu
amdgpu:
- Vega20 psp fix
- Add vrr range to debugfs for freesync debugging
sched:
- Scheduler race fix
vkms:
- license header fixups
imx:
- Fix CSI register offsets for i.MX51 and i.MX53.
- Fix delayed page flip completion events on i.MX6QP due to
unexpected behaviour of the PRE when issuing NOP buffer updates to
the same buffer address.
- Stop throwing errors for plane updates on disabled CRTCs when a
userspace process is killed while a plane update is pending.
- Add missing of_node_put cleanup in imx_ldb_bind"
* tag 'drm-fixes-2019-02-15-1' of git://anongit.freedesktop.org/drm/drm:
drm: Use array_size() when creating lease
drm/amdgpu/psp11: TA firmware is optional (v3)
drm/i915/opregion: rvda is relative from opregion base in opregion 2.1+
drm/i915/opregion: fix version check
drm/i915: Prevent a race during I915_GEM_MMAP ioctl with WC set
drm/i915: Block fbdev HPD processing during suspend
drm/i915/pmu: Fix enable count array size and bounds checking
drm/i915/cnl: Fix CNL macros for Voltage Swing programming
drm/i915/icl: combo port vswing programming changes per BSPEC
drm/vkms: Fix license inconsistent
drm/amd/display: Expose connector VRR range via debugfs
drm/sched: Always trace the dependencies we wait on, to fix a race.
gpu: ipu-v3: pre: don't trigger update if buffer address doesn't change
gpu: ipu-v3: Fix CSI offsets for imx53
drm/imx: imx-ldb: add missing of_node_puts
gpu: ipu-v3: Fix i.MX51 CSI control registers offset
drm/imx: ignore plane updates on disabled crtcs
Linus Torvalds [Fri, 15 Feb 2019 16:11:43 +0000 (08:11 -0800)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"This fixes a crash on resume in the ccree driver"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: ccree - fix resume race condition on init
Linus Torvalds [Fri, 15 Feb 2019 16:00:11 +0000 (08:00 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix MAC address setting in mac80211 pmsr code, from Johannes Berg.
2) Probe SFP modules after being attached, from Russell King.
3) Byte ordering bug in SMC rx_curs_confirmed code, from Ursula Braun.
4) Revert some r8169 changes that are causing regressions, from Heiner
Kallweit.
5) Fix spurious connection timeouts in netfilter nat code, from Florian
Westphal.
6) SKB leak in tipc, from Hoang Le.
7) Short packet checkum issue in mlx4, similar to a previous mlx5
change, from Saeed Mahameed. The issue is that whilst padding bytes
are usually zero, it is not guarateed and the hardware doesn't take
the padding bytes into consideration when generating the checksum.
8) Fix various races in cls_tcindex, from Cong Wang.
9) Need to set stream ext to NULL before freeing in SCTP code, from Xin
Long.
10) Fix locking in phy_is_started, from Heiner Kallweit.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits)
net: ethernet: freescale: set FEC ethtool regs version
net: hns: Fix object reference leaks in hns_dsaf_roce_reset()
mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
net: phy: fix potential race in the phylib state machine
net: phy: don't use locking in phy_is_started
selftests: fix timestamping Makefile
net: dsa: bcm_sf2: potential array overflow in bcm_sf2_sw_suspend()
net: fix possible overflow in __sk_mem_raise_allocated()
dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
net: phy: fix interrupt handling in non-started states
sctp: set stream ext to NULL after freeing it in sctp_stream_outq_migrate
sctp: call gso_reset_checksum when computing checksum in sctp_gso_segment
net/mlx5e: XDP, fix redirect resources availability check
net/mlx5: Fix a compilation warning in events.c
net/mlx5: No command allowed when command interface is not ready
net/mlx5e: Fix NULL pointer derefernce in set channels error flow
netfilter: nft_compat: use-after-free when deleting targets
team: avoid complex list operations in team_nl_cmd_options_set()
net_sched: fix two more memory leaks in cls_tcindex
net_sched: fix a memory leak in cls_tcindex
...
Linus Torvalds [Fri, 15 Feb 2019 15:56:24 +0000 (07:56 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ebiederm/user-namespace
Pull signal fix from Eric Biederman:
"Just a single patch that restores PTRACE_EVENT_EXIT functionality that
was accidentally broken by last weeks fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
signal: Restore the stop PTRACE_EVENT_EXIT
Matthew Wilcox [Thu, 14 Feb 2019 19:03:48 +0000 (11:03 -0800)]
drm: Use array_size() when creating lease
Passing an object_count of sufficient size will make
object_count * 4 wrap around to be very small, then a later function
will happily iterate off the end of the object_ids array. Using
array_size() will saturate at SIZE_MAX, the kmalloc() will fail and
we'll return an -ENOMEM to the norty userspace.
Fixes: 62884cd386b8 ("drm: Add four ioctls for managing drm mode object leases [v7]")
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: <stable@vger.kernel.org> # v4.15+
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 15 Feb 2019 01:46:40 +0000 (11:46 +1000)]
Merge branch 'drm-fixes-5.0' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
amdgpu:
- Vega20 psp fix
- Add vrr range to debugfs for freesync debugging
sched:
- Scheduler race fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190213202958.3336-1-alexander.deucher@amd.com
Dave Airlie [Fri, 15 Feb 2019 01:22:37 +0000 (11:22 +1000)]
Merge tag 'drm-intel-fixes-2019-02-13' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v5.0-rc7:
- combo phy programming fix
- opregion version check fix for VBT RVDA lookup
- gem mmap ioctl race fix
- fbdev hpd during suspend fix
- array size bounds check fix in pmu
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/877ee3504b.fsf@intel.com
Dave Airlie [Fri, 15 Feb 2019 01:21:46 +0000 (11:21 +1000)]
Merge tag 'drm-misc-fixes-2019-02-13' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v5.0:
- Fix license inconsistency in vkms.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/812e2f53-d72a-8fba-6c8c-fde8f44cf141@linux.intel.com
Nikos Tsironis [Thu, 14 Feb 2019 18:38:47 +0000 (20:38 +0200)]
dm thin: fix bug where bio that overwrites thin block ignores FUA
When provisioning a new data block for a virtual block, either because
the block was previously unallocated or because we are breaking sharing,
if the whole block of data is being overwritten the bio that triggered
the provisioning is issued immediately, skipping copying or zeroing of
the data block.
When this bio completes the new mapping is inserted in to the pool's
metadata by process_prepared_mapping(), where the bio completion is
signaled to the upper layers.
This completion is signaled without first committing the metadata. If
the bio in question has the REQ_FUA flag set and the system crashes
right after its completion and before the next metadata commit, then the
write is lost despite the REQ_FUA flag requiring that I/O completion for
this request must only be signaled after the data has been committed to
non-volatile storage.
Fix this by deferring the completion of overwrite bios, with the REQ_FUA
flag set, until after the metadata has been committed.
Cc: stable@vger.kernel.org
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Linus Torvalds [Thu, 14 Feb 2019 23:02:18 +0000 (15:02 -0800)]
Revert "exec: load_script: don't blindly truncate shebang string"
This reverts commit
8099b047ecc431518b9bb6bdbba3549bbecdc343.
It turns out that people do actually depend on the shebang string being
truncated, and on the fact that an interpreter (like perl) will often
just re-interpret it entirely to get the full argument list.
Reported-by: Samuel Dionne-Riel <samuel@dionne-riel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bob Peterson [Wed, 13 Feb 2019 20:12:17 +0000 (15:12 -0500)]
Revert "gfs2: read journal in large chunks to locate the head"
This reverts commit
2a5f14f279f59143139bcd1606903f2f80a34241.
This patch causes xfstests generic/311 to fail. Reverting this for
now until we have a proper fix.
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vivien Didelot [Thu, 14 Feb 2019 16:15:35 +0000 (11:15 -0500)]
net: ethernet: freescale: set FEC ethtool regs version
Currently the ethtool_regs version is set to 0 for FEC devices.
Use this field to store the register dump version exposed by the
kernel. The choosen version 2 corresponds to the kernel compile test:
#if defined(CONFIG_M523x) || defined(CONFIG_M527x)
|| defined(CONFIG_M528x) || defined(CONFIG_M520x)
|| defined(CONFIG_M532x) || defined(CONFIG_ARM)
|| defined(CONFIG_ARM64) || defined(CONFIG_COMPILE_TEST)
and version 1 corresponds to the opposite. Binaries of ethtool unaware
of this version will dump the whole set as usual.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huang Zijiang [Thu, 14 Feb 2019 06:41:45 +0000 (14:41 +0800)]
net: hns: Fix object reference leaks in hns_dsaf_roce_reset()
The of_find_device_by_node() takes a reference to the underlying device
structure, we should release that reference.
Signed-off-by: Huang Zijiang <huang.zijiang@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jann Horn [Wed, 13 Feb 2019 21:45:59 +0000 (22:45 +0100)]
mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
The basic idea behind ->pagecnt_bias is: If we pre-allocate the maximum
number of references that we might need to create in the fastpath later,
the bump-allocation fastpath only has to modify the non-atomic bias value
that tracks the number of extra references we hold instead of the atomic
refcount. The maximum number of allocations we can serve (under the
assumption that no allocation is made with size 0) is nc->size, so that's
the bias used.
However, even when all memory in the allocation has been given away, a
reference to the page is still held; and in the `offset < 0` slowpath, the
page may be reused if everyone else has dropped their references.
This means that the necessary number of references is actually
`nc->size+1`.
Luckily, from a quick grep, it looks like the only path that can call
page_frag_alloc(fragsz=1) is TAP with the IFF_NAPI_FRAGS flag, which
requires CAP_NET_ADMIN in the init namespace and is only intended to be
used for kernel testing and fuzzing.
To test for this issue, put a `WARN_ON(page_ref_count(page) == 0)` in the
`offset < 0` path, below the virt_to_page() call, and then repeatedly call
writev() on a TAP device with IFF_TAP|IFF_NO_PI|IFF_NAPI_FRAGS|IFF_NAPI,
with a vector consisting of 15 elements containing 1 byte each.
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 14 Feb 2019 17:04:55 +0000 (12:04 -0500)]
Merge branch 'net-phy-fix-locking-issue'
Heiner Kallweit says:
====================
net: phy: fix locking issue
Russell pointed out that the locking used in phy_is_started() isn't
needed and misleading. This locking also contributes to a race fixed
with patch 2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Wed, 13 Feb 2019 19:12:54 +0000 (20:12 +0100)]
net: phy: fix potential race in the phylib state machine
Russell reported the following race in the phylib state machine
(quoting from his mail):
if (phy_polling_mode(phydev) && phy_is_started(phydev))
phy_queue_state_machine(phydev, PHY_STATE_TIME);
state = PHY_UP
thread 0 thread 1
phy_disconnect()
+-phy_is_started()
phy_is_started() |
`-phy_stop()
+-phydev->state = PHY_HALTED
`-phy_stop_machine()
`-cancel_delayed_work_sync()
phy_queue_state_machine()
`-mod_delayed_work()
At this point, the phydev->state_queue() has been added back onto the
system workqueue despite phy_stop_machine() having been called and
cancel_delayed_work_sync() called on it.
Fix this by protecting the complete operation in thread 0.
Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking")
Reported-by: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Wed, 13 Feb 2019 19:11:40 +0000 (20:11 +0100)]
net: phy: don't use locking in phy_is_started
Russell suggested to remove the locking from phy_is_started() because
the read is atomic anyway and actually the locking may be more
misleading.
Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking")
Suggested-by: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Deepa Dinamani [Wed, 13 Feb 2019 17:09:13 +0000 (09:09 -0800)]
selftests: fix timestamping Makefile
The clean target in the makefile conflicts with the generic
kselftests lib.mk, and fails to properly remove the compiled
test programs.
Remove the redundant rule, the TEST_GEN_FILES will be already
removed by the CLEAN macro in lib.mk.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Shuah Khan <shuah@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Wed, 13 Feb 2019 08:23:04 +0000 (11:23 +0300)]
net: dsa: bcm_sf2: potential array overflow in bcm_sf2_sw_suspend()
The value of ->num_ports comes from bcm_sf2_sw_probe() and it is less
than or equal to DSA_MAX_PORTS. The ds->ports[] array is used inside
the dsa_is_user_port() and dsa_is_cpu_port() functions. The ds->ports[]
array is allocated in dsa_switch_alloc() and it has ds->num_ports
elements so this leads to a static checker warning about a potential out
of bounds read.
Fixes: 8cfa94984c9c ("net: dsa: bcm_sf2: add suspend/resume callbacks")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 12 Feb 2019 20:26:27 +0000 (12:26 -0800)]
net: fix possible overflow in __sk_mem_raise_allocated()
With many active TCP sockets, fat TCP sockets could fool
__sk_mem_raise_allocated() thanks to an overflow.
They would increase their share of the memory, instead
of decreasing it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John David Anglin [Mon, 11 Feb 2019 18:40:21 +0000 (13:40 -0500)]
dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
The GPIO interrupt controller on the espressobin board only supports edge interrupts.
If one enables the use of hardware interrupts in the device tree for the
88E6341, it is
possible to miss an edge. When this happens, the INTn pin on the Marvell switch is
stuck low and no further interrupts occur.
I found after adding debug statements to mv88e6xxx_g1_irq_thread_work() that there is
a race in handling device interrupts (e.g. PHY link interrupts). Some interrupts are
directly cleared by reading the Global 1 status register. However, the device interrupt
flag, for example, is not cleared until all the unmasked SERDES and PHY ports are serviced.
This is done by reading the relevant SERDES and PHY status register.
The code only services interrupts whose status bit is set at the time of reading its status
register. If an interrupt event occurs after its status is read and before all interrupts
are serviced, then this event will not be serviced and the INTn output pin will remain low.
This is not a problem with polling or level interrupts since the handler will be called
again to process the event. However, it's a big problem when using level interrupts.
The fix presented here is to add a loop around the code servicing switch interrupts. If
any pending interrupts remain after the current set has been handled, we loop and process
the new set. If there are no pending interrupts after servicing, we are sure that INTn has
gone high and we will get an edge when a new event occurs.
Tested on espressobin board.
Fixes: dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.")
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Tue, 12 Feb 2019 18:56:15 +0000 (19:56 +0100)]
net: phy: fix interrupt handling in non-started states
phylib enables interrupts before phy_start() has been called, and if
we receive an interrupt in a non-started state, the interrupt handler
returns IRQ_NONE. This causes problems with at least one Marvell chip
as reported by Andrew.
Fix this by handling interrupts the same as in phy_mac_interrupt(),
basically always running the phylib state machine. It knows when it
has to do something and when not.
This change allows to handle interrupts gracefully even if they
occur in a non-started state.
Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking")
Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Tue, 12 Feb 2019 10:51:01 +0000 (18:51 +0800)]
sctp: set stream ext to NULL after freeing it in sctp_stream_outq_migrate
In sctp_stream_init(), after sctp_stream_outq_migrate() freed the
surplus streams' ext, but sctp_stream_alloc_out() returns -ENOMEM,
stream->outcnt will not be set to 'outcnt'.
With the bigger value on stream->outcnt, when closing the assoc and
freeing its streams, the ext of those surplus streams will be freed
again since those stream exts were not set to NULL after freeing in
sctp_stream_outq_migrate(). Then the invalid-free issue reported by
syzbot would be triggered.
We fix it by simply setting them to NULL after freeing.
Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations")
Reported-by: syzbot+58e480e7b28f2d890bfd@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Tue, 12 Feb 2019 10:47:30 +0000 (18:47 +0800)]
sctp: call gso_reset_checksum when computing checksum in sctp_gso_segment
Jianlin reported a panic when running sctp gso over gre over vlan device:
[ 84.772930] RIP: 0010:do_csum+0x6d/0x170
[ 84.790605] Call Trace:
[ 84.791054] csum_partial+0xd/0x20
[ 84.791657] gre_gso_segment+0x2c3/0x390
[ 84.792364] inet_gso_segment+0x161/0x3e0
[ 84.793071] skb_mac_gso_segment+0xb8/0x120
[ 84.793846] __skb_gso_segment+0x7e/0x180
[ 84.794581] validate_xmit_skb+0x141/0x2e0
[ 84.795297] __dev_queue_xmit+0x258/0x8f0
[ 84.795949] ? eth_header+0x26/0xc0
[ 84.796581] ip_finish_output2+0x196/0x430
[ 84.797295] ? skb_gso_validate_network_len+0x11/0x80
[ 84.798183] ? ip_finish_output+0x169/0x270
[ 84.798875] ip_output+0x6c/0xe0
[ 84.799413] ? ip_append_data.part.50+0xc0/0xc0
[ 84.800145] iptunnel_xmit+0x144/0x1c0
[ 84.800814] ip_tunnel_xmit+0x62d/0x930 [ip_tunnel]
[ 84.801699] gre_tap_xmit+0xac/0xf0 [ip_gre]
[ 84.802395] dev_hard_start_xmit+0xa5/0x210
[ 84.803086] sch_direct_xmit+0x14f/0x340
[ 84.803733] __dev_queue_xmit+0x799/0x8f0
[ 84.804472] ip_finish_output2+0x2e0/0x430
[ 84.805255] ? skb_gso_validate_network_len+0x11/0x80
[ 84.806154] ip_output+0x6c/0xe0
[ 84.806721] ? ip_append_data.part.50+0xc0/0xc0
[ 84.807516] sctp_packet_transmit+0x716/0xa10 [sctp]
[ 84.808337] sctp_outq_flush+0xd7/0x880 [sctp]
It was caused by SKB_GSO_CB(skb)->csum_start not set in sctp_gso_segment.
sctp_gso_segment() calls skb_segment() with 'feature | NETIF_F_HW_CSUM',
which causes SKB_GSO_CB(skb)->csum_start not to be set in skb_segment().
For TCP/UDP, when feature supports HW_CSUM, CHECKSUM_PARTIAL will be set
and gso_reset_checksum will be called to set SKB_GSO_CB(skb)->csum_start.
So SCTP should do the same as TCP/UDP, to call gso_reset_checksum() when
computing checksum in sctp_gso_segment.
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 14 Feb 2019 00:13:47 +0000 (19:13 -0500)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes for net:
1) Missing structure initialization in ebtables causes splat with
32-bit user level on a 64-bit kernel, from Francesco Ruggeri.
2) Missing dependency on nf_defrag in IPVS IPv6 codebase, from
Andrea Claudi.
3) Fix possible use-after-free from release path of target extensions.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 14 Feb 2019 00:07:47 +0000 (19:07 -0500)]
Merge tag 'mlx5-fixes-2019-02-13' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-02-13
This series introduces some fixes to mlx5 driver.
For more information please see tag log below.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed [Tue, 12 Feb 2019 00:27:02 +0000 (16:27 -0800)]
net/mlx5e: XDP, fix redirect resources availability check
Currently mlx5 driver creates xdp redirect hw queues unconditionally on
netdevice open, This is great until someone starts redirecting XDP traffic
via ndo_xdp_xmit on mlx5 device and changes the device configuration at
the same time, this might cause crashes, since the other device's napi
is not aware of the mlx5 state change (resources un-availability).
To fix this we must synchronize with other devices napi's on the system.
Added a new flag under mlx5e_priv to determine XDP TX resources are
available, set/clear it up when necessary and use synchronize_rcu()
when the flag is turned off, so other napi's are in-sync with it, before
we actually cleanup the hw resources.
The flag is tested prior to committing to transmit on mlx5e_xdp_xmit, and
it is sufficient to determine if it safe to transmit or not. The other
two internal flags (MLX5E_STATE_OPENED and MLX5E_SQ_STATE_ENABLED) become
unnecessary. Thus, they are removed from data path.
Fixes: 58b99ee3e3eb ("net/mlx5e: Add support for XDP_REDIRECT in device-out side")
Reported-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Tue, 22 Jan 2019 12:18:04 +0000 (14:18 +0200)]
net/mlx5: Fix a compilation warning in events.c
Eliminate the following compilation warning:
drivers/net/ethernet/mellanox/mlx5/core/events.c: warning: 'error_str'
may be used uninitialized in this function [-Wuninitialized]: => 238:3
Fixes: c2fb3db22d35 ("net/mlx5: Rework handling of port module events")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Mikhael Goikhman <migo@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Huy Nguyen [Thu, 7 Feb 2019 15:22:56 +0000 (09:22 -0600)]
net/mlx5: No command allowed when command interface is not ready
When EEH is injected and PCI bus stalls, mlx5's pci error detect
function is called to deactivate the command interface and tear down
the device. The issue is that there can be a thread that already
passed MLX5_DEVICE_STATE_INTERNAL_ERROR check, it will send the command
and stuck in the wait_func.
Solution:
Add function mlx5_cmd_flush to disable command interface and clear all
the pending commands. When device state is set to
MLX5_DEVICE_STATE_INTERNAL_ERROR, call mlx5_cmd_flush to ensure all
pending threads waiting for firmware commands completion are terminated.
Fixes: c1d4d2e92ad6 ("net/mlx5: Avoid calling sleeping function by the health poll thread")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Maria Pasechnik [Sun, 3 Feb 2019 15:55:09 +0000 (17:55 +0200)]
net/mlx5e: Fix NULL pointer derefernce in set channels error flow
New channels are applied to the priv channels only after they
are successfully opened. Then, the indirection table should be built
according to the new number of channels.
Currently, such build is preformed independently of whether the
channels opening is successful, and is not reverted on failure.
The bug is caused due to removal of rss params from channels struct
and moving it to priv struct. That change cause to independency between
channels and rss params.
This causes a crash on a later point, when accessing rqn of a non
existing channel.
This patch fixes it by moving the indirection table build right before
switching the priv channels to new channels struct, after the new set of
channels was successfully opened.
Fixes: bbeb53b8b2c9 ("net/mlx5e: Move RSS params to a dedicated struct")
Signed-off-by: Maria Pasechnik <mariap@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Linus Torvalds [Wed, 13 Feb 2019 18:28:17 +0000 (10:28 -0800)]
Merge tag 'trace-v5.0-rc4' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"This fixes kprobes/uprobes dynamic processing of strings, where it
processes the args but does not update the remaining length of the
buffer that the string arguments will be placed in. It constantly
passes in the total size of buffer used instead of passing in the
remaining size of the buffer used.
This could cause issues if the strings are larger than the max size of
an event which could cause the strings to be written beyond what was
reserved on the buffer"
* tag 'trace-v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: probeevent: Correctly update remaining space in dynamic area
Pablo Neira Ayuso [Wed, 13 Feb 2019 12:03:53 +0000 (13:03 +0100)]
netfilter: nft_compat: use-after-free when deleting targets
Fetch pointer to module before target object is released.
Fixes: 29e3880109e3 ("netfilter: nf_tables: fix use-after-free when deleting compat expressions")
Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Alex Deucher [Tue, 12 Feb 2019 14:54:31 +0000 (09:54 -0500)]
drm/amdgpu/psp11: TA firmware is optional (v3)
Don't warn or fail if it's missing.
v2: handle xgmi case more gracefully.
v3: handle older kernels properly
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Tested-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jens Axboe [Wed, 13 Feb 2019 14:32:30 +0000 (07:32 -0700)]
Merge branch 'nvme-5.0' of git://git.infradead.org/nvme into for-linus
Pull single NVMe fix from Christoph
* 'nvme-5.0' of git://git.infradead.org/nvme:
nvme-pci: add missing unlock for reset error
Eric W. Biederman [Tue, 12 Feb 2019 05:27:42 +0000 (23:27 -0600)]
signal: Restore the stop PTRACE_EVENT_EXIT
In the middle of do_exit() there is there is a call
"ptrace_event(PTRACE_EVENT_EXIT, code);" That call places the process
in TACKED_TRACED aka "(TASK_WAKEKILL | __TASK_TRACED)" and waits for
for the debugger to release the task or SIGKILL to be delivered.
Skipping past dequeue_signal when we know a fatal signal has already
been delivered resulted in SIGKILL remaining pending and
TIF_SIGPENDING remaining set. This in turn caused the
scheduler to not sleep in PTACE_EVENT_EXIT as it figured
a fatal signal was pending. This also caused ptrace_freeze_traced
in ptrace_check_attach to fail because it left a per thread
SIGKILL pending which is what fatal_signal_pending tests for.
This difference in signal state caused strace to report
strace: Exit of unknown pid NNNNN ignored
Therefore update the signal handling state like dequeue_signal
would when removing a per thread SIGKILL, by removing SIGKILL
from the per thread signal mask and clearing TIF_SIGPENDING.
Acked-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Ivan Delalande <colona@arista.com>
Cc: stable@vger.kernel.org
Fixes: 35634ffa1751 ("signal: Always notice exiting tasks")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Martin Blumenstingl [Sat, 9 Feb 2019 00:58:50 +0000 (01:58 +0100)]
mmc: meson-gx: fix interrupt name
Commit
bb364890323cca ("mmc: meson-gx: Free irq in release() callback")
changed the _probe code to use request_threaded_irq() instead of
devm_request_threaded_irq().
Unfortunately this removes a fallback for the interrupt name:
devm_request_threaded_irq() uses the device name as fallback if the
given IRQ name is NULL. request_threaded_irq() has no such fallback,
thus /proc/interrupts shows "(null)" instead.
Explicitly pass the dev_name() so we get the IRQ name shown in
/proc/interrupts again.
While here, also fix the indentation of the request_threaded_irq()
parameter list.
Fixes: bb364890323cca ("mmc: meson-gx: Free irq in release() callback")
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Bill Kuzeja [Tue, 12 Feb 2019 14:29:50 +0000 (09:29 -0500)]
scsi: qla2xxx: Fix panic from use after free in qla2x00_async_tm_cmd
In qla2x00_async_tm_cmd, we reference off sp after it has been freed. This
caused a panic on a system running a slub debug kernel. Since fcport is
passed in anyways, just use that instead.
Signed-off-by: Bill Kuzeja <william.kuzeja@stratus.com>
Acked-by: Giridhar Malavali <gmalavali@marvell.com>
Acked-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
James Bottomley [Tue, 12 Feb 2019 16:05:25 +0000 (08:05 -0800)]
scsi: sd: fix entropy gathering for most rotational disks
The problem is that the default for MQ is not to gather entropy, whereas
the default for the legacy queue was always to gather it. The original
attempt to fix entropy gathering for rotational disks under MQ added an
else branch in sd_read_block_characteristics(). Unfortunately, the entire
check isn't reached if the device has no characteristics VPD page. Since
this page was only introduced in SBC-3 and its optional anyway, most less
expensive rotational disks don't have one, meaning they all stopped
gathering entropy when we made MQ the default. In a wholly unrelated
change, openssl and openssh won't function until the random number
generator is initialised, meaning lots of people have been seeing large
delays before they could log into systems with default MQ kernels due to
this lack of entropy, because it now can take tens of minutes to initialise
the kernel random number generator.
The fix is to set the non-rotational and add-randomness flags
unconditionally early on in the disk initialization path, so they can be
reset only if the device actually reports being non-rotational via the VPD
page.
Reported-by: Mikael Pettersson <mikpelinux@gmail.com>
Fixes: 83e32a591077 ("scsi: sd: Contribute to randomness when running rotational device")
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Xuewei Zhang <xueweiz@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Dave Airlie [Wed, 13 Feb 2019 03:05:03 +0000 (13:05 +1000)]
Merge tag 'imx-drm-fixes-2019-02-12' of git://git.pengutronix.de/pza/linux into drm-fixes
drm/imx: plane, ldb, and ipu-v3 fixes
- Fix CSI register offsets for i.MX51 and i.MX53.
- Fix delayed page flip completion events on i.MX6QP due to unexpected
behaviour of the PRE when issuing NOP buffer updates to the same
buffer address.
- Stop throwing errors for plane updates on disabled CRTCs when a
userspace process is killed while a plane update is pending.
- Add missing of_node_put cleanup in imx_ldb_bind.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Philipp Zabel <p.zabel@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/1549990602.4800.11.camel@pengutronix.de
Linus Torvalds [Wed, 13 Feb 2019 01:15:33 +0000 (17:15 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
"6 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: proc: smaps_rollup: fix pss_locked calculation
Rename include/{uapi => }/asm-generic/shmparam.h really
Revert "mm: use early_pfn_to_nid in page_ext_init"
mm/gup: fix gup_pmd_range() for dax
Revert "mm: slowly shrink slabs with a relatively small number of objects"
Revert "mm: don't reclaim inodes with many attached pages"
Sandeep Patil [Tue, 12 Feb 2019 23:36:11 +0000 (15:36 -0800)]
mm: proc: smaps_rollup: fix pss_locked calculation
The 'pss_locked' field of smaps_rollup was being calculated incorrectly.
It accumulated the current pss everytime a locked VMA was found. Fix
that by adding to 'pss_locked' the same time as that of 'pss' if the vma
being walked is locked.
Link: http://lkml.kernel.org/r/20190203065425.14650-1-sspatil@android.com
Fixes: 493b0e9d945f ("mm: add /proc/pid/smaps_rollup")
Signed-off-by: Sandeep Patil <sspatil@android.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Daniel Colascione <dancol@google.com>
Cc: <stable@vger.kernel.org> [4.14.x, 4.19.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Masahiro Yamada [Tue, 12 Feb 2019 23:36:06 +0000 (15:36 -0800)]
Rename include/{uapi => }/asm-generic/shmparam.h really
Commit
36c0f7f0f899 ("arch: unexport asm/shmparam.h for all
architectures") is different from the patch I submitted.
My patch is this:
https://lore.kernel.org/lkml/
1546904307-11124-1-git-send-email-yamada.masahiro@socionext.com/T/#u
The file renaming part:
rename include/{uapi => }/asm-generic/shmparam.h (100%)
was lost when it was picked up.
I think it was an accident because Andrew did not say anything.
Link: http://lkml.kernel.org/r/1549158277-24558-1-git-send-email-yamada.masahiro@socionext.com
Fixes: 36c0f7f0f899 ("arch: unexport asm/shmparam.h for all architectures")
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Qian Cai [Tue, 12 Feb 2019 23:36:03 +0000 (15:36 -0800)]
Revert "mm: use early_pfn_to_nid in page_ext_init"
This reverts commit
fe53ca54270a ("mm: use early_pfn_to_nid in
page_ext_init").
When booting a system with "page_owner=on",
start_kernel
page_ext_init
invoke_init_callbacks
init_section_page_ext
init_page_owner
init_early_allocated_pages
init_zones_in_node
init_pages_in_zone
lookup_page_ext
page_to_nid
The issue here is that page_to_nid() will not work since some page flags
have no node information until later in page_alloc_init_late() due to
DEFERRED_STRUCT_PAGE_INIT. Hence, it could trigger an out-of-bounds
access with an invalid nid.
UBSAN: Undefined behaviour in ./include/linux/mm.h:1104:50
index 7 is out of range for type 'zone [5]'
Also, kernel will panic since flags were poisoned earlier with,
CONFIG_DEBUG_VM_PGFLAGS=y
CONFIG_NODE_NOT_IN_PAGE_FLAGS=n
start_kernel
setup_arch
pagetable_init
paging_init
sparse_init
sparse_init_nid
memblock_alloc_try_nid_raw
It did not handle it well in init_pages_in_zone() which ends up calling
page_to_nid().
page:
ffffea0004200000 is uninitialized and poisoned
raw:
ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
raw:
ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
page_owner info is not active (free page?)
kernel BUG at include/linux/mm.h:990!
RIP: 0010:init_page_owner+0x486/0x520
This means that assumptions behind commit
fe53ca54270a ("mm: use
early_pfn_to_nid in page_ext_init") are incomplete. Therefore, revert
the commit for now. A proper way to move the page_owner initialization
to sooner is to hook into memmap initialization.
Link: http://lkml.kernel.org/r/20190115202812.75820-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Michal Hocko <mhocko@kernel.org>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Yang Shi <yang.shi@linaro.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yu Zhao [Tue, 12 Feb 2019 23:35:58 +0000 (15:35 -0800)]
mm/gup: fix gup_pmd_range() for dax
For dax pmd, pmd_trans_huge() returns false but pmd_huge() returns true
on x86. So the function works as long as hugetlb is configured.
However, dax doesn't depend on hugetlb.
Link: http://lkml.kernel.org/r/20190111034033.601-1-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: "Michael S . Tsirkin" <mst@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Chinner [Tue, 12 Feb 2019 23:35:55 +0000 (15:35 -0800)]
Revert "mm: slowly shrink slabs with a relatively small number of objects"
This reverts commit
172b06c32b9497 ("mm: slowly shrink slabs with a
relatively small number of objects").
This change changes the agressiveness of shrinker reclaim, causing small
cache and low priority reclaim to greatly increase scanning pressure on
small caches. As a result, light memory pressure has a disproportionate
affect on small caches, and causes large caches to be reclaimed much
faster than previously.
As a result, it greatly perturbs the delicate balance of the VFS caches
(dentry/inode vs file page cache) such that the inode/dentry caches are
reclaimed much, much faster than the page cache and this drives us into
several other caching imbalance related problems.
As such, this is a bad change and needs to be reverted.
[ Needs some massaging to retain the later seekless shrinker
modifications.]
Link: http://lkml.kernel.org/r/20190130041707.27750-3-david@fromorbit.com
Fixes: 172b06c32b9497 ("mm: slowly shrink slabs with a relatively small number of objects")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Wolfgang Walter <linux@stwm.de>
Cc: Roman Gushchin <guro@fb.com>
Cc: Spock <dairinin@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Chinner [Tue, 12 Feb 2019 23:35:51 +0000 (15:35 -0800)]
Revert "mm: don't reclaim inodes with many attached pages"
This reverts commit
a76cf1a474d7d ("mm: don't reclaim inodes with many
attached pages").
This change causes serious changes to page cache and inode cache
behaviour and balance, resulting in major performance regressions when
combining worklaods such as large file copies and kernel compiles.
https://bugzilla.kernel.org/show_bug.cgi?id=202441
This change is a hack to work around the problems introduced by changing
how agressive shrinkers are on small caches in commit
172b06c32b94 ("mm:
slowly shrink slabs with a relatively small number of objects"). It
creates more problems than it solves, wasn't adequately reviewed or
tested, so it needs to be reverted.
Link: http://lkml.kernel.org/r/20190130041707.27750-2-david@fromorbit.com
Fixes: a76cf1a474d7d ("mm: don't reclaim inodes with many attached pages")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Wolfgang Walter <linux@stwm.de>
Cc: Roman Gushchin <guro@fb.com>
Cc: Spock <dairinin@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 13 Feb 2019 00:29:33 +0000 (16:29 -0800)]
Merge tag 'hwmon-for-v5.0-rc7' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fix from Guenter Roeck:
"Fix fan detection for NCT6793D"
* tag 'hwmon-for-v5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (nct6775) Fix fan6 detection for NCT6793D
Jens Axboe [Tue, 12 Feb 2019 22:21:48 +0000 (15:21 -0700)]
Merge branch 'md-fixes' of https://github.com/liu-song-6/linux into for-linus
Pull MD fix from Song
* 'md-fixes' of https://github.com/liu-song-6/linux:
md/raid1: don't clear bitmap bits on interrupted recovery.
Nate Dailey [Thu, 7 Feb 2019 19:19:01 +0000 (14:19 -0500)]
md/raid1: don't clear bitmap bits on interrupted recovery.
sync_request_write no longer submits writes to a Faulty device. This has
the unfortunate side effect that bitmap bits can be incorrectly cleared
if a recovery is interrupted (previously, end_sync_write would have
prevented this). This means the next recovery may not copy everything
it should, potentially corrupting data.
Add a function for doing the proper md_bitmap_end_sync, called from
end_sync_write and the Faulty case in sync_request_write.
backport note to 4.14: s/md_bitmap_end_sync/bitmap_end_sync
Cc: stable@vger.kernel.org 4.14+
Fixes: 0c9d5b127f69 ("md/raid1: avoid reusing a resync bio after error handling.")
Reviewed-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Tested-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Nate Dailey <nate.dailey@stratus.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Linus Torvalds [Tue, 12 Feb 2019 21:33:48 +0000 (13:33 -0800)]
Merge tag 'riscv-for-linus-5.0-rc7' of git://git./linux/kernel/git/palmer/riscv-linux
Pull RISC-V fixes from Palmer Dabbelt:
"This contains a pair of bug fixes that I'd like to include in 5.0:
- A fix to disambiguate swap from invalid PTEs, which fixes an error
when trying to unmap PROT_NONE pages.
- A revert to an optimization of the size of flat binaries. This is
really a workaround to prevent breaking existing boot flows, but
since the change was introduced as part of the 5.0 merge window I'd
like to have the fix in before 5.0 so we can avoid a regression for
any proper releases.
With these I hope we're out of patches for 5.0 in RISC-V land"
* tag 'riscv-for-linus-5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
Revert "RISC-V: Make BSS section as the last section in vmlinux.lds.S"
riscv: Add pte bit to distinguish swap from invalid
Benjamin Coddington [Wed, 6 Feb 2019 11:09:43 +0000 (06:09 -0500)]
NFS: Don't use page_file_mapping after removing the page
If nfs_page_async_flush() removes the page from the mapping, then we can't
use page_file_mapping() on it as nfs_updatepate() is wont to do when
receiving an error. Instead, push the mapping to the stack before the page
is possibly truncated.
Fixes: 8fc75bed96bb ("NFS: Fix up return value on fatal errors in nfs_page_async_flush()")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Greg Kroah-Hartman [Tue, 12 Feb 2019 18:27:34 +0000 (19:27 +0100)]
rpc: properly check debugfs dentry before using it
debugfs can now report an error code if something went wrong instead of
just NULL. So if the return value is to be used as a "real" dentry, it
needs to be checked if it is an error before dereferencing it.
This is now happening because of
ff9fb72bc077 ("debugfs: return error
values, not NULL"), but why debugfs files are not being created properly
is an older issue, probably one that has always been there and should
probably be looked at...
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Cc: netdev@vger.kernel.org
Reported-by: David Howells <dhowells@redhat.com>
Tested-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Nicolas Morey-Chaisemartin [Tue, 5 Feb 2019 17:21:02 +0000 (18:21 +0100)]
xprtrdma: Make sure Send CQ is allocated on an existing compvec
Make sure the device has at least 2 completion vectors
before allocating to compvec#1
Fixes: a4699f5647f3 (xprtrdma: Put Send CQ in IB_POLL_WORKQUEUE mode)
Signed-off-by: Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cong Wang [Tue, 12 Feb 2019 05:59:51 +0000 (21:59 -0800)]
team: avoid complex list operations in team_nl_cmd_options_set()
The current opt_inst_list operations inside team_nl_cmd_options_set()
is too complex to track:
LIST_HEAD(opt_inst_list);
nla_for_each_nested(...) {
list_for_each_entry(opt_inst, &team->option_inst_list, list) {
if (__team_option_inst_tmp_find(&opt_inst_list, opt_inst))
continue;
list_add(&opt_inst->tmp_list, &opt_inst_list);
}
}
team_nl_send_event_options_get(team, &opt_inst_list);
as while we retrieve 'opt_inst' from team->option_inst_list, it could
be added to the local 'opt_inst_list' for multiple times. The
__team_option_inst_tmp_find() doesn't work, as the setter
team_mode_option_set() still calls team->ops.exit() which uses
->tmp_list too in __team_options_change_check().
Simplify the list operations by moving the 'opt_inst_list' and
team_nl_send_event_options_get() into the nla_for_each_nested() loop so
that it can be guranteed that we won't insert a same list entry for
multiple times. Therefore, __team_option_inst_tmp_find() can be removed
too.
Fixes: 4fb0534fb7bb ("team: avoid adding twice the same option to the event list")
Fixes: 2fcdb2c9e659 ("team: allow to send multiple set events in one message")
Reported-by: syzbot+4d4af685432dc0e56c91@syzkaller.appspotmail.com
Reported-by: syzbot+68ee510075cf64260cc4@syzkaller.appspotmail.com
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 12 Feb 2019 19:10:56 +0000 (14:10 -0500)]
Merge branch 'net_sched-some-fixes-for-cls_tcindex'
Cong Wang says:
====================
net_sched: some fixes for cls_tcindex
This patchset contains 3 bug fixes for tcindex filter. Please check
each patch for details.
v2: fix a compile error in patch 2
drop netns refcnt in patch 1
====================
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Cong Wang [Mon, 11 Feb 2019 21:06:16 +0000 (13:06 -0800)]
net_sched: fix two more memory leaks in cls_tcindex
struct tcindex_filter_result contains two parts:
struct tcf_exts and struct tcf_result.
For the local variable 'cr', its exts part is never used but
initialized without being released properly on success path. So
just completely remove the exts part to fix this leak.
For the local variable 'new_filter_result', it is never properly
released if not used by 'r' on success path.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 11 Feb 2019 21:06:15 +0000 (13:06 -0800)]
net_sched: fix a memory leak in cls_tcindex
When tcindex_destroy() destroys all the filter results in
the perfect hash table, it invokes the walker to delete
each of them. However, results with class==0 are skipped
in either tcindex_walk() or tcindex_delete(), which causes
a memory leak reported by kmemleak.
This patch fixes it by skipping the walker and directly
deleting these filter results so we don't miss any filter
result.
As a result of this change, we have to initialize exts->net
properly in tcindex_alloc_perfect_hash(). For net-next, we
need to consider whether we should initialize ->net in
tcf_exts_init() instead, before that just directly test
CONFIG_NET_CLS_ACT=y.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 11 Feb 2019 21:06:14 +0000 (13:06 -0800)]
net_sched: fix a race condition in tcindex_destroy()
tcindex_destroy() invokes tcindex_destroy_element() via
a walker to delete each filter result in its perfect hash
table, and tcindex_destroy_element() calls tcindex_delete()
which schedules tcf RCU works to do the final deletion work.
Unfortunately this races with the RCU callback
__tcindex_destroy(), which could lead to use-after-free as
reported by Adrian.
Fix this by migrating this RCU callback to tcf RCU work too,
as that workqueue is ordered, we will not have use-after-free.
Note, we don't need to hold netns refcnt because we don't call
tcf_exts_destroy() here.
Fixes: 27ce4f05e2ab ("net_sched: use tcf_queue_work() in tcindex filter")
Reported-by: Adrian <bugs@abtelecom.ro>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 12 Feb 2019 19:05:07 +0000 (14:05 -0500)]
Merge branch 'ena-races'
Arthur Kiyanovski says:
====================
net: ena: race condition bug fix and version update
This patchset includes a fix to a race condition that can cause
kernel panic, as well as a driver version update because of this
fix.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Arthur Kiyanovski [Mon, 11 Feb 2019 17:17:44 +0000 (19:17 +0200)]
net: ena: update driver version from 2.0.2 to 2.0.3
Update driver version due to bug fix.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arthur Kiyanovski [Mon, 11 Feb 2019 17:17:43 +0000 (19:17 +0200)]
net: ena: fix race between link up and device initalization
Fix race condition between ena_update_on_link_change() and
ena_restore_device().
This race can occur if link notification arrives while the driver
is performing a reset sequence. In this case link can be set up,
enabling the device, before it is fully restored. If packets are
sent at this time, the driver might access uninitialized data
structures, causing kernel crash.
Move the clearing of ENA_FLAG_ONGOING_RESET and netif_carrier_on()
after ena_up() to ensure the device is ready when link is set up.
Fixes: d18e4f683445 ("net: ena: fix race condition between device reset and link up setup")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kal Conley [Sun, 10 Feb 2019 08:57:11 +0000 (09:57 +0100)]
net/packet: fix 4gb buffer limit due to overflow check
When calculating rb->frames_per_block * req->tp_block_nr the result
can overflow. Check it for overflow without limiting the total buffer
size to UINT_MAX.
This change fixes support for packet ring buffers >= UINT_MAX.
Fixes: 8f8d28e4d6d8 ("net/packet: fix overflow in check for tp_frame_nr")
Signed-off-by: Kal Conley <kal.conley@dectris.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Konstantin Khlebnikov [Sat, 9 Feb 2019 10:35:52 +0000 (13:35 +0300)]
inet_diag: fix reporting cgroup classid and fallback to priority
Field idiag_ext in struct inet_diag_req_v2 used as bitmap of requested
extensions has only 8 bits. Thus extensions starting from DCTCPINFO
cannot be requested directly. Some of them included into response
unconditionally or hook into some of lower 8 bits.
Extension INET_DIAG_CLASS_ID has not way to request from the beginning.
This patch bundle it with INET_DIAG_TCLASS (ipv6 tos), fixes space
reservation, and documents behavior for other extensions.
Also this patch adds fallback to reporting socket priority. This filed
is more widely used for traffic classification because ipv4 sockets
automatically maps TOS to priority and default qdisc pfifo_fast knows
about that. But priority could be changed via setsockopt SO_PRIORITY so
INET_DIAG_TOS isn't enough for predicting class.
Also cgroup2 obsoletes net_cls classid (it always zero), but we cannot
reuse this field for reporting cgroup2 id because it is 64-bit (ino+gen).
So, after this patch INET_DIAG_CLASS_ID will report socket priority
for most common setup when net_cls isn't set and/or cgroup2 in use.
Fixes: 0888e372c37f ("net: inet: diag: expose sockets cgroup classid")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 11 Feb 2019 22:41:22 +0000 (14:41 -0800)]
batman-adv: fix uninit-value in batadv_interface_tx()
KMSAN reported batadv_interface_tx() was possibly using a
garbage value [1]
batadv_get_vid() does have a pskb_may_pull() call
but batadv_interface_tx() does not actually make sure
this did not fail.
[1]
BUG: KMSAN: uninit-value in batadv_interface_tx+0x908/0x1e40 net/batman-adv/soft-interface.c:231
CPU: 0 PID: 10006 Comm: syz-executor469 Not tainted 4.20.0-rc7+ #5
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x173/0x1d0 lib/dump_stack.c:113
kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
__msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:313
batadv_interface_tx+0x908/0x1e40 net/batman-adv/soft-interface.c:231
__netdev_start_xmit include/linux/netdevice.h:4356 [inline]
netdev_start_xmit include/linux/netdevice.h:4365 [inline]
xmit_one net/core/dev.c:3257 [inline]
dev_hard_start_xmit+0x607/0xc40 net/core/dev.c:3273
__dev_queue_xmit+0x2e42/0x3bc0 net/core/dev.c:3843
dev_queue_xmit+0x4b/0x60 net/core/dev.c:3876
packet_snd net/packet/af_packet.c:2928 [inline]
packet_sendmsg+0x8306/0x8f30 net/packet/af_packet.c:2953
sock_sendmsg_nosec net/socket.c:621 [inline]
sock_sendmsg net/socket.c:631 [inline]
__sys_sendto+0x8c4/0xac0 net/socket.c:1788
__do_sys_sendto net/socket.c:1800 [inline]
__se_sys_sendto+0x107/0x130 net/socket.c:1796
__x64_sys_sendto+0x6e/0x90 net/socket.c:1796
do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x63/0xe7
RIP: 0033:0x441889
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 bb 10 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:
00007ffdda6fd468 EFLAGS:
00000216 ORIG_RAX:
000000000000002c
RAX:
ffffffffffffffda RBX:
0000000000000002 RCX:
0000000000441889
RDX:
000000000000000e RSI:
00000000200000c0 RDI:
0000000000000003
RBP:
0000000000000003 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000216 R12:
00007ffdda6fd4c0
R13:
00007ffdda6fd4b0 R14:
0000000000000000 R15:
0000000000000000
Uninit was created at:
kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:158
kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176
kmsan_slab_alloc+0xe/0x10 mm/kmsan/kmsan_hooks.c:185
slab_post_alloc_hook mm/slab.h:446 [inline]
slab_alloc_node mm/slub.c:2759 [inline]
__kmalloc_node_track_caller+0xe18/0x1030 mm/slub.c:4383
__kmalloc_reserve net/core/skbuff.c:137 [inline]
__alloc_skb+0x309/0xa20 net/core/skbuff.c:205
alloc_skb include/linux/skbuff.h:998 [inline]
alloc_skb_with_frags+0x1c7/0xac0 net/core/skbuff.c:5220
sock_alloc_send_pskb+0xafd/0x10e0 net/core/sock.c:2083
packet_alloc_skb net/packet/af_packet.c:2781 [inline]
packet_snd net/packet/af_packet.c:2872 [inline]
packet_sendmsg+0x661a/0x8f30 net/packet/af_packet.c:2953
sock_sendmsg_nosec net/socket.c:621 [inline]
sock_sendmsg net/socket.c:631 [inline]
__sys_sendto+0x8c4/0xac0 net/socket.c:1788
__do_sys_sendto net/socket.c:1800 [inline]
__se_sys_sendto+0x107/0x130 net/socket.c:1796
__x64_sys_sendto+0x6e/0x90 net/socket.c:1796
do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
entry_SYSCALL_64_after_hwframe+0x63/0xe7
Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Marek Lindner <mareklindner@neomailbox.ch>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Cc: Antonio Quartulli <a@unstable.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 12 Feb 2019 18:18:08 +0000 (10:18 -0800)]
Merge tag 'sound-5.0-rc7' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"It's a bit of surprising that we've got more changes than hoped at
this late stage, but they all don't look too scary but small fixes.
One change in ALSA core side is again the PCM regression fix that was
partially addressed for OSS, but now the all relevant change is
reverted instead. Also, a few ASoC core fixes for UAF and OOB are
included, while the rest are usual random device-specific fixes"
* tag 'sound-5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: pcm: Revert capture stream behavior change in blocking mode
ALSA: usb-audio: Fix implicit fb endpoint setup by quirk
ALSA: hda - Add quirk for HP EliteBook 840 G5
ASoC: samsung: Prevent clk_get_rate() calls in atomic context
ASoC: rsnd: ssiu: correct shift bit for ssiu9
ASoC: rsnd: fixup rsnd_ssi_master_clk_start() user count check
ASoC: dapm: fix out-of-bounds accesses to DAPM lookup tables
ASoC: topology: fix oops/use-after-free case with dai driver
ASoC: rsnd: fixup MIX kctrl registration
ASoC: core: Allow soc_find_component lookups to match parent of_node
ASoC: rt5682: Correct the setting while select ASRC clk for AD/DA filter
ASoC: MAINTAINERS: fsl: Change Fabio's email address
ASoC: hdmi-codec: fix oops on re-probe
Linus Torvalds [Tue, 12 Feb 2019 17:57:45 +0000 (09:57 -0800)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fix from Ted Ts'o:
"Revert a commit which landed in v5.0-rc1 since it makes fsync in ext4
nojournal mode unsafe"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal"
Li RongQing [Mon, 11 Feb 2019 11:32:20 +0000 (19:32 +0800)]
ipv6: propagate genlmsg_reply return code
genlmsg_reply can fail, so propagate its return code
Fixes: 915d7e5e593 ("ipv6: sr: add code base for control plane support of SR-IPv6")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed [Mon, 11 Feb 2019 16:04:17 +0000 (18:04 +0200)]
net/mlx4_en: Force CHECKSUM_NONE for short ethernet frames
When an ethernet frame is padded to meet the minimum ethernet frame
size, the padding octets are not covered by the hardware checksum.
Fortunately the padding octets are usually zero's, which don't affect
checksum. However, it is not guaranteed. For example, switches might
choose to make other use of these octets.
This repeatedly causes kernel hardware checksum fault.
Prior to the cited commit below, skb checksum was forced to be
CHECKSUM_NONE when padding is detected. After it, we need to keep
skb->csum updated. However, fixing up CHECKSUM_COMPLETE requires to
verify and parse IP headers, it does not worth the effort as the packets
are so small that CHECKSUM_COMPLETE has no significant advantage.
Future work: when reporting checksum complete is not an option for
IP non-TCP/UDP packets, we can actually fallback to report checksum
unnecessary, by looking at cqe IPOK bit.
Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Mon, 11 Feb 2019 15:04:24 +0000 (15:04 +0000)]
net: phylink: avoid resolving link state too early
During testing on Armada 388 platforms, it was found with a certain
module configuration that it was possible to trigger a kernel oops
during the module load process, caused by the phylink resolver being
triggered for a currently disabled interface.
This problem was introduced by changing the way the SFP registration
works, which now can result in the sfp link down notification being
called during phylink_create().
Fixes: b5bfc21af5cb ("net: sfp: do not probe SFP module before we're attached")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matteo Croce [Mon, 11 Feb 2019 14:32:36 +0000 (15:32 +0100)]
geneve: change NET_UDP_TUNNEL dependency to select
Due to the depends on NET_UDP_TUNNEL, at the moment it is impossible to
compile GENEVE if no other protocol depending on NET_UDP_TUNNEL is
selected.
Fix this changing the depends to a select, and drop NET_IP_TUNNEL from the
select list, as it already depends on NET_UDP_TUNNEL.
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Reviewed-and-tested-by: Andrea Claudi <aclaudi@redhat.com>
Tested-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bert Kenward [Tue, 12 Feb 2019 13:10:00 +0000 (13:10 +0000)]
sfc: initialise found bitmap in efx_ef10_mtd_probe
The bitmap of found partitions in efx_ef10_mtd_probe was not
initialised, causing partitions to be suppressed based off whatever
value was in the bitmap at the start.
Fixes: 3366463513f5 ("sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 12 Feb 2019 16:43:19 +0000 (11:43 -0500)]
Merge tag 'mac80211-for-davem-2019-02-12' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Just a few fixes:
* aggregation session teardown with internal TXQs was
continuing to send some frames marked as aggregation,
fix from Ilan
* IBSS join was missed during firmware restart, should
such a thing happen
* speculative execution based on the return value of
cfg80211_classify8021d() - which is controlled by the
sender of the packet - could be problematic in some
code using it, prevent it
* a few peer measurement fixes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yufen Yu [Tue, 29 Jan 2019 08:34:04 +0000 (16:34 +0800)]
floppy: check_events callback should not return a negative number
floppy_check_events() is supposed to return bit flags to say which
events occured. We should return zero to say that no event flags are
set. Only BIT(0) and BIT(1) are used in the caller. And .check_events
interface also expect to return an unsigned int value.
However, after commit
a0c80efe5956, it may return -EINTR (-4u).
Here, both BIT(0) and BIT(1) are cleared. So this patch shouldn't
affect runtime, but it obviously is still worth fixing.
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: a0c80efe5956 ("floppy: fix lock_fdc() signal handling")
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jani Nikula [Fri, 8 Feb 2019 18:42:53 +0000 (20:42 +0200)]
drm/i915/opregion: rvda is relative from opregion base in opregion 2.1+
Starting from opregion version 2.1 (roughly corresponding to ICL+) the
RVDA field is relative from the beginning of opregion, not absolute
address.
Fix the error path while at it.
v2: Make relative vs. absolute conditional on the opregion version,
bumped for the purpose. Turned out there are machines relying on
absolute RVDA in the wild.
v3: Fix the version checks
Fixes: 04ebaadb9f2d ("drm/i915/opregion: handle VBT sizes bigger than 6 KB")
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190208184254.24123-2-jani.nikula@intel.com
(cherry picked from commit
a0f52c3d357af218a9c1f7cd906ab70426176a1a)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Jani Nikula [Fri, 8 Feb 2019 18:42:52 +0000 (20:42 +0200)]
drm/i915/opregion: fix version check
The u32 version field encodes major, minor, revision and reserved. We've
basically been checking for any non-zero version.
Add opregion version logging while at it.
v2: Fix the fix of the version check
Fixes: 04ebaadb9f2d ("drm/i915/opregion: handle VBT sizes bigger than 6 KB")
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190208184254.24123-1-jani.nikula@intel.com
(cherry picked from commit
98fdaaca9537b997062f1abc0aa87c61b50ce40a)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Joonas Lahtinen [Thu, 7 Feb 2019 08:54:53 +0000 (10:54 +0200)]
drm/i915: Prevent a race during I915_GEM_MMAP ioctl with WC set
Make sure the underlying VMA in the process address space is the
same as it was during vm_mmap to avoid applying WC to wrong VMA.
A more long-term solution would be to have vm_mmap_locked variant
in linux/mmap.h for when caller wants to hold mmap_sem for an
extended duration.
v2:
- Refactor the compare function
Fixes: 1816f9236303 ("drm/i915: Support creation of unbound wc user mappings for objects")
Reported-by: Adam Zabrocki <adamza@microsoft.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.0+
Cc: Akash Goel <akash.goel@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Adam Zabrocki <adamza@microsoft.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1
Link: https://patchwork.freedesktop.org/patch/msgid/20190207085454.10598-1-joonas.lahtinen@linux.intel.com
(cherry picked from commit
5c4604e757ba9b193b09768d75a7d2105a5b883f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Lyude Paul [Tue, 29 Jan 2019 19:09:59 +0000 (14:09 -0500)]
drm/i915: Block fbdev HPD processing during suspend
When resuming, we check whether or not any previously connected
MST topologies are still present and if so, attempt to resume them. If
this fails, we disable said MST topologies and fire off a hotplug event
so that userspace knows to reprobe.
However, sending a hotplug event involves calling
drm_fb_helper_hotplug_event(), which in turn results in fbcon doing a
connector reprobe in the caller's thread - something we can't do at the
point in which i915 calls drm_dp_mst_topology_mgr_resume() since
hotplugging hasn't been fully initialized yet.
This currently causes some rather subtle but fatal issues. For example,
on my T480s the laptop dock connected to it usually disappears during a
suspend cycle, and comes back up a short while after the system has been
resumed. This guarantees pretty much every suspend and resume cycle,
drm_dp_mst_topology_mgr_set_mst(mgr, false); will be caused and in turn,
a connector hotplug will occur. Now it's Rute Goldberg time: when the
connector hotplug occurs, i915 reprobes /all/ of the connectors,
including eDP. However, eDP probing requires that we power on the panel
VDD which in turn, grabs a wakeref to the appropriate power domain on
the GPU (on my T480s, this is the PORT_DDI_A_IO domain). This is where
things start breaking, since this all happens before
intel_power_domains_enable() is called we end up leaking the wakeref
that was acquired and never releasing it later. Come next suspend/resume
cycle, this causes us to fail to shut down the GPU properly, which
causes it not to resume properly and die a horrible complicated death.
(as a note: this only happens when there's both an eDP panel and MST
topology connected which is removed mid-suspend. One or the other seems
to always be OK).
We could try to fix the VDD wakeref leak, but this doesn't seem like
it's worth it at all since we aren't able to handle hotplug detection
while resuming anyway. So, let's go with a more robust solution inspired
by nouveau: block fbdev from handling hotplug events until we resume
fbdev. This allows us to still send sysfs hotplug events to be handled
later by user space while we're resuming, while also preventing us from
actually processing any hotplug events we receive until it's safe.
This fixes the wakeref leak observed on the T480s and as such, also
fixes suspend/resume with MST topologies connected on this machine.
Changes since v2:
* Don't call drm_fb_helper_hotplug_event() under lock, do it after lock
(Chris Wilson)
* Don't call drm_fb_helper_hotplug_event() in
intel_fbdev_output_poll_changed() under lock (Chris Wilson)
* Always set ifbdev->hpd_waiting (Chris Wilson)
Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: 0e32b39ceed6 ("drm/i915: add DP 1.2 MST support (v0.7)")
Cc: Todd Previte <tprevite@gmail.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v3.17+
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129191001.442-2-lyude@redhat.com
(cherry picked from commit
fe5ec65668cdaa4348631d8ce1766eed43b33c10)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Tvrtko Ursulin [Tue, 5 Feb 2019 13:03:53 +0000 (13:03 +0000)]
drm/i915/pmu: Fix enable count array size and bounds checking
Enable count array is supposed to have one counter for each possible
engine sampler. As such, array sizing and bounds checking is not correct
and would blow up the asserts if more samplers were added.
No ill-effect in the current code base but lets fix it for correctness.
At the same time tidy the assert for readability and robustness.
v2:
* One check per assert. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: b46a33e271ed ("drm/i915/pmu: Expose a PMU interface for perf queries")
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205130353.21105-1-tvrtko.ursulin@linux.intel.com
(cherry picked from commit
26a11deea685b41a43edb513194718aa1f461c9a)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Andrea Claudi [Mon, 11 Feb 2019 15:14:39 +0000 (16:14 +0100)]
ipvs: fix dependency on nf_defrag_ipv6
ipvs relies on nf_defrag_ipv6 module to manage IPv6 fragmentation,
but lacks proper Kconfig dependencies and does not explicitly
request defrag features.
As a result, if netfilter hooks are not loaded, when IPv6 fragmented
packet are handled by ipvs only the first fragment makes through.
Fix it properly declaring the dependency on Kconfig and registering
netfilter hooks on ip_vs_add_service() and ip_vs_new_dest().
Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Keith Busch [Mon, 11 Feb 2019 16:23:50 +0000 (09:23 -0700)]
nvme-pci: add missing unlock for reset error
The reset work holds a mutex to prevent races with removal modifying the
same resources, but was unlocking only on success. Unlock on failure
too.
Fixes: 5c959d73dba64 ("nvme-pci: fix rapid add remove sequence")
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tuong Lien [Mon, 11 Feb 2019 06:29:43 +0000 (13:29 +0700)]
tipc: fix link session and re-establish issues
When a link endpoint is re-created (e.g. after a node reboot or
interface reset), the link session number is varied by random, the peer
endpoint will be synced with this new session number before the link is
re-established.
However, there is a shortcoming in this mechanism that can lead to the
link never re-established or faced with a failure then. It happens when
the peer endpoint is ready in ESTABLISHING state, the 'peer_session' as
well as the 'in_session' flag have been set, but suddenly this link
endpoint leaves. When it comes back with a random session number, there
are two situations possible:
1/ If the random session number is larger than (or equal to) the
previous one, the peer endpoint will be updated with this new session
upon receipt of a RESET_MSG from this endpoint, and the link can be re-
established as normal. Otherwise, all the RESET_MSGs from this endpoint
will be rejected by the peer. In turn, when this link endpoint receives
one ACTIVATE_MSG from the peer, it will move to ESTABLISHED and start
to send STATE_MSGs, but again these messages will be dropped by the
peer due to wrong session.
The peer link endpoint can still become ESTABLISHED after receiving a
traffic message from this endpoint (e.g. a BCAST_PROTOCOL or
NAME_DISTRIBUTOR), but since all the STATE_MSGs are invalid, the link
will be forced down sooner or later!
Even in case the random session number is larger than the previous one,
it can be that the ACTIVATE_MSG from the peer arrives first, and this
link endpoint moves quickly to ESTABLISHED without sending out any
RESET_MSG yet. Consequently, the peer link will not be updated with the
new session number, and the same link failure scenario as above will
happen.
2/ Another situation can be that, the peer link endpoint was reset due
to any reasons in the meantime, its link state was set to RESET from
ESTABLISHING but still in session, i.e. the 'in_session' flag is not
reset...
Now, if the random session number from this endpoint is less than the
previous one, all the RESET_MSGs from this endpoint will be rejected by
the peer. In the other direction, when this link endpoint receives a
RESET_MSG from the peer, it moves to ESTABLISHING and starts to send
ACTIVATE_MSGs, but all these messages will be rejected by the peer too.
As a result, the link cannot be re-established but gets stuck with this
link endpoint in state ESTABLISHING and the peer in RESET!
Solution:
===========
This link endpoint should not go directly to ESTABLISHED when getting
ACTIVATE_MSG from the peer which may belong to the old session if the
link was re-created. To ensure the session to be correct before the
link is re-established, the peer endpoint in ESTABLISHING state will
send back the last session number in ACTIVATE_MSG for a verification at
this endpoint. Then, if needed, a new and more appropriate session
number will be regenerated to force a re-synch first.
In addition, when a link in ESTABLISHING state is reset, its state will
move to RESET according to the link FSM, along with resetting the
'in_session' flag (and the other data) as a normal link reset, it will
also be deleted if requested.
The solution is backward compatible.
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhiqiang Liu [Mon, 11 Feb 2019 02:57:46 +0000 (10:57 +0800)]
net: fix IPv6 prefix route residue
Follow those steps:
# ip addr add 2001:123::1/32 dev eth0
# ip addr add 2001:123:456::2/64 dev eth0
# ip addr del 2001:123::1/32 dev eth0
# ip addr del 2001:123:456::2/64 dev eth0
and then prefix route of 2001:123::1/32 will still exist.
This is because ipv6_prefix_equal in check_cleanup_prefix_route
func does not check whether two IPv6 addresses have the same
prefix length. If the prefix of one address starts with another
shorter address prefix, even though their prefix lengths are
different, the return value of ipv6_prefix_equal is true.
Here I add a check of whether two addresses have the same prefix
to decide whether their prefixes are equal.
Fixes: 5b84efecb7d9 ("ipv6 addrconf: don't cleanup prefix route for IFA_F_NOPREFIXROUTE")
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Reported-by: Wenhao Zhang <zhangwenhao8@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jianchao Wang [Tue, 12 Feb 2019 01:56:25 +0000 (09:56 +0800)]
blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue
When requeue, if RQF_DONTPREP, rq has contained some driver
specific data, so insert it to hctx dispatch list to avoid any
merge. Take scsi as example, here is the trace event log (no
io scheduler, because RQF_STARTED would prevent merging),
kworker/0:1H-339 [000] ...1 2037.209289: block_rq_insert: 8,0 R 4096 () 32768 + 8 [kworker/0:1H]
scsi_inert_test-1987 [000] .... 2037.220465: block_bio_queue: 8,0 R 32776 + 8 [scsi_inert_test]
scsi_inert_test-1987 [000] ...2 2037.220466: block_bio_backmerge: 8,0 R 32776 + 8 [scsi_inert_test]
kworker/0:1H-339 [000] .... 2047.220913: block_rq_issue: 8,0 R 8192 () 32768 + 16 [kworker/0:1H]
scsi_inert_test-1996 [000] ..s1 2047.221007: block_rq_complete: 8,0 R () 32768 + 8 [0]
scsi_inert_test-1996 [000] .Ns1 2047.221045: block_rq_requeue: 8,0 R () 32776 + 8 [0]
kworker/0:1H-339 [000] ...1 2047.221054: block_rq_insert: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
kworker/0:1H-339 [000] ...1 2047.221056: block_rq_issue: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
scsi_inert_test-1986 [000] ..s1 2047.221119: block_rq_complete: 8,0 R () 32776 + 8 [0]
(32768 + 8) was requeued by scsi_queue_insert and had RQF_DONTPREP.
Then it was merged with (32776 + 8) and issued. Due to RQF_DONTPREP,
the sdb only contained the part of (32768 + 8), then only that part
was completed. The lucky thing was that scsi_io_completion detected
it and requeued the remaining part. So we didn't get corrupted data.
However, the requeue of (32776 + 8) is not expected.
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Hoang Le [Mon, 11 Feb 2019 02:18:28 +0000 (09:18 +0700)]
tipc: fix skb may be leaky in tipc_link_input
When we free skb at tipc_data_input, we return a 'false' boolean.
Then, skb passed to subcalling tipc_link_input in tipc_link_rcv,
<snip>
1303 int tipc_link_rcv:
...
1354 if (!tipc_data_input(l, skb, l->inputq))
1355 rc |= tipc_link_input(l, skb, l->inputq);
</snip>
Fix it by simple changing to a 'true' boolean when skb is being free-ed.
Then, tipc_link_rcv will bypassed to subcalling tipc_link_input as above
condition.
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <maloy@donjonn.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Francesco Ruggeri [Sun, 10 Feb 2019 19:58:29 +0000 (11:58 -0800)]
netfilter: compat: initialize all fields in xt_init
If a non zero value happens to be in xt[NFPROTO_BRIDGE].cur at init
time, the following panic can be caused by running
% ebtables -t broute -F BROUTING
from a 32-bit user level on a 64-bit kernel. This patch replaces
kmalloc_array with kcalloc when allocating xt.
[ 474.680846] BUG: unable to handle kernel paging request at
0000000009600920
[ 474.687869] PGD
2037006067 P4D
2037006067 PUD
2038938067 PMD 0
[ 474.693838] Oops: 0000 [#1] SMP
[ 474.697055] CPU: 9 PID: 4662 Comm: ebtables Kdump: loaded Not tainted 4.19.17-
11302235.AroraKernelnext.fc18.x86_64 #1
[ 474.707721] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.0 06/28/2013
[ 474.714313] RIP: 0010:xt_compat_calc_jump+0x2f/0x63 [x_tables]
[ 474.720201] Code: 40 0f b6 ff 55 31 c0 48 6b ff 70 48 03 3d dc 45 00 00 48 89 e5 8b 4f 6c 4c 8b 47 60 ff c9 39 c8 7f 2f 8d 14 08 d1 fa 48 63 fa <41> 39 34 f8 4c 8d 0c fd 00 00 00 00 73 05 8d 42 01 eb e1 76 05 8d
[ 474.739023] RSP: 0018:
ffffc9000943fc58 EFLAGS:
00010207
[ 474.744296] RAX:
0000000000000000 RBX:
ffffc90006465000 RCX:
0000000002580249
[ 474.751485] RDX:
00000000012c0124 RSI:
fffffffff7be17e9 RDI:
00000000012c0124
[ 474.758670] RBP:
ffffc9000943fc58 R08:
0000000000000000 R09:
ffffffff8117cf8f
[ 474.765855] R10:
ffffc90006477000 R11:
0000000000000000 R12:
0000000000000001
[ 474.773048] R13:
0000000000000000 R14:
ffffc9000943fcb8 R15:
ffffc9000943fcb8
[ 474.780234] FS:
0000000000000000(0000) GS:
ffff88a03f840000(0063) knlGS:
00000000f7ac7700
[ 474.788612] CS: 0010 DS: 002b ES: 002b CR0:
0000000080050033
[ 474.794632] CR2:
0000000009600920 CR3:
0000002037422006 CR4:
00000000000606e0
[ 474.802052] Call Trace:
[ 474.804789] compat_do_replace+0x1fb/0x2a3 [ebtables]
[ 474.810105] compat_do_ebt_set_ctl+0x69/0xe6 [ebtables]
[ 474.815605] ? try_module_get+0x37/0x42
[ 474.819716] compat_nf_setsockopt+0x4f/0x6d
[ 474.824172] compat_ip_setsockopt+0x7e/0x8c
[ 474.828641] compat_raw_setsockopt+0x16/0x3a
[ 474.833220] compat_sock_common_setsockopt+0x1d/0x24
[ 474.838458] __compat_sys_setsockopt+0x17e/0x1b1
[ 474.843343] ? __check_object_size+0x76/0x19a
[ 474.847960] __ia32_compat_sys_socketcall+0x1cb/0x25b
[ 474.853276] do_fast_syscall_32+0xaf/0xf6
[ 474.857548] entry_SYSENTER_compat+0x6b/0x7a
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Palmer Dabbelt [Fri, 8 Feb 2019 17:11:08 +0000 (09:11 -0800)]
Revert "RISC-V: Make BSS section as the last section in vmlinux.lds.S"
At least BBL relies on the flat binaries containing all the bytes in the
actual image to exist in the file. Before this revert the flat images
dropped the trailing zeros, which caused BBL to put its copy of the
device tree where Linux thought the BSS was, which wreaks all sorts of
havoc. Manifesting the bug is a bit subtle because BBL aligns
everything to 2MiB page boundaries, but with large enough kernels you're
almost certain to get bitten by the bug.
While moving the sections around isn't a great long-term fix, it will at
least avoid producing broken images.
This reverts commit
22e6a2e14cb8ebcae059488cf24e778e4058c2bf.
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Stefan O'Rear [Sun, 16 Dec 2018 18:03:36 +0000 (13:03 -0500)]
riscv: Add pte bit to distinguish swap from invalid
Previously, invalid PTEs and swap PTEs had the same binary
representation, causing errors when attempting to unmap PROT_NONE
mappings, including implicit unmap on exit.
Typical error:
swap_info_get: Bad swap file entry
40000000007a9879
BUG: Bad page map in process a.out pte:
3d4c3cc0 pmd:
3e521401
Cc: stable@vger.kernel.org
Signed-off-by: Stefan O'Rear <sorear2@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Linus Torvalds [Mon, 11 Feb 2019 21:30:50 +0000 (13:30 -0800)]
Merge branch 'fixes' of git://git./linux/kernel/git/evalenti/linux-soc-thermal
Pull thermal SoC management fixes from Eduardo Valentin:
"Minor fixes on of-thermal and cpu cooling"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
thermal: cpu_cooling: Clarify error message
thermal: of-thermal: Print name of device node with error
Eric Dumazet [Fri, 8 Feb 2019 20:41:05 +0000 (12:41 -0800)]
net/x25: do not hold the cpu too long in x25_new_lci()
Due to quadratic behavior of x25_new_lci(), syzbot was able
to trigger an rcu stall.
Fix this by not blocking BH for the whole duration of
the function, and inserting a reschedule point when possible.
If we care enough, using a bitmap could get rid of the quadratic
behavior.
syzbot report :
rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 0-...!: (10500 ticks this GP) idle=4fa/1/0x4000000000000002 softirq=283376/283376 fqs=0
rcu: (t=10501 jiffies g=383105 q=136)
rcu: rcu_preempt kthread starved for 10502 jiffies! g383105 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
rcu: RCU grace-period kthread stack dump:
rcu_preempt I28928 10 2 0x80000000
Call Trace:
context_switch kernel/sched/core.c:2844 [inline]
__schedule+0x817/0x1cc0 kernel/sched/core.c:3485
schedule+0x92/0x180 kernel/sched/core.c:3529
schedule_timeout+0x4db/0xfd0 kernel/time/timer.c:1803
rcu_gp_fqs_loop kernel/rcu/tree.c:1948 [inline]
rcu_gp_kthread+0x956/0x17a0 kernel/rcu/tree.c:2105
kthread+0x357/0x430 kernel/kthread.c:246
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
NMI backtrace for cpu 0
CPU: 0 PID: 8759 Comm: syz-executor2 Not tainted 5.0.0-rc4+ #51
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
nmi_cpu_backtrace.cold+0x63/0xa4 lib/nmi_backtrace.c:101
nmi_trigger_cpumask_backtrace+0x1be/0x236 lib/nmi_backtrace.c:62
arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
rcu_dump_cpu_stacks+0x183/0x1cf kernel/rcu/tree.c:1211
print_cpu_stall kernel/rcu/tree.c:1348 [inline]
check_cpu_stall kernel/rcu/tree.c:1422 [inline]
rcu_pending kernel/rcu/tree.c:3018 [inline]
rcu_check_callbacks.cold+0x500/0xa4a kernel/rcu/tree.c:2521
update_process_times+0x32/0x80 kernel/time/timer.c:1635
tick_sched_handle+0xa2/0x190 kernel/time/tick-sched.c:161
tick_sched_timer+0x47/0x130 kernel/time/tick-sched.c:1271
__run_hrtimer kernel/time/hrtimer.c:1389 [inline]
__hrtimer_run_queues+0x33e/0xde0 kernel/time/hrtimer.c:1451
hrtimer_interrupt+0x314/0x770 kernel/time/hrtimer.c:1509
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1035 [inline]
smp_apic_timer_interrupt+0x120/0x570 arch/x86/kernel/apic/apic.c:1060
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807
</IRQ>
RIP: 0010:__read_once_size include/linux/compiler.h:193 [inline]
RIP: 0010:queued_write_lock_slowpath+0x13e/0x290 kernel/locking/qrwlock.c:86
Code: 00 00 fc ff df 4c 8d 2c 01 41 83 c7 03 41 0f b6 45 00 41 38 c7 7c 08 84 c0 0f 85 0c 01 00 00 8b 03 3d 00 01 00 00 74 1a f3 90 <41> 0f b6 55 00 41 38 d7 7c eb 84 d2 74 e7 48 89 df e8 6c 0f 4f 00
RSP: 0018:
ffff88805f117bd8 EFLAGS:
00000206 ORIG_RAX:
ffffffffffffff13
RAX:
0000000000000300 RBX:
ffffffff89413ba0 RCX:
1ffffffff1282774
RDX:
0000000000000000 RSI:
0000000000000004 RDI:
ffffffff89413ba0
RBP:
ffff88805f117c70 R08:
1ffffffff1282774 R09:
fffffbfff1282775
R10:
fffffbfff1282774 R11:
ffffffff89413ba3 R12:
00000000000000ff
R13:
fffffbfff1282774 R14:
1ffff1100be22f7d R15:
0000000000000003
queued_write_lock include/asm-generic/qrwlock.h:104 [inline]
do_raw_write_lock+0x1d6/0x290 kernel/locking/spinlock_debug.c:203
__raw_write_lock_bh include/linux/rwlock_api_smp.h:204 [inline]
_raw_write_lock_bh+0x3b/0x50 kernel/locking/spinlock.c:312
x25_insert_socket+0x21/0xe0 net/x25/af_x25.c:267
x25_bind+0x273/0x340 net/x25/af_x25.c:705
__sys_bind+0x23f/0x290 net/socket.c:1505
__do_sys_bind net/socket.c:1516 [inline]
__se_sys_bind net/socket.c:1514 [inline]
__x64_sys_bind+0x73/0xb0 net/socket.c:1514
do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457e39
Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:
00007fafccd0dc78 EFLAGS:
00000246 ORIG_RAX:
0000000000000031
RAX:
ffffffffffffffda RBX:
0000000000000003 RCX:
0000000000457e39
RDX:
0000000000000012 RSI:
0000000020000240 RDI:
0000000000000004
RBP:
000000000073bf00 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000246 R12:
00007fafccd0e6d4
R13:
00000000004bdf8b R14:
00000000004ce4b8 R15:
00000000ffffffff
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 8752 Comm: syz-executor4 Not tainted 5.0.0-rc4+ #51
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__x25_find_socket+0x78/0x120 net/x25/af_x25.c:328
Code: 89 f8 48 c1 e8 03 80 3c 18 00 0f 85 a6 00 00 00 4d 8b 64 24 68 4d 85 e4 74 7f e8 03 97 3d fb 49 83 ec 68 74 74 e8 f8 96 3d fb <49> 8d bc 24 88 04 00 00 48 89 f8 48 c1 e8 03 0f b6 04 18 84 c0 74
RSP: 0018:
ffff8880639efc58 EFLAGS:
00000246
RAX:
0000000000040000 RBX:
dffffc0000000000 RCX:
ffffc9000e677000
RDX:
0000000000040000 RSI:
ffffffff863244b8 RDI:
ffff88806a764628
RBP:
ffff8880639efc80 R08:
ffff8880a80d05c0 R09:
fffffbfff1282775
R10:
fffffbfff1282774 R11:
ffffffff89413ba3 R12:
ffff88806a7645c0
R13:
0000000000000001 R14:
ffff88809f29ac00 R15:
0000000000000000
FS:
00007fe8d0c58700(0000) GS:
ffff8880ae900000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000001b32823000 CR3:
00000000672eb000 CR4:
00000000001406e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
x25_new_lci net/x25/af_x25.c:357 [inline]
x25_connect+0x374/0xdf0 net/x25/af_x25.c:786
__sys_connect+0x266/0x330 net/socket.c:1686
__do_sys_connect net/socket.c:1697 [inline]
__se_sys_connect net/socket.c:1694 [inline]
__x64_sys_connect+0x73/0xb0 net/socket.c:1694
do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457e39
Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:
00007fe8d0c57c78 EFLAGS:
00000246 ORIG_RAX:
000000000000002a
RAX:
ffffffffffffffda RBX:
0000000000000003 RCX:
0000000000457e39
RDX:
0000000000000012 RSI:
0000000020000200 RDI:
0000000000000004
RBP:
000000000073bf00 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000246 R12:
00007fe8d0c586d4
R13:
00000000004be378 R14:
00000000004ceb00 R15:
00000000ffffffff
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Andrew Hendry <andrew.hendry@gmail.com>
Cc: linux-x25@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>