Dan Williams [Thu, 30 Nov 2017 00:10:47 +0000 (16:10 -0800)]
IB/core: disable memory registration of filesystem-dax vmas
Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow RDMA to create long standing memory registrations
against filesystem-dax vmas.
Link: http://lkml.kernel.org/r/151068941011.7446.7766030590347262502.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Williams [Thu, 30 Nov 2017 00:10:43 +0000 (16:10 -0800)]
v4l2: disable filesystem-dax mapping support
V4L2 memory registrations are incompatible with filesystem-dax, which
needs the ability to revoke DMA access to a mapping at will, or
otherwise allow the kernel to wait for completion of DMA. The
filesystem-dax implementation breaks the traditional solution of
truncate of active file backed mappings since there is no page-cache
page we can orphan to sustain ongoing DMA.
If v4l2 wants to support long lived DMA mappings it needs to arrange to
hold a file lease or use some other mechanism so that the kernel can
coordinate revoking DMA access when the filesystem needs to truncate
mappings.
Link: http://lkml.kernel.org/r/151068940499.7446.12846708245365671207.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Williams [Thu, 30 Nov 2017 00:10:39 +0000 (16:10 -0800)]
mm: fail get_vaddr_frames() for filesystem-dax mappings
Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow V4L2, Exynos, and other frame vector users to create
long standing / irrevocable memory registrations against filesystem-dax
vmas.
[dan.j.williams@intel.com: add comment for vma_is_fsdax() check in get_vaddr_frames(), per Jan]
Link: http://lkml.kernel.org/r/151197874035.26211.4061781453123083667.stgit@dwillia2-desk3.amr.corp.intel.com
Link: http://lkml.kernel.org/r/151068939985.7446.15684639617389154187.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Williams [Thu, 30 Nov 2017 00:10:35 +0000 (16:10 -0800)]
mm: introduce get_user_pages_longterm
Patch series "introduce get_user_pages_longterm()", v2.
Here is a new get_user_pages api for cases where a driver intends to
keep an elevated page count indefinitely. This is distinct from usages
like iov_iter_get_pages where the elevated page counts are transient.
The iov_iter_get_pages cases immediately turn around and submit the
pages to a device driver which will put_page when the i/o operation
completes (under kernel control).
In the longterm case userspace is responsible for dropping the page
reference at some undefined point in the future. This is untenable for
the filesystem-dax case, where the filesystem is in control of the
lifetime of the block / page and needs reasonable limits on how long it
can wait for pages in a mapping to become idle.
Fixing filesystems to actually wait for dax pages to be idle before
blocks from a truncate/hole-punch operation are repurposed is saved for
a later patch series.
Also, allowing longterm registration of dax mappings is a future patch
series that introduces a "map with lease" semantic where the kernel can
revoke a lease and force userspace to drop its page references.
I have also tagged these for -stable to purposely break cases that might
assume that longterm memory registrations for filesystem-dax mappings
were supported by the kernel. The behavior regression this policy
change implies is one of the reasons we maintain the "dax enabled.
Warning: EXPERIMENTAL, use at your own risk" notification when mounting
a filesystem in dax mode.
It is worth noting the device-dax interface does not suffer the same
constraints since it does not support file space management operations
like hole-punch.
This patch (of 4):
Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow long standing memory registrations against
filesystem-dax vmas. Device-dax vmas do not have this problem and are
explicitly allowed.
This is temporary until a "memory registration with layout-lease"
mechanism can be implemented for the affected sub-systems (RDMA and
V4L2).
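The policy described above can be pictured with a small userspace model. This is only a sketch, not the kernel implementation: `struct pin_vma_model`, `enum pin_vma_kind` and `longterm_pin_allowed()` are hypothetical names invented for illustration, and `-EOPNOTSUPP` is an assumed error code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical classification of a mapping; not a kernel type. */
enum pin_vma_kind {
    PIN_VMA_ANON,
    PIN_VMA_PAGECACHE_FILE,
    PIN_VMA_DEVICE_DAX,
    PIN_VMA_FS_DAX
};

struct pin_vma_model {
    enum pin_vma_kind kind;
};

/*
 * Return 0 if every vma covered by the request may be pinned long term,
 * or -EOPNOTSUPP (assumed error code) as soon as a filesystem-dax vma
 * is found. Device-dax vmas are explicitly allowed.
 */
int longterm_pin_allowed(const struct pin_vma_model *vmas, size_t nr)
{
    for (size_t i = 0; i < nr; i++) {
        if (vmas[i].kind == PIN_VMA_FS_DAX)
            return -EOPNOTSUPP;
    }
    return 0;
}
```

In this model a request that spans an anonymous vma and a device-dax vma succeeds, while any request that touches a filesystem-dax vma is refused as a whole.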
[akpm@linux-foundation.org: use kcalloc()]
Link: http://lkml.kernel.org/r/151068939435.7446.13560129395419350737.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Williams [Thu, 30 Nov 2017 00:10:32 +0000 (16:10 -0800)]
device-dax: implement ->split() to catch invalid munmap attempts
Similar to how device-dax enforces that the 'address', 'offset', and
'len' parameters to mmap() be aligned to the device's fundamental
alignment, the same constraints apply to munmap(). Implement ->split()
to fail munmap calls that violate the alignment constraint.
Otherwise, we later fail VM_BUG_ON checks in the unmap_page_range() path
with crash signatures of the form:
vma ffff8800b60c8a88 start 00007f88c0000000 end 00007f88c0e00000
next (null) prev (null) mm ffff8800b61150c0
prot 8000000000000027 anon_vma (null) vm_ops ffffffffa0091240
pgoff 0 file ffff8800b638ef80 private_data (null)
flags: 0x380000fb(read|write|shared|mayread|maywrite|mayexec|mayshare|softdirty|mixedmap|hugepage)
------------[ cut here ]------------
kernel BUG at mm/huge_memory.c:2014!
[..]
RIP: 0010:__split_huge_pud+0x12a/0x180
[..]
Call Trace:
unmap_page_range+0x245/0xa40
? __vma_adjust+0x301/0x990
unmap_vmas+0x4c/0xa0
unmap_region+0xae/0x120
? __vma_rb_erase+0x11a/0x230
do_munmap+0x276/0x410
vm_munmap+0x6a/0xa0
SyS_munmap+0x1d/0x30
Link: http://lkml.kernel.org/r/151130418681.4029.7118245855057952010.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Williams [Thu, 30 Nov 2017 00:10:28 +0000 (16:10 -0800)]
mm, hugetlbfs: introduce ->split() to vm_operations_struct
Patch series "device-dax: fix unaligned munmap handling"
When device-dax is operating in huge-page mode we want it to behave like
hugetlbfs and fail attempts to split vmas into unaligned ranges. It
would be messy to teach the munmap path about device-dax alignment
constraints in the same (hstate) way that hugetlbfs communicates this
constraint. Instead, these patches introduce a new ->split() vm
operation.
This patch (of 2):
The device-dax interface has constraints similar to hugetlbfs in that it
requires the munmap path to unmap in huge page aligned units. Rather
than add more custom vma handling code in __split_vma() introduce a new
vm operation to perform this vma specific check.
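A minimal userspace sketch of the idea, assuming a per-mapping hook that the generic split path consults: `struct dax_vma_model`, `aligned_split()` and `try_split_vma()` are hypothetical names, and the real vm operation signature may differ.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a vma that may require aligned splits. */
struct dax_vma_model {
    uint64_t start;
    uint64_t end;
    uint64_t align;  /* power of two, e.g. 2 MiB */
    /* Optional per-mapping hook; NULL means any split address is fine. */
    int (*split)(const struct dax_vma_model *vma, uint64_t addr);
};

/* Hook for mappings that must only be split on 'align' boundaries. */
int aligned_split(const struct dax_vma_model *vma, uint64_t addr)
{
    return (addr & (vma->align - 1)) ? -EINVAL : 0;
}

/* Generic path: reject addresses outside the vma, then ask the hook. */
int try_split_vma(const struct dax_vma_model *vma, uint64_t addr)
{
    if (addr <= vma->start || addr >= vma->end)
        return -EINVAL;
    if (vma->split)
        return vma->split(vma, addr);
    return 0;
}
```

The generic code stays free of device-specific alignment knowledge: a mapping without the hook can be split anywhere inside its range, while a mapping that installs `aligned_split()` fails unaligned requests before any page table work starts.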
Link: http://lkml.kernel.org/r/151130418135.4029.6783191281930729710.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Liu, Changcheng [Thu, 30 Nov 2017 00:10:25 +0000 (16:10 -0800)]
scripts/faddr2line: extend usage on generic arch
When cross-compiling, faddr2line should use the binary tool used for the
target system, rather than that of the host.
Link: http://lkml.kernel.org/r/20171121092911.GA150711@sofia
Signed-off-by: Liu Changcheng <changcheng.liu@intel.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: NeilBrown <neilb@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Williams [Thu, 30 Nov 2017 00:10:21 +0000 (16:10 -0800)]
mm: replace pte_write with pte_access_permitted in fault + gup paths
The 'access_permitted' helper is used in the gup-fast path and goes
beyond the simple _PAGE_RW check to also:
- validate that the mapping is writable from a protection keys
standpoint
- validate that the pte has _PAGE_USER set since all fault paths where
pte_write is used must be referencing user-memory.
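The difference between the two checks can be sketched in userspace as follows. The bit positions, `struct pte_model` and both helper names are illustrative assumptions, not the definitions of any architecture; the protection-key state is modelled as a plain bitmask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions, not copied from any architecture. */
#define M_PAGE_PRESENT (1u << 0)
#define M_PAGE_RW      (1u << 1)
#define M_PAGE_USER    (1u << 2)

struct pte_model {
    uint32_t flags;
    unsigned int pkey;  /* protection key tagged on the mapping, < 32 */
};

/* Old-style check: only looks at the writable bit. */
bool model_pte_write(struct pte_model pte)
{
    return (pte.flags & M_PAGE_RW) != 0;
}

/*
 * Stricter check: the entry must be present and user-accessible, must be
 * writable when a write is requested, and the protection key must not
 * forbid writes. Bit n of pkey_write_disabled means key n is read-only.
 */
bool model_pte_access_permitted(struct pte_model pte, bool write,
                                uint32_t pkey_write_disabled)
{
    uint32_t need = M_PAGE_PRESENT | M_PAGE_USER;

    if (write)
        need |= M_PAGE_RW;
    if ((pte.flags & need) != need)
        return false;
    if (write && (pkey_write_disabled & (1u << pte.pkey)))
        return false;
    return true;
}
```

In this model a kernel-only writable entry passes `model_pte_write()` but fails `model_pte_access_permitted()`, which is the gap the commit closes.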
Link: http://lkml.kernel.org/r/151043111604.2842.8051684481794973100.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Williams [Thu, 30 Nov 2017 00:10:18 +0000 (16:10 -0800)]
mm: replace pmd_write with pmd_access_permitted in fault + gup paths
The 'access_permitted' helper is used in the gup-fast path and goes
beyond the simple _PAGE_RW check to also:
- validate that the mapping is writable from a protection keys
standpoint
- validate that the pmd has _PAGE_USER set since all fault paths where
pmd_write is used must be referencing user-memory.
Link: http://lkml.kernel.org/r/151043111049.2842.15241454964150083466.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Williams [Thu, 30 Nov 2017 00:10:14 +0000 (16:10 -0800)]
mm: replace pud_write with pud_access_permitted in fault + gup paths
The 'access_permitted' helper is used in the gup-fast path and goes
beyond the simple _PAGE_RW check to also:
- validate that the mapping is writable from a protection keys
standpoint
- validate that the pud has _PAGE_USER set since all fault paths where
pud_write is used must be referencing user-memory.
[dan.j.williams@intel.com: fix powerpc compile error]
Link: http://lkml.kernel.org/r/151129127237.37405.16073414520854722485.stgit@dwillia2-desk3.amr.corp.intel.com
Link: http://lkml.kernel.org/r/151043110453.2842.2166049702068628177.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Williams [Thu, 30 Nov 2017 00:10:10 +0000 (16:10 -0800)]
mm: switch to 'define pmd_write' instead of __HAVE_ARCH_PMD_WRITE
In response to compile breakage introduced by a series that added the
pud_write helper to x86, Stephen notes:
did you consider using the other paradigm:
In arch include files:
#define pud_write pud_write
static inline int pud_write(pud_t pud)
.....
Then in include/asm-generic/pgtable.h:
#ifndef pud_write
static inline int pud_write(pud_t pud)
{
....
}
#endif
If you had, then the powerpc code would have worked ... ;-) and many
of the other interfaces in include/asm-generic/pgtable.h are
protected that way ...
Given that some architectures already define pmd_write() as a macro, it's
a net reduction to drop the definition of __HAVE_ARCH_PMD_WRITE.
Link: http://lkml.kernel.org/r/151129126721.37405.13339850900081557813.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Williams [Thu, 30 Nov 2017 00:10:06 +0000 (16:10 -0800)]
mm: fix device-dax pud write-faults triggered by get_user_pages()
Currently only get_user_pages_fast() can safely handle the writable gup
case due to its use of pud_access_permitted() to check whether the pud
entry is writable. In the gup slow path pud_write() is used instead of
pud_access_permitted() and, to date, it has been unimplemented: it just
calls BUG_ON().
kernel BUG at ./include/linux/hugetlb.h:244!
[..]
RIP: 0010:follow_devmap_pud+0x482/0x490
[..]
Call Trace:
follow_page_mask+0x28c/0x6e0
__get_user_pages+0xe4/0x6c0
get_user_pages_unlocked+0x130/0x1b0
get_user_pages_fast+0x89/0xb0
iov_iter_get_pages_alloc+0x114/0x4a0
nfs_direct_read_schedule_iovec+0xd2/0x350
? nfs_start_io_direct+0x63/0x70
nfs_file_direct_read+0x1e0/0x250
nfs_file_read+0x90/0xc0
For now this just implements a simple check for the _PAGE_RW bit similar
to pmd_write. However, this implies that the gup-slow-path check is
missing the extra checks that the gup-fast-path performs with
pud_access_permitted. Later patches will align all checks to use the
'access_permitted' helper if the architecture provides it.
Note that the generic 'access_permitted' helper fallback is the simple
_PAGE_RW check on architectures that do not define the
'access_permitted' helper(s).
[dan.j.williams@intel.com: fix powerpc compile error]
Link: http://lkml.kernel.org/r/151129126165.37405.16031785266675461397.stgit@dwillia2-desk3.amr.corp.intel.com
Link: http://lkml.kernel.org/r/151043109938.2842.14834662818213616199.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Thomas Gleixner <tglx@linutronix.de> [x86]
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Kravetz [Thu, 30 Nov 2017 00:10:01 +0000 (16:10 -0800)]
mm/cma: fix alloc_contig_range ret code/potential leak
If the call __alloc_contig_migrate_range() in alloc_contig_range returns
-EBUSY, processing continues so that test_pages_isolated() is called
where there is a tracepoint to identify the busy pages. However, it is
possible for busy pages to become available between the calls to these
two routines. In this case, the range of pages may be allocated.
Unfortunately, the original return code (ret == -EBUSY) is still set and
returned to the caller. Therefore, the caller believes the pages were
not allocated and they are leaked.
Update the comment to indicate that allocation is still possible even if
__alloc_contig_migrate_range returns -EBUSY. Also, clear return code in
this case so that it is not accidentally used or returned to caller.
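The return-code handling described above can be modelled in a few lines of userspace C. This is a sketch under assumptions: `model_alloc_contig_range()` is a hypothetical function, and its two parameters stand in for the results of the migration step and the later isolation test.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * migrate_ret:    result of the migration step (0, -EBUSY, or another error).
 * range_isolated: what the later isolation test finds for the range.
 */
int model_alloc_contig_range(int migrate_ret, bool range_isolated)
{
    int ret = migrate_ret;

    /* Any error other than -EBUSY is a hard failure: stop early. */
    if (ret && ret != -EBUSY)
        return ret;

    /* The fix: forget a stale -EBUSY, allocation may still succeed. */
    ret = 0;

    /* Only report -EBUSY if the pages really are still busy. */
    if (!range_isolated)
        ret = -EBUSY;

    return ret;
}
```

With the stale value cleared, a caller sees `-EBUSY` only when the range was not actually allocated, so it no longer abandons pages it in fact owns.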
Link: http://lkml.kernel.org/r/20171122185214.25285-1-mike.kravetz@oracle.com
Fixes: 8ef5849fa8a2 ("mm/cma: always check which page caused allocation failure")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wang Nan [Thu, 30 Nov 2017 00:09:58 +0000 (16:09 -0800)]
mm, oom_reaper: gather each vma to prevent leaking TLB entry
tlb_gather_mmu(&tlb, mm, 0, -1) means gathering the whole virtual memory
space. In this case, tlb->fullmm is true. Some archs like arm64
don't flush the TLB when tlb->fullmm is true, see commit
5a7862e83000 ("arm64: tlbflush: avoid flushing when fullmm == 1"),
which causes leaking of TLB entries.
Will clarifies his patch:
"Basically, we tag each address space with an ASID (PCID on x86) which
is resident in the TLB. This means we can elide TLB invalidation when
pulling down a full mm because we won't ever assign that ASID to
another mm without doing TLB invalidation elsewhere (which actually
just nukes the whole TLB).
I think that means that we could potentially not fault on a kernel
uaccess, because we could hit in the TLB"
There could be a window between complete_signal() sending IPI to other
cores and all threads sharing this mm are really kicked off from cores.
In this window, the oom reaper may call tlb_flush_mmu_tlbonly() to
flush the TLB and then free pages. However, due to the above problem,
the TLB entries are not really flushed on arm64. Other threads can
still access these pages through stale TLB entries. Moreover, a
copy_to_user() can also write to these pages without generating a page
fault, causing use-after-free bugs.
This patch gathers each vma instead of gathering full vm space. In this
case tlb->fullmm is not true. The behavior of oom reaper becomes similar
to munmapping before do_exit, which should be safe for all archs.
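A rough userspace model of why per-vma gathering avoids the problem. All names here are hypothetical; the `fullmm` computation is an assumption meant to mirror the "0, -1 means the whole address space" convention from the text, and `model_flush_happens()` stands in for an architecture that skips the flush when `fullmm` is set.

```c
#include <stdbool.h>
#include <stddef.h>

struct gather_model {
    bool fullmm;
};

struct range_model {
    unsigned long start;
    unsigned long end;
};

/* fullmm is set only for the (0, -1) "whole address space" request. */
void model_tlb_gather(struct gather_model *tlb,
                      unsigned long start, unsigned long end)
{
    tlb->fullmm = !(start | (end + 1));
}

/* Models an arch that elides TLB invalidation when fullmm is set. */
bool model_flush_happens(const struct gather_model *tlb)
{
    return !tlb->fullmm;
}

/* Gather each vma separately and count how many flushes really happen. */
size_t model_reap_per_vma(const struct range_model *vmas, size_t nr)
{
    size_t flushed = 0;

    for (size_t i = 0; i < nr; i++) {
        struct gather_model tlb;

        model_tlb_gather(&tlb, vmas[i].start, vmas[i].end);
        if (model_flush_happens(&tlb))
            flushed++;
    }
    return flushed;
}
```

Gathering the whole space sets `fullmm` and the modelled flush is skipped, whereas gathering each vma range yields one real flush per vma.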
Link: http://lkml.kernel.org/r/20171107095453.179940-1-wangnan0@huawei.com
Fixes: aac453635549 ("mm, oom: introduce oom reaper")
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Bob Liu <liubo95@huawei.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Thu, 30 Nov 2017 00:09:54 +0000 (16:09 -0800)]
mm, memory_hotplug: do not back off draining pcp free pages from kworker context
drain_all_pages backs off when called from a kworker context since
commit 0ccce3b92421 ("mm, page_alloc: drain per-cpu pages from workqueue
context") because the original IPI based pcp draining has been replaced
by a WQ based one and the check wanted to prevent recursion and
inter-worker dependencies. This made some sense at the time
because the system WQ has been used and one worker holding the lock
could be blocked while waiting for new workers to emerge which can be a
problem under OOM conditions.
Since then commit ce612879ddc7 ("mm: move pcp and lru-pcp draining into
single wq") has moved draining to a dedicated (mm_percpu_wq) WQ with a
rescuer, so we shouldn't depend on any other WQ activity to make forward
progress. Calling drain_all_pages from a worker context is therefore
safe as long as this doesn't happen from mm_percpu_wq itself, which is
not the case because all workers are required to _not_ depend on any MM
locks.
Why is this a problem in the first place? ACPI driven memory hot-remove
(acpi_device_hotplug) is executed from the worker context. We end up
calling __offline_pages to free all the pages and that requires both
lru_add_drain_all_cpuslocked and drain_all_pages to do their job
otherwise we can have dangling pages on pcp lists and fail the offline
operation (__test_page_isolated_in_pageblock would see a page with 0 ref
count but without PageBuddy set).
Fix the issue by removing the worker check in drain_all_pages.
lru_add_drain_all_cpuslocked doesn't have this restriction so it works
as expected.
Link: http://lkml.kernel.org/r/20170828093341.26341-1-mhocko@kernel.org
Fixes: 0ccce3b924212 ("mm, page_alloc: drain per-cpu pages from workqueue context")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org> [4.11+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 29 Nov 2017 00:22:10 +0000 (16:22 -0800)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
- avoid potential bogus alignment for some AEAD operations
- fix crash in algif_aead
- avoid sleeping in softirq context with async af_alg
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: skcipher - Fix skcipher_walk_aead_common
crypto: af_alg - remove locking in async callback
crypto: algif_aead - skip SGL entries with NULL page
Linus Torvalds [Tue, 28 Nov 2017 18:01:15 +0000 (10:01 -0800)]
Merge tag 'drm-for-v4.15-part2-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
- TTM regression fix for some virt gpus (bochs vga)
- a few i915 stable fixes
- one vc4 fix
- one uapi fix
* tag 'drm-for-v4.15-part2-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/ttm: don't attempt to use hugepages if dma32 requested (v2)
drm/vblank: Pass crtc_id to page_flip_ioctl.
drm/i915: Fix init_clock_gating for resume
drm/i915: Mark the userptr invalidate workqueue as WQ_MEM_RECLAIM
drm/i915: Clear breadcrumb node when cancelling signaling
drm/i915/gvt: ensure -ve return value is handled correctly
drm/i915: Re-register PMIC bus access notifier on runtime resume
drm/i915: Fix false-positive assert_rpm_wakelock_held in i915_pmic_bus_access_notifier v2
drm/edid: Don't send non-zero YQ in AVI infoframe for HDMI 1.x sinks
drm/vc4: Account for interrupts in flight
Takashi Iwai [Mon, 27 Nov 2017 09:59:40 +0000 (10:59 +0100)]
Revert "ALSA: usb-audio: Fix potential zero-division at parsing FU"
The commit 8428a8ebde2d ("ALSA: usb-audio: Fix potential zero-division
at parsing FU") is utterly bogus and breaks the case with csize=1
instead of fixing anything. Just take it back again.
Reported-by: Jörg Otte <jrg.otte@gmail.com>
Fixes: 8428a8ebde2d ("ALSA: usb-audio: Fix potential zero-division at parsing FU")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 28 Nov 2017 00:45:56 +0000 (16:45 -0800)]
proc: don't report kernel addresses in /proc/<pid>/stack
This just changes the file to report them as zero, although maybe even
that could be removed. I checked, and at least procps doesn't actually
seem to parse the 'stack' file at all.
And since the file doesn't necessarily even exist (it requires
CONFIG_STACKTRACE), possibly other tools don't really use it either.
That said, in case somebody parses it with tools, just having that zero
there should keep such tools happy.
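As an illustration of what a parser would then see, here is a small userspace sketch that formats one stack entry with the address replaced by zero. The exact line layout (`[<0>] symbol`) and the function name are assumptions made for this example.

```c
#include <stdio.h>

/*
 * Format one stack entry without exposing its address: the address field
 * is always printed as 0, only the symbol text is kept.
 * Returns the number of characters snprintf() would have written.
 */
int model_format_stack_entry(char *buf, size_t len, const char *symbol)
{
    return snprintf(buf, len, "[<0>] %s\n", symbol);
}
```

A tool that splits each line into an address field and a symbol field keeps working, because the field is still present and still parses as a number.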
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 27 Nov 2017 21:05:09 +0000 (13:05 -0800)]
Rename superblock flags (MS_xyz -> SB_xyz)
This is a pure automated search-and-replace of the internal kernel
superblock flags.
The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.
Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.
The script to do this was:
# places to look in; re security/*: it generally should *not* be
# touched (that stuff parses mount(2) arguments directly), but
# there are two places where we really deal with superblock flags.
FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
include/linux/fs.h include/uapi/linux/bfs_fs.h \
security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
# the list of MS_... constants
SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
ACTIVE NOUSER"
SED_PROG=
for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done
# we want files that contain at least one of MS_...,
# with fs/namespace.c and fs/pnode.c excluded.
L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')
for f in $L; do sed -i $f $SED_PROG; done
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Meyer [Thu, 10 Aug 2017 08:53:53 +0000 (10:53 +0200)]
auxdisplay: img-ascii-lcd: Only build on archs that have IOMEM
This avoids the MODPOST error:
ERROR: "devm_ioremap_resource" [drivers/auxdisplay/img-ascii-lcd.ko] undefined!
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Mon, 27 Nov 2017 03:21:26 +0000 (06:21 +0300)]
mm, thp: Do not make pmd/pud dirty without a reason
Currently we make page table entries dirty all the time regardless of
access type and don't even consider if the mapping is write-protected.
The reasoning is that we don't really need dirty tracking on THP and
making the entry dirty upfront may save some time on first write to the
page.
Unfortunately, such an approach may result in a false-positive
can_follow_write_pmd() for the huge zero page or a read-only shmem file.
Let's make the page dirty only if we are about to write to the page
anyway (as we do for small pages).
I've restructured the code to make entry dirty inside
maybe_p[mu]d_mkwrite(). It also takes into account if the vma is
write-protected.
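A userspace sketch of the restructured helper, assuming it receives a flag that says whether the entry should be made dirty. The bit values and `model_maybe_mkwrite()` are hypothetical stand-ins for the real page table helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values, not the kernel's. */
#define M_VM_WRITE    (1u << 0)  /* vma permits writes */
#define M_ENTRY_WRITE (1u << 0)  /* entry is writable */
#define M_ENTRY_DIRTY (1u << 1)  /* entry is dirty */

/*
 * Make the entry writable only if the vma allows writes, and dirty only
 * if the caller is about to write. A write-protected vma leaves the
 * entry untouched, so it is never marked dirty by accident.
 */
uint32_t model_maybe_mkwrite(uint32_t entry, uint32_t vm_flags, bool dirty)
{
    if (vm_flags & M_VM_WRITE) {
        entry |= M_ENTRY_WRITE;
        if (dirty)
            entry |= M_ENTRY_DIRTY;
    }
    return entry;
}
```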
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Mon, 27 Nov 2017 03:21:25 +0000 (06:21 +0300)]
mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()
Currently, we unconditionally make the page table entry dirty in
touch_pmd(). This may result in a false-positive can_follow_write_pmd().
We can avoid the situation by making the page table entry dirty only
if the caller asks for write access -- FOLL_WRITE.
The patch also changes touch_pud() in the same way.
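The change can be sketched in userspace as below. The flag values and `model_touch_entry()` are illustrative assumptions; the point is only that the dirty bit now depends on the caller's request.

```c
#include <stdint.h>

/* Illustrative values, chosen for this sketch only. */
#define M_FOLL_WRITE  0x01u      /* caller asks for write access */
#define M_ENTRY_YOUNG (1u << 0)  /* entry was recently accessed */
#define M_ENTRY_DIRTY (1u << 1)  /* entry was written to */

/*
 * Touching an entry always marks it young; it is marked dirty only when
 * the caller requested write access.
 */
uint32_t model_touch_entry(uint32_t entry, unsigned int gup_flags)
{
    entry |= M_ENTRY_YOUNG;
    if (gup_flags & M_FOLL_WRITE)
        entry |= M_ENTRY_DIRTY;
    return entry;
}
```

A read-only lookup therefore leaves the dirty bit clear, so a later "is this entry dirty" test cannot be satisfied by a mere read.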
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 27 Nov 2017 00:01:47 +0000 (16:01 -0800)]
Linux 4.15-rc1
Linus Torvalds [Sun, 26 Nov 2017 23:03:49 +0000 (15:03 -0800)]
Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
- LPAE fixes for kernel-readonly regions
- Fix for get_user_pages_fast on LPAE systems
- avoid tying decompressor to a particular platform if DEBUG_LL is
enabled
- BUG if we attempt to return to userspace but the to-be-restored PSR
value keeps us in privileged mode (defeating an issue that ftracetest
found)
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: BUG if jumping to usermode address in kernel mode
ARM: 8722/1: mm: make STRICT_KERNEL_RWX effective for LPAE
ARM: 8721/1: mm: dump: check hardware RO bit for LPAE
ARM: make decompressor debug output user selectable
ARM: fix get_user_pages_fast
Linus Torvalds [Sun, 26 Nov 2017 22:39:20 +0000 (14:39 -0800)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
- unbreak the irq trigger type check for legacy platforms
- a handful of fixes for ARM GIC v3/4 interrupt controllers
- a few trivial fixes all over the place
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/matrix: Make - vs ?: Precedence explicit
irqchip/imgpdc: Use resource_size function on resource object
irqchip/qcom: Fix u32 comparison with value less than zero
irqchip/exiu: Fix return value check in exiu_init()
irqchip/gic-v3-its: Remove artificial dependency on PCI
irqchip/gic-v4: Add forward definition of struct irq_domain_ops
irqchip/gic-v3: pr_err() strings should end with newlines
irqchip/s3c24xx: pr_err() strings should end with newlines
irqchip/gic-v3: Fix ppi-partitions lookup
irqchip/gic-v4: Clear IRQ_DISABLE_UNLAZY again if mapping fails
genirq: Track whether the trigger type has been set
Linus Torvalds [Sun, 26 Nov 2017 22:11:54 +0000 (14:11 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
- topology enumeration fixes
- KASAN fix
- two entry fixes (not yet the big series related to KASLR)
- remove obsolete code
- instruction decoder fix
- better /dev/mem sanity checks, hopefully working better this time
- pkeys fixes
- two ACPI fixes
- 5-level paging related fixes
- UMIP fixes that should make application visible faults more debuggable
- boot fix for weird virtualization environment
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/decoder: Add new TEST instruction pattern
x86/PCI: Remove unused HyperTransport interrupt support
x86/umip: Fix insn_get_code_seg_params()'s return value
x86/boot/KASLR: Remove unused variable
x86/entry/64: Add missing irqflags tracing to native_load_gs_index()
x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow
x86/entry/64: Fix entry_SYSCALL_64_after_hwframe() IRQ tracing
x86/pkeys/selftests: Fix protection keys write() warning
x86/pkeys/selftests: Rename 'si_pkey' to 'siginfo_pkey'
x86/mpx/selftests: Fix up weird arrays
x86/pkeys: Update documentation about availability
x86/umip: Print a warning into the syslog if UMIP-protected instructions are used
x86/smpboot: Fix __max_logical_packages estimate
x86/topology: Avoid wasting 128k for package id array
perf/x86/intel/uncore: Cache logical pkg id in uncore driver
x86/acpi: Reduce code duplication in mp_override_legacy_irq()
x86/acpi: Handle SCI interrupts above legacy space gracefully
x86/boot: Fix boot failure when SMP MP-table is based at 0
x86/mm: Limit mmap() of /dev/mem to valid physical addresses
x86/selftests: Add test for mapping placement for 5-level paging
...
Linus Torvalds [Sun, 26 Nov 2017 21:43:25 +0000 (13:43 -0800)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Misc fixes: a documentation fix, a Sparse warning fix and a debugging
fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/debug: Fix task state recording/printout
sched/deadline: Don't use dubious signed bitfields
sched/deadline: Fix the description of runtime accounting in the documentation
Linus Torvalds [Sun, 26 Nov 2017 21:41:48 +0000 (13:41 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Misc fixes: two PMU driver fixes and a memory leak fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix memory leak triggered by perf --namespace
perf/x86/intel/uncore: Add event constraint for BDX PCU
perf/x86/intel: Hide TSX events when RTM is not supported
Linus Torvalds [Sun, 26 Nov 2017 21:36:54 +0000 (13:36 -0800)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull static key fix from Ingo Molnar:
"Fix a boot warning related to bad init ordering of the static keys
self-test"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
jump_label: Invoke jump_label_test() via early_initcall()
Linus Torvalds [Sun, 26 Nov 2017 21:11:18 +0000 (13:11 -0800)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull objtool fixes from Ingo Molnar:
"A handful of objtool fixes, most of them related to making the UAPI
header-syncing warnings easier to read and easier to act upon"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tools/headers: Sync objtool UAPI header
objtool: Fix cross-build
objtool: Move kernel headers/code sync check to a script
objtool: Move synced files to their original relative locations
objtool: Make unreachable annotation inline asms explicitly volatile
objtool: Add a comment for the unreachable annotation macros
Russell King [Fri, 24 Nov 2017 23:49:34 +0000 (23:49 +0000)]
ARM: BUG if jumping to usermode address in kernel mode
Detect if we are returning to usermode via the normal kernel exit paths
but the saved PSR value indicates that we are in kernel mode. This
could occur due to corrupted stack state, which has been observed with
"ftracetest".
This ensures that we catch the problem case before we get to user code.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Linus Torvalds [Sat, 25 Nov 2017 18:37:16 +0000 (08:37 -1000)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
- The final conversion of timer wheel timers to timer_setup().
A few manual conversions and a large coccinelle assisted sweep and
the removal of the old initialization mechanisms and the related
code.
- Remove the now unused VSYSCALL update code
- Fix permissions of /proc/timer_list. I still need to get rid of that
file completely
- Rename a misnamed clocksource function and remove a stale declaration
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
m68k/macboing: Fix missed timer callback assignment
treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts
timer: Remove redundant __setup_timer*() macros
timer: Pass function down to initialization routines
timer: Remove unused data arguments from macros
timer: Switch callback prototype to take struct timer_list * argument
timer: Pass timer_list pointer to callbacks unconditionally
Coccinelle: Remove setup_timer.cocci
timer: Remove setup_*timer() interface
timer: Remove init_timer() interface
treewide: setup_timer() -> timer_setup() (2 field)
treewide: setup_timer() -> timer_setup()
treewide: init_timer() -> setup_timer()
treewide: Switch DEFINE_TIMER callbacks to struct timer_list *
s390: cmm: Convert timers to use timer_setup()
lightnvm: Convert timers to use timer_setup()
drivers/net: cris: Convert timers to use timer_setup()
drm/vc4: Convert timers to use timer_setup()
block/laptop_mode: Convert timers to use timer_setup()
net/atm/mpc: Avoid open-coded assignment of timer callback function
...
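The timer_setup() conversion listed above changes the callback prototype so that it receives the struct timer_list pointer and recovers its owning object from it. The following is a minimal user-space sketch of that pattern, not kernel code: struct timer_list, container_of() and mock_timer_setup() are simplified stand-ins written for this illustration (the mock omits the flags argument of the real timer_setup()), and struct demo_dev is a hypothetical driver structure.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel type; not the real definition. */
struct timer_list {
	void (*function)(struct timer_list *);
};

/* Same idea as the kernel's container_of(): recover the enclosing object. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical driver state that embeds its timer. */
struct demo_dev {
	int ticks;
	struct timer_list timer;
};

/* Mock of timer_setup(): only the callback is stored, no opaque data value. */
static void mock_timer_setup(struct timer_list *t,
			     void (*fn)(struct timer_list *))
{
	t->function = fn;
}

/* New-style callback: takes the timer and derives its owner from it. */
static void demo_timeout(struct timer_list *t)
{
	struct demo_dev *dev = container_of(t, struct demo_dev, timer);

	dev->ticks++;
}

/* Fire the timer once, the way the timer core would on expiry. */
static int demo_fire_once(void)
{
	struct demo_dev dev = { .ticks = 0 };

	mock_timer_setup(&dev.timer, demo_timeout);
	dev.timer.function(&dev.timer);
	assert(dev.ticks == 1);
	return dev.ticks;
}
```

Because the callback derives its context from the timer itself, no separate data value has to be stored alongside the function pointer, which is what allows the old data arguments and casts to be removed.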
Linus Torvalds [Sat, 25 Nov 2017 18:21:54 +0000 (08:21 -1000)]
Merge tag 'arc-4.15-rc1' of git://git./linux/kernel/git/vgupta/arc
Pull ARC updates from Vineet Gupta:
- more changes for HS48 cores: supporting MMUv5, detecting new
micro-arch gizmos
- axs10x platform wiring up reset driver merged in this cycle
- ARC perf driver optimizations
* tag 'arc-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: perf: avoid vmalloc backed mmap
ARCv2: perf: optimize given that num counters <= 32
ARCv2: perf: tweak overflow interrupt
ARC: [plat-axs10x] DTS: Add reset controller node to manage ethernet reset
ARCv2: boot log: updates for HS48: dual-issue, ECC, Loop Buffer
ARCv2: Accomodate HS48 MMUv5 by relaxing MMU ver checking
ARC: [plat-axs10x] auto-select AXS101 or AXS103 given the ISA config
Linus Torvalds [Sat, 25 Nov 2017 18:06:30 +0000 (08:06 -1000)]
Merge tag 'kbuild-v4.15-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada:
- use 'pwd' instead of '/bin/pwd' for portability
- clean up Makefiles
- fix ld-option for clang
- fix malloc'ed data size in Kconfig
- fix parallel building along with coccicheck
- fix a minor issue of package building
- prompt to use "rpm-pkg" instead of "rpm"
- clean up *.i and *.lst patterns by "make clean"
* tag 'kbuild-v4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: drop $(extra-y) from real-objs-y
kbuild: clean up *.i and *.lst patterns by make clean
kbuild: rpm: prompt to use "rpm-pkg" if "rpm" target is used
kbuild: pkg: use --transform option to prefix paths in tar
coccinelle: fix parallel build with CHECK=scripts/coccicheck
kconfig/symbol.c: use correct pointer type argument for sizeof
kbuild: Set KBUILD_CFLAGS before incl. arch Makefile
kbuild: remove all dummy assignments to obj-
kbuild: create built-in.o automatically if parent directory wants it
kbuild: /bin/pwd -> pwd
Linus Torvalds [Sat, 25 Nov 2017 17:58:25 +0000 (07:58 -1000)]
Merge tag 'afs-fixes-20171124' of git://git./linux/kernel/git/dhowells/linux-fs
Pull AFS fixes from David Howells:
- Make AFS file locking work again.
- Don't write to a page that's being written out, but wait for it to
complete.
- Do d_drop() and d_add() in the right places.
- Put keys on error paths.
- Remove some redundant code.
* tag 'afs-fixes-20171124' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: remove redundant assignment of dvnode to itself
afs: cell: Remove unnecessary code in afs_lookup_cell
afs: Fix signal handling in some file ops
afs: Fix some dentry handling in dir ops and missing key_puts
afs: Make afs_write_begin() avoid writing to a page that's being stored
afs: Fix file locking
Linus Torvalds [Sat, 25 Nov 2017 05:44:25 +0000 (19:44 -1000)]
Merge tag 'kvm-4.15-2' of git://git./virt/kvm/kvm
Pull KVM updates from Radim Krčmář:
"Trimmed second batch of KVM changes for Linux 4.15:
- GICv4 Support for KVM/ARM
- re-introduce support for CPUs without virtual NMI (cc stable) and
allow testing of KVM without virtual NMI on available CPUs
- fix long-standing performance issues with assigned devices on AMD
(cc stable)"
* tag 'kvm-4.15-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (30 commits)
kvm: vmx: Allow disabling virtual NMI support
kvm: vmx: Reinstate support for CPUs without virtual NMI
KVM: SVM: obey guest PAT
KVM: arm/arm64: Don't queue VLPIs on INV/INVALL
KVM: arm/arm64: Fix GICv4 ITS initialization issues
KVM: arm/arm64: GICv4: Theory of operations
KVM: arm/arm64: GICv4: Enable VLPI support
KVM: arm/arm64: GICv4: Prevent userspace from changing doorbell affinity
KVM: arm/arm64: GICv4: Prevent a VM using GICv4 from being saved
KVM: arm/arm64: GICv4: Enable virtual cpuif if VLPIs can be delivered
KVM: arm/arm64: GICv4: Hook vPE scheduling into vgic flush/sync
KVM: arm/arm64: GICv4: Use the doorbell interrupt as an unblocking source
KVM: arm/arm64: GICv4: Add doorbell interrupt handling
KVM: arm/arm64: GICv4: Use pending_last as a scheduling hint
KVM: arm/arm64: GICv4: Handle INVALL applied to a vPE
KVM: arm/arm64: GICv4: Propagate property updates to VLPIs
KVM: arm/arm64: GICv4: Handle MOVALL applied to a vPE
KVM: arm/arm64: GICv4: Handle CLEAR applied to a VLPI
KVM: arm/arm64: GICv4: Propagate affinity changes to the physical ITS
KVM: arm/arm64: GICv4: Unmap VLPI when freeing an LPI
...
Linus Torvalds [Sat, 25 Nov 2017 05:40:12 +0000 (19:40 -1000)]
Merge tag 'powerpc-4.15-2' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"A small batch of fixes, about 50% tagged for stable and the rest for
recently merged code.
There's one more fix for the >128T handling on hash. Once a process
had requested a single mmap above 128T we would then always search
above 128T. The correct behaviour is to consider the hint address in
isolation for each mmap request.
Then a couple of fixes for the IMC PMU, a missing EXPORT_SYMBOL in
VAS, a fix for STRICT_KERNEL_RWX on 32-bit, and a fix to correctly
identify P9 DD2.1 but in code that is currently not used by default.
Thanks to: Aneesh Kumar K.V, Christophe Leroy, Madhavan Srinivasan,
Sukadev Bhattiprolu"
* tag 'powerpc-4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Fix Power9 DD2.1 logic in DT CPU features
powerpc/perf: Fix IMC_MAX_PMU macro
powerpc/perf: Fix pmu_count to count only nest imc pmus
powerpc: Fix boot on BOOK3S_32 with CONFIG_STRICT_KERNEL_RWX
powerpc/perf/imc: Use cpu_to_node() not topology_physical_package_id()
powerpc/vas: Export chip_to_vas_id()
powerpc/64s/slice: Use addr limit when computing slice mask
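The >128T fix described in the pull message above amounts to making the search limit a function of each request's hint address rather than of sticky per-process state. A rough user-space sketch of that idea follows. The names, the 512T ceiling and the exact comparison are assumptions made for illustration and are not taken from the powerpc code.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_WINDOW_128T  (UINT64_C(1) << 47)	/* 128 TiB */
#define ADDR_SPACE_LIMIT (UINT64_C(1) << 49)	/* hypothetical 512 TiB ceiling */

/* Stateless: the limit depends only on this request's hint address. */
static uint64_t search_limit_for_hint(uint64_t hint)
{
	return hint >= MAP_WINDOW_128T ? ADDR_SPACE_LIMIT : MAP_WINDOW_128T;
}

static void hint_isolation_selfcheck(void)
{
	/* A request with a high hint may search above 128T ... */
	assert(search_limit_for_hint(MAP_WINDOW_128T) == ADDR_SPACE_LIMIT);
	/* ... but the next request, with no hint, is confined below 128T again. */
	assert(search_limit_for_hint(0) == MAP_WINDOW_128T);
}
```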
Linus Torvalds [Sat, 25 Nov 2017 05:19:20 +0000 (19:19 -1000)]
Merge branch 'for-next' of git://git./linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
"This series is predominantly bug-fixes, with a few small improvements
that have been outstanding over the last release cycle.
As usual, the associated bug-fixes have CC' tags for stable.
Also, things have been particularly quiet wrt new developments the
last months, with most folks continuing to focus on stability atop 4.x
stable kernels for their respective production configurations.
Also at this point, the stable trees have been synced up with
mainline. This will continue to be a priority, as production users
tend to run exclusively atop stable kernels, a few releases behind
mainline.
The highlights include:
- Fix PR PREEMPT_AND_ABORT null pointer dereference regression in
v4.11+ (tangwenji)
- Fix OOPs during removing TCMU device (Xiubo Li + Zhang Zhuoyu)
- Add netlink command reply supported option for each device (Kenjiro
Nakayama)
- cxgbit: Abort the TCP connection in case of data out timeout (Varun
Prakash)
- Fix PR/ALUA file path truncation (David Disseldorp)
- Fix double se_cmd completion during ->cmd_time_out (Mike Christie)
- Fix QUEUE_FULL + SCSI task attribute handling in 4.1+ (Bryant Ly +
nab)
- Fix quiese during transport_write_pending_qf endless loop (nab)
- Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK in 3.14+
(Don White + nab)"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (35 commits)
tcmu: Add a missing unlock on an error path
tcmu: Fix some memory corruption
iscsi-target: Fix non-immediate TMR reference leak
iscsi-target: Make TASK_REASSIGN use proper se_cmd->cmd_kref
target: Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK
target: Fix quiese during transport_write_pending_qf endless loop
target: Fix caw_sem leak in transport_generic_request_failure
target: Fix QUEUE_FULL + SCSI task attribute handling
iSCSI-target: Use common error handling code in iscsi_decode_text_input()
target/iscsi: Detect conn_cmd_list corruption early
target/iscsi: Fix a race condition in iscsit_add_reject_from_cmd()
target/iscsi: Modify iscsit_do_crypto_hash_buf() prototype
target/iscsi: Fix endianness in an error message
target/iscsi: Use min() in iscsit_dump_data_payload() instead of open-coding it
target/iscsi: Define OFFLOAD_BUF_SIZE once
target: Inline transport_put_cmd()
target: Suppress gcc 7 fallthrough warnings
target: Move a declaration of a global variable into a header file
tcmu: fix double se_cmd completion
target: return SAM_STAT_TASK_SET_FULL for TCM_OUT_OF_RESOURCES
...
Ondrej Mosnáček [Thu, 23 Nov 2017 12:49:06 +0000 (13:49 +0100)]
crypto: skcipher - Fix skcipher_walk_aead_common
The skcipher_walk_aead_common function calls scatterwalk_copychunks on
the input and output walks to skip the associated data. If the AD ends
at an SG list entry boundary, then after these calls the walks will
still be pointing to the end of the skipped region.
These offsets are later checked for alignment in skcipher_walk_next,
so the skcipher_walk may detect the alignment incorrectly.
This patch fixes it by calling scatterwalk_done after the copychunks
calls to ensure that the offsets refer to the right SG list entry.
Fixes: b286d8b1a690 ("crypto: skcipher - Add skcipher walk interface")
Cc: <stable@vger.kernel.org>
Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
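The boundary case in the skcipher_walk_aead_common fix above can be illustrated with a small user-space model. This is a sketch only: struct sg_model, struct walk_model, walk_skip() and walk_normalize() are hypothetical names, and walk_normalize() merely plays the role that the commit message assigns to scatterwalk_done(); none of this is the kernel scatterwalk API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a scatterlist and a walk over it. */
struct sg_model {
	const size_t *len;	/* length of each entry */
	size_t nents;
};

struct walk_model {
	const struct sg_model *sg;
	size_t idx;		/* current entry */
	size_t offset;		/* byte offset inside the current entry */
};

/* Skip n bytes. If the skip ends exactly on an entry boundary, the walk
 * is left pointing at the end of that entry (the pre-fix state). */
static void walk_skip(struct walk_model *w, size_t n)
{
	while (n > 0 && w->idx < w->sg->nents) {
		size_t room = w->sg->len[w->idx] - w->offset;
		size_t step = n < room ? n : room;

		if (room == 0) {	/* entry exhausted: move on */
			w->idx++;
			w->offset = 0;
			continue;
		}
		w->offset += step;
		n -= step;
	}
}

/* Move a walk that sits at the end of an entry to the start of the next. */
static void walk_normalize(struct walk_model *w)
{
	if (w->idx < w->sg->nents && w->offset == w->sg->len[w->idx]) {
		w->idx++;
		w->offset = 0;
	}
}

static int walk_demo(void)
{
	static const size_t lens[] = { 13, 64 };
	const struct sg_model sg = { lens, 2 };
	struct walk_model w = { &sg, 0, 0 };

	walk_skip(&w, 13);	/* associated data fills entry 0 exactly */
	assert(w.idx == 0 && w.offset == 13);	/* stale position */
	walk_normalize(&w);
	assert(w.idx == 1 && w.offset == 0);	/* right entry, offset 0 */
	return (int)w.idx;
}
```

In this model an alignment check on the stale offset (13) would report misalignment even though the data that follows starts at offset 0 of the next entry, which is the kind of misdetection the commit describes.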
Kees Cook [Thu, 23 Nov 2017 22:19:02 +0000 (14:19 -0800)]
m68k/macboing: Fix missed timer callback assignment
This fixes a missed function prototype callback from the timer conversions.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20171123221902.GA75727@beast
Colin Ian King [Mon, 20 Nov 2017 13:58:20 +0000 (13:58 +0000)]
afs: remove redundant assignment of dvnode to itself
The assignment of dvnode to itself is redundant and can be removed.
Cleans up warning detected by cppcheck:
fs/afs/dir.c:975: (warning) Redundant assignment of 'dvnode' to itself.
Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Gustavo A. R. Silva [Fri, 17 Nov 2017 22:40:32 +0000 (16:40 -0600)]
afs: cell: Remove unnecessary code in afs_lookup_cell
Due to recent changes this piece of code is no longer needed.
Addresses-Coverity-ID: 1462033
Link: https://lkml.kernel.org/r/4923.1510957307@warthog.procyon.org.uk
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Mon, 20 Nov 2017 22:41:00 +0000 (22:41 +0000)]
afs: Fix signal handling in some file ops
afs_mkdir(), afs_create(), afs_link() and afs_symlink() all need to drop
the target dentry if a signal causes the operation to be killed immediately
before we try to contact the server.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Mon, 20 Nov 2017 23:04:08 +0000 (23:04 +0000)]
afs: Fix some dentry handling in dir ops and missing key_puts
Fix some of the dentry handling in AFS directory ops:
(1) Do d_drop() on the new_dentry before assigning a new inode to it in
afs_vnode_new_inode(). It's fine to do this before calling afs_iget()
because the operation has taken place on the server.
(2) Replace d_instantiate()/d_rehash() with d_add().
(3) Don't d_drop() the new_dentry in afs_rename() on error.
Also fix afs_link() and afs_rename() to call key_put() on all error paths
where the key is taken.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Sat, 18 Nov 2017 00:13:30 +0000 (00:13 +0000)]
afs: Make afs_write_begin() avoid writing to a page that's being stored
Make afs_write_begin() wait for a page that's marked PG_writeback because:
(1) We need to avoid interference with the data being stored so that the
data on the server ends up in a defined state.
(2) page->private is used to track the window of dirty data within a page,
but it's also used by the storage code to track what's being written,
being cleared by the completion notification. Ownership can't be
relinquished by the storage code until completion because if a store
fails, the data must be re-marked dirty.
Tracing shows something like the following (edited):
x86_64-linux-gn-15940 [1] afs_page_dirty: vn=ffff8800bef33800 9c75 begin 0-125
kworker/u8:3-114 [2] afs_page_dirty: vn=ffff8800bef33800 9c75 store+ 0-125
x86_64-linux-gn-15940 [1] afs_page_dirty: vn=ffff8800bef33800 9c75 begin 0-2052
kworker/u8:3-114 [2] afs_page_dirty: vn=ffff8800bef33800 9c75 clear 0-2052
kworker/u8:3-114 [2] afs_page_dirty: vn=ffff8800bef33800 9c75 store 0-0
kworker/u8:3-114 [2] afs_page_dirty: vn=ffff8800bef33800 9c75 WARN 0-0
The clear (completion) corresponding to the store+ (store continuation from
a previous page) happens between the second begin (afs_write_begin) and the
store corresponding to that. This results in the second store not seeing
any data to write back, leading to the following warning:
WARNING: CPU: 2 PID: 114 at ../fs/afs/write.c:403 afs_write_back_from_locked_page+0x19d/0x76c [kafs]
Modules linked in: kafs(E)
CPU: 2 PID: 114 Comm: kworker/u8:3 Tainted: G E 4.14.0-fscache+ #242
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Workqueue: writeback wb_workfn (flush-afs-2)
task: ffff8800cad72600 task.stack: ffff8800cad44000
RIP: 0010:afs_write_back_from_locked_page+0x19d/0x76c [kafs]
RSP: 0018:ffff8800cad47aa0 EFLAGS: 00010246
RAX: 0000000000000001 RBX: ffff8800bef33a20 RCX: 0000000000000000
RDX: 000000000000000f RSI: ffffffff81c5d0e0 RDI: ffff8800cad72e78
RBP: ffff8800d31ea1e8 R08: ffff8800c1358000 R09: ffff8800ca00e400
R10: ffff8800cad47a38 R11: ffff8800c5d9e400 R12: 0000000000000000
R13: ffffea0002d9df00 R14: ffffffffa0023c1c R15: 0000000000007fdf
FS: 0000000000000000(0000) GS:ffff8800ca700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f85ac6c4000 CR3: 0000000001c10001 CR4: 00000000001606e0
Call Trace:
? clear_page_dirty_for_io+0x23a/0x267
afs_writepages_region+0x1be/0x286 [kafs]
afs_writepages+0x60/0x127 [kafs]
do_writepages+0x36/0x70
__writeback_single_inode+0x12f/0x635
writeback_sb_inodes+0x2cc/0x452
__writeback_inodes_wb+0x68/0x9f
wb_writeback+0x208/0x470
? wb_workfn+0x22b/0x565
wb_workfn+0x22b/0x565
? worker_thread+0x230/0x2ac
process_one_work+0x2cc/0x517
? worker_thread+0x230/0x2ac
worker_thread+0x1d4/0x2ac
? rescuer_thread+0x29b/0x29b
kthread+0x15d/0x165
? kthread_create_on_node+0x3f/0x3f
? call_usermodehelper_exec_async+0x118/0x11f
ret_from_fork+0x24/0x30
Signed-off-by: David Howells <dhowells@redhat.com>
Thomas Gleixner [Wed, 22 Nov 2017 12:05:48 +0000 (13:05 +0100)]
sched/debug: Fix task state recording/printout
The recent conversion of the task state recording to use task_state_index()
broke the sched_switch tracepoint task state output.
task_state_index() returns surprisingly an index (0-7) which is then
printed with __print_flags() applying bitmasks. Not really working and
resulting in weird states like 'prev_state=t' instead of 'prev_state=I'.
Use TASK_REPORT_MAX instead of TASK_STATE_MAX to report preemption. Build a
bitmask from the return value of task_state_index() and store it in
entry->prev_state, which makes __print_flags() work as expected.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: stable@vger.kernel.org
Fixes: efb40f588b43 ("sched/tracing: Fix trace_sched_switch task-state printing")
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1711221304180.1751@nanos
Signed-off-by: Ingo Molnar <mingo@kernel.org>
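The index-versus-bitmask mismatch fixed in "sched/debug: Fix task state recording/printout" above can be shown with a few lines of C. This is a sketch, assuming that index 0 stands for the running state and that index n > 0 corresponds to flag bit n-1; state_index_to_mask() is a hypothetical helper name, not a kernel function.

```c
#include <assert.h>

/* Turn a 0-based state index into a single-bit flag mask.
 * Assumption: index 0 maps to 0, index n > 0 maps to bit n-1. */
static unsigned int state_index_to_mask(unsigned int idx)
{
	return idx ? 1u << (idx - 1) : 0;
}

/* Index 3 interpreted directly as flags would be 0b011, i.e. two unrelated
 * flags set at once; the converted mask sets exactly one bit (0b100). */
static void state_mask_selfcheck(void)
{
	assert(state_index_to_mask(0) == 0);
	assert(state_index_to_mask(3) == 0x4);
	assert(state_index_to_mask(3) != 3);
}
```

A flag printer that tests individual bits only gives a meaningful result for the converted value, which is why the raw index produced odd state letters.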
Masami Hiramatsu [Fri, 24 Nov 2017 04:56:30 +0000 (13:56 +0900)]
x86/decoder: Add new TEST instruction pattern
The kbuild test robot reported this build warning:
Warning: arch/x86/tools/test_get_len found difference at <jump_table>:ffffffff8103dd2c
Warning: ffffffff8103dd82: f6 09 d8 testb $0xd8,(%rcx)
Warning: objdump says 3 bytes, but insn_get_length() says 2
Warning: decoded and checked 1569014 instructions with 1 warnings
This sequence seems to be a new instruction not in the opcode map in the Intel SDM.
The instruction sequence is "F6 09 d8", which means Group3(F6), MOD(00)REG(001)RM(001), and 0xd8.
Intel SDM vol2 A.4 Table A-6 says the table index in the group is "Encoding of Bits 5,4,3 of
the ModR/M Byte (bits 2,1,0 in parenthesis)"
In that table, the opcodes are listed by the REG bits index as:
000 001 010 011 100 101 110 111
TEST Ib/Iz,(undefined),NOT,NEG,MUL AL/rAX,IMUL AL/rAX,DIV AL/rAX,IDIV AL/rAX
So, it seems TEST Ib is assigned to 001.
Add the new pattern.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
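The field breakdown quoted in "x86/decoder: Add new TEST instruction pattern" above can be checked with a short ModR/M decoder. The helper names are made up for this sketch, and grp3_f6_has_imm8() encodes only the conclusion stated in the commit message (REG values 000 and 001 of Group 3 opcode F6 both behave as TEST Ib); it is not the kernel's instruction decoder.

```c
#include <assert.h>
#include <stdint.h>

/* ModR/M layout: bits 7-6 = mod, bits 5-3 = reg, bits 2-0 = rm. */
static unsigned int modrm_mod(uint8_t b) { return (b >> 6) & 0x3; }
static unsigned int modrm_reg(uint8_t b) { return (b >> 3) & 0x7; }
static unsigned int modrm_rm(uint8_t b)  { return b & 0x7; }

/* Group 3, opcode F6: per the commit message, reg 000 and reg 001
 * are both treated as TEST Ib, i.e. followed by an 8-bit immediate. */
static int grp3_f6_has_imm8(uint8_t modrm)
{
	unsigned int reg = modrm_reg(modrm);

	return reg == 0 || reg == 1;
}

/* "F6 09 d8": ModR/M byte 0x09 decodes to mod=00, reg=001, rm=001. */
static void modrm_selfcheck(void)
{
	assert(modrm_mod(0x09) == 0);
	assert(modrm_reg(0x09) == 1);
	assert(modrm_rm(0x09) == 1);
	assert(grp3_f6_has_imm8(0x09));
}
```

With the immediate counted, the sequence is one opcode byte, one ModR/M byte and one immediate byte, matching the 3 bytes objdump reported.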
Linus Torvalds [Fri, 24 Nov 2017 07:18:46 +0000 (21:18 -1000)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix PCI IDs of 9000 series iwlwifi devices, from Luca Coelho.
2) bpf offload bug fixes from Jakub Kicinski.
3) Fix bpf verifier to NOP out code which is dead at run time, since
branch pruning means the verifier will not explore such
instructions. From Alexei Starovoitov.
4) Fix crash when deleting secondary chains in packet scheduler
classifier. From Roman Kapl.
5) Fix buffer management bugs in smc, from Ursula Braun.
6) Fix regression in anycast route handling, from David Ahern.
7) Fix link settings regression in r8169, from Tobias Jakobi.
8) Add back enough UFO support so that live migration still works, from
Willem de Bruijn.
9) Linearize enough packet data for the full extent to which the ipvlan
code will inspect the packet headers, from Gao Feng.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
ipvlan: Fix insufficient skb linear check for ipv6 icmp
ipvlan: Fix insufficient skb linear check for arp
geneve: only configure or fill UDP_ZERO_CSUM6_RX/TX info when CONFIG_IPV6
net: dsa: bcm_sf2: Clear IDDQ_GLOBAL_PWR bit for PHY
net: accept UFO datagrams from tuntap and packet
net: realtek: r8169: implement set_link_ksettings()
net: ipv6: Fixup device for anycast routes during copy
net/smc: Fix preinitialization of buf_desc in __smc_buf_create()
net/smc: use sk_rcvbuf as start for rmb creation
ipv6: Do not consider linkdown nexthops during multipath
net: sched: fix crash when deleting secondary chains
net: phy: cortina: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
bpf: fix branch pruning logic
bpf: change bpf_perf_event_output arg5 type to ARG_CONST_SIZE_OR_ZERO
bpf: change bpf_probe_read_str arg2 type to ARG_CONST_SIZE_OR_ZERO
bpf: remove explicit handling of 0 for arg2 in bpf_probe_read
bpf: introduce ARG_PTR_TO_MEM_OR_NULL
i40evf: Use smp_rmb rather than read_barrier_depends
fm10k: Use smp_rmb rather than read_barrier_depends
igb: Use smp_rmb rather than read_barrier_depends
...
Linus Torvalds [Fri, 24 Nov 2017 07:14:30 +0000 (21:14 -1000)]
Merge tag 'platform-drivers-x86-v4.15-2' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform driver fixes from Darren Hart:
"Fix two issues resulting from the dell-smbios refactoring and
introduction of the dell-smbios-wmi dispatcher.
The first ensures a proper error code is returned when kzalloc fails.
The second avoids an issue in older Dell BIOS implementations which
would fail if the more complex calls were made by limiting those
platforms to the simple calls such as those used by the existing
dell-laptop and dell-wmi drivers, preserving their functionality prior
to the addition of the dell-smbios-wmi dispatcher"
* tag 'platform-drivers-x86-v4.15-2' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: dell-laptop: fix error return code in dell_init()
platform/x86: dell-smbios-wmi: Disable userspace interface if missing hotfix
Linus Torvalds [Fri, 24 Nov 2017 07:12:58 +0000 (21:12 -1000)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two basic fixes: one for the sparse problem with the blacklist flags
and another for a hang forever in bnx2fc"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: Use 'blist_flags_t' for scsi_devinfo flags
scsi: bnx2fc: Fix hung task messages when a cleanup response is not received during abort
Linus Torvalds [Fri, 24 Nov 2017 07:09:41 +0000 (21:09 -1000)]
Merge tag 'sound-fix-4.15-rc1' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"All commits found here are small fixes for regression or stable:
- PCM timestamp behavior fix that could be seen as a regression
- Remove spurious WARN_ON() from ALSA timer 32bit compat ioctl
- HD-audio HDMI/DP channel mapping fix for 32bit archs
- Fix the previous fix for HD-audio initialization code
- More hardening USB-audio against malicious USB descriptors
- HD-audio quirks/fixes (Realtek codec, AMD controller)
- Missing help text for the recent Intel SST kconfig change"
* tag 'sound-fix-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda: Add Raven PCI ID
ALSA: hda/realtek - Fix ALC700 family no sound issue
ALSA: hda - Fix yet remaining issue with vmaster 0dB initialization
ALSA: usb-audio: Add sanity checks in v2 clock parsers
ALSA: usb-audio: Fix potential zero-division at parsing FU
ALSA: usb-audio: Fix potential out-of-bound access at parsing SU
ALSA: usb-audio: Add sanity checks to FE parser
ALSA: timer: Remove kernel warning at compat ioctl error paths
ALSA: pcm: update tstamp only if audio_tstamp changed
ALSA: hda/realtek: Add headset mic support for Intel NUC Skull Canyon
ALSA: hda: Fix too short HDMI/DP chmap reporting
ALSA: usb-audio: uac1: Invalidate ctl on interrupt
ALSA: hda/realtek - Fix ALC275 no sound issue
ASoC: Intel: Add help text for SND_SOC_INTEL_SST_TOPLEVEL
Linus Torvalds [Fri, 24 Nov 2017 07:04:56 +0000 (21:04 -1000)]
Merge tag 'drm-for-v4.15-part2' of git://people.freedesktop.org/~airlied/linux
Pull more drm updates from Dave Airlie:
"Fixes/cleanups for rc1, non-desktop flags for VR
- remove the MSM dt-bindings file Rob managed to push in the previous
pull.
- add a property/edid quirk to denote HMD devices, I had these
hanging around for a few weeks and Keith had done some work on
them, they are fairly self contained and small, and only affect
people using HTC Vive VR headsets so far.
- amdgpu, tegra, tilcdc, fsl fixes
- some imx-drm cleanups I missed, these seemed pretty small, and no
reason to hold off.
I have one TTM regression fix (fixes bochs-vga in qemu) sitting
locally awaiting review I'll probably send that in a separate pull
request tomorrow"
* tag 'drm-for-v4.15-part2' of git://people.freedesktop.org/~airlied/linux: (33 commits)
dt-bindings: remove file that was added accidentally
drm/edid: quirk HTC vive headset as non-desktop. [v2]
drm/fb: add support for not enabling fbcon on non-desktop displays [v2]
drm: add connector info/property for non-desktop displays [v2]
drm/amdgpu: fix rmmod KCQ disable failed error
drm/amdgpu: fix kernel hang when starting VNC server
drm/amdgpu: don't skip attributes when powerplay is enabled
drm/amd/pp: fix typecast error in powerplay.
drm/tilcdc: Remove obsolete "ti,tilcdc,slave" dts binding support
drm/tegra: sor: Reimplement pad clock
Revert "drm/radeon: dont switch vt on suspend"
drm/amd/amdgpu: fix over-bound accessing in amdgpu_cs_wait_any_fence
drm/amd/powerplay: fix unfreeze level smc message for smu7
drm/amdgpu:fix memleak
drm/amdgpu:fix memleak in takedown
drm/amd/pp: fix dpm randomly failed on Vega10
drm/amdgpu: set f_mapping on exported DMA-bufs
drm/amdgpu: Properly allocate VM invalidate eng v2
drm/fsl-dcu: enable IRQ before drm_atomic_helper_resume()
drm/fsl-dcu: avoid disabling pixel clock twice on suspend
...
Linus Torvalds [Fri, 24 Nov 2017 07:01:32 +0000 (21:01 -1000)]
Merge tag 'docs-4.15-2' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"A few late-arriving docs updates that have no real reason to wait.
There's a new "Co-Developed-by" tag described by Greg, and a build
enhancement from Willy to generate docs warnings during a kernel build
(but only when additional warnings have been requested in general)"
* tag 'docs-4.15-2' of git://git.lwn.net/linux:
Add optional check for bad kernel-doc comments
Documentation: fix profile= options in kernel-parameters.txt
documentation/svga.txt: update outdated file
kokr/memory-barriers.txt: Fix typo in paring example
kokr/memory-barriers/txt: Replace uses of "transitive"
Documentation/process: add Co-Developed-by: tag for patches with multiple authors
Linus Torvalds [Fri, 24 Nov 2017 06:51:27 +0000 (20:51 -1000)]
Merge branch 'next-keys' of git://git./linux/kernel/git/jmorris/linux-security
Pull keys update from James Morris:
"There's nothing too controversial here:
- Doc fix for keyctl_read().
- time_t -> time64_t replacement.
- Set the module licence on things to prevent tainting"
* 'next-keys' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
pkcs7: Set the module licence to prevent tainting
security: keys: Replace time_t with time64_t for struct key_preparsed_payload
security: keys: Replace time_t/timespec with time64_t
KEYS: fix in-kernel documentation for keyctl_read()
Linus Torvalds [Fri, 24 Nov 2017 06:48:26 +0000 (20:48 -1000)]
Merge tag 'apparmor-pr-2017-11-21' of git://git./linux/kernel/git/jj/linux-apparmor
Pull apparmor updates from John Johansen:
"No features this time, just minor cleanups and bug fixes.
Cleanups:
- fix spelling mistake: "resoure" -> "resource"
- remove unused redundant variable stop
- Fix bool initialization/comparison
Bug Fixes:
- initialized returned struct aa_perms
- fix leak of null profile name if profile allocation fails
- ensure that undecidable profile attachments fail
- fix profile attachment for special unconfined profiles
- fix locking when creating a new complain profile.
- fix possible recursive lock warning in __aa_create_ns"
* tag 'apparmor-pr-2017-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
apparmor: fix possible recursive lock warning in __aa_create_ns
apparmor: fix locking when creating a new complain profile.
apparmor: fix profile attachment for special unconfined profiles
apparmor: ensure that undecidable profile attachments fail
apparmor: fix leak of null profile name if profile allocation fails
apparmor: remove unused redundant variable stop
apparmor: Fix bool initialization/comparison
apparmor: initialized returned struct aa_perms
apparmor: fix spelling mistake: "resoure" -> "resource"
Stephan Mueller [Fri, 10 Nov 2017 12:20:55 +0000 (13:20 +0100)]
crypto: af_alg - remove locking in async callback
The code paths protected by the socket-lock do not use or modify the
socket in a non-atomic fashion. The actions pertaining to the socket do not
even need to be handled as an atomic operation. Thus, the socket-lock
can be safely ignored.
This fixes a bug regarding scheduling in atomic as the callback function
may be invoked in interrupt context.
In addition, the sock_hold is moved before the AIO encrypt/decrypt
operation to ensure that the socket is always present. This avoids a
tiny race window where the socket is unprotected and yet used by the AIO
operation.
Finally, the release of resources for a crypto operation is moved into a
common function of af_alg_free_resources.
Cc: <stable@vger.kernel.org>
Fixes: e870456d8e7c8 ("crypto: algif_skcipher - overhaul memory management")
Fixes: d887c52d6ae43 ("crypto: algif_aead - overhaul memory management")
Reported-by: Romain Izard <romain.izard.pro@gmail.com>
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Tested-by: Romain Izard <romain.izard.pro@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Stephan Mueller [Fri, 10 Nov 2017 10:04:52 +0000 (11:04 +0100)]
crypto: algif_aead - skip SGL entries with NULL page
The TX SGL may contain SGL entries that are assigned a NULL page. This
may happen if a multi-stage AIO operation is performed where the data
for each stage is pointed to by one SGL entry. Upon completion of that
stage, af_alg_pull_tsgl will assign NULL to the SGL entry.
The NULL cipher used to copy the AAD from TX SGL to the destination
buffer, however, cannot handle the case where the SGL starts with an SGL
entry having a NULL page. Thus, the code needs to advance the start
pointer into the SGL to the first non-NULL entry.
This fixes a crash visible on Intel x86 32 bit using the libkcapi test
suite.
Cc: <stable@vger.kernel.org>
Fixes: 72548b093ee38 ("crypto: algif_aead - copy AAD from src to dst")
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
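The "advance the start pointer to the first non-NULL entry" step described in the algif_aead commit above can be illustrated with a small userspace sketch. The struct and function names here are hypothetical stand-ins, not the kernel's scatterlist API:

```c
#include <stddef.h>

/* Hypothetical stand-in for an SGL entry; not the kernel's struct scatterlist. */
struct sg_entry {
	const void *page;   /* NULL once the entry has been consumed */
	size_t length;
};

/*
 * Return the index of the first entry whose page is non-NULL,
 * or n if every entry has already been consumed.
 */
size_t first_valid_entry(const struct sg_entry *sgl, size_t n)
{
	size_t i = 0;

	while (i < n && sgl[i].page == NULL)
		i++;
	return i;
}
```

A caller would start copying from sgl[first_valid_entry(sgl, n)] rather than from sgl[0], and treat a return value of n as "nothing left to copy".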
Dave Airlie [Fri, 24 Nov 2017 01:33:29 +0000 (11:33 +1000)]
Merge tag 'drm-misc-fixes-2017-11-20' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
4.15 merge window fixes 1
* tag 'drm-misc-fixes-2017-11-20' of git://anongit.freedesktop.org/drm/drm-misc:
drm/edid: Don't send non-zero YQ in AVI infoframe for HDMI 1.x sinks
drm/vc4: Account for interrupts in flight
Dave Airlie [Fri, 24 Nov 2017 01:33:12 +0000 (11:33 +1000)]
Merge tag 'drm-intel-next-fixes-2017-11-23' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
drm/i915 fixes for v4.15
* tag 'drm-intel-next-fixes-2017-11-23' of git://anongit.freedesktop.org/drm/drm-intel:
drm/i915: Fix init_clock_gating for resume
drm/i915: Mark the userptr invalidate workqueue as WQ_MEM_RECLAIM
drm/i915: Clear breadcrumb node when cancelling signaling
drm/i915/gvt: ensure -ve return value is handled correctly
drm/i915: Re-register PMIC bus access notifier on runtime resume
drm/i915: Fix false-positive assert_rpm_wakelock_held in i915_pmic_bus_access_notifier v2
Dave Airlie [Fri, 24 Nov 2017 01:32:29 +0000 (11:32 +1000)]
Merge tag 'drm-misc-next-fixes-2017-11-23' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
Fix crtc_id in page_flip event.
* tag 'drm-misc-next-fixes-2017-11-23' of git://anongit.freedesktop.org/drm/drm-misc:
drm/vblank: Pass crtc_id to page_flip_ioctl.
Dave Airlie [Thu, 23 Nov 2017 02:12:17 +0000 (12:12 +1000)]
drm/ttm: don't attempt to use hugepages if dma32 requested (v2)
The commit below introduced thp support for ttm allocations, however it didn't
take into account the case where dma32 was requested. Some drivers always request
dma32, and the bochs driver is one of those.
This fixes an oops:
[ 30.108507] ------------[ cut here ]------------
[ 30.108920] kernel BUG at ./include/linux/gfp.h:408!
[ 30.109356] invalid opcode: 0000 [#1] SMP
[ 30.109700] Modules linked in: fuse nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack devlink ip_set nfnetlink ebtable_nat ebtable_broute bridge ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables snd_hda_codec_generic kvm_intel kvm snd_hda_intel snd_hda_codec irqbypass ppdev snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm bochs_drm ttm joydev drm_kms_helper virtio_balloon snd_timer snd parport_pc drm soundcore parport i2c_piix4 nls_utf8 isofs squashfs zstd_decompress xxhash 8021q garp mrp stp llc virtio_net
[ 30.115605] virtio_console virtio_scsi crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw virtio_pci virtio_ring virtio ata_generic pata_acpi qemu_fw_cfg sunrpc scsi_transport_iscsi loop
[ 30.117425] CPU: 0 PID: 1347 Comm: gnome-shell Not tainted 4.15.0-0.rc0.git6.1.fc28.x86_64 #1
[ 30.118141] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
[ 30.118866] task: ffff923a77e03380 task.stack: ffffa78182228000
[ 30.119366] RIP: 0010:__alloc_pages_nodemask+0x35e/0x430
[ 30.119810] RSP: 0000:ffffa7818222bba8 EFLAGS: 00010202
[ 30.120250] RAX: 0000000000000001 RBX: 00000000014382c6 RCX: 0000000000000006
[ 30.120840] RDX: 0000000000000000 RSI: 0000000000000009 RDI: 0000000000000000
[ 30.121443] RBP: ffff923a760d6000 R08: 0000000000000000 R09: 0000000000000006
[ 30.122039] R10: 0000000000000040 R11: 0000000000000300 R12: ffff923a729273c0
[ 30.122629] R13: 0000000000000000 R14: 0000000000000000 R15: ffff923a7483d400
[ 30.123223] FS: 00007fe48da7dac0(0000) GS:ffff923a7cc00000(0000) knlGS:0000000000000000
[ 30.123896] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 30.124373] CR2: 00007fe457b73000 CR3: 0000000078313000 CR4: 00000000000006f0
[ 30.124968] Call Trace:
[ 30.125186] ttm_pool_populate+0x19b/0x400 [ttm]
[ 30.125578] ttm_bo_vm_fault+0x325/0x570 [ttm]
[ 30.125964] __do_fault+0x19/0x11e
[ 30.126255] __handle_mm_fault+0xcd3/0x1260
[ 30.126609] handle_mm_fault+0x14c/0x310
[ 30.126947] __do_page_fault+0x28c/0x530
[ 30.127282] do_page_fault+0x32/0x270
[ 30.127593] async_page_fault+0x22/0x30
[ 30.127922] RIP: 0033:0x7fe48aae39a8
[ 30.128225] RSP: 002b:00007ffc21c4d928 EFLAGS: 00010206
[ 30.128664] RAX: 00007fe457b73000 RBX: 000055cd4c1041a0 RCX: 00007fe457b73040
[ 30.129259] RDX: 0000000000300000 RSI: 0000000000000000 RDI: 00007fe457b73000
[ 30.129855] RBP: 0000000000000300 R08: 000000000000000c R09: 0000000100000000
[ 30.130457] R10: 0000000000000001 R11: 0000000000000246 R12: 000055cd4c1041a0
[ 30.131054] R13: 000055cd4bdfe990 R14: 000055cd4c104110 R15: 0000000000000400
[ 30.131648] Code: 11 01 00 0f 84 a9 00 00 00 65 ff 0d 6d cc dd 44 e9 0f ff ff ff 40 80 cd 80 e9 99 fe ff ff 48 89 c7 e8 e7 f6 01 00 e9 b7 fe ff ff <0f> 0b 0f ff e9 40 fd ff ff 65 48 8b 04 25 80 d5 00 00 8b 40 4c
[ 30.133245] RIP: __alloc_pages_nodemask+0x35e/0x430 RSP: ffffa7818222bba8
[ 30.133836] ---[ end trace d4f1deb60784f40a ]---
v2: handle free path as well.
Reported-by: Laura Abbott <labbott@redhat.com>
Reported-by: Adam Williamson <awilliam@redhat.com>
Fixes: 0284f1ead87463bc17cf5e81a24fc65c052486f3 (drm/ttm: add transparent huge page support for cached allocations v2)
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
James Morris [Fri, 24 Nov 2017 00:54:11 +0000 (11:54 +1100)]
Merge tag 'keys-next-20171123' of git://git./linux/kernel/git/dhowells/linux-fs into next-keys
Merge keys subsystem changes from David Howells, for v4.15.
Bjorn Helgaas [Wed, 22 Nov 2017 22:13:37 +0000 (16:13 -0600)]
x86/PCI: Remove unused HyperTransport interrupt support
There are no in-tree callers of ht_create_irq(), the driver interface for
HyperTransport interrupts, left. Remove the unused entry point and all the
supporting code.
See 8b955b0dddb3 ("[PATCH] Initial generic hypertransport interrupt
support").
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-pci@vger.kernel.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Link: https://lkml.kernel.org/r/20171122221337.3877.23362.stgit@bhelgaas-glaptop.roam.corp.google.com
Borislav Petkov [Thu, 23 Nov 2017 09:19:51 +0000 (10:19 +0100)]
x86/umip: Fix insn_get_code_seg_params()'s return value
To avoid redundant struct definitions, insn_get_code_seg_params()
was made to return two 4-bit values in a char,
but clang complains:
arch/x86/lib/insn-eval.c:780:10: warning: implicit conversion from 'int' to 'char'
changes value from 132 to -124 [-Wconstant-conversion]
return INSN_CODE_SEG_PARAMS(4, 8);
~~~~~~ ^~~~~~~~~~~~~~~~~~~~~~~~~~
./arch/x86/include/asm/insn-eval.h:16:57: note: expanded from macro 'INSN_CODE_SEG_PARAMS'
#define INSN_CODE_SEG_PARAMS(oper_sz, addr_sz) (oper_sz | (addr_sz << 4))
Those two values do get picked apart afterwards in the reverse of how
they were ORed together, so with respect to the least significant byte the
return value is the same.
But this function returns -EINVAL in the error case, which is an int. So
make it return an int, which is the native word size anyway, and thus fix
the clang warning.
Reported-by: Kees Cook <keescook@google.com>
Reported-by: Nick Desaulniers <nick.desaulniers@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ricardo.neri-calderon@linux.intel.com
Link: https://lkml.kernel.org/r/20171123091951.1462-1-bp@alien8.de
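The truncation described in the x86/umip commit above can be reproduced in userspace. This is a minimal sketch that reuses the macro exactly as quoted in the warning; the SEG_* unpacking helpers and both function names are hypothetical, and signed char is used to model a platform where plain char is signed:

```c
#include <errno.h>

/* Same packing as the macro quoted in the clang warning. */
#define INSN_CODE_SEG_PARAMS(oper_sz, addr_sz) (oper_sz | (addr_sz << 4))

/* Hypothetical helpers that undo the packing. */
#define SEG_OPER_SZ(params) ((params) & 0xf)
#define SEG_ADDR_SZ(params) (((params) >> 4) & 0xf)

/*
 * Narrow return type: 4 | (8 << 4) is 132, which does not fit in a
 * signed 8-bit value. The explicit cast only keeps this sketch free of
 * the warning; the conversion result is implementation-defined.
 */
signed char seg_params_as_char(void)
{
	return (signed char)INSN_CODE_SEG_PARAMS(4, 8);
}

/* Wide return type: both the packed value and -EINVAL are representable. */
int seg_params_as_int(int valid)
{
	if (!valid)
		return -EINVAL;
	return INSN_CODE_SEG_PARAMS(4, 8);
}
```

With an int return type the caller can distinguish the error case by sign and still recover both sizes with the unpacking helpers.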
Chao Fan [Thu, 23 Nov 2017 09:08:47 +0000 (17:08 +0800)]
x86/boot/KASLR: Remove unused variable
There are two variables "rc" in mem_avoid_memmap. One at the top of the
function and another one inside the while() loop. Drop the outer one as it
is unused. Clean up some whitespace damage while at it.
Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: gregkh@linuxfoundation.org
Cc: n-horiguchi@ah.jp.nec.com
Cc: keescook@chromium.org
Link: https://lkml.kernel.org/r/20171123090847.15293-1-fanc.fnst@cn.fujitsu.com
Kees Cook [Wed, 22 Nov 2017 20:56:45 +0000 (12:56 -0800)]
genirq/matrix: Make - vs ?: Precedence explicit
Noticed with a Clang build. This improves the readability of the ?:
expression, as it has lower precedence than the - expression. Show
explicitly that - is evaluated first.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20171122205645.GA27125@beast
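The precedence point in the genirq/matrix commit above applies to any C expression that mixes binary minus with the conditional operator. A short sketch with hypothetical function names, not taken from the kernel source:

```c
/* Parsed as (a - b) ? x : y, because binary '-' binds tighter than '?:'. */
int implicit_grouping(int a, int b, int x, int y)
{
	return a - b ? x : y;
}

/* Same meaning as above, with the grouping spelled out. */
int explicit_grouping(int a, int b, int x, int y)
{
	return (a - b) ? x : y;
}

/* A different expression: what a reader might wrongly assume the first form means. */
int misread_grouping(int a, int b, int x, int y)
{
	return a - (b ? x : y);
}
```

The first two functions always agree; the third differs, which is why making the parentheses explicit helps readers and quiets the compiler.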
Vasyl Gomonovych [Mon, 20 Nov 2017 22:02:41 +0000 (23:02 +0100)]
irqchip/imgpdc: Use resource_size function on resource object
drivers/irqchip/irq-imgpdc.c:327:20-23: WARNING: Suspicious code.
resource_size is maybe missing with res_regs
Generated by: scripts/coccinelle/api/resource_size.cocci
Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: marc.zyngier@arm.com
Cc: jason@lakedaemon.net
Link: https://lkml.kernel.org/r/1511215361-8279-1-git-send-email-gomonovych@gmail.com
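For context on the Coccinelle warning in the irqchip/imgpdc commit above: a kernel resource describes an inclusive address range, so its size is end - start + 1, and open-coding the subtraction tends to drop the + 1. The sketch below mirrors that arithmetic in userspace; struct res_range and res_range_size() are hypothetical names, not the kernel's definitions:

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's struct resource. */
struct res_range {
	uint64_t start;   /* first address in the range */
	uint64_t end;     /* last address in the range (inclusive) */
};

/* Mirrors the arithmetic of resource_size(): the range is inclusive, hence the + 1. */
uint64_t res_range_size(const struct res_range *res)
{
	return res->end - res->start + 1;
}
```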
Colin Ian King [Fri, 17 Nov 2017 18:35:53 +0000 (18:35 +0000)]
irqchip/qcom: Fix u32 comparison with value less than zero
The comparison of u32 nregs being less than zero is never true since
nregs is unsigned. Fix this by making nregs a signed integer.
Fixes: f20cc9b00c7b ("irqchip/qcom: Add IRQ combiner driver")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: kernel-janitors@vger.kernel.org
Cc: Jason Cooper <jason@lakedaemon.net>
Link: https://lkml.kernel.org/r/20171117183553.2739-1-colin.king@canonical.com
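The bug class fixed in the irqchip/qcom commit above can be shown in a few lines. This is a sketch under assumptions: count_regs() is a hypothetical helper that returns a negative errno on failure, and the two probe functions only model the comparison, not the driver:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper that reports failure as a negative errno value. */
int count_regs(int fail)
{
	return fail ? -EINVAL : 4;
}

/* Buggy pattern: the negative result wraps to a large unsigned value. */
int probe_unsigned(int fail)
{
	uint32_t nregs = (uint32_t)count_regs(fail);

	if (nregs < 0)          /* never true for an unsigned type */
		return -1;
	return 0;
}

/* Fixed pattern: a signed variable keeps the error visible. */
int probe_signed(int fail)
{
	int nregs = count_regs(fail);

	if (nregs < 0)
		return -1;
	return 0;
}
```

Some compilers can flag the dead comparison in probe_unsigned() when type-limit warnings are enabled.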
David S. Miller [Thu, 23 Nov 2017 18:37:03 +0000 (03:37 +0900)]
Merge branch 'ipvlan-Fix-insufficient-skb-linear-check'
Gao Feng says:
====================
ipvlan: Fix insufficient skb linear check
The current ipvlan code uses pskb_may_pull to get the skb linear header in
ipvlan_get_L3_hdr, but the size isn't enough for arp and ipv6 icmp.
So it may access unexpected memory in ipvlan_addr_lookup.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Gao Feng [Thu, 23 Nov 2017 03:47:12 +0000 (11:47 +0800)]
ipvlan: Fix insufficient skb linear check for ipv6 icmp
In the function ipvlan_get_L3_hdr, the current code uses pskb_may_pull to
make sure the skb header has enough linear room for the ipv6 header. But it
would use the memory that follows directly, without a linear check, when it
is icmp. So it still may access unexpected memory in ipvlan_addr_lookup.
Now invoke pskb_may_pull again if it is ipv6 icmp.
Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gao Feng [Thu, 23 Nov 2017 03:47:11 +0000 (11:47 +0800)]
ipvlan: Fix insufficient skb linear check for arp
In the function ipvlan_get_L3_hdr, the current code uses pskb_may_pull to
make sure the skb header has enough linear room for the arp header. But
ipvlan_addr_lookup then accesses the arp payload, so it still may
access unexpected memory.
Now use arp_hdr_len(port->dev) instead of the arp header size as the param.
Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
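To see why the fixed ARP header size was not enough in the ipvlan arp commit above, consider the length arithmetic. The sketch assumes IPv4 over a link with fixed-size hardware addresses; arp_full_len() and may_read() are hypothetical names, and may_read() is only a plain bounds check, whereas pskb_may_pull() can also pull more data into the linear area:

```c
#include <stddef.h>

/* Size of the fixed ARP header (hardware type through opcode): 8 bytes. */
#define ARP_FIXED_HDR_LEN 8
/* IPv4 protocol addresses are 4 bytes long. */
#define IPV4_ADDR_LEN 4

/*
 * Full ARP packet length for IPv4 over a link with hw_addr_len-byte
 * hardware addresses: fixed header plus the sender and target
 * hardware/protocol address pairs.
 */
size_t arp_full_len(size_t hw_addr_len)
{
	return ARP_FIXED_HDR_LEN + 2 * (hw_addr_len + IPV4_ADDR_LEN);
}

/* Hypothetical bounds check: are 'need' bytes available in the linear area? */
int may_read(size_t linear_len, size_t need)
{
	return linear_len >= need;
}
```

For Ethernet (6-byte hardware addresses) the full length is 28 bytes, so a check against only the 8-byte fixed header leaves the 20 bytes of addresses unverified.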
Hangbin Liu [Thu, 23 Nov 2017 03:27:24 +0000 (11:27 +0800)]
geneve: only configure or fill UDP_ZERO_CSUM6_RX/TX info when CONFIG_IPV6
Stefano pointed out that configuring or showing UDP_ZERO_CSUM6_RX/TX info
doesn't make sense if we haven't enabled CONFIG_IPV6. Fix it by adding
an "if IS_ENABLED(CONFIG_IPV6)" check.
Fixes: abe492b4f50c ("geneve: UDP checksum configuration via netlink")
Fixes: fd7eafd02121 ("geneve: fix fill_info when link down")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 23 Nov 2017 18:06:42 +0000 (03:06 +0900)]
Merge tag 'wireless-drivers-for-davem-2017-11-22' of git://git./linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for 4.15
First set of fixes for 4.15. Most important here is the iwlwifi fix
for scan command firmware interface change.
ath10k
* fix CCMP-256, GCMP and GCMP-256 in raw mode, it was never working
wcn36xx
* fix device tree node search
iwlwifi
* fix a regression with firmware API change of scan cmd (introduced in
firmware version 34)
* add a bunch of PCI IDs and fix configuration structs for A000 devices
* fix the exported firmware name strings for 9000 and A000 devices
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 23 Nov 2017 17:53:38 +0000 (02:53 +0900)]
Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Fixes 2017-11-21
This series contains fixes for igb/vf, ixgbe/vf, i40e/vf and fm10k.
Jake fixes a regression issue with older firmware, where we were using
the NVM lock to synchronize NVM reads for all devices and firmware
versions, yet this caused issues with older firmware prior to version
1.5. Fixed this by only grabbing the lock for newer devices and firmware
version 1.5 or newer.
Zijie Pan fixes the calculation of the i40e VF MAC addresses, where it was
possible to increment to the next MAC entry without calling
i40e_add_mac_filter().
Amritha removes the upper limit of 64 queues on a channel VSI since the
upper bound is determined by the VSI's num_queue_pairs.
Filip fixes an issue during FLR resets, where we should have been checking
for an upcoming core reset and, if so, just returning I40E_ERR_NOT_READY.
Alan fixes the notification of clients about l2 parameters by copying the
parameters to the client instance struct and re-organizes the priority
in which the client tasks fire so that if the flag for notifying l2
params is set, it will trigger before the client open task. Also fixed
the promiscuous settings after reset for all the VSIs.
Brian King from IBM fixes an issue seen on Power systems which would
result in skb list corruption and eventual kernel oops. Brian
provides the same fix for nearly all our drivers, to replace the
read_barrier_depends with smp_rmb() to ensure loads are ordered with
respect to the load of tx_buffer->next_to_watch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Wed, 22 Nov 2017 01:37:46 +0000 (17:37 -0800)]
net: dsa: bcm_sf2: Clear IDDQ_GLOBAL_PWR bit for PHY
The PHY on BCM7278 has an additional bit that needs to be cleared:
IDDQ_GLOBAL_PWR, without doing this, the PHY remains stuck in reset out
of suspend/resume cycles.
Fixes: 0fe9933804eb ("net: dsa: bcm_sf2: Add support for BCM7278 integrated switch")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 23 Nov 2017 17:33:01 +0000 (02:33 +0900)]
Merge git://git./pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2017-11-23
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Several BPF offloading fixes, from Jakub. Among others:
- Limit offload to cls_bpf and XDP program types only.
- Move device validation into the driver and don't make
any assumptions about the device in the classifier due
to shared blocks semantics.
- Don't pass offloaded XDP program into the driver when
it should be run in native XDP instead. Offloaded ones
are not JITed for the host in such cases.
- Don't destroy device offload state when moved to
another namespace.
- Revert dumping offload info into user space for now,
since ifindex alone is not sufficient. This will be
redone properly for bpf-next tree.
2) Fix test_verifier to avoid using bpf_probe_write_user()
helper in test cases, since it's dumping a warning into
kernel log which may confuse users when only running tests.
Switch to use bpf_trace_printk() instead, from Yonghong.
3) Several fixes for correcting ARG_CONST_SIZE_OR_ZERO semantics
before it becomes uabi, from Gianluca. More specifically:
- Add a type ARG_PTR_TO_MEM_OR_NULL that is used only
by bpf_csum_diff(), where the argument is either a
valid pointer or NULL. The subsequent ARG_CONST_SIZE_OR_ZERO
then enforces a valid pointer in case of non-0 size
or a valid pointer or NULL in case of size 0. Given
that, the semantics for ARG_PTR_TO_MEM in combination
with ARG_CONST_SIZE_OR_ZERO are now such that in case
of size 0, the pointer must always be valid and cannot
be NULL. This fix in semantics allows for bpf_probe_read()
to drop the recently added size == 0 check in the helper
that would become part of uabi otherwise once released.
At the same time we can then fix bpf_probe_read_str() and
bpf_perf_event_output() to use ARG_CONST_SIZE_OR_ZERO
instead of ARG_CONST_SIZE in order to fix recently
reported issues by Arnaldo et al, where LLVM optimizes
two boundary checks into a single one for unknown
variables where the verifier loses track of the variable
bounds and thus rejects valid programs otherwise.
4) A fix for the verifier for the case when it detects
comparison of two constants where the branch is guaranteed
to not be taken at runtime. Verifier will rightfully prune
the exploration of such paths, but we still pass the program
to JITs, where they would complain about using reserved
fields, etc. Track such dead instructions and sanitize
them with mov r0,r0. Rejection is not possible since LLVM
may generate them for valid C code and doesn't do as much
data flow analysis as verifier. For bpf-next we might
implement removal of such dead code and adjust branches
instead. Fix from Alexei.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
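The pointer/size rules described in item 3) of the BPF pull request above can be summarized as a small decision function. This is only a model of the stated semantics, assuming the size argument is of the "const size or zero" kind; the enum and function names are hypothetical and this is not the verifier's code:

```c
#include <stddef.h>

/* Hypothetical labels mirroring the two pointer argument types discussed above. */
enum ptr_arg_type {
	PTR_TO_MEM,          /* pointer must always be valid */
	PTR_TO_MEM_OR_NULL,  /* NULL is tolerated, but only together with size 0 */
};

/* What is known about the pointer being passed. */
enum ptr_state {
	PTR_IS_VALID,
	PTR_IS_NULL,
};

/* Return 1 if the (pointer, size) pair is acceptable, 0 otherwise. */
int arg_pair_ok(enum ptr_arg_type type, enum ptr_state state, size_t size)
{
	if (state == PTR_IS_VALID)
		return 1;
	/* The pointer is NULL from here on. */
	return type == PTR_TO_MEM_OR_NULL && size == 0;
}
```

Under this model a NULL pointer with size 0 is accepted only for the "or NULL" type, which matches the statement that helpers using the plain memory pointer type no longer need their own size == 0 check.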
Willem de Bruijn [Tue, 21 Nov 2017 15:22:25 +0000 (10:22 -0500)]
net: accept UFO datagrams from tuntap and packet
Tuntap and similar devices can inject GSO packets. Accept type
VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively.
Processes are expected to use feature negotiation such as TUNSETOFFLOAD
to detect supported offload types and refrain from injecting other
packets. This process breaks down with live migration: guest kernels
do not renegotiate flags, so destination hosts need to expose all
features that the source host does.
Partially revert the UFO removal from 182e0b6b5846~1..d9d30adf5677.
This patch introduces nearly(*) no new code to simplify verification.
It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
insertion and software UFO segmentation.
It does not reinstate protocol stack support, hardware offload
(NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.
To support SKB_GSO_UDP reappearing in the stack, also reinstate
logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
by squashing in commit 939912216fa8 ("net: skb_needs_check() removes
CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee643f1
("net: avoid skb_warn_bad_offload false positives on UFO").
(*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id,
ipv6_proxy_select_ident is changed to return a __be32 and this is
assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
at the end of the enum to minimize code churn.
Tested
Booted a v4.13 guest kernel with QEMU. On a host kernel before this
patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
enabled, same as on a v4.13 host kernel.
A UFO packet sent from the guest appears on the tap device:
host:
nc -l -u -p 8000 &
tcpdump -n -i tap0
guest:
dd if=/dev/zero of=payload.txt bs=1 count=2000
nc -u 192.16.1.1 8000 < payload.txt
Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
packets arriving fragmented:
./with_tap_pair.sh ./tap_send_ufo tap0 tap1
(from https://github.com/wdebruij/kerneltools/tree/master/tests)
Changes
v1 -> v2
- simplified set_offload change (review comment)
- documented test procedure
Link: http://lkml.kernel.org/r/<CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com>
Fixes: fb652fdfe837 ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Jakobi [Tue, 21 Nov 2017 15:15:57 +0000 (16:15 +0100)]
net: realtek: r8169: implement set_link_ksettings()
Commit 6fa1ba61520576cf1346c4ff09a056f2950cb3bf partially
implemented the new ethtool API by replacing get_settings()
with get_link_ksettings(). This breaks ethtool, since the
userspace tool (according to the new API specs) never tries
the legacy set() call when the new get() call succeeds.
All attempts to change some setting from userspace result in:
> Cannot set new settings: Operation not supported
Implement the missing set() call.
Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 21 Nov 2017 15:08:57 +0000 (07:08 -0800)]
net: ipv6: Fixup device for anycast routes during copy
Florian reported a breakage with anycast routes due to commit
4832c30d5458 ("net: ipv6: put host and anycast routes on device with
address"). Prior to this commit anycast routes were added against the
loopback device causing repetitive route entries with no insight into
why they existed. e.g.:
$ ip -6 ro ls table local type anycast
anycast 2001:db8:1:: dev lo proto kernel metric 0 pref medium
anycast 2001:db8:2:: dev lo proto kernel metric 0 pref medium
anycast fe80:: dev lo proto kernel metric 0 pref medium
anycast fe80:: dev lo proto kernel metric 0 pref medium
The point of commit 4832c30d5458 is to add the routes using the device
with the address which is causing the route to be added. e.g.,:
$ ip -6 ro ls table local type anycast
anycast 2001:db8:1:: dev eth1 proto kernel metric 0 pref medium
anycast 2001:db8:2:: dev eth2 proto kernel metric 0 pref medium
anycast fe80:: dev eth2 proto kernel metric 0 pref medium
anycast fe80:: dev eth1 proto kernel metric 0 pref medium
For traffic to work as it did before, the dst device needs to be switched
to the loopback when the copy is created similar to local routes.
Fixes: 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 23 Nov 2017 16:33:34 +0000 (01:33 +0900)]
Merge branch 'smc-fixes-for-smc-buffer-handling'
Ursula Braun says:
====================
net/smc: fixes for smc buffer handling
Here are two cleanup patches for smc buffer handling.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Geert Uytterhoeven [Tue, 21 Nov 2017 12:23:54 +0000 (13:23 +0100)]
net/smc: Fix preinitialization of buf_desc in __smc_buf_create()
With gcc-4.1.2:
net/smc/smc_core.c: In function ‘__smc_buf_create’:
net/smc/smc_core.c:567: warning: ‘bufsize’ may be used uninitialized in this function
Indeed, if the for-loop is never executed, bufsize is used
uninitialized. In addition, buf_desc is stored for later use, while it
is still a NULL pointer.
Before, error handling was done by checking if buf_desc is non-NULL.
The cleanup changed this to an error check, but forgot to update the
preinitialization of buf_desc to an error pointer.
Update the preinitialization of buf_desc to fix this.
Fixes: b33982c3a6838d13 ("net/smc: cleanup function __smc_buf_create()")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
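The preinitialization problem in the __smc_buf_create() commit above follows from how error pointers work: NULL is not an error pointer, so an error check lets it through. The sketch below re-creates the helpers in userspace; create_buf() is a hypothetical stand-in for the allocation loop, and the choice of -ENOMEM as the initial error is an assumption:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace re-creations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Hypothetical allocation loop: 'attempts' may be zero, in which case
 * the initial value of buf_desc is what the caller sees.
 */
void *create_buf(int attempts, void *initial)
{
	static char storage[16];
	void *buf_desc = initial;
	int i;

	for (i = 0; i < attempts; i++) {
		buf_desc = storage;   /* pretend the allocation succeeded */
		break;
	}
	return buf_desc;
}
```

With a NULL initial value and zero loop iterations, IS_ERR() reports no error and the NULL pointer would be stored for later use; starting from an error pointer makes the same check catch that case.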
Ursula Braun [Tue, 21 Nov 2017 12:23:53 +0000 (13:23 +0100)]
net/smc: use sk_rcvbuf as start for rmb creation
Commit 3e034725c0d8 ("net/smc: common functions for RMBs and send buffers")
merged handling of SMC receive and send buffers. It introduced sk_buf_size
as merged start value for size determination. But since sk_buf_size is not
used at all, sk_sndbuf is erroneously used as start for rmb creation.
This patch makes sure sk_buf_size is really used as intended, and
sk_rcvbuf is used as start value for rmb creation.
Fixes: 3e034725c0d8 ("net/smc: common functions for RMBs and send buffers")
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 21 Nov 2017 07:50:12 +0000 (09:50 +0200)]
ipv6: Do not consider linkdown nexthops during multipath
When the 'ignore_routes_with_linkdown' sysctl is set, we should not
consider linkdown nexthops during route lookup.
While the code correctly verifies that the initially selected route
('match') has a carrier, it does not perform the same check in the
subsequent multipath selection, resulting in a potential packet loss.
In case the chosen route does not have a carrier and the sysctl is set,
choose the initially selected route.
Fixes: 35103d11173b ("net: ipv6 sysctl option to ignore routes when nexthop link is down")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Kapl [Mon, 20 Nov 2017 21:21:13 +0000 (22:21 +0100)]
net: sched: fix crash when deleting secondary chains
If you flush (delete) a filter chain other than chain 0 (such as when
deleting the device), the kernel may run into a use-after-free. The
chain refcount must not be decremented unless we are sure we are done
with the chain.
To reproduce the bug, run:
ip link add dtest type dummy
tc qdisc add dev dtest ingress
tc filter add dev dtest chain 1 parent ffff: flower
ip link del dtest
Introduced in commit f93e1cdcf42c ("net/sched: fix filter flushing"),
but unless you have KASAN or luck, you won't notice it until commit
0dadc117ac8b ("cls_flower: use tcf_exts_get_net() before call_rcu()").
Fixes: f93e1cdcf42c ("net/sched: fix filter flushing")
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Roman Kapl <code@rkapl.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Chan [Mon, 20 Nov 2017 20:57:42 +0000 (12:57 -0800)]
net: phy: cortina: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
This change resolves a new compile-time warning
when built as a loadable module:
WARNING: modpost: missing MODULE_LICENSE() in drivers/net/phy/cortina.o
see include/linux/module.h for more information
This adds the license as "GPL", which matches the header of the file.
MODULE_DESCRIPTION and MODULE_AUTHOR are also added.
Signed-off-by: Jesse Chan <jc@linux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Gleixner [Thu, 23 Nov 2017 15:29:05 +0000 (16:29 +0100)]
Merge tag 'for-linus-timers-conversion-final-v4.15-rc1' of git://git./linux/kernel/git/kees/linux into timers/urgent
Pull the last batch of manual timer conversions from Kees Cook:
- final batch of "non trivial" timer conversions (multi-tree dependencies,
things Coccinelle couldn't handle, etc).
- treewide conversions via Coccinelle, in 4 steps:
- DEFINE_TIMER() functions converted to struct timer_list * argument
- init_timer() -> setup_timer()
- setup_timer() -> timer_setup()
- setup_timer() -> timer_setup() (with a single embedded structure)
- deprecated timer API removals (init_timer(), setup_*timer())
- finalization of new API (remove global casts)
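The conversion summarized in the timer merge above changes callbacks from taking an opaque unsigned long to taking the struct timer_list pointer itself, from which the enclosing object is recovered. The userspace sketch below illustrates that pattern; the reduced struct timer_list, struct my_device, my_device_timeout() and my_timer_setup() are hypothetical stand-ins rather than kernel definitions:

```c
#include <stddef.h>

/* Hypothetical minimal timer type; the kernel's struct timer_list has more fields. */
struct timer_list {
	void (*function)(struct timer_list *);
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical driver state with an embedded timer. */
struct my_device {
	int ticks;
	struct timer_list timer;
};

/* New-style callback: receives the timer itself, not an unsigned long cookie. */
void my_device_timeout(struct timer_list *t)
{
	struct my_device *dev = container_of(t, struct my_device, timer);

	dev->ticks++;
}

/* Simplified stand-in for timer_setup(): it only records the callback. */
void my_timer_setup(struct timer_list *t, void (*fn)(struct timer_list *))
{
	t->function = fn;
}
```

Because the callback derives its context from the timer's address, no cast between pointers and unsigned long is needed, which is what allows the removal of the global casts mentioned in the list.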
Masahiro Yamada [Thu, 23 Nov 2017 14:25:26 +0000 (23:25 +0900)]
kbuild: drop $(extra-y) from real-objs-y
$(real-objs-y) is only used in scripts/Makefile.build to form
"targets", but $(extra-y) is added to "targets" in another line.
We do not need to add $(extra-y) twice.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Masahiro Yamada [Thu, 16 Nov 2017 16:49:13 +0000 (01:49 +0900)]
kbuild: clean up *.i and *.lst patterns by make clean
*.i and *.lst are supported by the single target build. Clean them up.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Masahiro Yamada [Wed, 15 Nov 2017 09:19:20 +0000 (18:19 +0900)]
kbuild: rpm: prompt to use "rpm-pkg" if "rpm" target is used
The "rpm" has been kept for backward compatibility since pre-git era.
I am planning to remove it after the Linux 4.18 release. Announce the
end of support, prompting to use "rpm-pkg" instead.
If you use "rpm", it will work like "rpm-pkg", but warning messages
will be displayed as follows:
WARNING: "rpm" target will be removed after Linux 4.18
Please use "rpm-pkg" instead.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Masahiro Yamada [Wed, 15 Nov 2017 09:17:07 +0000 (18:17 +0900)]
kbuild: pkg: use --transform option to prefix paths in tar
For rpm-pkg and deb-pkg, a source tar file is created. All paths in
the archive must be prefixed with the base name of the tar so that
everything is contained in the directory when you extract it.
Currently, scripts/package/Makefile uses a symlink for that, and
removes it after the tar is created.
If you terminate the build during the tar creation, the symlink is
left over. Then, at the next package build, you will see a warning
as follows:
ln: '.' and 'kernel-4.14.0+/.' are the same file
It is possible to fix it by adding the -n (--no-dereference) option to
the "ln" command, but a cleaner way is to use the --transform option
of the "tar" command. This option is a GNU extension, but it should not
hurt to use it in the Linux build system.
The 'S' flag is needed to exclude symlinks from the path fixup.
Without it, symlinks in the kernel are broken.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Masahiro Yamada [Tue, 14 Nov 2017 11:38:07 +0000 (20:38 +0900)]
coccinelle: fix parallel build with CHECK=scripts/coccicheck
The command "make -j8 C=1 CHECK=scripts/coccicheck" produces
lots of "coccicheck failed" error messages.
Julia Lawall explained the Coccinelle behavior as follows:
"The problem on the Coccinelle side is that it uses a subdirectory
with the name of the semantic patch to store standard output and
standard error for the different threads. I didn't want to use a
name with the pid, so that one could easily find this information
while Coccinelle is running. Normally the subdirectory is cleaned
up when Coccinelle completes, so there is only one of them at a time.
Maybe it is best to just add the pid. There is the risk that these
subdirectories will accumulate if Coccinelle crashes in a way such
that they don't get cleaned up, but Coccinelle could print a warning
if it detects this case, rather than failing."
When scripts/coccicheck is used as the CHECK tool and the -j option is
given to Make, the whole build process runs in parallel. So, multiple
processes try to access the same subdirectory.
I notice spatch creates the subdirectory only when it runs in parallel
(i.e. --jobs <N> is given and <N> is greater than 1).
Setting NPROC=1 is a reasonable solution; spatch does not create the
subdirectory. Besides, ONLINE=1 mode takes a single file input for
each spatch invocation, so there is no reason to parallelize it in
the first place.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Julia Lawall <Julia.Lawall@lip6.fr>
Heinrich Schuchardt [Wed, 8 Nov 2017 21:09:59 +0000 (22:09 +0100)]
kconfig/symbol.c: use correct pointer type argument for sizeof
sym_arr is of type struct symbol **.
So in malloc we need sizeof(struct symbol *).
The problem was indicated by coccinelle.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
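As an illustration of the kconfig sizeof fix above: the commit text does not quote the old expression, so this sketch only shows the corrected form. struct symbol here is a hypothetical stand-in, and alloc_sym_arr() is an invented helper rather than a function from scripts/kconfig. Writing sizeof(*arr) ties the element size to the pointer's own type, so it cannot drift from sizeof(struct symbol *).

```c
#include <stdlib.h>

/* Hypothetical stand-in; the real struct symbol is much larger. */
struct symbol {
	char name[64];
	int flags;
};

/*
 * Allocate an array of cnt symbol pointers plus a NULL terminator.
 * The elements are pointers, so the element size is
 * sizeof(struct symbol *), written here as sizeof(*arr).
 */
static struct symbol **alloc_sym_arr(size_t cnt)
{
	struct symbol **arr = malloc((cnt + 1) * sizeof(*arr));

	if (arr)
		arr[cnt] = NULL;
	return arr;
}
```

The caller owns the returned array and releases it with free().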
Maarten Lankhorst [Thu, 23 Nov 2017 10:37:37 +0000 (11:37 +0100)]
drm/vblank: Pass crtc_id to page_flip_ioctl.
We added crtc_id to the atomic ioctl, but forgot to add it for vblank
and page flip events. Commit bd386e518056 ("drm: Reorganize
drm_pending_event to support future event types [v2]") added it to
the vblank event, but the page flip event was still missing.
Correct this and add a test for making sure we always set crtc_id correctly.
Fixes: bd386e518056 ("drm: Reorganize drm_pending_event to support future event types [v2]")
Fixes: 5db06a8a98f5 ("drm: Pass CRTC ID in userspace vblank events")
Cc: Daniel Stone <daniels@collabora.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v4.12+
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> #irc
Testcase: igt/kms_vblank/crtc_id
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171123103737.47138-1-maarten.lankhorst@linux.intel.com
Alexei Starovoitov [Thu, 23 Nov 2017 00:42:05 +0000 (16:42 -0800)]
bpf: fix branch pruning logic
When the verifier detects that a register contains a runtime constant
and it's compared with another constant, it will prune exploration
of the branch that is guaranteed not to be taken at runtime.
This is all correct, but a malicious program may be constructed
in such a way that it always has a constant comparison and
the other branch is never taken under any conditions.
In this case such a path through the program will not be explored
by the verifier. It won't be taken at run-time either, but since
all instructions are JITed the malicious program may cause JITs
to complain about using reserved fields, etc.
To fix the issue we have to track the instructions explored by
the verifier and sanitize instructions that are dead at run time
with NOPs. We cannot reject such dead code, because llvm generates
it for valid C code: it doesn't do as much data flow
analysis as the verifier does.
Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
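The approach described in the bpf commit above (track which instructions the verifier explored, then overwrite the rest with NOPs) can be sketched roughly as follows. This is a toy model under stated assumptions, not the verifier's code: struct toy_insn, TOY_NOP and toy_sanitize_dead_code() are hypothetical, and the seen[] array stands in for whatever per-instruction bookkeeping the verifier keeps.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction: hypothetical and far simpler than struct bpf_insn. */
struct toy_insn {
	unsigned char opcode;
	int imm;
};

#define TOY_NOP 0x00	/* hypothetical opcode that has no effect */

/*
 * Overwrite every instruction that the exploration pass never reached.
 * seen[i] is true if instruction i was visited during verification.
 * Returns the number of instructions that were replaced.
 */
static size_t toy_sanitize_dead_code(struct toy_insn *insns,
				     const bool *seen, size_t len)
{
	size_t replaced = 0;

	for (size_t i = 0; i < len; i++) {
		if (seen[i])
			continue;
		insns[i].opcode = TOY_NOP;
		insns[i].imm = 0;
		replaced++;
	}
	return replaced;
}
```

Replacing rather than rejecting keeps instruction offsets stable, so jump targets elsewhere in the program stay valid while the JIT never sees the unverified encodings.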
Vijendar Mukunda [Thu, 23 Nov 2017 14:37:00 +0000 (20:07 +0530)]
ALSA: hda: Add Raven PCI ID
This commit adds the PCI ID for the Raven platform.
Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Kailang Yang [Wed, 22 Nov 2017 07:21:32 +0000 (15:21 +0800)]
ALSA: hda/realtek - Fix ALC700 family no sound issue
This may have been a typo in the ALC700 support patch.
Fix the bit value set by that patch.
Fixes: 6fbae35a3170 ("ALSA: hda/realtek - Add support for new codecs ALC700/ALC701/ALC703")
Signed-off-by: Kailang Yang <kailang@realtek.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Linus Torvalds [Thu, 23 Nov 2017 07:09:18 +0000 (21:09 -1000)]
Merge tag 'pwm/for-4.15-rc1' of git://git./linux/kernel/git/thierry.reding/linux-pwm
Pull pwm updates from Thierry Reding:
"The changes for this release include power management improvements for
the pwm-img driver, support for the backup mode on pwm-atmel-tcb as
well as support for more hardware with the R-Car and Mediatek drivers.
To round things off there's a bit of cleanup for sunxi and stm32-lp"
* tag 'pwm/for-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
pwm: stm32-lp: Remove pwm_is_enabled() check before calling pwm_disable()
pwm: mediatek: Add MT2712/MT7622 support
pwm: sunxi: Use of_device_get_match_data()
pwm: atmel-tcb: Support backup mode
dt-bindings: pwm: Add R-Car D3 device tree bindings
pwm: img: Add runtime PM
pwm: img: Add suspend / resume handling
Linus Torvalds [Thu, 23 Nov 2017 06:58:23 +0000 (20:58 -1000)]
Merge tag 'rtc-4.15' of git://git./linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"There is nothing scary this cycle, mostly driver fixes and updates.
The core fix has been in for a while and has been tested on multiple
kernel revisions by multiple teams.
Core:
- Fix setting the alarm to the next expiring timer
New drivers:
- Mediatek MT7622 RTC
- NXP PCF85363
- Spreadtrum SC27xx PMIC RTC
Drivers updates:
- Use generic nvmem to expose the Non volatile ram for ds1305,
ds1511, m48t86 and omap
- abx80x: solve possible race condition at probe
- armada38x: support trimming the RTC oscillator
- at91rm9200: fix reading the alarm value at boot
- ds1511: allow waking platform
- m41t80: rework square wave output
- pcf8523: support trimming the RTC oscillator
- pcf8563: fix clock output rate
- pl031: make interrupt optional
- xgene: fix suspend/resume"
* tag 'rtc-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (50 commits)
dt-bindings: rtc: imxdi: Improve the bindings text
rtc: sc27xx: Add Spreadtrum SC27xx PMIC RTC driver
dt-bindings: rtc: Add Spreadtrum SC27xx RTC documentation
rtc: at91rm9200: fix reading alarm value
rtc: at91rm9200: stop calculating yday in at91_rtc_readalarm
rtc: sysfs: Use time64_t variables to set time/alarm
rtc: xgene: mark PM functions as __maybe_unused
rtc: xgene: Fix suspend/resume
rtc: pcf8563: don't alway enable the alarm
rtc: pcf8563: fix output clock rate
rtc: rx8010: Fix for incorrect return value
rtc: rx8010: Specify correct address for RX8010_RESV31
rtc: rx8010: Remove duplicate define
rtc: m41t80: remove unneeded checks from m41t80_sqw_set_rate
rtc: m41t80: avoid i2c read in m41t80_sqw_is_prepared
rtc: m41t80: avoid i2c read in m41t80_sqw_recalc_rate
rtc: m41t80: fix m41t80_sqw_round_rate return value
rtc: m41t80: m41t80_sqw_set_rate should return 0 on success
rtc: add support for NXP PCF85363 real-time clock
rtc: omap: Support scratch registers
...
Andy Lutomirski [Thu, 23 Nov 2017 04:39:16 +0000 (20:39 -0800)]
x86/entry/64: Add missing irqflags tracing to native_load_gs_index()
Running this code with IRQs enabled (where dummy_lock is a spinlock):
  static void check_load_gs_index(void)
  {
          /* This will fail. */
          load_gs_index(0xffff);
          spin_lock(&dummy_lock);
          spin_unlock(&dummy_lock);
  }
will generate a lockdep warning. The issue is that the actual write
to %gs would cause an exception with IRQs disabled, and the exception
handler would, as an inadvertent side effect, update irqflag tracing
to reflect the IRQs-off status. native_load_gs_index() would then
turn IRQs back on and return with irqflag tracing still thinking that
IRQs were off. The dummy lock-and-unlock causes lockdep to notice the
error and warn.
Fix it by adding the missing tracing.
Apparently nothing did this in a context where it mattered. I haven't
tried to find a code path that would actually exhibit the warning if
appropriately nasty user code were running.
I suspect that the security impact of this bug is very, very low --
production systems don't run with lockdep enabled, and the warning is
mostly harmless anyway.
Found during a quick audit of the entry code to try to track down an
unrelated bug that Ingo found in some still-in-development code.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/e1aeb0e6ba8dd430ec36c8a35e63b429698b4132.1511411918.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
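The mismatch described in the x86/entry commit above can be modeled in a few lines of plain C. This is only a toy state machine under assumed simplifications, not the entry code: struct irq_model and the toy_* functions are hypothetical, and the with_tracing flag stands in for the tracing calls that the fix adds.

```c
#include <stdbool.h>

/* Toy model: the real CPU flag and the tracing code's copy of it. */
struct irq_model {
	bool hw_enabled;	/* what the CPU actually does */
	bool traced_enabled;	/* what irqflag tracing believes */
};

/* An exception handler syncs the traced state as a side effect. */
static void toy_exception(struct irq_model *m)
{
	m->traced_enabled = m->hw_enabled;
}

/* Model of the load: save flags, disable IRQs, fault, restore flags. */
static void toy_load_gs_index(struct irq_model *m, bool with_tracing)
{
	bool saved = m->hw_enabled;

	m->hw_enabled = false;
	if (with_tracing)
		m->traced_enabled = false;
	toy_exception(m);		/* the write to %gs faults */
	m->hw_enabled = saved;
	if (with_tracing)
		m->traced_enabled = saved;
}

/* True when the tracing view agrees with the hardware state. */
static bool toy_state_consistent(const struct irq_model *m)
{
	return m->hw_enabled == m->traced_enabled;
}
```

Starting from IRQs enabled, the untraced variant ends with the hardware flag on and the traced flag off, which is the inconsistency that lockdep reports at the next lock operation.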