git.openwrt.org Git - openwrt/staging/blogic.git/log

projects / openwrt / staging / blogic.git / log

Christoffer Dall [Fri, 2 Nov 2018 07:53:22 +0000 (08:53 +0100)]

KVM: arm/arm64: Fix unintended stage 2 PMD mappings

There are two things we need to take care of when we create block
mappings in the stage 2 page tables:

  (1) The alignment within a PMD between the host address range and the
  guest IPA range must be the same, since otherwise we end up mapping
  pages with the wrong offset.

  (2) The head and tail of a memory slot may not cover a full block
  size, and we have to take care to not map those with block
  descriptors, since we could expose memory to the guest that the host
  did not intend to expose.

So far, we have been taking care of (1), but not (2), and our commentary
describing (1) was somewhat confusing.

This commit attempts to factor out the checks of both into a common
function, and if we don't pass the check, we won't attempt any PMD
mappings for neither hugetlbfs nor THP.

Note that we used to only check the alignment for THP, not for
hugetlbfs, but as far as I can tell the check needs to be applied to
both scenarios.

Cc: Ralph Palutke <ralph.palutke@fau.de>
Cc: Lukas Braun <koomi@moshbit.net>
Reported-by: Lukas Braun <koomi@moshbit.net>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

commit | commitdiff | tree

Marc Zyngier [Tue, 18 Dec 2018 14:59:09 +0000 (14:59 +0000)]

arm/arm64: KVM: vgic: Force VM halt when changing the active state of GICv3 PPIs/SGIs

We currently only halt the guest when a vCPU messes with the active
state of an SPI. This is perfectly fine for GICv2, but isn't enough
for GICv3, where all vCPUs can access the state of any other vCPU.

Let's broaden the condition to include any GICv3 interrupt that
has an active state (i.e. all but LPIs).

Cc: stable@vger.kernel.org
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

commit | commitdiff | tree

Marc Zyngier [Tue, 4 Dec 2018 10:44:22 +0000 (10:44 +0000)]

arm64: KVM: Add trapped system register access tracepoint

We're pretty blind when it comes to system register tracing,
and rely on the ESR value displayed by kvm_handle_sys, which
isn't much.

Instead, let's add an actual name to the sysreg entries, so that
we can finally print it as we're about to perform the access
itself.

The new tracepoint is conveniently called kvm_sys_access.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

commit | commitdiff | tree

Christoffer Dall [Thu, 29 Nov 2018 11:20:01 +0000 (12:20 +0100)]

KVM: arm64: Make vcpu const in vcpu_read_sys_reg

vcpu_read_sys_reg should not be modifying the VCPU structure.
Eventually, to handle EL2 sysregs for nested virtualization, we will
call vcpu_read_sys_reg from places that have a const vcpu pointer, which
will complain about the lack of the const modifier on the read path.

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

commit | commitdiff | tree

Christoffer Dall [Tue, 11 Dec 2018 08:54:11 +0000 (09:54 +0100)]

KVM: arm/arm64: arch_timer: Simplify kvm_timer_vcpu_terminate

kvm_timer_vcpu_terminate can only be called in two scenarios:

1. As part of cleanup during a failed VCPU create
2. As part of freeing the whole VM (struct kvm refcount == 0)

In the first case, we cannot have programmed any timers or mapped any
IRQs, and therefore we do not have to cancel anything or unmap anything.

In the second case, the VCPU will have gone through kvm_timer_vcpu_put,
which will have canceled the emulated physical timer's hrtimer, and we
do not need to that here as well. We also do not care if the irq is
recorded as mapped or not in the VGIC data structure, because the whole
VM is going away. That leaves us only with having to ensure that we
cancel the bg_timer if we were blocking the last time we called
kvm_timer_vcpu_put().

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

commit | commitdiff | tree

Christoffer Dall [Tue, 27 Nov 2018 12:48:08 +0000 (13:48 +0100)]

KVM: arm/arm64: Remove arch timer workqueue

The use of a work queue in the hrtimer expire function for the bg_timer
is a leftover from the time when we would inject interrupts when the
bg_timer expired.

Since we are no longer doing that, we can instead call
kvm_vcpu_wake_up() directly from the hrtimer function and remove all
workqueue functionality from the arch timer code.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

commit | commitdiff | tree

Christoffer Dall [Mon, 3 Dec 2018 20:31:24 +0000 (21:31 +0100)]

KVM: arm/arm64: Fixup the kvm_exit tracepoint

The kvm_exit tracepoint strangely always reported exits as being IRQs.
This seems to be because either the __print_symbolic or the tracepoint
macros use a variable named idx.

Take this chance to update the fields in the tracepoint to reflect the
concepts in the arm64 architecture that we pass to the tracepoint and
move the exception type table to the same location and header files as
the exits code.

We also clear out the exception code to 0 for IRQ exits (which
translates to UNKNOWN in text) to make it slighyly less confusing to
parse the trace output.

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

commit | commitdiff | tree

Christoffer Dall [Sat, 1 Dec 2018 21:21:47 +0000 (13:21 -0800)]

KVM: arm/arm64: vgic: Consider priority and active state for pending irq

When checking if there are any pending IRQs for the VM, consider the
active state and priority of the IRQs as well.

Otherwise we could be continuously scheduling a guest hypervisor without
it seeing an IRQ.

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

commit | commitdiff | tree

Gustavo A. R. Silva [Wed, 12 Dec 2018 20:11:23 +0000 (14:11 -0600)]

KVM: arm/arm64: vgic: Fix off-by-one bug in vgic_get_irq()

When using the nospec API, it should be taken into account that:

"...if the CPU speculates past the bounds check then
* array_index_nospec() will clamp the index within the range of [0,
* size)."

The above is part of the header for macro array_index_nospec() in
linux/nospec.h

Now, in this particular case, if intid evaluates to exactly VGIC_MAX_SPI
or to exaclty VGIC_MAX_PRIVATE, the array_index_nospec() macro ends up
returning VGIC_MAX_SPI - 1 or VGIC_MAX_PRIVATE - 1 respectively, instead
of VGIC_MAX_SPI or VGIC_MAX_PRIVATE, which, based on the original logic:

/* SGIs and PPIs */
if (intid <= VGIC_MAX_PRIVATE)
return &vcpu->arch.vgic_cpu.private_irqs[intid];

/* SPIs */
if (intid <= VGIC_MAX_SPI)
return &kvm->arch.vgic.spis[intid - VGIC_NR_PRIVATE_IRQS];

are valid values for intid.

Fix this by calling array_index_nospec() macro with VGIC_MAX_PRIVATE + 1
and VGIC_MAX_SPI + 1 as arguments for its parameter size.

Fixes: 41b87599c743 ("KVM: arm/arm64: vgic: fix possible spectre-v1 in vgic_get_irq()")
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
[dropped the SPI part which was fixed separately]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

commit | commitdiff | tree

Marc Zyngier [Tue, 4 Dec 2018 17:11:19 +0000 (17:11 +0000)]

KVM: arm/arm64: vgic: Cap SPIs to the VM-defined maximum

SPIs should be checked against the VMs specific configuration, and
not the architectural maximum.

Cc: stable@vger.kernel.org
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

commit | commitdiff | tree

Christoffer Dall [Tue, 6 Nov 2018 12:33:38 +0000 (13:33 +0100)]

KVM: arm64: Clarify explanation of STAGE2_PGTABLE_LEVELS

In attempting to re-construct the logic for our stage 2 page table
layout I found the reasoning in the comment explaining how we calculate
the number of levels used for stage 2 page tables a bit backwards.

This commit attempts to clarify the comment, to make it slightly easier
to read without having the Arm ARM open on the right page.

While we're at it, fixup a typo in a comment that was recently changed.

Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

commit | commitdiff | tree

Julien Thierry [Mon, 26 Nov 2018 18:26:44 +0000 (18:26 +0000)]

KVM: arm/arm64: vgic: Do not cond_resched_lock() with IRQs disabled

To change the active state of an MMIO, halt is requested for all vcpus of
the affected guest before modifying the IRQ state. This is done by calling
cond_resched_lock() in vgic_mmio_change_active(). However interrupts are
disabled at this point and we cannot reschedule a vcpu.

We actually don't need any of this, as kvm_arm_halt_guest ensures that
all the other vcpus are out of the guest. Let's just drop that useless
code.

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Suggested-by: Christoffer Dall <christoffer.dall@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

commit | commitdiff | tree

Punit Agrawal [Tue, 11 Dec 2018 17:10:41 +0000 (17:10 +0000)]

KVM: arm64: Add support for creating PUD hugepages at stage 2

KVM only supports PMD hugepages at stage 2. Now that the various page
handling routines are updated, extend the stage 2 fault handling to
map in PUD hugepages.

Addition of PUD hugepage support enables additional page sizes (e.g.,
1G with 4K granule) which can be useful on cores that support mapping
larger block sizes in the TLB entries.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
[ Replace BUG() => WARN_ON(1) for arm32 PUD helpers ]
Signed-off-by: Suzuki Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

commit | commitdiff | tree

Punit Agrawal [Tue, 11 Dec 2018 17:10:40 +0000 (17:10 +0000)]

KVM: arm64: Update age handlers to support PUD hugepages

In preparation for creating larger hugepages at Stage 2, add support
to the age handling notifiers for PUD hugepages when encountered.

Provide trivial helpers for arm32 to allow sharing code.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
[ Replaced BUG() => WARN_ON(1) for arm32 PUD helpers ]
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

commit | commitdiff | tree

Punit Agrawal [Tue, 11 Dec 2018 17:10:39 +0000 (17:10 +0000)]

KVM: arm64: Support handling access faults for PUD hugepages

In preparation for creating larger hugepages at Stage 2, extend the
access fault handling at Stage 2 to support PUD hugepages when
encountered.

Provide trivial helpers for arm32 to allow sharing of code.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
[ Replaced BUG() => WARN_ON(1) in PUD helpers ]
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

commit | commitdiff | tree

Punit Agrawal [Tue, 11 Dec 2018 17:10:38 +0000 (17:10 +0000)]

KVM: arm64: Support PUD hugepage in stage2_is_exec()

In preparation for creating PUD hugepages at stage 2, add support for
detecting execute permissions on PUD page table entries. Faults due to
lack of execute permissions on page table entries is used to perform
i-cache invalidation on first execute.

Provide trivial implementations of arm32 helpers to allow sharing of
code.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
[ Replaced BUG() => WARN_ON(1) in arm32 PUD helpers ]
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

commit | commitdiff | tree

Punit Agrawal [Tue, 11 Dec 2018 17:10:37 +0000 (17:10 +0000)]

KVM: arm64: Support dirty page tracking for PUD hugepages

In preparation for creating PUD hugepages at stage 2, add support for
write protecting PUD hugepages when they are encountered. Write
protecting guest tables is used to track dirty pages when migrating
VMs.

Also, provide trivial implementations of required kvm_s2pud_* helpers
to allow sharing of code with arm32.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
[ Replaced BUG() => WARN_ON() in arm32 pud helpers ]
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

commit | commitdiff | tree

Punit Agrawal [Tue, 11 Dec 2018 17:10:36 +0000 (17:10 +0000)]

KVM: arm/arm64: Introduce helpers to manipulate page table entries

Introduce helpers to abstract architectural handling of the conversion
of pfn to page table entries and marking a PMD page table entry as a
block entry.

The helpers are introduced in preparation for supporting PUD hugepages
at stage 2 - which are supported on arm64 but do not exist on arm.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

commit | commitdiff | tree

Punit Agrawal [Tue, 11 Dec 2018 17:10:35 +0000 (17:10 +0000)]

KVM: arm/arm64: Re-factor setting the Stage 2 entry to exec on fault

Stage 2 fault handler marks a page as executable if it is handling an
execution fault or if it was a permission fault in which case the
executable bit needs to be preserved.

The logic to decide if the page should be marked executable is
duplicated for PMD and PTE entries. To avoid creating another copy
when support for PUD hugepages is introduced refactor the code to
share the checks needed to mark a page table entry as executable.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

commit | commitdiff | tree

Punit Agrawal [Tue, 11 Dec 2018 17:10:34 +0000 (17:10 +0000)]

KVM: arm/arm64: Share common code in user_mem_abort()

The code for operations such as marking the pfn as dirty, and
dcache/icache maintenance during stage 2 fault handling is duplicated
between normal pages and PMD hugepages.

Instead of creating another copy of the operations when we introduce
PUD hugepages, let's share them across the different pagesizes.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

commit | commitdiff | tree

Christoffer Dall [Tue, 11 Dec 2018 11:51:03 +0000 (12:51 +0100)]

KVM: arm/arm64: vgic-v2: Set active_source to 0 when restoring state

When restoring the active state from userspace, we don't know which CPU
was the source for the active state, and this is not architecturally
exposed in any of the register state.

Set the active_source to 0 in this case. In the future, we can expand
on this and exposse the information as additional information to
userspace for GICv2 if anyone cares.

Cc: stable@vger.kernel.org
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

commit | commitdiff | tree

Mark Rutland [Thu, 6 Dec 2018 12:31:44 +0000 (12:31 +0000)]

KVM: arm/arm64: Log PSTATE for unhandled sysregs

When KVM traps an unhandled sysreg/coproc access from a guest, it logs
the guest PC. To aid debugging, it would be helpful to know which
exception level the trap came from, along with other PSTATE/CPSR bits,
so let's log the PSTATE/CPSR too.

Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

commit | commitdiff | tree

Christoffer Dall [Tue, 11 Dec 2018 12:23:57 +0000 (13:23 +0100)]

KVM: arm/arm64: Fix VMID alloc race by reverting to lock-less

We recently addressed a VMID generation race by introducing a read/write
lock around accesses and updates to the vmid generation values.

However, kvm_arch_vcpu_ioctl_run() also calls need_new_vmid_gen() but
does so without taking the read lock.

As far as I can tell, this can lead to the same kind of race:

  VM 0, VCPU 0 VM 0, VCPU 1
  ------------ ------------
  update_vttbr (vmid 254)
   update_vttbr (vmid 1) // roll over
read_lock(kvm_vmid_lock);
force_vm_exit()
  local_irq_disable
  need_new_vmid_gen == false //because vmid gen matches

  enter_guest (vmid 254)
   kvm_arch.vttbr = <PGD>:<VMID 1>
read_unlock(kvm_vmid_lock);

   enter_guest (vmid 1)

Which results in running two VCPUs in the same VM with different VMIDs
and (even worse) other VCPUs from other VMs could now allocate clashing
VMID 254 from the new generation as long as VCPU 0 is not exiting.

Attempt to solve this by making sure vttbr is updated before another CPU
can observe the updated VMID generation.

Cc: stable@vger.kernel.org
Fixes: f0cf47d939d0 "KVM: arm/arm64: Close VMID generation race"
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

commit | commitdiff | tree

Mark Rutland [Fri, 9 Nov 2018 15:07:11 +0000 (15:07 +0000)]

arm64: KVM: Consistently advance singlestep when emulating instructions

When we emulate a guest instruction, we don't advance the hardware
singlestep state machine, and thus the guest will receive a software
step exception after a next instruction which is not emulated by the
host.

We bodge around this in an ad-hoc fashion. Sometimes we explicitly check
whether userspace requested a single step, and fake a debug exception
from within the kernel. Other times, we advance the HW singlestep state
rely on the HW to generate the exception for us. Thus, the observed step
behaviour differs for host and guest.

Let's make this simpler and consistent by always advancing the HW
singlestep state machine when we skip an instruction. Thus we can rely
on the hardware to generate the singlestep exception for us, and never
need to explicitly check for an active-pending step, nor do we need to
fake a debug exception from the guest.

Cc: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

commit | commitdiff | tree

Mark Rutland [Fri, 9 Nov 2018 15:07:10 +0000 (15:07 +0000)]

arm64: KVM: Skip MMIO insn after emulation

When we emulate an MMIO instruction, we advance the CPU state within
decode_hsr(), before emulating the instruction effects.

Having this logic in decode_hsr() is opaque, and advancing the state
before emulation is problematic. It gets in the way of applying
consistent single-step logic, and it prevents us from being able to fail
an MMIO instruction with a synchronous exception.

Clean this up by only advancing the CPU state *after* the effects of the
instruction are emulated.

Cc: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

commit | commitdiff | tree

Linus Torvalds [Sun, 25 Nov 2018 22:19:31 +0000 (14:19 -0800)]

Linux 4.20-rc4

commit | commitdiff | tree

Linus Torvalds [Sun, 25 Nov 2018 17:24:40 +0000 (09:24 -0800)]

Merge tag 'dma-mapping-4.20-3' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fixes from Christoph Hellwig:
"Two dma-direct / swiotlb regressions fixes:

   - zero is a valid physical address on some arm boards, we can't use
     it as the error value

   - don't try to cache flush the error return value (no matter what it
     is)"

* tag 'dma-mapping-4.20-3' of git://git.infradead.org/users/hch/dma-mapping:
  swiotlb: Skip cache maintenance on map error
  dma-direct: Make DIRECT_MAPPING_ERROR viable for SWIOTLB

commit | commitdiff | tree

Linus Torvalds [Sun, 25 Nov 2018 17:19:58 +0000 (09:19 -0800)]

Merge tag 'nfs-for-4.20-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:

- Fix a NFSv4 state manager deadlock when returning a delegation

- NFSv4.2 copy do not allocate memory under the lock

- flexfiles: Use the correct stateid for IO in the tightly coupled case

* tag 'nfs-for-4.20-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  flexfiles: use per-mirror specified stateid for IO
  NFSv4.2 copy do not allocate memory under the lock
  NFSv4: Fix a NFSv4 state manager deadlock

commit | commitdiff | tree

Luc Van Oostenryck [Sun, 25 Nov 2018 16:34:05 +0000 (17:34 +0100)]

MAINTAINERS: change Sparse's maintainer

I'm taking over the maintainance of Sparse so add myself as
maintainer and move Christopher's info to CREDITS.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Linus Torvalds [Sun, 25 Nov 2018 02:44:01 +0000 (18:44 -0800)]

Merge tag 'xarray-4.20-rc4' of git://git.infradead.org/users/willy/linux-dax

Pull XArray updates from Matthew Wilcox:
"We found some bugs in the DAX conversion to XArray (and one bug which
  predated the XArray conversion). There were a couple of bugs in some
  of the higher-level functions, which aren't actually being called in
  today's kernel, but surfaced as a result of converting existing radix
  tree & IDR users over to the XArray.

  Some of the other changes to how the higher-level APIs work were also
  motivated by converting various users; again, they're not in use in
  today's kernel, so changing them has a low probability of introducing
  a bug.

  Dan can still trigger a bug in the DAX code with hot-offline/online,
  and we're working on tracking that down"

* tag 'xarray-4.20-rc4' of git://git.infradead.org/users/willy/linux-dax:
  XArray tests: Add missing locking
  dax: Avoid losing wakeup in dax_lock_mapping_entry
  dax: Fix huge page faults
  dax: Fix dax_unlock_mapping_entry for PMD pages
  dax: Reinstate RCU protection of inode
  dax: Make sure the unlocking entry isn't locked
  dax: Remove optimisation from dax_lock_mapping_entry
  XArray tests: Correct some 64-bit assumptions
  XArray: Correct xa_store_range
  XArray: Fix Documentation
  XArray: Handle NULL pointers differently for allocation
  XArray: Unify xa_store and __xa_store
  XArray: Add xa_store_bh() and xa_store_irq()
  XArray: Turn xa_erase into an exported function
  XArray: Unify xa_cmpxchg and __xa_cmpxchg
  XArray: Regularise xa_reserve
  nilfs2: Use xa_erase_irq
  XArray: Export __xa_foo to non-GPL modules
  XArray: Fix xa_for_each with a single element at 0

commit | commitdiff | tree

Linus Torvalds [Sat, 24 Nov 2018 20:58:47 +0000 (12:58 -0800)]

Merge branch 'for-linus' of git://git./linux/kernel/git/hid/hid

Pull HID fixes from Jiri Kosina:

- revert of the high-resolution scrolling feature, as it breaks certain
   hardware due to incompatibilities between Logitech and Microsoft
   worlds. Peter Hutterer is working on a fixed implementation. Until
   that is finished, revert by Benjamin Tissoires.

- revert of incorrect strncpy->strlcpy conversion in uhid, from David
   Herrmann

- fix for buggy sendfile() implementation on uhid device node, from
   Eric Biggers

- a few assorted device-ID specific quirks

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  Revert "Input: Add the `REL_WHEEL_HI_RES` event code"
  Revert "HID: input: Create a utility class for counting scroll events"
  Revert "HID: logitech: Add function to enable HID++ 1.0 "scrolling acceleration""
  Revert "HID: logitech: Enable high-resolution scrolling on Logitech mice"
  Revert "HID: logitech: Use LDJ_DEVICE macro for existing Logitech mice"
  Revert "HID: logitech: fix a used uninitialized GCC warning"
  Revert "HID: input: simplify/fix high-res scroll event handling"
  HID: Add quirk for Primax PIXART OEM mice
  HID: i2c-hid: Disable runtime PM for LG touchscreen
  HID: multitouch: Add pointstick support for Cirque Touchpad
  HID: steam: remove input device when a hid client is running.
  Revert "HID: uhid: use strlcpy() instead of strncpy()"
  HID: uhid: forbid UHID_CREATE under KERNEL_DS or elevated privileges
  HID: input: Ignore battery reported by Symbol DS4308
  HID: Add quirk for Microsoft PIXART OEM mouse

commit | commitdiff | tree

Linus Torvalds [Sat, 24 Nov 2018 17:42:32 +0000 (09:42 -0800)]

Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas::

- Fix wrong conflict resolution around CONFIG_ARM64_SSBD

- Fix sparse warning on unsigned long constant

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: cpufeature: Fix mismerge of CONFIG_ARM64_SSBD block
arm64: sysreg: fix sparse warnings

commit | commitdiff | tree

Linus Torvalds [Sat, 24 Nov 2018 17:19:38 +0000 (09:19 -0800)]

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Need to take mutex in ath9k_add_interface(), from Dan Carpenter.

2) Fix mt76 build without CONFIG_LEDS_CLASS, from Arnd Bergmann.

3) Fix socket wmem accounting in SCTP, from Xin Long.

4) Fix failed resume crash in ena driver, from Arthur Kiyanovski.

5) qed driver passes bytes instead of bits into second arg of
    bitmap_weight(). From Denis Bolotin.

6) Fix reset deadlock in ibmvnic, from Juliet Kim.

7) skb_scrube_packet() needs to scrub the fwd marks too, from Petr
    Machata.

8) Make sure older TCP stacks see enough dup ACKs, and avoid doing SACK
    compression during this period, from Eric Dumazet.

9) Add atomicity to SMC protocol cursor handling, from Ursula Braun.

10) Don't leave dangling error pointer if bpf_prog_add() fails in
    thunderx driver, from Lorenzo Bianconi. Also, when we unmap TSO
    headers, set sq->tso_hdrs to NULL.

11) Fix race condition over state variables in act_police, from Davide
    Caratti.

12) Disable guest csum in the presence of XDP in virtio_net, from Jason
    Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (64 commits)
  net: gemini: Fix copy/paste error
  net: phy: mscc: fix deadlock in vsc85xx_default_config
  dt-bindings: dsa: Fix typo in "probed"
  net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue
  net: amd: add missing of_node_put()
  team: no need to do team_notify_peers or team_mcast_rejoin when disabling port
  virtio-net: fail XDP set if guest csum is negotiated
  virtio-net: disable guest csum during XDP set
  net/sched: act_police: add missing spinlock initialization
  net: don't keep lonely packets forever in the gro hash
  net/ipv6: re-do dad when interface has IFF_NOARP flag change
  packet: copy user buffers before orphan or clone
  ibmvnic: Update driver queues after change in ring size support
  ibmvnic: Fix RX queue buffer cleanup
  net: thunderx: set xdp_prog to NULL if bpf_prog_add fails
  net/dim: Update DIM start sample after each DIM iteration
  net: faraday: ftmac100: remove netif_running(netdev) check before disabling interrupts
  net/smc: use after free fix in smc_wr_tx_put_slot()
  net/smc: atomic SMCD cursor handling
  net/smc: add SMC-D shutdown signal
  ...

commit | commitdiff | tree

Linus Torvalds [Sat, 24 Nov 2018 17:11:52 +0000 (09:11 -0800)]

Merge tag 'xfs-4.20-fixes-2' of git://git./fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:
"Dave and I have continued our work fixing corruption problems that can
  be found when running long-term burn-in exercisers on xfs. Here are
  some patches fixing most of the problems, but there will likely be
  more. :/

   - Numerous corruption fixes for copy on write

   - Numerous corruption fixes for blocksize < pagesize writes

   - Don't miscalculate AG reservations for small final AGs

   - Fix page cache truncation to work properly for reflink and extent
     shifting

   - Fix use-after-free when retrying failed inode/dquot buffer logging

   - Fix corruptions seen when using copy_file_range in directio mode"

* tag 'xfs-4.20-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: readpages doesn't zero page tail beyond EOF
  vfs: vfs_dedupe_file_range() doesn't return EOPNOTSUPP
  iomap: dio data corruption and spurious errors when pipes fill
  iomap: sub-block dio needs to zeroout beyond EOF
  iomap: FUA is wrong for DIO O_DSYNC writes into unwritten extents
  xfs: delalloc -> unwritten COW fork allocation can go wrong
  xfs: flush removing page cache in xfs_reflink_remap_prep
  xfs: extent shifting doesn't fully invalidate page cache
  xfs: finobt AG reserves don't consider last AG can be a runt
  xfs: fix transient reference count error in xfs_buf_resubmit_failed_buffers
  xfs: uncached buffer tracing needs to print bno
  xfs: make xfs_file_remap_range() static
  xfs: fix shared extent data corruption due to missing cow reservation

commit | commitdiff | tree

Andreas Fiedler [Fri, 23 Nov 2018 23:16:34 +0000 (00:16 +0100)]

net: gemini: Fix copy/paste error

The TX stats should be started with the tx_stats_syncp,
there seems to be a copy/paste error in the driver.

Signed-off-by: Andreas Fiedler <andreas.fiedler@gmx.net>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Quentin Schulz [Fri, 23 Nov 2018 18:01:51 +0000 (19:01 +0100)]

net: phy: mscc: fix deadlock in vsc85xx_default_config

The vsc85xx_default_config function called in the vsc85xx_config_init
function which is used by VSC8530, VSC8531, VSC8540 and VSC8541 PHYs
mistakenly calls phy_read and phy_write in-between phy_select_page and
phy_restore_page.

phy_select_page and phy_restore_page actually take and release the MDIO
bus lock and phy_write and phy_read take and release the lock to write
or read to a PHY register.

Let's fix this deadlock by using phy_modify_paged which handles
correctly a read followed by a write in a non-standard page.

Fixes: 6a0bfbbe20b0 ("net: phy: mscc: migrate to phy_select/restore_page functions")
Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Fabio Estevam [Fri, 23 Nov 2018 17:46:50 +0000 (15:46 -0200)]

dt-bindings: dsa: Fix typo in "probed"

The correct form is "can be probed", so fix the typo.

Signed-off-by: Fabio Estevam <festevam@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Lorenzo Bianconi [Fri, 23 Nov 2018 17:28:01 +0000 (18:28 +0100)]

net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue

Reset snd_queue tso_hdrs pointer to NULL in nicvf_free_snd_queue routine
since it is used to check if tso dma descriptor queue has been previously
allocated. The issue can be triggered with the following reproducer:

$ip link set dev enP2p1s0v0 xdpdrv obj xdp_dummy.o
$ip link set dev enP2p1s0v0 xdpdrv off

[  341.467649] WARNING: CPU: 74 PID: 2158 at mm/vmalloc.c:1511 __vunmap+0x98/0xe0
[  341.515010] Hardware name: GIGABYTE H270-T70/MT70-HD0, BIOS T49 02/02/2018
[  341.521874] pstate: 60400005 (nZCv daif +PAN -UAO)
[  341.526654] pc : __vunmap+0x98/0xe0
[  341.530132] lr : __vunmap+0x98/0xe0
[  341.533609] sp : ffff00001c5db860
[  341.536913] x29: ffff00001c5db860 x28: 0000000000020000
[  341.542214] x27: ffff810feb5090b0 x26: ffff000017e57000
[  341.547515] x25: 0000000000000000 x24: 00000000fbd00000
[  341.552816] x23: 0000000000000000 x22: ffff810feb5090b0
[  341.558117] x21: 0000000000000000 x20: 0000000000000000
[  341.563418] x19: ffff000017e57000 x18: 0000000000000000
[  341.568719] x17: 0000000000000000 x16: 0000000000000000
[  341.574020] x15: 0000000000000010 x14: ffffffffffffffff
[  341.579321] x13: ffff00008985eb27 x12: ffff00000985eb2f
[  341.584622] x11: ffff0000096b3000 x10: ffff00001c5db510
[  341.589923] x9 : 00000000ffffffd0 x8 : ffff0000086868e8
[  341.595224] x7 : 3430303030303030 x6 : 00000000000006ef
[  341.600525] x5 : 00000000003fffff x4 : 0000000000000000
[  341.605825] x3 : 0000000000000000 x2 : ffffffffffffffff
[  341.611126] x1 : ffff0000096b3728 x0 : 0000000000000038
[  341.616428] Call trace:
[  341.618866]  __vunmap+0x98/0xe0
[  341.621997]  vunmap+0x3c/0x50
[  341.624961]  arch_dma_free+0x68/0xa0
[  341.628534]  dma_direct_free+0x50/0x80
[  341.632285]  nicvf_free_resources+0x160/0x2d8 [nicvf]
[  341.637327]  nicvf_config_data_transfer+0x174/0x5e8 [nicvf]
[  341.642890]  nicvf_stop+0x298/0x340 [nicvf]
[  341.647066]  __dev_close_many+0x9c/0x108
[  341.650977]  dev_close_many+0xa4/0x158
[  341.654720]  rollback_registered_many+0x140/0x530
[  341.659414]  rollback_registered+0x54/0x80
[  341.663499]  unregister_netdevice_queue+0x9c/0xe8
[  341.668192]  unregister_netdev+0x28/0x38
[  341.672106]  nicvf_remove+0xa4/0xa8 [nicvf]
[  341.676280]  nicvf_shutdown+0x20/0x30 [nicvf]
[  341.680630]  pci_device_shutdown+0x44/0x88
[  341.684720]  device_shutdown+0x144/0x250
[  341.688640]  kernel_restart_prepare+0x44/0x50
[  341.692986]  kernel_restart+0x20/0x68
[  341.696638]  __se_sys_reboot+0x210/0x238
[  341.700550]  __arm64_sys_reboot+0x24/0x30
[  341.704555]  el0_svc_handler+0x94/0x110
[  341.708382]  el0_svc+0x8/0xc
[  341.711252] ---[ end trace 3f4019c8439959c9 ]---
[  341.715874] page:ffff7e0003ef4000 count:0 mapcount:0 mapping:0000000000000000 index:0x4
[  341.723872] flags: 0x1fffe000000000()
[  341.727527] raw: 001fffe000000000 ffff7e0003f1a008 ffff7e0003ef4048 0000000000000000
[  341.735263] raw: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
[  341.742994] page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)

where xdp_dummy.c is a simple bpf program that forwards the incoming
frames to the network stack (available here:
https://github.com/altoor/xdp_walkthrough_examples/blob/master/sample_1/xdp_dummy.c)

Fixes: 05c773f52b96 ("net: thunderx: Add basic XDP support")
Fixes: 4863dea3fab0 ("net: Adding support for Cavium ThunderX network controller")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Yangtao Li [Thu, 22 Nov 2018 12:34:41 +0000 (07:34 -0500)]

net: amd: add missing of_node_put()

of_find_node_by_path() acquires a reference to the node
returned by it and that reference needs to be dropped by its caller.
This place doesn't do that, so fix it.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Hangbin Liu [Thu, 22 Nov 2018 08:15:28 +0000 (16:15 +0800)]

team: no need to do team_notify_peers or team_mcast_rejoin when disabling port

team_notify_peers() will send ARP and NA to notify peers. team_mcast_rejoin()
will send multicast join group message to notify peers. We should do this when
enabling/changed to a new port. But it doesn't make sense to do it when a port
is disabled.

On the other hand, when we set mcast_rejoin_count to 2, and do a failover,
team_port_disable() will increase mcast_rejoin.count_pending to 2 and then
team_port_enable() will increase mcast_rejoin.count_pending to 4. We will send
4 mcast rejoin messages at latest, which will make user confused. The same
with notify_peers.count.

Fix it by deleting team_notify_peers() and team_mcast_rejoin() in
team_port_disable().

Reported-by: Liang Li <liali@redhat.com>
Fixes: fc423ff00df3a ("team: add peer notification")
Fixes: 492b200efdd20 ("team: add support for sending multicast rejoins")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jason Wang [Thu, 22 Nov 2018 06:36:31 +0000 (14:36 +0800)]

virtio-net: fail XDP set if guest csum is negotiated

We don't support partial csumed packet since its metadata will be lost
or incorrect during XDP processing. So fail the XDP set if guest_csum
feature is negotiated.

Fixes: f600b6905015 ("virtio_net: Add XDP support")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pavel Popa <pashinho1990@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Jason Wang [Thu, 22 Nov 2018 06:36:30 +0000 (14:36 +0800)]

virtio-net: disable guest csum during XDP set

We don't disable VIRTIO_NET_F_GUEST_CSUM if XDP was set. This means we
can receive partial csumed packets with metadata kept in the
vnet_hdr. This may have several side effects:

- It could be overridden by header adjustment, thus is might be not
correct after XDP processing.
- There's no way to pass such metadata information through
XDP_REDIRECT to another driver.
- XDP does not support checksum offload right now.

So simply disable guest csum if possible in this the case of XDP.

Fixes: 3f93522ffab2d ("virtio-net: switch off offloads on demand if possible on XDP set")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pavel Popa <pashinho1990@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Linus Torvalds [Fri, 23 Nov 2018 19:24:55 +0000 (11:24 -0800)]

Merge tag 'ceph-for-4.20-rc4' of https://github.com/ceph/ceph-client

Pullk ceph fix from Ilya Dryomov:
"A messenger fix, marked for stable"

* tag 'ceph-for-4.20-rc4' of https://github.com/ceph/ceph-client:
libceph: fall back to sendmsg for slab pages

commit | commitdiff | tree

Linus Torvalds [Fri, 23 Nov 2018 19:20:14 +0000 (11:20 -0800)]

Merge tag 'for-linus-20181123' of git://git.kernel.dk/linux-block

Pull block fix from Jens Axboe:
"Just a single fix for this week, fixing an issue with nvme-fc"

* tag 'for-linus-20181123' of git://git.kernel.dk/linux-block:
nvme-fc: resolve io failures during connect

commit | commitdiff | tree

Davide Caratti [Wed, 21 Nov 2018 17:23:53 +0000 (18:23 +0100)]

net/sched: act_police: add missing spinlock initialization

commit f2cbd4852820 ("net/sched: act_police: fix race condition on state
variables") introduces a new spinlock, but forgets its initialization.
Ensure that tcf_police_init() initializes 'tcfp_lock' every time a 'police'
action is newly created, to avoid the following lockdep splat:

INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
<...>
Call Trace:
  dump_stack+0x85/0xcb
  register_lock_class+0x581/0x590
  __lock_acquire+0xd4/0x1330
  ? tcf_police_init+0x2fa/0x650 [act_police]
  ? lock_acquire+0x9e/0x1a0
  lock_acquire+0x9e/0x1a0
  ? tcf_police_init+0x2fa/0x650 [act_police]
  ? tcf_police_init+0x55a/0x650 [act_police]
  _raw_spin_lock_bh+0x34/0x40
  ? tcf_police_init+0x2fa/0x650 [act_police]
  tcf_police_init+0x2fa/0x650 [act_police]
  tcf_action_init_1+0x384/0x4c0
  tcf_action_init+0xf6/0x160
  tcf_action_add+0x73/0x170
  tc_ctl_action+0x122/0x160
  rtnetlink_rcv_msg+0x2a4/0x490
  ? netlink_deliver_tap+0x99/0x400
  ? validate_linkmsg+0x370/0x370
  netlink_rcv_skb+0x4d/0x130
  netlink_unicast+0x196/0x230
  netlink_sendmsg+0x2e5/0x3e0
  sock_sendmsg+0x36/0x40
  ___sys_sendmsg+0x280/0x2f0
  ? _raw_spin_unlock+0x24/0x30
  ? handle_pte_fault+0xafe/0xf30
  ? find_held_lock+0x2d/0x90
  ? syscall_trace_enter+0x1df/0x360
  ? __sys_sendmsg+0x5e/0xa0
  __sys_sendmsg+0x5e/0xa0
  do_syscall_64+0x60/0x210
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f1841c7cf10
Code: c3 48 8b 05 82 6f 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d 8d d0 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24
RSP: 002b:00007ffcf9df4d68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f1841c7cf10
RDX: 0000000000000000 RSI: 00007ffcf9df4dc0 RDI: 0000000000000003
RBP: 000000005bf56105 R08: 0000000000000002 R09: 00007ffcf9df8edc
R10: 00007ffcf9df47e0 R11: 0000000000000246 R12: 0000000000671be0
R13: 00007ffcf9df4e84 R14: 0000000000000008 R15: 0000000000000000

Fixes: f2cbd4852820 ("net/sched: act_police: fix race condition on state variables")
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Paolo Abeni [Wed, 21 Nov 2018 17:21:35 +0000 (18:21 +0100)]

net: don't keep lonely packets forever in the gro hash

Eric noted that with UDP GRO and NAPI timeout, we could keep a single
UDP packet inside the GRO hash forever, if the related NAPI instance
calls napi_gro_complete() at an higher frequency than the NAPI timeout.
Willem noted that even TCP packets could be trapped there, till the
next retransmission.
This patch tries to address the issue, flushing the old packets -
those with a NAPI_GRO_CB age before the current jiffy - before scheduling
the NAPI timeout. The rationale is that such a timeout should be
well below a jiffy and we are not flushing packets eligible for sane GRO.

v1 -> v2:
- clarified the commit message and comment

RFC -> v1:
- added 'Fixes tags', cleaned-up the wording.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Fixes: 3b47d30396ba ("net: gro: add a per device gro flush timer")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Hangbin Liu [Wed, 21 Nov 2018 13:52:33 +0000 (21:52 +0800)]

net/ipv6: re-do dad when interface has IFF_NOARP flag change

When we add a new IPv6 address, we should also join corresponding solicited-node
multicast address, unless the interface has IFF_NOARP flag, as function
addrconf_join_solict() did. But if we remove IFF_NOARP flag later, we do
not do dad and add the mcast address. So we will drop corresponding neighbour
discovery message that came from other nodes.

A typical example is after creating a ipvlan with mode l3, setting up an ipv6
address and changing the mode to l2. Then we will not be able to ping this
address as the interface doesn't join related solicited-node mcast address.

Fix it by re-doing dad when interface changed IFF_NOARP flag. Then we will add
corresponding mcast group and check if there is a duplicate address on the
network.

Reported-by: Jianlin Shi <jishi@redhat.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Linus Torvalds [Fri, 23 Nov 2018 19:15:27 +0000 (11:15 -0800)]

Merge tag 'iommu-fixes-v4.20-rc3' of git://git./linux/kernel/git/joro/iommu

Pull IOMMU fixes from Joerg Roedel:

- Two fixes for the Intel VT-d driver to fix a NULL-ptr dereference and
   an unbalance in an allocate/free path (allocated with memremap, freed
   with iounmap)

- Fix for a crash in the Renesas IOMMU driver

- Fix for the Advanced Virtual Interrupt Controler (AVIC) code in the
   AMD IOMMU driver

* tag 'iommu-fixes-v4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Use memunmap to free memremap
  amd/iommu: Fix Guest Virtual APIC Log Tail Address Register
  iommu/ipmmu-vmsa: Fix crash on early domain free
  iommu/vt-d: Fix NULL pointer dereference in prq_event_thread()

commit | commitdiff | tree

Willem de Bruijn [Tue, 20 Nov 2018 18:00:18 +0000 (13:00 -0500)]

packet: copy user buffers before orphan or clone

tpacket_snd sends packets with user pages linked into skb frags. It
notifies that pages can be reused when the skb is released by setting
skb->destructor to tpacket_destruct_skb.

This can cause data corruption if the skb is orphaned (e.g., on
transmit through veth) or cloned (e.g., on mirror to another psock).

Create a kernel-private copy of data in these cases, same as tun/tap
zerocopy transmission. Reuse that infrastructure: mark the skb as
SKBTX_ZEROCOPY_FRAG, which will trigger copy in skb_orphan_frags(_rx).

Unlike other zerocopy packets, do not set shinfo destructor_arg to
struct ubuf_info. tpacket_destruct_skb already uses that ptr to notify
when the original skb is released and a timestamp is recorded. Do not
change this timestamp behavior. The ubuf_info->callback is not needed
anyway, as no zerocopy notification is expected.

Mark destructor_arg as not-a-uarg by setting the lower bit to 1. The
resulting value is not a valid ubuf_info pointer, nor a valid
tpacket_snd frame address. Add skb_zcopy_.._nouarg helpers for this.

The fix relies on features introduced in commit 52267790ef52 ("sock:
add MSG_ZEROCOPY"), so can be backported as is only to 4.14.

Tested with from `./in_netns.sh ./txring_overwrite` from
http://github.com/wdebruij/kerneltools/tests

Fixes: 69e3c75f4d54 ("net: TX_RING and packet mmap")
Reported-by: Anand H. Krishnan <anandhkrishnan@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Linus Torvalds [Fri, 23 Nov 2018 18:56:16 +0000 (10:56 -0800)]

Merge tag 'acpi-4.20-rc4' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
"Prevent the ACPI core from registering a platform device for the
SMB0001 HID to avoid IRQ allocation issues (Hans de Goede)"

* tag 'acpi-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / platform: Add SMB0001 HID to forbidden_id_list

commit | commitdiff | tree

Linus Torvalds [Fri, 23 Nov 2018 18:52:57 +0000 (10:52 -0800)]

Merge tag 'pm-4.20-rc4' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"These fix two issues in the Operating Performance Points (OPP)
  framework, one cpufreq driver issue, one problem related to the tasks
  freezer and a few build-related issues in the cpupower utility.

  Specifics:

   - Fix tasks freezer deadlock in de_thread() that occurs if one of its
     sub-threads has been frozen already (Chanho Min).

   - Avoid registering a platform device by the ti-cpufreq driver on
     platforms that cannot use it (Dave Gerlach).

   - Fix a mistake in the ti-opp-supply operating performance points
     (OPP) driver that caused an incorrect reference voltage to be used
     and make it adjust the minimum voltage dynamically to avoid hangs
     or crashes in some cases (Keerthy).

   - Fix issues related to compiler flags in the cpupower utility and
     correct a linking problem in it by renaming a file with a duplicate
     name (Jiri Olsa, Konstantin Khlebnikov)"

* tag 'pm-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  exec: make de_thread() freezable
  cpufreq: ti-cpufreq: Only register platform_device when supported
  opp: ti-opp-supply: Correct the supply in _get_optimal_vdd_voltage call
  opp: ti-opp-supply: Dynamically update u_volt_min
  tools cpupower: Override CFLAGS assignments
  tools cpupower debug: Allow to use outside build flags
  tools/power/cpupower: fix compilation with STATIC=true

commit | commitdiff | tree

Will Deacon [Wed, 21 Nov 2018 15:07:00 +0000 (15:07 +0000)]

arm64: cpufeature: Fix mismerge of CONFIG_ARM64_SSBD block

When merging support for SSBD and the CRC32 instructions, the conflict
resolution for the new capability entries in arm64_features[]
inadvertedly predicated the availability of the CRC32 instructions on
CONFIG_ARM64_SSBD, despite the functionality being entirely unrelated.

Move the #ifdef CONFIG_ARM64_SSBD down so that it only covers the SSBD
capability.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

commit | commitdiff | tree

Linus Torvalds [Fri, 23 Nov 2018 18:40:19 +0000 (10:40 -0800)]

Merge tag 'gpio-v4.20-2' of git://git./linux/kernel/git/linusw/linux-gpio

Pull GPIO fixes from Linus Walleij:
"Minor stuff except the IDA leak which was kind of important to fix.
  Also new maintainers, yay.

   - Do not lose an IDA on the gpiochip register errorpath.

   - Fix the PXA non-pincontrol GPIO-using platforms.

   - Fix the direction on the mockup GPIO driver.

   - Add some MAINTAINERS stuff: Bartosz stepped up as GPIO
     co-maintainer, and Andy established an Intel git tree"

* tag 'gpio-v4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  MAINTAINERS: Do maintain Intel GPIO drivers via separate tree
  gpio: mockup: fix indicated direction
  gpio: pxa: fix legacy non pinctrl aware builds again
  gpio: don't free unallocated ida on gpiochip_add_data_with_key() error path
  MAINTAINERS: add myself as co-maintainer of gpiolib

commit | commitdiff | tree

Linus Torvalds [Fri, 23 Nov 2018 18:36:02 +0000 (10:36 -0800)]

Merge tag 'mmc-v4.20-rc2' of git://git./linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
"MMC host:

   - sdhci-pci: Fixup card detect lookup

   - sdhci-pci: Workaround GLK firmware bug for tuning"

* tag 'mmc-v4.20-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: sdhci-pci: Workaround GLK firmware failing to restore the tuning value
  mmc: sdhci-pci: Try "cd" for card-detect lookup before using NULL

commit | commitdiff | tree

Linus Torvalds [Fri, 23 Nov 2018 18:03:08 +0000 (10:03 -0800)]

Merge tag 'drm-fixes-2018-11-23' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
"Regular drm fixes:

  amdgpu:
   - Vega20 fixes
   - firmware loading fix
   - panel display fix
   - override fix

  i915:
   - Sandybridge lockup fix
   - fastboot DSI panel fix
   - GPU hang on Broxton
   - GPU reloc fixes on pineview/bearlake

  ast:
   - screen blurring fix
   - cursor appearance fix

  udmabuf:
   - mmap fix

  vc4:
   - NULL deref fix
   - async cursor update fix

  All seems pretty normal at this stage"

* tag 'drm-fixes-2018-11-23' of git://anongit.freedesktop.org/drm/drm:
  drm/ast: fixed cursor may disappear sometimes
  drm/ast: change resolution may cause screen blurred
  drm/i915: Add rotation readout for plane initial config
  drm/i915: Force a LUT update in intel_initial_commit()
  drm/fb-helper: Blacklist writeback when adding connectors to fbdev
  drm/i915: Write GPU relocs harder with gen3
  drm/amdgpu: Enable HDP memory light sleep
  drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture
  drm/amd/pp: handle negative values when reading OD
  drm/amdgpu: Add missing firmware entry for HAINAN
  drm/amd/powerplay: disable Vega20 DS related features
  drm/amdgpu: Fix oops when pp_funcs->switch_power_profile is unset
  drm/i915: Disable LP3 watermarks on all SNB machines
  drm/ast: Remove existing framebuffers before loading driver
  udmabuf: set read/write flag when exporting
  drm/amd/display: Support amdgpu "max bpc" connector property (v2)
  drm/amdgpu: Add amdgpu "max bpc" connector property (v2)
  drm/vc4: Set ->legacy_cursor_update to false when doing non-async updates
  drm/vc4: Fix NULL pointer dereference in the async update path

commit | commitdiff | tree

Sergey Matyukevich [Fri, 16 Nov 2018 18:21:30 +0000 (21:21 +0300)]

arm64: sysreg: fix sparse warnings

Specify correct type for the constants to avoid
the following sparse complaints:

./arch/arm64/include/asm/sysreg.h:471:42: warning: constant 0xffffffffffffffff is so big it is unsigned long
./arch/arm64/include/asm/sysreg.h:512:42: warning: constant 0xffffffffffffffff is so big it is unsigned long

Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Sergey Matyukevich <geomatsi@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

commit | commitdiff | tree

Rafael J. Wysocki [Fri, 23 Nov 2018 09:32:49 +0000 (10:32 +0100)]

Merge branches 'pm-cpufreq' and 'pm-sleep'

* pm-cpufreq:
cpufreq: ti-cpufreq: Only register platform_device when supported

* pm-sleep:
exec: make de_thread() freezable

commit | commitdiff | tree

Rafael J. Wysocki [Fri, 23 Nov 2018 09:32:22 +0000 (10:32 +0100)]

Merge branches 'pm-opp' and 'pm-tools'

* pm-opp:
  opp: ti-opp-supply: Correct the supply in _get_optimal_vdd_voltage call
  opp: ti-opp-supply: Dynamically update u_volt_min

* pm-tools:
  tools cpupower: Override CFLAGS assignments
  tools cpupower debug: Allow to use outside build flags
  tools/power/cpupower: fix compilation with STATIC=true

commit | commitdiff | tree

Dave Airlie [Fri, 23 Nov 2018 01:03:20 +0000 (11:03 +1000)]

Merge tag 'drm-intel-fixes-2018-11-22' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- Fix for fastboot DSI panel boot time flicker regression, also fixes Bugzilla #108225
- Fix Bugzilla #101269 to avoid GPU hangs on Sandybridge machines
- Avoid GPU hang on error capture on Broxton with Vt-d enabled
- Avoid missing GPU relocations on Pineview and Bearlake (Gen3)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181122120555.GA18282@jlahtine-desk.ger.corp.intel.com

commit | commitdiff | tree

David S. Miller [Thu, 22 Nov 2018 19:53:26 +0000 (11:53 -0800)]

Merge branch 'ibmvnic-Fix-queue-and-buffer-accounting-errors'

Thomas Falcon says:

====================
ibmvnic: Fix queue and buffer accounting errors

This series includes two small fixes. The first resolves a typo bug
in the code to clean up unused RX buffers during device queue removal.
The second ensures that device queue memory is updated to reflect new
supported queue ring sizes after migration to other backing hardware.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Thomas Falcon [Wed, 21 Nov 2018 17:17:59 +0000 (11:17 -0600)]

ibmvnic: Update driver queues after change in ring size support

During device reset, queue memory is not being updated to accommodate
changes in ring buffer sizes supported by backing hardware. Track
any differences in ring buffer sizes following the reset and update
queue memory when possible.

Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Thomas Falcon [Wed, 21 Nov 2018 17:17:58 +0000 (11:17 -0600)]

ibmvnic: Fix RX queue buffer cleanup

The wrong index is used when cleaning up RX buffer objects during release
of RX queues. Update to use the correct index counter.

Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Lorenzo Bianconi [Wed, 21 Nov 2018 15:32:10 +0000 (16:32 +0100)]

net: thunderx: set xdp_prog to NULL if bpf_prog_add fails

Set xdp_prog pointer to NULL if bpf_prog_add fails since that routine
reports the error code instead of NULL in case of failure and xdp_prog
pointer value is used in the driver to verify if XDP is currently
enabled.
Moreover report the error code to userspace if nicvf_xdp_setup fails

Fixes: 05c773f52b96 ("net: thunderx: Add basic XDP support")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Tal Gilboa [Wed, 21 Nov 2018 14:28:23 +0000 (16:28 +0200)]

net/dim: Update DIM start sample after each DIM iteration

On every iteration of net_dim, the algorithm may choose to
check for the system state by comparing current data sample
with previous data sample. After each of these comparison,
regardless of the action taken, the sample used as baseline
is needed to be updated.

This patch fixes a bug that causes DIM to take wrong decisions,
due to never updating the baseline sample for comparison between
iterations. This way, DIM always compares current sample with
zeros.

Although this is a functional fix, it also improves and stabilizes
performance as the algorithm works properly now.

Performance:
Tested single UDP TX stream with pktgen:
samples/pktgen/pktgen_sample03_burst_single_flow.sh -i p4p2 -d 1.1.1.1
-m 24:8a:07:88:26:8b -f 3 -b 128

ConnectX-5 100GbE packet rate improved from 15-19Mpps to 19-20Mpps.
Also, toggling between profiles is less frequent with the fix.

Fixes: 8115b750dbcb ("net/dim: use struct net_dim_sample as arg to net_dim")
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Tigran Mkrtchyan [Wed, 21 Nov 2018 11:25:41 +0000 (12:25 +0100)]

flexfiles: use per-mirror specified stateid for IO

rfc8435 says:

For tight coupling, ffds_stateid provides the stateid to be used by
the client to access the file.

However current implementation replaces per-mirror provided stateid with
by open or lock stateid.

Ensure that per-mirror stateid is used by ff_layout_write_prepare_v4 and
nfs4_ff_layout_prepare_ds.

Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

commit | commitdiff | tree

Olga Kornievskaia [Wed, 21 Nov 2018 16:24:22 +0000 (11:24 -0500)]

NFSv4.2 copy do not allocate memory under the lock

Bruce pointed out that we shouldn't allocate memory while holding
a lock in the nfs4_callback_offload() and handle_async_copy()
that deal with a racing CB_OFFLOAD and reply to COPY case.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

commit | commitdiff | tree

Linus Torvalds [Thu, 22 Nov 2018 16:45:44 +0000 (08:45 -0800)]

Merge tag 'sound-4.20-rc4' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"The only significant change is for OSS PCM emulation to convert with
  kvcalloc() to address both performance and security issues. It's a
  pretty straightforward change, which should be safe.

  The rest are, as usual, device-specific small fixes for HD-audio"

* tag 'sound-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/ca0132 - fix AE-5 pincfg
  ALSA: hda/ca0132 - Add new ZxR quirk
  ALSA: hda/ca0132 - Call pci_iounmap() instead of iounmap()
  ALSA: hda/realtek - Add quirk entry for HP Pavilion 15
  ALSA: oss: Use kvzalloc() for local buffer allocations

commit | commitdiff | tree

Linus Torvalds [Thu, 22 Nov 2018 16:43:06 +0000 (08:43 -0800)]

Merge tag 'char-misc-4.20-rc4' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
"Here are some small char/misc driver fixes for issues that have been
  reported.

  Nothing major, highlights include:

   - gnss sync write fixes

   - uio oops fix

   - nvmem fixes

   - other minor fixes and some documentation/maintainers updates

  Full details are in the shortlog.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  Documentation/security-bugs: Postpone fix publication in exceptional cases
  MAINTAINERS: Add Sasha as a stable branch maintainer
  gnss: sirf: fix synchronous write timeout
  gnss: serial: fix synchronous write timeout
  uio: Fix an Oops on load
  test_firmware: fix error return getting clobbered
  nvmem: core: fix regression in of_nvmem_cell_get()
  misc: atmel-ssc: Fix section annotation on atmel_ssc_get_driver_data
  drivers/misc/sgi-gru: fix Spectre v1 vulnerability
  Drivers: hv: kvp: Fix the recent regression caused by incorrect clean-up
  slimbus: ngd: remove unnecessary check

commit | commitdiff | tree

Linus Torvalds [Thu, 22 Nov 2018 16:39:29 +0000 (08:39 -0800)]

Merge tag 'usb-4.20-rc4' of git://git./linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are a number of small USB fixes for 4.20-rc4.

  There's the usual xhci and dwc2/3 fixes as well as a few minor other
  issues resolved for problems that have been reported. Full details are
  in the shortlog.

  All have been in linux-next for a while with no reported issues"

* tag 'usb-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: cdc-acm: add entry for Hiro (Conexant) modem
  usb: xhci: Prevent bus suspend if a port connect change or polling state is detected
  usb: core: Fix hub port connection events lost
  usb: dwc3: gadget: fix ISOC TRB type on unaligned transfers
  Revert "usb: gadget: ffs: Fix BUG when userland exits with submitted AIO transfers"
  usb: dwc2: pci: Fix an error code in probe
  usb: dwc3: Fix NULL pointer exception in dwc3_pci_remove()
  xhci: Add quirk to workaround the errata seen on Cavium Thunder-X2 Soc
  usb: xhci: fix timeout for transition from RExit to U0
  usb: xhci: fix uninitialized completion when USB3 port got wrong status
  xhci: Add check for invalid byte size error when UAS devices are connected.
  xhci: handle port status events for removed USB3 hcd
  xhci: Fix leaking USB3 shared_hcd at xhci removal
  USB: misc: appledisplay: add 20" Apple Cinema Display
  USB: quirks: Add no-lpm quirk for Raydium touchscreens
  usb: quirks: Add delay-init quirk for Corsair K70 LUX RGB
  USB: Wait for extra delay time after USB_PORT_FEAT_RESET for quirky hub
  usb: dwc3: gadget: Properly check last unaligned/zero chain TRB
  usb: dwc3: core: Clean up ULPI device

commit | commitdiff | tree

Linus Torvalds [Thu, 22 Nov 2018 16:35:30 +0000 (08:35 -0800)]

Merge tag 'mtd/fixes-for-4.20-rc4' of git://git.infradead.org/linux-mtd

Pull mtd fixes from Boris Brezillon:
"SPI NOR fixes:

   - Various fixes related to the SFDP parsing code merged in 4.20

   - Fix for a page fault in the cadence-qspi

  NAND fixes:

   - Fix a macro name conflict between the QCOM NAND controller driver
     and the RISC-V asm headers

   - Fix of-node handling in the atmel driver"

* tag 'mtd/fixes-for-4.20-rc4' of git://git.infradead.org/linux-mtd:
  mtd: spi-nor: fix selection of uniform erase type in flexible conf
  mtd: spi-nor: Fix Cadence QSPI page fault kernel panic
  mtd: rawnand: qcom: Namespace prefix some commands
  mtd: rawnand: atmel: fix OF child-node lookup
  mtd: spi_nor: pass DMA-able buffer to spi_nor_read_raw()
  mtd: spi-nor: don't overwrite errno in spi_nor_get_map_in_use()
  mtd: spi-nor: fix iteration over smpt array
  mtd: spi-nor: don't drop sfdp data if optional parsers fail

commit | commitdiff | tree

Linus Torvalds [Thu, 22 Nov 2018 16:31:46 +0000 (08:31 -0800)]

Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Two small fixes.

  The qla2xxx is a regression from 4.18 and the ufs one is a device
  enablement fix"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: Fix hynix ufs bug with quirk on hi36xx SoC
  scsi: qla2xxx: Timeouts occur on surprise removal of QLogic adapter

commit | commitdiff | tree

Pan Bian [Wed, 21 Nov 2018 09:53:47 +0000 (17:53 +0800)]

iommu/vt-d: Use memunmap to free memremap

memunmap() should be used to free the return of memremap(), not
iounmap().

Fixes: dfddb969edf0 ('iommu/vt-d: Switch from ioremap_cache to memremap')
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>

commit | commitdiff | tree

Benjamin Tissoires [Wed, 21 Nov 2018 15:27:12 +0000 (16:27 +0100)]

Revert "Input: Add the `REL_WHEEL_HI_RES` event code"

This reverts commit aaf9978c3c0291ef3beaa97610bc9c3084656a85.

Quoting Peter:

There is a HID feature report called "Resolution Multiplier"
Described in the "Enhanced Wheel Support in Windows" doc and
the "USB HID Usage Tables" page 30.

http://download.microsoft.com/download/b/d/1/bd1f7ef4-7d72-419e-bc5c-9f79ad7bb66e/wheel.docx
https://www.usb.org/sites/default/files/documents/hut1_12v2.pdf

This was new for Windows Vista, so we're only a decade behind here. I only
accidentally found this a few days ago while debugging a stuck button on a
Microsoft mouse.

The docs above describe it like this: a wheel control by default sends
value 1 per notch. If the resolution multiplier is active, the wheel is
expected to send a value of $multiplier per notch (e.g. MS Sculpt mouse) or
just send events more often, i.e. for less physical motion (e.g. MS Comfort
mouse).

For the latter, you need the right HW of course. The Sculpt mouse has
tactile wheel clicks, so nothing really changes. The Comfort mouse has
continuous motion with no tactile clicks. Similar to the free-wheeling
Logitech mice but without any inertia.

Note that the doc also says that Vista and onwards *always* enable this
feature where available.

An example HID definition looks like this:

       Usage Page Generic Desktop (0x01)
       Usage Resolution Multiplier (0x48)
       Logical Minimum 0
       Logical Maximum 1
       Physical Minimum 1
       Physical Maximum 16
       Report Size 2 # in bits
       Report Count 1
       Feature (Data, Var, Abs)

So the actual bits have values 0 or 1 and that reflects real values 1 or 16.
We've only seen single-bits so far, so there's low-res and hi-res, but
nothing in between.

The multiplier is available for HID usages "Wheel" and "AC Pan" (horiz wheel).
Microsoft suggests that

> Vendors should ship their devices with smooth scrolling disabled and allow
> Windows to enable it. This ensures that the device works like a regular HID
> device on legacy operating systems that do not support smooth scrolling.
(see the wheel doc linked above)

The mice that we tested so far do reset on unplug.

Device Support looks to be all (?) Microsoft mice but nothing else

Not supported:
- Logitech G500s, G303
- Roccat Kone XTD
- all the cheap Lenovo, HP, Dell, Logitech USB mice that come with a
  workstation that I could find don't have it.
- Etekcity something something
- Razer Imperator

Supported:
- Microsoft Comfort Optical Mouse 3000 - yes, physical: 1:4
- Microsoft Sculpt Ergonomic Mouse - yes, physical: 1:12
- Microsoft Surface mouse - yes, physical: 1:4

So again, I think this is really just available on Microsoft mice, but
probably all decent MS mice released over the last decade.

Looking at the hardware itself:

- no noticeable notches in the weel
- low-res: 18 events per 360deg rotation (click angle 20 deg)
- high-res: 72 events per 360deg → matches multiplier of 4

- I can feel the notches during wheel turns
- low-res: 24 events per 360 deg rotation (click angle 15 deg)
  - horiz wheel is tilt-based, continuous output value 1
- high-res: 24 events per 360deg with value 12 → matches multiplier of 12
  - horiz wheel output rate doubles/triples?, values is 3

- It's a touch strip, not a wheel so no notches
- high-res: events have value 4 instead of 1
  a bit strange given that it doesn't actually have notches.

Ok, why is this an issue for the current API? First, because the logitech
multiplier used in Harry's patches looks suspiciously like the Resolution
Multiplier so I think we should assume it's the same thing. Nestor, can you
shed some light on that?

- `REL_WHEEL` is defined as the number of notches, emulated where needed.
- `REL_WHEEL_HI_RES` is the movement of the user's finger in microns.
- `WM_MOUSEWHEEL` (Windows) is is a multiple of 120, defined as "the threshold
  for action to be taken and one such action"
  https://docs.microsoft.com/en-us/windows/desktop/inputdev/wm-mousewheel

If the multiplier is set to M, this means we need an accumulated value of M
until we can claim there was a wheel click. So after enabling the multiplier
and setting it to the maximum (like Windows):
- M units are 15deg rotation → 1 unit is 2620/M micron (see below). This is
  the `REL_WHEEL_HI_RES` value.
  - wheel diameter 20mm: 15 deg rotation is 2.62mm, 2620 micron (pi * 20mm /
    (360deg/15deg))
- For every M units accumulated, send one `REL_WHEEL` event

The problem here is that we've now hardcoded 20mm/15 deg into the kernel and
we have no way of getting the size of the wheel or the click angle into the
kernel.

In userspace we now have to undo the kernel's calculation. If our click angle
is e.g. 20 degree we have to undo the (lossy) calculation from the kernel and
calculate the correct angle instead. This also means the 15 is a hardcoded
option forever and cannot be changed.

In hid-logitech-hidpp.c, the microns per unit is hardcoded per device.
Harry, did you measure those by hand? We'd need to update the kernel for
every device and there are 10 years worth of devices from MS alone.

The multiplier default is 8 which is in the right ballpark, so I'm pretty
sure this is the same as the Resolution Multiplier, just in HID++ lingo. And
given that the 120 magic factor is what Windows uses in the end, I can't
imagine Logitech rolling their own thing here. Nestor?

And we're already fairly inaccurate with the microns anyway. The MX Anywhere
2S has a click angle of 20 degrees (18 stops) and a 17mm wheel, so a wheel
notch is approximately 2.67mm, one event at multiplier 8 (1/8 of a notch)
would be 334 micron. That's only 80% of the fallback value of 406 in the
kernel. Multiplier 6 gives us 445micron (10% off). I'm assuming multiplier 7
doesn't exist because it's not a factor of 120.

Summary:

Best option may be to simply do what Windows is doing, all the HW manufacturers
have to use that approach after all. Switch `REL_WHEEL_HI_RES` to report in
fractions of 120, with 120 being one notch and divide that by the multiplier
for the actual events. So e.g. the Logitech multiplier 8 would send value 15
for each event in hi-res mode. This can be converted in userspace to
whatever userspace needs (combined with a hwdb there that tells you wheel
size/click angle/...).

Conflicts:
include/uapi/linux/input-event-codes.h -> I kept the new
         reserved event in the code, so I had to adapt the revert
         slightly

Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Acked-by: Harry Cutts <hcutts@chromium.org>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>

commit | commitdiff | tree

Benjamin Tissoires [Wed, 21 Nov 2018 15:27:11 +0000 (16:27 +0100)]

Revert "HID: input: Create a utility class for counting scroll events"

This reverts commit 1ff2e1a44e02d4bdbb9be67c7d9acc240a67141f.

It turns out the current API is not that compatible with
some Microsoft mice, so better start again from scratch.

Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Acked-by: Harry Cutts <hcutts@chromium.org>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>

commit | commitdiff | tree

Benjamin Tissoires [Wed, 21 Nov 2018 15:27:10 +0000 (16:27 +0100)]

Revert "HID: logitech: Add function to enable HID++ 1.0 "scrolling acceleration""

This reverts commit 051dc9b0579602bd63e9df74d0879b5293e71581.

It turns out the current API is not that compatible with
some Microsoft mice, so better start again from scratch.

Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Acked-by: Harry Cutts <hcutts@chromium.org>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>

commit | commitdiff | tree

Benjamin Tissoires [Wed, 21 Nov 2018 15:27:09 +0000 (16:27 +0100)]

Revert "HID: logitech: Enable high-resolution scrolling on Logitech mice"

This reverts commit d56ca9855bf924f3bc9807a3e42f38539df3f41f.

It turns out the current API is not that compatible with
some Microsoft mice, so better start again from scratch.

Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Acked-by: Harry Cutts <hcutts@chromium.org>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>

commit | commitdiff | tree

Benjamin Tissoires [Wed, 21 Nov 2018 15:27:08 +0000 (16:27 +0100)]

Revert "HID: logitech: Use LDJ_DEVICE macro for existing Logitech mice"

This reverts commit 3fe1d6bbcd16f384d2c7dab2caf8e4b2df9ea7e6.

It turns out the current API is not that compatible with
some Microsoft mice, so better start again from scratch.

Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Acked-by: Harry Cutts <hcutts@chromium.org>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>

commit | commitdiff | tree

Benjamin Tissoires [Wed, 21 Nov 2018 15:27:07 +0000 (16:27 +0100)]

Revert "HID: logitech: fix a used uninitialized GCC warning"

This reverts commit 5fe2ccbef9d7aecf5c4402c753444f1a12096cfd.

It turns out the current API is not that compatible with
some Microsoft mice, so better start again from scratch.

Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Acked-by: Harry Cutts <hcutts@chromium.org>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>

commit | commitdiff | tree

Benjamin Tissoires [Wed, 21 Nov 2018 15:27:06 +0000 (16:27 +0100)]

Revert "HID: input: simplify/fix high-res scroll event handling"

This reverts commit 044ee890286153a1aefb40cb8b6659921aecb38b.

It turns out the current API is not that compatible with
some Microsoft mice, so better start again from scratch.

Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Acked-by: Harry Cutts <hcutts@chromium.org>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>

commit | commitdiff | tree

Dave Airlie [Thu, 22 Nov 2018 01:19:20 +0000 (11:19 +1000)]

Merge branch 'drm-fixes-4.20' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

- OD fixes for powerplay
- Vega20 fixes
- KFD fix for Kaveri
- add missing firmware declaration for hainan (SI chip)
- Fix DC user experience regressions related to panels that support >8 bpc

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181121163647.2847-1-alexander.deucher@amd.com

commit | commitdiff | tree

Dave Airlie [Thu, 22 Nov 2018 01:17:46 +0000 (11:17 +1000)]

Merge tag 'drm-misc-fixes-2018-11-21' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

- vc4: Fix NULL deref in async path (Boris)
- vc4: Avoid taking async path for cursor updates when impossible (Boris)
- udmabuf: Fix mmap with PROT_WRITE (Gerd)
- fb-helper: Don't use writeback connectors for fbdev (Paul)

Cc: Boris Brezillon <boris.brezillon@bootlin.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Sean Paul <sean@poorly.run>
Link: https://patchwork.freedesktop.org/patch/msgid/20181121155248.GA241511@art_vandelay

commit | commitdiff | tree

Y.C. Chen [Tue, 30 Oct 2018 03:34:46 +0000 (11:34 +0800)]

drm/ast: fixed cursor may disappear sometimes

Signed-off-by: Y.C. Chen <yc_chen@aspeedtech.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

commit | commitdiff | tree

Vincent Chen [Wed, 21 Nov 2018 01:38:11 +0000 (09:38 +0800)]

net: faraday: ftmac100: remove netif_running(netdev) check before disabling interrupts

In the original ftmac100_interrupt(), the interrupts are only disabled when
the condition "netif_running(netdev)" is true. However, this condition
causes kerenl hang in the following case. When the user requests to
disable the network device, kernel will clear the bit __LINK_STATE_START
from the dev->state and then call the driver's ndo_stop function. Network
device interrupts are not blocked during this process. If an interrupt
occurs between clearing __LINK_STATE_START and stopping network device,
kernel cannot disable the interrupts due to the condition
"netif_running(netdev)" in the ISR. Hence, kernel will hang due to the
continuous interruption of the network device.

In order to solve the above problem, the interrupts of the network device
should always be disabled in the ISR without being restricted by the
condition "netif_running(netdev)".

[V2]
Remove unnecessary curly braces.

Signed-off-by: Vincent Chen <vincentc@andestech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Y.C. Chen [Wed, 3 Oct 2018 06:57:47 +0000 (14:57 +0800)]

drm/ast: change resolution may cause screen blurred

The value of pitches is not correct while calling mode_set.
The issue we found so far on following system:
- Debian8 with XFCE Desktop
- Ubuntu with KDE Desktop
- SUSE15 with KDE Desktop

Signed-off-by: Y.C. Chen <yc_chen@aspeedtech.com>
Cc: <stable@vger.kernel.org>
Tested-by: Jean Delvare <jdelvare@suse.de>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>

commit | commitdiff | tree

David S. Miller [Thu, 22 Nov 2018 00:14:56 +0000 (16:14 -0800)]

Merge branch 'smc-fixes'

Ursula Braun says:

====================
net/smc: fixes 2018-11-12

here is V4 of some net/smc fixes in different areas for the net tree.

v1->v2:
   do not define 8-byte alignment for union smcd_cdc_cursor in
   patch 4/5 "net/smc: atomic SMCD cursor handling"
v2->v3:
   stay with 8-byte alignment for union smcd_cdc_cursor in
   patch 4/5 "net/smc: atomic SMCD cursor handling", but get rid of
   __packed for struct smcd_cdc_msg
v3->v4:
   get rid of another __packed for struct smc_cdc_msg in
   patch 4/5 "net/smc: atomic SMCD cursor handling"
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ursula Braun [Tue, 20 Nov 2018 15:46:43 +0000 (16:46 +0100)]

net/smc: use after free fix in smc_wr_tx_put_slot()

In smc_wr_tx_put_slot() field pend->idx is used after being
cleared. That means always idx 0 is cleared in the wr_tx_mask.
This results in a broken administration of available WR send
payload buffers.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ursula Braun [Tue, 20 Nov 2018 15:46:42 +0000 (16:46 +0100)]

net/smc: atomic SMCD cursor handling

Running uperf tests with SMCD on LPARs results in corrupted cursors.
SMCD cursors should be treated atomically to fix cursor corruption.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Hans Wippel [Tue, 20 Nov 2018 15:46:41 +0000 (16:46 +0100)]

net/smc: add SMC-D shutdown signal

When a SMC-D link group is freed, a shutdown signal should be sent to
the peer to indicate that the link group is invalid. This patch adds the
shutdown signal to the SMC code.

Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Karsten Graul [Tue, 20 Nov 2018 15:46:40 +0000 (16:46 +0100)]

net/smc: use queue pair number when matching link group

When searching for an existing link group the queue pair number is also
to be taken into consideration. When the SMC server sends a new number
in a CLC packet (keeping all other values equal) then a new link group
is to be created on the SMC client side.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Hans Wippel [Tue, 20 Nov 2018 15:46:39 +0000 (16:46 +0100)]

net/smc: abort CLC connection in smc_release

In case of a non-blocking SMC socket, the initial CLC handshake is
performed over a blocking TCP connection in a worker. If the SMC socket
is released, smc_release has to wait for the blocking CLC socket
operations (e.g., kernel_connect) inside the worker.

This patch aborts a CLC connection when the respective non-blocking SMC
socket is released to avoid waiting on socket operations or timeouts.

Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Wed, 21 Nov 2018 23:51:16 +0000 (15:51 -0800)]

Merge tag 'wireless-drivers-for-davem-2018-11-20' of git://git./linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 4.20

First set of fixes for 4.20, this time we have quite a few them but
all very small.

ath9k

* fix a locking regression found by a static checker

wlcore

* fix a crash which was a regression with wakeirq handling

brcm80211

* yet another fix for 160 MHz channel handling

mt76

* fix a longstaning build problem when CONFIG_LEDS_CLASS is disabled

* don't use uninitialised mutex

iwlwifi

* do note that the iwlwifi merge tag (commit 4ec321c14693) seems to
contain wrong list of changes so ignore that

* fix ACPI data handling, a memory leak and other smaller fixes

ath10k

* fix a crash during suspend which was a recent regression
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Eric Dumazet [Tue, 20 Nov 2018 13:53:59 +0000 (05:53 -0800)]

tcp: defer SACK compression after DupThresh

Jean-Louis reported a TCP regression and bisected to recent SACK
compression.

After a loss episode (receiver not able to keep up and dropping
packets because its backlog is full), linux TCP stack is sending
a single SACK (DUPACK).

Sender waits a full RTO timer before recovering losses.

While RFC 6675 says in section 5, "Algorithm Details",

   (2) If DupAcks < DupThresh but IsLost (HighACK + 1) returns true --
       indicating at least three segments have arrived above the current
       cumulative acknowledgment point, which is taken to indicate loss
       -- go to step (4).
...
   (4) Invoke fast retransmit and enter loss recovery as follows:

there are old TCP stacks not implementing this strategy, and
still counting the dupacks before starting fast retransmit.

While these stacks probably perform poorly when receivers implement
LRO/GRO, we should be a little more gentle to them.

This patch makes sure we do not enable SACK compression unless
3 dupacks have been sent since last rcv_nxt update.

Ideally we should even rearm the timer to send one or two
more DUPACK if no more packets are coming, but that will
be work aiming for linux-4.21.

Many thanks to Jean-Louis for bisecting the issue, providing
packet captures and testing this patch.

Fixes: 5d9f4262b7ea ("tcp: add SACK compression")
Reported-by: Jean-Louis Dupond <jean-louis@dupond.be>
Tested-by: Jean-Louis Dupond <jean-louis@dupond.be>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Petr Machata [Tue, 20 Nov 2018 11:39:56 +0000 (11:39 +0000)]

net: skb_scrub_packet(): Scrub offload_fwd_mark

When a packet is trapped and the corresponding SKB marked as
already-forwarded, it retains this marking even after it is forwarded
across veth links into another bridge. There, since it ingresses the
bridge over veth, which doesn't have offload_fwd_mark, it triggers a
warning in nbp_switchdev_frame_mark().

Then nbp_switchdev_allowed_egress() decides not to allow egress from
this bridge through another veth, because the SKB is already marked, and
the mark (of 0) of course matches. Thus the packet is incorrectly
blocked.

Solve by resetting offload_fwd_mark() in skb_scrub_packet(). That
function is called from tunnels and also from veth, and thus catches the
cases where traffic is forwarded between bridges and transformed in a
way that invalidates the marking.

Fixes: 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for stacked devices")
Fixes: abf4bb6b63d0 ("skbuff: Add the offload_mr_fwd_mark field")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Suggested-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Linus Torvalds [Wed, 21 Nov 2018 19:28:20 +0000 (11:28 -0800)]

Merge tag 'riscv-for-linus-4.20-rc4' of git://git./linux/kernel/git/palmer/riscv-linux

Pull RISC-V fixes from Palmer Dabbelt:
"This week is a bit bigger than I expected. That's my fault, as I
  missed a few patches while I was at Plumbers last week. We have:

   - A fix to a quite embarassing issue where raw_copy_to_user() was
     implemented with asm_copy_from_user() (and vice versa).

   - Improvements to our makefile to allow flat binaries to be
     generated.

   - A build fix that predeclares "struct module" at the top of
     <asm/module.h>, which triggers warnings later in that header.

   - The addition of our own <uapi/asm/unistd> header, which is
     necessary to align our stat ABI on 32-bit systems.

   - A fix to avoid printing a warning when the S or U bits are set in
     print_isa().

  I already have one patch in the queue for next week"

* tag 'riscv-for-linus-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
  RISC-V: recognize S/U mode bits in print_isa
  riscv: add asm/unistd.h UAPI header
  riscv: fix warning in arch/riscv/include/asm/module.h
  RISC-V: Build flat and compressed kernel images
  RISC-V: Fix raw_copy_{to,from}_user()

commit | commitdiff | tree

Dave Chinner [Wed, 21 Nov 2018 16:06:37 +0000 (08:06 -0800)]

iomap: readpages doesn't zero page tail beyond EOF

When we read the EOF page of the file via readpages, we need
to zero the region beyond EOF that we either do not read or
should not contain data so that mmap does not expose stale data to
user applications.

However, iomap_adjust_read_range() fails to detect EOF correctly,
and so fsx on 1k block size filesystems fails very quickly with
mapreads exposing data beyond EOF. There are two problems here.

Firstly, when calculating the end block of the EOF byte, we have
to round the size by one to avoid a block aligned EOF from reporting
a block too large. i.e. a size of 1024 bytes is 1 block, which in
index terms is block 0. Therefore we have to calculate the end block
from (isize - 1), not isize.

The second bug is determining if the current page spans EOF, and so
whether we need split it into two half, one for the IO, and the
other for zeroing. Unfortunately, the code that checks whether
we should split the block doesn't actually check if we span EOF, it
just checks if the read spans the /offset in the page/ that EOF
sits on. So it splits every read into two if EOF is not page
aligned, regardless of whether we are reading the EOF block or not.

Hence we need to restrict the "does the read span EOF" check to
just the page that spans EOF, not every page we read.

This patch results in correct EOF detection through readpages:

xfs_vm_readpages:     dev 259:0 ino 0x43 nr_pages 24
xfs_iomap_found:      dev 259:0 ino 0x43 size 0x66c00 offset 0x4f000 count 98304 type hole startoff 0x13c startblock 1368 blockcount 0x4
iomap_readpage_actor: orig pos 323584 pos 323584, length 4096, poff 0 plen 4096, isize 420864
xfs_iomap_found:      dev 259:0 ino 0x43 size 0x66c00 offset 0x50000 count 94208 type hole startoff 0x140 startblock 1497 blockcount 0x5c
iomap_readpage_actor: orig pos 327680 pos 327680, length 94208, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 331776 pos 331776, length 90112, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 335872 pos 335872, length 86016, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 339968 pos 339968, length 81920, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 344064 pos 344064, length 77824, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 348160 pos 348160, length 73728, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 352256 pos 352256, length 69632, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 356352 pos 356352, length 65536, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 360448 pos 360448, length 61440, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 364544 pos 364544, length 57344, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 368640 pos 368640, length 53248, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 372736 pos 372736, length 49152, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 376832 pos 376832, length 45056, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 380928 pos 380928, length 40960, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 385024 pos 385024, length 36864, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 389120 pos 389120, length 32768, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 393216 pos 393216, length 28672, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 397312 pos 397312, length 24576, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 401408 pos 401408, length 20480, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 405504 pos 405504, length 16384, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 409600 pos 409600, length 12288, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 413696 pos 413696, length 8192, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 417792 pos 417792, length 4096, poff 0 plen 3072, isize 420864
iomap_readpage_actor: orig pos 420864 pos 420864, length 1024, poff 3072 plen 1024, isize 420864

As you can see, it now does full page reads until the last one which
is split correctly at the block aligned EOF, reading 3072 bytes and
zeroing the last 1024 bytes. The original version of the patch got
this right, but it got another case wrong.

The EOF detection crossing really needs to the the original length
as plen, while it starts at the end of the block, will be shortened
as up-to-date blocks are found on the page. This means "orig_pos +
plen" no longer points to the end of the page, and so will not
correctly detect EOF crossing. Hence we have to use the length
passed in to detect this partial page case:

xfs_filemap_fault:    dev 259:1 ino 0x43  write_fault 0
xfs_vm_readpage:      dev 259:1 ino 0x43 nr_pages 1
xfs_iomap_found:      dev 259:1 ino 0x43 size 0x2cc00 offset 0x2c000 count 4096 type hole startoff 0xb0 startblock 282 blockcount 0x4
iomap_readpage_actor: orig pos 180224 pos 181248, length 4096, poff 1024 plen 2048, isize 183296
xfs_iomap_found:      dev 259:1 ino 0x43 size 0x2cc00 offset 0x2cc00 count 1024 type hole startoff 0xb3 startblock 285 blockcount 0x1
iomap_readpage_actor: orig pos 183296 pos 183296, length 1024, poff 3072 plen 1024, isize 183296

Heere we see a trace where the first block on the EOF page is up to
date, hence poff = 1024 bytes. The offset into the page of EOF is
3072, so the range we want to read is 1024 - 3071, and the range we
want to zero is 3072 - 4095. You can see this is split correctly
now.

This fixes the stale data beyond EOF problem that fsx quickly
uncovers on 1k block size filesystems.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

commit | commitdiff | tree

Dave Chinner [Mon, 19 Nov 2018 21:31:12 +0000 (13:31 -0800)]

vfs: vfs_dedupe_file_range() doesn't return EOPNOTSUPP

It returns EINVAL when the operation is not supported by the
filesystem. Fix it to return EOPNOTSUPP to be consistent with
the man page and clone_file_range().

Clean up the inconsistent error return handling while I'm there.
(I know, lipstick on a pig, but every little bit helps...)

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

commit | commitdiff | tree

Dave Chinner [Mon, 19 Nov 2018 21:31:11 +0000 (13:31 -0800)]

iomap: dio data corruption and spurious errors when pipes fill

When doing direct IO to a pipe for do_splice_direct(), then pipe is
trivial to fill up and overflow as it can only hold 16 pages. At
this point bio_iov_iter_get_pages() then returns -EFAULT, and we
abort the IO submission process. Unfortunately, iomap_dio_rw()
propagates the error back up the stack.

The error is converted from the EFAULT to EAGAIN in
generic_file_splice_read() to tell the splice layers that the pipe
is full. do_splice_direct() completely fails to handle EAGAIN errors
(it aborts on error) and returns EAGAIN to the caller.

copy_file_write() then completely fails to handle EAGAIN as well,
and so returns EAGAIN to userspace, having failed to copy the data
it was asked to.

Avoid this whole steaming pile of fail by having iomap_dio_rw()
silently swallow EFAULT errors and so do short reads.

To make matters worse, iomap_dio_actor() has a stale data exposure
bug bio_iov_iter_get_pages() fails - it does not zero the tail block
that it may have been left uncovered by partial IO. Fix the error
handling case to drop to the sub-block zeroing rather than
immmediately returning the -EFAULT error.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

commit | commitdiff | tree

Dave Chinner [Mon, 19 Nov 2018 21:31:10 +0000 (13:31 -0800)]

iomap: sub-block dio needs to zeroout beyond EOF

If we are doing sub-block dio that extends EOF, we need to zero
the unused tail of the block to initialise the data in it it. If we
do not zero the tail of the block, then an immediate mmap read of
the EOF block will expose stale data beyond EOF to userspace. Found
with fsx running sub-block DIO sizes vs MAPREAD/MAPWRITE operations.

Fix this by detecting if the end of the DIO write is beyond EOF
and zeroing the tail if necessary.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

commit | commitdiff | tree

Dave Chinner [Mon, 19 Nov 2018 21:31:10 +0000 (13:31 -0800)]

iomap: FUA is wrong for DIO O_DSYNC writes into unwritten extents

When we write into an unwritten extent via direct IO, we dirty
metadata on IO completion to convert the unwritten extent to
written. However, when we do the FUA optimisation checks, the inode
may be clean and so we issue a FUA write into the unwritten extent.
This means we then bypass the generic_write_sync() call after
unwritten extent conversion has ben done and we don't force the
modified metadata to stable storage.

This violates O_DSYNC semantics. The window of exposure is a single
IO, as the next DIO write will see the inode has dirty metadata and
hence will not use the FUA optimisation. Calling
generic_write_sync() after completion of the second IO will also
sync the first write and it's metadata.

Fix this by avoiding the FUA optimisation when writing to unwritten
extents.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

commit | commitdiff | tree

Dave Chinner [Tue, 20 Nov 2018 06:50:08 +0000 (22:50 -0800)]

xfs: delalloc -> unwritten COW fork allocation can go wrong

Long saga. There have been days spent following this through dead end
after dead end in multi-GB event traces. This morning, after writing
a trace-cmd wrapper that enabled me to be more selective about XFS
trace points, I discovered that I could get just enough essential
tracepoints enabled that there was a 50:50 chance the fsx config
would fail at ~115k ops. If it didn't fail at op 115547, I stopped
fsx at op 115548 anyway.

That gave me two traces - one where the problem manifested, and one
where it didn't. After refining the traces to have the necessary
information, I found that in the failing case there was a real
extent in the COW fork compared to an unwritten extent in the
working case.

Walking back through the two traces to the point where the CWO fork
extents actually diverged, I found that the bad case had an extra
unwritten extent in it. This is likely because the bug it led me to
had triggered multiple times in those 115k ops, leaving stray
COW extents around. What I saw was a COW delalloc conversion to an
unwritten extent (as they should always be through
xfs_iomap_write_allocate()) resulted in a /written extent/:

xfs_writepage:        dev 259:0 ino 0x83 pgoff 0x17000 size 0x79a00 offset 0 length 0
xfs_iext_remove:      dev 259:0 ino 0x83 state RC|LF|RF|COW cur 0xffff888247b899c0/2 offset 32 block 152 count 20 flag 1 caller xfs_bmap_add_extent_delay_real
xfs_bmap_pre_update:  dev 259:0 ino 0x83 state RC|LF|RF|COW cur 0xffff888247b899c0/1 offset 1 block 4503599627239429 count 31 flag 0 caller xfs_bmap_add_extent_delay_real
xfs_bmap_post_update: dev 259:0 ino 0x83 state RC|LF|RF|COW cur 0xffff888247b899c0/1 offset 1 block 121 count 51 flag 0 caller xfs_bmap_add_ex

Basically, Cow fork before:

0 1            32          52
+H+DDDDDDDDDDDD+UUUUUUUUUUU+
   PREV RIGHT

COW delalloc conversion allocates:

  1        32
  +uuuuuuuuuuuu+
  NEW

And the result according to the xfs_bmap_post_update trace was:

0 1            32          52
+H+wwwwwwwwwwwwwwwwwwwwwwww+
   PREV

Which is clearly wrong - it should be a merged unwritten extent,
not an unwritten extent.

That lead me to look at the LEFT_FILLING|RIGHT_FILLING|RIGHT_CONTIG
case in xfs_bmap_add_extent_delay_real(), and sure enough, there's
the bug.

It takes the old delalloc extent (PREV) and adds the length of the
RIGHT extent to it, takes the start block from NEW, removes the
RIGHT extent and then updates PREV with the new extent.

What it fails to do is update PREV.br_state. For delalloc, this is
always XFS_EXT_NORM, while in this case we are converting the
delayed allocation to unwritten, so it needs to be updated to
XFS_EXT_UNWRITTEN. This LF|RF|RC case does not do this, and so
the resultant extent is always written.

And that's the bug I've been chasing for a week - a bmap btree bug,
not a reflink/dedupe/copy_file_range bug, but a BMBT bug introduced
with the recent in core extent tree scalability enhancements.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

John Crispins staging tree

RSS Atom