Will Deacon [Mon, 10 Dec 2018 18:57:17 +0000 (18:57 +0000)]
Merge branch 'for-next/kexec' into aarch64/for-next/core
Merge in kexec_file_load() support from Akashi Takahiro.
Will Deacon [Mon, 10 Dec 2018 18:53:03 +0000 (18:53 +0000)]
Merge branch 'kvm/cortex-a76-erratum-
1165522' into aarch64/for-next/core
Pull in KVM workaround for A76 erratum #116522.
Conflicts:
arch/arm64/include/asm/cpucaps.h
Suzuki K Poulose [Mon, 10 Dec 2018 18:07:33 +0000 (18:07 +0000)]
arm64: smp: Handle errors reported by the firmware
The __cpu_up() routine ignores the errors reported by the firmware
for a CPU bringup operation and looks for the error status set by the
booting CPU. If the CPU never entered the kernel, we could end up
in assuming stale error status, which otherwise would have been
set/cleared appropriately by the booting CPU.
Reported-by: Steve Capper <steve.capper@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Will Deacon [Mon, 10 Dec 2018 14:21:13 +0000 (14:21 +0000)]
arm64: smp: Rework early feature mismatched detection
Rather than add additional variables to detect specific early feature
mismatches with secondary CPUs, we can instead dedicate the upper bits
of the CPU boot status word to flag specific mismatches.
This allows us to communicate both granule and VA-size mismatches back
to the primary CPU without the need for additional book-keeping.
Tested-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Will Deacon [Mon, 10 Dec 2018 14:15:15 +0000 (14:15 +0000)]
arm64: Kconfig: Re-jig CONFIG options for 52-bit VA
Enabling 52-bit VAs for userspace is pretty confusing, since it requires
you to select "48-bit" virtual addressing in the Kconfig.
Rework the logic so that 52-bit user virtual addressing is advertised in
the "Virtual address space size" choice, along with some help text to
describe its interaction with Pointer Authentication. The EXPERT-only
option to force all user mappings to the 52-bit range is then made
available immediately below the VA size selection.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Steve Capper [Thu, 6 Dec 2018 22:50:42 +0000 (22:50 +0000)]
arm64: mm: Allow forcing all userspace addresses to 52-bit
On arm64 52-bit VAs are provided to userspace when a hint is supplied to
mmap. This helps maintain compatibility with software that expects at
most 48-bit VAs to be returned.
In order to help identify software that has 48-bit VA assumptions, this
patch allows one to compile a kernel where 52-bit VAs are returned by
default on HW that supports it.
This feature is intended to be for development systems only.
Signed-off-by: Steve Capper <steve.capper@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Steve Capper [Thu, 6 Dec 2018 22:50:41 +0000 (22:50 +0000)]
arm64: mm: introduce 52-bit userspace support
On arm64 there is optional support for a 52-bit virtual address space.
To exploit this one has to be running with a 64KB page size and be
running on hardware that supports this.
For an arm64 kernel supporting a 48 bit VA with a 64KB page size,
some changes are needed to support a 52-bit userspace:
* TCR_EL1.T0SZ needs to be 12 instead of 16,
* TASK_SIZE needs to reflect the new size.
This patch implements the above when the support for 52-bit VAs is
detected at early boot time.
On arm64 userspace addresses translation is controlled by TTBR0_EL1. As
well as userspace, TTBR0_EL1 controls:
* The identity mapping,
* EFI runtime code.
It is possible to run a kernel with an identity mapping that has a
larger VA size than userspace (and for this case __cpu_set_tcr_t0sz()
would set TCR_EL1.T0SZ as appropriate). However, when the conditions for
52-bit userspace are met; it is possible to keep TCR_EL1.T0SZ fixed at
12. Thus in this patch, the TCR_EL1.T0SZ size changing logic is
disabled.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Steve Capper [Thu, 6 Dec 2018 22:50:40 +0000 (22:50 +0000)]
arm64: mm: Prevent mismatched 52-bit VA support
For cases where there is a mismatch in ARMv8.2-LVA support between CPUs
we have to be careful in allowing secondary CPUs to boot if 52-bit
virtual addresses have already been enabled on the boot CPU.
This patch adds code to the secondary startup path. If the boot CPU has
enabled 52-bit VAs then ID_AA64MMFR2_EL1 is checked to see if the
secondary can also enable 52-bit support. If not, the secondary is
prevented from booting and an error message is displayed indicating why.
Technically this patch could be implemented using the cpufeature code
when considering 52-bit userspace support. However, we employ low level
checks here as the cpufeature code won't be able to run if we have
mismatched 52-bit kernel va support.
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Steve Capper [Thu, 6 Dec 2018 22:50:39 +0000 (22:50 +0000)]
arm64: mm: Offset TTBR1 to allow 52-bit PTRS_PER_PGD
Enabling 52-bit VAs on arm64 requires that the PGD table expands from 64
entries (for the 48-bit case) to 1024 entries. This quantity,
PTRS_PER_PGD is used as follows to compute which PGD entry corresponds
to a given virtual address, addr:
pgd_index(addr) -> (addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)
Userspace addresses are prefixed by 0's, so for a 48-bit userspace
address, uva, the following is true:
(uva >> PGDIR_SHIFT) & (1024 - 1) == (uva >> PGDIR_SHIFT) & (64 - 1)
In other words, a 48-bit userspace address will have the same pgd_index
when using PTRS_PER_PGD = 64 and 1024.
Kernel addresses are prefixed by 1's so, given a 48-bit kernel address,
kva, we have the following inequality:
(kva >> PGDIR_SHIFT) & (1024 - 1) != (kva >> PGDIR_SHIFT) & (64 - 1)
In other words a 48-bit kernel virtual address will have a different
pgd_index when using PTRS_PER_PGD = 64 and 1024.
If, however, we note that:
kva = 0xFFFF << 48 + lower (where lower[63:48] == 0b)
and, PGDIR_SHIFT = 42 (as we are dealing with 64KB PAGE_SIZE)
We can consider:
(kva >> PGDIR_SHIFT) & (1024 - 1) - (kva >> PGDIR_SHIFT) & (64 - 1)
= (0xFFFF << 6) & 0x3FF - (0xFFFF << 6) & 0x3F // "lower" cancels out
= 0x3C0
In other words, one can switch PTRS_PER_PGD to the 52-bit value globally
provided that they increment ttbr1_el1 by 0x3C0 * 8 = 0x1E00 bytes when
running with 48-bit kernel VAs (TCR_EL1.T1SZ = 16).
For kernel configuration where 52-bit userspace VAs are possible, this
patch offsets ttbr1_el1 and sets PTRS_PER_PGD corresponding to the
52-bit value.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
[will: added comment to TTBR1_BADDR_4852_OFFSET calculation]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Steve Capper [Thu, 6 Dec 2018 22:50:38 +0000 (22:50 +0000)]
arm64: mm: Define arch_get_mmap_end, arch_get_mmap_base
Now that we have DEFAULT_MAP_WINDOW defined, we can arch_get_mmap_end
and arch_get_mmap_base helpers to allow for high addresses in mmap.
Signed-off-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Steve Capper [Thu, 6 Dec 2018 22:50:37 +0000 (22:50 +0000)]
arm64: mm: Introduce DEFAULT_MAP_WINDOW
We wish to introduce a 52-bit virtual address space for userspace but
maintain compatibility with software that assumes the maximum VA space
size is 48 bit.
In order to achieve this, on 52-bit VA systems, we make mmap behave as
if it were running on a 48-bit VA system (unless userspace explicitly
requests a VA where addr[51:48] != 0).
On a system running a 52-bit userspace we need TASK_SIZE to represent
the 52-bit limit as it is used in various places to distinguish between
kernelspace and userspace addresses.
Thus we need a new limit for mmap, stack, ELF loader and EFI (which uses
TTBR0) to represent the non-extended VA space.
This patch introduces DEFAULT_MAP_WINDOW and DEFAULT_MAP_WINDOW_64 and
switches the appropriate logic to use that instead of TASK_SIZE.
Signed-off-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Steve Capper [Thu, 6 Dec 2018 22:50:36 +0000 (22:50 +0000)]
mm: mmap: Allow for "high" userspace addresses
This patch adds support for "high" userspace addresses that are
optionally supported on the system and have to be requested via a hint
mechanism ("high" addr parameter to mmap).
Architectures such as powerpc and x86 achieve this by making changes to
their architectural versions of arch_get_unmapped_* functions. However,
on arm64 we use the generic versions of these functions.
Rather than duplicate the generic arch_get_unmapped_* implementations
for arm64, this patch instead introduces two architectural helper macros
and applies them to arch_get_unmapped_*:
arch_get_mmap_end(addr) - get mmap upper limit depending on addr hint
arch_get_mmap_base(addr, base) - get mmap_base depending on addr hint
If these macros are not defined in architectural code then they default
to (TASK_SIZE) and (base) so should not introduce any behavioural
changes to architectures that do not define them.
Signed-off-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Qian Cai [Fri, 7 Dec 2018 22:34:49 +0000 (17:34 -0500)]
arm64: kasan: Increase stack size for KASAN_EXTRA
If the kernel is configured with KASAN_EXTRA, the stack size is
increased significantly due to setting the GCC -fstack-reuse option to
"none" [1]. As a result, it can trigger a stack overrun quite often with
32k stack size compiled using GCC 8. For example, this reproducer
https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise06.c
can trigger a "corrupted stack end detected inside scheduler" very
reliably with CONFIG_SCHED_STACK_END_CHECK enabled. There are other
reports at:
https://lore.kernel.org/lkml/
1542144497.12945.29.camel@gmx.us/
https://lore.kernel.org/lkml/
721E7B42-2D55-4866-9C1A-
3E8D64F33F9C@gmx.us/
There are just too many functions that could have a large stack with
KASAN_EXTRA due to large local variables that have been called over and
over again without being able to reuse the stacks. Some noticiable ones
are,
size
7536 shrink_inactive_list
7440 shrink_page_list
6560 fscache_stats_show
3920 jbd2_journal_commit_transaction
3216 try_to_unmap_one
3072 migrate_page_move_mapping
3584 migrate_misplaced_transhuge_page
3920 ip_vs_lblcr_schedule
4304 lpfc_nvme_info_show
3888 lpfc_debugfs_nvmestat_data.constprop
There are other 49 functions over 2k in size while compiling kernel with
"-Wframe-larger-than=" on this machine. Hence, it is too much work to
change Makefiles for each object to compile without
-fsanitize-address-use-after-scope individually.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715#c23
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Will Deacon [Mon, 10 Dec 2018 13:39:48 +0000 (13:39 +0000)]
arm64: Fix minor issues with the dcache_by_line_op macro
The dcache_by_line_op macro suffers from a couple of small problems:
First, the GAS directives that are currently being used rely on
assembler behavior that is not documented, and probably not guaranteed
to produce the correct behavior going forward. As a result, we end up
with some undefined symbols in cache.o:
$ nm arch/arm64/mm/cache.o
...
U civac
...
U cvac
U cvap
U cvau
This is due to the fact that the comparisons used to select the
operation type in the dcache_by_line_op macro are comparing symbols
not strings, and even though it seems that GAS is doing the right
thing here (undefined symbols by the same name are equal to each
other), it seems unwise to rely on this.
Second, when patching in a DC CVAP instruction on CPUs that support it,
the fallback path consists of a DC CVAU instruction which may be
affected by CPU errata that require ARM64_WORKAROUND_CLEAN_CACHE.
Solve these issues by unrolling the various maintenance routines and
using the conditional directives that are documented as operating on
strings. To avoid the complexity of nested alternatives, we move the
DC CVAP patching to __clean_dcache_area_pop, falling back to a branch
to __clean_dcache_area_poc if DCPOP is not supported by the CPU.
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Marc Zyngier [Thu, 6 Dec 2018 17:31:26 +0000 (17:31 +0000)]
arm64: Add configuration/documentation for Cortex-A76 erratum
1165522
Now that the infrastructure to handle erratum
1165522 is in place,
let's make it a selectable option and add the required documentation.
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Marc Zyngier [Thu, 6 Dec 2018 17:31:25 +0000 (17:31 +0000)]
arm64: KVM: Handle ARM erratum
1165522 in TLB invalidation
In order to avoid TLB corruption whilst invalidating TLBs on CPUs
affected by erratum
1165522, we need to prevent S1 page tables
from being usable.
For this, we set the EL1 S1 MMU on, and also disable the page table
walker (by setting the TCR_EL1.EPD* bits to 1).
This ensures that once we switch to the EL1/EL0 translation regime,
speculated AT instructions won't be able to parse the page tables.
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Marc Zyngier [Thu, 6 Dec 2018 17:31:24 +0000 (17:31 +0000)]
arm64: KVM: Add synchronization on translation regime change for erratum
1165522
In order to ensure that slipping HCR_EL2.TGE is done at the right
time when switching translation regime, let insert the required ISBs
that will be patched in when erratum
1165522 is detected.
Take this opportunity to add the missing include of asm/alternative.h
which was getting there by pure luck.
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Marc Zyngier [Thu, 6 Dec 2018 17:31:23 +0000 (17:31 +0000)]
arm64: KVM: Force VHE for systems affected by erratum
1165522
In order to easily mitigate ARM erratum
1165522, we need to force
affected CPUs to run in VHE mode if using KVM.
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Marc Zyngier [Thu, 6 Dec 2018 17:31:22 +0000 (17:31 +0000)]
arm64: Add TCR_EPD{0,1} definitions
We are soon going to play with TCR_EL1.EPD{0,1}, so let's add the
relevant definitions.
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Marc Zyngier [Thu, 6 Dec 2018 17:31:21 +0000 (17:31 +0000)]
arm64: KVM: Install stage-2 translation before enabling traps
It is a bit odd that we only install stage-2 translation after having
cleared HCR_EL2.TGE, which means that there is a window during which
AT requests could fail as stage-2 is not configured yet.
Let's move stage-2 configuration before we clear TGE, making the
guest entry sequence clearer: we first configure all the guest stuff,
then only switch to the guest translation regime.
While we're at it, do the same thing for !VHE. It doesn't hurt,
and keeps things symmetric.
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Marc Zyngier [Thu, 6 Dec 2018 17:31:20 +0000 (17:31 +0000)]
KVM: arm64: Rework detection of SVE, !VHE systems
An SVE system is so far the only case where we mandate VHE. As we're
starting to grow this requirements, let's slightly rework the way we
deal with that situation, allowing for easy extension of this check.
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Marc Zyngier [Thu, 6 Dec 2018 17:31:19 +0000 (17:31 +0000)]
arm64: KVM: Make VHE Stage-2 TLB invalidation operations non-interruptible
Contrary to the non-VHE version of the TLB invalidation helpers, the VHE
code has interrupts enabled, meaning that we can take an interrupt in
the middle of such a sequence, and start running something else with
HCR_EL2.TGE cleared.
That's really not a good idea.
Take the heavy-handed option and disable interrupts in
__tlb_switch_to_guest_vhe, restoring them in __tlb_switch_to_host_vhe.
The latter also gain an ISB in order to make sure that TGE really has
taken effect.
Cc: stable@vger.kernel.org
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Mark Rutland [Fri, 7 Dec 2018 18:08:23 +0000 (18:08 +0000)]
arm64: remove arm64ksyms.c
Now that arm64ksyms.c has been reduced to a stub, let's remove it
entirely. New exports should be associated with their function
definition.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Mark Rutland [Fri, 7 Dec 2018 18:08:22 +0000 (18:08 +0000)]
arm64: frace: use asm EXPORT_SYMBOL()
For a while now it's been possible to use EXPORT_SYMBOL() in assembly
files, which allows us to place exports immediately after assembly
functions, as we do for C functions.
As a step towards removing arm64ksyms.c, let's move the ftrace exports
to the assembly files the functions are defined in.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Mark Rutland [Fri, 7 Dec 2018 18:08:21 +0000 (18:08 +0000)]
arm64: string: use asm EXPORT_SYMBOL()
For a while now it's been possible to use EXPORT_SYMBOL() in assembly
files, which allows us to place exports immediately after assembly
functions, as we do for C functions.
As a step towards removing arm64ksyms.c, let's move the string routine
exports to the assembly files the functions are defined in. Routines
which should only be exported for !KASAN builds are exported using the
EXPORT_SYMBOL_NOKASAN() helper.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Mark Rutland [Fri, 7 Dec 2018 18:08:20 +0000 (18:08 +0000)]
arm64: uaccess: use asm EXPORT_SYMBOL()
For a while now it's been possible to use EXPORT_SYMBOL() in assembly
files, which allows us to place exports immediately after assembly
functions, as we do for C functions.
As a step towards removing arm64ksyms.c, let's move the uaccess exports
to the assembly files the functions are defined in. As we have to
include <asm/assembler.h>, the existing includes are fixed to follow the
usual ordering conventions.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Mark Rutland [Fri, 7 Dec 2018 18:08:19 +0000 (18:08 +0000)]
arm64: page: use asm EXPORT_SYMBOL()
For a while now it's been possible to use EXPORT_SYMBOL() in assembly
files, which allows us to place exports immediately after assembly
functions, as we do for C functions.
As a step towards removing arm64ksyms.c, let's move the copy_page and
clear_page exports to the assembly files the functions are defined in.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Mark Rutland [Fri, 7 Dec 2018 18:08:18 +0000 (18:08 +0000)]
arm64: smccc: use asm EXPORT_SYMBOL()
For a while now it's been possible to use EXPORT_SYMBOL() in assembly
files, which allows us to place exports immediately after assembly
functions, as we do for C functions.
As a step towards removing arm64ksyms.c, let's move the SMCCC exports to
the assembly file the functions are defined in.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Mark Rutland [Fri, 7 Dec 2018 18:08:17 +0000 (18:08 +0000)]
arm64: tishift: use asm EXPORT_SYMBOL()
For a while now it's been possible to use EXPORT_SYMBOL() in assembly
files, which allows us to place exports immediately after assembly
functions, as we do for C functions.
As a step towards removing arm64ksyms.c, let's move the tishift exports
to the assembly file the functions are defined in.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Mark Rutland [Fri, 7 Dec 2018 18:08:16 +0000 (18:08 +0000)]
arm64: add EXPORT_SYMBOL_NOKASAN()
So that we can export symbols directly from assembly files, let's make
use of the generic <asm/export.h>. We have a few symbols that we'll want
to conditionally export for !KASAN kernel builds, so we add a helper for
that in <asm/assembler.h>.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Mark Rutland [Fri, 7 Dec 2018 18:08:15 +0000 (18:08 +0000)]
arm64: move memstart_addr export inline
Since we define memstart_addr in a C file, we can have the export
immediately after the definition of the symbol, as we do elsewhere.
As a step towards removing arm64ksyms.c, move the export of
memstart_addr to init.c, where the symbol is defined.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Mark Rutland [Fri, 7 Dec 2018 18:08:14 +0000 (18:08 +0000)]
arm64: remove bitop exports
Now that the arm64 bitops are inlines built atop of the regular atomics,
we don't need to export anything.
Remove the redundant exports.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Will Deacon [Tue, 18 Sep 2018 08:39:55 +0000 (09:39 +0100)]
arm64: cmpxchg: Use "K" instead of "L" for ll/sc immediate constraint
The "L" AArch64 machine constraint, which we use for the "old" value in
an LL/SC cmpxchg(), generates an immediate that is suitable for a 64-bit
logical instruction. However, for cmpxchg() operations on types smaller
than 64 bits, this constraint can result in an invalid instruction which
is correctly rejected by GAS, such as EOR W1, W1, #0xffffffff.
Whilst we could special-case the constraint based on the cmpxchg size,
it's far easier to change the constraint to "K" and put up with using
a register for large 64-bit immediates. For out-of-line LL/SC atomics,
this is all moot anyway.
Reported-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Will Deacon [Thu, 13 Sep 2018 14:56:16 +0000 (15:56 +0100)]
arm64: percpu: Rewrite per-cpu ops to allow use of LSE atomics
Our percpu code is a bit of an inconsistent mess:
* It rolls its own xchg(), but reuses cmpxchg_local()
* It uses various different flavours of preempt_{enable,disable}()
* It returns values even for the non-returning RmW operations
* It makes no use of LSE atomics outside of the cmpxchg() ops
* There are individual macros for different sizes of access, but these
are all funneled through a switch statement rather than dispatched
directly to the relevant case
This patch rewrites the per-cpu operations to address these shortcomings.
Whilst the new code is a lot cleaner, the big advantage is that we can
use the non-returning ST- atomic instructions when we have LSE.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Will Deacon [Thu, 13 Sep 2018 13:28:33 +0000 (14:28 +0100)]
arm64: Avoid masking "old" for LSE cmpxchg() implementation
The CAS instructions implicitly access only the relevant bits of the "old"
argument, so there is no need for explicit masking via type-casting as
there is in the LL/SC implementation.
Move the casting into the LL/SC code and remove it altogether for the LSE
implementation.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Will Deacon [Thu, 13 Sep 2018 12:30:45 +0000 (13:30 +0100)]
arm64: Avoid redundant type conversions in xchg() and cmpxchg()
Our atomic instructions (either LSE atomics of LDXR/STXR sequences)
natively support byte, half-word, word and double-word memory accesses
so there is no need to mask the data register prior to being stored.
Signed-off-by: Will Deacon <will.deacon@arm.com>
James Morse [Fri, 7 Dec 2018 10:14:39 +0000 (10:14 +0000)]
arm64: kexec_file: forbid kdump via kexec_file_load()
Now that kexec_walk_memblock() can do the crash-kernel placement itself
architectures that don't support kdump via kexe_file_load() need to
explicitly forbid it.
We don't support this on arm64 until the kernel can add the elfcorehdr
and usable-memory-range fields to the DT. Without these the crash-kernel
overwrites the previous kernel's memory during startup.
Add a check to refuse crash image loading.
Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Will Deacon [Thu, 20 Sep 2018 09:26:40 +0000 (10:26 +0100)]
arm64: preempt: Provide our own implementation of asm/preempt.h
The asm-generic/preempt.h implementation doesn't make use of the
PREEMPT_NEED_RESCHED flag, since this can interact badly with load/store
architectures which rely on the preempt_count word being unchanged across
an interrupt.
However, since we're a 64-bit architecture and the preempt count is
only 32 bits wide, we can simply pack it next to the resched flag and
load the whole thing in one go, so that a dec-and-test operation doesn't
need to load twice.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Will Deacon [Wed, 19 Sep 2018 12:39:26 +0000 (13:39 +0100)]
preempt: Move PREEMPT_NEED_RESCHED definition into arch code
PREEMPT_NEED_RESCHED is never used directly, so move it into the arch
code where it can potentially be implemented using either a different
bit in the preempt count or as an entirely separate entity.
Cc: Robert Love <rml@tech9.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Allen Pais [Tue, 23 Oct 2018 01:06:57 +0000 (06:36 +0530)]
arm64: hugetlb: Register hugepages during arch init
Add hstate for each supported hugepage size using arch initcall.
* no hugepage parameters
Without hugepage parameters, only a default hugepage size is
available for dynamic allocation. It's different, for example, from
x86_64 and sparc64 where all supported hugepage sizes are available.
* only default_hugepagesz= is specified and set not to HPAGE_SIZE
In spite of the fact that default_hugepagesz= is set to a valid
hugepage size, it's treated as unsupported and reverted to
HPAGE_SIZE. Such behaviour is also different from x86_64 and
sparc64.
Acked-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Tom Saeger <tom.saeger@oracle.com>
Signed-off-by: Dmitry Klochkov <dmitry.klochkov@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Jackie Liu [Tue, 4 Dec 2018 01:43:23 +0000 (09:43 +0800)]
arm64: crypto: add NEON accelerated XOR implementation
This is a NEON acceleration method that can improve
performance by approximately 20%. I got the following
data from the centos 7.5 on Huawei's HISI1616 chip:
[ 93.837726] xor: measuring software checksum speed
[ 93.874039] 8regs : 7123.200 MB/sec
[ 93.914038] 32regs : 7180.300 MB/sec
[ 93.954043] arm64_neon: 9856.000 MB/sec
[ 93.954047] xor: using function: arm64_neon (9856.000 MB/sec)
I believe this code can bring some optimization for
all arm64 platform. thanks for Ard Biesheuvel's suggestions.
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Jackie Liu [Tue, 4 Dec 2018 01:43:22 +0000 (09:43 +0800)]
arm64/neon: add workaround for ambiguous C99 stdint.h types
In a way similar to ARM commit
09096f6a0ee2 ("ARM: 7822/1: add workaround
for ambiguous C99 stdint.h types"), this patch redefines the macros that
are used in stdint.h so its definitions of uint64_t and int64_t are
compatible with those of the kernel.
This patch comes from: https://patchwork.kernel.org/patch/
3540001/
Wrote by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
We mark this file as a private file and don't have to override asm/types.h
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Will Deacon [Tue, 19 Jun 2018 13:08:24 +0000 (14:08 +0100)]
arm64: entry: Remove confusing comment
The comment about SYS_MEMBARRIER_SYNC_CORE relying on ERET being
context-synchronizing is confusing and misplaced with kpti. Given that
this is already documented under Documentation/ (see arch-support.txt
for membarrier), remove the comment altogether.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Will Deacon [Thu, 14 Jun 2018 10:23:38 +0000 (11:23 +0100)]
arm64: entry: Place an SB sequence following an ERET instruction
Some CPUs can speculate past an ERET instruction and potentially perform
speculative accesses to memory before processing the exception return.
Since the register state is often controlled by a lower privilege level
at the point of an ERET, this could potentially be used as part of a
side-channel attack.
This patch emits an SB sequence after each ERET so that speculation is
held up on exception return.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Will Deacon [Thu, 14 Jun 2018 10:21:34 +0000 (11:21 +0100)]
arm64: Add support for SB barrier and patch in over DSB; ISB sequences
We currently use a DSB; ISB sequence to inhibit speculation in set_fs().
Whilst this works for current CPUs, future CPUs may implement a new SB
barrier instruction which acts as an architected speculation barrier.
On CPUs that support it, patch in an SB; NOP sequence over the DSB; ISB
sequence and advertise the presence of the new instruction to userspace.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Will Deacon [Thu, 6 Dec 2018 15:02:45 +0000 (15:02 +0000)]
arm64: kexec_file: Refactor setup_dtb() to consolidate error checking
setup_dtb() is a little difficult to read. This is largely because it
duplicates the FDT -> Linux errno conversion for every intermediate
return value, but also because of silly cosmetic things like naming
and formatting.
Given that this is all brand new, refactor the function to get us off on
the right foot.
Signed-off-by: Will Deacon <will.deacon@arm.com>
AKASHI Takahiro [Thu, 15 Nov 2018 05:52:55 +0000 (14:52 +0900)]
arm64: kexec_file: add kaslr support
Adding "kaslr-seed" to dtb enables triggering kaslr, or kernel virtual
address randomization, at secondary kernel boot. We always do this as
it will have no harm on kaslr-incapable kernel.
We don't have any "switch" to turn off this feature directly, but still
can suppress it by passing "nokaslr" as a kernel boot argument.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
[will: Use rng_is_initialized()]
Signed-off-by: Will Deacon <will.deacon@arm.com>
AKASHI Takahiro [Thu, 15 Nov 2018 05:52:54 +0000 (14:52 +0900)]
arm64: kexec_file: add kernel signature verification support
With this patch, kernel verification can be done without IMA security
subsystem enabled. Turn on CONFIG_KEXEC_VERIFY_SIG instead.
On x86, a signature is embedded into a PE file (Microsoft's format) header
of binary. Since arm64's "Image" can also be seen as a PE file as far as
CONFIG_EFI is enabled, we adopt this format for kernel signing.
You can create a signed kernel image with:
$ sbsign --key ${KEY} --cert ${CERT} Image
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
[will: removed useless pr_debug()]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Suzuki K Poulose [Fri, 30 Nov 2018 17:18:06 +0000 (17:18 +0000)]
arm64: capabilities: Batch cpu_enable callbacks
We use a stop_machine call for each available capability to
enable it on all the CPUs available at boot time. Instead
we could batch the cpu_enable callbacks to a single stop_machine()
call to save us some time.
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Suzuki K Poulose [Fri, 30 Nov 2018 17:18:05 +0000 (17:18 +0000)]
arm64: capabilities: Use linear array for detection and verification
Use the sorted list of capability entries for the detection and
verification.
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Suzuki K Poulose [Fri, 30 Nov 2018 17:18:04 +0000 (17:18 +0000)]
arm64: capabilities: Optimize this_cpu_has_cap
Make use of the sorted capability list to access the capability
entry in this_cpu_has_cap() to avoid iterating over the two
tables.
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Suzuki K Poulose [Fri, 30 Nov 2018 17:18:03 +0000 (17:18 +0000)]
arm64: capabilities: Speed up capability lookup
We maintain two separate tables of capabilities, errata and features,
which decide the system capabilities. We iterate over each of these
tables for various operations (e.g, detection, verification etc.).
We do not have a way to map a system "capability" to its entry,
(i.e, cap -> struct arm64_cpu_capabilities) which is needed for
this_cpu_has_cap(). So we iterate over the table one by one to
find the entry and then do the operation. Also, this prevents
us from optimizing the way we "enable" the capabilities on the
CPUs, where we now issue a stop_machine() for each available
capability.
One solution is to merge the two tables into a single table,
sorted by the capability. But this is has the following
disadvantages:
- We loose the "classification" of an errata vs. feature
- It is quite easy to make a mistake when adding an entry,
unless we sort the table at runtime.
So we maintain a list of pointers to the capability entry, sorted
by the "cap number" in a separate array, initialized at boot time.
The only restriction is that we can have one "entry" per capability.
While at it, remove the duplicate declaration of arm64_errata table.
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
AKASHI Takahiro [Thu, 15 Nov 2018 05:52:53 +0000 (14:52 +0900)]
include: pe.h: remove message[] from mz header definition
message[] field won't be part of the definition of mz header.
This change is crucial for enabling kexec_file_load on arm64 because
arm64's "Image" binary, as in PE format, doesn't have any data for it and
accordingly the following check in pefile_parse_binary() will fail:
chkaddr(cursor, mz->peaddr, sizeof(*pe));
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
AKASHI Takahiro [Thu, 15 Nov 2018 05:52:52 +0000 (14:52 +0900)]
arm64: kexec_file: invoke the kernel without purgatory
On arm64, purgatory would do almost nothing. So just invoke secondary
kernel directly by jumping into its entry code.
While, in this case, cpu_soft_restart() must be called with dtb address
in the fifth argument, the behavior still stays compatible with kexec_load
case as long as the argument is null.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: James Morse <james.morse@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
AKASHI Takahiro [Thu, 15 Nov 2018 05:52:50 +0000 (14:52 +0900)]
arm64: kexec_file: allow for loading Image-format kernel
This patch provides kexec_file_ops for "Image"-format kernel. In this
implementation, a binary is always loaded with a fixed offset identified
in text_offset field of its header.
Regarding signature verification for trusted boot, this patch doesn't
contains CONFIG_KEXEC_VERIFY_SIG support, which is to be added later
in this series, but file-attribute-based verification is still a viable
option by enabling IMA security subsystem.
You can sign(label) a to-be-kexec'ed kernel image on target file system
with:
$ evmctl ima_sign --key /path/to/private_key.pem Image
On live system, you must have IMA enforced with, at least, the following
security policy:
"appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig"
See more details about IMA here:
https://sourceforge.net/p/linux-ima/wiki/Home/
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
AKASHI Takahiro [Thu, 15 Nov 2018 05:52:49 +0000 (14:52 +0900)]
arm64: kexec_file: load initrd and device-tree
load_other_segments() is expected to allocate and place all the necessary
memory segments other than kernel, including initrd and device-tree
blob (and elf core header for crash).
While most of the code was borrowed from kexec-tools' counterpart,
users may not be allowed to specify dtb explicitly, instead, the dtb
presented by the original boot loader is reused.
arch_kimage_kernel_post_load_cleanup() is responsible for freeing arm64-
specific data allocated in load_other_segments().
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
AKASHI Takahiro [Thu, 15 Nov 2018 05:52:48 +0000 (14:52 +0900)]
arm64: enable KEXEC_FILE config
Modify arm64/Kconfig to enable kexec_file_load support.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
AKASHI Takahiro [Thu, 15 Nov 2018 05:52:47 +0000 (14:52 +0900)]
arm64: cpufeature: add MMFR0 helper functions
Those helper functions for MMFR0 register will be used later by kexec_file
loader.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
AKASHI Takahiro [Thu, 15 Nov 2018 05:52:46 +0000 (14:52 +0900)]
arm64: add image head flag definitions
Those image head's flags will be used later by kexec_file loader.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
AKASHI Takahiro [Thu, 15 Nov 2018 05:52:44 +0000 (14:52 +0900)]
kexec_file: kexec_walk_memblock() only walks a dedicated region at kdump
In kdump case, there exists only one dedicated memblock region as usable
memory (crashk_res). With this patch, kexec_walk_memblock() runs a given
callback function on this region.
Cosmetic change: 0 to MEMBLOCK_NONE at for_each_free_mem_range*()
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
AKASHI Takahiro [Thu, 15 Nov 2018 05:52:43 +0000 (14:52 +0900)]
powerpc, kexec_file: factor out memblock-based arch_kexec_walk_mem()
Memblock list is another source for usable system memory layout.
So move powerpc's arch_kexec_walk_mem() to common code so that other
memblock-based architectures, particularly arm64, can also utilise it.
A moved function is now renamed to kexec_walk_memblock() and integrated
into kexec_locate_mem_hole(), which will now be usable for all
architectures with no need for overriding arch_kexec_walk_mem().
With this change, arch_kexec_walk_mem() need no longer be a weak function,
and was now renamed to kexec_walk_resources().
Since powerpc doesn't support kdump in its kexec_file_load(), the current
kexec_walk_memblock() won't work for kdump either in this form, this will
be fixed in the next patch.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Acked-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
AKASHI Takahiro [Thu, 15 Nov 2018 05:52:42 +0000 (14:52 +0900)]
s390, kexec_file: drop arch_kexec_mem_walk()
Since s390 already knows where to locate buffers, calling
arch_kexec_mem_walk() has no sense. So we can just drop it as kbuf->mem
indicates this while all other architectures sets it to 0 initially.
This change is a preparatory work for the next patch, where all the
variant memory walks, either on system resource or memblock, will be
put in one common place so that it will satisfy all the architectures'
need.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Philipp Rudo <prudo@linux.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
AKASHI Takahiro [Thu, 15 Nov 2018 05:52:41 +0000 (14:52 +0900)]
kexec_file: make kexec_image_post_load_cleanup_default() global
Change this function from static to global so that arm64 can implement
its own arch_kimage_file_post_load_cleanup() later using
kexec_image_post_load_cleanup_default().
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
AKASHI Takahiro [Thu, 15 Nov 2018 05:52:40 +0000 (14:52 +0900)]
asm-generic: add kexec_file_load system call to unistd.h
The initial user of this system call number is arm64.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Suzuki K Poulose [Fri, 30 Nov 2018 17:18:02 +0000 (17:18 +0000)]
arm64: capabilities: Merge duplicate entries for Qualcomm erratum 1003
Remove duplicate entries for Qualcomm erratum 1003. Since the entries
are not purely based on generic MIDR checks, use the multi_cap_entry
type to merge the entries.
Cc: Christopher Covington <cov@codeaurora.org>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Suzuki K Poulose [Fri, 30 Nov 2018 17:18:01 +0000 (17:18 +0000)]
arm64: capabilities: Merge duplicate Cavium erratum entries
Merge duplicate entries for a single capability using the midr
range list for Cavium errata 30115 and 27456.
Cc: Andrew Pinski <apinski@cavium.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Suzuki K Poulose [Fri, 30 Nov 2018 17:18:00 +0000 (17:18 +0000)]
arm64: capabilities: Merge entries for ARM64_WORKAROUND_CLEAN_CACHE
We have two entries for ARM64_WORKAROUND_CLEAN_CACHE capability :
1) ARM Errata 826319, 827319, 824069, 819472 on A53 r0p[012]
2) ARM Errata 819472 on A53 r0p[01]
Both have the same work around. Merge these entries to avoid
duplicate entries for a single capability. Add a new Kconfig
entry to control the "capability" entry to make it easier
to handle combinations of the CONFIGs.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Ard Biesheuvel [Mon, 3 Dec 2018 19:58:05 +0000 (20:58 +0100)]
arm64: relocatable: fix inconsistencies in linker script and options
readelf complains about the section layout of vmlinux when building
with CONFIG_RELOCATABLE=y (for KASLR):
readelf: Warning: [21]: Link field (0) should index a symtab section.
readelf: Warning: [21]: Info field (0) should index a relocatable section.
Also, it seems that our use of '-pie -shared' is contradictory, and
thus ambiguous. In general, the way KASLR is wired up at the moment
is highly tailored to how ld.bfd happens to implement (and conflate)
PIE executables and shared libraries, so given the current effort to
support other toolchains, let's fix some of these issues as well.
- Drop the -pie linker argument and just leave -shared. In ld.bfd,
the differences between them are unclear (except for the ELF type
of the produced image [0]) but lld chokes on seeing both at the
same time.
- Rename the .rela output section to .rela.dyn, as is customary for
shared libraries and PIE executables, so that it is not misidentified
by readelf as a static relocation section (producing the warnings
above).
- Pass the -z notext and -z norelro options to explicitly instruct the
linker to permit text relocations, and to omit the RELRO program
header (which requires a certain section layout that we don't adhere
to in the kernel). These are the defaults for current versions of
ld.bfd.
- Discard .eh_frame and .gnu.hash sections to avoid them from being
emitted between .head.text and .text, screwing up the section layout.
These changes only affect the ELF image, and produce the same binary
image.
[0]
b9dce7f1ba01 ("arm64: kernel: force ET_DYN ELF type for ...")
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Smith <peter.smith@linaro.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Ard Biesheuvel [Tue, 27 Nov 2018 17:42:55 +0000 (18:42 +0100)]
arm64/lib: improve CRC32 performance for deep pipelines
Improve the performance of the crc32() asm routines by getting rid of
most of the branches and small sized loads on the common path.
Instead, use a branchless code path involving overlapping 16 byte
loads to process the first (length % 32) bytes, and process the
remainder using a loop that processes 32 bytes at a time.
Tested using the following test program:
#include <stdlib.h>
extern void crc32_le(unsigned short, char const*, int);
int main(void)
{
static const char buf[4096];
srand(
20181126);
for (int i = 0; i < 100 * 1000 * 1000; i++)
crc32_le(0, buf, rand() % 1024);
return 0;
}
On Cortex-A53 and Cortex-A57, the performance regresses but only very
slightly. On Cortex-A72 however, the performance improves from
$ time ./crc32
real 0m10.149s
user 0m10.149s
sys 0m0.000s
to
$ time ./crc32
real 0m7.915s
user 0m7.915s
sys 0m0.000s
Cc: Rui Sun <sunrui26@huawei.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Mark Rutland [Thu, 15 Nov 2018 22:42:03 +0000 (22:42 +0000)]
arm64: ftrace: always pass instrumented pc in x0
The core ftrace hooks take the instrumented PC in x0, but for some
reason arm64's prepare_ftrace_return() takes this in x1.
For consistency, let's flip the argument order and always pass the
instrumented PC in x0.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Torsten Duwe <duwe@suse.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Mark Rutland [Thu, 15 Nov 2018 22:42:02 +0000 (22:42 +0000)]
arm64: ftrace: remove return_regs macros
The save_return_regs and restore_return_regs macros are only used by
return_to_handler, and having them defined out-of-line only serves to
obscure the logic.
Before we complicate, let's clean this up and fold the logic directly
into return_to_handler, saving a few lines of macro boilerplate in the
process. At the same time, a missing trailing space is added to the
comments, fixing a code style violation.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Torsten Duwe <duwe@suse.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Mark Rutland [Thu, 15 Nov 2018 22:42:01 +0000 (22:42 +0000)]
arm64: ftrace: don't adjust the LR value
The core ftrace code requires that when it is handed the PC of an
instrumented function, this PC is the address of the instrumented
instruction. This is necessary so that the core ftrace code can identify
the specific instrumentation site. Since the instrumented function will
be a BL, the address of the instrumented function is LR - 4 at entry to
the ftrace code.
This fixup is applied in the mcount_get_pc and mcount_get_pc0 helpers,
which acquire the PC of the instrumented function.
The mcount_get_lr helper is used to acquire the LR of the instrumented
function, whose value does not require this adjustment, and cannot be
adjusted to anything meaningful. No adjustment of this value is made on
other architectures, including arm. However, arm64 adjusts this value by
4.
This patch brings arm64 in line with other architectures and removes the
adjustment of the LR value.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Torsten Duwe <duwe@suse.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Mark Rutland [Thu, 15 Nov 2018 22:42:00 +0000 (22:42 +0000)]
arm64: ftrace: enable graph FP test
The core frace code has an optional sanity check on the frame pointer
passed by ftrace_graph_caller and return_to_handler. This is cheap,
useful, and enabled unconditionally on x86, sparc, and riscv.
Let's do the same on arm64, so that we can catch any problems early.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Torsten Duwe <duwe@suse.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Mark Rutland [Thu, 15 Nov 2018 22:41:59 +0000 (22:41 +0000)]
arm64: ftrace: use GLOBAL()
The global exports of ftrace_call and ftrace_graph_call are somewhat
painful to read. Let's use the generic GLOBAL() macro to ameliorate
matters.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Torsten Duwe <duwe@suse.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Mark Rutland [Thu, 15 Nov 2018 22:41:58 +0000 (22:41 +0000)]
linkage: add generic GLOBAL() macro
Declaring a global symbol in assembly is tedious, error-prone, and
painful to read. While ENTRY() exists, this is supposed to be used for
function entry points, and this affects alignment in a potentially
undesireable manner.
Instead, let's add a generic GLOBAL() macro for this, as x86 added
locally in commit:
95695547a7db44b8 ("x86: asm linkage - introduce GLOBAL macro")
... thus allowing us to use this more freely in the kernel.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Torsten Duwe <duwe@suse.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Ard Biesheuvel [Fri, 30 Nov 2018 11:35:58 +0000 (12:35 +0100)]
arm64: drop linker script hack to hide __efistub_ symbols
Commit
1212f7a16af4 ("scripts/kallsyms: filter arm64's __efistub_
symbols") updated the kallsyms code to filter out symbols with
the __efistub_ prefix explicitly, so we no longer require the
hack in our linker script to emit them as absolute symbols.
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Will Deacon [Thu, 29 Nov 2018 16:31:04 +0000 (16:31 +0000)]
arm64: io: Ensure value passed to __iormb() is held in a 64-bit register
As of commit
6460d3201471 ("arm64: io: Ensure calls to delay routines
are ordered against prior readX()"), MMIO reads smaller than 64 bits
fail to compile under clang because we end up mixing 32-bit and 64-bit
register operands for the same data processing instruction:
./include/asm-generic/io.h:695:9: warning: value size does not match register size specified by the constraint and modifier [-Wasm-operand-widths]
return readb(addr);
^
./arch/arm64/include/asm/io.h:147:58: note: expanded from macro 'readb'
^
./include/asm-generic/io.h:695:9: note: use constraint modifier "w"
./arch/arm64/include/asm/io.h:147:50: note: expanded from macro 'readb'
^
./arch/arm64/include/asm/io.h:118:24: note: expanded from macro '__iormb'
asm volatile("eor %0, %1, %1\n" \
^
Fix the build by casting the macro argument to 'unsigned long' when used
as an input to the inline asm.
Reported-by: Nick Desaulniers <nick.desaulniers@gmail.com>
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Will Deacon [Mon, 19 Nov 2018 18:08:49 +0000 (18:08 +0000)]
arm64: tlbi: Set MAX_TLBI_OPS to PTRS_PER_PTE
In order to reduce the possibility of soft lock-ups, we bound the
maximum number of TLBI operations performed by a single call to
flush_tlb_range() to an arbitrary constant of 1024.
Whilst this does the job of avoiding lock-ups, we can actually be a bit
smarter by defining this as PTRS_PER_PTE. Due to the structure of our
page tables, using PTRS_PER_PTE means that an outer loop calling
flush_tlb_range() for entire table entries will end up performing just a
single TLBI operation for each entry. As an example, mremap()ing a 1GB
range mapped using 4k pages now requires only 512 TLBI operations when
moving the page tables as opposed to 262144 operations (512*512) when
using the current threshold of 1024.
Cc: Joel Fernandes <joel@joelfernandes.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Ard Biesheuvel [Thu, 22 Nov 2018 08:46:46 +0000 (09:46 +0100)]
arm64/module: switch to ADRP/ADD sequences for PLT entries
Now that we have switched to the small code model entirely, and
reduced the extended KASLR range to 4 GB, we can be sure that the
targets of relative branches that are out of range are in range
for a ADRP/ADD pair, which is one instruction shorter than our
current MOVN/MOVK/MOVK sequence, and is more idiomatic and so it
is more likely to be implemented efficiently by micro-architectures.
So switch over the ordinary PLT code and the special handling of
the Cortex-A53 ADRP errata, as well as the ftrace trampline
handling.
Reviewed-by: Torsten Duwe <duwe@lst.de>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: Added a couple of comments in the plt equality check]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Ard Biesheuvel [Thu, 22 Nov 2018 08:46:45 +0000 (09:46 +0100)]
arm64/insn: add support for emitting ADR/ADRP instructions
Add support for emitting ADR and ADRP instructions so we can switch
over our PLT generation code in a subsequent patch.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
James Morse [Tue, 27 Nov 2018 15:35:21 +0000 (15:35 +0000)]
arm64: Use a raw spinlock in __install_bp_hardening_cb()
__install_bp_hardening_cb() is called via stop_machine() as part
of the cpu_enable callback. To force each CPU to take its turn
when allocating slots, they take a spinlock.
With the RT patches applied, the spinlock becomes a mutex,
and we get warnings about sleeping while in stop_machine():
| [ 0.319176] CPU features: detected: RAS Extension Support
| [ 0.319950] BUG: scheduling while atomic: migration/3/36/0x00000002
| [ 0.319955] Modules linked in:
| [ 0.319958] Preemption disabled at:
| [ 0.319969] [<
ffff000008181ae4>] cpu_stopper_thread+0x7c/0x108
| [ 0.319973] CPU: 3 PID: 36 Comm: migration/3 Not tainted
4.19.1-rt3-00250-g330fc2c2a880 #2
| [ 0.319975] Hardware name: linux,dummy-virt (DT)
| [ 0.319976] Call trace:
| [ 0.319981] dump_backtrace+0x0/0x148
| [ 0.319983] show_stack+0x14/0x20
| [ 0.319987] dump_stack+0x80/0xa4
| [ 0.319989] __schedule_bug+0x94/0xb0
| [ 0.319991] __schedule+0x510/0x560
| [ 0.319992] schedule+0x38/0xe8
| [ 0.319994] rt_spin_lock_slowlock_locked+0xf0/0x278
| [ 0.319996] rt_spin_lock_slowlock+0x5c/0x90
| [ 0.319998] rt_spin_lock+0x54/0x58
| [ 0.320000] enable_smccc_arch_workaround_1+0xdc/0x260
| [ 0.320001] __enable_cpu_capability+0x10/0x20
| [ 0.320003] multi_cpu_stop+0x84/0x108
| [ 0.320004] cpu_stopper_thread+0x84/0x108
| [ 0.320008] smpboot_thread_fn+0x1e8/0x2b0
| [ 0.320009] kthread+0x124/0x128
| [ 0.320010] ret_from_fork+0x10/0x18
Switch this to a raw spinlock, as we know this is only called with
IRQs masked.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Jeremy Linton [Tue, 27 Nov 2018 17:59:12 +0000 (17:59 +0000)]
arm64: acpi: Prepare for longer MADTs
The BAD_MADT_GICC_ENTRY check is a little too strict because
it rejects MADT entries that don't match the currently known
lengths. We should remove this restriction to avoid problems
if the table length changes. Future code which might depend on
additional fields should be written to validate those fields
before using them, rather than trying to globally check
known MADT version lengths.
Link: https://lkml.kernel.org/r/20181012192937.3819951-1-jeremy.linton@arm.com
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
[lorenzo.pieralisi@arm.com: added MADT macro comments]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Al Stone <ahs3@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Will Deacon [Wed, 7 Nov 2018 23:06:15 +0000 (23:06 +0000)]
arm64: io: Ensure calls to delay routines are ordered against prior readX()
A relatively standard idiom for ensuring that a pair of MMIO writes to a
device arrive at that device with a specified minimum delay between them
is as follows:
writel_relaxed(42, dev_base + CTL1);
readl(dev_base + CTL1);
udelay(10);
writel_relaxed(42, dev_base + CTL2);
the intention being that the read-back from the device will push the
prior write to CTL1, and the udelay will hold up the write to CTL1 until
at least 10us have elapsed.
Unfortunately, on arm64 where the underlying delay loop is implemented
as a read of the architected counter, the CPU does not guarantee
ordering from the readl() to the delay loop and therefore the delay loop
could in theory be speculated and not provide the desired interval
between the two writes.
Fix this in a similar manner to PowerPC by introducing a dummy control
dependency on the output of readX() which, combined with the ISB in the
read of the architected counter, guarantees that a subsequent delay loop
can not be executed until the readX() has returned its result.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Alex Van Brunt [Mon, 29 Oct 2018 09:25:58 +0000 (14:55 +0530)]
arm64: mm: Don't wait for completion of TLB invalidation when page aging
When transitioning a PTE from young to old as part of page aging, we
can avoid waiting for the TLB invalidation to complete and therefore
drop the subsequent DSB instruction. Whilst this opens up a race with
page reclaim, where a PTE in active use via a stale, young TLB entry
does not update the underlying descriptor, the worst thing that happens
is that the page is reclaimed and then immediately faulted back in.
Given that we have a DSB in our context-switch path, the window for a
spurious reclaim is fairly limited and eliding the barrier claims to
boost NVMe/SSD accesses by over 10% on some platforms.
A similar optimisation was made for x86 in commit
b13b1d2d8692 ("x86/mm:
In the PTE swapout page reclaim case clear the accessed bit instead of
flushing the TLB").
Signed-off-by: Alex Van Brunt <avanbrunt@nvidia.com>
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
[will: rewrote patch]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Jessica Yu [Mon, 5 Nov 2018 18:53:23 +0000 (19:53 +0100)]
arm64/module: use plt section indices for relocations
Instead of saving a pointer to the .plt and .init.plt sections to apply
plt-based relocations, save and use their section indices instead.
The mod->arch.{core,init}.plt pointers were problematic for livepatch
because they pointed within temporary section headers (provided by the
module loader via info->sechdrs) that would be freed after module load.
Since livepatch modules may need to apply relocations post-module-load
(for example, to patch a module that is loaded later), using section
indices to offset into the section headers (instead of accessing them
through a saved pointer) allows livepatch modules on arm64 to pass in
their own copy of the section headers to apply_relocate_add() to apply
delayed relocations.
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Ard Biesheuvel [Wed, 7 Nov 2018 10:36:20 +0000 (11:36 +0100)]
arm64: mm: apply r/o permissions of VM areas to its linear alias as well
On arm64, we use block mappings and contiguous hints to map the linear
region, to minimize the TLB footprint. However, this means that the
entire region is mapped using read/write permissions, which we cannot
modify at page granularity without having to take intrusive measures to
prevent TLB conflicts.
This means the linear aliases of pages belonging to read-only mappings
(executable or otherwise) in the vmalloc region are also mapped read/write,
and could potentially be abused to modify things like module code, bpf JIT
code or other read-only data.
So let's fix this, by extending the set_memory_ro/rw routines to take
the linear alias into account. The consequence of enabling this is
that we can no longer use block mappings or contiguous hints, so in
cases where the TLB footprint of the linear region is a bottleneck,
performance may be affected.
Therefore, allow this feature to be runtime en/disabled, by setting
rodata=full (or 'on' to disable just this enhancement, or 'off' to
disable read-only mappings for code and r/o data entirely) on the
kernel command line. Also, allow the default value to be set via a
Kconfig option.
Tested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Ard Biesheuvel [Wed, 7 Nov 2018 10:36:19 +0000 (11:36 +0100)]
arm64: mm: purge lazily unmapped vm regions before changing permissions
Call vm_unmap_aliases() every time we apply any changes to permission
attributes of mappings in the vmalloc region. This avoids any potential
issues resulting from lingering writable or executable aliases of
mappings that should be read-only or non-executable, respectively.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Linus Torvalds [Sun, 18 Nov 2018 21:33:44 +0000 (13:33 -0800)]
Linux 4.20-rc3
Linus Torvalds [Sun, 18 Nov 2018 20:21:09 +0000 (12:21 -0800)]
Merge tag 'libnvdimm-fixes-4.20-rc3' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"A small batch of fixes for v4.20-rc3.
The overflow continuation fix addresses something that has been broken
for several releases. Arguably it could wait even longer, but it's a
one line fix and this finishes the last of the known address range
scrub bug reports. The revert addresses a lockdep regression. The unit
tests are not critical to fix, but no reason to hold this fix back.
Summary:
- Address Range Scrub overflow continuation handling has been broken
since it was initially merged. It was only recently that error
injection and platform-BIOS support enabled this corner case to be
exercised.
- The recent attempt to provide more isolation for the kernel Address
Range Scrub state machine from userapace initiated sessions
triggers a lockdep report. Revert and try again at the next merge
window.
- Fix a kasan reported buffer overflow in libnvdimm unit test
infrastrucutre (nfit_test)"
* tag 'libnvdimm-fixes-4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
Revert "acpi, nfit: Further restrict userspace ARS start requests"
acpi, nfit: Fix ARS overflow continuation
tools/testing/nvdimm: Fix the array size for dimm devices.
Linus Torvalds [Sun, 18 Nov 2018 19:31:26 +0000 (11:31 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"16 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm/memblock.c: fix a typo in __next_mem_pfn_range() comments
mm, page_alloc: check for max order in hot path
scripts/spdxcheck.py: make python3 compliant
tmpfs: make lseek(SEEK_DATA/SEK_HOLE) return ENXIO with a negative offset
lib/ubsan.c: don't mark __ubsan_handle_builtin_unreachable as noreturn
mm/vmstat.c: fix NUMA statistics updates
mm/gup.c: fix follow_page_mask() kerneldoc comment
ocfs2: free up write context when direct IO failed
scripts/faddr2line: fix location of start_kernel in comment
mm: don't reclaim inodes with many attached pages
mm, memory_hotplug: check zone_movable in has_unmovable_pages
mm/swapfile.c: use kvzalloc for swap_info_struct allocation
MAINTAINERS: update OMAP MMC entry
hugetlbfs: fix kernel BUG at fs/hugetlbfs/inode.c:444!
kernel/sched/psi.c: simplify cgroup_move_task()
z3fold: fix possible reclaim races
Linus Torvalds [Sun, 18 Nov 2018 18:58:20 +0000 (10:58 -0800)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"Fix an exec() related scalability/performance regression, which was
caused by incorrectly calculating load and migrating tasks on exec()
when they shouldn't be"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix cpu_util_wake() for 'execl' type workloads
Linus Torvalds [Sun, 18 Nov 2018 18:54:59 +0000 (10:54 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Fix uncore PMU enumeration for CofeeLake CPUs"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Support CoffeeLake 8th CBOX
perf/x86/intel/uncore: Add more IMC PCI IDs for KabyLake and CoffeeLake CPUs
Linus Torvalds [Sun, 18 Nov 2018 18:52:26 +0000 (10:52 -0800)]
Merge branch 'efi-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
"Misc fixes: two warning splat fixes, a leak fix and persistent memory
allocation fixes for ARM"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: Permit calling efi_mem_reserve_persistent() from atomic context
efi/arm: Defer persistent reservations until after paging_init()
efi/arm/libstub: Pack FDT after populating it
efi/arm: Revert deferred unmap of early memmap mapping
efi: Fix debugobjects warning on 'efi_rts_work'
Linus Torvalds [Sun, 18 Nov 2018 18:45:09 +0000 (10:45 -0800)]
Merge branch 'spectre' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM spectre updates from Russell King:
"These are the currently known final bits that resolve the Spectre
issues. big.Little systems used to be sufficiently identical in that
there were no differences between individual CPUs in the system that
mattered to the kernel. With the advent of the Spectre problem, the
CPUs now have differences in how the workaround is applied.
As a result of previous Spectre patches, these systems ended up
reporting quite a lot of:
"CPUx: Spectre v2: incorrect context switching function, system vulnerable"
messages due to the action of the big.Little switcher causing the CPUs
to be re-initialised regularly. This series resolves that issue by
making the CPU vtable unique to each CPU.
However, since this is used very early, before per-cpu is setup,
per-cpu can't be used. We also have a problem that two of the methods
are not called from preempt-safe paths, but thankfully these remain
identical between all CPUs in the system. To make sure, we validate
that these are identical during boot"
* 'spectre' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: spectre-v2: per-CPU vtables to work around big.Little systems
ARM: add PROC_VTABLE and PROC_TABLE macros
ARM: clean up per-processor check_bugs method call
ARM: split out processor lookup
ARM: make lookup_processor_type() non-__init
Chen Chang [Fri, 16 Nov 2018 23:08:57 +0000 (15:08 -0800)]
mm/memblock.c: fix a typo in __next_mem_pfn_range() comments
Link: http://lkml.kernel.org/r/20181107100247.13359-1-rainccrun@gmail.com
Signed-off-by: Chen Chang <rainccrun@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Fri, 16 Nov 2018 23:08:53 +0000 (15:08 -0800)]
mm, page_alloc: check for max order in hot path
Konstantin has noticed that kvmalloc might trigger the following
warning:
WARNING: CPU: 0 PID: 6676 at mm/vmstat.c:986 __fragmentation_index+0x54/0x60
[...]
Call Trace:
fragmentation_index+0x76/0x90
compaction_suitable+0x4f/0xf0
shrink_node+0x295/0x310
node_reclaim+0x205/0x250
get_page_from_freelist+0x649/0xad0
__alloc_pages_nodemask+0x12a/0x2a0
kmalloc_large_node+0x47/0x90
__kmalloc_node+0x22b/0x2e0
kvmalloc_node+0x3e/0x70
xt_alloc_table_info+0x3a/0x80 [x_tables]
do_ip6t_set_ctl+0xcd/0x1c0 [ip6_tables]
nf_setsockopt+0x44/0x60
SyS_setsockopt+0x6f/0xc0
do_syscall_64+0x67/0x120
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
the problem is that we only check for an out of bound order in the slow
path and the node reclaim might happen from the fast path already. This
is fixable by making sure that kvmalloc doesn't ever use kmalloc for
requests that are larger than KMALLOC_MAX_SIZE but this also shows that
the code is rather fragile. A recent UBSAN report just underlines that
by the following report
UBSAN: Undefined behaviour in mm/page_alloc.c:3117:19
shift exponent 51 is too large for 32-bit type 'int'
CPU: 0 PID: 6520 Comm: syz-executor1 Not tainted 4.19.0-rc2 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xd2/0x148 lib/dump_stack.c:113
ubsan_epilogue+0x12/0x94 lib/ubsan.c:159
__ubsan_handle_shift_out_of_bounds+0x2b6/0x30b lib/ubsan.c:425
__zone_watermark_ok+0x2c7/0x400 mm/page_alloc.c:3117
zone_watermark_fast mm/page_alloc.c:3216 [inline]
get_page_from_freelist+0xc49/0x44c0 mm/page_alloc.c:3300
__alloc_pages_nodemask+0x21e/0x640 mm/page_alloc.c:4370
alloc_pages_current+0xcc/0x210 mm/mempolicy.c:2093
alloc_pages include/linux/gfp.h:509 [inline]
__get_free_pages+0x12/0x60 mm/page_alloc.c:4414
dma_mem_alloc+0x36/0x50 arch/x86/include/asm/floppy.h:156
raw_cmd_copyin drivers/block/floppy.c:3159 [inline]
raw_cmd_ioctl drivers/block/floppy.c:3206 [inline]
fd_locked_ioctl+0xa00/0x2c10 drivers/block/floppy.c:3544
fd_ioctl+0x40/0x60 drivers/block/floppy.c:3571
__blkdev_driver_ioctl block/ioctl.c:303 [inline]
blkdev_ioctl+0xb3c/0x1a30 block/ioctl.c:601
block_ioctl+0x105/0x150 fs/block_dev.c:1883
vfs_ioctl fs/ioctl.c:46 [inline]
do_vfs_ioctl+0x1c0/0x1150 fs/ioctl.c:687
ksys_ioctl+0x9e/0xb0 fs/ioctl.c:702
__do_sys_ioctl fs/ioctl.c:709 [inline]
__se_sys_ioctl fs/ioctl.c:707 [inline]
__x64_sys_ioctl+0x7e/0xc0 fs/ioctl.c:707
do_syscall_64+0xc4/0x510 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Note that this is not a kvmalloc path. It is just that the fast path
really depends on having sanitzed order as well. Therefore move the
order check to the fast path.
Link: http://lkml.kernel.org/r/20181113094305.GM15120@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reported-by: Kyungtae Kim <kt0755@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Byoungyoung Lee <lifeasageek@gmail.com>
Cc: "Dae R. Jeong" <threeearcat@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Uwe Kleine-König [Fri, 16 Nov 2018 23:08:43 +0000 (15:08 -0800)]
scripts/spdxcheck.py: make python3 compliant
Without this change the following happens when using Python3 (3.6.6):
$ echo "GPL-2.0" | python3 scripts/spdxcheck.py -
FAIL: 'str' object has no attribute 'decode'
Traceback (most recent call last):
File "scripts/spdxcheck.py", line 253, in <module>
parser.parse_lines(sys.stdin, args.maxlines, '-')
File "scripts/spdxcheck.py", line 171, in parse_lines
line = line.decode(locale.getpreferredencoding(False), errors='ignore')
AttributeError: 'str' object has no attribute 'decode'
So as the line is already a string, there is no need to decode it and
the line can be dropped.
/usr/bin/python on Arch is Python 3. So this would indeed be worth
going into 4.19.
Link: http://lkml.kernel.org/r/20181023070802.22558-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Joe Perches <joe@perches.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yufen Yu [Fri, 16 Nov 2018 23:08:39 +0000 (15:08 -0800)]
tmpfs: make lseek(SEEK_DATA/SEK_HOLE) return ENXIO with a negative offset
Other filesystems such as ext4, f2fs and ubifs all return ENXIO when
lseek (SEEK_DATA or SEEK_HOLE) requests a negative offset.
man 2 lseek says
: EINVAL whence is not valid. Or: the resulting file offset would be
: negative, or beyond the end of a seekable device.
:
: ENXIO whence is SEEK_DATA or SEEK_HOLE, and the file offset is beyond
: the end of the file.
Make tmpfs return ENXIO under these circumstances as well. After this,
tmpfs also passes xfstests's generic/448.
[akpm@linux-foundation.org: rewrite changelog]
Link: http://lkml.kernel.org/r/1540434176-14349-1-git-send-email-yuyufen@huawei.com
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hugh Dickins <hughd@google.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Fri, 16 Nov 2018 23:08:35 +0000 (15:08 -0800)]
lib/ubsan.c: don't mark __ubsan_handle_builtin_unreachable as noreturn
gcc-8 complains about the prototype for this function:
lib/ubsan.c:432:1: error: ignoring attribute 'noreturn' in declaration of a built-in function '__ubsan_handle_builtin_unreachable' because it conflicts with attribute 'const' [-Werror=attributes]
This is actually a GCC's bug. In GCC internals
__ubsan_handle_builtin_unreachable() declared with both 'noreturn' and
'const' attributes instead of only 'noreturn':
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84210
Workaround this by removing the noreturn attribute.
[aryabinin: add information about GCC bug in changelog]
Link: http://lkml.kernel.org/r/20181107144516.4587-1-aryabinin@virtuozzo.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Olof Johansson <olof@lixom.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Janne Huttunen [Fri, 16 Nov 2018 23:08:32 +0000 (15:08 -0800)]
mm/vmstat.c: fix NUMA statistics updates
Scan through the whole array to see if an update is needed. While we're
at it, use sizeof() to be safe against any possible type changes in the
future.
The bug here is that we wouldn't sync per-cpu counters into global ones
if there was an update of numa_stats for higher cpus. Highly
theoretical one though because it is much more probable that zone_stats
are updated so we would refresh anyway. So I wouldn't bother to mark
this for stable, yet something nice to fix.
[mhocko@suse.com: changelog enhancement]
Link: http://lkml.kernel.org/r/1541601517-17282-1-git-send-email-janne.huttunen@nokia.com
Fixes: 1d90ca897cb0 ("mm: update NUMA counter threshold size")
Signed-off-by: Janne Huttunen <janne.huttunen@nokia.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>