openwrt/staging/blogic.git
5 years agopowerpc/32: always populate page tables for Abatron BDI.
Christophe Leroy [Thu, 21 Feb 2019 19:08:41 +0000 (19:08 +0000)]
powerpc/32: always populate page tables for Abatron BDI.

When CONFIG_BDI_SWITCH is set, the page tables have to be populated
allthough large TLBs are used, because the BDI switch knows nothing
about those large TLBs which are handled directly in TLB miss logic.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm/32s: use generic mmu_mapin_ram() for all blocks.
Christophe Leroy [Thu, 21 Feb 2019 19:08:40 +0000 (19:08 +0000)]
powerpc/mm/32s: use generic mmu_mapin_ram() for all blocks.

Now that mmu_mapin_ram() is able to handle other blocks
than the one starting at 0, the WII can use it for all
its blocks.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm/32s: rework mmu_mapin_ram()
Christophe Leroy [Thu, 21 Feb 2019 19:08:39 +0000 (19:08 +0000)]
powerpc/mm/32s: rework mmu_mapin_ram()

This patch reworks mmu_mapin_ram() to be more generic and map as much
blocks as possible. It now supports blocks not starting at address 0.

It scans DBATs array to find free ones instead of forcing the use of
BAT2 and BAT3.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm/32: add base address to mmu_mapin_ram()
Christophe Leroy [Thu, 21 Feb 2019 19:08:38 +0000 (19:08 +0000)]
powerpc/mm/32: add base address to mmu_mapin_ram()

At the time being, mmu_mapin_ram() always maps RAM from the beginning.
But some platforms like the WII have to map a second block of RAM.

This patch adds to mmu_mapin_ram() the base address of the block.
At the moment, only base address 0 is supported.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/wii: properly disable use of BATs when requested.
Christophe Leroy [Thu, 21 Feb 2019 19:08:37 +0000 (19:08 +0000)]
powerpc/wii: properly disable use of BATs when requested.

'nobats' kernel parameter or some options like CONFIG_DEBUG_PAGEALLOC
deny the use of BATS for mapping memory.

This patch makes sure that the specific wii RAM mapping function
takes it into account as well.

Fixes: de32400dd26e ("wii: use both mem1 and mem2 as ram")
Cc: stable@vger.kernel.org
Reviewed-by: Jonathan Neuschafer <j.neuschaefer@gmx.net>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/8xx: Map 32Mb of RAM at init.
Christophe Leroy [Wed, 13 Feb 2019 16:06:21 +0000 (16:06 +0000)]
powerpc/8xx: Map 32Mb of RAM at init.

At the time being, initial MMU setup allows 24 Mbytes
of DATA and 8 Mbytes of code.

Some debug setup like CONFIG_KASAN generate huge
kernels with text size over the 8M limit and data over the
24 Mbytes limit.

Here is an 8xx kernel compiled with CONFIG_KASAN_INLINE for
one of my boards:

[root@po16846vm linux-powerpc]# size -x vmlinux
   text    data     bss     dec     hex filename
0x111019c 0x41b0d4 0x490de0 26984528 19bc050 vmlinux

This patch maps up to 32 Mbytes code based on _einittext symbol
and allows 32 Mbytes of memory instead of 24.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/8xx: replace most #ifdef by IS_ENABLED() in 8xx_mmu.c
Christophe Leroy [Wed, 13 Feb 2019 16:06:19 +0000 (16:06 +0000)]
powerpc/8xx: replace most #ifdef by IS_ENABLED() in 8xx_mmu.c

This patch replaces most #ifdef mess by IS_ENABLED() in 8xx_mmu.c
This has the advantage of allowing syntax verification at compile
time regardless of selected options.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: sstep: Add tests for addc[.] instruction
Sandipan Das [Wed, 20 Feb 2019 06:57:00 +0000 (12:27 +0530)]
powerpc: sstep: Add tests for addc[.] instruction

This adds test cases for the addc[.] instruction.

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: sstep: Add tests for add[.] instruction
Sandipan Das [Wed, 20 Feb 2019 06:56:59 +0000 (12:26 +0530)]
powerpc: sstep: Add tests for add[.] instruction

This adds test cases for the add[.] instruction.

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: sstep: Add tests for compute type instructions
Sandipan Das [Wed, 20 Feb 2019 06:56:58 +0000 (12:26 +0530)]
powerpc: sstep: Add tests for compute type instructions

This enhances the current selftest framework for validating
the in-kernel instruction emulation infrastructure by adding
support for compute type instructions i.e. integer ALU-based
instructions. Originally, this framework was limited to only
testing load and store instructions.

While most of the GPRs can be validated, support for SPRs is
limited to LR, CR and XER for now.

When writing the test cases, one must ensure that the Stack
Pointer (GPR1) or the Thread Pointer (GPR13) are not touched
by any means as these are vital non-volatile registers.

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
[mpe: Use patch_site for the code patching]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agoRevert "powerpc/book3s32: Reorder _PAGE_XXX flags to simplify TLB handling"
Michael Ellerman [Sat, 23 Feb 2019 09:30:50 +0000 (20:30 +1100)]
Revert "powerpc/book3s32: Reorder _PAGE_XXX flags to simplify TLB handling"

This reverts commit 78ca1108b10927b3d068c8da91352b0f4cd01fc5.

It is causing boot failures with qemu mac99 in at least some
configurations.

5 years agopowerpc: Move page table dump files in a dedicated subdirectory
Christophe Leroy [Mon, 18 Feb 2019 12:28:36 +0000 (12:28 +0000)]
powerpc: Move page table dump files in a dedicated subdirectory

This patch moves the files related to page table dump in a
dedicated subdirectory.

The purpose is to clean a bit arch/powerpc/mm by regrouping
multiple files handling a dedicated function.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Shorten the file names while we're at it]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: dump as a single line areas mapping a single physical page.
Christophe Leroy [Mon, 18 Feb 2019 12:25:20 +0000 (12:25 +0000)]
powerpc: dump as a single line areas mapping a single physical page.

When using KASAN, there are parts of the shadow area where all
pages are mapped to the kasan_early_shadow_page. It is pointless
to dump one line for each of those pages (in the example below there
are 7168 entries pointing to the same physical page).

~# cat /sys/kernel/debug/kernel_page_tables
 ...
---[ kasan shadow mem start ]---
0xf7c00000-0xf8bfffff 0x06fac000       16M        rw       present           dirty  accessed
0xf8c00000-0xf8c03fff 0x00cd0000       16K        r        present           dirty  accessed
0xf8c04000-0xf8c07fff 0x00cd0000       16K        r        present           dirty  accessed
0xf8c08000-0xf8c0bfff 0x00cd0000       16K        r        present           dirty  accessed
0xf8c0c000-0xf8c0ffff 0x00cd0000       16K        r        present           dirty  accessed
0xf8c10000-0xf8c13fff 0x00cd0000       16K        r        present           dirty  accessed
 ... 7168 identical lines
0xffbfc000-0xffbfffff 0x00cd0000       16K        r        present           dirty  accessed
---[ kasan shadow mem end ]---
 ...

This patch modifies linux table dump to dump as a single line areas
where all addresses points to the same physical page. That physical
address is put inside [] to show that all virt pages points to the
same phys page.

~# cat /sys/kernel/debug/kernel_page_tables
 ...
---[ kasan shadow mem start ]---
0xf7c00000-0xf8bfffff  0x06fac000        16M        rw       present           dirty  accessed
0xf8c00000-0xffbfffff [0x00cd0000]       16K        r        present           dirty  accessed
---[ kasan shadow mem end ]---
 ...

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agotools/selftest/vm: allow choosing mem size and page size in map_hugetlb
Christophe Leroy [Fri, 8 Feb 2019 15:02:55 +0000 (15:02 +0000)]
tools/selftest/vm: allow choosing mem size and page size in map_hugetlb

map_hugetlb maps 256Mbytes of memory with default hugepage size.

This patch allows the user to pass the size and page shift as an
argument in order to use different size and page size.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/32: Fix CONFIG_VIRT_CPU_ACCOUNTING_NATIVE for 40x/booke
Christophe Leroy [Thu, 31 Jan 2019 10:10:31 +0000 (10:10 +0000)]
powerpc/32: Fix CONFIG_VIRT_CPU_ACCOUNTING_NATIVE for 40x/booke

40x/booke have another path to reach 3f from transfer_to_handler,
make sure it also calls ACCOUNT_CPU_USER_ENTRY() when
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is selected.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/book3s32: Reorder _PAGE_XXX flags to simplify TLB handling
Christophe Leroy [Fri, 25 Jan 2019 12:34:20 +0000 (12:34 +0000)]
powerpc/book3s32: Reorder _PAGE_XXX flags to simplify TLB handling

For pages without _PAGE_USER, PP field is 00
For pages with _PAGE_USER, PP field is 10 for RW and 11 for RO.

This patch sets _PAGE_USER to 0x002 and _PAGE_RW to 0x001
is order to simplify TLB handling by reducing amount of shifts.

The location of _PAGE_PRESENT and _PAGE_HASHPTE doesn't matter
as they are only SW related flags.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/603: don't handle PAGE_ACCESSED in TLB miss handlers.
Christophe Leroy [Thu, 21 Feb 2019 10:38:02 +0000 (10:38 +0000)]
powerpc/603: don't handle PAGE_ACCESSED in TLB miss handlers.

PAGE_ACCESSED is only needed for CONFIG_SWAP. When CONFIG_SWAP
is not set, just ignore it. If CONFIG_SWAP is set and PAGE_ACCESSED
is not, let's take a minor fault.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/603: Don't worry about _PAGE_USER in TLB miss handlers
Christophe Leroy [Thu, 21 Feb 2019 10:38:01 +0000 (10:38 +0000)]
powerpc/603: Don't worry about _PAGE_USER in TLB miss handlers

PP bits take user access into account, so no need to check _PAGE_USER
here. A DSI or ISI will be generated if needed.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/603: let's handle PAGE_DIRTY directly
Christophe Leroy [Thu, 21 Feb 2019 10:38:00 +0000 (10:38 +0000)]
powerpc/603: let's handle PAGE_DIRTY directly

PAGE_DIRTY corresponds to the C bit. If writing on
a page for which the C bit is not set, a DataStoreTLBMiss
is generated. No need to check it in DataLoadTLBMiss.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/603: Don't handle _PAGE_RW and _PAGE_DIRTY on ITLB misses
Christophe Leroy [Thu, 21 Feb 2019 10:37:59 +0000 (10:37 +0000)]
powerpc/603: Don't handle _PAGE_RW and _PAGE_DIRTY on ITLB misses

_PAGE_RW and _PAGE_DIRTY do not matter for ITLB misses.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/603: Don't handle kernel page TLB misses when not need
Christophe Leroy [Thu, 21 Feb 2019 10:37:58 +0000 (10:37 +0000)]
powerpc/603: Don't handle kernel page TLB misses when not need

ITLB miss on kernel pages only occur with CONFIG_MODULES and
CONFIG_DEBUG_PAGEALLOC.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/hash32: use physical address directly in hash handlers.
Christophe Leroy [Thu, 21 Feb 2019 10:37:57 +0000 (10:37 +0000)]
powerpc/hash32: use physical address directly in hash handlers.

Since commit c62ce9ef97ba ("powerpc: remove remaining bits from
CONFIG_APUS"), tophys() has become a pure constant operation.
PAGE_OFFSET is known at compile time so the physical address
can be builtin directly.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/603: use physical address directly in TLB miss handlers.
Christophe Leroy [Thu, 21 Feb 2019 10:37:56 +0000 (10:37 +0000)]
powerpc/603: use physical address directly in TLB miss handlers.

Since commit c62ce9ef97ba ("powerpc: remove remaining bits from
CONFIG_APUS"), tophys() has become a pure constant operation.
PAGE_OFFSET is known at compile time so the physical address
can be builtin directly.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/6xx: Store PGDIR physical address in a SPRG
Christophe Leroy [Thu, 21 Feb 2019 10:37:55 +0000 (10:37 +0000)]
powerpc/6xx: Store PGDIR physical address in a SPRG

Use SPRN_SPRG2 to store the current thread PGDIR and
avoid reading thread_struct.pgdir at every TLB miss.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/6xx: Don't use SPRN_SPRG2 for storing stack pointer while in RTAS
Christophe Leroy [Thu, 21 Feb 2019 10:37:54 +0000 (10:37 +0000)]
powerpc/6xx: Don't use SPRN_SPRG2 for storing stack pointer while in RTAS

When calling RTAS, the stack pointer is stored in SPRN_SPRG2
in order to be able to restore it in case of machine check in RTAS.

As machine check is not a perfomance critical path, this patch
frees SPRN_SPRG2 by using a field in thread struct instead.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: simplify BDI switch
Christophe Leroy [Thu, 21 Feb 2019 10:37:53 +0000 (10:37 +0000)]
powerpc: simplify BDI switch

There is no reason to re-read each time the pointer at
location 0xf0 as it is fixed and known.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/83xx: Also save/restore SPRG4-7 during suspend
Christophe Leroy [Fri, 25 Jan 2019 12:03:55 +0000 (12:03 +0000)]
powerpc/83xx: Also save/restore SPRG4-7 during suspend

The 83xx has 8 SPRG registers and uses at least SPRG4
for DTLB handling LRU.

Fixes: 2319f1239592 ("powerpc/mm: e300c2/c3/c4 TLB errata workaround")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/traps: fix recoverability of machine check handling on book3s/32
Christophe Leroy [Tue, 22 Jan 2019 14:11:24 +0000 (14:11 +0000)]
powerpc/traps: fix recoverability of machine check handling on book3s/32

Looks like book3s/32 doesn't set RI on machine check, so
checking RI before calling die() will always be fatal
allthought this is not an issue in most cases.

Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt")
Fixes: daf00ae71dad ("powerpc/traps: restore recoverability of machine_check interrupts")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/32: Remove unneccessary MSR[RI] clearing for 8xx
Christophe Leroy [Tue, 22 Jan 2019 13:52:04 +0000 (13:52 +0000)]
powerpc/32: Remove unneccessary MSR[RI] clearing for 8xx

MSR[RI] has already been cleared a few lines above.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/setup: display reason for not booting
Christophe Leroy [Tue, 18 Dec 2018 06:53:41 +0000 (06:53 +0000)]
powerpc/setup: display reason for not booting

When no machine description matches, display it clearly
before looping forever.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/8xx: hide itlbie and dtlbie symbols
Christophe Leroy [Thu, 13 Dec 2018 08:08:11 +0000 (08:08 +0000)]
powerpc/8xx: hide itlbie and dtlbie symbols

When disassembling InstructionTLBError we get the following messy code:

c000138c:       7d 84 63 78     mr      r4,r12
c0001390:       75 25 58 00     andis.  r5,r9,22528
c0001394:       75 2a 40 00     andis.  r10,r9,16384
c0001398:       41 a2 00 08     beq     c00013a0 <itlbie>
c000139c:       7c 00 22 64     tlbie   r4,r0

c00013a0 <itlbie>:
c00013a0:       39 40 04 01     li      r10,1025
c00013a4:       91 4b 00 b0     stw     r10,176(r11)
c00013a8:       39 40 10 32     li      r10,4146
c00013ac:       48 00 cc 59     bl      c000e004 <transfer_to_handler>

For a cleaner code dump, this patch replaces itlbie and dtlbie
symbols by local symbols.

c000138c:       7d 84 63 78     mr      r4,r12
c0001390:       75 25 58 00     andis.  r5,r9,22528
c0001394:       75 2a 40 00     andis.  r10,r9,16384
c0001398:       41 a2 00 08     beq     c00013a0 <InstructionTLBError+0xa0>
c000139c:       7c 00 22 64     tlbie   r4,r0
c00013a0:       39 40 04 01     li      r10,1025
c00013a4:       91 4b 00 b0     stw     r10,176(r11)
c00013a8:       39 40 10 32     li      r10,4146
c00013ac:       48 00 cc 59     bl      c000e004 <transfer_to_handler>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/selftest: fix type of mftb() in null_syscall
Christophe Leroy [Tue, 22 Jan 2019 13:54:57 +0000 (13:54 +0000)]
powerpc/selftest: fix type of mftb() in null_syscall

All callers of mftb() expect 'unsigned long', and the function itself
only returns lower part of the TB so it really is 'unsigned long'
not 'unsigned long long'

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/powernv: Don't reprogram SLW image on every KVM guest entry/exit
Paul Mackerras [Tue, 12 Feb 2019 00:58:29 +0000 (11:58 +1100)]
powerpc/powernv: Don't reprogram SLW image on every KVM guest entry/exit

Commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via stop-api
only on Hotplug", 2017-07-21) added two calls to opal_slw_set_reg()
inside pnv_cpu_offline(), with the aim of changing the LPCR value in
the SLW image to disable wakeups from the decrementer while a CPU is
offline.  However, pnv_cpu_offline() gets called each time a secondary
CPU thread is woken up to participate in running a KVM guest, that is,
not just when a CPU is offlined.

Since opal_slw_set_reg() is a very slow operation (with observed
execution times around 20 milliseconds), this means that an offline
secondary CPU can often be busy doing the opal_slw_set_reg() call
when the primary CPU wants to grab all the secondary threads so that
it can run a KVM guest.  This leads to messages like "KVM: couldn't
grab CPU n" being printed and guest execution failing.

There is no need to reprogram the SLW image on every KVM guest entry
and exit.  So that we do it only when a CPU is really transitioning
between online and offline, this moves the calls to
pnv_program_cpu_hotplug_lpcr() into pnv_smp_cpu_kill_self().

Fixes: 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via stop-api only on Hotplug")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/64s: Fix logic when handling unknown CPU features
Michael Ellerman [Mon, 11 Feb 2019 00:20:01 +0000 (11:20 +1100)]
powerpc/64s: Fix logic when handling unknown CPU features

In cpufeatures_process_feature(), if a provided CPU feature is unknown and
enable_unknown is false, we erroneously print that the feature is being
enabled and return true, even though no feature has been enabled, and
may also set feature bits based on the last entry in the match table.

Fix this so that we only set feature bits from the match table if we have
actually enabled a feature from that table, and when failing to enable an
unknown feature, always print the "not enabling" message and return false.

Coincidentally, some older gccs (<GCC 7), when invoked with
-fsanitize-coverage=trace-pc, cause a spurious uninitialised variable
warning in this function:

  arch/powerpc/kernel/dt_cpu_ftrs.c: In function â€˜cpufeatures_process_feature’:
  arch/powerpc/kernel/dt_cpu_ftrs.c:686:7: warning: â€˜m’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    if (m->cpu_ftr_bit_mask)

An upcoming patch will enable support for kcov, which requires this option.
This patch avoids the warning.

Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features")
Reported-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[ajd: add commit message]
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
5 years agopowerpc/smp: Make __smp_send_nmi_ipi() static
Nicholas Piggin [Mon, 26 Nov 2018 02:01:07 +0000 (12:01 +1000)]
powerpc/smp: Make __smp_send_nmi_ipi() static

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/smp: Fix NMI IPI xmon timeout
Nicholas Piggin [Mon, 26 Nov 2018 02:01:06 +0000 (12:01 +1000)]
powerpc/smp: Fix NMI IPI xmon timeout

The xmon debugger IPI handler waits in the callback function while
xmon is still active. This means they don't complete the IPI, and the
initiator always times out waiting for them.

Things manage to work after the timeout because there is some fallback
logic to keep NMI IPI state sane in case of the timeout, but this is a
bit ugly.

This patch changes NMI IPI back to half-asynchronous (i.e., wait for
everyone to call in, do not wait for IPI function to complete), but
the complexity is avoided by going one step further and allowing new
IPIs to be issued before the IPI functions to all complete.

If synchronization against that is required, it is left up to the
caller, but current callers don't require that. In fact with the
timeout handling, callers must be able to cope with this already.

Fixes: 5b73151fff63 ("powerpc: NMI IPI make NMI IPIs fully sychronous")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/smp: Fix NMI IPI timeout
Nicholas Piggin [Mon, 26 Nov 2018 02:01:05 +0000 (12:01 +1000)]
powerpc/smp: Fix NMI IPI timeout

The NMI IPI timeout logic is broken, if __smp_send_nmi_ipi() times out
on the first condition, delay_us will be zero which will send it into
the second spin loop with no timeout so it will spin forever.

Fixes: 5b73151fff63 ("powerpc: NMI IPI make NMI IPIs fully sychronous")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: Make PPC_64K_PAGES depend on only 44x or PPC_BOOK3S_64
Michael Ellerman [Fri, 8 Feb 2019 12:34:16 +0000 (23:34 +1100)]
powerpc: Make PPC_64K_PAGES depend on only 44x or PPC_BOOK3S_64

In commit 7820856a4fcd ("powerpc/mm/book3e/64: Remove unsupported
64Kpage size from 64bit booke") we dropped the 64K page size support
from the 64-bit nohash (Book3E) code.

But we didn't update the dependencies of the PPC_64K_PAGES option,
meaning a randconfig can still trigger this code and cause a build
breakage, eg:
  arch/powerpc/include/asm/nohash/64/pgtable.h:14:2: error: #error "Page size not supported"
  arch/powerpc/include/asm/nohash/mmu-book3e.h:275:2: error: #error Unsupported page size

So remove PPC_BOOK3E_64 from the dependencies. This also means we
don't need to worry about PPC_FSL_BOOK3E, because that was just trying
to prevent the PPC_BOOK3E_64=y && PPC_FSL_BOOK3E=y case.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/64: Make sys_switch_endian() traceable
Michael Ellerman [Tue, 15 Jan 2019 06:37:36 +0000 (17:37 +1100)]
powerpc/64: Make sys_switch_endian() traceable

We weren't using SYSCALL_DEFINE for sys_switch_endian(), which means
it wasn't able to be traced by CONFIG_FTRACE_SYSCALLS.

By using the macro we create the right metadata and the syscall is
visible. eg:

  # cd /sys/kernel/debug/tracing
  # echo 1 | tee events/syscalls/sys_*_switch_endian/enable
  # ~/switch_endian_test
  # cat trace
  ...
  switch_endian_t-3604  [009] ....   315.175164: sys_switch_endian()
  switch_endian_t-3604  [009] ....   315.175167: sys_switch_endian -> 0x5555aaaa5555aaaa
  switch_endian_t-3604  [009] ....   315.175169: sys_switch_endian()
  switch_endian_t-3604  [009] ....   315.175169: sys_switch_endian -> 0x5555aaaa5555aaaa

Fixes: 529d235a0e19 ("powerpc: Add a proper syscall for switching endianness")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/dts: Standardize DTS status assignments from "ok" to "okay"
Robert P. J. Day [Sat, 2 Sep 2017 06:47:26 +0000 (02:47 -0400)]
powerpc/dts: Standardize DTS status assignments from "ok" to "okay"

While the current kernel drivers/of/ code allows developers to be
sloppy and use a DTS status value of "ok", the current DTSpec 0.1
makes it clear that the proper spelling is "okay", so fix the small
number of PowerPC .dts files that do this.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/book3s: Remove pgd/pud/pmd_set() interfaces
Aneesh Kumar K.V [Thu, 14 Feb 2019 06:45:40 +0000 (12:15 +0530)]
powerpc/book3s: Remove pgd/pud/pmd_set() interfaces

When updating page tables, we need to make sure we fill the page table
entry valid bits. We do this by or'ing in one of PGD/PUD/PMD_VAL_BITS.

The page table 'set' interfaces allow updating the raw value of page
table entries without setting the valid bits, so remove those
interfaces to avoid incorrect usage in future.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Reword commit message based on mailing list discussion]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: Fix 32-bit KVM-PR lockup and host crash with MacOS guest
Mark Cave-Ayland [Fri, 8 Feb 2019 14:33:19 +0000 (14:33 +0000)]
powerpc: Fix 32-bit KVM-PR lockup and host crash with MacOS guest

Commit 8792468da5e1 "powerpc: Add the ability to save FPU without
giving it up" unexpectedly removed the MSR_FE0 and MSR_FE1 bits from
the bitmask used to update the MSR of the previous thread in
__giveup_fpu() causing a KVM-PR MacOS guest to lockup and panic the
host kernel.

Leaving FE0/1 enabled means unrelated processes might receive FPEs
when they're not expecting them and crash. In particular if this
happens to init the host will then panic.

eg (transcribed):
  qemu-system-ppc[837]: unhandled signal 8 at 12cc9ce4 nip 12cc9ce4 lr 12cc9ca4 code 0
  systemd[1]: unhandled signal 8 at 202f02e0 nip 202f02e0 lr 001003d4 code 0
  Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

Reinstate these bits to the MSR bitmask to enable MacOS guests to run
under 32-bit KVM-PR once again without issue.

Fixes: 8792468da5e1 ("powerpc: Add the ability to save FPU without giving it up")
Cc: stable@vger.kernel.org # v4.6+
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/pseries: export timebase register sample in lparcfg
Tyrel Datwyler [Sat, 8 Dec 2018 23:48:27 +0000 (17:48 -0600)]
powerpc/pseries: export timebase register sample in lparcfg

The Processor Utilzation of Resource Registers (PURR) provide an
estimate of resources used by a cpu thread. Section 7.6 in Book III of
the ISA outlines how to calculate the percentage of shared resources
for threads using the ratio of the PURR delta and Timebase Register
delta for a sampled period.

This calculation is currently done erroneously by the lparstat tool
from the powerpc-utils package. This patch exports the current
timebase value after we sample the PURRs and exposes it to userspace
accounting tools via /proc/ppc64/lparcfg.

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/44x: Force PCI on for CURRITUCK
Michael Ellerman [Thu, 7 Feb 2019 02:43:26 +0000 (13:43 +1100)]
powerpc/44x: Force PCI on for CURRITUCK

The recent rework of PCI kconfig symbols exposed an existing bug in
the CURRITUCK kconfig logic.

It selects PPC4xx_PCI_EXPRESS which depends on PCI, but PCI is user
selectable and might be disabled, leading to a warning:

  WARNING: unmet direct dependencies detected for PPC4xx_PCI_EXPRESS
    Depends on [n]: PCI [=n] && 4xx [=y]
    Selected by [y]:
    - CURRITUCK [=y] && PPC_47x [=y]

Prior to commit eb01d42a7778 ("PCI: consolidate PCI config entry in
drivers/pci") PCI was enabled by default for currituck_defconfig so we
didn't see the warning. The bad logic was still there, it just
required someone disabling PCI in their .config to hit it.

Fix it by forcing PCI on for CURRITUCK, which seems was always the
expectation anyway.

Fixes: eb01d42a7778 ("PCI: consolidate PCI config entry in drivers/pci")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/eeh: Add eeh_force_recover to debugfs
Oliver O'Halloran [Fri, 15 Feb 2019 00:48:17 +0000 (11:48 +1100)]
powerpc/eeh: Add eeh_force_recover to debugfs

This patch adds a debugfs interface to force scheduling a recovery event.
This can be used to recover a specific PE or schedule a "special" recovery
even that checks for errors at the PHB level.
To force a recovery of a normal PE, use:

 echo '<#pe>:<#phb>' > /sys/kernel/debug/powerpc/eeh_force_recover

To force a scan for broken PHBs:

 echo 'hwcheck' > /sys/kernel/debug/powerpc/eeh_force_recover

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/eeh: Allow disabling recovery
Oliver O'Halloran [Fri, 15 Feb 2019 00:48:16 +0000 (11:48 +1100)]
powerpc/eeh: Allow disabling recovery

Currently when we detect an error we automatically invoke the EEH recovery
handler. This can be annoying when debugging EEH problems, or when working
on EEH itself so this patch adds a debugfs knob that will prevent a
recovery event from being queued up when an issue is detected.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/pci: Add pci_find_controller_for_domain()
Oliver O'Halloran [Fri, 15 Feb 2019 00:48:15 +0000 (11:48 +1100)]
powerpc/pci: Add pci_find_controller_for_domain()

Add a helper to find the pci_controller structure based on the domain
number / phb id.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/eeh_cache: Bump log level of eeh_addr_cache_print()
Oliver O'Halloran [Fri, 15 Feb 2019 00:48:14 +0000 (11:48 +1100)]
powerpc/eeh_cache: Bump log level of eeh_addr_cache_print()

To use this function at all #define DEBUG needs to be set in eeh_cache.c.
Considering that printing at pr_debug is probably not all that useful since
it adds the additional hurdle of requiring you to enable the debug print if
dynamic_debug is in use so this patch bumps it to pr_info.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/eeh_cache: Add a way to dump the EEH address cache
Oliver O'Halloran [Fri, 15 Feb 2019 00:48:13 +0000 (11:48 +1100)]
powerpc/eeh_cache: Add a way to dump the EEH address cache

Adds a debugfs file that can be read to view the contents of the EEH
address cache. This is pretty similar to the existing
eeh_addr_cache_print() function, but that function is intended to debug
issues inside of the kernel since it's #ifdef`ed out by default, and writes
into the kernel log.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/eeh_cache: Add pr_debug() prints for insert/remove
Oliver O'Halloran [Fri, 15 Feb 2019 00:48:12 +0000 (11:48 +1100)]
powerpc/eeh_cache: Add pr_debug() prints for insert/remove

The EEH address cache is used to map a physical MMIO address back to a PCI
device. It's useful to know when it's being manipulated, but currently this
requires recompiling with #define DEBUG set. This is pointless since we
have dynamic_debug nowdays, so remove the #ifdef guard and add a pr_debug()
for the remove case too.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes
Oliver O'Halloran [Fri, 15 Feb 2019 00:48:11 +0000 (11:48 +1100)]
powerpc/eeh: Use debugfs_create_u32 for eeh_max_freezes

There's no need to the custom getter/setter functions so we should remove
them in favour of using the generic one. While we're here, change the type
of eeh_max_freeze to u32 and print the value in decimal rather than
hex because printing it in hex makes no sense.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: drop unused GENERIC_CSUM Kconfig item
Christophe Leroy [Fri, 15 Feb 2019 10:32:02 +0000 (10:32 +0000)]
powerpc: drop unused GENERIC_CSUM Kconfig item

Commit d4fde568a34a ("powerpc/64: Use optimized checksum routines on
little-endian") converted last powerpc user of GENERIC_CSUM.

This patch does a final cleanup dropping the Kconfig GENERIC_CSUM
option which is always 'n', and associated piece of code in
asm/checksum.h

Fixes: d4fde568a34a ("powerpc/64: Use optimized checksum routines on little-endian")
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/64s/hash: Fix assert_slb_presence() use of the slbfee. instruction
Nicholas Piggin [Fri, 15 Feb 2019 10:20:20 +0000 (20:20 +1000)]
powerpc/64s/hash: Fix assert_slb_presence() use of the slbfee. instruction

The slbfee. instruction must have bit 24 of RB clear, failure to do
so can result in false negatives that result in incorrect assertions.

This is not obvious from the ISA v3.0B document, which only says:

    The hardware ignores the contents of RB 36:38 40:63 -- p.1032

This patch fixes the bug and also clears all other bits from PPC bit
36-63, which is good practice when dealing with reserved or ignored
bits.

Fixes: e15a4fea4dee ("powerpc/64s/hash: Add some SLB debugging tests")
Cc: stable@vger.kernel.org # v4.20+
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm/hash: Increase vmalloc space to 512T with hash MMU
Michael Ellerman [Wed, 13 Feb 2019 11:15:09 +0000 (16:45 +0530)]
powerpc/mm/hash: Increase vmalloc space to 512T with hash MMU

This patch updates the kernel non-linear virtual map to 512TB when
we're built with 64K page size and are using the hash MMU. We allocate
one context for the vmalloc region and hence the max virtual area size
is limited by the context map size (512TB for 64K and 64TB for 4K page
size).

This patch fixes boot failures with large amounts of system RAM where
we need large vmalloc space to handle per cpu allocations.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
5 years agopowerpc/ptrace: Simplify vr_get/set() to avoid GCC warning
Michael Ellerman [Thu, 14 Feb 2019 00:08:29 +0000 (11:08 +1100)]
powerpc/ptrace: Simplify vr_get/set() to avoid GCC warning

GCC 8 warns about the logic in vr_get/set(), which with -Werror breaks
the build:

  In function â€˜user_regset_copyin’,
      inlined from â€˜vr_set’ at arch/powerpc/kernel/ptrace.c:628:9:
  include/linux/regset.h:295:4: error: â€˜memcpy’ offset [-527, -529] is
  out of the bounds [0, 16] of object â€˜vrsave’ with type â€˜union
  <anonymous>’ [-Werror=array-bounds]
  arch/powerpc/kernel/ptrace.c: In function â€˜vr_set’:
  arch/powerpc/kernel/ptrace.c:623:5: note: â€˜vrsave’ declared here
     } vrsave;

This has been identified as a regression in GCC, see GCC bug 88273.

However we can avoid the warning and also simplify the logic and make
it more robust.

Currently we pass -1 as end_pos to user_regset_copyout(). This says
"copy up to the end of the regset".

The definition of the regset is:
[REGSET_VMX] = {
.core_note_type = NT_PPC_VMX, .n = 34,
.size = sizeof(vector128), .align = sizeof(vector128),
.active = vr_active, .get = vr_get, .set = vr_set
},

The end is calculated as (n * size), ie. 34 * sizeof(vector128).

In vr_get/set() we pass start_pos as 33 * sizeof(vector128), meaning
we can copy up to sizeof(vector128) into/out-of vrsave.

The on-stack vrsave is defined as:
  union {
  elf_vrreg_t reg;
  u32 word;
  } vrsave;

And elf_vrreg_t is:
  typedef __vector128 elf_vrreg_t;

So there is no bug, but we rely on all those sizes lining up,
otherwise we would have a kernel stack exposure/overwrite on our
hands.

Rather than relying on that we can pass an explict end_pos based on
the sizeof(vrsave). The result should be exactly the same but it's
more obviously not over-reading/writing the stack and it avoids the
compiler warning.

Reported-by: Meelis Roos <mroos@linux.ee>
Reported-by: Mathieu Malaterre <malat@debian.org>
Cc: stable@vger.kernel.org
Tested-by: Mathieu Malaterre <malat@debian.org>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/powernv/npu: Remove redundant change_pte() hook
Peter Xu [Thu, 31 Jan 2019 10:30:22 +0000 (18:30 +0800)]
powerpc/powernv/npu: Remove redundant change_pte() hook

The change_pte() notifier was designed to use as a quick path to
update secondary MMU PTEs on write permission changes or PFN changes.
For KVM, it could reduce the vm-exits when vcpu faults on the pages
that was touched up by KSM. It's not used to do cache invalidations,
for example, if we see the notifier will be called before the real PTE
update after all (please see set_pte_at_notify that set_pte_at was
called later).

All the necessary cache invalidation should all be done in
invalidate_range() already.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Alistair Popple <alistair@popple.id.au>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agoMerge branch 'topic/ppc-kvm' into next
Michael Ellerman [Thu, 21 Feb 2019 13:09:56 +0000 (00:09 +1100)]
Merge branch 'topic/ppc-kvm' into next

Merge commits we're sharing with kvm-ppc tree.

5 years agopowerpc/64s: Better printing of machine check info for guest MCEs
Paul Mackerras [Thu, 21 Feb 2019 02:40:20 +0000 (13:40 +1100)]
powerpc/64s: Better printing of machine check info for guest MCEs

This adds an "in_guest" parameter to machine_check_print_event_info()
so that we can avoid trying to translate guest NIP values into
symbolic form using the host kernel's symbol table.

Reviewed-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agoKVM: PPC: Book3S HV: Simplify machine check handling
Paul Mackerras [Thu, 21 Feb 2019 02:38:49 +0000 (13:38 +1100)]
KVM: PPC: Book3S HV: Simplify machine check handling

This makes the handling of machine check interrupts that occur inside
a guest simpler and more robust, with less done in assembler code and
in real mode.

Now, when a machine check occurs inside a guest, we always get the
machine check event struct and put a copy in the vcpu struct for the
vcpu where the machine check occurred.  We no longer call
machine_check_queue_event() from kvmppc_realmode_mc_power7(), because
on POWER8, when a vcpu is running on an offline secondary thread and
we call machine_check_queue_event(), that calls irq_work_queue(),
which doesn't work because the CPU is offline, but instead triggers
the WARN_ON(lazy_irq_pending()) in pnv_smp_cpu_kill_self() (which
fires again and again because nothing clears the condition).

All that machine_check_queue_event() actually does is to cause the
event to be printed to the console.  For a machine check occurring in
the guest, we now print the event in kvmppc_handle_exit_hv()
instead.

The assembly code at label machine_check_realmode now just calls C
code and then continues exiting the guest.  We no longer either
synthesize a machine check for the guest in assembly code or return
to the guest without a machine check.

The code in kvmppc_handle_exit_hv() is extended to handle the case
where the guest is not FWNMI-capable.  In that case we now always
synthesize a machine check interrupt for the guest.  Previously, if
the host thinks it has recovered the machine check fully, it would
return to the guest without any notification that the machine check
had occurred.  If the machine check was caused by some action of the
guest (such as creating duplicate SLB entries), it is much better to
tell the guest that it has caused a problem.  Therefore we now always
generate a machine check interrupt for guests that are not
FWNMI-capable.

Reviewed-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agoMerge branch 'topic/dma' into next
Michael Ellerman [Thu, 21 Feb 2019 12:15:10 +0000 (23:15 +1100)]
Merge branch 'topic/dma' into next

Merge hch's big DMA rework series. This is in a topic branch in case he
wants to merge it to minimise conflicts.

5 years agoKVM: PPC: Book3S HV: Context switch AMR on Power9
Michael Ellerman [Wed, 20 Feb 2019 08:55:00 +0000 (19:55 +1100)]
KVM: PPC: Book3S HV: Context switch AMR on Power9

kvmhv_p9_guest_entry() implements a fast-path guest entry for Power9
when guest and host are both running with the Radix MMU.

Currently in that path we don't save the host AMR (Authority Mask
Register) value, and we always restore 0 on return to the host. That
is OK at the moment because the AMR is not used for storage keys with
the Radix MMU.

However we plan to start using the AMR on Radix to prevent the kernel
from reading/writing to userspace outside of copy_to/from_user(). In
order to make that work we need to save/restore the AMR value.

We only restore the value if it is different from the guest value,
which is already in the register when we exit to the host. This should
mean we rarely need to actually restore the value when running a
modern Linux as a guest, because it will be using the same value as
us.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Russell Currey <ruscur@russell.cc>
5 years agoMerge branch 'fixes' into next
Michael Ellerman [Tue, 19 Feb 2019 08:56:26 +0000 (19:56 +1100)]
Merge branch 'fixes' into next

There's a few important fixes in our fixes branch, in particular the
pgd/pud_present() one, so merge it now.

5 years agopowerpc/dma: trim the fat from <asm/dma-mapping.h>
Christoph Hellwig [Wed, 13 Feb 2019 07:01:33 +0000 (08:01 +0100)]
powerpc/dma: trim the fat from <asm/dma-mapping.h>

There is no need to provide anything but get_arch_dma_ops to
<linux/dma-mapping.h>.  More the remaining declarations to <asm/iommu.h>
and drop all the includes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/dma: remove set_dma_offset
Christoph Hellwig [Wed, 13 Feb 2019 07:01:32 +0000 (08:01 +0100)]
powerpc/dma: remove set_dma_offset

There is no good reason for this helper, just opencode it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/dma: remove get_dma_offset
Christoph Hellwig [Wed, 13 Feb 2019 07:01:31 +0000 (08:01 +0100)]
powerpc/dma: remove get_dma_offset

Just fold the calculation into __phys_to_dma/__dma_to_phys as those are
the only places that should know about it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/dma: use the generic direct mapping bypass
Christoph Hellwig [Wed, 13 Feb 2019 07:01:30 +0000 (08:01 +0100)]
powerpc/dma: use the generic direct mapping bypass

Now that we've switched all the powerpc nommu and swiotlb methods to
use the generic dma_direct_* calls we can remove these ops vectors
entirely and rely on the common direct mapping bypass that avoids
indirect function calls entirely.  This also allows to remove a whole
lot of boilerplate code related to setting up these operations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/dma: use the dma_direct mapping routines
Christoph Hellwig [Wed, 13 Feb 2019 07:01:29 +0000 (08:01 +0100)]
powerpc/dma: use the dma_direct mapping routines

Switch the streaming DMA mapping and ownership transfer methods to the
functionally identical dma_direct_ versions.  Factor the cache
maintainance helpers into the form expected by the common code for that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/dma: use the dma-direct allocator for coherent platforms
Christoph Hellwig [Wed, 13 Feb 2019 07:01:28 +0000 (08:01 +0100)]
powerpc/dma: use the dma-direct allocator for coherent platforms

The generic code allows a few nice things such as node local allocations
and dipping into the CMA area.  The lookup of the right zone for a given
dma mask works a little different, but the results should be the same.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agoswiotlb: remove swiotlb_dma_supported
Christoph Hellwig [Wed, 13 Feb 2019 07:01:27 +0000 (08:01 +0100)]
swiotlb: remove swiotlb_dma_supported

The only user left is powerpc, but even there the generic dma-direct
version works just as well, given that we guarantee that the swiotlb
buffer must always be addressable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/dma: remove dma_nommu_dma_supported
Christoph Hellwig [Wed, 13 Feb 2019 07:01:26 +0000 (08:01 +0100)]
powerpc/dma: remove dma_nommu_dma_supported

This function is largely identical to the generic version used
everywhere else.  Replace it with the generic version.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/dma: remove dma_nommu_get_required_mask
Christoph Hellwig [Wed, 13 Feb 2019 07:01:25 +0000 (08:01 +0100)]
powerpc/dma: remove dma_nommu_get_required_mask

This function is identical to the generic dma_direct_get_required_mask,
except that the generic version also takes the bus_dma_mask account,
which could lead to incorrect results in the powerpc version.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/dma: remove dma_nommu_mmap_coherent
Christoph Hellwig [Wed, 13 Feb 2019 07:01:24 +0000 (08:01 +0100)]
powerpc/dma: remove dma_nommu_mmap_coherent

The coherent cache version of this function already is functionally
identicall to the default version, and by defining the
arch_dma_coherent_to_pfn hook the same is ture for the noncoherent
version as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/dma: use phys_to_dma instead of get_dma_offset
Christoph Hellwig [Wed, 13 Feb 2019 07:01:23 +0000 (08:01 +0100)]
powerpc/dma: use phys_to_dma instead of get_dma_offset

Use the standard portable helper instead of the powerpc specific one,
which is about to go away.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agodma-mapping, powerpc: simplify the arch dma_set_mask override
Christoph Hellwig [Wed, 13 Feb 2019 07:01:22 +0000 (08:01 +0100)]
dma-mapping, powerpc: simplify the arch dma_set_mask override

Instead of letting the architecture supply all of dma_set_mask just
give it an additional hook selected by Kconfig.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/dma: fix an off-by-one in dma_capable
Christoph Hellwig [Wed, 13 Feb 2019 07:01:21 +0000 (08:01 +0100)]
powerpc/dma: fix an off-by-one in dma_capable

We need to compare the last byte in the dma range and not the one after it
for the bus_dma_mask, just like we do for the regular dma_mask.  Fix this
cleanly by merging the two comparisms into one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/dma: remove max_direct_dma_addr
Christoph Hellwig [Wed, 13 Feb 2019 07:01:20 +0000 (08:01 +0100)]
powerpc/dma: remove max_direct_dma_addr

The max_direct_dma_addr duplicates the bus_dma_mask field in struct
device.  Use the generic field instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/dma: move pci_dma_dev_setup_swiotlb to fsl_pci.c
Christoph Hellwig [Wed, 13 Feb 2019 07:01:19 +0000 (08:01 +0100)]
powerpc/dma: move pci_dma_dev_setup_swiotlb to fsl_pci.c

pci_dma_dev_setup_swiotlb is only used by the fsl_pci code, and closely
related to it, so fsl_pci.c seems like a better place for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/dma: remove get_pci_dma_ops
Christoph Hellwig [Wed, 13 Feb 2019 07:01:18 +0000 (08:01 +0100)]
powerpc/dma: remove get_pci_dma_ops

This function is only used by the Cell iommu code, which can keep track
if it is using the iommu internally just as good.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/dma: remove the iommu fallback for coherent allocations
Christoph Hellwig [Wed, 13 Feb 2019 07:01:17 +0000 (08:01 +0100)]
powerpc/dma: remove the iommu fallback for coherent allocations

All iommu capable platforms now always use the iommu code with the
internal bypass, so there is not need for this magic anymore.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/pci: remove the dma_set_mask pci_controller ops methods
Christoph Hellwig [Wed, 13 Feb 2019 07:01:16 +0000 (08:01 +0100)]
powerpc/pci: remove the dma_set_mask pci_controller ops methods

Unused now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/dma: stop overriding dma_get_required_mask
Christoph Hellwig [Wed, 13 Feb 2019 07:01:15 +0000 (08:01 +0100)]
powerpc/dma: stop overriding dma_get_required_mask

The ppc_md and pci_controller_ops methods are unused now and can be
removed.  The dma_nommu implementation is generic to the generic one
except for using max_pfn instead of calling into the memblock API,
and all other dma_map_ops instances implement a method of their own.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/powernv: use the generic iommu bypass code
Christoph Hellwig [Wed, 13 Feb 2019 07:01:14 +0000 (08:01 +0100)]
powerpc/powernv: use the generic iommu bypass code

Use the generic iommu bypass code instead of overriding set_dma_mask.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/powernv: remove pnv_npu_dma_set_mask
Christoph Hellwig [Wed, 13 Feb 2019 07:01:13 +0000 (08:01 +0100)]
powerpc/powernv: remove pnv_npu_dma_set_mask

These devices are not PCIe devices and do not have associated dma map
ops, so this is just dead code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/powernv: remove pnv_pci_ioda_pe_single_vendor
Christoph Hellwig [Wed, 13 Feb 2019 07:01:12 +0000 (08:01 +0100)]
powerpc/powernv: remove pnv_pci_ioda_pe_single_vendor

This function is completely bogus - the fact that two PCIe devices come
from the same vendor has absolutely nothing to say about the DMA
capabilities and characteristics.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/dart: use the generic iommu bypass code
Christoph Hellwig [Wed, 13 Feb 2019 07:01:11 +0000 (08:01 +0100)]
powerpc/dart: use the generic iommu bypass code

Use the generic iommu bypass code instead of overriding set_dma_mask.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/dart: remove dead cleanup code in iommu_init_early_dart
Christoph Hellwig [Wed, 13 Feb 2019 07:01:10 +0000 (08:01 +0100)]
powerpc/dart: remove dead cleanup code in iommu_init_early_dart

If dart_init failed we didn't have a chance to setup dma or controller
ops yet, so there is no point in resetting them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/cell: use the generic iommu bypass code
Christoph Hellwig [Wed, 13 Feb 2019 07:01:09 +0000 (08:01 +0100)]
powerpc/cell: use the generic iommu bypass code

This gets rid of a lot of clumsy code and finally allows us to mark
dma_iommu_ops const.

Includes fixes from Michael Ellerman.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/cell: move dma direct window setup out of dma_configure
Christoph Hellwig [Wed, 13 Feb 2019 07:01:08 +0000 (08:01 +0100)]
powerpc/cell: move dma direct window setup out of dma_configure

Configure the dma settings at device setup time, and stop playing games
with get_pci_dma_ops.  This prepares for using the common dma_configure
code later on.

Includes fixes from Michael Ellerman.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/pseries: use the generic iommu bypass code
Christoph Hellwig [Wed, 13 Feb 2019 07:01:07 +0000 (08:01 +0100)]
powerpc/pseries: use the generic iommu bypass code

Use the generic iommu bypass code instead of overriding set_dma_mask.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/pseries: unwind dma_get_required_mask_pSeriesLP a bit
Christoph Hellwig [Wed, 13 Feb 2019 07:01:06 +0000 (08:01 +0100)]
powerpc/pseries: unwind dma_get_required_mask_pSeriesLP a bit

Call dma_get_required_mask_pSeriesLP directly instead of dma_iommu_ops
to simply the code a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/dma: handle iommu bypass in dma_iommu_ops
Christoph Hellwig [Wed, 13 Feb 2019 07:01:05 +0000 (08:01 +0100)]
powerpc/dma: handle iommu bypass in dma_iommu_ops

Add a new iommu_bypass flag to struct dev_archdata so that the dma_iommu
implementation can handle the direct mapping transparently instead of
switiching ops around.  Setting of this flag is controlled by new
pci_controller_ops method.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/dma: untangle vio_dma_mapping_ops from dma_iommu_ops
Christoph Hellwig [Wed, 13 Feb 2019 07:01:04 +0000 (08:01 +0100)]
powerpc/dma: untangle vio_dma_mapping_ops from dma_iommu_ops

vio_dma_mapping_ops currently does a lot of indirect calls through
dma_iommu_ops, which not only make the code harder to follow but are
also expensive in the post-spectre world.  Unwind the indirect calls
by calling the ppc_iommu_* or iommu_* APIs directly applicable, or
just use the dma_iommu_* methods directly where we can.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agodma-direct: we might need GFP_DMA for 32-bit dma masks
Christoph Hellwig [Wed, 13 Feb 2019 07:01:03 +0000 (08:01 +0100)]
dma-direct: we might need GFP_DMA for 32-bit dma masks

If there is no ZONE_DMA32 we might need GFP_DMA to be able to
allocate memory that satisfies a 32-bit DMA mask.

Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agonet: pasemi: set a 64-bit DMA mask on the DMA device
Christoph Hellwig [Wed, 13 Feb 2019 07:01:02 +0000 (08:01 +0100)]
net: pasemi: set a 64-bit DMA mask on the DMA device

The pasemi driver never set a DMA mask, and given that the powerpc
DMA mapping routines never check it this worked ok so far.  But the
generic dma-direct code which I plan to switch on for powerpc checks
the DMA mask and fails unsupported mapping requests, so we need to
make sure the proper 64-bit mask is set.

Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/64s: Fix possible corruption on big endian due to pgd/pud_present()
Michael Ellerman [Thu, 14 Feb 2019 04:00:36 +0000 (15:00 +1100)]
powerpc/64s: Fix possible corruption on big endian due to pgd/pud_present()

In v4.20 we changed our pgd/pud_present() to check for _PAGE_PRESENT
rather than just checking that the value is non-zero, e.g.:

  static inline int pgd_present(pgd_t pgd)
  {
 -       return !pgd_none(pgd);
 +       return (pgd_raw(pgd) & cpu_to_be64(_PAGE_PRESENT));
  }

Unfortunately this is broken on big endian, as the result of the
bitwise & is truncated to int, which is always zero because
_PAGE_PRESENT is 0x8000000000000000ul. This means pgd_present() and
pud_present() are always false at compile time, and the compiler
elides the subsequent code.

Remarkably with that bug present we are still able to boot and run
with few noticeable effects. However under some work loads we are able
to trigger a warning in the ext4 code:

  WARNING: CPU: 11 PID: 29593 at fs/ext4/inode.c:3927 .ext4_set_page_dirty+0x70/0xb0
  CPU: 11 PID: 29593 Comm: debugedit Not tainted 4.20.0-rc1 #1
  ...
  NIP .ext4_set_page_dirty+0x70/0xb0
  LR  .set_page_dirty+0xa0/0x150
  Call Trace:
   .set_page_dirty+0xa0/0x150
   .unmap_page_range+0xbf0/0xe10
   .unmap_vmas+0x84/0x130
   .unmap_region+0xe8/0x190
   .__do_munmap+0x2f0/0x510
   .__vm_munmap+0x80/0x110
   .__se_sys_munmap+0x14/0x30
   system_call+0x5c/0x70

The fix is simple, we need to convert the result of the bitwise & to
an int before returning it.

Thanks to Erhard, Jan Kara and Aneesh for help with debugging.

Fixes: da7ad366b497 ("powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit")
Cc: stable@vger.kernel.org # v4.20+
Reported-by: Erhard F. <erhard_f@mailbox.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/powernv: Escalate reset when IODA reset fails
Oliver O'Halloran [Fri, 1 Feb 2019 00:42:01 +0000 (11:42 +1100)]
powerpc/powernv: Escalate reset when IODA reset fails

The IODA reset is used to flush out any OS controlled state from the PHB.
This reset can fail if a PHB fatal error has occurred in early boot,
probably due to a because of a bad device. We already do a fundemental
reset of the device in some cases, so this patch just adds a test to force
a full reset if firmware reports an error when performing the IODA reset.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/ptrace: Mitigate potential Spectre v1
Breno Leitao [Wed, 30 Jan 2019 12:46:00 +0000 (10:46 -0200)]
powerpc/ptrace: Mitigate potential Spectre v1

'regno' is directly controlled by user space, hence leading to a potential
exploitation of the Spectre variant 1 vulnerability.

On PTRACE_SETREGS and PTRACE_GETREGS requests, user space passes the
register number that would be read or written. This register number is
called 'regno' which is part of the 'addr' syscall parameter.

This 'regno' value is checked against the maximum pt_regs structure size,
and then used to dereference it, which matches the initial part of a
Spectre v1 (and Spectre v1.1) attack. The dereferenced value, then,
is returned to userspace in the GETREGS case.

This patch sanitizes 'regno' before using it to dereference pt_reg.

Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].

[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2

Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/32: Include .branch_lt in data section
Joel Stanley [Wed, 14 Nov 2018 03:02:18 +0000 (13:32 +1030)]
powerpc/32: Include .branch_lt in data section

When building a 32 bit powerpc kernel with Binutils 2.31.1 this warning
is emitted:

 powerpc-linux-gnu-ld: warning: orphan section `.branch_lt' from
 `arch/powerpc/kernel/head_44x.o' being placed in section `.branch_lt'

As of binutils commit 2d7ad24e8726 ("Support PLT16 relocs against local
symbols")[1], 32 bit targets can produce .branch_lt sections in their
output.

Include these symbols in the .data section as the ppc64 kernel does.

[1] https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commitdiff;h=2d7ad24e8726ba4c45c9e67be08223a146a837ce
Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Alan Modra <amodra@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/eeh: Correct retries in eeh_pe_reset_full()
Sam Bobroff [Thu, 29 Nov 2018 03:16:42 +0000 (14:16 +1100)]
powerpc/eeh: Correct retries in eeh_pe_reset_full()

Currently, eeh_pe_reset_full() will only attempt to reset a PE more
than once if activating the reset state and deactivating it both
succeed, but later polling shows that it hasn't become active.

Change this so that it will try up to three times for any reason other
than an unrecoverable slot error and adjust the message generation so
that it's clear weather the reset has ultimately succeeded or failed.
This allows the reset to succeed in some situations where it would
currently fail.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/eeh: Improve recovery of passed-through devices
Sam Bobroff [Thu, 29 Nov 2018 03:16:41 +0000 (14:16 +1100)]
powerpc/eeh: Improve recovery of passed-through devices

Currently, the EEH recovery process considers passed-through devices
as if they were not EEH-aware, which can cause them to be removed as
part of recovery.  Because device removal requires cooperation from
the guest, this may lead to the process stalling or deadlocking.
Also, if devices are removed on the host side, they will be removed
from their IOMMU group, making recovery in the guest impossible.

Therefore, alter the recovery process so that passed-through devices
are not removed but are instead left frozen (and marked isolated)
until the guest performs it's own recovery.  If firmware thaws a
passed-through PE because it's parent PE has been thawed (because it
was not passed through), re-freeze it.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>