Ravi Bangoria [Thu, 17 Oct 2019 09:32:04 +0000 (15:02 +0530)]
powerpc/watchpoint: Support for 8xx in ptrace-hwbreak.c selftest
On the 8xx, signals are generated after executing the instruction. So
no need to manually single-step on 8xx. Also, 8xx __set_dabr()
currently ignores length and hardcodes the length to 8 bytes. So all
unaligned and 512 byte testcase will fail on 8xx. Ignore those
testcases on 8xx.
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191017093204.7511-8-ravi.bangoria@linux.ibm.com
Ravi Bangoria [Thu, 17 Oct 2019 09:32:03 +0000 (15:02 +0530)]
powerpc/watchpoint: Add DAR outside test in perf-hwbreak.c selftest
So far we used to ignore exception if DAR points outside of user
specified range. But now we are ignoring it only if actual load/store
range does not overlap with user specified range. Include selftests
for the same:
# ./tools/testing/selftests/powerpc/ptrace/perf-hwbreak
...
TESTED: No overlap
TESTED: Partial overlap
TESTED: Partial overlap
TESTED: No overlap
TESTED: Full overlap
success: perf_hwbreak
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191017093204.7511-7-ravi.bangoria@linux.ibm.com
Ravi Bangoria [Thu, 17 Oct 2019 09:32:02 +0000 (15:02 +0530)]
selftests/powerpc: Rewrite ptrace-hwbreak.c selftest
ptrace-hwbreak.c selftest is logically broken. On powerpc, when
watchpoint is created with ptrace, signals are generated before
executing the instruction and user has to manually singlestep the
instruction with watchpoint disabled, which selftest never does and
thus it keeps on getting the signal at the same instruction. If we fix
it, selftest fails because the logical connection between
tracer(parent) and tracee(child) is also broken. Rewrite the selftest
and add new tests for unaligned access.
With patch:
$ ./tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak
test: ptrace-hwbreak
tags: git_version:
powerpc-5.3-4-224-g218b868240c7-dirty
PTRACE_SET_DEBUGREG, WO, len: 1: Ok
PTRACE_SET_DEBUGREG, WO, len: 2: Ok
PTRACE_SET_DEBUGREG, WO, len: 4: Ok
PTRACE_SET_DEBUGREG, WO, len: 8: Ok
PTRACE_SET_DEBUGREG, RO, len: 1: Ok
PTRACE_SET_DEBUGREG, RO, len: 2: Ok
PTRACE_SET_DEBUGREG, RO, len: 4: Ok
PTRACE_SET_DEBUGREG, RO, len: 8: Ok
PTRACE_SET_DEBUGREG, RW, len: 1: Ok
PTRACE_SET_DEBUGREG, RW, len: 2: Ok
PTRACE_SET_DEBUGREG, RW, len: 4: Ok
PTRACE_SET_DEBUGREG, RW, len: 8: Ok
PPC_PTRACE_SETHWDEBUG, MODE_EXACT, WO, len: 1: Ok
PPC_PTRACE_SETHWDEBUG, MODE_EXACT, RO, len: 1: Ok
PPC_PTRACE_SETHWDEBUG, MODE_EXACT, RW, len: 1: Ok
PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW ALIGNED, WO, len: 6: Ok
PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW ALIGNED, RO, len: 6: Ok
PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW ALIGNED, RW, len: 6: Ok
PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW UNALIGNED, WO, len: 6: Ok
PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW UNALIGNED, RO, len: 6: Ok
PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW UNALIGNED, RW, len: 6: Ok
PPC_PTRACE_SETHWDEBUG, MODE_RANGE, DW UNALIGNED, DAR OUTSIDE, RW, len: 6: Ok
PPC_PTRACE_SETHWDEBUG, DAWR_MAX_LEN, RW, len: 512: Ok
success: ptrace-hwbreak
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191017093204.7511-6-ravi.bangoria@linux.ibm.com
Ravi Bangoria [Thu, 17 Oct 2019 09:32:01 +0000 (15:02 +0530)]
powerpc/watchpoint: Don't ignore extraneous exceptions blindly
On powerpc, watchpoint match range is double-word granular. On a
watchpoint hit, DAR is set to the first byte of overlap between actual
access and watched range. And thus it's quite possible that DAR does
not point inside user specified range. Ex, say user creates a
watchpoint with address range 0x1004 to 0x1007. So hw would be
configured to watch from 0x1000 to 0x1007. If there is a 4 byte access
from 0x1002 to 0x1005, DAR will point to 0x1002 and thus interrupt
handler considers it as extraneous, but it's actually not, because
part of the access belongs to what user has asked.
Instead of blindly ignoring the exception, get actual address range by
analysing an instruction, and ignore only if actual range does not
overlap with user specified range.
Note: The behavior is unchanged for 8xx.
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191017093204.7511-5-ravi.bangoria@linux.ibm.com
Ravi Bangoria [Thu, 17 Oct 2019 09:32:00 +0000 (15:02 +0530)]
powerpc/watchpoint: Fix ptrace code that muck around with address/len
ptrace_set_debugreg() does not consider new length while overwriting
the watchpoint. Fix that. ppc_set_hwdebug() aligns watchpoint address
to doubleword boundary but does not change the length. If address
range is crossing doubleword boundary and length is less then 8, we
will lose samples from second doubleword. So fix that as well.
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191017093204.7511-4-ravi.bangoria@linux.ibm.com
Ravi Bangoria [Thu, 17 Oct 2019 09:31:59 +0000 (15:01 +0530)]
powerpc/watchpoint: Fix length calculation for unaligned target
Watchpoint match range is always doubleword(8 bytes) aligned on
powerpc. If the given range is crossing doubleword boundary, we need
to increase the length such that next doubleword also get
covered. Ex,
address len = 6 bytes
|=========.
|------------v--|------v--------|
| | | | | | | | | | | | | | | | |
|---------------|---------------|
<---8 bytes--->
In such case, current code configures hw as:
start_addr = address & ~HW_BREAKPOINT_ALIGN
len = 8 bytes
And thus read/write in last 4 bytes of the given range is ignored.
Fix this by including next doubleword in the length.
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191017093204.7511-3-ravi.bangoria@linux.ibm.com
Ravi Bangoria [Thu, 17 Oct 2019 09:31:58 +0000 (15:01 +0530)]
powerpc/watchpoint: Introduce macros for watchpoint length
We are hadrcoding length everywhere in the watchpoint code. Introduce
macros for the length and use them.
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191017093204.7511-2-ravi.bangoria@linux.ibm.com
Gustavo L. F. Walbon [Thu, 2 May 2019 21:09:07 +0000 (18:09 -0300)]
powerpc/security: Fix wrong message when RFI Flush is disable
The issue was showing "Mitigation" message via sysfs whatever the
state of "RFI Flush", but it should show "Vulnerable" when it is
disabled.
If you have "L1D private" feature enabled and not "RFI Flush" you are
vulnerable to meltdown attacks.
"RFI Flush" is the key feature to mitigate the meltdown whatever the
"L1D private" state.
SEC_FTR_L1D_THREAD_PRIV is a feature for Power9 only.
So the message should be as the truth table shows:
CPU | L1D private | RFI Flush | sysfs
----|-------------|-----------|-------------------------------------
P9 | False | False | Vulnerable
P9 | False | True | Mitigation: RFI Flush
P9 | True | False | Vulnerable: L1D private per thread
P9 | True | True | Mitigation: RFI Flush, L1D private per thread
P8 | False | False | Vulnerable
P8 | False | True | Mitigation: RFI Flush
Output before this fix:
# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Mitigation: RFI Flush, L1D private per thread
# echo 0 > /sys/kernel/debug/powerpc/rfi_flush
# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Mitigation: L1D private per thread
Output after fix:
# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Mitigation: RFI Flush, L1D private per thread
# echo 0 > /sys/kernel/debug/powerpc/rfi_flush
# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Vulnerable: L1D private per thread
Signed-off-by: Gustavo L. F. Walbon <gwalbon@linux.ibm.com>
Signed-off-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190502210907.42375-1-gwalbon@linux.ibm.com
Chris Smart [Sun, 3 Nov 2019 23:33:56 +0000 (23:33 +0000)]
powerpc/crypto: Add cond_resched() in crc-vpmsum self-test
The stress test for vpmsum implementations executes a long for loop in
the kernel. This blocks the scheduler, which prevents other tasks from
running, resulting in a warning.
This fix adds a call to cond_reshed() at the end of each loop, which
allows the scheduler to run other tasks as required.
Signed-off-by: Chris Smart <chris.smart@humanservices.gov.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191103233356.5472-1-chris.smart@humanservices.gov.au
David Hildenbrand [Thu, 31 Oct 2019 14:29:31 +0000 (15:29 +0100)]
powerpc/pseries/cmm: Simulation mode
Let's allow to test the implementation without needing HW support.
When "simulate=1" is specified when loading the module, we bypass all
HW checks and HW calls. The sysfs file "simulate_loan_target_kb" can
be used to simulate HW requests.
The simualtion mode can be activated using:
modprobe cmm debug=1 simulate=1
And the requested loan target can be changed using:
echo X > /sys/devices/system/cmm/cmm0/simulate_loan_target_kb
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191031142933.10779-11-david@redhat.com
David Hildenbrand [Thu, 31 Oct 2019 14:29:30 +0000 (15:29 +0100)]
powerpc/pseries/cmm: Switch to balloon_page_alloc()
balloon_page_alloc() will use GFP_HIGHUSER_MOVABLE in case we have
CONFIG_BALLOON_COMPACTION. This is now possible, as balloon pages are
movable with CONFIG_BALLOON_COMPACTION. Without
CONFIG_BALLOON_COMPACTION, GFP_HIGHUSER is used.
Note that apart from that, balloon_page_alloc() uses the following
flags:
__GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN
And current code used:
GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC
GFP_HIGHUSER/GFP_HIGHUSER_MOVABLE include
__GFP_RECLAIM | __GFP_IO | __GFP_FS | __GFP_HARDWALL | __GFP_HIGHMEM
GFP_NOIO is __GFP_RECLAIM.
With CONFIG_BALLOON_COMPACTION, we essentially add:
__GFP_IO | __GFP_FS | __GFP_HARDWALL | __GFP_HIGHMEM | __GFP_MOVABLE
Without CONFIG_BALLOON_COMPACTION, we essentially add:
__GFP_IO | __GFP_FS | __GFP_HARDWALL | __GFP_HIGHMEM
I assume this is fine, as this is what all other balloon compaction
users use. If it turns out to be a problem, we could add __GFP_MOVABLE
manually if we have CONFIG_BALLOON_COMPACTION.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191031142933.10779-10-david@redhat.com
David Hildenbrand [Thu, 31 Oct 2019 14:29:29 +0000 (15:29 +0100)]
powerpc/pseries/cmm: Implement balloon compaction
We can now get rid of the cmm_lock and completely rely on the balloon
compaction internals, which now also manage the page list and the
lock.
Inflated/"loaned" pages are now movable. Memory blocks that contain
such pages can get offlined. Also, all such pages will be marked
PageOffline() and can therefore be excluded in memory dumps using
recent versions of makedumpfile.
Don't switch to balloon_page_alloc() yet (due to the GFP_NOIO). Will
do that separately to discuss this change in detail.
Signed-off-by: David Hildenbrand <david@redhat.com>
[mpe: Add isolated_pages-- in cmm_migratepage() as suggested by David]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191031142933.10779-9-david@redhat.com
David Hildenbrand [Thu, 31 Oct 2019 14:29:28 +0000 (15:29 +0100)]
powerpc/pseries/cmm: Convert loaned_pages to an atomic_long_t
When switching to balloon compaction, we want to drop the cmm_lock and
completely rely on the balloon compaction list lock internally.
loaned_pages is currently protected under the cmm_lock.
Note: Right now cmm_alloc_pages() and cmm_free_pages() can be called
at the same time, e.g., via the thread and a concurrent OOM notifier.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191031142933.10779-8-david@redhat.com
David Hildenbrand [Thu, 31 Oct 2019 14:29:27 +0000 (15:29 +0100)]
powerpc/pseries/cmm: Rip out memory isolate notifier
The memory isolate notifier was added to allow to offline memory
blocks that contain inflated/"loaned" pages. We can achieve the same
using the balloon compaction framework.
Get rid of the memory isolate notifier. Also, we can get rid of
cmm_mem_going_offline(), as we will never reach that code path now
when we have allocated memory in the balloon (allocated pages are
unmovable and will no longer be special-cased using the memory
isolation notifier).
Leave the memory notifier in place, so we can still back off in case
memory gets offlined.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191031142933.10779-7-david@redhat.com
David Hildenbrand [Thu, 31 Oct 2019 14:29:26 +0000 (15:29 +0100)]
powerpc/pseries/cmm: Use adjust_managed_page_count() insted of totalram_pages_*
adjust_managed_page_count() performs a totalram_pages_add(), but also
adjusts the managed pages of the zone. Let's use that instead, similar
to virtio-balloon. Use it before freeing a page.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191031142933.10779-6-david@redhat.com
David Hildenbrand [Thu, 31 Oct 2019 14:29:25 +0000 (15:29 +0100)]
powerpc/pseries/cmm: Drop page array
We can simply store the pages in a list (page->lru), no need for a
separate data structure (+ complicated handling). This is how most
other balloon drivers store allocated pages without additional
tracking data.
For the notifiers, use page_to_pfn() to check if a page is in the
applicable range. Use page_to_phys() in plpar_page_set_loaned() and
plpar_page_set_active() (I assume due to the __pa() that's the right
thing to do).
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191031142933.10779-5-david@redhat.com
David Hildenbrand [Thu, 31 Oct 2019 14:29:24 +0000 (15:29 +0100)]
powerpc/pseries/cmm: Cleanup rc handling in cmm_init()
No need to initialize rc. Also, let's return 0 directly when
succeeding.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191031142933.10779-4-david@redhat.com
David Hildenbrand [Thu, 31 Oct 2019 14:29:23 +0000 (15:29 +0100)]
powerpc/pseries/cmm: Report errors when registering notifiers fails
If we don't set the rc, we will return "0", making it look like we
succeeded.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191031142933.10779-3-david@redhat.com
David Hildenbrand [Thu, 31 Oct 2019 14:29:22 +0000 (15:29 +0100)]
powerpc/pseries/cmm: Implement release() function for sysfs device
When unloading the module, one gets
------------[ cut here ]------------
Device 'cmm0' does not have a release() function, it is broken and must be fixed. See Documentation/kobject.txt.
WARNING: CPU: 0 PID: 19308 at drivers/base/core.c:1244 .device_release+0xcc/0xf0
...
We only have one static fake device. There is nothing to do when
releasing the device (via cmm_exit()).
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191031142933.10779-2-david@redhat.com
Tyrel Datwyler [Mon, 11 Nov 2019 05:21:37 +0000 (23:21 -0600)]
powerpc/pseries: Enable support for ibm,drc-info property
Advertise client support for the PAPR architected ibm,drc-info device
tree property during CAS handshake.
Fixes: c7a3275e0f9e ("powerpc/pseries: Revert support for ibm,drc-info devtree property")
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1573449697-5448-11-git-send-email-tyreld@linux.ibm.com
Tyrel Datwyler [Mon, 11 Nov 2019 05:21:36 +0000 (23:21 -0600)]
PCI: rpaphp: Correctly match ibm, my-drc-index to drc-name when using drc-info
The newer ibm,drc-info property is a condensed description of the old
ibm,drc-* properties (ie. names, types, indexes, and power-domains).
When matching a drc-index to a drc-name we need to verify that the
index is within the start and last drc-index range and map it to a
drc-name using the drc-name-prefix and logical index.
Fix the mapping by checking that the index is within the range of the
current drc-info entry, and build the name from the drc-name-prefix
concatenated with the starting drc-name-suffix value and the sequential
index obtained by subtracting ibm,my-drc-index from this entries
drc-start-index.
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1573449697-5448-10-git-send-email-tyreld@linux.ibm.com
Tyrel Datwyler [Mon, 11 Nov 2019 05:21:35 +0000 (23:21 -0600)]
PCI: rpaphp: Annotate and correctly byte swap DRC properties
The device tree is in big endian format and any properties directly
retrieved using OF helpers that don't explicitly byte swap should
be annotated. In particular there are several places where we grab
the opaque property value for the old ibm,drc-* properties and the
ibm,my-drc-index property.
Fix this for better static checking by annotating values we know to
explicitly big endian, and byte swap where appropriate.
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1573449697-5448-9-git-send-email-tyreld@linux.ibm.com
Tyrel Datwyler [Mon, 11 Nov 2019 05:21:33 +0000 (23:21 -0600)]
PCI: rpaphp: Add drc-info support for hotplug slot registration
Split physical PCI slot registration scanning into separate routines
that support the old ibm,drc-* properties and one that supports the
new compressed ibm,drc-info property.
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1573449697-5448-7-git-send-email-tyreld@linux.ibm.com
Tyrel Datwyler [Mon, 11 Nov 2019 05:21:32 +0000 (23:21 -0600)]
PCI: rpaphp: Don't rely on firmware feature to imply drc-info support
In the event that the partition is migrated to a platform with older
firmware that doesn't support the ibm,drc-info property the device
tree is modified to remove the ibm,drc-info property and replace it
with the older style ibm,drc-* properties for types, names, indexes,
and power-domains. One of the requirements of the drc-info firmware
feature is that the client is able to handle both the new property,
and old style properties at runtime. Therefore we can't rely on the
firmware feature alone to dictate which property is currently
present in the device tree.
Fix this short coming by checking explicitly for the ibm,drc-info
property, and falling back to the older ibm,drc-* properties if it
doesn't exist.
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1573449697-5448-6-git-send-email-tyreld@linux.ibm.com
Tyrel Datwyler [Mon, 11 Nov 2019 05:21:31 +0000 (23:21 -0600)]
PCI: rpaphp: Fix up pointer to first drc-info entry
The first entry of the ibm,drc-info property is an int encoded count
of the number of drc-info entries that follow. The "value" pointer
returned by of_prop_next_u32() is still pointing at the this value
when we call of_read_drc_info_cell(), but the helper function
expects that value to be pointing at the first element of an entry.
Fix up by incrementing the "value" pointer to point at the first
element of the first drc-info entry prior.
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1573449697-5448-5-git-send-email-tyreld@linux.ibm.com
Tyrel Datwyler [Mon, 11 Nov 2019 05:21:30 +0000 (23:21 -0600)]
powerpc/pseries: Add cpu DLPAR support for drc-info property
Older firmwares provided information about Dynamic Reconfig
Connectors (DRC) through several device tree properties, namely
ibm,drc-types, ibm,drc-indexes, ibm,drc-names, and
ibm,drc-power-domains. New firmwares have the ability to present this
same information in a much condensed format through a device tree
property called ibm,drc-info.
The existing cpu DLPAR hotplug code only understands the older DRC
property format when validating the drc-index of a cpu during a
hotplug add. This updates those code paths to use the ibm,drc-info
property, when present, instead for validation.
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1573449697-5448-4-git-send-email-tyreld@linux.ibm.com
Tyrel Datwyler [Mon, 11 Nov 2019 05:21:29 +0000 (23:21 -0600)]
powerpc/pseries: Fix drc-info mappings of logical cpus to drc-index
There are a couple subtle errors in the mapping between cpu-ids and a
cpus associated drc-index when using the new ibm,drc-info property.
The first is that while drc-info may have been a supported firmware
feature at boot it is possible we have migrated to a CEC with older
firmware that doesn't support the ibm,drc-info property. In that case
the device tree would have been updated after migration to remove the
ibm,drc-info property and replace it with the older style ibm,drc-*
properties for types, indexes, names, and power-domains. PAPR even
goes as far as dictating that if we advertise support for drc-info
that we are capable of supporting either property type at runtime.
The second is that the first value of the ibm,drc-info property is
the int encoded count of drc-info entries. As such "value" returned
by of_prop_next_u32() is pointing at that count, and not the first
element of the first drc-info entry as is expected by the
of_read_drc_info_cell() helper.
Fix the first by ignoring DRC-INFO firmware feature and instead
testing directly for ibm,drc-info, and then falling back to the
old style ibm,drc-indexes in the case it doesn't exit.
Fix the second by incrementing value to the next element prior to
parsing drc-info entries.
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1573449697-5448-3-git-send-email-tyreld@linux.ibm.com
Tyrel Datwyler [Mon, 11 Nov 2019 05:21:28 +0000 (23:21 -0600)]
powerpc/pseries: Fix bad drc_index_start value parsing of drc-info entry
The ibm,drc-info property is an array property that contains drc-info
entries such that each entry is made up of 2 string encoded elements
followed by 5 int encoded elements. The of_read_drc_info_cell()
helper contains comments that correctly name the expected elements
and their encoding. However, the usage of of_prop_next_string() and
of_prop_next_u32() introduced a subtle skippage of the first u32.
This is a result of of_prop_next_string() returning a pointer to the
next property value which is not a string, but actually a (__be32 *).
As, a result the following call to of_prop_next_u32() passes over the
current int encoded value and actually stores the next one wrongly.
Simply endian swap the current value in place after reading the first
two string values. The remaining int encoded values can then be read
correctly using of_prop_next_u32().
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1573449697-5448-2-git-send-email-tyreld@linux.ibm.com
Michael Ellerman [Wed, 13 Nov 2019 05:52:25 +0000 (16:52 +1100)]
Merge branch 'topic/secureboot' into next
Merge the secureboot support, as well as the IMA changes needed to
support it.
From Nayna's cover letter:
In order to verify the OS kernel on PowerNV systems, secure boot
requires X.509 certificates trusted by the platform. These are
stored in secure variables controlled by OPAL, called OPAL secure
variables. In order to enable users to manage the keys, the secure
variables need to be exposed to userspace.
OPAL provides the runtime services for the kernel to be able to
access the secure variables. This patchset defines the kernel
interface for the OPAL APIs. These APIs are used by the hooks, which
load these variables to the keyring and expose them to the userspace
for reading/writing.
Overall, this patchset adds the following support:
* expose secure variables to the kernel via OPAL Runtime API interface
* expose secure variables to the userspace via kernel sysfs interface
* load kernel verification and revocation keys to .platform and
.blacklist keyring respectively.
The secure variables can be read/written using simple linux
utilities cat/hexdump.
For example:
Path to the secure variables is: /sys/firmware/secvar/vars
Each secure variable is listed as directory.
$ ls -l
total 0
drwxr-xr-x. 2 root root 0 Aug 20 21:20 db
drwxr-xr-x. 2 root root 0 Aug 20 21:20 KEK
drwxr-xr-x. 2 root root 0 Aug 20 21:20 PK
The attributes of each of the secure variables are (for example: PK):
$ ls -l
total 0
-r--r--r--. 1 root root 4096 Oct 1 15:10 data
-r--r--r--. 1 root root 65536 Oct 1 15:10 size
--w-------. 1 root root 4096 Oct 1 15:12 update
The "data" is used to read the existing variable value using
hexdump. The data is stored in ESL format. The "update" is used to
write a new value using cat. The update is to be submitted as AUTH
file.
Nayna Jain [Mon, 11 Nov 2019 03:10:36 +0000 (21:10 -0600)]
powerpc: Load firmware trusted keys/hashes into kernel keyring
The keys used to verify the Host OS kernel are managed by firmware as
secure variables. This patch loads the verification keys into the
.platform keyring and revocation hashes into .blacklist keyring. This
enables verification and loading of the kernels signed by the boot
time keys which are trusted by firmware.
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Eric Richter <erichte@linux.ibm.com>
[mpe: Search by compatible in load_powerpc_certs(), not using format]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1573441836-3632-5-git-send-email-nayna@linux.ibm.com
Nayna Jain [Mon, 11 Nov 2019 03:10:35 +0000 (21:10 -0600)]
x86/efi: move common keyring handler functions to new file
The handlers to add the keys to the .platform keyring and blacklisted
hashes to the .blacklist keyring is common for both the uefi and powerpc
mechanisms of loading the keys/hashes from the firmware.
This patch moves the common code from load_uefi.c to keyring_handler.c
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Acked-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Eric Richter <erichte@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1573441836-3632-4-git-send-email-nayna@linux.ibm.com
Nayna Jain [Mon, 11 Nov 2019 03:10:34 +0000 (21:10 -0600)]
powerpc: expose secure variables to userspace via sysfs
PowerNV secure variables, which store the keys used for OS kernel
verification, are managed by the firmware. These secure variables need to
be accessed by the userspace for addition/deletion of the certificates.
This patch adds the sysfs interface to expose secure variables for PowerNV
secureboot. The users shall use this interface for manipulating
the keys stored in the secure variables.
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Eric Richter <erichte@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1573441836-3632-3-git-send-email-nayna@linux.ibm.com
Nayna Jain [Mon, 11 Nov 2019 03:10:33 +0000 (21:10 -0600)]
powerpc/powernv: Add OPAL API interface to access secure variable
The X.509 certificates trusted by the platform and required to secure
boot the OS kernel are wrapped in secure variables, which are
controlled by OPAL.
This patch adds firmware/kernel interface to read and write OPAL
secure variables based on the unique key.
This support can be enabled using CONFIG_OPAL_SECVAR.
Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com>
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Eric Richter <erichte@linux.ibm.com>
[mpe: Make secvar_ops __ro_after_init, only build opal-secvar.c if PPC_SECURE_BOOT=y]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1573441836-3632-2-git-send-email-nayna@linux.ibm.com
Nayna Jain [Tue, 1 Oct 2019 23:37:18 +0000 (19:37 -0400)]
sysfs: Fixes __BIN_ATTR_WO() macro
This patch fixes the size and write parameter for the macro
__BIN_ATTR_WO().
Fixes: 7f905761e15a8 ("sysfs: add BIN_ATTR_WO() macro")
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1569973038-2710-1-git-send-email-nayna@linux.ibm.com
Michael Ellerman [Tue, 12 Nov 2019 13:32:03 +0000 (00:32 +1100)]
Merge branch 'topic/ima' into topic/secureboot
From Nayna's cover letter:
The IMA subsystem supports custom, built-in, arch-specific policies
to define the files to be measured and appraised. These policies are
honored based on priority, where arch-specific policy is the highest
and custom is the lowest.
PowerNV systems use a Linux-based bootloader to kexec the OS. The
bootloader kernel relies on IMA for signature verification of the OS
kernel before doing the kexec. This patchset adds support for
powerpc arch-specific IMA policies that are conditionally defined
based on a system's secure boot and trusted boot states. The OS
secure boot and trusted boot states are determined via device-tree
properties.
The verification needs to be performed only for binaries that are
not blacklisted. The kernel currently only checks against the
blacklist of keys. However, doing so results in blacklisting all the
binaries that are signed by the same key. In order to prevent just
one particular binary from being loaded, it must be checked against
a blacklist of binary hashes. This patchset also adds support to IMA
for checking against a hash blacklist for files. signed by appended
signature.
Mimi Zohar [Thu, 31 Oct 2019 03:31:34 +0000 (23:31 -0400)]
powerpc/ima: Indicate kernel modules appended signatures are enforced
The arch specific kernel module policy rule requires kernel modules to
be signed, either as an IMA signature, stored as an xattr, or as an
appended signature. As a result, kernel modules appended signatures
could be enforced without "sig_enforce" being set or reflected in
/sys/module/module/parameters/sig_enforce. This patch sets
"sig_enforce".
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1572492694-6520-10-git-send-email-zohar@linux.ibm.com
Nayna Jain [Thu, 31 Oct 2019 03:31:33 +0000 (23:31 -0400)]
powerpc/ima: Update ima arch policy to check for blacklist
This patch updates the arch-specific policies for PowerNV system to
make sure that the binary hash is not blacklisted.
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1572492694-6520-9-git-send-email-zohar@linux.ibm.com
Nayna Jain [Thu, 31 Oct 2019 03:31:32 +0000 (23:31 -0400)]
ima: Check against blacklisted hashes for files with modsig
Asymmetric private keys are used to sign multiple files. The kernel
currently supports checking against blacklisted keys. However, if the
public key is blacklisted, any file signed by the blacklisted key will
automatically fail signature verification. Blacklisting the public key
is not fine enough granularity, as we might want to only blacklist a
particular file.
This patch adds support for checking against the blacklisted hash of
the file, without the appended signature, based on the IMA policy. It
defines a new policy option "appraise_flag=check_blacklist".
In addition to the blacklisted binary hashes stored in the firmware
"dbx" variable, the Linux kernel may be configured to load blacklisted
binary hashes onto the .blacklist keyring as well. The following
example shows how to blacklist a specific kernel module hash.
$ sha256sum kernel/kheaders.ko
77fa889b35a05338ec52e51591c1b89d4c8d1c99a21251d7c22b1a8642a6bad3
kernel/kheaders.ko
$ grep BLACKLIST .config
CONFIG_SYSTEM_BLACKLIST_KEYRING=y
CONFIG_SYSTEM_BLACKLIST_HASH_LIST="blacklist-hash-list"
$ cat certs/blacklist-hash-list
"bin:
77fa889b35a05338ec52e51591c1b89d4c8d1c99a21251d7c22b1a8642a6bad3"
Update the IMA custom measurement and appraisal policy
rules (/etc/ima-policy):
measure func=MODULE_CHECK template=ima-modsig
appraise func=MODULE_CHECK appraise_flag=check_blacklist
appraise_type=imasig|modsig
After building, installing, and rebooting the kernel:
545660333 ---lswrv 0 0 \_ blacklist:
bin:
77fa889b35a05338ec52e51591c1b89d4c8d1c99a21251d7c22b1a8642a6bad3
measure func=MODULE_CHECK template=ima-modsig
appraise func=MODULE_CHECK appraise_flag=check_blacklist
appraise_type=imasig|modsig
modprobe: ERROR: could not insert 'kheaders': Permission denied
10
0c9834db5a0182c1fb0cdc5d3adcf11a11fd83dd ima-sig
sha256:
3bc6ed4f0b4d6e31bc1dbc9ef844605abc7afdc6d81a57d77a1ec9407997c40
2 /usr/lib/modules/5.4.0-rc3+/kernel/kernel/kheaders.ko
10
82aad2bcc3fa8ed94762356b5c14838f3bcfa6a0 ima-modsig
sha256:
3bc6ed4f0b4d6e31bc1dbc9ef844605abc7afdc6d81a57d77a1ec9407997c40
2 /usr/lib/modules/5.4.0rc3+/kernel/kernel/kheaders.ko sha256:
77fa889b3
5a05338ec52e51591c1b89d4c8d1c99a21251d7c22b1a8642a6bad3
3082029a06092a864886f70d010702a082028b30820287020101310d300b0609608648
016503040201300b06092a864886f70d01070131820264....
10
25b72217cc1152b44b134ce2cd68f12dfb71acb3 ima-buf
sha256:
8b58427fedcf8f4b20bc8dc007f2e232bf7285d7b93a66476321f9c2a3aa132
b blacklisted-hash
77fa889b35a05338ec52e51591c1b89d4c8d1c99a21251d7c22b1a8642a6bad3
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
[zohar@linux.ibm.com: updated patch description]
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1572492694-6520-8-git-send-email-zohar@linux.ibm.com
Nayna Jain [Thu, 31 Oct 2019 03:31:31 +0000 (23:31 -0400)]
certs: Add wrapper function to check blacklisted binary hash
The -EKEYREJECTED error returned by existing is_hash_blacklisted() is
misleading when called for checking against blacklisted hash of a
binary.
This patch adds a wrapper function is_binary_blacklisted() to return
-EPERM error if binary is blacklisted.
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1572492694-6520-7-git-send-email-zohar@linux.ibm.com
Nayna Jain [Thu, 31 Oct 2019 03:31:30 +0000 (23:31 -0400)]
ima: Make process_buffer_measurement() generic
process_buffer_measurement() is limited to measuring the kexec boot
command line. This patch makes process_buffer_measurement() more
generic, allowing it to measure other types of buffer data (e.g.
blacklisted binary hashes or key hashes).
process_buffer_measurement() may be called directly from an IMA hook
or as an auxiliary measurement record. In both cases the buffer
measurement is based on policy. This patch modifies the function to
conditionally retrieve the policy defined PCR and template for the IMA
hook case.
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
[zohar@linux.ibm.com: added comment in process_buffer_measurement()]
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1572492694-6520-6-git-send-email-zohar@linux.ibm.com
Nayna Jain [Thu, 31 Oct 2019 03:31:29 +0000 (23:31 -0400)]
powerpc/ima: Define trusted boot policy
This patch defines an arch-specific trusted boot only policy and a
combined secure and trusted boot policy.
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1572492694-6520-5-git-send-email-zohar@linux.ibm.com
Nayna Jain [Tue, 5 Nov 2019 23:02:07 +0000 (17:02 -0600)]
powerpc: Detect the trusted boot state of the system
While secure boot permits only properly verified signed kernels to be
booted, trusted boot calculates the file hash of the kernel image and
stores the measurement prior to boot, that can be subsequently
compared against good known values via attestation services.
This patch reads the trusted boot state of a PowerNV system. The state
is used to conditionally enable additional measurement rules in the
IMA arch-specific policies.
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Eric Richter <erichte@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e9eeee6b-b9bf-1e41-2954-61dbd6fbfbcf@linux.ibm.com
Nayna Jain [Thu, 31 Oct 2019 03:31:27 +0000 (23:31 -0400)]
powerpc/ima: Add support to initialize ima policy rules
PowerNV systems use a Linux-based bootloader, which rely on the IMA
subsystem to enforce different secure boot modes. Since the
verification policy may differ based on the secure boot mode of the
system, the policies must be defined at runtime.
This patch implements arch-specific support to define IMA policy rules
based on the runtime secure boot mode of the system.
This patch provides arch-specific IMA policies if PPC_SECURE_BOOT
config is enabled.
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1572492694-6520-3-git-send-email-zohar@linux.ibm.com
Nayna Jain [Tue, 5 Nov 2019 23:00:22 +0000 (17:00 -0600)]
powerpc: Detect the secure boot mode of the system
This patch defines a function to detect the secure boot state of a
PowerNV system.
The PPC_SECURE_BOOT config represents the base enablement of secure
boot for powerpc.
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Eric Richter <erichte@linux.ibm.com>
[mpe: Fold in change from Nayna to add "ibm,secureboot" to ids]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/46b003b9-3225-6bf7-9101-ed6580bb748c@linux.ibm.com
Alastair D'Silva [Mon, 4 Nov 2019 02:32:58 +0000 (13:32 +1100)]
powerpc: Don't flush caches when adding memory
This operation takes a significant amount of time when hotplugging
large amounts of memory (~50 seconds with 890GB of persistent memory).
This was orignally in commit
fb5924fddf9e
("powerpc/mm: Flush cache on memory hot(un)plug") to support memtrace,
but the flush on add is not needed as it is flushed on remove.
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191104023305.9581-7-alastair@au1.ibm.com
Alastair D'Silva [Mon, 4 Nov 2019 02:32:57 +0000 (13:32 +1100)]
powerpc: Chunk calls to flush_dcache_range in arch_*_memory
When presented with large amounts of memory being hotplugged
(in my test case, ~890GB), the call to flush_dcache_range takes
a while (~50 seconds), triggering RCU stalls.
This patch breaks up the call into 1GB chunks, calling
cond_resched() inbetween to allow the scheduler to run.
Fixes: fb5924fddf9e ("powerpc/mm: Flush cache on memory hot(un)plug")
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191104023305.9581-6-alastair@au1.ibm.com
Alastair D'Silva [Mon, 4 Nov 2019 02:32:56 +0000 (13:32 +1100)]
powerpc: Convert flush_icache_range & friends to C
Similar to commit
22e9c88d486a
("powerpc/64: reuse PPC32 static inline flush_dcache_range()")
this patch converts the following ASM symbols to C:
flush_icache_range()
__flush_dcache_icache()
__flush_dcache_icache_phys()
This was done as we discovered a long-standing bug where the length of the
range was truncated due to using a 32 bit shift instead of a 64 bit one.
By converting these functions to C, it becomes easier to maintain.
flush_dcache_icache_phys() retains a critical assembler section as we must
ensure there are no memory accesses while the data MMU is disabled
(authored by Christophe Leroy). Since this has no external callers, it has
also been made static, allowing the compiler to inline it within
flush_dcache_icache_page().
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[mpe: Minor fixups, don't export __flush_dcache_icache()]
Link: https://lore.kernel.org/r/20191104023305.9581-5-alastair@au1.ibm.com
Alastair D'Silva [Mon, 4 Nov 2019 02:32:55 +0000 (13:32 +1100)]
powerpc: define helpers to get L1 icache sizes
This patch adds helpers to retrieve icache sizes, and renames the existing
helpers to make it clear that they are for dcache.
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191104023305.9581-4-alastair@au1.ibm.com
Alastair D'Silva [Mon, 4 Nov 2019 02:32:54 +0000 (13:32 +1100)]
powerpc: Allow 64bit VDSO __kernel_sync_dicache to work across ranges >4GB
When calling __kernel_sync_dicache with a size >4GB, we were masking
off the upper 32 bits, so we would incorrectly flush a range smaller
than intended.
This patch replaces the 32 bit shifts with 64 bit ones, so that
the full size is accounted for.
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191104023305.9581-3-alastair@au1.ibm.com
Alastair D'Silva [Mon, 4 Nov 2019 02:32:53 +0000 (13:32 +1100)]
powerpc: Allow flush_icache_range to work across ranges >4GB
When calling flush_icache_range with a size >4GB, we were masking
off the upper 32 bits, so we would incorrectly flush a range smaller
than intended.
This patch replaces the 32 bit shifts with 64 bit ones, so that
the full size is accounted for.
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191104023305.9581-2-alastair@au1.ibm.com
Chris Packham [Thu, 1 Aug 2019 22:50:06 +0000 (10:50 +1200)]
powerpc: Support CMDLINE_EXTEND
Bring powerpc in line with other architectures that support extending or
overriding the bootloader provided command line.
The current behaviour is most like CMDLINE_FROM_BOOTLOADER where the
bootloader command line is preferred but the kernel config can provide a
fallback so CMDLINE_FROM_BOOTLOADER is the default. CMDLINE_EXTEND can
be used to append the CMDLINE from the kernel config to the one provided
by the bootloader.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190801225006.21952-1-chris.packham@alliedtelesis.co.nz
Michael Ellerman [Wed, 6 Nov 2019 02:30:25 +0000 (13:30 +1100)]
powerpc/64s: Always disable branch profiling for prom_init.o
Otherwise the build fails because prom_init is calling symbols it's
not allowed to, eg:
Error: External symbol 'ftrace_likely_update' referenced from prom_init.c
make[3]: *** [arch/powerpc/kernel/Makefile:197: arch/powerpc/kernel/prom_init_check] Error 1
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191106051129.7626-1-mpe@ellerman.id.au
Rasmus Villemoes [Fri, 2 Nov 2018 21:17:06 +0000 (22:17 +0100)]
macintosh: ans-lcd: make anslcd_logo static and __initconst
This variable has no reason to have external linkage, and since it is
only used in an __init function, it might as well be made __initconst
also.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20181102211707.10229-1-linux@rasmusvillemoes.dk
Geert Uytterhoeven [Mon, 21 Oct 2019 14:23:09 +0000 (16:23 +0200)]
powerpc/security: Fix debugfs data leak on 32-bit
"powerpc_security_features" is "unsigned long", i.e. 32-bit or 64-bit,
depending on the platform (PPC_FSL_BOOK3E or PPC_BOOK3S_64). Hence
casting its address to "u64 *", and calling debugfs_create_x64() is
wrong, and leaks 32-bit of nearby data to userspace on 32-bit platforms.
While all currently defined SEC_FTR_* security feature flags fit in
32-bit, they all have "ULL" suffixes to make them 64-bit constants.
Hence fix the leak by changing the type of "powerpc_security_features"
(and the parameter types of its accessors) to "u64". This also allows
to drop the cast.
Fixes: 398af571128fe75f ("powerpc/security: Show powerpc_security_features in debugfs")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191021142309.28105-1-geert+renesas@glider.be
Aneesh Kumar K.V [Tue, 1 Oct 2019 08:46:56 +0000 (14:16 +0530)]
powerpc/book3s64/hash: Add cond_resched to avoid soft lockup warning
With large memory (8TB and more) hotplug, we can get soft lockup
warnings as below. These were caused by a long loop without any
explicit cond_resched which is a problem for !PREEMPT kernels.
Avoid this using cond_resched() while inserting hash page table
entries. We already do similar cond_resched() in __add_pages(), see
commit
f64ac5e6e306 ("mm, memory_hotplug: add scheduling point to
__add_pages").
rcu: 3-....: (24002 ticks this GP) idle=13e/1/0x4000000000000002 softirq=722/722 fqs=12001
(t=24003 jiffies g=4285 q=2002)
NMI backtrace for cpu 3
CPU: 3 PID: 3870 Comm: ndctl Not tainted 5.3.0-197.18-default+ #2
Call Trace:
dump_stack+0xb0/0xf4 (unreliable)
nmi_cpu_backtrace+0x124/0x130
nmi_trigger_cpumask_backtrace+0x1ac/0x1f0
arch_trigger_cpumask_backtrace+0x28/0x3c
rcu_dump_cpu_stacks+0xf8/0x154
rcu_sched_clock_irq+0x878/0xb40
update_process_times+0x48/0x90
tick_sched_handle.isra.16+0x4c/0x80
tick_sched_timer+0x68/0xe0
__hrtimer_run_queues+0x180/0x430
hrtimer_interrupt+0x110/0x300
timer_interrupt+0x108/0x2f0
decrementer_common+0x114/0x120
--- interrupt: 901 at arch_add_memory+0xc0/0x130
LR = arch_add_memory+0x74/0x130
memremap_pages+0x494/0x650
devm_memremap_pages+0x3c/0xa0
pmem_attach_disk+0x188/0x750
nvdimm_bus_probe+0xac/0x2c0
really_probe+0x148/0x570
driver_probe_device+0x19c/0x1d0
device_driver_attach+0xcc/0x100
bind_store+0x134/0x1c0
drv_attr_store+0x44/0x60
sysfs_kf_write+0x64/0x90
kernfs_fop_write+0x1a0/0x270
__vfs_write+0x3c/0x70
vfs_write+0xd0/0x260
ksys_write+0xdc/0x130
system_call+0x5c/0x68
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191001084656.31277-1-aneesh.kumar@linux.ibm.com
Aneesh Kumar K.V [Thu, 24 Oct 2019 07:58:01 +0000 (13:28 +0530)]
powerpc/mm/book3s64/radix: Flush the full mm even when need_flush_all is set
With the previous patch, we should now not be using need_flush_all for
powerpc. But then make sure we force a PID tlbie flush with RIC=2 if
we ever find need_flush_all set. Also don't reset it after a mmu
gather flush.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191024075801.22434-3-aneesh.kumar@linux.ibm.com
Aneesh Kumar K.V [Thu, 24 Oct 2019 07:58:00 +0000 (13:28 +0530)]
powerpc/mm/book3s64/radix: Use freed_tables instead of need_flush_all
With commit
22a61c3c4f13 ("asm-generic/tlb: Track freeing of
page-table directories in struct mmu_gather") we now track whether we
freed page table in mmu_gather. Use that to decide whether to flush
Page Walk Cache.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191024075801.22434-2-aneesh.kumar@linux.ibm.com
Aneesh Kumar K.V [Thu, 24 Oct 2019 07:57:59 +0000 (13:27 +0530)]
powerpc/mm/book3s64/radix: Remove unused code.
mm_tlb_flush_nested change was added in the mmu gather tlb flush to
handle the case of parallel pte invalidate happening with mmap_sem
held in read mode. This fix was done by commit
02390f66bd23 ("powerpc/64s/radix: Fix MADV_[FREE|DONTNEED] TLB flush
miss problem with THP") and the problem is explained in detail in
commit
99baac21e458 ("mm: fix MADV_[FREE|DONTNEED] TLB flush miss
problem")
This was later updated by commit
7a30df49f63a ("mm: mmu_gather: remove
__tlb_reset_range() for force flush") to do a full mm flush rather
than a range flush. By commit
dd2283f2605e ("mm: mmap: zap pages with
read mmap_sem in munmap") we are also now allowing a page table free
in mmap_sem read mode which means we should do a PWC flush too. Our
current full mm flush imply a PWC flush.
With all the above change the mm_tlb_flush_nested(mm) branch in
radix__tlb_flush will never be taken because for the nested case we
would have taken the if (tlb->fullmm) branch. This patch removes the
unused code. Also, remove the gflush change in
__radix__flush_tlb_range that was added to handle the range tlb flush
code. We only check for THP there because hugetlb is flushed via a
different code path where page size is explicitly specified.
This is a partial revert of commit
02390f66bd23 ("powerpc/64s/radix:
Fix MADV_[FREE|DONTNEED] TLB flush miss problem with THP")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191024075801.22434-1-aneesh.kumar@linux.ibm.com
Anthony Steinhauser [Tue, 29 Oct 2019 19:07:59 +0000 (12:07 -0700)]
powerpc/security/book3s64: Report L1TF status in sysfs
Some PowerPC CPUs are vulnerable to L1TF to the same extent as to
Meltdown. It is also mitigated by flushing the L1D on privilege
transition.
Currently the sysfs gives a false negative on L1TF on CPUs that I
verified to be vulnerable, a Power9 Talos II Boston 004e 1202, PowerNV
T2P9D01.
Signed-off-by: Anthony Steinhauser <asteinhauser@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[mpe: Just have cpu_show_l1tf() call cpu_show_meltdown() directly]
Link: https://lore.kernel.org/r/20191029190759.84821-1-asteinhauser@google.com
Nathan Lynch [Wed, 16 Oct 2019 18:36:11 +0000 (13:36 -0500)]
powerpc/pseries: safely roll back failed DLPAR cpu add
dlpar_online_cpu() attempts to online all threads of a core that has
been added to an LPAR. If onlining a non-primary thread
fails (e.g. due to an allocation failure), the core is left with at
least one thread online. dlpar_cpu_add() attempts to roll back the
whole operation, releasing the core back to the platform. However,
since some threads of the core being removed are still online, the
BUG_ON(cpu_online(cpu)) in pseries_remove_processor() strikes:
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 3 PID: 8587 Comm: drmgr Not tainted
5.3.0-rc2-00190-g9b123d1ea237-dirty #46
NIP:
c0000000000eeb2c LR:
c0000000000eeac4 CTR:
c0000000000ee9e0
REGS:
c0000001f745b6c0 TRAP: 0700 Not tainted (
5.3.0-rc2-00190-g9b123d1ea237-dirty)
MSR:
800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR:
44002448 XER:
00000000
CFAR:
c00000000195d718 IRQMASK: 0
GPR00:
c0000000000eeac4 c0000001f745b950 c0000000032f6200 0000000000000008
GPR04:
0000000000000008 c000000003349c78 0000000000000040 00000000000001ff
GPR08:
0000000000000008 0000000000000000 0000000000000001 0007ffffffffffff
GPR12:
0000000084002844 c00000001ecacb80 0000000000000000 0000000000000000
GPR16:
0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20:
0000000000000000 0000000000000000 0000000000000000 0000000000000008
GPR24:
c000000003349ee0 c00000000334a2e4 c0000000fca4d7a8 c000000001d20048
GPR28:
0000000000000001 ffffffffffffffff ffffffffffffffff c0000000fca4d7c4
NIP [
c0000000000eeb2c] pseries_smp_notifier+0x14c/0x2e0
LR [
c0000000000eeac4] pseries_smp_notifier+0xe4/0x2e0
Call Trace:
[
c0000001f745b950] [
c0000000000eeac4] pseries_smp_notifier+0xe4/0x2e0 (unreliable)
[
c0000001f745ba10] [
c0000000001ac774] notifier_call_chain+0xb4/0x190
[
c0000001f745bab0] [
c0000000001ad62c] blocking_notifier_call_chain+0x7c/0xb0
[
c0000001f745baf0] [
c00000000167bda0] of_detach_node+0xc0/0x110
[
c0000001f745bb50] [
c0000000000e7ae4] dlpar_detach_node+0x64/0xa0
[
c0000001f745bb80] [
c0000000000edefc] dlpar_cpu_add+0x31c/0x360
[
c0000001f745bc10] [
c0000000000ee980] dlpar_cpu_probe+0x50/0xb0
[
c0000001f745bc50] [
c00000000002cf70] arch_cpu_probe+0x40/0x70
[
c0000001f745bc70] [
c000000000ccd808] cpu_probe_store+0x48/0x80
[
c0000001f745bcb0] [
c000000000cbcef8] dev_attr_store+0x38/0x60
[
c0000001f745bcd0] [
c00000000059c980] sysfs_kf_write+0x70/0xb0
[
c0000001f745bd10] [
c00000000059afb8] kernfs_fop_write+0xf8/0x280
[
c0000001f745bd60] [
c0000000004b437c] __vfs_write+0x3c/0x70
[
c0000001f745bd80] [
c0000000004b8710] vfs_write+0xd0/0x220
[
c0000001f745bdd0] [
c0000000004b8acc] ksys_write+0x7c/0x140
[
c0000001f745be20] [
c00000000000bbd8] system_call+0x5c/0x68
Move dlpar_offline_cpu() up in the file so that dlpar_online_cpu() can
use it to re-offline any threads that have been onlined when an error
is encountered.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: e666ae0b10aa ("powerpc/pseries: Update CPU hotplug error recovery")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191016183611.10867-3-nathanl@linux.ibm.com
Nathan Lynch [Wed, 16 Oct 2019 18:36:10 +0000 (13:36 -0500)]
powerpc/pseries: address checkpatch warnings in dlpar_offline_cpu
Remove some stray blank lines, convert a printk to pr_warn, and
address a line length violation.
One functional change: use WARN_ON instead of BUG_ON in case H_PROD of
a ceded thread yields an unexpected result from the platform. We can
expect this code path to get uninterruptibly stuck in __cpu_die() if
this happens, but that's more desirable than crashing.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: b6db63d1a7f0 ("pseries/pseries: Add code to online/offline CPUs of a DLPAR node")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191016183611.10867-2-nathanl@linux.ibm.com
Michael Ellerman [Mon, 4 Nov 2019 23:15:56 +0000 (10:15 +1100)]
selftests/powerpc: Skip tm-signal-sigreturn-nt if TM not available
On systems where TM (Transactional Memory) is disabled the
tm-signal-sigreturn-nt test causes a SIGILL:
test: tm_signal_sigreturn_nt
tags: git_version:
7c202575ef63
!! child died by signal 4
failure: tm_signal_sigreturn_nt
We should skip the test if TM is not available.
Fixes: 34642d70ac7e ("selftests/powerpc: Add checks for transactional sigreturn")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191104233524.24348-1-mpe@ellerman.id.au
Michael Ellerman [Mon, 4 Nov 2019 10:01:59 +0000 (21:01 +1100)]
Merge branch 'fixes' into next
Merge our fixes branch, primarily to bring in the powernv CPU hotplug
warning fix.
Michael Ellerman [Thu, 24 Oct 2019 00:47:30 +0000 (11:47 +1100)]
powerpc/tools: Don't quote $objdump in scripts
Some of our scripts are passed $objdump and then call it as
"$objdump". This doesn't work if it contains spaces because we're
using ccache, for example you get errors such as:
./arch/powerpc/tools/relocs_check.sh: line 48: ccache ppc64le-objdump: No such file or directory
./arch/powerpc/tools/unrel_branch_check.sh: line 26: ccache ppc64le-objdump: No such file or directory
Fix it by not quoting the string when we expand it, allowing the shell
to do the right thing for us.
Fixes: a71aa05e1416 ("powerpc: Convert relocs_check to a shell script using grep")
Fixes: 4ea80652dc75 ("powerpc/64s: Tool to flag direct branches from unrelocated interrupt vectors")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191024004730.32135-1-mpe@ellerman.id.au
Michael Ellerman [Wed, 30 Oct 2019 11:12:31 +0000 (22:12 +1100)]
powerpc: Add build-time check of ptrace PT_xx defines
As part of the uapi we export a lot of PT_xx defines for each register
in struct pt_regs. These are expressed as an index from gpr[0], in
units of unsigned long.
Currently there's nothing tying the values of those defines to the
actual layout of the struct.
But we *don't* want to change the uapi defines to derive the PT_xx
values based on the layout of the struct, those values are ABI and
must never change.
Instead we want to do the reverse, make sure that the layout of the
struct never changes vs the PT_xx defines. So add build time checks of
that.
This probably seems paranoid, but at least once in the past someone
has sent a patch that would have broken the ABI if it hadn't been
spotted. Although it probably would have been detected via testing,
it's preferable to just quash any issues at the source.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191030111231.22720-1-mpe@ellerman.id.au
Mathieu Malaterre [Sat, 8 Dec 2018 15:46:23 +0000 (16:46 +0100)]
powerpc/ptrace: Add prototype for function pt_regs_check
`pt_regs_check` is a dummy function, its purpose is to break the build
if struct pt_regs and struct user_pt_regs don't match.
This function has no functionnal purpose, and will get eliminated at
link time or after init depending on CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
This commit adds a prototype to fix warning at W=1:
arch/powerpc/kernel/ptrace.c:3339:13: error: no previous prototype for ‘pt_regs_check’ [-Werror=missing-prototypes]
Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20181208154624.6504-1-malat@debian.org
Michael Ellerman [Mon, 20 May 2019 10:55:20 +0000 (20:55 +1000)]
selftests/powerpc: Add a test of spectre_v2 mitigations
This test uses the PMU to count branch prediction hits/misses for a
known loop, and compare the result to the reported spectre v2
mitigation.
This gives us a way of sanity checking that the reported mitigation is
actually in effect.
Sample output for some cases, eg:
Power9:
sysfs reports: 'Vulnerable'
PM_BR_PRED_CCACHE: result 368 running/enabled
5792777124
PM_BR_MPRED_CCACHE: result 319 running/enabled
5792775546
PM_BR_PRED_PCACHE: result
2147483281 running/enabled
5792773128
PM_BR_MPRED_PCACHE: result
213604201 running/enabled
5792771640
Miss percent 9 %
OK - Measured branch prediction rates match reported spectre v2 mitigation.
sysfs reports: 'Mitigation: Indirect branch serialisation (kernel only)'
PM_BR_PRED_CCACHE: result 895 running/enabled
5780320920
PM_BR_MPRED_CCACHE: result 822 running/enabled
5780312414
PM_BR_PRED_PCACHE: result
2147482754 running/enabled
5780308836
PM_BR_MPRED_PCACHE: result
213639731 running/enabled
5780307912
Miss percent 9 %
OK - Measured branch prediction rates match reported spectre v2 mitigation.
sysfs reports: 'Mitigation: Indirect branch cache disabled'
PM_BR_PRED_CCACHE: result
2147483649 running/enabled
20540186160
PM_BR_MPRED_CCACHE: result
2147483649 running/enabled
20540180056
PM_BR_PRED_PCACHE: result 0 running/enabled
20540176090
PM_BR_MPRED_PCACHE: result 0 running/enabled
20540174182
Miss percent 100 %
OK - Measured branch prediction rates match reported spectre v2 mitigation.
Power8:
sysfs reports: 'Vulnerable'
PM_BR_PRED_CCACHE: result
2147483649 running/enabled
3505888142
PM_BR_MPRED_CCACHE: result 9 running/enabled
3505882788
Miss percent 0 %
OK - Measured branch prediction rates match reported spectre v2 mitigation.
sysfs reports: 'Mitigation: Indirect branch cache disabled'
PM_BR_PRED_CCACHE: result
2147483649 running/enabled
16931421988
PM_BR_MPRED_CCACHE: result
2147483649 running/enabled
16931416478
Miss percent 100 %
OK - Measured branch prediction rates match reported spectre v2 mitigation.
success: spectre_v2
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190520105520.22274-1-mpe@ellerman.id.au
Nicholas Piggin [Tue, 22 Oct 2019 11:58:14 +0000 (21:58 +1000)]
powerpc/powernv: Fix CPU idle to be called with IRQs disabled
Commit
e78a7614f3876 ("idle: Prevent late-arriving interrupts from
disrupting offline") changes arch_cpu_idle_dead to be called with
interrupts disabled, which triggers the WARN in pnv_smp_cpu_kill_self.
Fix this by fixing up irq_happened after hard disabling, rather than
requiring there are no pending interrupts, similarly to what was done
done until commit
2525db04d1cc5 ("powerpc/powernv: Simplify lazy IRQ
handling in CPU offline").
Fixes: e78a7614f3876 ("idle: Prevent late-arriving interrupts from disrupting offline")
Reported-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add unexpected_mask rather than checking for known bad values,
change the WARN_ON() to a WARN_ON_ONCE()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191022115814.22456-1-npiggin@gmail.com
Michael Ellerman [Mon, 14 Oct 2019 02:30:43 +0000 (13:30 +1100)]
selftests/powerpc: Fixup clobbers for TM tests
Some of our TM (Transactional Memory) tests, list "r1" (the stack
pointer) as a clobbered register.
GCC >= 9 doesn't accept this, and the build breaks:
ptrace-tm-spd-tar.c: In function 'tm_spd_tar':
ptrace-tm-spd-tar.c:31:2: error: listing the stack pointer register 'r1' in a clobber list is deprecated [-Werror=deprecated]
31 | asm __volatile__(
| ^~~
ptrace-tm-spd-tar.c:31:2: note: the value of the stack pointer after an 'asm' statement must be the same as it was before the statement
We do have some fairly large inline asm blocks in these tests, and
some of them do change the value of r1. However they should all return
to C with the value in r1 restored, so I think it's legitimate to say
r1 is not clobbered.
As Segher points out, the r1 clobbers may have been added because of
the use of `or 1,1,1`, however that doesn't actually clobber r1.
Segher also points out that some of these tests do clobber LR, because
they call functions, and that is not listed in the clobbers, so add
that where appropriate.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191029095324.14669-1-mpe@ellerman.id.au
Thiago Jung Bauermann [Wed, 11 Sep 2019 16:34:33 +0000 (13:34 -0300)]
powerpc/prom_init: Undo relocation before entering secure mode
The ultravisor will do an integrity check of the kernel image but we
relocated it so the check will fail. Restore the original image by
relocating it back to the kernel virtual base address.
This works because during build vmlinux is linked with an expected
virtual runtime address of KERNELBASE.
Fixes: 6a9c930bd775 ("powerpc/prom_init: Add the ESM call to prom_init")
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Tested-by: Michael Anderson <andmike@linux.ibm.com>
[mpe: Add IS_ENABLED() to fix the CONFIG_RELOCATABLE=n build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190911163433.12822-1-bauerman@linux.ibm.com
Aneesh Kumar K.V [Thu, 24 Oct 2019 09:35:42 +0000 (15:05 +0530)]
powerpc/book3s64/hash: Use secondary hash for bolted mapping if the primary is full
With bolted hash page table entry, kernel currently only use primary hash group
when inserting the hash page table entry. In the rare case where kernel find all the
8 primary hash slot occupied by bolted entries, this can result in hash page
table insert failure for bolted entries. Avoid this by using the secondary hash
group.
This is different from what kernel does for the non-bolted mapping. With
non-bolted entries kernel will try secondary before removing an existing entry
from hash page table group. With bolted prefer primary hash group and hence
try to insert the page table entry by removing a slot from primary before trying
the secondary hash group.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191024093542.29777-3-aneesh.kumar@linux.ibm.com
Aneesh Kumar K.V [Thu, 24 Oct 2019 09:35:41 +0000 (15:05 +0530)]
powerpc/pseries: Don't fail hash page table insert for bolted mapping
If the hypervisor returned H_PTEG_FULL for H_ENTER hcall, retry a hash page table
insert by removing a random entry from the group.
After some runtime, it is very well possible to find all the 8 hash page table
entry slot in the hpte group used for mapping. Don't fail a bolted entry insert
in that case. With Storage class memory a user can find this error easily since
a namespace enable/disable is equivalent to memory add/remove.
This results in failures as reported below:
$ ndctl create-namespace -r region1 -t pmem -m devdax -a 65536 -s 100M
libndctl: ndctl_dax_enable: dax1.3: failed to enable
Error: namespace1.2: failed to enable
failed to create namespace: No such device or address
In kernel log we find the details as below:
Unable to create mapping for hot added memory 0xc000042006000000..0xc00004200d000000: -1
dax_pmem: probe of dax1.3 failed with error -14
This indicates that we failed to create a bolted hash table entry for direct-map
address backing the namespace.
We also observe failures such that not all namespaces will be enabled with
ndctl enable-namespace all command.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191024093542.29777-2-aneesh.kumar@linux.ibm.com
Aneesh Kumar K.V [Thu, 24 Oct 2019 09:35:40 +0000 (15:05 +0530)]
powerpc/pseries: Don't opencode HPTE_V_BOLTED
No functional change in this patch.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191024093542.29777-1-aneesh.kumar@linux.ibm.com
Michael Ellerman [Sun, 13 Oct 2019 10:23:51 +0000 (21:23 +1100)]
powerpc/pseries: Mark accumulate_stolen_time() as notrace
accumulate_stolen_time() is called prior to interrupt state being
reconciled, which can trip the warning in arch_local_irq_restore():
WARNING: CPU: 5 PID: 1017 at arch/powerpc/kernel/irq.c:258 .arch_local_irq_restore+0x9c/0x130
...
NIP .arch_local_irq_restore+0x9c/0x130
LR .rb_start_commit+0x38/0x80
Call Trace:
.ring_buffer_lock_reserve+0xe4/0x620
.trace_function+0x44/0x210
.function_trace_call+0x148/0x170
.ftrace_ops_no_ops+0x180/0x1d0
ftrace_call+0x4/0x8
.accumulate_stolen_time+0x1c/0xb0
decrementer_common+0x124/0x160
For now just mark it as notrace. We may change the ordering to call it
after interrupt state has been reconciled, but that is a larger
change.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191024055932.27940-1-mpe@ellerman.id.au
Michael Ellerman [Tue, 28 May 2019 08:16:14 +0000 (18:16 +1000)]
powerpc/configs: Rename foo_basic_defconfig to foo_base.config
We have several "defconfigs" that are not actually full defconfigs
they are just a base set of options which are then merged with other
fragments to produce a working defconfig.
The most obvious example is corenet_basic_defconfig which only
contains one symbol CONFIG_CORENET_GENERIC=y. And in fact if you build
it as a "defconfig" that one symbol ends up undefined, because its
prerequisites are missing.
There is also mpc85xx_base_defconfig which doesn't actually enable
CONFIG_PPC_85xx.
To avoid confusion, rename these config fragments to "foo_base.config"
to make it clearer that they are not full defconfigs and are instaed
just fragments that are used to generate real defconfigs.
Reported-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190528081614.26096-1-mpe@ellerman.id.au
Andrew Donnellan [Thu, 1 Aug 2019 04:58:55 +0000 (14:58 +1000)]
powerpc/configs: Add debug config fragment
Add a debug config fragment that we can use to put useful debug
options into.
It can be used like:
# make foo_defconfig
# make debug.config
Currently the only option included is to enable debugfs SCOM access.
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
[mpe: Drop the special targets, just use the fragment directly]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190801045855.5822-1-ajd@linux.ibm.com
Aneesh Kumar K.V [Tue, 17 Sep 2019 12:38:51 +0000 (18:08 +0530)]
powerpc/nvdimm: Update vmemmap_populated to check sub-section range
With commit:
7cc7867fb061 ("mm/devm_memremap_pages: enable sub-section remap")
pmem namespaces are remapped in 2M chunks. On architectures like ppc64 we
can map the memmap area using 16MB hugepage size and that can cover
a memory range of 16G.
While enabling new pmem namespaces, since memory is added in sub-section chunks,
before creating a new memmap mapping, kernel should check whether there is an
existing memmap mapping covering the new pmem namespace. Currently, this is
validated by checking whether the section covering the range is already
initialized or not. Considering there can be multiple namespaces in the same
section this can result in wrong validation. Update this to check for
sub-sections in the range. This is done by checking for all pfns in the range we
are mapping.
We could optimize this by checking only just one pfn in each sub-section. But
since this is not fast-path we keep this simple.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190917123851.22553-1-aneesh.kumar@linux.ibm.com
Christopher M. Riedl [Sat, 7 Sep 2019 06:11:24 +0000 (01:11 -0500)]
powerpc/xmon: Restrict when kernel is locked down
Xmon should be either fully or partially disabled depending on the
kernel lockdown state.
Put xmon into read-only mode for lockdown=integrity and prevent user
entry into xmon when lockdown=confidentiality. Xmon checks the lockdown
state on every attempted entry:
(1) during early xmon'ing
(2) when triggered via sysrq
(3) when toggled via debugfs
(4) when triggered via a previously enabled breakpoint
The following lockdown state transitions are handled:
(1) lockdown=none -> lockdown=integrity
set xmon read-only mode
(2) lockdown=none -> lockdown=confidentiality
clear all breakpoints, set xmon read-only mode,
prevent user re-entry into xmon
(3) lockdown=integrity -> lockdown=confidentiality
clear all breakpoints, set xmon read-only mode,
prevent user re-entry into xmon
Suggested-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190907061124.1947-3-cmr@informatik.wtf
Christopher M. Riedl [Sat, 7 Sep 2019 06:11:23 +0000 (01:11 -0500)]
powerpc/xmon: Allow listing and clearing breakpoints in read-only mode
Read-only mode should not prevent listing and clearing any active
breakpoints.
Tested-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Christopher M. Riedl <cmr@informatik.wtf>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190907061124.1947-2-cmr@informatik.wtf
Frederic Barrat [Wed, 16 Oct 2019 16:28:33 +0000 (18:28 +0200)]
powerpc/powernv/eeh: Fix oops when probing cxl devices
Recent cleanup in the way EEH support is added to a device causes a
kernel oops when the cxl driver probes a device and creates virtual
devices discovered on the FPGA:
BUG: Kernel NULL pointer dereference at 0x000000a0
Faulting instruction address: 0xc000000000048070
Oops: Kernel access of bad area, sig: 7 [#1]
...
NIP eeh_add_device_late.part.9+0x50/0x1e0
LR eeh_add_device_late.part.9+0x3c/0x1e0
Call Trace:
_dev_info+0x5c/0x6c (unreliable)
pnv_pcibios_bus_add_device+0x60/0xb0
pcibios_bus_add_device+0x40/0x60
pci_bus_add_device+0x30/0x100
pci_bus_add_devices+0x64/0xd0
cxl_pci_vphb_add+0xe0/0x130 [cxl]
cxl_probe+0x504/0x5b0 [cxl]
local_pci_probe+0x6c/0x110
work_for_cpu_fn+0x38/0x60
The root cause is that those cxl virtual devices don't have a
representation in the device tree and therefore no associated pci_dn
structure. In eeh_add_device_late(), pdn is NULL, so edev is NULL and
we oops.
We never had explicit support for EEH for those virtual devices.
Instead, EEH events are reported to the (real) pci device and handled
by the cxl driver. Which can then forward to the virtual devices and
handle dependencies. The fact that we try adding EEH support for the
virtual devices is new and a side-effect of the recent cleanup.
This patch fixes it by skipping adding EEH support on powernv for
devices which don't have a pci_dn structure.
The cxl driver doesn't create virtual devices on pseries so this patch
doesn't fix it there intentionally.
Fixes: b905f8cdca77 ("powerpc/eeh: EEH for pSeries hot plug")
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191016162833.22509-1-fbarrat@linux.ibm.com
Michael Ellerman [Sun, 13 Oct 2019 23:26:34 +0000 (10:26 +1100)]
selftests/powerpc: Reduce sigfuz runtime to ~60s
The defaults for the sigfuz test is to run for 4000 iterations, but
that can take quite a while and the test harness may kill the test.
Reduce the number of iterations to 600, which gives a runtime of
roughly 1 minute on a Power8 system.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191013234643.3430-1-mpe@ellerman.id.au
Christophe Leroy [Mon, 14 Oct 2019 16:51:28 +0000 (16:51 +0000)]
powerpc/32s: fix allow/prevent_user_access() when crossing segment boundaries.
Make sure starting addr is aligned to segment boundary so that when
incrementing the segment, the starting address of the new segment is
below the end address. Otherwise the last segment might get missed.
Fixes: a68c31fc01ef ("powerpc/32s: Implement Kernel Userspace Access Protection")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/067a1b09f15f421d40797c2d04c22d4049a1cee8.1571071875.git.christophe.leroy@c-s.fr
Michael Ellerman [Sun, 13 Oct 2019 10:21:06 +0000 (21:21 +1100)]
Merge branch 'fixes' into next
Merge our fixes branch, to bring in the fixes for the KVM PCR bug and
the spufs crash.
Deb McLemore [Mon, 21 May 2018 02:04:38 +0000 (21:04 -0500)]
powerpc/powernv: Add queue mechanism for early messages
When issuing a BMC soft poweroff during IPL, the poweroff can be lost
so the machine would not poweroff.
This is because opal messages can be received before the opal-power
code registered its notifiers.
Fix it by buffering messages. If we receive a message and do not yet
have a handler for that type, store the message and replay when a
handler for that type is registered.
Signed-off-by: Deb McLemore <debmc@linux.vnet.ibm.com>
[mpe: Single unlock path in opal_message_notifier_register(), tweak
comments/formatting and change log.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1526868278-4204-1-git-send-email-debmc@linux.vnet.ibm.com
Qian Cai [Tue, 17 Sep 2019 15:22:30 +0000 (11:22 -0400)]
powerpc/pkeys: remove unused pkey_allows_readwrite
pkey_allows_readwrite() was first introduced in the commit
5586cf61e108
("powerpc: introduce execute-only pkey"), but the usage was removed
entirely in the commit
a4fcc877d4e1 ("powerpc/pkeys: Preallocate
execute-only key").
Found by the "-Wunused-function" compiler warning flag.
Fixes: a4fcc877d4e1 ("powerpc/pkeys: Preallocate execute-only key")
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1568733750-14580-1-git-send-email-cai@lca.pw
Qian Cai [Mon, 15 Jul 2019 18:32:32 +0000 (14:32 -0400)]
powerpc/setup_64: fix -Wempty-body warnings
At the beginning of setup_64.c, it has,
#ifdef DEBUG
#define DBG(fmt...) udbg_printf(fmt)
#else
#define DBG(fmt...)
#endif
where DBG() could be compiled away, and generate warnings,
arch/powerpc/kernel/setup_64.c: In function 'initialize_cache_info':
arch/powerpc/kernel/setup_64.c:579:49: warning: suggest braces around
empty body in an 'if' statement [-Wempty-body]
DBG("Argh, can't find dcache properties !\n");
^
arch/powerpc/kernel/setup_64.c:582:49: warning: suggest braces around
empty body in an 'if' statement [-Wempty-body]
DBG("Argh, can't find icache properties !\n");
Fix it by using the suggestions from Michael:
"Neither of those sites should use DBG(), that's not really early
boot code, they should just use pr_warn().
And the other uses of DBG() in initialize_cache_info() should just
be removed.
In smp_release_cpus() the entry/exit DBG's should just be removed,
and the spinning_secondaries line should just be pr_debug().
That would just leave the two calls in early_setup(). If we taught
udbg_printf() to return early when udbg_putc is NULL, then we could
just call udbg_printf() unconditionally and get rid of the DBG macro
entirely."
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Qian Cai <cai@lca.pw>
[mpe: Split udbg change out into previous patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1563215552-8166-1-git-send-email-cai@lca.pw
Michael Ellerman [Fri, 11 Oct 2019 08:30:39 +0000 (19:30 +1100)]
powerpc/udbg: Make it safe to call udbg_printf() always
Make udbg_printf() check if udbg_putc is set, and if not just return.
This makes it safe to call udbg_printf() anytime, even when a udbg
backend has not been registered, which means we can avoid some ifdefs
at call sites.
Signed-off-by: Qian Cai <cai@lca.pw>
[mpe: Split out of larger patch, write change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Hari Bathini [Wed, 9 Oct 2019 15:27:20 +0000 (20:57 +0530)]
powerpc: make syntax for FADump config options in kernel/Makefile readable
arch/powerpc/kernel/fadump.c file needs to be compiled in if 'config
FA_DUMP' or 'config PRESERVE_FA_DUMP' is set. The current syntax
achieves that but looks a bit odd. Fix it for better readability.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/157063484064.11906.3586824898111397624.stgit@hbathini.in.ibm.com
Hari Bathini [Wed, 9 Oct 2019 14:04:29 +0000 (19:34 +0530)]
powerpc/configs: add FADump awareness to skiroot_defconfig
FADump is supported on PowerNV platform. To fulfill this support, the
petitboot kernel must be FADump aware. Enable config PRESERVE_FA_DUMP
to make the petitboot kernel FADump aware.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/157062986936.23016.10146169203560084401.stgit@hbathini.in.ibm.com
Emmanuel Nicolet [Tue, 8 Oct 2019 14:13:42 +0000 (16:13 +0200)]
spufs: fix a crash in spufs_create_root()
The spu_fs_context was not set in fc->fs_private, this caused a crash
when accessing ctx->mode in spufs_create_root().
Fixes: d2e0981c3b9a ("vfs: Convert spufs to use the new mount API")
Signed-off-by: Emmanuel Nicolet <emmanuel.nicolet@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20191008141342.GA266797@gmail.com
Vaibhav Jain [Fri, 27 Sep 2019 06:20:02 +0000 (11:50 +0530)]
powerpc/papr_scm: Fix an off-by-one check in papr_scm_meta_{get, set}
A validation check to prevent out of bounds read/write inside
functions papr_scm_meta_{get,set}() is off-by-one that prevent reads
and writes to the last byte of the label area.
This bug manifests as a failure to probe a dimm when libnvdimm is
unable to read the entire config-area as advertised by
ND_CMD_GET_CONFIG_SIZE. This usually happens when there are large
number of namespaces created in the region backed by the dimm and the
label-index spans max possible config-area. An error of the form below
usually reported in the kernel logs:
[ 255.293912] nvdimm: probe of nmem0 failed with error -22
The patch fixes these validation checks there by letting libnvdimm
access the entire config-area.
Fixes: 53e80bd042773('powerpc/nvdimm: Add support for multibyte read/write for metadata')
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190927062002.3169-1-vaibhav@linux.ibm.com
Jordan Niethe [Fri, 4 Oct 2019 02:53:17 +0000 (12:53 +1000)]
powerpc/kvm: Fix kvmppc_vcore->in_guest value in kvmhv_switch_to_host
kvmhv_switch_to_host() in arch/powerpc/kvm/book3s_hv_rmhandlers.S
needs to set kvmppc_vcore->in_guest to 0 to signal secondary CPUs to
continue. This happens after resetting the PCR. Before commit
13c7bb3c57dc ("powerpc/64s: Set reserved PCR bits"), r0 would always
be 0 before it was stored to kvmppc_vcore->in_guest. However because
of this change in the commit:
/* Reset PCR */
ld r0, VCORE_PCR(r5)
- cmpdi r0, 0
+ LOAD_REG_IMMEDIATE(r6, PCR_MASK)
+ cmpld r0, r6
beq 18f
- li r0, 0
- mtspr SPRN_PCR, r0
+ mtspr SPRN_PCR, r6
18:
/* Signal secondary CPUs to continue */
stb r0,VCORE_IN_GUEST(r5)
We are no longer comparing r0 against 0 and loading it with 0 if it
contains something else. Hence when we store r0 to
kvmppc_vcore->in_guest, it might not be 0. This means that secondary
CPUs will not be signalled to continue. Those CPUs get stuck and
errors like the following are logged:
KVM: CPU 1 seems to be stuck
KVM: CPU 2 seems to be stuck
KVM: CPU 3 seems to be stuck
KVM: CPU 4 seems to be stuck
KVM: CPU 5 seems to be stuck
KVM: CPU 6 seems to be stuck
KVM: CPU 7 seems to be stuck
This can be reproduced with:
$ for i in `seq 1 7` ; do chcpu -d $i ; done ;
$ taskset -c 0 qemu-system-ppc64 -smp 8,threads=8 \
-M pseries,accel=kvm,kvm-type=HV -m 1G -nographic -vga none \
-kernel vmlinux -initrd initrd.cpio.xz
Fix by making sure r0 is 0 before storing it to
kvmppc_vcore->in_guest.
Fixes: 13c7bb3c57dc ("powerpc/64s: Set reserved PCR bits")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Reviewed-by: Alistair Popple <alistair@popple.id.au>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191004025317.19340-1-jniethe5@gmail.com
Desnes A. Nunes do Rosario [Thu, 3 Oct 2019 21:10:10 +0000 (18:10 -0300)]
selftests/powerpc: Fix compile error on tlbie_test due to newer gcc
Newer versions of GCC (>= 9) demand that the size of the string to be
copied must be explicitly smaller than the size of the destination.
Thus, the NULL char has to be taken into account on strncpy.
This will avoid the following compiling error:
tlbie_test.c: In function 'main':
tlbie_test.c:639:4: error: 'strncpy' specified bound 100 equals destination size
strncpy(logdir, optarg, LOGDIR_NAME_SIZE);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
Signed-off-by: Desnes A. Nunes do Rosario <desnesn@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191003211010.9711-1-desnesn@linux.ibm.com
Laurent Dufour [Tue, 1 Oct 2019 13:29:28 +0000 (15:29 +0200)]
powerpc/pseries: Remove confusing warning message.
Since commit
1211ee61b4a8 ("powerpc/pseries: Read TLB Block Invalidate
Characteristics"), a warning message is displayed when booting a guest
on top of KVM:
lpar: arch/powerpc/platforms/pseries/lpar.c pseries_lpar_read_hblkrm_characteristics Error calling get-system-parameter (0xfffffffd)
This message is displayed because this hypervisor is not supporting
the H_BLOCK_REMOVE hcall and thus is not exposing the corresponding
feature.
Reading the TLB Block Invalidate Characteristics should not be done if
the feature is not exposed.
Fixes: 1211ee61b4a8 ("powerpc/pseries: Read TLB Block Invalidate Characteristics")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191001132928.72555-1-ldufour@linux.ibm.com
Stephen Rothwell [Mon, 30 Sep 2019 00:13:42 +0000 (10:13 +1000)]
powerpc/64s/radix: Fix build failure with RADIX_MMU=n
After merging the powerpc tree, today's linux-next build (powerpc64
allnoconfig) failed like this:
arch/powerpc/mm/book3s64/pgtable.c:216:3:
error: implicit declaration of function 'radix__flush_all_lpid_guest'
radix__flush_all_lpid_guest() is only declared for
CONFIG_PPC_RADIX_MMU which is not set for this build.
Fix it by adding an empty version for the RADIX_MMU=n case, which
should never be called.
Fixes: 99161de3a283 ("powerpc/64s/radix: tidy up TLB flushing code")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
[mpe: Munge change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190930101342.36c1afa0@canb.auug.org.au
Linus Torvalds [Sun, 6 Oct 2019 21:27:30 +0000 (14:27 -0700)]
Linux 5.4-rc2
Linus Torvalds [Sun, 6 Oct 2019 20:53:27 +0000 (13:53 -0700)]
elf: don't use MAP_FIXED_NOREPLACE for elf executable mappings
In commit
4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map") we
changed elf to use MAP_FIXED_NOREPLACE instead of MAP_FIXED for the
executable mappings.
Then, people reported that it broke some binaries that had overlapping
segments from the same file, and commit
ad55eac74f20 ("elf: enforce
MAP_FIXED on overlaying elf segments") re-instated MAP_FIXED for some
overlaying elf segment cases. But only some - despite the summary line
of that commit, it only did it when it also does a temporary brk vma for
one obvious overlapping case.
Now Russell King reports another overlapping case with old 32-bit x86
binaries, which doesn't trigger that limited case. End result: we had
better just drop MAP_FIXED_NOREPLACE entirely, and go back to MAP_FIXED.
Yes, it's a sign of old binaries generated with old tool-chains, but we
do pride ourselves on not breaking existing setups.
This still leaves MAP_FIXED_NOREPLACE in place for the load_elf_interp()
and the old load_elf_library() use-cases, because nobody has reported
breakage for those. Yet.
Note that in all the cases seen so far, the overlapping elf sections
seem to be just re-mapping of the same executable with different section
attributes. We could possibly introduce a new MAP_FIXED_NOFILECHANGE
flag or similar, which acts like NOREPLACE, but allows just remapping
the same executable file using different protection flags.
It's not clear that would make a huge difference to anything, but if
people really hate that "elf remaps over previous maps" behavior, maybe
at least a more limited form of remapping would alleviate some concerns.
Alternatively, we should take a look at our elf_map() logic to see if we
end up not mapping things properly the first time.
In the meantime, this is the minimal "don't do that then" patch while
people hopefully think about it more.
Reported-by: Russell King <linux@armlinux.org.uk>
Fixes: 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map")
Fixes: ad55eac74f20 ("elf: enforce MAP_FIXED on overlaying elf segments")
Cc: Michal Hocko <mhocko@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 6 Oct 2019 18:10:15 +0000 (11:10 -0700)]
Merge tag 'dma-mapping-5.4-1' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping regression fix from Christoph Hellwig:
"Revert an incorret hunk from a patch that caused problems on various
arm boards (Andrey Smirnov)"
* tag 'dma-mapping-5.4-1' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: fix false positive warnings in dma_common_free_remap()
Linus Torvalds [Sun, 6 Oct 2019 00:18:43 +0000 (17:18 -0700)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC fixes from Olof Johansson:
"A few fixes this time around:
- Fixup of some clock specifications for DRA7 (device-tree fix)
- Removal of some dead/legacy CPU OPP/PM code for OMAP that throws
warnings at boot
- A few more minor fixups for OMAPs, most around display
- Enable STM32 QSPI as =y since their rootfs sometimes comes from
there
- Switch CONFIG_REMOTEPROC to =y since it went from tristate to bool
- Fix of thermal zone definition for ux500 (5.4 regression)"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
ARM: multi_v7_defconfig: Fix SPI_STM32_QSPI support
ARM: dts: ux500: Fix up the CPU thermal zone
arm64/ARM: configs: Change CONFIG_REMOTEPROC from m to y
ARM: dts: am4372: Set memory bandwidth limit for DISPC
ARM: OMAP2+: Fix warnings with broken omap2_set_init_voltage()
ARM: OMAP2+: Add missing LCDC midlemode for am335x
ARM: OMAP2+: Fix missing reset done flag for am3 and am43
ARM: dts: Fix gpio0 flags for am335x-icev2
ARM: omap2plus_defconfig: Enable more droid4 devices as loadable modules
ARM: omap2plus_defconfig: Enable DRM_TI_TFP410
DTS: ARM: gta04: introduce legacy spi-cs-high to make display work again
ARM: dts: Fix wrong clocks for dra7 mcasp
clk: ti: dra7: Fix mcasp8 clock bits
Linus Torvalds [Sat, 5 Oct 2019 19:56:59 +0000 (12:56 -0700)]
Merge tag 'kbuild-fixes-v5.4' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- remove unneeded ar-option and KBUILD_ARFLAGS
- remove long-deprecated SUBDIRS
- fix modpost to suppress false-positive warnings for UML builds
- fix namespace.pl to handle relative paths to ${objtree}, ${srctree}
- make setlocalversion work for /bin/sh
- make header archive reproducible
- fix some Makefiles and documents
* tag 'kbuild-fixes-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kheaders: make headers archive reproducible
kbuild: update compile-test header list for v5.4-rc2
kbuild: two minor updates for Documentation/kbuild/modules.rst
scripts/setlocalversion: clear local variable to make it work for sh
namespace: fix namespace.pl script to support relative paths
video/logo: do not generate unneeded logo C files
video/logo: remove unneeded *.o pattern from clean-files
integrity: remove pointless subdir-$(CONFIG_...)
integrity: remove unneeded, broken attempt to add -fshort-wchar
modpost: fix static EXPORT_SYMBOL warnings for UML build
kbuild: correct formatting of header in kbuild module docs
kbuild: remove SUBDIRS support
kbuild: remove ar-option and KBUILD_ARFLAGS