Madhavan Srinivasan [Mon, 31 Jul 2017 08:02:41 +0000 (13:32 +0530)]
powerpc/perf: Factor out PPMU_ONLY_COUNT_RUN check code from power8
There are some hardware events on Power systems which only count when
the processor is not idle, and there are some fixed-function counters
which count such events. For example, the "run cycles" event counts
cycles when the processor is not idle. If the user asks to count
cycles, we can use "run cycles" if this is a per-task event, since the
processor is running when the task is running, by definition. We can't
use "run cycles" if the user asks for "cycles" on a system-wide
counter.
Currently in power8 this check is done using PPMU_ONLY_COUNT_RUN flag
in power8_get_alternatives() function. Based on the flag, events are
switched if needed. This function should also be enabled in power9, so
factor out the code to isa207_get_alternatives().
Fixes: efe881afdd999 ('powerpc/perf: Factor out event_alternative function')
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Madhavan Srinivasan [Tue, 25 Jul 2017 05:35:51 +0000 (11:05 +0530)]
powerpc/perf: Update default sdar_mode value for power9
Commit
20dd4c624d251 ('powerpc/perf: Fix SDAR_MODE value for continous
sampling on Power9') set the default sdar_mode value in MMCRA[SDAR_MODE]
to be used as 0b01 (Update on TLB miss). And this value is set if sdar_mode
from event is zero, or we are in continous sampling mode in power9 dd1.
But it is preferred to have the sdar_mode value for power9 as
0b10 (Update on dcache miss) for better sampling updates instead
of 0b01 (Update on TLB miss).
From Anton:
Using a bandwidth test case with a 1MB footprint, I profiled cycles and
chose TLB updates of the SDAR:
$ perf record -d -e r000400000000001E:u ./bw2001 1M
^
SDAR TLB
$ perf report -D | grep PERF_RECORD_SAMPLE | sed 's/.*addr: //' | sort -u | wc -l
4
I get 4 unique addresses. If I ran with dcache misses:
$ perf record -d -e r000800000000001E:u ./bw2001 1M
^
SDAR dcache miss
$ perf report -D|grep PERF_RECORD_SAMPLE| sed 's/.*addr: //'|sort -u | wc -l
5217
I get 5217 unique addresses. No surprises here, but it does show why
TLB misses is the wrong event to default to - we get very little useful
information out of it.
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Ivan Mikhaylov [Tue, 25 Jul 2017 11:40:04 +0000 (14:40 +0300)]
powerpc/44x/fsp2: Enable eMMC arasan for fsp2 platform
Add mmc0 changes for enabling arasan emmc and change
defconfig appropriately.
Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Suraj Jitindar Singh [Thu, 3 Aug 2017 04:15:51 +0000 (14:15 +1000)]
powerpc/mm: Properly invalidate when setting process table base
The host process table base is stored in the partition table by calling
the function native_register_process_table(). Currently this just sets
the entry in memory and is missing a subsequent cache invalidation
instruction. Any update to the partition table should be followed by a
cache invalidation instruction specifying invalidation of the caching of
any partition table entries (RIC = 2, PRS = 0).
We already have a function to update the partition table with the
required cache invalidation instructions - mmu_partition_table_set_entry().
Update the native_register_process_table() function to call
mmu_partition_table_set_entry(), this ensures all appropriate
invalidation will be performed.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Use a local for patb0 to clean it up slightly]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 2 Aug 2017 01:54:41 +0000 (20:54 -0500)]
powerpc/xive: Ensure active irqd when setting affinity
Ensure irqd is active before attempting to set affinity. This should
make the set affinity code more robust. For instance, this prevents
these messages seen on a 4.12 based kernel when taking cpus offline:
[ 123.
053037264,3] XIVE[ IC 00 ] ISN 2 lead to invalid IVE !
[ 77.885859] xive: Error -6 reconfiguring irq 17
[ 77.885862] IRQ17: set affinity failed(-6).
That particular case has been fixed in 4.13-rc1 by commit
91f26cb4cd3c ("genirq/cpuhotplug: Do not migrated shutdown irqs").
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Tue, 1 Aug 2017 12:00:54 +0000 (22:00 +1000)]
powerpc: Add irq accounting for watchdog interrupts
This adds an irq counter for the watchdog soft-NMI. This interrupt
only fires when interrupts are soft-disabled, so it will not
increment much even when the watchdog is running. However it's
useful for debugging and sanity checking.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Tue, 1 Aug 2017 12:00:53 +0000 (22:00 +1000)]
powerpc: Add irq accounting for system reset interrupts
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Tue, 1 Aug 2017 12:00:52 +0000 (22:00 +1000)]
powerpc: Fix powerpc-specific watchdog build configuration
The powerpc kernel/watchdog.o should be built when HARDLOCKUP_DETECTOR
and HAVE_HARDLOCKUP_DETECTOR_ARCH are both selected. If only the former
is selected, then the generic perf watchdog has been selected.
To simplify this check, introduce a new Kconfig symbol PPC_WATCHDOG that
depends on both. This Kconfig option means the powerpc specific
watchdog is enabled.
Without this patch, Book3E will attempt to build the powerpc watchdog.
Fixes: 2104180a53 ("powerpc/64s: implement arch-specific hardlockup watchdog")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Tue, 1 Aug 2017 12:00:51 +0000 (22:00 +1000)]
powerpc/64s: Fix mce accounting for powernv
On 64-bit Book3s, when we're in HV mode, we have already counted the
machine check exception in machine_check_early().
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Use IS_ENABLED() rather than an #ifdef]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nathan Fontenot [Wed, 2 Aug 2017 18:03:22 +0000 (14:03 -0400)]
powerpc/pseries: Check memory device state before onlining/offlining
When DLPAR adding or removing memory we need to check the device
offline status before trying to online/offline the memory. This is
needed because calls to device_online() and device_offline() will
return non-zero for memory that is already online and offline
respectively.
This update resolves two scenarios. First, for a kernel built with
auto-online memory enabled (CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y),
memory will be onlined as part of calls to add_memory(). After adding
the memory the pseries DLPAR code tries to online it and fails since
the memory is already online. The DLPAR code then tries to remove the
memory which produces the oops message below because the memory is not
offline.
The second scenario occurs when removing memory that is already
offline, i.e. marking memory offline (via sysfs) and then trying to
remove that memory. This doesn't work because offlining the already
offline memory does not succeed and the DLPAR code then fails the
DLPAR remove operation.
The fix for both scenarios is to check the device.offline status
before making the calls to device_online() or device_offline().
kernel BUG at mm/memory_hotplug.c:1936!
...
NIP [
c0000000002ca428] .remove_memory+0xb8/0xc0
LR [
c0000000002ca3cc] .remove_memory+0x5c/0xc0
Call Trace:
.remove_memory+0x5c/0xc0 (unreliable)
.dlpar_add_lmb+0x384/0x400
.dlpar_memory+0x5dc/0xca0
.handle_dlpar_errorlog+0x74/0xe0
.pseries_hp_work_fn+0x2c/0x90
.process_one_work+0x17c/0x460
.worker_thread+0x88/0x500
.kthread+0x15c/0x1a0
.ret_from_kernel_thread+0x58/0xc0
Fixes: 943db62c316c ("powerpc/pseries: Revert 'Auto-online hotplugged memory'")
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
[mpe: Use bool, add explicit rc=0 case, change log typos & formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Andreas Schwab [Sat, 5 Aug 2017 17:55:11 +0000 (19:55 +0200)]
powerpc: Fix invalid use of register expressions
binutils >= 2.26 now warns about misuse of register expressions in
assembler operands that are actually literals, for example:
arch/powerpc/kernel/entry_64.S:535: Warning: invalid register expression
In practice these are almost all uses of r0 that should just be a
literal 0.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
[mpe: Mention r0 is almost always the culprit, fold in purgatory change]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Tue, 1 Aug 2017 10:29:24 +0000 (20:29 +1000)]
powerpc/mm/hash64: Make vmalloc 56T on hash
On 64-bit book3s, with the hash MMU, we currently define the kernel
virtual space (vmalloc, ioremap etc.), to be 16T in size. This is a
leftover from pre v3.7 when our user VM was also 16T.
Of that 16T we split it 50/50, with half used for PCI IO and ioremap
and the other 8T for vmalloc.
We never bothered to make it any bigger because 8T of vmalloc ought to
be enough for anybody. But it turns out that's not true, the per cpu
allocator wants large amounts of vmalloc space, not to make large
allocations, but to allow a large stride between allocations, because
we use pcpu_embed_first_chunk().
With a bit of juggling we can increase the entire kernel virtual space
to 64T. The only real complication is the check of the address in the
SLB miss handler, see the comment in the code.
Although we could continue to split virtual space 50/50 as we do now,
no one seems to be running out of PCI IO or ioremap space. So instead
keep that as 8T, and use the remaining 56T for vmalloc.
In future we should be able to increase the kernel virtual space to
512T, the code already supports that, it just needs testing on older
hardware.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Michael Ellerman [Tue, 1 Aug 2017 10:29:23 +0000 (20:29 +1000)]
powerpc/mm/slb: Move comment next to the code it's referring to
There is a comment in slb_allocate() referring to the load of
paca->vmalloc_sllp, but it's several lines prior in the assembly.
We're about to change this code, and we want to add another comment,
so move the comment immediately prior to the instruction it's talking
about.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Tue, 1 Aug 2017 10:29:22 +0000 (20:29 +1000)]
powerpc/mm/book3s64: Make KERN_IO_START a variable
Currently KERN_IO_START is defined as:
#define KERN_IO_START (KERN_VIRT_START + (KERN_VIRT_SIZE >> 1))
Although it looks like a constant, both the components are actually
variables, to allow us to have a different value between Radix and
Hash with a single kernel.
However that still requires both Radix and Hash to place the kernel IO
region at the same location relative to the start and end of the
kernel virtual region (namely 1/2 way through it), and we'd like to
change that.
So split KERN_IO_START out into its own variable, and initialise it
for Radix and Hash. In the medium term we should be able to
reconsolidate this, by doing a more involved rearrangement of the
location of the regions.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Matt Brown [Fri, 4 Aug 2017 01:12:18 +0000 (11:12 +1000)]
powerpc/powernv: Use darn instruction for get_random_seed() on Power9
This adds powernv_get_random_darn() which utilises the darn instruction,
introduced in ISA v3.0/POWER9.
The darn instruction can potentially return an error, which is supported
by the get_random_seed() API, in normal usage if we see an error we just
return that to the caller.
However when detecting whether darn is functional at boot we try up to
10 times, before deciding that darn doesn't work and failing the
registration of get_random_seed(). That way an intermittent failure
at boot doesn't deprive the system of randomness until the next reboot.
Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com>
[mpe: Move init into a function, tweak change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Christophe Leroy [Tue, 8 Aug 2017 06:37:24 +0000 (08:37 +0200)]
powerpc/32: Fix boot failure on non 6xx platforms
Commit
d300627c6a536 ("powerpc/6xx: Handle DABR match before calling
do_page_fault") breaks non 6xx platforms.
Failed to execute /init (error -14)
Starting init: /bin/sh exists but couldn't execute it (error -14)
Kernel panic - not syncing: No working init found. Try passing init= ...
CPU: 0 PID: 1 Comm: init Not tainted
4.13.0-rc3-s3k-dev-00143-g7aa62e972a56 #56
Call Trace:
panic+0x108/0x250 (unreliable)
rootfs_mount+0x0/0x58
ret_from_kernel_thread+0x5c/0x64
Rebooting in 180 seconds..
This is because in handle_page_fault(), the call to do_page_fault() has been
mistakenly enclosed inside an #ifdef CONFIG_6xx
Fixes: d300627c6a536 ("powerpc/6xx: Handle DABR match before calling do_page_fault")
Brown-paper-bag-to-be-worn-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Frederic Barrat [Fri, 4 Aug 2017 09:55:14 +0000 (11:55 +0200)]
powerpc/powernv: Enable PCI peer-to-peer
P9 has support for PCI peer-to-peer, enabling a device to write in the
MMIO space of another device directly, without interrupting the CPU.
This patch adds support for it on powernv, by adding a new API to be
called by drivers. The pnv_pci_set_p2p(...) call configures an
'initiator', i.e the device which will issue the MMIO operation, and a
'target', i.e. the device on the receiving side.
P9 really only supports MMIO stores for the time being but that's
expected to change in the future, so the API allows to define both
load and store operations.
/* PCI p2p descriptor */
#define OPAL_PCI_P2P_ENABLE 0x1
#define OPAL_PCI_P2P_LOAD 0x2
#define OPAL_PCI_P2P_STORE 0x4
int pnv_pci_set_p2p(struct pci_dev *initiator, struct pci_dev *target,
u64 desc)
It uses a new OPAL call, as the configuration magic is done on the
PHBs by skiboot.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
[mpe: Drop unrelated OPAL calls, s/uint64_t/u64/, minor formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:46 +0000 (14:49 +1000)]
powerpc: Remove old unused icswx based coprocessor support
We have a whole pile of unused code to maintain the ACOP register,
allocate coprocessor PIDs and handle ACOP faults. This mechanism
was used for the HFI adapter on POWER7 which is dead and gone and
whose driver never went upstream. It was used on some A2 core based
stuff that also never saw the light of day.
Take out all that code.
There is still some POWER8 coprocessor code that uses icswx but it's
kernel only and thus doesn't use any of that infrastructure.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:45 +0000 (14:49 +1000)]
powerpc/mm: Cleanup check for stack expansion
When hitting below a VM_GROWSDOWN vma (typically growing the stack),
we check whether it's a valid stack-growing instruction and we
check the distance to GPR1. This is largely open coded with lots
of comments, so move it out to a helper.
While at it, make store_update_sp a boolean.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:44 +0000 (14:49 +1000)]
powerpc/mm: Don't lose "major" fault indication on retry
If the first iteration returns VM_FAULT_MAJOR but the second
one doesn't, we fail to account the fault as a major fault.
This fixes it and brings the code in line with x86.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:43 +0000 (14:49 +1000)]
powerpc/mm: Move page fault VMA access checks to a helper
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:42 +0000 (14:49 +1000)]
powerpc/mm: Set fault flags earlier
Move out the code that sets FAULT_FLAG_WRITE so the block that check
access permissions can be extracted. While at it also set
FAULT_FLAG_INSTRUCTION which will be used for protection keys.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:41 +0000 (14:49 +1000)]
powerpc/mm: Add a bunch of (un)likely annotations to do_page_fault
Mostly for the failure cases
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:40 +0000 (14:49 +1000)]
powerpc/mm: Move/simplify faulthandler_disabled() and !mm check
Do the check before we re-enable interrupts and clean the code
up a bit.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:39 +0000 (14:49 +1000)]
powerpc/mm: Move the DSISR_PROTFAULT sanity check
This has a page of comment explaining what's going on right in
the middle of do_page_fault() which makes things a bit hard to
follow. Move it to a helper instead. Also do the test earlier
as there's no point waiting until after we found the VMA.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:38 +0000 (14:49 +1000)]
powerpc/mm: Cosmetic fix to page fault accounting
No need to break those lines, they aren't that long
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:37 +0000 (14:49 +1000)]
powerpc/mm: Move CMO accounting out of do_page_fault into a helper
It makes do_page_fault() more readable. No functional change.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:36 +0000 (14:49 +1000)]
powerpc/mm: Rework mm_fault_error()
First, handle the normal retry failure in do_page_fault itself,
since it's a simple return statement. That allows us to remove
the "continue" special return code from mm_fault_error().
Once that's done, we can have an implementation much closer to
x86 where we only call mm_fault_error() if VM_FAULT_ERROR is set
and directly return.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:35 +0000 (14:49 +1000)]
powerpc/mm: Make bad_area* helper functions
Instead of goto labels, instead call those functions and return.
This gets us closer to x86 and allows us to shring do_page_fault()
even more.
The main difference with x86 is that those function return a value
which we then return from do_page_fault(). That value is our
return value from do_page_fault() which we use to generate
kernel faults.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:34 +0000 (14:49 +1000)]
powerpc/mm: Fix reporting of kernel execute faults
We currently test for is_exec and DSISR_PROTFAULT but that doesn't
make sense as this is the wrong error bit to test for an execute
permission failure.
In fact, we had code that would return early if we had an exec
fault in kernel mode so I think that was just dead code anyway.
Finally the location of that test is awkward and prevents further
simplifications.
So instead move that test into a helper along with the existing
early test for kernel exec faults and out of range accesses,
and put it all in a "bad_kernel_fault()" helper. While at it
test the correct error bits.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:33 +0000 (14:49 +1000)]
powerpc/mm: Simplify returns from __do_page_fault
Now that we moved the exception state handling to a wrapper, we can
just directly return rather than "goto bail"
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:32 +0000 (14:49 +1000)]
powerpc/mm: Move debugger check to notify_page_fault()
unclutters the main path
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:31 +0000 (14:49 +1000)]
powerpc/mm: Overhaul handling of bad page faults
A bad page fault is when the HW signals an error such as a bad
copy/paste, an AMO error, or some other type of error that will
not be fixed by updating the PTE.
Use a helper page_fault_is_bad() to check for bad page faults thus
removing the per-processor family open-coding in __do_page_fault()
and trigger a SIGBUS rather than a SIGSEGV which is more appropriate.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:30 +0000 (14:49 +1000)]
powerpc/mm: Move error_code checks for bad faults earlier
There's no point looking for the VMA etc.. when we already know
we are going to fail.
This adds some code to set "code" for the si_code but that will
be gone in subsequent patches.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:29 +0000 (14:49 +1000)]
powerpc/mm: Move out definition of CPU specific is_write bits
Define a common page_fault_is_write() helper and use it
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:28 +0000 (14:49 +1000)]
powerpc/mm: Use symbolic constants for filtering SRR1 bits on ISIs
This uses the newly defined constants for this rather than open-coded
numbers. There is a side effect on 64-bit which is to pass through
some of the new P9 bits which we didn't before.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:27 +0000 (14:49 +1000)]
powerpc/mm: Update bits used to skip hash_page
We test a number of bits from DSISR/SRR1 before deciding
to call hash_page(). If any of these is set, we go directly
to do_page_fault() as the bit indicate a fault that needs
to be handled there (no hashing needed).
This updates the current open-coded masks to use the new
DSISR definitions.
This *does* change the masks actually used in two ways:
- We used to test various bits that were defined as "always 0"
in the architecture and could be repurposed for something
else. From now on, we just ignore such bits.
- We were missing some new bits defined on P9
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:26 +0000 (14:49 +1000)]
powerpc/mm: Update definitions of DSISR bits
This updates the definitions for the various DSISR bits to
match both some historical stuff and to match new bits on
POWER9.
In addition, we define some masks corresponding to the "bad"
faults on Book3S, and some masks corresponding to the bits
that match between DSISR and SRR1 for a DSI and an ISI.
This comes with a small code update to change the definition
of DSISR_PGDIRFAULT which becomes DSISR_PRTABLE_FAULT to
match architecture 3.0B
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:25 +0000 (14:49 +1000)]
powerpc/6xx: Handle DABR match before calling do_page_fault
On legacy 6xx 32-bit procesors, we checked for the DABR match bit
in DSISR from do_page_fault(), in the middle of a pile of ifdef's
because all other CPU types do it in assembly prior to calling
do_page_fault. Fix that.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Add #ifdef CONFIG_6xx]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:24 +0000 (14:49 +1000)]
powerpc/mm: Pre-filter SRR1 bits before do_page_fault()
By filtering the relevant SRR1 bits in the assembly rather than
in do_page_fault() itself, we avoid a conditional branch (since we
already come from different path for data and instruction faults).
This will allow more simplifications later
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:23 +0000 (14:49 +1000)]
powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper
This will allow simplifying the returns from do_page_fault
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:06 +0000 (14:49 +1000)]
powerpc/mm/radix: Avoid flushing the PWC on every flush_tlb_range
We do that because it's used by THP pmd collapsing, so use
instead a dedicated flush function.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:05 +0000 (14:49 +1000)]
powerpc/mm/radix: Improve TLB/PWC flushes
At the moment we have to rather sub-optimal flushing behaviours:
- flush_tlb_mm() will flush the PWC which is unnecessary (for example
when doing a fork)
- A large unmap will call flush_tlb_pwc() multiple times causing us
to perform that fairly expensive operation repeatedly. This happens
often in batches of 3 on every new process.
So we change flush_tlb_mm() to only flush the TLB, and we use the
existing "need_flush_all" flag in struct mmu_gather to indicate
that the PWC needs flushing.
Unfortunately, flush_tlb_range() still needs to do a full flush
for now as it's used by the THP collapsing. We will fix that later.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 19 Jul 2017 04:49:04 +0000 (14:49 +1000)]
powerpc/mm/radix: Improve _tlbiel_pid to be usable for PWC flushes
The PWC flush only needs a single set call, just like the
full (RIC=2) flush.
This will allow us to get rid of the dedicated _tlbiel_pwc()
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Victor Aoqui [Thu, 20 Jul 2017 17:26:06 +0000 (14:26 -0300)]
powerpc/kernel: Avoid preemption check in iommu_range_alloc()
Replace the __this_cpu_read() with raw_cpu_read() in
iommu_range_alloc(). Otherwise we get a warning about using
__this_cpu_read() in preemptible code:
BUG: using __this_cpu_read() in preemptible
caller is iommu_range_alloc+0xa8/0x3d0
Preemption doesn't need to be disabled since according to the comment
any CPU can safely use any IOMMU pool.
Signed-off-by: Victor Aoqui <victora@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gautham R. Shenoy [Fri, 21 Jul 2017 11:01:34 +0000 (16:31 +0530)]
powerpc/powernv: Clear PECE1 in LPCR via stop-api only on Hotplug
Currently we use the stop-api provided by the firmware to program the
SLW engine to restore the values of hypervisor resources that get lost
on deeper idle states (such as winkle). Since the deep states were
only used for CPU-Hotplug on POWER8 systems, we would program the LPCR
to have the PECE1 bit since Hotplugged CPUs shouldn't be spuriously
woken up by decrementer.
On POWER9, some of the deep platform idle states such as stop4 can be
used in cpuidle as well. In this case, we want the CPU in stop4 to be
woken up by the decrementer when some timer on the CPU expires.
In this patch, we program the stop-api for LPCR with PECE1
bit cleared only when we are offlining the CPU and set it
back once the CPU is online.
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gautham R. Shenoy [Fri, 21 Jul 2017 10:41:37 +0000 (16:11 +0530)]
powerpc/powernv: Save/Restore additional SPRs for stop4 cpuidle
The stop4 idle state on POWER9 is a deep idle state which loses
hypervisor resources, but whose latency is low enough that it can be
exposed via cpuidle.
Until now, the deep idle states which lose hypervisor resources (eg:
winkle) were only exposed via CPU-Hotplug. Hence currently on wakeup
from such states, barring a few SPRs which need to be restored to
their older value, rest of the SPRS are reinitialized to their values
corresponding to that at boot time.
When stop4 is used in the context of cpuidle, we want these additional
SPRs to be restored to their older value, to ensure that the context
on the CPU coming back from idle is same as it was before going idle.
In this patch, we define a SPR save area in PACA (since we have used
up the volatile register space in the stack) and on POWER9, we restore
SPRN_PID, SPRN_LDBAR, SPRN_FSCR, SPRN_HFSCR, SPRN_MMCRA, SPRN_MMCR1,
SPRN_MMCR2 to the values they had before entering stop.
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Rui Teng [Thu, 12 Jan 2017 09:09:06 +0000 (17:09 +0800)]
powerpc/mm: Fix check of multiple 16G pages from device tree
The offset of hugepage block will not be 16G, if the expected
page is more than one. Calculate the totol size instead of the
hardcode value.
Fixes: 4792adbac9eb ("powerpc: Don't use a 16G page if beyond mem= limits")
Signed-off-by: Rui Teng <rui.teng@linux.vnet.ibm.com>
Tested-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Mon, 24 Jul 2017 14:10:15 +0000 (00:10 +1000)]
powerpc/udbg: Reduce the footgun potential of EARLY_DEBUG_LPAR(_HVSI)
For debugging very early boot problems we have CONFIG_PPC_EARLY_DEBUG,
which allows configuring the kernel such that it unconditionally writes
to a particular type of console, regardless of whether that console
exists or not. This is useful sometimes when the kernel crashes before
it can even determine what platform it's on, and therefore what consoles
exist.
However if you boot a kernel built this way on a different platform, it
will generally crash because it writes to a console that doesn't exist.
A particularly nasty instance of this is if you enable the hypervisor
console early debug, and then boot that kernel on bare metal. The result
is that the kernel calls "the hypervisor" very early in boot, but the
kernel *is* the hypervisor, so we jump to the system call handler and
start executing all sorts of code that isn't ready to be run. This may
lead to a machine check or check stop depending on how lucky you are.
Luckily there is an easy way to avoid this particular case. We simply
read the MSR before installing the hooks, and if we see MSR_HV is set
then we are the hypervisor and we definitely should not use the
hypervisor console.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Mon, 24 Jul 2017 12:50:45 +0000 (22:50 +1000)]
powerpc/configs: Add a powernv_be_defconfig
Although pretty much everyone using powernv is running little endian,
we should still test we can build for big endian. So add a
powernv_be_defconfig, which is autogenerated by flipping the endian
symbol in powernv_defconfig.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Anju T Sudhakar [Tue, 18 Jul 2017 21:36:36 +0000 (03:06 +0530)]
powerpc/perf: Add thread IMC PMU support
Add support to register Thread In-Memory Collection PMU counters.
Patch adds thread IMC specific data structures, along with memory
init functions and CPU hotplug support.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anju T Sudhakar [Tue, 18 Jul 2017 21:36:35 +0000 (03:06 +0530)]
powerpc/perf: Add core IMC PMU support
Add support to register Core In-Memory Collection PMU counters.
Patch adds core IMC specific data structures, along with memory
init functions and CPU hotplug support.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anju T Sudhakar [Tue, 18 Jul 2017 21:36:34 +0000 (03:06 +0530)]
powerpc/perf: Add nest IMC PMU support
Add support to register Nest In-Memory Collection PMU counters.
Patch adds a new device file called "imc-pmu.c" under powerpc/perf
folder to contain all the device PMU functions.
Device tree parser code added to parse the PMU events information
and create sysfs event attributes for the PMU.
Cpumask attribute added along with Cpu hotplug online/offline functions
specific for nest PMU. A new state "CPUHP_AP_PERF_POWERPC_NEST_IMC_ONLINE"
added for the cpu hotplug callbacks. Error handle path frees the memory
and unregisters the CPU hotplug callbacks.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Madhavan Srinivasan [Tue, 18 Jul 2017 21:36:33 +0000 (03:06 +0530)]
powerpc/powernv: Detect and create IMC device
Code to create platform device for the In-Memory Collection (IMC)
counters. Platform devices are created based on the IMC compatibility.
New header file created to contain the data structures and macros
needed for In-Memory Collection (IMC) counter pmu devices.
The device tree for IMC counters starts at the node "imc-counters".
This node contains all the IMC PMU nodes and event nodes for these IMC
PMUs. Device probe() parses the device to locate three possible IMC
device types (Nest/Core/Thread). Function then branch to parse each
unit nodes to populate vital information such as device memory sizes,
event nodes information, base address for reserve memory access (if
any) and so on. Simple bare-minimum shutdown function added which only
"stops" the engines.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
[mpe: Fix build with CONFIG_PERF_EVENTS=n]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Madhavan Srinivasan [Tue, 18 Jul 2017 21:36:32 +0000 (03:06 +0530)]
powerpc/powernv: Add IMC OPAL APIs
In-Memory Collection (IMC) counters are performance monitoring
infrastructure. These counters need special sequence of SCOMs to
init/start/stop which is handled by OPAL. And OPAL provides three APIs
to init and control these IMC engines.
OPAL API documentation:
https://github.com/open-power/skiboot/blob/master/doc/opal-api/opal-imc-counters.rst
Patch updates the kernel side powernv platform code to support the new
OPAL APIs
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Aneesh Kumar K.V [Wed, 28 Jun 2017 06:09:28 +0000 (11:39 +0530)]
powerpc/mm: Build fix for non SPARSEMEM_VMEMAP config
We can use pfn_to_page() in realmode for other configs. Hence remove the
CONFIG_FLATMEM ifdef.
Fixes: 8e0861fa3c4e ("powerpc: Prepare to support kernel handling of IOMMU map/unmap")
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Also fix up the #endif comment]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Geliang Tang [Sat, 29 Apr 2017 01:45:14 +0000 (09:45 +0800)]
powerpc/powernv: use memdup_user
Use memdup_user() helper instead of open-coding to simplify the code.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Mon, 24 Jul 2017 11:46:03 +0000 (21:46 +1000)]
powerpc/pseries: Don't needlessly initialise rv to 0
All cases initialise rv, and if they didn't that would be a bug. By
dropping the initialisation we give the compiler the chance to catch
those bugs for us.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Geliang Tang [Sat, 29 Apr 2017 01:45:15 +0000 (09:45 +0800)]
powerpc/pseries: use memdup_user_nul
Use memdup_user_nul() helper instead of open-coding to simplify the code.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Scott Wood [Sun, 25 Jun 2017 02:39:05 +0000 (21:39 -0500)]
powerpc/ipic: Support edge on IRQ0
External IRQ0 (index 48) has the same capabilities as the other IRQ1-7
and is handled by the same register IPIC_SEPNR. When this register is
not specified for "ack" in "ipic_info", you cannot configure this IRQ
as IRQ_TYPE_EDGE_FALLING. This oversight was probably due to the
non-contiguous hwirq numbering of IRQ0 in the IPIC.
Signed-off-by: Jurgen Schindele <schindele@nentec.de>
[scottwood: Cleaned up commit message and posted as a proper patch]
Signed-off-by: Scott Wood <oss@buserror.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Laurentiu Tudor [Mon, 17 Jul 2017 13:12:43 +0000 (16:12 +0300)]
powerpc: allow compiling with GENERIC_MSI_IRQ_DOMAIN
This allows building powerpc with the GENERIC_MSI_IRQ_DOMAIN
Kconfig by enabling the asm-generic msi.h in Kbuild. Without
this, there's a compilation error [1] because powerpc, as most
arches, doesn't provide an asm/msi.h.
[1] In file included from ./include/linux/kvm_host.h:20:0,
from ./arch/powerpc/include/asm/kvm_ppc.h:30,
from arch/powerpc/kernel/dbell.c:20:
./include/linux/msi.h:195:21: fatal error: asm/msi.h: No such file or directory
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Santosh Sivaraj [Tue, 4 Jul 2017 04:22:46 +0000 (09:52 +0530)]
powerpc/powernv: Get cpu only after validity check
Check for validity of cpu before calling get_hard_smp_processor_id().
Found with coverity.
Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Linus Torvalds [Sun, 23 Jul 2017 23:15:17 +0000 (16:15 -0700)]
Linux 4.13-rc2
Linus Torvalds [Sun, 23 Jul 2017 23:06:21 +0000 (16:06 -0700)]
Properly alphabetize MAINTAINERS file
This adds a perl script to actually parse the MAINTAINERS file, clean up
some whitespace in it, warn about errors in it, and then properly sort
the end result.
My perl-fu is atrocious, so the script has basically been created by
randomly putting various characters in a pile, mixing them around, and
then looking it the end result does anything interesting when used as a
perl script.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 23 Jul 2017 22:08:05 +0000 (15:08 -0700)]
Fix up MAINTAINERS file problems
Prepping for scripting the MAINTAINERS file cleanup (and possible split)
showed a couple of cases where the headers for a couple of entries were
bogus.
There's a few different kinds of bogosities:
- the X-GENE SOC EDAC case was confused and split over two lines
- there were four entries for "GREYBUS PROTOCOLS DRIVERS" that were all
different things.
- the NOKIA N900 CAMERA SUPPORT" was duplicated
all of which were more obvious when you started doing associative arrays
in perl to track these things by the header (so that we can alphabetize
this thing properly, and so that we might split it up by the data too).
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 23 Jul 2017 18:22:45 +0000 (11:22 -0700)]
Merge tag 'for-linus-4.13b-rc2-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Some fixes and cleanups for running under Xen"
* tag 'for-linus-4.13b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/balloon: don't online new memory initially
xen/x86: fix cpu hotplug
xen/grant-table: log the lack of grants
xen/x86: Don't BUG on CPU0 offlining
Juergen Gross [Mon, 10 Jul 2017 08:10:45 +0000 (10:10 +0200)]
xen/balloon: don't online new memory initially
When setting up the Xenstore watch for the memory target size the new
watch will fire at once. Don't try to reach the configured target size
by onlining new memory in this case, as the current memory size will
be smaller in almost all cases due to e.g. BIOS reserved pages.
Onlining new memory will lead to more problems e.g. undesired conflicts
with NVMe devices meant to be operated as block devices.
Instead remember the difference between target size and current size
when the watch fires for the first time and apply it to any further
size changes, too.
In order to avoid races between balloon.c and xen-balloon.c init calls
do the xen-balloon.c initialization from balloon.c.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Juergen Gross [Wed, 5 Jul 2017 14:05:20 +0000 (16:05 +0200)]
xen/x86: fix cpu hotplug
Commit
dc6416f1d711eb4c1726e845d653235dcaae12e1 ("xen/x86: Call
cpu_startup_entry(CPUHP_AP_ONLINE_IDLE) from xen_play_dead()")
introduced an error leading to a stack overflow of the idle task when
a cpu was brought offline/online many times: by calling
cpu_startup_entry() instead of returning at the end of xen_play_dead()
do_idle() would be entered again and again.
Don't use cpu_startup_entry(), but cpuhp_online_idle() instead allowing
to return from xen_play_dead().
Cc: <stable@vger.kernel.org> # 4.12
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Wengang Wang [Tue, 18 Jul 2017 07:40:35 +0000 (09:40 +0200)]
xen/grant-table: log the lack of grants
log a message when we enter this situation:
1) we already allocated the max number of available grants from hypervisor
and
2) we still need more (but the request fails because of 1)).
Sometimes the lack of grants causes IO hangs in xen_blkfront devices.
Adding this log would help debuging.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Vitaly Kuznetsov [Mon, 26 Jun 2017 16:39:30 +0000 (18:39 +0200)]
xen/x86: Don't BUG on CPU0 offlining
CONFIG_BOOTPARAM_HOTPLUG_CPU0 allows to offline CPU0 but Xen HVM guests
BUG() in xen_teardown_timer(). Remove the BUG_ON(), this is probably a
leftover from ancient times when CPU0 hotplug was impossible, it works
just fine for HVM.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Linus Torvalds [Sat, 22 Jul 2017 16:25:00 +0000 (09:25 -0700)]
Merge tag 'hwmon-for-linus-v4.13-rc2' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fix from Guenter Roeck:
"Avoid buffer overruns in applesmc driver"
* tag 'hwmon-for-linus-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (applesmc) Avoid buffer overruns
Linus Torvalds [Sat, 22 Jul 2017 16:00:24 +0000 (09:00 -0700)]
Merge tag 'tty-4.13-rc2' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some small tty and serial driver fixes for 4.13-rc2. Nothing
huge at all, a revert of a patch that turned out to break things, a
fix up for a new tty ioctl we added in 4.13-rc1 to get the uapi
definition correct, and a few minor serial driver fixes for reported
issues.
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: Fix TIOCGPTPEER ioctl definition
tty: hide unused pty_get_peer function
tty: serial: lpuart: Fix the logic for detecting the 32-bit type UART
serial: imx: Prevent TX buffer PIO write when a DMA has been started
Revert "serial: imx-serial - move DMA buffer configuration to DT"
serial: sh-sci: Uninitialized variables in sysfs files
serial: st-asc: Potential error pointer dereference
Linus Torvalds [Sat, 22 Jul 2017 15:57:24 +0000 (08:57 -0700)]
Merge tag 'char-misc-4.13-rc1' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small char and misc driver fixes for 4.13-rc2. All fix
reported problems with 4.13-rc1 or older kernels (like the binder
fixes). Full details in the shortlog.
All have been in linux-next with no reported issues"
* tag 'char-misc-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
w1: omap-hdq: fix error return code in omap_hdq_probe()
regmap: regmap-w1: Fix build troubles
w1: Fix slave count on 1-Wire bus (resend)
mux: mux-core: unregister mux_class in mux_exit()
mux: remove the Kconfig question for the subsystem
nvmem: rockchip-efuse: amend compatible rk322x-efuse to rk3228-efuse
drivers/fsi: fix fsi_slave_mode prototype
fsi: core: register with postcore_initcall
thunderbolt: Correct access permissions for active NVM contents
vmbus: re-enable channel tasklet
spmi: pmic-arb: Always allocate ppid_to_apid table
MAINTAINERS: Add entry for SPMI subsystem
spmi: Include OF based modalias in device uevent
binder: Use wake up hint for synchronous transactions.
binder: use group leader instead of open thread
Revert "android: binder: Sanity check at binder ioctl"
Linus Torvalds [Sat, 22 Jul 2017 15:55:16 +0000 (08:55 -0700)]
Merge tag 'usb-4.13-rc2' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes for 4.13-rc2.
The usual batch, gadget fixes for reported issues, as well as xhci
fixes, and a small random collection of other fixes for reported
issues.
All have been in linux-next with no reported issues"
* tag 'usb-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (25 commits)
xhci: fix memleak in xhci_run()
usb: xhci: fix spinlock recursion for USB2 test mode
xhci: fix 20000ms port resume timeout
usb: xhci: Issue stop EP command only when the EP state is running
xhci: Bad Ethernet performance plugged in ASM1042A host
xhci: Fix NULL pointer dereference when cleaning up streams for removed host
usb: renesas_usbhs: gadget: disable all eps when the driver stops
usb: renesas_usbhs: fix usbhsc_resume() for !USBHSF_RUNTIME_PWCTRL
usb: gadget: udc: renesas_usb3: protect usb3_ep->started in usb3_start_pipen()
usb: gadget: udc: renesas_usb3: fix zlp transfer by the dmac
usb: gadget: udc: renesas_usb3: fix free size in renesas_usb3_dma_free_prd()
usb: gadget: f_uac2: endianness fixes.
usb: gadget: f_uac1: endianness fixes.
include: usb: audio: specify exact endiannes of descriptors
usb: gadget: udc: start_udc() can be static
usb: dwc2: gadget: On USB RESET reset device address to zero
usb: storage: return on error to avoid a null pointer dereference
usb: typec: include linux/device.h in ucsi.h
USB: cdc-acm: add device-id for quirky printer
usb: dwc3: gadget: only unmap requests from DMA if mapped
...
Linus Torvalds [Sat, 22 Jul 2017 15:53:24 +0000 (08:53 -0700)]
Merge tag 'staging-4.13-rc2' of git://git./linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are some small staging driver fixes for reported issues for
4.13-rc2.
Also in here is a new driver, the virtualbox DRM driver. It's
stand-alone and got acks from the DRM developers to go in through this
tree. It's a new thing, but it should be fine for this point in the rc
cycle due to it being independent.
All of this has been in linux-next for a while with no reported
issues"
* tag 'staging-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: rtl8188eu: add TL-WN722N v2 support
staging: speakup: safely register and unregister ldisc
staging: speakup: add functions to register and unregister ldisc
staging: speakup: safely close tty
staging: sm750fb: avoid conflicting vesafb
staging: lustre: ko2iblnd: check copy_from_iter/copy_to_iter return code
staging: vboxvideo: Add vboxvideo to drivers/staging
staging: sm750fb: fixed a assignment typo
staging: rtl8188eu: memory leak in rtw_free_cmd_obj()
staging: vchiq_arm: fix error codes in probe
staging: comedi: ni_mio_common: fix AO timer off-by-one regression
Randy Dunlap [Fri, 21 Jul 2017 20:32:27 +0000 (13:32 -0700)]
MAINTAINERS: fix alphabetical ordering
Fix major alphabetic errors. No attempt to fix items that all begin
with the same word (like ARM, BROADCOM, DRM, EDAC, FREESCALE, INTEL,
OMAP, PCI, SAMSUNG, TI, USB, etc.).
(diffstat +/- is different by one line because TI KEYSTONE MULTICORE
had 2 blank lines after it.)
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 21 Jul 2017 23:26:01 +0000 (16:26 -0700)]
Merge tag 'nfs-for-4.13-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client bugfixes from Anna Schumaker:
"Stable bugfix:
- Fix error reporting regression
Bugfixes:
- Fix setting filelayout ds address race
- Fix subtle access bug when using ACLs
- Fix setting mnt3_counts array size
- Fix a couple of pNFS commit races"
* tag 'nfs-for-4.13-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS/filelayout: Fix racy setting of fl->dsaddr in filelayout_check_deviceid()
NFS: Be more careful about mapping file permissions
NFS: Store the raw NFS access mask in the inode's access cache
NFSv3: Convert nfs3_proc_access() to use nfs_access_set_mask()
NFS: Refactor NFS access to kernel access mask calculation
net/sunrpc/xprt_sock: fix regression in connection error reporting.
nfs: count correct array for mnt3_counts array size
Revert commit
722f0b891198 ("pNFS: Don't send COMMITs to the DSes if...")
pNFS/flexfiles: Handle expired layout segments in ff_layout_initiate_commit()
NFS: Fix another COMMIT race in pNFS
NFS: Fix a COMMIT race in pNFS
mount: copy the port field into the cloned nfs_server structure.
NFS: Don't run wake_up_bit() when nobody is waiting...
nfs: add export operations
Linus Torvalds [Fri, 21 Jul 2017 23:24:22 +0000 (16:24 -0700)]
Merge branch 'overlayfs-linus' of git://git./linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
"This fixes a crash with SELinux and several other old and new bugs"
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: check for bad and whiteout index on lookup
ovl: do not cleanup directory and whiteout index entries
ovl: fix xattr get and set with selinux
ovl: remove unneeded check for IS_ERR()
ovl: fix origin verification of index dir
ovl: mark parent impure on ovl_link()
ovl: fix random return value on mount
Linus Torvalds [Fri, 21 Jul 2017 23:20:05 +0000 (16:20 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A small set of fixes for -rc2 - two fixes for BFQ, documentation and
code, and a removal of an unused variable in nbd. Outside of that, a
small collection of fixes from the usual crew on the nvme side"
* 'for-linus' of git://git.kernel.dk/linux-block:
nvmet: don't report 0-bytes in serial number
nvmet: preserve controller serial number between reboots
nvmet: Move serial number from controller to subsystem
nvmet: prefix version configfs file with attr
nvme-pci: Fix an error handling path in 'nvme_probe()'
nvme-pci: Remove nvme_setup_prps BUG_ON
nvme-pci: add another device ID with stripe quirk
nvmet-fc: fix byte swapping in nvmet_fc_ls_create_association
nvme: fix byte swapping in the streams code
nbd: kill unused ret in recv_work
bfq: dispatch request to prevent queue stalling after the request completion
bfq: fix typos in comments about B-WF2Q+ algorithm
Linus Torvalds [Fri, 21 Jul 2017 21:22:05 +0000 (14:22 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/dledford/rdma
Pull more rdma fixes from Doug Ledford:
"As per my previous pull request, there were two drivers that each had
a rather large number of legitimate fixes still to be sent.
As it turned out, I also missed a reasonably large set of fixes from
one person across the stack that are all important fixes. All in all,
the bnxt_re, i40iw, and Dan Carpenter are 3/4 to 2/3rds of this pull
request.
There were some other random fixes that I didn't send in the last pull
request that I added to this one. This catches the rdma stack up to
the fixes from up to about the beginning of this week. Any more fixes
I'll wait and batch up later in the -rc cycle. This will give us a
good base to start with for basing a for-next branch on -rc2.
Summary:
- i40iw fixes
- bnxt_re fixes
- Dan Carpenter bugfixes across stack
- ten more random fixes, no more than two from any one person"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits)
RDMA/core: Initialize port_num in qp_attr
RDMA/uverbs: Fix the check for port number
IB/cma: Fix reference count leak when no ipv4 addresses are set
RDMA/iser: don't send an rkey if all data is written as immadiate-data
rxe: fix broken receive queue draining
RDMA/qedr: Prevent memory overrun in verbs' user responses
iw_cxgb4: don't use WR keys/addrs for 0 byte reads
IB/mlx4: Fix CM REQ retries in paravirt mode
IB/rdmavt: Setting of QP timeout can overflow jiffies computation
IB/core: Fix sparse warnings
RDMA/bnxt_re: Fix the value reported for local ack delay
RDMA/bnxt_re: Report MISSED_EVENTS in req_notify_cq
RDMA/bnxt_re: Fix return value of poll routine
RDMA/bnxt_re: Enable atomics only if host bios supports
RDMA/bnxt_re: Specify RDMA component when allocating stats context
RDMA/bnxt_re: Fixed the max_rd_atomic support for initiator and destination QP
RDMA/bnxt_re: Report supported value to IB stack in query_device
RDMA/bnxt_re: Do not free the ctx_tbl entry if delete GID fails
RDMA/bnxt_re: Fix WQE Size posted to HW to prevent it from throwing error
RDMA/bnxt_re: Free doorbell page index (DPI) during dealloc ucontext
...
Linus Torvalds [Fri, 21 Jul 2017 21:16:42 +0000 (14:16 -0700)]
Merge tag 'drm-fixes-for-v4.13-rc2' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"A bunch of fixes for rc2: two imx regressions, vc4 fix, dma-buf fix,
some displayport mst fixes, and an amdkfd fix.
Nothing too crazy, I assume we just haven't see much rc1 testing yet"
* tag 'drm-fixes-for-v4.13-rc2' of git://people.freedesktop.org/~airlied/linux:
drm/mst: Avoid processing partially received up/down message transactions
drm/mst: Avoid dereferencing a NULL mstb in drm_dp_mst_handle_up_req()
drm/mst: Fix error handling during MST sideband message reception
drm/imx: parallel-display: Accept drm_of_find_panel_or_bridge failure
drm/imx: fix typo in ipu_plane_formats[]
drm/vc4: Fix VBLANK handling in crtc->enable() path
dma-buf/fence: Avoid use of uninitialised timestamp
drm/amdgpu: Remove unused field kgd2kfd_shared_resources.num_mec
drm/radeon: Remove initialization of shared_resources.num_mec
drm/amdkfd: Remove unused references to shared_resources.num_mec
drm/amdgpu: Fix KFD oversubscription by tracking queues correctly
Linus Torvalds [Fri, 21 Jul 2017 20:59:51 +0000 (13:59 -0700)]
Merge tag 'trace-v4.13-rc1' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Three minor updates
- Use the new GFP_RETRY_MAYFAIL to be more aggressive in allocating
memory for the ring buffer without causing OOMs
- Fix a memory leak in adding and removing instances
- Add __rcu annotation to be able to debug RCU usage of function
tracing a bit better"
* tag 'trace-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
trace: fix the errors caused by incompatible type of RCU variables
tracing: Fix kmemleak in instance_rmdir
tracing/ring_buffer: Try harder to allocate
Linus Torvalds [Fri, 21 Jul 2017 20:58:10 +0000 (13:58 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"A bunch of small fixes for x86"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: x86: hyperv: avoid livelock in oneshot SynIC timers
KVM: VMX: Fix invalid guest state detection after task-switch emulation
x86: add MULTIUSER dependency for KVM
KVM: nVMX: Disallow VM-entry in MOV-SS shadow
KVM: nVMX: track NMI blocking state separately for each VMCS
KVM: x86: masking out upper bits
Linus Torvalds [Fri, 21 Jul 2017 20:54:37 +0000 (13:54 -0700)]
Merge tag 'powerpc-4.13-3' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"A handful of fixes, mostly for new code:
- some reworking of the new STRICT_KERNEL_RWX support to make sure we
also remove executable permission from __init memory before it's
freed.
- a fix to some recent optimisations to the hypercall entry where we
were clobbering r12, this was breaking nested guests (PR KVM).
- a fix for the recent patch to opal_configure_cores(). This could
break booting on bare metal Power8 boxes if the kernel was built
without CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG.
- .. and finally a workaround for spurious PMU interrupts on Power9
DD2.
Thanks to: Nicholas Piggin, Anton Blanchard, Balbir Singh"
* tag 'powerpc-4.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm: Mark __init memory no-execute when STRICT_KERNEL_RWX=y
powerpc/mm/hash: Refactor hash__mark_rodata_ro()
powerpc/mm/radix: Refactor radix__mark_rodata_ro()
powerpc/64s: Fix hypercall entry clobbering r12 input
powerpc/perf: Avoid spurious PMU interrupts after idle
powerpc/powernv: Fix boot on Power8 bare metal due to opal_configure_cores()
Linus Torvalds [Fri, 21 Jul 2017 18:20:58 +0000 (11:20 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Half of the fixes are for various build time warnings triggered by
randconfig builds. Most (but not all...) were harmless.
There's also:
- ACPI boundary condition fixes
- UV platform fixes
- defconfig updates
- an AMD K6 CPU init fix
- a %pOF printk format related preparatory change
- .. and a warning fix related to the tlb/PCID changes"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/devicetree: Convert to using %pOF instead of ->full_name
x86/platform/uv/BAU: Disable BAU on single hub configurations
x86/platform/intel-mid: Fix a format string overflow warning
x86/platform: Add PCI dependency for PUNIT_ATOM_DEBUG
x86/build: Silence the build with "make -s"
x86/io: Add "memory" clobber to insb/insw/insl/outsb/outsw/outsl
x86/fpu/math-emu: Avoid bogus -Wint-in-bool-context warning
x86/fpu/math-emu: Fix possible uninitialized variable use
perf/x86: Shut up false-positive -Wmaybe-uninitialized warning
x86/defconfig: Remove stale, old Kconfig options
x86/ioapic: Pass the correct data to unmask_ioapic_irq()
x86/acpi: Prevent out of bound access caused by broken ACPI tables
x86/mm, KVM: Fix warning when !CONFIG_PREEMPT_COUNT
x86/platform/uv/BAU: Fix congested_response_us not taking effect
x86/cpu: Use indirect call to measure performance in init_amd_k6()
Linus Torvalds [Fri, 21 Jul 2017 18:18:09 +0000 (11:18 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
"A timer_irq_init() clocksource API robustness fix"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers/timer-of: Handle of_irq_get_byname() result correctly
Linus Torvalds [Fri, 21 Jul 2017 18:16:12 +0000 (11:16 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"A cputime fix and code comments/organization fix to the deadline
scheduler"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/deadline: Fix confusing comments about selection of top pi-waiter
sched/cputime: Don't use smp_processor_id() in preemptible context
Linus Torvalds [Fri, 21 Jul 2017 18:12:48 +0000 (11:12 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Two hw-enablement patches, two race fixes, three fixes for regressions
of semantics, plus a number of tooling fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Add proper condition to run sched_task callbacks
perf/core: Fix locking for children siblings group read
perf/core: Fix scheduling regression of pinned groups
perf/x86/intel: Fix debug_store reset field for freq events
perf/x86/intel: Add Goldmont Plus CPU PMU support
perf/x86/intel: Enable C-state residency events for Apollo Lake
perf symbols: Accept zero as the kernel base address
Revert "perf/core: Drop kernel samples even though :u is specified"
perf annotate: Fix broken arrow at row 0 connecting jmp instruction to its target
perf evsel: State in the default event name if attr.exclude_kernel is set
perf evsel: Fix attr.exclude_kernel setting for default cycles:p
Linus Torvalds [Fri, 21 Jul 2017 18:11:23 +0000 (11:11 -0700)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull locking fixlet from Ingo Molnar:
"Remove an unnecessary priority adjustment in the rtmutex code"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rtmutex: Remove unnecessary priority adjustment
Trond Myklebust [Thu, 20 Jul 2017 21:00:02 +0000 (17:00 -0400)]
NFS/filelayout: Fix racy setting of fl->dsaddr in filelayout_check_deviceid()
We must set fl->dsaddr once, and once only, even if there are multiple
processes calling filelayout_check_deviceid() for the same layout
segment.
Reported-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Linus Torvalds [Fri, 21 Jul 2017 18:07:41 +0000 (11:07 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
"A resume_irq() fix, plus a number of static declaration fixes"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/digicolor: Drop unnecessary static
irqchip/mips-cpu: Drop unnecessary static
irqchip/gic/realview: Drop unnecessary static
irqchip/mips-gic: Remove population of irq domain names
genirq/PM: Properly pretend disabled state when force resuming interrupts
Linus Torvalds [Fri, 21 Jul 2017 17:41:19 +0000 (10:41 -0700)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull core fixes from Ingo Molnar:
"A fix to WARN_ON_ONCE() done by modules, plus a MAINTAINERS update"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debug: Fix WARN_ON_ONCE() for modules
MAINTAINERS: Update the PTRACE entry
Trond Myklebust [Tue, 11 Jul 2017 21:54:35 +0000 (17:54 -0400)]
NFS: Be more careful about mapping file permissions
When mapping a directory, we want the MAY_WRITE permissions to reflect
whether or not we have permission to modify, add and delete the directory
entries. MAY_EXEC must map to lookup permissions.
On the other hand, for files, we want MAY_WRITE to reflect a permission
to modify and extend the file.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Tue, 11 Jul 2017 21:54:34 +0000 (17:54 -0400)]
NFS: Store the raw NFS access mask in the inode's access cache
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Tue, 11 Jul 2017 21:54:33 +0000 (17:54 -0400)]
NFSv3: Convert nfs3_proc_access() to use nfs_access_set_mask()
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Tue, 11 Jul 2017 21:54:32 +0000 (17:54 -0400)]
NFS: Refactor NFS access to kernel access mask calculation
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
NeilBrown [Wed, 19 Jul 2017 04:05:01 +0000 (14:05 +1000)]
net/sunrpc/xprt_sock: fix regression in connection error reporting.
Commit
3d4762639dd3 ("tcp: remove poll() flakes when receiving
RST") in v4.12 changed the order in which ->sk_state_change()
and ->sk_error_report() are called when a socket is shut
down - sk_state_change() is now called first.
This causes xs_tcp_state_change() -> xs_sock_mark_closed() ->
xprt_disconnect_done() to wake all pending tasked with -EAGAIN.
When the ->sk_error_report() callback arrives, it is too late to
pass the error on, and it is lost.
As easy way to demonstrate the problem caused is to try to start
rpc.nfsd while rcpbind isn't running.
nfsd will attempt a tcp connection to rpcbind. A ECONNREFUSED
error is returned, but sunrpc code loses the error and keeps
retrying. If it saw the ECONNREFUSED, it would abort.
To fix this, handle the sk->sk_err in the TCP_CLOSE branch of
xs_tcp_state_change().
Fixes: 3d4762639dd3 ("tcp: remove poll() flakes when receiving RST")
Cc: stable@vger.kernel.org (v4.12)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Eryu Guan [Tue, 18 Jul 2017 05:32:32 +0000 (13:32 +0800)]
nfs: count correct array for mnt3_counts array size
Array size of mnt3_counts should be the size of array
mnt3_procedures, not mnt_procedures, though they're same in size
right now. Found this by code inspection.
Fixes: 1c5876ddbdb4 ("sunrpc: move p_count out of struct rpc_procinfo")
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Rob Herring [Tue, 18 Jul 2017 21:42:47 +0000 (16:42 -0500)]
x86/devicetree: Convert to using %pOF instead of ->full_name
Now that we have a custom printf format specifier, convert users of
full_name to use %pOF instead. This is preparation to remove storing
of the full path string for each device node.
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: devicetree@vger.kernel.org
Link: http://lkml.kernel.org/r/20170718214339.7774-7-robh@kernel.org
[ Clarify the error message while at it, as 'node' is ambiguous. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Olsa [Wed, 19 Jul 2017 07:52:47 +0000 (09:52 +0200)]
perf/x86/intel: Add proper condition to run sched_task callbacks
We have 2 functions using the same sched_task callback:
- PEBS drain for free running counters
- LBR save/store
Both of them are called from intel_pmu_sched_task() and
either of them can be unwillingly triggered when the
other one is configured to run.
Let's say there's PEBS drain configured in sched_task
callback for the event, but in the callback itself
(intel_pmu_sched_task()) we will also run the code for
LBR save/restore, which we did not ask for, but the
code in intel_pmu_sched_task() does not check for that.
This can lead to extra cycles in some perf monitoring,
like when we monitor PEBS event without LBR data.
# perf record --no-timestamp -c 10000 -e cycles:p ./perf bench sched pipe -l
1000000
(We need PEBS, non freq/non timestamp event to enable
the sched_task callback)
The perf stat of cycles and msr:write_msr for above
command before the change:
...
Performance counter stats for './perf record --no-timestamp -c 10000 -e cycles:p \
./perf bench sched pipe -l
1000000' (5 runs):
18,519,557,441 cycles:k
91,195,527 msr:write_msr
29.
334476406 seconds time elapsed
And after the change:
...
Performance counter stats for './perf record --no-timestamp -c 10000 -e cycles:p \
./perf bench sched pipe -l
1000000' (5 runs):
18,704,973,540 cycles:k
27,184,720 msr:write_msr
16.
977875900 seconds time elapsed
There's no affect on cycles:k because the sched_task happens
with events switched off, however the msr:write_msr tracepoint
counter together with almost 50% of time speedup show the
improvement.
Monitoring LBR event and having extra PEBS drain processing
in sched_task callback showed just a little speedup, because
the drain function does not do much extra work in case there
is no PEBS data.
Adding conditions to recognize the configured work that needs
to be done in the x86_pmu's sched_task callback.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170719075247.GA27506@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>