Aneesh Kumar K.V [Wed, 13 Aug 2014 07:02:03 +0000 (12:32 +0530)]
powerpc/mm: Use read barrier when creating real_pte
On ppc64 we support 4K hash pte with 64K page size. That requires
us to track the hash pte slot information on a per 4k basis. We do that
by storing the slot details in the second half of pte page. The pte bit
_PAGE_COMBO is used to indicate whether the second half need to be
looked while building real_pte. We need to use read memory barrier while
doing that so that load of hidx is not reordered w.r.t _PAGE_COMBO
check. On the store side we already do a lwsync in __hash_page_4K
CC: <stable@vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Aneesh Kumar K.V [Wed, 13 Aug 2014 07:02:02 +0000 (12:32 +0530)]
powerpc/thp: Use ACCESS_ONCE when loading pmdp
We would get wrong results in compiler recomputed old_pmd. Avoid
that by using ACCESS_ONCE
CC: <stable@vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Aneesh Kumar K.V [Wed, 13 Aug 2014 07:02:01 +0000 (12:32 +0530)]
powerpc/thp: Invalidate with vpn in loop
As per ISA, for 4k base page size we compare 14..65 bits of VA specified
with the entry_VA in tlb. That implies we need to make sure we do a
tlbie with all the possible 4k va we used to access the 16MB hugepage.
With 64k base page size we compare 14..57 bits of VA. Hence we cannot
ignore the lower 24 bits of va while tlbie .We also cannot tlb
invalidate a 16MB entry with just one tlbie instruction because
we don't track which va was used to instantiate the tlb entry.
CC: <stable@vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Aneesh Kumar K.V [Wed, 13 Aug 2014 07:02:00 +0000 (12:32 +0530)]
powerpc/thp: Handle combo pages in invalidate
If we changed base page size of the segment, either via sub_page_protect
or via remap_4k_pfn, we do a demote_segment which doesn't flush the hash
table entries. We do a lazy hash page table flush for all mapped pages
in the demoted segment. This happens when we handle hash page fault for
these pages.
We use _PAGE_COMBO bit along with _PAGE_HASHPTE to indicate whether a
pte is backed by 4K hash pte. If we find _PAGE_COMBO not set on the pte,
that implies that we could possibly have older 64K hash pte entries in
the hash page table and we need to invalidate those entries.
Use _PAGE_COMBO to determine the page size with which we should
invalidate the hash table entries on unmap.
CC: <stable@vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Aneesh Kumar K.V [Wed, 13 Aug 2014 07:01:59 +0000 (12:31 +0530)]
powerpc/thp: Invalidate old 64K based hash page mapping before insert of 4k pte
If we changed base page size of the segment, either via sub_page_protect
or via remap_4k_pfn, we do a demote_segment which doesn't flush the hash
table entries. We do a lazy hash page table flush for all mapped pages
in the demoted segment. This happens when we handle hash page fault
for these pages.
We use _PAGE_COMBO bit along with _PAGE_HASHPTE to indicate whether a
pte is backed by 4K hash pte. If we find _PAGE_COMBO not set on the pte,
that implies that we could possibly have older 64K hash pte entries in
the hash page table and we need to invalidate those entries.
Handle this correctly for 16M pages
CC: <stable@vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Aneesh Kumar K.V [Wed, 13 Aug 2014 07:01:58 +0000 (12:31 +0530)]
powerpc/thp: Don't recompute vsid and ssize in loop on invalidate
The segment identifier and segment size will remain the same in
the loop, So we can compute it outside. We also change the
hugepage_invalidate interface so that we can use it the later patch
CC: <stable@vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Aneesh Kumar K.V [Wed, 13 Aug 2014 07:01:57 +0000 (12:31 +0530)]
powerpc/thp: Add write barrier after updating the valid bit
With hugepages, we store the hpte valid information in the pte page
whose address is stored in the second half of the PMD. Use a
write barrier to make sure clearing pmd busy bit and updating
hpte valid info are ordered properly.
CC: <stable@vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Nishanth Aravamudan [Thu, 17 Jul 2014 23:15:12 +0000 (16:15 -0700)]
powerpc: reorder per-cpu NUMA information's initialization
There is an issue currently where NUMA information is used on powerpc
(and possibly ia64) before it has been read from the device-tree, which
leads to large slab consumption with CONFIG_SLUB and memoryless nodes.
NUMA powerpc non-boot CPU's cpu_to_node/cpu_to_mem is only accurate
after start_secondary(), similar to ia64, which is invoked via
smp_init().
Commit
6ee0578b4daae ("workqueue: mark init_workqueues() as
early_initcall()") made init_workqueues() be invoked via
do_pre_smp_initcalls(), which is obviously before the secondary
processors are online.
Additionally, the following commits changed init_workqueues() to use
cpu_to_node to determine the node to use for kthread_create_on_node:
bce903809ab3f ("workqueue: add wq_numa_tbl_len and
wq_numa_possible_cpumask[]")
f3f90ad469342 ("workqueue: determine NUMA node of workers accourding to
the allowed cpumask")
Therefore, when init_workqueues() runs, it sees all CPUs as being on
Node 0. On LPARs or KVM guests where Node 0 is memoryless, this leads to
a high number of slab deactivations
(http://www.spinics.net/lists/linux-mm/msg67489.html).
Fix this by initializing the powerpc-specific CPU<->node/local memory
node mapping as early as possible, which on powerpc is
do_init_bootmem(). Currently that function initializes the mapping for
the boot CPU, but we extend it to setup the mapping for all possible
CPUs. Then, in smp_prepare_cpus(), we can correspondingly set the
per-cpu values for all possible CPUs. That ensures that before the
early_initcalls run (and really as early as possible), the per-cpu NUMA
mapping is accurate.
While testing memoryless nodes on PowerKVM guests with a fix to the
workqueue logic to use cpu_to_mem() instead of cpu_to_node(), with a
guest topology of:
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
node 1 size: 16336 MB
node 1 free: 15329 MB
node distances:
node 0 1
0: 10 40
1: 40 10
the slab consumption decreases from
Slab: 932416 kB
SUnreclaim: 902336 kB
to
Slab: 395264 kB
SUnreclaim: 359424 kB
And we a corresponding increase in the slab efficiency from
slab mem objs slabs
used active active
------------------------------------------------------------
kmalloc-16384 337 MB 11.28% 100.00%
task_struct 288 MB 9.93% 100.00%
to
slab mem objs slabs
used active active
------------------------------------------------------------
kmalloc-16384 37 MB 100.00% 100.00%
task_struct 31 MB 100.00% 100.00%
Powerpc didn't support memoryless nodes until recently (
64bb80d87f01
"powerpc/numa: Enable CONFIG_HAVE_MEMORYLESS_NODES" and
8c272261194d
"powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"). Those commits also
helped improve memory consumption with these kind of environments.
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Himangi Saraogi [Tue, 22 Jul 2014 18:10:19 +0000 (23:40 +0530)]
powerpc/perf/hv-24x7: Use kmem_cache_free
Free memory allocated using kmem_cache_zalloc using kmem_cache_free
rather than kfree.
The Coccinelle semantic patch that makes this change is as follows:
// <smpl>
@@
expression x,E,c;
@@
x = \(kmem_cache_alloc\|kmem_cache_zalloc\|kmem_cache_alloc_node\)(c,...)
... when != x = E
when != &x
?-kfree(x)
+kmem_cache_free(c,x)
// </smpl>
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Thomas Falcon [Tue, 5 Aug 2014 21:42:39 +0000 (16:42 -0500)]
powerpc/pseries/hvcserver: Fix endian issue in hvcs_get_partner_info
A buffer returned by H_VTERM_PARTNER_INFO contains device information
in big endian format, causing problems for little endian architectures.
This patch ensures that they are in cpu endian.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Tue, 5 Aug 2014 04:55:00 +0000 (14:55 +1000)]
powerpc: Hard disable interrupts in xmon
xmon only soft disables interrupts. This seems like a bad idea - we
certainly don't want decrementer and PMU exceptions going off when
we are debugging something inside xmon.
This issue was uncovered when the hard lockup detector went off
inside xmon. To ensure we wont get a spurious hard lockup warning,
I also call touch_nmi_watchdog() when exiting xmon.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Nishanth Aravamudan [Mon, 11 Aug 2014 23:43:18 +0000 (16:43 -0700)]
powerpc: remove duplicate definition of TEXASR_FS
It appears that commits
7f06f21d40a6 ("powerpc/tm: Add checking to
treclaim/trechkpt") and
e4e38121507a ("KVM: PPC: Book3S HV: Add
transactional memory support") both added definitions of TEXASR_FS.
Remove one of them. At the same time, fix the alignment of the remaining
definition (should be tab-separated like the rest of the #defines).
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Mon, 11 Aug 2014 09:16:20 +0000 (19:16 +1000)]
powerpc/pseries: Avoid deadlock on removing ddw
Function remove_ddw() could be called in of_reconfig_notifier and
we potentially remove the dynamic DMA window property, which invokes
of_reconfig_notifier again. Eventually, it leads to the deadlock as
following backtrace shows.
The patch fixes the above issue by deferring releasing the dynamic
DMA window property while releasing the device node.
=============================================
[ INFO: possible recursive locking detected ]
3.16.0+ #428 Tainted: G W
---------------------------------------------
drmgr/2273 is trying to acquire lock:
((of_reconfig_chain).rwsem){.+.+..}, at: [<
c000000000091890>] \
.__blocking_notifier_call_chain+0x40/0x78
but task is already holding lock:
((of_reconfig_chain).rwsem){.+.+..}, at: [<
c000000000091890>] \
.__blocking_notifier_call_chain+0x40/0x78
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock((of_reconfig_chain).rwsem);
lock((of_reconfig_chain).rwsem);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by drmgr/2273:
#0: (sb_writers#4){.+.+.+}, at: [<
c0000000001cbe70>] \
.vfs_write+0xb0/0x1f8
#1: ((of_reconfig_chain).rwsem){.+.+..}, at: [<
c000000000091890>] \
.__blocking_notifier_call_chain+0x40/0x78
stack backtrace:
CPU: 17 PID: 2273 Comm: drmgr Tainted: G W 3.16.0+ #428
Call Trace:
[
c0000000137e7000] [
c000000000013d9c] .show_stack+0x88/0x148 (unreliable)
[
c0000000137e70b0] [
c00000000083cd34] .dump_stack+0x7c/0x9c
[
c0000000137e7130] [
c0000000000b8afc] .__lock_acquire+0x128c/0x1c68
[
c0000000137e7280] [
c0000000000b9a4c] .lock_acquire+0xe8/0x104
[
c0000000137e7350] [
c00000000083588c] .down_read+0x4c/0x90
[
c0000000137e73e0] [
c000000000091890] .__blocking_notifier_call_chain+0x40/0x78
[
c0000000137e7490] [
c000000000091900] .blocking_notifier_call_chain+0x38/0x48
[
c0000000137e7520] [
c000000000682a28] .of_reconfig_notify+0x34/0x5c
[
c0000000137e75b0] [
c000000000682a9c] .of_property_notify+0x4c/0x54
[
c0000000137e7650] [
c000000000682bf0] .of_remove_property+0x30/0xd4
[
c0000000137e76f0] [
c000000000052a44] .remove_ddw+0x144/0x168
[
c0000000137e7790] [
c000000000053204] .iommu_reconfig_notifier+0x30/0xe0
[
c0000000137e7820] [
c00000000009137c] .notifier_call_chain+0x6c/0xb4
[
c0000000137e78c0] [
c0000000000918ac] .__blocking_notifier_call_chain+0x5c/0x78
[
c0000000137e7970] [
c000000000091900] .blocking_notifier_call_chain+0x38/0x48
[
c0000000137e7a00] [
c000000000682a28] .of_reconfig_notify+0x34/0x5c
[
c0000000137e7a90] [
c000000000682e14] .of_detach_node+0x44/0x1fc
[
c0000000137e7b40] [
c0000000000518e4] .ofdt_write+0x3ac/0x688
[
c0000000137e7c20] [
c000000000238430] .proc_reg_write+0xb8/0xd4
[
c0000000137e7cd0] [
c0000000001cbeac] .vfs_write+0xec/0x1f8
[
c0000000137e7d70] [
c0000000001cc3b0] .SyS_write+0x58/0xa0
[
c0000000137e7e30] [
c00000000000a064] syscall_exit+0x0/0x98
Cc: stable@vger.kernel.org
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Mon, 11 Aug 2014 09:16:19 +0000 (19:16 +1000)]
powerpc/pseries: Failure on removing device node
While running command "drmgr -c phb -r -s 'PHB 528'", following
backtrace jumped out because the target device node isn't marked
with OF_DETACHED by of_detach_node(), which caused by error
returned from memory hotplug related reconfig notifier when
disabling CONFIG_MEMORY_HOTREMOVE. The patch fixes it.
ERROR: Bad of_node_put() on /pci@
800000020000210/ethernet@0
CPU: 14 PID: 2252 Comm: drmgr Tainted: G W 3.16.0+ #427
Call Trace:
[
c000000012a776a0] [
c000000000013d9c] .show_stack+0x88/0x148 (unreliable)
[
c000000012a77750] [
c00000000083cd34] .dump_stack+0x7c/0x9c
[
c000000012a777d0] [
c0000000006807c4] .of_node_release+0x58/0xe0
[
c000000012a77860] [
c00000000038a7d0] .kobject_release+0x174/0x1b8
[
c000000012a77900] [
c00000000038a884] .kobject_put+0x70/0x78
[
c000000012a77980] [
c000000000681680] .of_node_put+0x28/0x34
[
c000000012a77a00] [
c000000000681ea8] .__of_get_next_child+0x64/0x70
[
c000000012a77a90] [
c000000000682138] .of_find_node_by_path+0x1b8/0x20c
[
c000000012a77b40] [
c000000000051840] .ofdt_write+0x308/0x688
[
c000000012a77c20] [
c000000000238430] .proc_reg_write+0xb8/0xd4
[
c000000012a77cd0] [
c0000000001cbeac] .vfs_write+0xec/0x1f8
[
c000000012a77d70] [
c0000000001cc3b0] .SyS_write+0x58/0xa0
[
c000000012a77e30] [
c00000000000a064] syscall_exit+0x0/0x98
Cc: stable@vger.kernel.org
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Mon, 11 Aug 2014 04:37:22 +0000 (14:37 +1000)]
powerpc/boot: Use correct zlib types for comparison
Avoids this warning:
arch/powerpc/boot/gunzip_util.c:118:9: warning: comparison of distinct pointer types lacks a cast
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Vasant Hegde [Sat, 9 Aug 2014 05:45:45 +0000 (11:15 +0530)]
powerpc/powernv: Interface to register/unregister opal dump region
PowerNV platform is capable of capturing host memory region when system
crashes (because of host/firmware). We have new OPAL API to register/
unregister memory region to be captured when system crashes.
This patch adds support for new API. Also during boot time we register
kernel log buffer and unregister before doing kexec.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Vasant Hegde [Sat, 9 Aug 2014 05:45:30 +0000 (11:15 +0530)]
printk: Add function to return log buffer address and size
Platforms like IBM Power Systems supports service processor
assisted dump. It provides interface to add memory region to
be captured when system is crashed.
During initialization/running we can add kernel memory region
to be collected.
Presently we don't have a way to get the log buffer base address
and size. This patch adds support to return log buffer address
and size.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Michael Ellerman [Wed, 6 Aug 2014 05:42:17 +0000 (15:42 +1000)]
powerpc: Add POWER8 features to CPU_FTRS_POSSIBLE/ALWAYS
We have been a bit slack about updating the CPU_FTRS_POSSIBLE and
CPU_FTRS_ALWAYS masks. When we added POWER8, and also POWER8E we forgot
to update the ALWAYS mask. And when we added POWER8_DD1 we forgot to
update both the POSSIBLE and ALWAYS masks.
Luckily this hasn't caused any actual bugs AFAICS. Failing to update the
ALWAYS mask just forgoes a potential optimisation opportunity. Failing
to update the POSSIBLE mask for POWER8_DD1 is also OK because it only
removes a bit rather than adding any.
Regardless they should all be in both masks so as to avoid any future
bugs when the set of ALWAYS/POSSIBLE bits changes, or the masks
themselves change.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Michael Neuling <mikey@neuling.org>
Acked-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Alistair Popple [Wed, 6 Aug 2014 07:03:09 +0000 (17:03 +1000)]
powerpc/ppc476: Disable BTAC
This patch disables the branch target address CAM which under specific
circumstances may cause the processor to skip execution of 1-4
instructions. This fixes IBM Erratum #47.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Wed, 6 Aug 2014 07:10:16 +0000 (17:10 +1000)]
powerpc/powernv: Fix IOMMU group lost
When we take full hotplug to recover from EEH errors, PCI buses
could be involved. For the case, the child devices of involved
PCI buses can't be attached to IOMMU group properly, which is
caused by commit
3f28c5a ("powerpc/powernv: Reduce multi-hit of
iommu_add_device()").
When adding the PCI devices of the newly created PCI buses to
the system, the IOMMU group is expected to be added in (C).
(A) fails to bind the IOMMU group because bus->is_added is
false. (B) fails because the device doesn't have binding IOMMU
table yet. bus->is_added is set to true at end of (C) and
pdev->is_added is set to true at (D).
pcibios_add_pci_devices()
pci_scan_bridge()
pci_scan_child_bus()
pci_scan_slot()
pci_scan_single_device()
pci_scan_device()
pci_device_add()
pcibios_add_device() A: Ignore
device_add() B: Ignore
pcibios_fixup_bus()
pcibios_setup_bus_devices()
pcibios_setup_device() C: Hit
pcibios_finish_adding_to_bus()
pci_bus_add_devices()
pci_bus_add_device() D: Add device
If the parent PCI bus isn't involved in hotplug, the IOMMU
group is expected to be bound in (B). (A) should fail as the
sysfs entries aren't populated.
The patch fixes the issue by reverting commit
3f28c5a and remove
WARN_ON() in iommu_add_device() to allow calling the function
even the specified device already has associated IOMMU group.
Cc: <stable@vger.kernel.org> # 3.16+
Reported-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman [Thu, 7 Aug 2014 05:36:18 +0000 (15:36 +1000)]
powerpc: Add smp_mb()s to arch_spin_unlock_wait()
Similar to the previous commit which described why we need to add a
barrier to arch_spin_is_locked(), we have a similar problem with
spin_unlock_wait().
We need a barrier on entry to ensure any spinlock we have previously
taken is visibly locked prior to the load of lock->slock.
It's also not clear if spin_unlock_wait() is intended to have ACQUIRE
semantics. For now be conservative and add a barrier on exit to give it
ACQUIRE semantics.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman [Thu, 7 Aug 2014 05:36:17 +0000 (15:36 +1000)]
powerpc: Add smp_mb() to arch_spin_is_locked()
The kernel defines the function spin_is_locked(), which can be used to
check if a spinlock is currently locked.
Using spin_is_locked() on a lock you don't hold is obviously racy. That
is, even though you may observe that the lock is unlocked, it may become
locked at any time.
There is (at least) one exception to that, which is if two locks are
used as a pair, and the holder of each checks the status of the other
before doing any update.
Assuming *A and *B are two locks, and *COUNTER is a shared non-atomic
value:
The first CPU does:
spin_lock(*A)
if spin_is_locked(*B)
# nothing
else
smp_mb()
LOAD r = *COUNTER
r++
STORE *COUNTER = r
spin_unlock(*A)
And the second CPU does:
spin_lock(*B)
if spin_is_locked(*A)
# nothing
else
smp_mb()
LOAD r = *COUNTER
r++
STORE *COUNTER = r
spin_unlock(*B)
Although this is a strange locking construct, it should work.
It seems to be understood, but not documented, that spin_is_locked() is
not a memory barrier, so in the examples above and below the caller
inserts its own memory barrier before acting on the result of
spin_is_locked().
For now we assume spin_is_locked() is implemented as below, and we break
it out in our examples:
bool spin_is_locked(*LOCK) {
LOAD l = *LOCK
return l.locked
}
Our intuition is that there should be no problem even if the two code
sequences run simultaneously such as:
CPU 0 CPU 1
==================================================
spin_lock(*A) spin_lock(*B)
LOAD b = *B LOAD a = *A
if b.locked # true if a.locked # true
# nothing # nothing
spin_unlock(*A) spin_unlock(*B)
If one CPU gets the lock before the other then it will do the update and
the other CPU will back off:
CPU 0 CPU 1
==================================================
spin_lock(*A)
LOAD b = *B
spin_lock(*B)
if b.locked # false LOAD a = *A
else if a.locked # true
smp_mb() # nothing
LOAD r1 = *COUNTER spin_unlock(*B)
r1++
STORE *COUNTER = r1
spin_unlock(*A)
However in reality spin_lock() itself is not indivisible. On powerpc we
implement it as a load-and-reserve and store-conditional.
Ignoring the retry logic for the lost reservation case, it boils down to:
spin_lock(*LOCK) {
LOAD l = *LOCK
l.locked = true
STORE *LOCK = l
ACQUIRE_BARRIER
}
The ACQUIRE_BARRIER is required to give spin_lock() ACQUIRE semantics as
defined in memory-barriers.txt:
This acts as a one-way permeable barrier. It guarantees that all
memory operations after the ACQUIRE operation will appear to happen
after the ACQUIRE operation with respect to the other components of
the system.
On modern powerpc systems we use lwsync for ACQUIRE_BARRIER. lwsync is
also know as "lightweight sync", or "sync 1".
As described in Power ISA v2.07 section B.2.1.1, in this scenario the
lwsync is not the barrier itself. It instead causes the LOAD of *LOCK to
act as the barrier, preventing any loads or stores in the locked region
from occurring prior to the load of *LOCK.
Whether this behaviour is in accordance with the definition of ACQUIRE
semantics in memory-barriers.txt is open to discussion, we may switch to
a different barrier in future.
What this means in practice is that the following can occur:
CPU 0 CPU 1
==================================================
LOAD a = *A LOAD b = *B
a.locked = true b.locked = true
LOAD b = *B LOAD a = *A
STORE *A = a STORE *B = b
if b.locked # false if a.locked # false
else else
smp_mb() smp_mb()
LOAD r1 = *COUNTER LOAD r2 = *COUNTER
r1++ r2++
STORE *COUNTER = r1
STORE *COUNTER = r2 # Lost update
spin_unlock(*A) spin_unlock(*B)
That is, the load of *B can occur prior to the store that makes *A
visibly locked. And similarly for CPU 1. The result is both CPUs hold
their lock and believe the other lock is unlocked.
The easiest fix for this is to add a full memory barrier to the start of
spin_is_locked(), so adding to our previous definition would give us:
bool spin_is_locked(*LOCK) {
smp_mb()
LOAD l = *LOCK
return l.locked
}
The new barrier orders the store to the lock we are locking vs the load
of the other lock:
CPU 0 CPU 1
==================================================
LOAD a = *A LOAD b = *B
a.locked = true b.locked = true
STORE *A = a STORE *B = b
smp_mb() smp_mb()
LOAD b = *B LOAD a = *A
if b.locked # true if a.locked # true
# nothing # nothing
spin_unlock(*A) spin_unlock(*B)
Although the above example is theoretical, there is code similar to this
example in sem_lock() in ipc/sem.c. This commit in addition to the next
commit appears to be a fix for crashes we are seeing in that code where
we believe this race happens in practice.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Guenter Roeck [Sat, 9 Aug 2014 05:22:12 +0000 (22:22 -0700)]
powerpc: Fix "attempt to move .org backwards" error
Once again, we see
arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
arch/powerpc/kernel/exceptions-64s.S:865: Error: attempt to move .org backwards
arch/powerpc/kernel/exceptions-64s.S:866: Error: attempt to move .org backwards
arch/powerpc/kernel/exceptions-64s.S:890: Error: attempt to move .org backwards
when compiling ppc:allmodconfig.
This time the problem has been caused by to commit
0869b6fd209bda
("powerpc/book3s: Add basic infrastructure to handle HMI in Linux"),
which adds functions hmi_exception_early and hmi_exception_after_realmode
into a critical (size-limited) code area, even though that does not appear
to be necessary.
Move those functions to a non-critical area of the file.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Scott Wood [Fri, 8 Aug 2014 23:44:01 +0000 (18:44 -0500)]
powerpc/nohash: Split __early_init_mmu() into boot and secondary
__early_init_mmu() does some things that are really only needed by the
boot cpu. On FSL booke, This includes calling
memblock_enforce_memory_limit(), which is labelled __init. Secondary
cpu init code can't be __init as that would break CPU hotplug.
While it's probably a bug that memblock_enforce_memory_limit() isn't
__init_memblock instead, there's no reason why we should be doing this
stuff for secondary cpus in the first place.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Linus Torvalds [Sun, 10 Aug 2014 18:13:58 +0000 (11:13 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/olof/chrome-platform
Pull chrome platform updates from Olof Johansson:
"Updates to the Chromebook/box platform drivers:
- a bugfix to pstore registration that makes it also work on
non-Google systems
- addition of new shipped Chromebooks (later models have more probing
through ACPI so the need for these updates will be less over time).
- A couple of minor coding style updates"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform:
platform/chrome: chromeos_laptop - Add a limit for deferred retries
platform/chrome: Add support for the acer c720p touchscreen.
platform/chrome: pstore: fix dmi table to match all chrome systems
platform/chrome: coding style fixes
platform/chrome: chromeos_laptop - Add Toshiba CB35 Touch
platform/chrome: chromeos_laptop - Add Dell Chromebook 11 touch
platform/chrome: chromeos_laptop - Add HP Chromebook 14
platform/chrome: chromeos_laptop - Add support for Acer C720
Linus Torvalds [Sun, 10 Aug 2014 18:13:06 +0000 (11:13 -0700)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
- a short branch of OMAP fixes that we didn't merge before the window
opened.
- a small cleanup that sorts the rk3288 dts entries properly
- a build fix due to a reference to a removed DT node on exynos
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: dts: exynos5420: remove disp_pd
ARM: EXYNOS: Fix suspend/resume sequences
ARM: dts: Fix the sort ordering of EHCI and HSIC in rk3288.dtsi
ARM: OMAP3: Fix coding style problems in arch/arm/mach-omap2/control.c
ARM: OMAP3: Fix choice of omap3_restore_es function in OMAP34XX rev3.1.2 case.
ARM: OMAP2+: clock: allow omap2_dpll_round_rate() to round to next-lowest rate
Linus Torvalds [Sun, 10 Aug 2014 00:46:39 +0000 (17:46 -0700)]
Merge branch 'linux-3.17' of git://anongit.freedesktop.org/git/nouveau/linux-2.6
Pull nouveau drm updates from Ben Skeggs:
"Apologies for not getting this done in time for Dave's drm-next merge
window. As he mentioned, a pre-existing bug reared its head a lot
more obviously after this lot of changes. It took quite a bit of time
to track it down. In any case, Dave suggested I try my luck by
sending directly to you this time.
Overview:
- more code for Tegra GK20A from NVIDIA - probing, reclockig
- better fix for Kepler GPUs that have the graphics engine powered
off on startup, method courtesy of info provided by NVIDIA
- unhardcoding of a bunch of graphics engine setup on
Fermi/Kepler/Maxwell, will hopefully solve some issues people have
noticed on higher-end models
- support for "Zero Bandwidth Clear" on Fermi/Kepler/Maxwell, needs
userspace support in general, but some lucky apps will benefit
automagically
- reviewed/exposed the full object APIs to userspace (finally), gives
it access to perfctrs, ZBC controls, various events. More to come
in the future.
- various other fixes"
Acked-by: Dave Airlie <airlied@redhat.com>
* 'linux-3.17' of git://anongit.freedesktop.org/git/nouveau/linux-2.6: (87 commits)
drm/nouveau: expose the full object/event interfaces to userspace
drm/nouveau: fix headless mode
drm/nouveau: hide sysfs pstate file behind an option again
drm/nv50/disp: shhh compiler
drm/gf100-/gr: implement the proper SetShaderExceptions method
drm/gf100-/gr: remove some broken ltc bashing, for now
drm/gf100-/gr: unhardcode attribute cb config
drm/gf100-/gr: fetch tpcs-per-ppc info on startup
drm/gf100-/gr: unhardcode pagepool config
drm/gf100-/gr: unhardcode bundle cb config
drm/gf100-/gr: improve initial context patch list helpers
drm/gf100-/gr: add support for zero bandwidth clear
drm/nouveau/ltc: add zbc drivers
drm/nouveau/ltc: s/ltcg/ltc/ + cleanup
drm/nouveau: use ram info from nvif_device
drm/nouveau/disp: implement nvif event sources for vblank/connector notifiers
drm/nouveau/disp: allow user direct access to channel control registers
drm/nouveau/disp: audit and version display classes
drm/nouveau/disp: audit and version SCANOUTPOS method
drm/nv50-/disp: audit and version PIOR_PWR method
...
Linus Torvalds [Sun, 10 Aug 2014 00:33:44 +0000 (17:33 -0700)]
Merge tag 'trace-ipi-tracepoints' of git://git./linux/kernel/git/rostedt/linux-trace
Pull IPI tracepoints for ARM from Steven Rostedt:
"Nicolas Pitre added generic tracepoints for tracing IPIs and updated
the arm and arm64 architectures. It required some minor updates to
the generic tracepoint system, so it had to wait for me to implement
them"
* tag 'trace-ipi-tracepoints' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ARM64: add IPI tracepoints
ARM: add IPI tracepoints
tracepoint: add generic tracepoint definitions for IPI tracing
tracing: Do not do anything special with tracepoint_string when tracing is disabled
Linus Torvalds [Sun, 10 Aug 2014 00:29:36 +0000 (17:29 -0700)]
Merge tag 'trace-fixes-3.16' of git://git./linux/kernel/git/rostedt/linux-trace
Pull trace file read iterator fixes from Steven Rostedt:
"This contains a fix for two long standing bugs. Both of which are
rarely ever hit, and requires the user to do something that users
rarely do. It took a few special test cases to even trigger this bug,
and one of them was just one test in the process of finishing up as
another one started.
Both bugs have to do with the ring buffer iterator rb_iter_peek(), but
one is more indirect than the other.
The fist bug fix is simply an increase in the safety net loop counter.
The counter makes sure that the rb_iter_peek() only iterates the
number of times we expect it can, and no more. Well, there was one
way it could iterate one more than we expected, and that caused the
ring buffer to shutdown with a nasty warning. The fix was simply to
up that counter by one.
The other bug has to be with rb_iter_reset() (called by
rb_iter_peek()). This happens when a user reads both the trace_pipe
and trace files. The trace_pipe is a consuming read and does not use
the ring buffer iterator, but the trace file is not a consuming read
and does use the ring buffer iterator. When the trace file is being
read, if it detects that a consuming read occurred, it resets the
iterator and starts over. But the reset code that does this
(rb_iter_reset()), checks if the reader_page is linked to the ring
buffer or not, and will look into the ring buffer itself if it is not.
This is wrong, as it should always try to read the reader page first.
Not to mention, the code that looked into the ring buffer did it
wrong, and used the header_page "read" offset to start reading on that
page. That offset is bogus for pages in the writable ring buffer, and
was corrupting the iterator, and it would start returning bogus
events"
* tag 'trace-fixes-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ring-buffer: Always reset iterator to reader page
ring-buffer: Up rb_iter_peek() loop count to 3
Linus Torvalds [Sun, 10 Aug 2014 00:10:41 +0000 (17:10 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ebiederm/user-namespace
Pull namespace updates from Eric Biederman:
"This is a bunch of small changes built against 3.16-rc6. The most
significant change for users is the first patch which makes setns
drmatically faster by removing unneded rcu handling.
The next chunk of changes are so that "mount -o remount,.." will not
allow the user namespace root to drop flags on a mount set by the
system wide root. Aks this forces read-only mounts to stay read-only,
no-dev mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec
mounts to stay no exec and it prevents unprivileged users from messing
with a mounts atime settings. I have included my test case as the
last patch in this series so people performing backports can verify
this change works correctly.
The next change fixes a bug in NFS that was discovered while auditing
nsproxy users for the first optimization. Today you can oops the
kernel by reading /proc/fs/nfsfs/{servers,volumes} if you are clever
with pid namespaces. I rebased and fixed the build of the
!CONFIG_NFS_FS case yesterday when a build bot caught my typo. Given
that no one to my knowledge bases anything on my tree fixing the typo
in place seems more responsible that requiring a typo-fix to be
backported as well.
The last change is a small semantic cleanup introducing
/proc/thread-self and pointing /proc/mounts and /proc/net at it. This
prevents several kinds of problemantic corner cases. It is a
user-visible change so it has a minute chance of causing regressions
so the change to /proc/mounts and /proc/net are individual one line
commits that can be trivially reverted. Unfortunately I lost and
could not find the email of the original reporter so he is not
credited. From at least one perspective this change to /proc/net is a
refgression fix to allow pthread /proc/net uses that were broken by
the introduction of the network namespace"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
proc: Point /proc/mounts at /proc/thread-self/mounts instead of /proc/self/mounts
proc: Point /proc/net at /proc/thread-self/net instead of /proc/self/net
proc: Implement /proc/thread-self to point at the directory of the current thread
proc: Have net show up under /proc/<tgid>/task/<tid>
NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes
mnt: Add tests for unprivileged remount cases that have found to be faulty
mnt: Change the default remount atime from relatime to the existing value
mnt: Correct permission checks in do_remount
mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount
mnt: Only change user settable mount flags in remount
namespaces: Use task_lock and not rcu to protect nsproxy
Linus Torvalds [Sat, 9 Aug 2014 22:09:52 +0000 (15:09 -0700)]
Merge branch 'stable-3.17' of git://git.infradead.org/users/pcmoore/selinux
Pull SElinux fixes from Paul Moore:
"Two small patches to fix a couple of build warnings in SELinux and
NetLabel. The patches are obvious enough that I don't think any
additional explanation is necessary, but it basically boils down to
the usual: I was stupid, and these patches fix some of the stupid.
Both patches were posted earlier this week to the SELinux list, and
that is where they sat as I didn't think there were noteworthy enough
to go upstream at this point in time, but DaveM would rather see them
upstream now so who am I to argue. As the patches are both very
small"
* 'stable-3.17' of git://git.infradead.org/users/pcmoore/selinux:
selinux: remove unused variabled in the netport, netnode, and netif caches
netlabel: fix the netlbl_catmap_setlong() dummy function
Linus Torvalds [Sat, 9 Aug 2014 21:31:18 +0000 (14:31 -0700)]
Merge branch 'for-3.17' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"This includes a major rewrite of the NFSv4 state code, which has
always depended on a single mutex. As an example, open creates are no
longer serialized, fixing a performance regression on NFSv3->NFSv4
upgrades. Thanks to Jeff, Trond, and Benny, and to Christoph for
review.
Also some RDMA fixes from Chuck Lever and Steve Wise, and
miscellaneous fixes from Kinglong Mee and others"
* 'for-3.17' of git://linux-nfs.org/~bfields/linux: (167 commits)
svcrdma: remove rdma_create_qp() failure recovery logic
nfsd: add some comments to the nfsd4 object definitions
nfsd: remove the client_mutex and the nfs4_lock/unlock_state wrappers
nfsd: remove nfs4_lock_state: nfs4_state_shutdown_net
nfsd: remove nfs4_lock_state: nfs4_laundromat
nfsd: Remove nfs4_lock_state(): reclaim_complete()
nfsd: Remove nfs4_lock_state(): setclientid, setclientid_confirm, renew
nfsd: Remove nfs4_lock_state(): exchange_id, create/destroy_session()
nfsd: Remove nfs4_lock_state(): nfsd4_open and nfsd4_open_confirm
nfsd: Remove nfs4_lock_state(): nfsd4_delegreturn()
nfsd: Remove nfs4_lock_state(): nfsd4_open_downgrade + nfsd4_close
nfsd: Remove nfs4_lock_state(): nfsd4_lock/locku/lockt()
nfsd: Remove nfs4_lock_state(): nfsd4_release_lockowner
nfsd: Remove nfs4_lock_state(): nfsd4_test_stateid/nfsd4_free_stateid
nfsd: Remove nfs4_lock_state(): nfs4_preprocess_stateid_op()
nfsd: remove old fault injection infrastructure
nfsd: add more granular locking to *_delegations fault injectors
nfsd: add more granular locking to forget_openowners fault injector
nfsd: add more granular locking to forget_locks fault injector
nfsd: add a list_head arg to nfsd_foreach_client_lock
...
Linus Torvalds [Sat, 9 Aug 2014 20:03:34 +0000 (13:03 -0700)]
Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS updates from Steve French:
"The most visible change in this set is the additional of multi-credit
support for SMB2/SMB3 which dramatically improves the large file i/o
performance for these dialects and significantly increases the maximum
i/o size used on the wire for SMB2/SMB3.
Also reconnection behavior after network failure is improved"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6: (35 commits)
Add worker function to set allocation size
[CIFS] Fix incorrect hex vs. decimal in some debug print statements
update CIFS TODO list
Add Pavel to contributor list in cifs AUTHORS file
Update cifs version
CIFS: Fix STATUS_CANNOT_DELETE error mapping for SMB2
CIFS: Optimize readpages in a short read case on reconnects
CIFS: Optimize cifs_user_read() in a short read case on reconnects
CIFS: Improve indentation in cifs_user_read()
CIFS: Fix possible buffer corruption in cifs_user_read()
CIFS: Count got bytes in read_into_pages()
CIFS: Use separate var for the number of bytes got in async read
CIFS: Indicate reconnect with ECONNABORTED error code
CIFS: Use multicredits for SMB 2.1/3 reads
CIFS: Fix rsize usage for sync read
CIFS: Fix rsize usage in user read
CIFS: Separate page reading from user read
CIFS: Fix rsize usage in readpages
CIFS: Separate page search from readpages
CIFS: Use multicredits for SMB 2.1/3 writes
...
Ben Skeggs [Sat, 9 Aug 2014 18:10:31 +0000 (04:10 +1000)]
drm/nouveau: expose the full object/event interfaces to userspace
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:31 +0000 (04:10 +1000)]
drm/nouveau: fix headless mode
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:30 +0000 (04:10 +1000)]
drm/nouveau: hide sysfs pstate file behind an option again
No-one has yet had time to move this to debugfs as discussed during
the last merge window. Until this happens, hide the option to make
it clear it's not going to be here forever.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:30 +0000 (04:10 +1000)]
drm/nv50/disp: shhh compiler
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:30 +0000 (04:10 +1000)]
drm/gf100-/gr: implement the proper SetShaderExceptions method
We have another version of it implemented in SW, however, that version
isn't serialised with normal PGRAPH operation and can possibly clobber
the enables for another context.
This is the same method that's implemented by the NVIDIA binary driver.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:30 +0000 (04:10 +1000)]
drm/gf100-/gr: remove some broken ltc bashing, for now
... and hope that the defaults are good enough. This was always
supposed to be a read/modify/write thing anyway, so we're writing
very wrong stuff for some boards already.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:30 +0000 (04:10 +1000)]
drm/gf100-/gr: unhardcode attribute cb config
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:29 +0000 (04:10 +1000)]
drm/gf100-/gr: fetch tpcs-per-ppc info on startup
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:29 +0000 (04:10 +1000)]
drm/gf100-/gr: unhardcode pagepool config
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:29 +0000 (04:10 +1000)]
drm/gf100-/gr: unhardcode bundle cb config
Should be the same values as before, except:
GF117 has smaller buffer allocated, as per register setup.
GK20A now uses values from Tegra driver, not GK104's.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:29 +0000 (04:10 +1000)]
drm/gf100-/gr: improve initial context patch list helpers
Removes need for fixed buffer indices, and allows the functions
utilising them to also be run outside of context generation.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:29 +0000 (04:10 +1000)]
drm/gf100-/gr: add support for zero bandwidth clear
Default ZBC table is compatible with binary driver defaults.
Userspace will need to be updated to take full advantage of this
feature, however, some applications will see a performance boost
without updated drivers.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:28 +0000 (04:10 +1000)]
drm/nouveau/ltc: add zbc drivers
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:28 +0000 (04:10 +1000)]
drm/nouveau/ltc: s/ltcg/ltc/ + cleanup
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:28 +0000 (04:10 +1000)]
drm/nouveau: use ram info from nvif_device
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:28 +0000 (04:10 +1000)]
drm/nouveau/disp: implement nvif event sources for vblank/connector notifiers
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:28 +0000 (04:10 +1000)]
drm/nouveau/disp: allow user direct access to channel control registers
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:27 +0000 (04:10 +1000)]
drm/nouveau/disp: audit and version display classes
The full object interfaces are about to be exposed to userspace, so we
need to check for any security-related issues and version the structs
to make it easier to handle any changes we may need in the future.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:27 +0000 (04:10 +1000)]
drm/nouveau/disp: audit and version SCANOUTPOS method
The full object interfaces are about to be exposed to userspace, so we
need to check for any security-related issues and version the structs
to make it easier to handle any changes we may need in the future.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:27 +0000 (04:10 +1000)]
drm/nv50-/disp: audit and version PIOR_PWR method
The full object interfaces are about to be exposed to userspace, so we
need to check for any security-related issues and version the structs
to make it easier to handle any changes we may need in the future.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:27 +0000 (04:10 +1000)]
drm/nv50-/disp: audit and version SOR_DP_PWR method
The full object interfaces are about to be exposed to userspace, so we
need to check for any security-related issues and version the structs
to make it easier to handle any changes we may need in the future.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:27 +0000 (04:10 +1000)]
drm/nv50-/disp: audit and version LVDS_SCRIPT method
The full object interfaces are about to be exposed to userspace, so we
need to check for any security-related issues and version the structs
to make it easier to handle any changes we may need in the future.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:26 +0000 (04:10 +1000)]
drm/nv50-/disp: audit and version SOR_HDMI_PWR method
The full object interfaces are about to be exposed to userspace, so we
need to check for any security-related issues and version the structs
to make it easier to handle any changes we may need in the future.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:26 +0000 (04:10 +1000)]
drm/nv50-/disp: audit and version SOR_HDA_ELD method
The full object interfaces are about to be exposed to userspace, so we
need to check for any security-related issues and version the structs
to make it easier to handle any changes we may need in the future.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:26 +0000 (04:10 +1000)]
drm/nv50-/disp: audit and version SOR_PWR method
The full object interfaces are about to be exposed to userspace, so we
need to check for any security-related issues and version the structs
to make it easier to handle any changes we may need in the future.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:26 +0000 (04:10 +1000)]
drm/nv50-/disp: audit and version DAC_LOAD method
The full object interfaces are about to be exposed to userspace, so we
need to check for any security-related issues and version the structs
to make it easier to handle any changes we may need in the future.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:26 +0000 (04:10 +1000)]
drm/nv50-/disp: audit and version DAC_PWR method
The full object interfaces are about to be exposed to userspace, so we
need to check for any security-related issues and version the structs
to make it easier to handle any changes we may need in the future.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:25 +0000 (04:10 +1000)]
drm/nv50-/disp: share channel creation between nv50/gf110 impls
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:25 +0000 (04:10 +1000)]
drm/nv50/kms: don't assume same class versions for all channels
One of the next commits will remove some of the class IDs, leaving only
the ones used by NVIDIA which, presumably, mark where functionality
changes actually happened.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:25 +0000 (04:10 +1000)]
drm/nouveau/fifo: implement nvif event source
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:25 +0000 (04:10 +1000)]
drm/nouveau/fifo: allow direct access to channel control registers where possible
The indirect method has been left in-place here as a fallback path, as
it may not be possible to map the non-PAGE_SIZE aligned control areas
across some chipset+interface combinations.
This isn't a problem for the primary use-case where the core and drm
are linked together in kernel-land, but across a VM or (in the case
where it applies now) between the core in the kernel and a userspace
test tool.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:25 +0000 (04:10 +1000)]
drm/nouveau/fifo: audit and version fifo channel classes
The full object interfaces are about to be exposed to userspace, so we
need to check for any security-related issues and version the structs
to make it easier to handle any changes we may need in the future.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:25 +0000 (04:10 +1000)]
drm/nouveau/device: audit and version NVIF_CONTROL class and methods
The full object interfaces are about to be exposed to userspace, so we
need to check for any security-related issues and version the structs
to make it easier to handle any changes we may need in the future.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:24 +0000 (04:10 +1000)]
drm/nouveau/pm: audit and version NVIF_PERFMON class and methods
The full object interfaces are about to be exposed to userspace, so we
need to check for any security-related issues and version the structs
to make it easier to handle any changes we may need in the future.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:24 +0000 (04:10 +1000)]
drm/nouveau/dma: audit and version NV_DMA classes
The full object interfaces are about to be exposed to userspace, so we
need to check for any security-related issues and version the structs
to make it easier to handle any changes we may need in the future.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:24 +0000 (04:10 +1000)]
drm/nouveau/dmaobj: switch to a slightly saner design
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:24 +0000 (04:10 +1000)]
drm/nouveau/dmaobj: update to an improved style of class definition
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:24 +0000 (04:10 +1000)]
drm/nouveau/device: audit and version NV_DEVICE class
The full object interfaces are about to be exposed to userspace, so we
need to check for any security-related issues and version the structs
to make it easier to handle any changes we may need in the future.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:23 +0000 (04:10 +1000)]
drm/nouveau: use ioctl interface for abi16 gpuobj free
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:23 +0000 (04:10 +1000)]
drm/nouveau: use ioctl interface for abi16 ntfy alloc
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:23 +0000 (04:10 +1000)]
drm/nouveau: use ioctl interface for abi16 grobj alloc
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:23 +0000 (04:10 +1000)]
drm/nouveau: remove as much direct use of core headers as possible
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:23 +0000 (04:10 +1000)]
drm/nouveau: remove (most) hardcoded object handle usage
The PFIFO<->EVO sync buffers will be fixed up later when inter-channel
sync in general is improved.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:22 +0000 (04:10 +1000)]
drm/nouveau: port to nvif client/device/objects
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:22 +0000 (04:10 +1000)]
drm/nouveau: initial pass at moving to struct nvif_device
This is an attempt at isolating some of the changes necessary to port
to NVIF in a separate commit.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:22 +0000 (04:10 +1000)]
drm/nouveau: kill nouveau_dev() + wrap register macros
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:22 +0000 (04:10 +1000)]
drm/nouveau: fix some usages of the wrong print function
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:21 +0000 (04:10 +1000)]
drm/nouveau/nvif: import library functions for the ioctl/event interfaces
This is a wrapper around the interfaces defined in an earlier commit,
and is also used by various userspace (either by a libdrm backend, or
libpciaccess) tools/tests.
In the future this will be extended to handle channels, replacing some
long-unloved code we currently use, and allow fifo/display/mpeg (hi
Ilia ;)) engines to all be exposed in the same way.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:21 +0000 (04:10 +1000)]
drm/nouveau/client: add method to retrieve device list
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:21 +0000 (04:10 +1000)]
drm/nouveau/core: remove NV_D0 family
The one place where it mattered has been replaced with a class check,
which is more appropriate anyway.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:21 +0000 (04:10 +1000)]
drm/nouveau/device: add method to retrieve some basic device info
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:20 +0000 (04:10 +1000)]
drm/nouveau/core: import ioctl/event interfaces
This forms the basis for the new APIs that will be exposed to userspace,
giving it access to:
- Object method calls, the immediately useful of which is performance
counters and the abiity to manipulate the ZBC tables.
- Information on the child classes an object supports, in order to avoid
having to try all supported classes until successful.
- Notifications, which will be used in the future to inform the client
if its channel was killed due to a lockup, etc.
This commit imports the interfaces, but are not currently used. The DRM
portion of the driver will be ported to speak to the core using these
interfaces as much as possible.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:20 +0000 (04:10 +1000)]
drm/nouveau/core: add function to return list of supported children
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:20 +0000 (04:10 +1000)]
drm/nouveau/core: rework event interface
This is a lot of prep-work for being able to send event notifications
back to userspace. Events now contain data, rather than a "something
just happened" signal.
Handler data is now embedded into a containing structure, rather than
being kmalloc()'d, and can optionally have the notify routine handled
in a workqueue.
Various races between suspend/unload with display HPD/DP IRQ handlers
automagically solved as a result.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:20 +0000 (04:10 +1000)]
drm/nouveau/core: move handle-based object apis to handle.c
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:20 +0000 (04:10 +1000)]
drm/nouveau/core: fail creation of zero-argument objects, when arguments are passed
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:20 +0000 (04:10 +1000)]
drm/nouveau: store a pointer to vm in nouveau_cli
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:19 +0000 (04:10 +1000)]
drm/nouveau: store vblank event handler data in nv_crtc
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:19 +0000 (04:10 +1000)]
drm/nv50/kms: create ctxdma objects for framebuffers as required
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Sat, 9 Aug 2014 18:10:19 +0000 (04:10 +1000)]
drm/nv50/kms: move framebuffer wrangling out of common code
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Mario Kleiner [Wed, 6 Aug 2014 04:09:44 +0000 (06:09 +0200)]
drm/nouveau: Bump version from 1.1.1 to 1.1.2
Linux 3.16 fixed multiple bugs in kms pageflip completion events
and timestamping, which were originally introduced in Linux 3.13.
These fixes have been backported to all stable kernels since 3.13.
However, the userspace nouveau-ddx needs to be aware if it is
running on a kernel on which these bugs are fixed, or not.
Bump the patchlevel of the drm driver version to signal this,
so backporting this patch to stable 3.13+ kernels will give the
ddx the required info.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: <stable@vger.kernel.org> #v3.13+
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Thu, 7 Aug 2014 21:21:53 +0000 (07:21 +1000)]
drm/nv50-/sw: use nv50_software_context_dtor....
You would not believe the troubles this caused me...
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Ben Skeggs [Tue, 5 Aug 2014 12:03:49 +0000 (22:03 +1000)]
drm/nv50-/fb: use dma_mapping_error() to check dma_map_page() result
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Mario Kleiner [Tue, 29 Jul 2014 00:36:44 +0000 (02:36 +0200)]
drm/nouveau: Dis/Enable vblank irqs during suspend/resume.
Vblank irqs don't get disabled during suspend or driver
unload, which causes irq delivery after "suspend" or
driver unload, at least until the gpu is powered off.
This could race with drm_vblank_cleanup() in the case
of nouveau and cause a use-after-free bug if the driver
is unloaded.
More annoyingly during everyday use, at least on nv50
display engine (likely also others), vblank irqs are
off after a resume from suspend, but the drm doesn't
know this, so all vblank related functionality is dead
after a resume. E.g., all windowed OpenGL clients will
hang at swapbuffers time, as well as many fullscreen
clients in many cases. This makes suspend/resume useless
if one wants to use any OpenGL apps after the resume.
In Linux 3.16, drm_vblank_on() was added, complementing
the older drm_vblank_off() to solve these problems
elegantly, so use those calls in nouveaus suspend/resume
code.
For kernels 3.8 - 3.15, we need to cherry-pick the
drm_vblank_on() patch to support this patch.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: <stable@vger.kernel.org> #v3.16
Cc: <stable@vger.kernel.org> #v3.8+: f275228: drm: Add drm_vblank_on()
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Alexandre Courbot [Sat, 26 Jul 2014 09:36:02 +0000 (18:36 +0900)]
drm/nouveau: platform: update moved Tegra header
Header for tegra_powergate functions has moved to soc/tegra/pmc.h.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Alexandre Courbot [Sat, 26 Jul 2014 09:41:41 +0000 (18:41 +0900)]
drm/nouveau/gk20a: reclocking support
Add support for reclocking on GK20A, using a statically-defined pstates
table. The algorithms for calculating the coefficients and setting the
clocks are directly taken from the ChromeOS kernel.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Alexandre Courbot [Sat, 26 Jul 2014 09:41:40 +0000 (18:41 +0900)]
drm/nouveau/clk: support for non-BIOS pstates
Make nouveau_clock_create() take new two optional arguments: an array
of pstates and its size. When these are specified,
nouveau_clock_create() will use the provided pstates instead of
probing them using the BIOS.
This is useful for platforms which do not provide a BIOS, like Tegra.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>