Nicholas Piggin [Sun, 4 Jun 2017 12:06:22 +0000 (22:06 +1000)]
selftests/powerpc: context_switch use private futexes with threads
This reduces overhead of mutex locking and increases context switch
rate significantly (which helps to measure and profile the context
switch path).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Colin Ian King [Mon, 5 Jun 2017 06:49:12 +0000 (16:49 +1000)]
powerpc: Fix some spelling mistakes
Collation of some spelling fixes from Colin.
Attemping -> Attempting
intialized -> initialized
missmanaged -> mismanaged
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Matt Brown [Tue, 23 May 2017 23:45:59 +0000 (09:45 +1000)]
powerpc/lib/xor_vmx: Ensure no altivec code executes before enable_kernel_altivec()
The xor_vmx.c file is used for the RAID5 xor operations. In these functions
altivec is enabled to run the operation and then disabled.
The code uses enable_kernel_altivec() around the core of the algorithm, however
the whole file is built with -maltivec, so the compiler is within its rights to
generate altivec code anywhere. This has been seen at least once in the wild:
0:mon> di $xor_altivec_2
c0000000000b97d0 3c4c01d9 addis r2,r12,473
c0000000000b97d4 3842db30 addi r2,r2,-9424
c0000000000b97d8 7c0802a6 mflr r0
c0000000000b97dc f8010010 std r0,16(r1)
c0000000000b97e0 60000000 nop
c0000000000b97e4 7c0802a6 mflr r0
c0000000000b97e8 faa1ffa8 std r21,-88(r1)
...
c0000000000b981c f821ff41 stdu r1,-192(r1)
c0000000000b9820 7f8101ce stvx v28,r1,r0 <-- POP
c0000000000b9824 38000030 li r0,48
c0000000000b9828 7fa101ce stvx v29,r1,r0
...
c0000000000b984c 4bf6a06d bl
c0000000000238b8 # enable_kernel_altivec
This patch splits the non-altivec code into xor_vmx_glue.c which calls the
altivec functions in xor_vmx.c. By compiling xor_vmx_glue.c without
-maltivec we can guarantee that altivec instruction will not be executed
outside of the enable/disable block.
Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com>
[mpe: Rework change log and include disassembly]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Hari Bathini [Fri, 2 Jun 2017 07:30:27 +0000 (13:00 +0530)]
powerpc/fadump: Set an upper limit for boot memory size
By default, 5% of system RAM is reserved for preserving boot memory.
Alternatively, a user can specify the amount of memory to reserve.
See Documentation/powerpc/firmware-assisted-dump.txt for details. In
addition to the memory reserved for preserving boot memory, some more
memory is reserved, to save HPTE region, CPU state data and ELF core
headers.
Memory Reservation during first kernel looks like below:
Low memory Top of memory
0 boot memory size |
| | |<--Reserved dump area -->|
V V | Permanent Reservation V
+-----------+----------/ /----------+---+----+-----------+----+
| | |CPU|HPTE| DUMP |ELF |
+-----------+----------/ /----------+---+----+-----------+----+
| ^
| |
\ /
-------------------------------------------
Boot memory content gets transferred to
reserved area by firmware at the time of
crash
This implicitly means that the sum of the sizes of boot memory, CPU
state data, HPTE region, DUMP preserving area and ELF core headers
can't be greater than the total memory size. But currently, a user is
allowed to specify any value as boot memory size. So, the above rule
is violated when a boot memory size around 50% of the total available
memory is specified. As the kernel is not handling this currently, it
may lead to undefined behavior. Fix it by setting an upper limit for
boot memory size to 25% of the total available memory. Also, instead
of using memblock_end_of_DRAM(), which doesn't take the holes, if any,
in the memory layout into account, use memblock_phys_mem_size() to
calculate the percentage of total available memory.
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Hari Bathini [Mon, 22 May 2017 09:34:47 +0000 (15:04 +0530)]
powerpc/fadump: Update comment about offset where fadump is reserved
With commit
f6e6bedb7731 ("powerpc/fadump: Reserve memory at an offset
closer to bottom of RAM"), memory for fadump is no longer reserved at
the top of RAM. But there are still a few places which say so. Change
them appropriately.
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Hari Bathini [Mon, 22 May 2017 09:34:23 +0000 (15:04 +0530)]
powerpc/fadump: Add a warning when 'fadump_reserve_mem=' is used
With commit
11550dc0a00b ("powerpc/fadump: reuse crashkernel parameter
for fadump memory reservation"), 'fadump_reserve_mem=' parameter is
deprecated in favor of 'crashkernel=' parameter. Add a warning if
'fadump_reserve_mem=' is still used.
Fixes: 11550dc0a00b ("powerpc/fadump: reuse crashkernel parameter for fadump memory reservation")
Suggested-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
[mpe: Unsplit long printk strings]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michal Suchanek [Sat, 27 May 2017 15:46:15 +0000 (17:46 +0200)]
powerpc/fadump: Return error when fadump registration fails
- log an error message when registration fails and no error code listed
in the switch is returned
- translate the hv error code to posix error code and return it from
fw_register
- return the posix error code from fw_register to the process writing
to sysfs
- return EEXIST on re-registration
- return success on deregistration when fadump is not registered
- return ENODEV when no memory is reserved for fadump
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Tested-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
[mpe: Use pr_err() to shrink the error print]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Christophe Leroy [Fri, 21 Apr 2017 11:18:52 +0000 (13:18 +0200)]
powerpc: Remove __ilog2()s and use generic ones
With the __ilog2() function as defined in
arch/powerpc/include/asm/bitops.h, GCC will not optimise the code
in case of constant parameter.
The generic ilog2() function in include/linux/log2.h is written
to handle the case of the constant parameter.
This patch discards the three __ilog2() functions and
defines __ilog2() as ilog2()
For non constant calls, the generated code is doing the same:
int test__ilog2(unsigned long x)
{
return __ilog2(x);
}
int test__ilog2_u32(u32 n)
{
return __ilog2_u32(n);
}
int test__ilog2_u64(u64 n)
{
return __ilog2_u64(n);
}
On PPC32 before the patch:
00000000 <test__ilog2>:
0: 7c 63 00 34 cntlzw r3,r3
4: 20 63 00 1f subfic r3,r3,31
8: 4e 80 00 20 blr
0000000c <test__ilog2_u32>:
c: 7c 63 00 34 cntlzw r3,r3
10: 20 63 00 1f subfic r3,r3,31
14: 4e 80 00 20 blr
On PPC32 after the patch:
00000000 <test__ilog2>:
0: 7c 63 00 34 cntlzw r3,r3
4: 20 63 00 1f subfic r3,r3,31
8: 4e 80 00 20 blr
0000000c <test__ilog2_u32>:
c: 7c 63 00 34 cntlzw r3,r3
10: 20 63 00 1f subfic r3,r3,31
14: 4e 80 00 20 blr
On PPC64 before the patch:
0000000000000000 <.test__ilog2>:
0: 7c 63 00 74 cntlzd r3,r3
4: 20 63 00 3f subfic r3,r3,63
8: 7c 63 07 b4 extsw r3,r3
c: 4e 80 00 20 blr
0000000000000010 <.test__ilog2_u32>:
10: 7c 63 00 34 cntlzw r3,r3
14: 20 63 00 1f subfic r3,r3,31
18: 7c 63 07 b4 extsw r3,r3
1c: 4e 80 00 20 blr
0000000000000020 <.test__ilog2_u64>:
20: 7c 63 00 74 cntlzd r3,r3
24: 20 63 00 3f subfic r3,r3,63
28: 7c 63 07 b4 extsw r3,r3
2c: 4e 80 00 20 blr
On PPC64 after the patch:
0000000000000000 <.test__ilog2>:
0: 7c 63 00 74 cntlzd r3,r3
4: 20 63 00 3f subfic r3,r3,63
8: 7c 63 07 b4 extsw r3,r3
c: 4e 80 00 20 blr
0000000000000010 <.test__ilog2_u32>:
10: 7c 63 00 34 cntlzw r3,r3
14: 20 63 00 1f subfic r3,r3,31
18: 7c 63 07 b4 extsw r3,r3
1c: 4e 80 00 20 blr
0000000000000020 <.test__ilog2_u64>:
20: 7c 63 00 74 cntlzd r3,r3
24: 20 63 00 3f subfic r3,r3,63
28: 7c 63 07 b4 extsw r3,r3
2c: 4e 80 00 20 blr
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Christophe Leroy [Fri, 21 Apr 2017 11:18:50 +0000 (13:18 +0200)]
powerpc: Replace ffz() by equivalent generic function
With the ffz() function as defined in arch/powerpc/include/asm/bitops.h
GCC will not optimise the code in case of constant parameter.
This patch replaces ffz() by the generic function.
The generic ffz(x) expects to never be called with ~x == 0
as written in the comment in include/asm-generic/bitops/ffz.h
The only user of ffz() within arch/powerpc/ is
platforms/512x/mpc5121_ads_cpld.c, which checks if x is not 0xff
For non constant calls, the generated code is doing the same:
unsigned long testffz(unsigned long x)
{
return ffz(x);
}
On PPC32, before the patch:
00000018 <testffz>:
18: 7c 63 18 f9 not. r3,r3
1c: 40 82 00 0c bne 28 <testffz+0x10>
20: 38 60 00 20 li r3,32
24: 4e 80 00 20 blr
28: 7d 23 00 d0 neg r9,r3
2c: 7d 23 18 38 and r3,r9,r3
30: 7c 63 00 34 cntlzw r3,r3
34: 20 63 00 1f subfic r3,r3,31
38: 4e 80 00 20 blr
On PPC32, after the patch:
00000018 <testffz>:
18: 39 23 00 01 addi r9,r3,1
1c: 7d 23 18 78 andc r3,r9,r3
20: 7c 63 00 34 cntlzw r3,r3
24: 20 63 00 1f subfic r3,r3,31
28: 4e 80 00 20 blr
On PPC64, before the patch:
0000000000000030 <.testffz>:
30: 7c 60 18 f9 not. r0,r3
34: 38 60 00 40 li r3,64
38: 4d 82 00 20 beqlr
3c: 7c 60 00 d0 neg r3,r0
40: 7c 63 00 38 and r3,r3,r0
44: 7c 63 00 74 cntlzd r3,r3
48: 20 63 00 3f subfic r3,r3,63
4c: 7c 63 07 b4 extsw r3,r3
50: 4e 80 00 20 blr
On PPC64, after the patch:
0000000000000030 <.testffz>:
30: 38 03 00 01 addi r0,r3,1
34: 7c 03 18 78 andc r3,r0,r3
38: 7c 63 00 74 cntlzd r3,r3
3c: 20 63 00 3f subfic r3,r3,63
40: 4e 80 00 20 blr
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Christophe Leroy [Fri, 21 Apr 2017 11:18:48 +0000 (13:18 +0200)]
powerpc: Use builtin functions for fls()/__fls()/fls64()
With the fls() functions as defined in arch/powerpc/include/asm/bitops.h
GCC will not optimise the code in case of constant parameter.
This patch replaces __fls() by the builtin function, and modifies
fls() and fls64() to use builtins instead of inline assembly
For non constant calls, the generated code is doing the same:
int testfls(unsigned int x)
{
return fls(x);
}
unsigned long test__fls(unsigned long x)
{
return __fls(x);
}
int testfls64(__u64 x)
{
return fls64(x);
}
On PPC32, before the patch:
00000064 <testfls>:
64: 7c 63 00 34 cntlzw r3,r3
68: 20 63 00 20 subfic r3,r3,32
6c: 4e 80 00 20 blr
00000070 <test__fls>:
70: 7c 63 00 34 cntlzw r3,r3
74: 20 63 00 1f subfic r3,r3,31
78: 4e 80 00 20 blr
0000007c <testfls64>:
7c: 2c 03 00 00 cmpwi r3,0
80: 40 82 00 10 bne 90 <testfls64+0x14>
84: 7c 83 00 34 cntlzw r3,r4
88: 20 63 00 20 subfic r3,r3,32
8c: 4e 80 00 20 blr
90: 7c 63 00 34 cntlzw r3,r3
94: 20 63 00 40 subfic r3,r3,64
98: 4e 80 00 20 blr
On PPC32, after the patch:
00000054 <testfls>:
54: 7c 63 00 34 cntlzw r3,r3
58: 20 63 00 20 subfic r3,r3,32
5c: 4e 80 00 20 blr
00000060 <test__fls>:
60: 7c 63 00 34 cntlzw r3,r3
64: 20 63 00 1f subfic r3,r3,31
68: 4e 80 00 20 blr
0000006c <testfls64>:
6c: 2c 03 00 00 cmpwi r3,0
70: 41 82 00 10 beq 80 <testfls64+0x14>
74: 7c 63 00 34 cntlzw r3,r3
78: 20 63 00 40 subfic r3,r3,64
7c: 4e 80 00 20 blr
80: 7c 83 00 34 cntlzw r3,r4
84: 20 63 00 40 subfic r3,r3,32
88: 4e 80 00 20 blr
On PPC64, before the patch:
00000000000000a0 <.testfls>:
a0: 7c 63 00 34 cntlzw r3,r3
a4: 20 63 00 20 subfic r3,r3,32
a8: 7c 63 07 b4 extsw r3,r3
ac: 4e 80 00 20 blr
00000000000000b0 <.test__fls>:
b0: 7c 63 00 74 cntlzd r3,r3
b4: 20 63 00 3f subfic r3,r3,63
b8: 7c 63 07 b4 extsw r3,r3
bc: 4e 80 00 20 blr
00000000000000c0 <.testfls64>:
c0: 7c 63 00 74 cntlzd r3,r3
c4: 20 63 00 40 subfic r3,r3,64
c8: 7c 63 07 b4 extsw r3,r3
cc: 4e 80 00 20 blr
On PPC64, after the patch:
0000000000000090 <.testfls>:
90: 7c 63 00 34 cntlzw r3,r3
94: 20 63 00 20 subfic r3,r3,32
98: 7c 63 07 b4 extsw r3,r3
9c: 4e 80 00 20 blr
00000000000000a0 <.test__fls>:
a0: 7c 63 00 74 cntlzd r3,r3
a4: 20 63 00 3f subfic r3,r3,63
a8: 4e 80 00 20 blr
ac: 60 00 00 00 nop
00000000000000b0 <.testfls64>:
b0: 7c 63 00 74 cntlzd r3,r3
b4: 20 63 00 40 subfic r3,r3,64
b8: 7c 63 07 b4 extsw r3,r3
bc: 4e 80 00 20 blr
Those builtins have been in GCC since at least 3.4.6 (see
https://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Other-Builtins.html )
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Christophe Leroy [Fri, 21 Apr 2017 11:18:46 +0000 (13:18 +0200)]
powerpc: Discard ffs()/__ffs() function and use builtin functions instead
With the ffs() function as defined in arch/powerpc/include/asm/bitops.h
GCC will not optimise the code in case of constant parameter, as shown
by the small exemple below.
int ffs_test(void)
{
return 4 << ffs(31);
}
c0012334 <ffs_test>:
c0012334: 39 20 00 01 li r9,1
c0012338: 38 60 00 04 li r3,4
c001233c: 7d 29 00 34 cntlzw r9,r9
c0012340: 21 29 00 20 subfic r9,r9,32
c0012344: 7c 63 48 30 slw r3,r3,r9
c0012348: 4e 80 00 20 blr
With this patch, the same function will compile as follows:
c0012334 <ffs_test>:
c0012334: 38 60 00 08 li r3,8
c0012338: 4e 80 00 20 blr
The same happens with __ffs()
For non constant calls, the generated code is doing the same,
allthought it is slightly different on 64 bits for ffs():
unsigned long test__ffs(unsigned long x)
{
return __ffs(x);
}
int testffs(int x)
{
return ffs(x);
}
On PPC32, before the patch:
0000003c <test__ffs>:
3c: 7d 23 00 d0 neg r9,r3
40: 7d 23 18 38 and r3,r9,r3
44: 7c 63 00 34 cntlzw r3,r3
48: 20 63 00 1f subfic r3,r3,31
4c: 4e 80 00 20 blr
00000050 <testffs>:
50: 7d 23 00 d0 neg r9,r3
54: 7d 23 18 38 and r3,r9,r3
58: 7c 63 00 34 cntlzw r3,r3
5c: 20 63 00 20 subfic r3,r3,32
60: 4e 80 00 20 blr
On PPC32, after the patch:
0000002c <test__ffs>:
2c: 7d 23 00 d0 neg r9,r3
30: 7d 23 18 38 and r3,r9,r3
34: 7c 63 00 34 cntlzw r3,r3
38: 20 63 00 1f subfic r3,r3,31
3c: 4e 80 00 20 blr
00000040 <testffs>:
40: 7d 23 00 d0 neg r9,r3
44: 7d 23 18 38 and r3,r9,r3
48: 7c 63 00 34 cntlzw r3,r3
4c: 20 63 00 20 subfic r3,r3,32
50: 4e 80 00 20 blr
On PPC64, before the patch:
0000000000000060 <.test__ffs>:
60: 7c 03 00 d0 neg r0,r3
64: 7c 03 18 38 and r3,r0,r3
68: 7c 63 00 74 cntlzd r3,r3
6c: 20 63 00 3f subfic r3,r3,63
70: 7c 63 07 b4 extsw r3,r3
74: 4e 80 00 20 blr
0000000000000080 <.testffs>:
80: 7c 03 00 d0 neg r0,r3
84: 7c 03 18 38 and r3,r0,r3
88: 7c 63 00 74 cntlzd r3,r3
8c: 20 63 00 40 subfic r3,r3,64
90: 7c 63 07 b4 extsw r3,r3
94: 4e 80 00 20 blr
On PPC64, after the patch:
0000000000000050 <.test__ffs>:
50: 7c 03 00 d0 neg r0,r3
54: 7c 03 18 38 and r3,r0,r3
58: 7c 63 00 74 cntlzd r3,r3
5c: 20 63 00 3f subfic r3,r3,63
60: 4e 80 00 20 blr
0000000000000070 <.testffs>:
70: 7c 03 00 d0 neg r0,r3
74: 7c 03 18 38 and r3,r0,r3
78: 7c 63 00 34 cntlzw r3,r3
7c: 20 63 00 20 subfic r3,r3,32
80: 7c 63 07 b4 extsw r3,r3
84: 4e 80 00 20 blr
(ffs() operates on an int so cntlzw is equivalent to cntlzd)
In addition, when reading the generated vmlinux, we can observe
that with the builtin functions, GCC sometimes efficiently spreads
the instructions within the generated functions while the inline
assembly force them to remain grouped together.
__builtin_ffs() is already used in arch/powerpc/include/asm/page_32.h
Those builtins have been in GCC since at least 3.4.6 (see
https://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Other-Builtins.html )
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Christophe Leroy [Thu, 16 Mar 2017 08:55:45 +0000 (09:55 +0100)]
powerpc: Handle simultaneous interrupts at once
It often happens to have simultaneous interrupts, for instance
when having double Ethernet attachment. With the current
implementation, we suffer the cost of kernel entry/exit for each
interrupt.
This patch introduces a loop in __do_irq() to handle all interrupts
at once before returning.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Christophe Leroy [Fri, 10 Mar 2017 10:37:01 +0000 (11:37 +0100)]
powerpc/8xx: fix mpc8xx_get_irq() return on no irq
IRQ 0 is a valid HW interrupt. So get_irq() shall return 0 when
there is no irq, instead of returning irq_linear_revmap(... ,0)
Fixes: f2a0bd3753dad ("[POWERPC] 8xx: powerpc port of core CPM PIC")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Christophe Leroy [Fri, 5 Aug 2016 11:28:05 +0000 (13:28 +0200)]
powerpc/40x: Clear MSR_DR in one insn instead of two
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Christophe Leroy [Wed, 19 Apr 2017 12:56:32 +0000 (14:56 +0200)]
powerpc/mm: The 8xx doesn't call do_page_fault() for breakpoints
The 8xx has a dedicated exception for breakpoints, that directly
calls do_break()
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Christophe Leroy [Wed, 19 Apr 2017 12:56:30 +0000 (14:56 +0200)]
powerpc/mm: Evaluate user_mode(regs) only once in do_page_fault()
Analysis of the assembly code shows that when using user_mode(regs),
at least the 'andi.' is redone all the time, and also
the 'lwz ,132(r31)' most of the time. With the new form, the 'is_user'
is mapped to cr4, then all further use of is_user results in just
things like 'beq cr4,218 <do_page_fault+0x218>'
Without the patch:
50: 81 1e 00 84 lwz r8,132(r30)
54: 71 09 40 00 andi. r9,r8,16384
58: 40 82 00 0c bne 64 <do_page_fault+0x64>
84: 81 3e 00 84 lwz r9,132(r30)
8c: 71 2a 40 00 andi. r10,r9,16384
90: 41 a2 01 64 beq 1f4 <do_page_fault+0x1f4>
d4: 81 3e 00 84 lwz r9,132(r30)
dc: 71 28 40 00 andi. r8,r9,16384
e0: 41 82 02 08 beq 2e8 <do_page_fault+0x2e8>
108: 81 3e 00 84 lwz r9,132(r30)
110: 71 28 40 00 andi. r8,r9,16384
118: 41 82 02 28 beq 340 <do_page_fault+0x340>
1e4: 81 3e 00 84 lwz r9,132(r30)
1e8: 71 2a 40 00 andi. r10,r9,16384
1ec: 40 82 01 68 bne 354 <do_page_fault+0x354>
228: 81 3e 00 84 lwz r9,132(r30)
22c: 71 28 40 00 andi. r8,r9,16384
230: 41 82 ff c4 beq 1f4 <do_page_fault+0x1f4>
288: 71 2a 40 00 andi. r10,r9,16384
294: 41 a2 fe 60 beq f4 <do_page_fault+0xf4>
50c: 81 3e 00 84 lwz r9,132(r30)
514: 71 2a 40 00 andi. r10,r9,16384
518: 40 a2 fc e0 bne 1f8 <do_page_fault+0x1f8>
534: 81 3e 00 84 lwz r9,132(r30)
53c: 71 2a 40 00 andi. r10,r9,16384
540: 41 82 fc b8 beq 1f8 <do_page_fault+0x1f8>
This patch creates a local var called 'is_user' which contains the
result of user_mode(regs)
With the patch:
20: 81 03 00 84 lwz r8,132(r3)
48: 55 09 97 fe rlwinm r9,r8,18,31,31
58: 2e 09 00 00 cmpwi cr4,r9,0
5c: 40 92 00 0c bne cr4,68 <do_page_fault+0x68>
88: 41 b2 01 90 beq cr4,218 <do_page_fault+0x218>
d4: 40 92 01 d0 bne cr4,2a4 <do_page_fault+0x2a4>
120: 41 b2 00 f8 beq cr4,218 <do_page_fault+0x218>
138: 41 b2 ff a0 beq cr4,d8 <do_page_fault+0xd8>
1d4: 40 92 00 e0 bne cr4,2b4 <do_page_fault+0x2b4>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Christophe Leroy [Wed, 19 Apr 2017 12:56:28 +0000 (14:56 +0200)]
powerpc/mm: Remove a redundant test in do_page_fault()
The result of (trap == 0x400) is already in is_exec.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Christophe Leroy [Wed, 19 Apr 2017 12:56:24 +0000 (14:56 +0200)]
powerpc/mm: Only call store_updates_sp() on stores in do_page_fault()
Function store_updates_sp() checks whether the faulting
instruction is a store updating r1. Therefore we can limit its calls
to store exceptions.
This patch is an improvement of commit
a7a9dcd882a67 ("powerpc: Avoid
taking a data miss on every userspace instruction miss")
With the same microbenchmark app, run with 500 as argument, on an
MPC885 we get:
Before this patch: 152000 DTLB misses
After this patch: 147000 DTLB misses
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Christophe Leroy [Mon, 29 May 2017 15:32:06 +0000 (17:32 +0200)]
powerpc/mm: Remove __this_fixmap_does_not_exist()
This function has not been used since commit
9494a1e8428ea
("powerpc: use generic fixmap.h)
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Balbir Singh [Thu, 25 May 2017 03:36:50 +0000 (13:36 +1000)]
powerpc/mm/ptdump: Dump the first entry of the linear mapping as well
The check in hpte_find() should be < and not <= for PAGE_OFFSET
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Thu, 11 May 2017 17:40:40 +0000 (03:40 +1000)]
powerpc: Link warning for orphan sections
Add --orphan-handling=warn to final link flags. This ensures we can
handle all sections explicitly. This would have caught subtle breakage
such as
7de3b27bac47da9de08409df1d69664acbb72197 at build-time.
Also bring existing orphan sections into the fold:
- .text.hot and .text.unlikely are compiler generated sections.
- .sdata2, .dynsbss, .plt are used by PPC32
- We previously did not specify DWARF_DEBUG or STABS_DEBUG
- DWARF_DEBUG did not include all DWARF sections that can be emitted
- A number of sections are unused and can be discarded.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Thu, 11 May 2017 17:40:39 +0000 (03:40 +1000)]
powerpc/64: Tool to check head sections location sanity
Use a tool to check that the location of "fixed sections" are where
we expected them to be, which catches cases the linker script can't
(stubs being added to start of .text section), and which ends up
being neater.
Sample output:
ERROR: start_text address is
c000000000008100, should be
c000000000008000
ERROR: see comments in arch/powerpc/tools/head_check.sh
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fold in fix from Nick for 4.6 era toolchains]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Mon, 29 May 2017 07:39:40 +0000 (17:39 +1000)]
powerpc/64: Handle linker stubs in low .text code
Very large kernels may require linker stubs for branches from HEAD
text code. The linker may place these stubs before the HEAD text
sections, which breaks the assumption that HEAD text is located at 0
(or the .text section being located at 0x7000/0x8000 on Book3S
kernels).
Provide an option to create a small section just before the .text
section with an empty 256 - 4 bytes, and adjust the start of the .text
section to match. The linker will tend to put stubs in that section
and not break our relative-to-absolute offset assumptions.
This causes a small waste of space on common kernels, but allows large
kernels to build and boot. For now, it is an EXPERT config option,
defaulting to =n, but a reference is provided for it in the build-time
check for such breakage. This is good enough for allyesconfig and
custom users / hackers.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Thu, 11 May 2017 17:40:38 +0000 (03:40 +1000)]
powerpc/64s: Tool to flag direct branches from unrelocated interrupt vectors
Direct banches from code below __end_interrupts to code above
__end_interrupts when built with CONFIG_RELOCATABLE are disallowed
because they will break when the kernel is not located at 0.
Sample output:
WARNING: Unrelocated relative branches
c000000000000118 bl-> 0xc000000000038fb8 <pnv_restore_hyp_resource>
c00000000000013c b-> 0xc0000000001068a4 <kvm_start_guest>
c000000000000148 b-> 0xc00000000003919c <pnv_wakeup_loss>
c00000000000014c b-> 0xc00000000003923c <pnv_wakeup_noloss>
c0000000000005a4 b-> 0xc000000000106ffc <kvmppc_interrupt_hv>
c000000000001af0 b-> 0xc000000000106ffc <kvmppc_interrupt_hv>
c000000000001b24 b-> 0xc000000000106ffc <kvmppc_interrupt_hv>
c000000000001b58 b-> 0xc000000000106ffc <kvmppc_interrupt_hv>
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Thu, 11 May 2017 15:56:52 +0000 (01:56 +1000)]
powerpc/64: Linker on-demand sfpr functions for modules
For final link, the powerpc64 linker generates fpr save/restore
functions on-demand, placing them in the .sfpr section. Starting with
binutils 2.25, these can be provided for non-final links with
--save-restore-funcs. Use that where possible for module links.
This saves about 200 bytes per module (~60kB) on powernv defconfig
build.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Thu, 11 May 2017 15:56:51 +0000 (01:56 +1000)]
powerpc/64: Do not create new section for save/restore functions
There is no need to create a new section for these. Consolidate with
32-bit and just use .text.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Thu, 11 May 2017 15:56:50 +0000 (01:56 +1000)]
powerpc/64: Do not link crtsaveres.o in boot
crtsaveres.S is empty with 64-bit builds already, so just don't
build and link it to match the vmlinux build.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Use CONFIG_PPC64_BOOT_WRAPPER not CONFIG_PPC32 to fix BE build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Thu, 11 May 2017 15:56:49 +0000 (01:56 +1000)]
powerpc/64: Do not link crtsavres.o in vmlinux
The 64-bit linker creates save/restore functions on demand with final
links, so vmlinux does not require crtsavres.o.
Make crtsavres.o extra-y on 64-bit (it is still required by modules).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Thu, 11 May 2017 15:56:48 +0000 (01:56 +1000)]
powerpc/64: Place sfpr section explicitly with the linker script
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Stephen Rothwell [Fri, 26 May 2017 08:32:29 +0000 (18:32 +1000)]
powerpc: Use uapi/asm-generic/sockios.h
The arch version is identical except for comments and white space.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Stephen Rothwell [Fri, 26 May 2017 06:19:46 +0000 (16:19 +1000)]
powerpc: Use the asm-generic versions of some uapi includes
These are completely obvious as all they do is include the asm-generic
versions.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Ivan Mikhaylov [Fri, 19 May 2017 15:47:05 +0000 (18:47 +0300)]
powerpc/[booke|4xx]: Don't clobber TCR[WP] when setting TCR[DIE]
Prevent a kernel panic caused by unintentionally clearing TCR watchdog
bits. At this point in the kernel boot, the watchdog may have already
been enabled by u-boot. The original code's attempt to write to the TCR
register results in an inadvertent clearing of the watchdog
configuration bits, causing the 476 to reset.
Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Ivan Mikhaylov [Mon, 15 May 2017 13:07:53 +0000 (16:07 +0300)]
powerpc/44x/fsp2: Add defconfig for FSP2 board
This patch adds default FSP2 config for main usage.
Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Ivan Mikhaylov [Mon, 15 May 2017 13:07:52 +0000 (16:07 +0300)]
powerpc/44x/fsp2: Add device tree for FSP2 board
Add a device tree for FSP2 board (476 based).
Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Ivan Mikhaylov [Mon, 15 May 2017 13:07:51 +0000 (16:07 +0300)]
powerpc/44x/fsp2: Platform support for FSP2 (476fpe) board
Add platform code support for FSP2 (476fpe) board.
Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gautham R. Shenoy [Tue, 16 May 2017 08:49:48 +0000 (14:19 +0530)]
cpuidle-powernv: Allow Deep stop states that don't stop time
The current code in the cpuidle-powernv intialization only allows deep
stop states (indicated by OPAL_PM_STOP_INST_DEEP) which lose timebase
(indicated by OPAL_PM_TIMEBASE_STOP). This assumption goes back to
POWER8 time where deep states used to lose the timebase. However, on
POWER9, we do have stop states that are deep (they lose hypervisor
state) but retain the timebase.
Fix the initialization code in the cpuidle-powernv driver to allow
such deep states.
Further, there is a bug in cpuidle-powernv driver with
CONFIG_TICK_ONESHOT=n where we end up incrementing the nr_idle_states
even if a platform idle state which loses time base was not added to
the cpuidle table.
Fix this by ensuring that the nr_idle_states variable gets incremented
only when the platform idle state was added to the cpuidle table.
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gautham R. Shenoy [Tue, 16 May 2017 08:49:47 +0000 (14:19 +0530)]
powerpc/powernv/idle: Use Requested Level for restoring state on P9 DD1
On Power9 DD1 due to a hardware bug the Power-Saving Level Status
field (PLS) of the PSSCR for a thread waking up from a deep state can
under-report if some other thread in the core is in a shallow stop
state. The scenario in which this can manifest is as follows:
1) All the threads of the core are in deep stop.
2) One of the threads is woken up. The PLS for this thread will
correctly reflect that it is waking up from deep stop.
3) The thread that has woken up now executes a shallow stop.
4) When some other thread in the core is woken, its PLS will reflect
the shallow stop state.
Thus, the subsequent thread for which the PLS is under-reporting the
wakeup state will not restore the hypervisor resources.
Hence, on DD1 systems, use the Requested Level (RL) field as a
workaround to restore the contents of the hypervisor resources on the
wakeup from the stop state.
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Akshay Adiga [Tue, 16 May 2017 08:49:46 +0000 (14:19 +0530)]
powerpc/powernv/idle: Restore SPRs for deep idle states via stop API.
Some of the SPR values (HID0, MSR, SPRG0) don't change during the run
time of a booted kernel, once they have been initialized.
The contents of these SPRs are lost when the CPUs enter deep stop
states. So instead saving and restoring SPRs from the kernel, use the
stop-api provided by the firmware by which the firmware can restore
the contents of these SPRs to their initialized values after wakeup
from a deep stop state.
Apart from these, program the PSSCR value to that of the deepest stop
state via the stop-api. This will be used to indicate to the
underlying firmware as to what stop state to put the threads that have
been woken up by a special-wakeup.
And while we are at programming SPRs via stop-api, ensure that HID1,
HID4 and HID5 registers which are only available on POWER8 are not
requested to be restored by the firware on POWER9.
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gautham R. Shenoy [Tue, 16 May 2017 08:49:45 +0000 (14:19 +0530)]
powerpc/powernv/idle: Restore LPCR on wakeup from deep-stop
On wakeup from a deep stop state which is supposed to lose the
hypervisor state, we don't restore the LPCR to the old value but set
it to a "sane" value via cur_cpu_spec->cpu_restore().
The problem is that the "sane" value doesn't include UPRT and the HR
bits which are required to run correctly in Radix mode.
Fix this on POWER9 onwards by restoring the LPCR value whatever it was
before executing the stop instruction.
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gautham R. Shenoy [Tue, 16 May 2017 08:49:44 +0000 (14:19 +0530)]
powerpc/powernv/idle: Decouple Timebase restore & Per-core SPRs restore
On POWER8, in case of
- nap: both timebase and hypervisor state is retained.
- fast-sleep: timebase is lost. But the hypervisor state is retained.
- winkle: timebase and hypervisor state is lost.
Hence, the current code for handling exit from a idle state assumes
that if the timebase value is retained, then so is the hypervisor
state. Thus, the current code doesn't restore per-core hypervisor
state in such cases.
But that is no longer the case on POWER9 where we do have stop states
in which timebase value is retained, but the hypervisor state is
lost. So we have to ensure that the per-core hypervisor state gets
restored in such cases.
Fix this by ensuring that even in the case when timebase is retained,
we explicitly check if we are waking up from a deep stop that loses
per-core hypervisor state (indicated by cr4 being eq or gt), and if
this is the case, we restore the per-core hypervisor state.
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Gautham R. Shenoy [Tue, 16 May 2017 08:49:43 +0000 (14:19 +0530)]
powerpc/powernv/idle: Correctly initialize core_idle_state_ptr
The lower 8 bits of core_idle_state_ptr tracks the number of non-idle
threads in the core. This is supposed to be initialized to bit-map
corresponding to the threads_per_core. However, currently it is
initialized to PNV_CORE_IDLE_THREAD_BITS (0xFF). This is correct for
POWER8 which has 8 threads per core, but not for POWER9 which has 4
threads per core.
As a result, on POWER9, core_idle_state_ptr gets initialized to
0xFF. In case when all the threads of the core are idle, the bits
corresponding tracking the idle-threads are non-zero. As a result, the
idle entry/exit code fails to save/restore per-core hypervisor state
since it assumes that there are threads in the cores which are still
active.
Fix this by correctly initializing the lower bits of the
core_idle_state_ptr on the basis of threads_per_core.
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anton Blanchard [Fri, 19 May 2017 05:46:41 +0000 (15:46 +1000)]
powerpc: Add HAVE_IRQ_TIME_ACCOUNTING
Allow us to enable IRQ_TIME_ACCOUNTING. Even though we currently
use VIRT_CPU_ACCOUNTING_NATIVE, that option is quite heavy
weight and IRQ_TIME_ACCOUNTING might be better in some cases.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pavel Machek [Sun, 2 Apr 2017 10:05:36 +0000 (12:05 +0200)]
powerpc/sequoia: Fix NAND partitions not to overlap
Currently the DTS defines two partitions at the same addresses, if you
use one, you will corrupt information on the other one. Fix it by
shifting the second partition.
Signed-off-by: Pavel Machek <pavel@denx.de>
[mpe: Reconstruct change log from email thread]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Andrew Jeffery [Fri, 12 May 2017 03:58:10 +0000 (13:28 +0930)]
powerpc: Tweak copy selection parameter in __copy_tofrom_user_power7()
Experiments with the netperf benchmark indicated that the size selecting
VMX-based copies in __copy_tofrom_user_power7() was suboptimal on POWER8.
Measurements showed that parity was in the neighbourhood of 3328 bytes,
rather than greater than 4096. The change gives a 1.5-2.0% improvement in
performance for 4096-byte buffers, reducing the relative time spent in
__copy_tofrom_user_power7() from approximately 7% to approximately 5% in
the TCP_RR benchmark.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Fri, 12 May 2017 00:47:07 +0000 (10:47 +1000)]
powerpc/xmon: Fix compile error with PPC_8xx=y
Rearrange the code so that mode and badaddr are only defined when
they're used.
Also unsplit the string for easier grepping, and switch from CONFIG_8xx
which is deprecated to CONFIG_PPC_8xx.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Thu, 11 May 2017 15:15:20 +0000 (01:15 +1000)]
powerpc/powernv: Fix CPU_HOTPLUG=n idle.c compile error
Fixes: a7cd88da97 ("powerpc/powernv: Move CPU-Offline idle state invocation from smp.c to idle.c")
Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Mon, 1 May 2017 12:01:10 +0000 (22:01 +1000)]
powerpc/64s: Fix OPAL_CALL non-maskable interrupt reentrancy
OPAL_CALL uses SRR[01] with MSR_RI=1, which gets corrupted if there
is an interleaving system reset or machine check interrupt.
Use HSRR[01] instead, which does not require MSR_RI=0.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Mon, 1 May 2017 12:01:09 +0000 (22:01 +1000)]
powerpc/64s: Fix FIXUP_ENDIAN non-maskable interrupt reentrancy
FIXUP_ENDIAN uses SRR[01] with MSR_RI=1, which gets corrupted if there
is an interleaving system reset or machine check interrupt.
Set MSR_RI=0 before setting SRRs. The rfid will restore MSR.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Linus Torvalds [Mon, 29 May 2017 00:20:53 +0000 (17:20 -0700)]
Linux 4.12-rc3
Linus Torvalds [Sun, 28 May 2017 23:18:27 +0000 (16:18 -0700)]
Merge branch 'fixes' of git://git./linux/kernel/git/evalenti/linux-soc-thermal
Pull thermal SoC management fixes from Eduardo Valentin:
- fixes to TI SoC driver, Broadcom, qoriq
- small sparse warning fix on thermal core
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
thermal: broadcom: ns-thermal: default on iProc SoCs
ti-soc-thermal: Fix a typo in a comment line
ti-soc-thermal: Delete error messages for failed memory allocations in ti_bandgap_build()
ti-soc-thermal: Use devm_kcalloc() in ti_bandgap_build()
thermal: core: make thermal_emergency_poweroff static
thermal: qoriq: remove useless call for of_thermal_get_trip_points()
Linus Torvalds [Sat, 27 May 2017 16:39:09 +0000 (09:39 -0700)]
Merge tag 'tty-4.12-rc3' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some serial and tty fixes for 4.12-rc3. They are a bit bigger
than normal, which is why I had them bake in linux-next for a few
weeks and didn't send them to you for -rc2.
They revert a few of the serdev patches from 4.12-rc1, and bring
things back to how they were in 4.11, to try to make things a bit more
stable there. Rob and Johan both agree that this is the way forward,
so this isn't people squabbling over semantics. Other than that, just
a few minor serial driver fixes that people have had problems with.
All of these have been in linux-next for a few weeks with no reported
issues"
* tag 'tty-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: altera_uart: call iounmap() at driver remove
serial: imx: ensure UCR3 and UFCR are setup correctly
MAINTAINERS/serial: Change maintainer of jsm driver
serial: enable serdev support
tty/serdev: add serdev registration interface
serdev: Restore serdev_device_write_buf for atomic context
serial: core: fix crash in uart_suspend_port
tty: fix port buffer locking
tty: ehv_bytechan: clean up init error handling
serial: ifx6x60: fix use-after-free on module unload
serial: altera_jtaguart: adding iounmap()
serial: exar: Fix stuck MSIs
serial: efm32: Fix parity management in 'efm32_uart_console_get_options()'
serdev: fix tty-port client deregistration
Revert "tty_port: register tty ports with serdev bus"
drivers/tty: 8250: only call fintek_8250_probe when doing port I/O
Linus Torvalds [Sat, 27 May 2017 16:28:34 +0000 (09:28 -0700)]
Merge tag 'powerpc-4.12-4' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fix running SPU programs on Cell, and a few other minor fixes.
Thanks to Alistair Popple, Jeremy Kerr, Michael Neuling, Nicholas
Piggin"
* tag 'powerpc-4.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Add PPC_FEATURE userspace bits for SCV and DARN instructions
powerpc/spufs: Fix hash faults for kernel regions
powerpc: Fix booting P9 hash with CONFIG_PPC_RADIX_MMU=N
powerpc/powernv/npu-dma.c: Fix opal_npu_destroy_context() call
selftests/powerpc: Fix TM resched DSCR test with some compilers
Linus Torvalds [Sat, 27 May 2017 16:17:58 +0000 (09:17 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A series of fixes for X86:
- The final fix for the end-of-stack issue in the unwinder
- Handle non PAT systems gracefully
- Prevent access to uninitiliazed memory
- Move early delay calaibration after basic init
- Fix Kconfig help text
- Fix a cross compile issue
- Unbreak older make versions"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/timers: Move simple_udelay_calibration past init_hypervisor_platform
x86/alternatives: Prevent uninitialized stack byte read in apply_alternatives()
x86/PAT: Fix Xorg regression on CPUs that don't support PAT
x86/watchdog: Fix Kconfig help text file path reference to lockup watchdog documentation
x86/build: Permit building with old make versions
x86/unwind: Add end-of-stack check for ftrace handlers
Revert "x86/entry: Fix the end of the stack for newly forked tasks"
x86/boot: Use CROSS_COMPILE prefix for readelf
Linus Torvalds [Sat, 27 May 2017 16:14:24 +0000 (09:14 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fixlet from Thomas Gleixner:
"Silence dmesg spam by making the posix cpu timer printks depend on
print_fatal_signals"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-timers: Make signal printks conditional
Linus Torvalds [Sat, 27 May 2017 16:06:43 +0000 (09:06 -0700)]
Merge branch 'ras-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull RAS fixes from Thomas Gleixner:
"Two fixlets for RAS:
- Export memory_error() so the NFIT module can utilize it
- Handle memory errors in NFIT correctly"
* 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
acpi, nfit: Fix the memory error check in nfit_handle_mce()
x86/MCE: Export memory_error()
Linus Torvalds [Sat, 27 May 2017 16:02:41 +0000 (09:02 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf tooling fixes from Thomas Gleixner:
- Synchronization of tools and kernel headers
- A series of fixes for perf report addressing various failures:
* Handle invalid maps proper
* Plug a memory leak
* Handle frames and callchain order correctly
- Fixes for handling inlines and children mode
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tools/include: Sync kernel ABI headers with tooling headers
perf tools: Put caller above callee in --children mode
perf report: Do not drop last inlined frame
perf report: Always honor callchain order for inlined nodes
perf script: Add --inline option for debugging
perf report: Fix off-by-one for non-activation frames
perf report: Fix memory leak in addr2line when called by addr2inlines
perf report: Don't crash on invalid maps in `-g srcline` mode
Linus Torvalds [Sat, 27 May 2017 15:59:37 +0000 (08:59 -0700)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull locking fix from Thomas Gleixner:
"A fix for a state leak which was introduced in the recent rework of
futex/rtmutex interaction"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex,rt_mutex: Fix rt_mutex_cleanup_proxy_lock()
Linus Torvalds [Sat, 27 May 2017 15:52:27 +0000 (08:52 -0700)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull kthread fix from Thomas Gleixner:
"A single fix which prevents a use after free when kthread fork fails"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kthread: Fix use-after-free if kthread fork fails
Linus Torvalds [Sat, 27 May 2017 15:30:30 +0000 (08:30 -0700)]
Merge tag 'trace-v4.12-rc2' of git://git./linux/kernel/git/rostedt/linux-trace
Pull ftrace fixes from Steven Rostedt:
"There's been a few memory issues found with ftrace.
One was simply a memory leak where not all was being freed that should
have been in releasing a file pointer on set_graph_function.
Then Thomas found that the ftrace trampolines were marked for
read/write as well as execute. To shrink the possible attack surface,
he added calls to set them to ro. Which also uncovered some other
issues with freeing module allocated memory that had its permissions
changed.
Kprobes had a similar issue which is fixed and a selftest was added to
trigger that issue again"
* tag 'trace-v4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
x86/ftrace: Make sure that ftrace trampolines are not RWX
x86/mm/ftrace: Do not bug in early boot on irqs_disabled in cpu_flush_range()
selftests/ftrace: Add a testcase for many kprobe events
kprobes/x86: Fix to set RWX bits correctly before releasing trampoline
ftrace: Fix memory leak in ftrace_graph_release()
Thomas Gleixner [Thu, 25 May 2017 08:57:51 +0000 (10:57 +0200)]
x86/ftrace: Make sure that ftrace trampolines are not RWX
ftrace use module_alloc() to allocate trampoline pages. The mapping of
module_alloc() is RWX, which makes sense as the memory is written to right
after allocation. But nothing makes these pages RO after writing to them.
Add proper set_memory_rw/ro() calls to protect the trampolines after
modification.
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705251056410.1862@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Steven Rostedt (VMware) [Fri, 26 May 2017 14:14:11 +0000 (10:14 -0400)]
x86/mm/ftrace: Do not bug in early boot on irqs_disabled in cpu_flush_range()
With function tracing starting in early bootup and having its trampoline
pages being read only, a bug triggered with the following:
kernel BUG at arch/x86/mm/pageattr.c:189!
invalid opcode: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc2-test+ #3
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
task:
ffffffffb4222500 task.stack:
ffffffffb4200000
RIP: 0010:change_page_attr_set_clr+0x269/0x302
RSP: 0000:
ffffffffb4203c88 EFLAGS:
00010046
RAX:
0000000000000046 RBX:
0000000000000000 RCX:
00000001b6000000
RDX:
ffffffffb4203d40 RSI:
0000000000000000 RDI:
ffffffffb4240d60
RBP:
ffffffffb4203d18 R08:
00000001b6000000 R09:
0000000000000001
R10:
ffffffffb4203aa8 R11:
0000000000000003 R12:
ffffffffc029b000
R13:
ffffffffb4203d40 R14:
0000000000000001 R15:
0000000000000000
FS:
0000000000000000(0000) GS:
ffff9a639ea00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
ffff9a636b384000 CR3:
00000001ea21d000 CR4:
00000000000406b0
Call Trace:
change_page_attr_clear+0x1f/0x21
set_memory_ro+0x1e/0x20
arch_ftrace_update_trampoline+0x207/0x21c
? ftrace_caller+0x64/0x64
? 0xffffffffc029b000
ftrace_startup+0xf4/0x198
register_ftrace_function+0x26/0x3c
function_trace_init+0x5e/0x73
tracer_init+0x1e/0x23
tracing_set_tracer+0x127/0x15a
register_tracer+0x19b/0x1bc
init_function_trace+0x90/0x92
early_trace_init+0x236/0x2b3
start_kernel+0x200/0x3f5
x86_64_start_reservations+0x29/0x2b
x86_64_start_kernel+0x17c/0x18f
secondary_startup_64+0x9f/0x9f
? secondary_startup_64+0x9f/0x9f
Interrupts should not be enabled at this early in the boot process. It is
also fine to leave interrupts enabled during this time as there's only one
CPU running, and on_each_cpu() means to only run on the current CPU.
If early_boot_irqs_disabled is set, it is safe to run cpu_flush_range() with
interrupts disabled. Don't trigger a BUG_ON() in that case.
Link: http://lkml.kernel.org/r/20170526093717.0be3b849@gandalf.local.home
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Masami Hiramatsu [Fri, 26 May 2017 04:44:54 +0000 (13:44 +0900)]
selftests/ftrace: Add a testcase for many kprobe events
Add a testcase to test kprobes via ftrace interface
with many concurrent kprobe events.
This tries to add many kprobe events (up to 256) on
kernel functions. To avoid making ftrace-based
kprobes (kprobes on fentry), it skips first N bytes
(on x86 N=5, on ppc or arm N=4) of function entry.
After that, it enables all those events, disable it,
and remove it.
Since the unoptimization buffer reclaiming will
be delayed, after removing events, it will wait
enough time.
Link: http://lkml.kernel.org/r/149577388470.11702.11832460851769204511.stgit@devbox
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Masami Hiramatsu [Thu, 25 May 2017 10:38:17 +0000 (19:38 +0900)]
kprobes/x86: Fix to set RWX bits correctly before releasing trampoline
Fix kprobes to set(recover) RWX bits correctly on trampoline
buffer before releasing it. Releasing readonly page to
module_memfree() crash the kernel.
Without this fix, if kprobes user register a bunch of kprobes
in function body (since kprobes on function entry usually
use ftrace) and unregister it, kernel hits a BUG and crash.
Link: http://lkml.kernel.org/r/149570868652.3518.14120169373590420503.stgit@devbox
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: d0381c81c2f7 ("kprobes/x86: Set kprobes pages read-only")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Luis Henriques [Thu, 25 May 2017 15:20:38 +0000 (16:20 +0100)]
ftrace: Fix memory leak in ftrace_graph_release()
ftrace_hash is being kfree'ed in ftrace_graph_release(), however the
->buckets field is not. This results in a memory leak that is easily
captured by kmemleak:
unreferenced object 0xffff880038afe000 (size 8192):
comm "trace-cmd", pid 238, jiffies
4294916898 (age 9.736s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<
ffffffff815f561e>] kmemleak_alloc+0x4e/0xb0
[<
ffffffff8113964d>] __kmalloc+0x12d/0x1a0
[<
ffffffff810bf6d1>] alloc_ftrace_hash+0x51/0x80
[<
ffffffff810c0523>] __ftrace_graph_open.isra.39.constprop.46+0xa3/0x100
[<
ffffffff810c05e8>] ftrace_graph_open+0x68/0xa0
[<
ffffffff8114003d>] do_dentry_open.isra.1+0x1bd/0x2d0
[<
ffffffff81140df7>] vfs_open+0x47/0x60
[<
ffffffff81150f95>] path_openat+0x2a5/0x1020
[<
ffffffff81152d6a>] do_filp_open+0x8a/0xf0
[<
ffffffff811411df>] do_sys_open+0x12f/0x200
[<
ffffffff811412ce>] SyS_open+0x1e/0x20
[<
ffffffff815fa6e0>] entry_SYSCALL_64_fastpath+0x13/0x94
[<
ffffffffffffffff>] 0xffffffffffffffff
Link: http://lkml.kernel.org/r/20170525152038.7661-1-lhenriques@suse.com
Cc: stable@vger.kernel.org
Fixes: b9b0c831bed2 ("ftrace: Convert graph filter to use hash tables")
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Linus Torvalds [Fri, 26 May 2017 23:45:13 +0000 (16:45 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input layer fixes from Dmitry Torokhov:
"Just a few fixups to a couple of drivers"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: elan_i2c - ignore signals when finishing updating firmware
Input: elan_i2c - clear INT before resetting controller
Input: atmel_mxt_ts - add T100 as a readable object
Input: edt-ft5x06 - increase allowed data range for threshold parameter
Linus Torvalds [Fri, 26 May 2017 21:02:30 +0000 (14:02 -0700)]
Merge tag 'led_fixes_for_4-12-rc3' of git://git./linux/kernel/git/j.anaszewski/linux-leds
Pull LED fix from Jacek Anaszewski:
"A single LED fix for 4.12-rc3.
leds-pca955x driver uses only i2c_smbus API and thus it should pass
I2C_FUNC_SMBUS_BYTE_DATA flag to i2c_check_functionality"
* tag 'led_fixes_for_4-12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
leds: pca955x: Correct I2C Functionality
Linus Torvalds [Fri, 26 May 2017 20:51:01 +0000 (13:51 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix state pruning in bpf verifier wrt. alignment, from Daniel
Borkmann.
2) Handle non-linear SKBs properly in SCTP ICMP parsing, from Davide
Caratti.
3) Fix bit field definitions for rss_hash_type of descriptors in mlx5
driver, from Jesper Brouer.
4) Defer slave->link updates until bonding is ready to do a full commit
to the new settings, from Nithin Sujir.
5) Properly reference count ipv4 FIB metrics to avoid use after free
situations, from Eric Dumazet and several others including Cong Wang
and Julian Anastasov.
6) Fix races in llc_ui_bind(), from Lin Zhang.
7) Fix regression of ESP UDP encapsulation for TCP packets, from
Steffen Klassert.
8) Fix mdio-octeon driver Kconfig deps, from Randy Dunlap.
9) Fix regression in setting DSCP on ipv6/GRE encapsulation, from Peter
Dawson.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
ipv4: add reference counting to metrics
net: ethernet: ax88796: don't call free_irq without request_irq first
ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets
sctp: fix ICMP processing if skb is non-linear
net: llc: add lock_sock in llc_ui_bind to avoid a race condition
bonding: Don't update slave->link until ready to commit
test_bpf: Add a couple of tests for BPF_JSGE.
bpf: add various verifier test cases
bpf: fix wrong exposure of map_flags into fdinfo for lpm
bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data
bpf: properly reset caller saved regs after helper call and ld_abs/ind
bpf: fix incorrect pruning decision when alignment must be tracked
arp: fixed -Wuninitialized compiler warning
tcp: avoid fastopen API to be used on AF_UNSPEC
net: move somaxconn init from sysctl code
net: fix potential null pointer dereference
geneve: fix fill_info when using collect_metadata
virtio-net: enable TSO/checksum offloads for Q-in-Q vlans
be2net: Fix offload features for Q-in-Q packets
vlan: Fix tcp checksum offloads in Q-in-Q vlans
...
Linus Torvalds [Fri, 26 May 2017 19:13:08 +0000 (12:13 -0700)]
Merge tag 'xfs-4.12-fixes-2' of git://git./fs/xfs/xfs-linux
Pull XFS fixes from Darrick Wong:
"A few miscellaneous bug fixes & cleanups:
- Fix indlen block reservation accounting bug when splitting delalloc
extent
- Fix warnings about unused variables that appeared in -rc1.
- Don't spew errors when bmapping a local format directory
- Fix an off-by-one error in a delalloc eof assertion
- Make fsmap only return inode information for CAP_SYS_ADMIN
- Fix a potential mount time deadlock recovering cow extents
- Fix unaligned memory access in _btree_visit_blocks
- Fix various SEEK_HOLE/SEEK_DATA bugs"
* tag 'xfs-4.12-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: Move handling of missing page into one place in xfs_find_get_desired_pgoff()
xfs: Fix off-by-in in loop termination in xfs_find_get_desired_pgoff()
xfs: Fix missed holes in SEEK_HOLE implementation
xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff()
xfs: fix unaligned access in xfs_btree_visit_blocks
xfs: avoid mount-time deadlock in CoW extent recovery
xfs: only return detailed fsmap info if the caller has CAP_SYS_ADMIN
xfs: bad assertion for delalloc an extent that start at i_size
xfs: fix warnings about unused stack variables
xfs: BMAPX shouldn't barf on inline-format directories
xfs: fix indlen accounting error on partial delalloc conversion
Eric Dumazet [Thu, 25 May 2017 21:27:35 +0000 (14:27 -0700)]
ipv4: add reference counting to metrics
Andrey Konovalov reported crashes in ipv4_mtu()
I could reproduce the issue with KASAN kernels, between
10.246.7.151 and 10.246.7.152 :
1) 20 concurrent netperf -t TCP_RR -H 10.246.7.152 -l 1000 &
2) At the same time run following loop :
while :
do
ip ro add 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
ip ro del 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
done
Cong Wang attempted to add back rt->fi in commit
82486aa6f1b9 ("ipv4: restore rt->fi for reference counting")
but this proved to add some issues that were complex to solve.
Instead, I suggested to add a refcount to the metrics themselves,
being a standalone object (in particular, no reference to other objects)
I tried to make this patch as small as possible to ease its backport,
instead of being super clean. Note that we believe that only ipv4 dst
need to take care of the metric refcount. But if this is wrong,
this patch adds the basic infrastructure to extend this to other
families.
Many thanks to Julian Anastasov for reviewing this patch, and Cong Wang
for his efforts on this problem.
Fixes: 2860583fe840 ("ipv4: Kill rt->fi")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Uwe Kleine-König [Thu, 25 May 2017 20:54:53 +0000 (22:54 +0200)]
net: ethernet: ax88796: don't call free_irq without request_irq first
The function ax_init_dev (which is called only from the driver's .probe
function) calls free_irq in the error path without having requested the
irq in the first place. So drop the free_irq call in the error path.
Fixes: 825a2ff1896e ("AX88796 network driver")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Dawson [Thu, 25 May 2017 20:35:18 +0000 (06:35 +1000)]
ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets
This fix addresses two problems in the way the DSCP field is formulated
on the encapsulating header of IPv6 tunnels.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195661
1) The IPv6 tunneling code was manipulating the DSCP field of the
encapsulating packet using the 32b flowlabel. Since the flowlabel is
only the lower 20b it was incorrect to assume that the upper 12b
containing the DSCP and ECN fields would remain intact when formulating
the encapsulating header. This fix handles the 'inherit' and
'fixed-value' DSCP cases explicitly using the extant dsfield u8 variable.
2) The use of INET_ECN_encapsulate(0, dsfield) in ip6_tnl_xmit was
incorrect and resulted in the DSCP value always being set to 0.
Commit
90427ef5d2a4 ("ipv6: fix flow labels when the traffic class
is non-0") caused the regression by masking out the flowlabel
which exposed the incorrect handling of the DSCP portion of the
flowlabel in ip6_tunnel and ip6_gre.
Fixes: 90427ef5d2a4 ("ipv6: fix flow labels when the traffic class is non-0")
Signed-off-by: Peter Dawson <peter.a.dawson@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Thu, 25 May 2017 17:14:56 +0000 (19:14 +0200)]
sctp: fix ICMP processing if skb is non-linear
sometimes ICMP replies to INIT chunks are ignored by the client, even if
the encapsulated SCTP headers match an open socket. This happens when the
ICMP packet is carried by a paged skb: use skb_header_pointer() to read
packet contents beyond the SCTP header, so that chunk header and initiate
tag are validated correctly.
v2:
- don't use skb_header_pointer() to read the transport header, since
icmp_socket_deliver() already puts these 8 bytes in the linear area.
- change commit message to make specific reference to INIT chunks.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
linzhang [Thu, 25 May 2017 06:07:18 +0000 (14:07 +0800)]
net: llc: add lock_sock in llc_ui_bind to avoid a race condition
There is a race condition in llc_ui_bind if two or more processes/threads
try to bind a same socket.
If more processes/threads bind a same socket success that will lead to
two problems, one is this action is not what we expected, another is
will lead to kernel in unstable status or oops(in my simple test case,
cause llc2.ko can't unload).
The current code is test SOCK_ZAPPED bit to avoid a process to
bind a same socket twice but that is can't avoid more processes/threads
try to bind a same socket at the same time.
So, add lock_sock in llc_ui_bind like others, such as llc_ui_connect.
Signed-off-by: Lin Zhang <xiaolou4617@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 26 May 2017 18:05:22 +0000 (11:05 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A collection of fixes that should go into this series. This contains:
- A set of NVMe fixes, pulled from Christoph. This includes a set of
fixes for the fiber channel bits from James Smart, rdma queue depth
fix from Marta, controller removal fixes from Ming, and some more
APST quirk updates from Andy.
- A blk-mq debugfs fix from Bart, fixing a problem with the
untangling of the sysfs and debugfs blk-mq bits that was added in
this series.
- Error code fix in add_partition() from Dan.
- A small series of fixes for the new blk-throttle code from Shaohua"
* 'for-linus' of git://git.kernel.dk/linux-block: (21 commits)
blk-mq: Only register debugfs attributes for blk-mq queues
nvme: Quirk APST on Intel 600P/P3100 devices
nvme: only setup block integrity if supported by the driver
nvme: replace is_flags field in nvme_ctrl_ops with a flags field
nvme-pci: consistencly use ctrl->device for logging
partitions/msdos: FreeBSD UFS2 file systems are not recognized
block: fix an error code in add_partition()
blk-throttle: force user to configure all settings for io.low
blk-throttle: respect 0 bps/iops settings for io.low
blk-throttle: output some debug info in trace
blk-throttle: add hierarchy support for latency target and idle time
nvme_fc: remove extra controller reference taken on reconnect
nvme_fc: correct nvme status set on abort
nvme_fc: set logging level on resets/deletes
nvme_fc: revise comment on teardown
nvme_fc: Support ctrl_loss_tmo
nvme_fc: get rid of local reconnect_delay
blk-mq: remove blk_mq_abort_requeue_list()
nvme: avoid to use blk_mq_abort_requeue_list()
nvme: use blk_mq_start_hw_queues() in nvme_kill_queues()
...
Linus Torvalds [Fri, 26 May 2017 17:51:18 +0000 (10:51 -0700)]
Merge tag 'pci-v4.12-fixes-1' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- fix PCI_ENDPOINT build error (merged for v4.12)
- fix Switchtec driver (merged for v4.12)
- fix imx6 config read timeouts, fallout from changing to non-postable
reads
- add PM "needs_resume" flag for i915 suspend issue
* tag 'pci-v4.12-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI/PM: Add needs_resume flag to avoid suspend complete optimization
PCI: imx6: Fix config read timeout handling
switchtec: Fix minor bug with partition ID register
switchtec: Use new cdev_device_add() helper function
PCI: endpoint: Make PCI_ENDPOINT depend on HAS_DMA
Linus Torvalds [Fri, 26 May 2017 16:35:22 +0000 (09:35 -0700)]
Merge tag 'ceph-for-4.12-rc3' of git://github.com/ceph/ceph-client
Pul ceph fixes from Ilya Dryomov:
"A bunch of make W=1 and static checker fixups, a RECONNECT_SEQ
messenger patch from Zheng and Luis' fallocate fix"
* tag 'ceph-for-4.12-rc3' of git://github.com/ceph/ceph-client:
ceph: check that the new inode size is within limits in ceph_fallocate()
libceph: cleanup old messages according to reconnect seq
libceph: NULL deref on crush_decode() error path
libceph: fix error handling in process_one_ticket()
libceph: validate blob_struct_v in process_one_ticket()
libceph: drop version variable from ceph_monmap_decode()
libceph: make ceph_msg_data_advance() return void
libceph: use kbasename() and kill ceph_file_part()
Linus Torvalds [Fri, 26 May 2017 16:05:35 +0000 (09:05 -0700)]
Merge tag 'mmc-v4.12-rc2' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"This contains fixes to make the WiFi work again for the ARM64 Hikey
board.
Together with a couple of DTS updates for the Hikey board we have also
extended the mmc pwrseq_simple, to support a new power-off-delay-us DT
property, as that was required to enable a graceful power off sequence
for the WiFi chip"
* tag 'mmc-v4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
arm64: dts: hikey: Fix WiFi support
arm64: dts: hi6220: Move board data from the dwmmc nodes to hikey dts
arm64: dts: hikey: Add the SYS_5V and the VDD_3V3 regulators
arm64: dts: hi6220: Move the fixed_5v_hub regulator to the hikey dts
arm64: dts: hikey: Add clock for the pmic mfd
mfd: dts: hi655x: Add clock binding for the pmic
mmc: pwrseq_simple: Parse DTS for the power-off-delay-us property
mmc: dt: pwrseq-simple: Invent power-off-delay-us
Linus Torvalds [Fri, 26 May 2017 16:03:09 +0000 (09:03 -0700)]
Merge tag 'sound-4.12-rc3' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This contains a few HD-audio device-specific quirks and an endianess
fix for USB-audio, as well as the update of quirk model list document.
All fixes are small and trivial.
The document update could have been postponed, but it's a good thing
for user and has absolutely zero risk of breakage, so included here"
* tag 'sound-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - apply STAC_9200_DELL_M22 quirk for Dell Latitude D430
ALSA: hda - Update the list of quirk models
ALSA: hda - Provide dual-codecs model option for a few Realtek codecs
ALSA: hda - Apply dual-codec quirk for MSI Z270-Gaming mobo
ALSA: hda - No loopback on ALC299 codec
ALSA: usb-audio: fix Amanero Combo384 quirk on big-endian hosts
Linus Torvalds [Fri, 26 May 2017 15:54:06 +0000 (08:54 -0700)]
Merge tag 'drm-fixes-for-v4.12-rc3' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Not a whole lot happening here, a set of amdgpu fixes and one core
deadlock fix, and some misc drivers fixes"
* tag 'drm-fixes-for-v4.12-rc3' of git://people.freedesktop.org/~airlied/linux:
drm/amdgpu: fix null point error when rmmod amdgpu.
drm/amd/powerplay: fix a signedness bugs
drm/amdgpu: fix NULL pointer panic of emit_gds_switch
drm/radeon: Unbreak HPD handling for r600+
drm/amd/powerplay/smu7: disable mclk switching for high refresh rates
drm/amd/powerplay/smu7: add vblank check for mclk switching (v2)
drm/radeon/ci: disable mclk switching for high refresh rates (v2)
drm/amdgpu/ci: disable mclk switching for high refresh rates (v2)
drm/amdgpu: fix fundamental suspend/resume issue
drm/gma500/psb: Actually use VBT mode when it is found
drm: Fix deadlock retry loop in page_flip_ioctl
drm: qxl: Delay entering atomic context during cursor update
drm/radeon: Fix oops upon driver load on PowerXpress laptops
Christoph Hellwig [Sat, 20 May 2017 16:59:54 +0000 (18:59 +0200)]
PCI/msi: fix the pci_alloc_irq_vectors_affinity stub
We need to return an error for any call that asks for MSI / MSI-X
vectors only, so that non-trivial fallback logic can work properly.
Also valid dev->irq and use the "correct" errno value based on feedback
from Linus.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Fixes: aff17164 ("PCI: Provide sensible IRQ vector alloc/free routines")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jens Axboe [Fri, 26 May 2017 15:11:19 +0000 (09:11 -0600)]
Merge branch 'nvme-4.12' of git://git.infradead.org/nvme into for-linus
Christoph writes:
"A couple of fixes for the next rc on the nvme front. Various FC fixes
from James, controller removal fixes from Ming (including a block layer
patch), a APST related device quirk from Andy, a RDMA fix for small
queue depth device from Marta, as well as fixes for the lack of
metadata support in non-PCIe drivers and the printk logging format from
me."
Bart Van Assche [Thu, 25 May 2017 23:38:06 +0000 (16:38 -0700)]
blk-mq: Only register debugfs attributes for blk-mq queues
The code in blk-mq-debugfs.c assumes that it is working on a blk-mq
queue and is not intended to work on a blk-sq queue. Hence only
register blk-mq debugfs attributes for blk-mq queues.
Fixes: commit 9c1051aacde8 ("blk-mq: untangle debugfs and sysfs")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Jan Kiszka [Wed, 24 May 2017 18:04:41 +0000 (20:04 +0200)]
x86/timers: Move simple_udelay_calibration past init_hypervisor_platform
This ensures that adjustments to x86_platform done by the hypervisor
setup is already respected by this simple calibration.
The current user of this, introduced by
1b5aeebf3a92 ("x86/earlyprintk:
Add support for earlyprintk via USB3 debug port"), comes much later
into play.
Fixes: dd759d93f4dd ("x86/timers: Add simple udelay calibration")
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: http://lkml.kernel.org/r/5e89fe60-aab3-2c1c-aba8-32f8ad376189@siemens.com
Andy Lutomirski [Wed, 24 May 2017 22:06:31 +0000 (15:06 -0700)]
nvme: Quirk APST on Intel 600P/P3100 devices
They have known firmware bugs. A fix is apparently in the works --
once fixed firmware is available, someone from Intel (Hi, Keith!)
can adjust the quirk accordingly.
Cc: stable@vger.kernel.org # v4.11
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Mario Limonciello <mario_limonciello@dell.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Sat, 20 May 2017 13:14:45 +0000 (15:14 +0200)]
nvme: only setup block integrity if supported by the driver
Currently only the PCIe driver supports metadata, so we should not claim
integrity support for the other drivers. This prevents nasty crashes
with targets that advertise metadata support on fabrics.
Also use the opportunity to factor out some code into a separate helper
that isn't even compiled if CONFIG_BLK_DEV_INTEGRITY is disabled.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Christoph Hellwig [Sat, 20 May 2017 13:14:44 +0000 (15:14 +0200)]
nvme: replace is_flags field in nvme_ctrl_ops with a flags field
So that we can have more flags for transport-specific behavior.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Christoph Hellwig [Sat, 20 May 2017 13:14:43 +0000 (15:14 +0200)]
nvme-pci: consistencly use ctrl->device for logging
This is what most of the code already does and gives much more useful
prefixes than the device embedded in the pci_dev.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Dave Airlie [Fri, 26 May 2017 01:51:55 +0000 (11:51 +1000)]
Merge branch 'drm-fixes-4.12' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
A bunch of bug fixes:
- Fix display flickering on some chips at high refresh rates
- suspend/resume fix
- hotplug fix
- a couple of segfault fixes for certain cases
* 'drm-fixes-4.12' of git://people.freedesktop.org/~agd5f/linux:
drm/amdgpu: fix null point error when rmmod amdgpu.
drm/amd/powerplay: fix a signedness bugs
drm/amdgpu: fix NULL pointer panic of emit_gds_switch
drm/radeon: Unbreak HPD handling for r600+
drm/amd/powerplay/smu7: disable mclk switching for high refresh rates
drm/amd/powerplay/smu7: add vblank check for mclk switching (v2)
drm/radeon/ci: disable mclk switching for high refresh rates (v2)
drm/amdgpu/ci: disable mclk switching for high refresh rates (v2)
drm/amdgpu: fix fundamental suspend/resume issue
Dave Airlie [Fri, 26 May 2017 01:51:28 +0000 (11:51 +1000)]
Merge tag 'drm-misc-fixes-2017-05-25' of git://anongit.freedesktop.org/git/drm-misc into drm-fixes
Core Changes:
- Don't drop vblank reference more than once in cases of ww retry (Daniel)
Driver Changes:
- radeon: Fix oops during radeon probe trying to reference wrong device (Lukas)
- qxl: Avoid sleeping while in atomic context on cursor update (Gabriel)
- gma500: Use VBT mode instead of pre-programmed mode for LVDS (Patrik)
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
* tag 'drm-misc-fixes-2017-05-25' of git://anongit.freedesktop.org/git/drm-misc:
drm/gma500/psb: Actually use VBT mode when it is found
drm: Fix deadlock retry loop in page_flip_ioctl
drm: qxl: Delay entering atomic context during cursor update
drm/radeon: Fix oops upon driver load on PowerXpress laptops
Nithin Sujir [Thu, 25 May 2017 02:45:17 +0000 (19:45 -0700)]
bonding: Don't update slave->link until ready to commit
In the loadbalance arp monitoring scheme, when a slave link change is
detected, the slave->link is immediately updated and slave_state_changed
is set. Later down the function, the rtnl_lock is acquired and the
changes are committed, updating the bond link state.
However, the acquisition of the rtnl_lock can fail. The next time the
monitor runs, since slave->link is already updated, it determines that
link is unchanged. This results in the bond link state permanently out
of sync with the slave link.
This patch modifies bond_loadbalance_arp_mon() to handle link changes
identical to bond_ab_arp_{inspect/commit}(). The new link state is
maintained in slave->new_link until we're ready to commit at which point
it's copied into slave->link.
NOTE: miimon_{inspect/commit}() has a more complex state machine
requiring the use of the bond_{propose,commit}_link_state() functions
which maintains the intermediate state in slave->link_new_state. The arp
monitors don't require that.
Testing: This bug is very easy to reproduce with the following steps.
1. In a loop, toggle a slave link of a bond slave interface.
2. In a separate loop, do ifconfig up/down of an unrelated interface to
create contention for rtnl_lock.
Within a few iterations, the bond link goes out of sync with the slave
link.
Signed-off-by: Nithin Nayak Sujir <nsujir@tintri.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Jay Vosburgh <jay.vosburgh@canonical.com>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Daney [Wed, 24 May 2017 23:35:49 +0000 (16:35 -0700)]
test_bpf: Add a couple of tests for BPF_JSGE.
Some JITs can optimize comparisons with zero. Add a couple of
BPF_JSGE tests against immediate zero.
Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 25 May 2017 17:44:29 +0000 (13:44 -0400)]
Merge branch 'bpf-fixes'
Daniel Borkmann says:
====================
Various BPF fixes
Follow-up to fix incorrect pruning when alignment tracking is
in use and to properly clear regs after call to not leave stale
data behind, also a fix that adds bpf_clone_redirect to the
bpf_helper_changes_pkt_data helper and exposes correct map_flags
for lpm map into fdinfo. For details, please see individual
patches.
v1 -> v2:
- Reworked first patch so that env->strict_alignment is the
final indicator on whether we have to deal with strict
alignment rather than having CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
checks on various locations, so only checking env->strict_alignment
is sufficient after that. Thanks for spotting, Dave!
- Added patch 3 and 4.
- Rest as is.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 24 May 2017 23:05:09 +0000 (01:05 +0200)]
bpf: add various verifier test cases
This patch adds various verifier test cases:
1) A test case for the pruning issue when tracking alignment
is used.
2) Various PTR_TO_MAP_VALUE_OR_NULL tests to make sure pointer
arithmetic turns such register into UNKNOWN_VALUE type.
3) Test cases for the special treatment of LD_ABS/LD_IND to
make sure verifier doesn't break calling convention here.
Latter is needed, since f.e. arm64 JIT uses r1 - r5 for
storing temporary data, so they really must be marked as
NOT_INIT.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 24 May 2017 23:05:08 +0000 (01:05 +0200)]
bpf: fix wrong exposure of map_flags into fdinfo for lpm
trie_alloc() always needs to have BPF_F_NO_PREALLOC passed in via
attr->map_flags, since it does not support preallocation yet. We
check the flag, but we never copy the flag into trie->map.map_flags,
which is later on exposed into fdinfo and used by loaders such as
iproute2. Latter uses this in bpf_map_selfcheck_pinned() to test
whether a pinned map has the same spec as the one from the BPF obj
file and if not, bails out, which is currently the case for lpm
since it exposes always 0 as flags.
Also copy over flags in array_map_alloc() and stack_map_alloc().
They always have to be 0 right now, but we should make sure to not
miss to copy them over at a later point in time when we add actual
flags for them to use.
Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation")
Reported-by: Jarno Rajahalme <jarno@covalent.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 24 May 2017 23:05:07 +0000 (01:05 +0200)]
bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data
The bpf_clone_redirect() still needs to be listed in
bpf_helper_changes_pkt_data() since we call into
bpf_try_make_head_writable() from there, thus we need
to invalidate prior pkt regs as well.
Fixes: 36bbef52c7eb ("bpf: direct packet write and access for helpers for clsact progs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 24 May 2017 23:05:06 +0000 (01:05 +0200)]
bpf: properly reset caller saved regs after helper call and ld_abs/ind
Currently, after performing helper calls, we clear all caller saved
registers, that is r0 - r5 and fill r0 depending on struct bpf_func_proto
specification. The way we reset these regs can affect pruning decisions
in later paths, since we only reset register's imm to 0 and type to
NOT_INIT. However, we leave out clearing of other variables such as id,
min_value, max_value, etc, which can later on lead to pruning mismatches
due to stale data.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 24 May 2017 23:05:05 +0000 (01:05 +0200)]
bpf: fix incorrect pruning decision when alignment must be tracked
Currently, when we enforce alignment tracking on direct packet access,
the verifier lets the following program pass despite doing a packet
write with unaligned access:
0: (61) r2 = *(u32 *)(r1 +76)
1: (61) r3 = *(u32 *)(r1 +80)
2: (61) r7 = *(u32 *)(r1 +8)
3: (bf) r0 = r2
4: (07) r0 += 14
5: (25) if r7 > 0x1 goto pc+4
R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
R3=pkt_end R7=inv,min_value=0,max_value=1 R10=fp
6: (2d) if r0 > r3 goto pc+1
R0=pkt(id=0,off=14,r=14) R1=ctx R2=pkt(id=0,off=0,r=14)
R3=pkt_end R7=inv,min_value=0,max_value=1 R10=fp
7: (63) *(u32 *)(r0 -4) = r0
8: (b7) r0 = 0
9: (95) exit
from 6 to 8:
R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
R3=pkt_end R7=inv,min_value=0,max_value=1 R10=fp
8: (b7) r0 = 0
9: (95) exit
from 5 to 10:
R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
R3=pkt_end R7=inv,min_value=2 R10=fp
10: (07) r0 += 1
11: (05) goto pc-6
6: safe <----- here, wrongly found safe
processed 15 insns
However, if we enforce a pruning mismatch by adding state into r8
which is then being mismatched in states_equal(), we find that for
the otherwise same program, the verifier detects a misaligned packet
access when actually walking that path:
0: (61) r2 = *(u32 *)(r1 +76)
1: (61) r3 = *(u32 *)(r1 +80)
2: (61) r7 = *(u32 *)(r1 +8)
3: (b7) r8 = 1
4: (bf) r0 = r2
5: (07) r0 += 14
6: (25) if r7 > 0x1 goto pc+4
R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
R3=pkt_end R7=inv,min_value=0,max_value=1
R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
7: (2d) if r0 > r3 goto pc+1
R0=pkt(id=0,off=14,r=14) R1=ctx R2=pkt(id=0,off=0,r=14)
R3=pkt_end R7=inv,min_value=0,max_value=1
R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
8: (63) *(u32 *)(r0 -4) = r0
9: (b7) r0 = 0
10: (95) exit
from 7 to 9:
R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
R3=pkt_end R7=inv,min_value=0,max_value=1
R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
9: (b7) r0 = 0
10: (95) exit
from 6 to 11:
R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
R3=pkt_end R7=inv,min_value=2
R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
11: (07) r0 += 1
12: (b7) r8 = 0
13: (05) goto pc-7 <----- mismatch due to r8
7: (2d) if r0 > r3 goto pc+1
R0=pkt(id=0,off=15,r=15) R1=ctx R2=pkt(id=0,off=0,r=15)
R3=pkt_end R7=inv,min_value=2
R8=imm0,min_value=0,max_value=0,min_align=
2147483648 R10=fp
8: (63) *(u32 *)(r0 -4) = r0
misaligned packet access off 2+15+-4 size 4
The reason why we fail to see it in states_equal() is that the
third test in compare_ptrs_to_packet() ...
if (old->off <= cur->off &&
old->off >= old->range && cur->off >= cur->range)
return true;
... will let the above pass. The situation we run into is that
old->off <= cur->off (14 <= 15), meaning that prior walked paths
went with smaller offset, which was later used in the packet
access after successful packet range check and found to be safe
already.
For example: Given is R0=pkt(id=0,off=0,r=0). Adding offset 14
as in above program to it, results in R0=pkt(id=0,off=14,r=0)
before the packet range test. Now, testing this against R3=pkt_end
with 'if r0 > r3 goto out' will transform R0 into R0=pkt(id=0,off=14,r=14)
for the case when we're within bounds. A write into the packet
at offset *(u32 *)(r0 -4), that is, 2 + 14 -4, is valid and
aligned (2 is for NET_IP_ALIGN). After processing this with
all fall-through paths, we later on check paths from branches.
When the above skb->mark test is true, then we jump near the
end of the program, perform r0 += 1, and jump back to the
'if r0 > r3 goto out' test we've visited earlier already. This
time, R0 is of type R0=pkt(id=0,off=15,r=0), and we'll prune
that part because this time we'll have a larger safe packet
range, and we already found that with off=14 all further insn
were already safe, so it's safe as well with a larger off.
However, the problem is that the subsequent write into the packet
with 2 + 15 -4 is then unaligned, and not caught by the alignment
tracking. Note that min_align, aux_off, and aux_off_align were
all 0 in this example.
Since we cannot tell at this time what kind of packet access was
performed in the prior walk and what minimal requirements it has
(we might do so in the future, but that requires more complexity),
fix it to disable this pruning case for strict alignment for now,
and let the verifier do check such paths instead. With that applied,
the test cases pass and reject the program due to misalignment.
Fixes: d1174416747d ("bpf: Track alignment of register values in the verifier.")
Reference: http://patchwork.ozlabs.org/patch/761909/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ihar Hrachyshka [Wed, 24 May 2017 22:19:35 +0000 (15:19 -0700)]
arp: fixed -Wuninitialized compiler warning
Commit
7d472a59c0e5ec117220a05de6b370447fb6cb66 ("arp: always override
existing neigh entries with gratuitous ARP") introduced a compiler
warning:
net/ipv4/arp.c:880:35: warning: 'addr_type' may be used uninitialized in
this function [-Wmaybe-uninitialized]
While the code logic seems to be correct and doesn't allow the variable
to be used uninitialized, and the warning is not consistently
reproducible, it's still worth fixing it for other people not to waste
time looking at the warning in case it pops up in the build environment.
Yes, compiler is probably at fault, but we will need to accommodate.
Fixes: 7d472a59c0e5 ("arp: always override existing neigh entries with gratuitous ARP")
Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Wang [Wed, 24 May 2017 16:59:31 +0000 (09:59 -0700)]
tcp: avoid fastopen API to be used on AF_UNSPEC
Fastopen API should be used to perform fastopen operations on the TCP
socket. It does not make sense to use fastopen API to perform disconnect
by calling it with AF_UNSPEC. The fastopen data path is also prone to
race conditions and bugs when using with AF_UNSPEC.
One issue reported and analyzed by Vegard Nossum is as follows:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Thread A: Thread B:
------------------------------------------------------------------------
sendto()
- tcp_sendmsg()
- sk_stream_memory_free() = 0
- goto wait_for_sndbuf
- sk_stream_wait_memory()
- sk_wait_event() // sleep
| sendto(flags=MSG_FASTOPEN, dest_addr=AF_UNSPEC)
| - tcp_sendmsg()
| - tcp_sendmsg_fastopen()
| - __inet_stream_connect()
| - tcp_disconnect() //because of AF_UNSPEC
| - tcp_transmit_skb()// send RST
| - return 0; // no reconnect!
| - sk_stream_wait_connect()
| - sock_error()
| - xchg(&sk->sk_err, 0)
| - return -ECONNRESET
- ... // wake up, see sk->sk_err == 0
- skb_entail() on TCP_CLOSE socket
If the connection is reopened then we will send a brand new SYN packet
after thread A has already queued a buffer. At this point I think the
socket internal state (sequence numbers etc.) becomes messed up.
When the new connection is closed, the FIN-ACK is rejected because the
sequence number is outside the window. The other side tries to
retransmit,
but __tcp_retransmit_skb() calls tcp_trim_head() on an empty skb which
corrupts the skb data length and hits a BUG() in copy_and_csum_bits().
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Hence, this patch adds a check for AF_UNSPEC in the fastopen data path
and return EOPNOTSUPP to user if such case happens.
Fixes: cf60af03ca4e7 ("tcp: Fast Open client - sendmsg(MSG_FASTOPEN)")
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Kapl [Wed, 24 May 2017 08:22:22 +0000 (10:22 +0200)]
net: move somaxconn init from sysctl code
The default value for somaxconn is set in sysctl_core_net_init(), but this
function is not called when kernel is configured without CONFIG_SYSCTL.
This results in the kernel not being able to accept TCP connections,
because the backlog has zero size. Usually, the user ends up with:
"TCP: request_sock_TCP: Possible SYN flooding on port 7. Dropping request. Check SNMP counters."
If SYN cookies are not enabled the connection is rejected.
Before
ef547f2ac16 (tcp: remove max_qlen_log), the effects were less
severe, because the backlog was always at least eight slots long.
Signed-off-by: Roman Kapl <roman.kapl@sysgo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>