Anton Volkov [Mon, 7 Aug 2017 12:54:14 +0000 (15:54 +0300)]
hysdn: fix to a race condition in put_log_buffer
The synchronization type that was used earlier to guard the loop that
deletes unused log buffers may lead to a situation that prevents any
thread from going through the loop.
The patch deletes previously used synchronization mechanism and moves
the loop under the spin_lock so the similar cases won't be feasible in
the future.
Found by by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Anton Volkov <avolkov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Mon, 7 Aug 2017 11:28:39 +0000 (13:28 +0200)]
s390/qeth: fix L3 next-hop in xmit qeth hdr
On L3, the qeth_hdr struct needs to be filled with the next-hop
IP address.
The current code accesses rtable->rt_gateway without checking that
rtable is a valid address. The accidental access to a lowcore area
results in a random next-hop address in the qeth_hdr.
rtable (or more precisely, skb_dst(skb)) can be NULL in rare cases
(for instance together with AF_PACKET sockets).
This patch adds the missing NULL-ptr checks.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Fixes: 87e7597b5a3 qeth: Move away from using neighbour entries in qeth_l3_fill_header()
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 7 Aug 2017 17:10:19 +0000 (10:10 -0700)]
Merge branch 'asix-Improve-robustness'
Dean Jenkins says:
====================
asix: Improve robustness
Please consider taking these patches to improve the robustness of the ASIX USB
to Ethernet driver.
Failures prompting an ASIX driver code review
=============================================
On an ARM i.MX6 embedded platform some strange one-off and two-off failures were
observed in and around the ASIX USB to Ethernet driver. This was observed on a
highly modified kernel 3.14 with the ASIX driver containing back-ported changes
from kernel.org up to kernel 4.8 approximately.
a) A one-off failure in asix_rx_fixup_internal():
There was an occurrence of an attempt to write off the end of the netdev buffer
which was trapped by skb_over_panic() in skb_put().
[20030.846440] skbuff: skb_over_panic: text:
7f2271c0 len:120 put:60 head:
8366ecc0 data:
8366ed02 tail:0x8366ed7a end:0x8366ed40 dev:eth0
[20030.863007] Kernel BUG at
8044ce38 [verbose debug info unavailable]
[20031.215345] Backtrace:
[20031.217884] [<
8044cde0>] (skb_panic) from [<
8044d50c>] (skb_put+0x50/0x5c)
[20031.227408] [<
8044d4bc>] (skb_put) from [<
7f2271c0>] (asix_rx_fixup_internal+0x1c4/0x23c [asix])
[20031.242024] [<
7f226ffc>] (asix_rx_fixup_internal [asix]) from [<
7f22724c>] (asix_rx_fixup_common+0x14/0x18 [asix])
[20031.260309] [<
7f227238>] (asix_rx_fixup_common [asix]) from [<
7f21f7d4>] (usbnet_bh+0x74/0x224 [usbnet])
[20031.269879] [<
7f21f760>] (usbnet_bh [usbnet]) from [<
8002f834>] (call_timer_fn+0xa4/0x1f0)
[20031.283961] [<
8002f790>] (call_timer_fn) from [<
80030834>] (run_timer_softirq+0x230/0x2a8)
[20031.302782] [<
80030604>] (run_timer_softirq) from [<
80028780>] (__do_softirq+0x15c/0x37c)
[20031.321511] [<
80028624>] (__do_softirq) from [<
80028c38>] (irq_exit+0x8c/0xe8)
[20031.339298] [<
80028bac>] (irq_exit) from [<
8000e9c8>] (handle_IRQ+0x8c/0xc8)
[20031.350038] [<
8000e93c>] (handle_IRQ) from [<
800085c8>] (gic_handle_irq+0xb8/0xf8)
[20031.365528] [<
80008510>] (gic_handle_irq) from [<
8050de80>] (__irq_svc+0x40/0x70)
Analysis of the logic of the ASIX driver (containing backported changes from
kernel.org up to kernel 4.8 approximately) suggested that the software could not
trigger skb_over_panic(). The analysis of the kernel BUG() crash information
suggested that the netdev buffer was written with 2 minimal 60 octet length
Ethernet frames (ASIX hardware drops the 4 octet FCS field) and the 2nd Ethernet
frame attempted to write off the end of the netdev buffer.
Note that the netdev buffer should only contain 1 Ethernet frame so if an
attempt to write 2 Ethernet frames into the buffer is made then that is wrong.
However, the logic of the asix_rx_fixup_internal() only allows 1 Ethernet frame
to be written into the netdev buffer.
Potentially this failure was due to memory corruption because it was only seen
once.
b) Two-off failures in the NAPI layer's backlog queue:
There were 2 crashes in the NAPI layer's backlog queue presumably after
asix_rx_fixup_internal() called usbnet_skb_return().
[24097.273945] Unable to handle kernel NULL pointer dereference at virtual address
00000004
[24097.398944] PC is at process_backlog+0x80/0x16c
[24097.569466] Backtrace:
[24097.572007] [<
8045ad98>] (process_backlog) from [<
8045b64c>] (net_rx_action+0xcc/0x248)
[24097.591631] [<
8045b580>] (net_rx_action) from [<
80028780>] (__do_softirq+0x15c/0x37c)
[24097.610022] [<
80028624>] (__do_softirq) from [<
800289cc>] (run_ksoftirqd+0x2c/0x84)
and
[ 1059.828452] Unable to handle kernel NULL pointer dereference at virtual address
00000000
[ 1059.953715] PC is at process_backlog+0x84/0x16c
[ 1060.140896] Backtrace:
[ 1060.143434] [<
8045ad98>] (process_backlog) from [<
8045b64c>] (net_rx_action+0xcc/0x248)
[ 1060.163075] [<
8045b580>] (net_rx_action) from [<
80028780>] (__do_softirq+0x15c/0x37c)
[ 1060.181474] [<
80028624>] (__do_softirq) from [<
80028c38>] (irq_exit+0x8c/0xe8)
[ 1060.199256] [<
80028bac>] (irq_exit) from [<
8000e9c8>] (handle_IRQ+0x8c/0xc8)
[ 1060.210006] [<
8000e93c>] (handle_IRQ) from [<
800085c8>] (gic_handle_irq+0xb8/0xf8)
[ 1060.225492] [<
80008510>] (gic_handle_irq) from [<
8050de80>] (__irq_svc+0x40/0x70)
The embedded board was only using an ASIX USB to Ethernet adaptor eth0.
Analysis suggested that the doubly-linked list pointers of the backlog queue had
been corrupted because one of the link pointers was NULL.
Potentially this failure was due to memory corruption because it was only seen
twice.
Results of the ASIX driver code review
======================================
During the code review some weaknesses were observed in the ASIX driver and the
following patches have been created to improve the robustness.
Brief overview of the patches
-----------------------------
1. asix: Add rx->ax_skb = NULL after usbnet_skb_return()
The current ASIX driver sends the received Ethernet frame to the NAPI layer of
the network stack via the call to usbnet_skb_return() in
asix_rx_fixup_internal() but retains the rx->ax_skb pointer to the netdev
buffer. The driver no longer needs the rx->ax_skb pointer at this point because
the NAPI layer now has the Ethernet frame.
This means that asix_rx_fixup_internal() must not use rx->ax_skb after the call
to usbnet_skb_return() because it could corrupt the handling of the Ethernet
frame within the network layer.
Therefore, to remove the risk of erroneous usage of rx->ax_skb, set rx->ax_skb
to NULL after the call to usbnet_skb_return(). This avoids potential erroneous
freeing of rx->ax_skb and erroneous writing to the netdev buffer. If the
software now somehow inappropriately reused rx->ax_skb, then a NULL pointer
dereference of rx->ax_skb would occur which makes investigation easier.
2. asix: Ensure asix_rx_fixup_info members are all reset
This patch creates reset_asix_rx_fixup_info() to allow all the
asix_rx_fixup_info structure members to be consistently reset to initial
conditions.
Call reset_asix_rx_fixup_info() upon each detectable error condition so that the
next URB is processed from a known state.
Otherwise, there is a risk that some members of the asix_rx_fixup_info structure
may be incorrect after an error occurred so potentially leading to a
malfunction.
3. asix: Fix small memory leak in ax88772_unbind()
This patch creates asix_rx_fixup_common_free() to allow the rx->ax_skb to be
freed when necessary.
asix_rx_fixup_common_free() is called from ax88772_unbind() before the parent
private data structure is freed.
Without this patch, there is a risk of a small netdev buffer memory leak each
time ax88772_unbind() is called during the reception of an Ethernet frame that
spans across 2 URBs.
Testing
=======
The patches have been sanity tested on a 64-bit Linux laptop running kernel
4.13-rc2 with the 3 patches applied on top.
The ASIX USB to Adaptor used for testing was (output of lsusb):
ID 0b95:772b ASIX Electronics Corp. AX88772B
Test #1
-------
The test ran a flood ping test script which slowly incremented the ICMP Echo
Request's payload from 0 to 5000 octets. This eventually causes IPv4
fragmentation to occur which causes Ethernet frames to be sent very close to
each other so increases the probability that an Ethernet frame will span 2 URBs.
The test showed that all pings were successful. The test took about 15 minutes
to complete.
Test #2
-------
A script was run on the laptop to periodically run ifdown and ifup every second
so that the ASIX USB to Adaptor was up for 1 second and down for 1 second.
From a Linux PC connected to the laptop, the following ping command was used
ping -f -s 5000 <ip address of laptop>
The large ICMP payload causes IPv4 fragmentation resulting in multiple
Ethernet frames per original IP packet.
Kernel debug within the ASIX driver was enabled to see whether any ASIX errors
were generated. The test was run for about 24 hours and no ASIX errors were
seen.
Patches
=======
The 3 patches have been rebased off the net-next repo master branch with HEAD
fbbeefd net: fec: Allow reception of frames bigger than 1522 bytes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dean Jenkins [Mon, 7 Aug 2017 08:50:16 +0000 (09:50 +0100)]
asix: Fix small memory leak in ax88772_unbind()
When Ethernet frames span mulitple URBs, the netdev buffer memory
pointed to by the asix_rx_fixup_info structure remains allocated
during the time gap between the 2 executions of asix_rx_fixup_internal().
This means that if ax88772_unbind() is called within this time
gap to free the memory of the parent private data structure then
a memory leak of the part filled netdev buffer memory will occur.
Therefore, create a new function asix_rx_fixup_common_free() to
free the memory of the netdev buffer and add a call to
asix_rx_fixup_common_free() from inside ax88772_unbind().
Consequently when an unbind occurs part way through receiving
an Ethernet frame, the netdev buffer memory that is holding part
of the received Ethernet frame will now be freed.
Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dean Jenkins [Mon, 7 Aug 2017 08:50:15 +0000 (09:50 +0100)]
asix: Ensure asix_rx_fixup_info members are all reset
There is a risk that the members of the structure asix_rx_fixup_info
become unsynchronised leading to the possibility of a malfunction.
For example, rx->split_head was not being set to false after an
error was detected so potentially could cause a malformed 32-bit
Data header word to be formed.
Therefore add function reset_asix_rx_fixup_info() to reset all the
members of asix_rx_fixup_info so that future processing will start
with known initial conditions.
Also, if (skb->len != offset) becomes true then call
reset_asix_rx_fixup_info() so that the processing of the next URB
starts with known initial conditions. Without the call, the check
does nothing which potentially could lead to a malfunction
when the next URB is processed.
In addition, for robustness, call reset_asix_rx_fixup_info() before
every error path's "return 0". This ensures that the next URB is
processed from known initial conditions.
Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dean Jenkins [Mon, 7 Aug 2017 08:50:14 +0000 (09:50 +0100)]
asix: Add rx->ax_skb = NULL after usbnet_skb_return()
In asix_rx_fixup_internal() there is a risk that rx->ax_skb gets
reused after passing the Ethernet frame into the network stack via
usbnet_skb_return().
The risks include:
a) asynchronously freeing rx->ax_skb after passing the netdev buffer
to the NAPI layer which might corrupt the backlog queue.
b) erroneously reusing rx->ax_skb such as calling skb_put_data() multiple
times which causes writing off the end of the netdev buffer.
Therefore add a defensive rx->ax_skb = NULL after usbnet_skb_return()
so that it is not possible to free rx->ax_skb or to apply
skb_put_data() too many times.
Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Richter [Mon, 7 Aug 2017 08:16:36 +0000 (10:16 +0200)]
bpf: fix selftest/bpf/test_pkt_md_access on s390x
Commit
18f3d6be6be1 ("selftests/bpf: Add test cases to test narrower ctx field loads")
introduced new eBPF test cases. One of them (test_pkt_md_access.c)
fails on s390x. The BPF verifier error message is:
[root@s8360046 bpf]# ./test_progs
test_pkt_access:PASS:ipv4 349 nsec
test_pkt_access:PASS:ipv6 212 nsec
[....]
libbpf: load bpf program failed: Permission denied
libbpf: -- BEGIN DUMP LOG ---
libbpf:
0: (71) r2 = *(u8 *)(r1 +0)
invalid bpf_context access off=0 size=1
libbpf: -- END LOG --
libbpf: failed to load program 'test1'
libbpf: failed to load object './test_pkt_md_access.o'
Summary: 29 PASSED, 1 FAILED
[root@s8360046 bpf]#
This is caused by a byte endianness issue. S390x is a big endian
architecture. Pointer access to the lowest byte or halfword of a
four byte value need to add an offset.
On little endian architectures this offset is not needed.
Fix this and use the same approach as the originator used for other files
(for example test_verifier.c) in his original commit.
With this fix the test program test_progs succeeds on s390x:
[root@s8360046 bpf]# ./test_progs
test_pkt_access:PASS:ipv4 236 nsec
test_pkt_access:PASS:ipv6 217 nsec
test_xdp:PASS:ipv4 3624 nsec
test_xdp:PASS:ipv6 1722 nsec
test_l4lb:PASS:ipv4 926 nsec
test_l4lb:PASS:ipv6 1322 nsec
test_tcp_estats:PASS: 0 nsec
test_bpf_obj_id:PASS:get-fd-by-notexist-prog-id 0 nsec
test_bpf_obj_id:PASS:get-fd-by-notexist-map-id 0 nsec
test_bpf_obj_id:PASS:get-prog-info(fd) 0 nsec
test_bpf_obj_id:PASS:get-map-info(fd) 0 nsec
test_bpf_obj_id:PASS:get-prog-info(fd) 0 nsec
test_bpf_obj_id:PASS:get-map-info(fd) 0 nsec
test_bpf_obj_id:PASS:get-prog-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-prog-info(next_id->fd) 0 nsec
test_bpf_obj_id:PASS:get-prog-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-prog-info(next_id->fd) 0 nsec
test_bpf_obj_id:PASS:check total prog id found by get_next_id 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:check get-map-info(next_id->fd) 0 nsec
test_bpf_obj_id:PASS:get-map-fd(next_id) 0 nsec
test_bpf_obj_id:PASS:check get-map-info(next_id->fd) 0 nsec
test_bpf_obj_id:PASS:check total map id found by get_next_id 0 nsec
test_pkt_md_access:PASS: 277 nsec
Summary: 30 PASSED, 0 FAILED
[root@s8360046 bpf]#
Fixes: 18f3d6be6be1 ("selftests/bpf: Add test cases to test narrower ctx field loads")
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
stephen hemminger [Fri, 4 Aug 2017 00:13:54 +0000 (17:13 -0700)]
netvsc: fix race on sub channel creation
The existing sub channel code did not wait for all the sub-channels
to completely initialize. This could lead to race causing crash
in napi_netif_del() from bad list. The existing code would send
an init message, then wait only for the initial response that
the init message was received. It thought it was waiting for
sub channels but really the init response did the wakeup.
The new code keeps track of the number of open channels and
waits until that many are open.
Other issues here were:
* host might return less sub-channels than was requested.
* the new init status is not valid until after init was completed.
Fixes: b3e6b82a0099 ("hv_netvsc: Wait for sub-channels to be processed during probe")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 4 Aug 2017 20:24:41 +0000 (22:24 +0200)]
bpf: fix byte order test in test_verifier
We really must check with #if __BYTE_ORDER == XYZ instead of
just presence of #ifdef __LITTLE_ENDIAN. I noticed that when
actually running this on big endian machine, the latter test
resolves to true for user space, same for #ifdef __BIG_ENDIAN.
E.g., looking at endian.h from libc, both are also defined
there, so we really must test this against __BYTE_ORDER instead
for proper insns selection. For the kernel, such checks are
fine though e.g. see
13da9e200fe4 ("Revert "endian: #define
__BYTE_ORDER"") and
415586c9e6d3 ("UAPI: fix endianness conditionals
in M32R's asm/stat.h") for some more context, but not for
user space. Lets also make sure to properly include endian.h.
After that, suite passes for me:
./test_verifier: ELF 64-bit MSB executable, [...]
Linux foo 4.13.0-rc3+ #4 SMP Fri Aug 4 06:59:30 EDT 2017 s390x s390x s390x GNU/Linux
Before fix: Summary: 505 PASSED, 11 FAILED
After fix: Summary: 516 PASSED, 0 FAILED
Fixes: 18f3d6be6be1 ("selftests/bpf: Add test cases to test narrower ctx field loads")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Bogendoerfer [Thu, 3 Aug 2017 13:43:14 +0000 (15:43 +0200)]
xgene: Always get clk source, but ignore if it's missing for SGMII ports
Even the driver doesn't do anything with the clk source for SGMII
ports it needs to be enabled by doing a devm_clk_get(), if there is
a clk source in DT.
Fixes: 0db01097cabd ('xgene: Don't fail probe, if there is no clk resource for SGMII interfaces')
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Tested-by: Laura Abbott <labbott@redhat.com>
Acked-by: Iyappan Subramanian <isubramanian@apm.com>
Tested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Daney [Fri, 4 Aug 2017 00:10:12 +0000 (17:10 -0700)]
MIPS: Add missing file for eBPF JIT.
Inexplicably, commit
f381bf6d82f0 ("MIPS: Add support for eBPF JIT.")
lost a file somewhere on its path to Linus' tree. Add back the
missing ebpf_jit.c so that we can build with CONFIG_BPF_JIT selected.
This version of ebpf_jit.c is identical to the original except for two
minor change need to resolve conflicts with changes merged from the
BPF branch:
A) Set prog->jited_len = image_size;
B) Use BPF_TAIL_CALL instead of BPF_CALL | BPF_X
Fixes: f381bf6d82f0 ("MIPS: Add support for eBPF JIT.")
Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 4 Aug 2017 18:18:01 +0000 (11:18 -0700)]
Merge branch 's390-bpf-jit-fixes'
Daniel Borkmann says:
====================
Two BPF fixes for s390
Found while testing some other work touching JITs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 4 Aug 2017 12:20:55 +0000 (14:20 +0200)]
bpf, s390: fix build for libbpf and selftest suite
The BPF feature test as well as libbpf is missing the __NR_bpf
define for s390 and currently refuses to compile (selftest suite
depends on libbpf as well). Similar issue was fixed some time
ago via
b0c47807d31d ("bpf: Add sparc support to tools and
samples."), just do the same and add definitions.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 4 Aug 2017 12:20:54 +0000 (14:20 +0200)]
bpf, s390: fix jit branch offset related to ldimm64
While testing some other work that required JIT modifications, I
run into test_bpf causing a hang when JIT enabled on s390. The
problematic test case was the one from
ddc665a4bb4b (bpf, arm64:
fix jit branch offset related to ldimm64), and turns out that we
do have a similar issue on s390 as well. In bpf_jit_prog() we
update next instruction address after returning from bpf_jit_insn()
with an insn_count. bpf_jit_insn() returns either -1 in case of
error (e.g. unsupported insn), 1 or 2. The latter is only the
case for ldimm64 due to spanning 2 insns, however, next address
is only set to i + 1 not taking actual insn_count into account,
thus fix is to use insn_count instead of 1. bpf_jit_enable in
mode 2 provides also disasm on s390:
Before fix:
000003ff800349b6:
a7f40003 brc 15,
3ff800349bc ; target
000003ff800349ba: 0000 unknown
000003ff800349bc:
e3b0f0700024 stg %r11,112(%r15)
000003ff800349c2:
e3e0f0880024 stg %r14,136(%r15)
000003ff800349c8: 0db0 basr %r11,%r0
000003ff800349ca:
c0ef00000000 llilf %r14,0
000003ff800349d0:
e320b0360004 lg %r2,54(%r11)
000003ff800349d6:
e330b03e0004 lg %r3,62(%r11)
000003ff800349dc:
ec23ffeda065 clgrj %r2,%r3,10,
3ff800349b6 ; jmp
000003ff800349e2:
e3e0b0460004 lg %r14,70(%r11)
000003ff800349e8:
e3e0b04e0004 lg %r14,78(%r11)
000003ff800349ee:
b904002e lgr %r2,%r14
000003ff800349f2:
e3b0f0700004 lg %r11,112(%r15)
000003ff800349f8:
e3e0f0880004 lg %r14,136(%r15)
000003ff800349fe: 07fe bcr 15,%r14
After fix:
000003ff80ef3db4:
a7f40003 brc 15,
3ff80ef3dba
000003ff80ef3db8: 0000 unknown
000003ff80ef3dba:
e3b0f0700024 stg %r11,112(%r15)
000003ff80ef3dc0:
e3e0f0880024 stg %r14,136(%r15)
000003ff80ef3dc6: 0db0 basr %r11,%r0
000003ff80ef3dc8:
c0ef00000000 llilf %r14,0
000003ff80ef3dce:
e320b0360004 lg %r2,54(%r11)
000003ff80ef3dd4:
e330b03e0004 lg %r3,62(%r11)
000003ff80ef3dda:
ec230006a065 clgrj %r2,%r3,10,
3ff80ef3de6 ; jmp
000003ff80ef3de0:
e3e0b0460004 lg %r14,70(%r11)
000003ff80ef3de6:
e3e0b04e0004 lg %r14,78(%r11) ; target
000003ff80ef3dec:
b904002e lgr %r2,%r14
000003ff80ef3df0:
e3b0f0700004 lg %r11,112(%r15)
000003ff80ef3df6:
e3e0f0880004 lg %r14,136(%r15)
000003ff80ef3dfc: 07fe bcr 15,%r14
test_bpf.ko suite runs fine after the fix.
Fixes: 054623105728 ("s390/bpf: Add s390x eBPF JIT compiler backend")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 4 Aug 2017 18:15:08 +0000 (11:15 -0700)]
Merge branch 'mlxsw-Couple-of-fixes'
Jiri Pirko says:
====================
mlxsw: Couple of fixes
Ido says:
The first patch prevents us from warning about valid situations that can
happen due to the fact that some operations in switchdev are deferred.
Second patch fixes a long standing problem in which we didn't correctly
free resources upon module removal, resulting in a memory leak.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 4 Aug 2017 12:12:30 +0000 (14:12 +0200)]
mlxsw: spectrum_switchdev: Release multicast groups during fini
Each multicast group (MID) stores a bitmap of ports to which a packet
should be forwarded to in case an MDB entry associated with the MID is
hit.
Since the initial introduction of IGMP snooping in commit
3a49b4fde2a1
("mlxsw: Adding layer 2 multicast support") the driver didn't correctly
free these multicast groups upon ungraceful situations such as the
removal of the upper bridge device or module removal.
The correct way to fix this is to associate each MID with the bridge
ports member in it and then drop the reference in case the bridge port
is destroyed, but this will result in a lot more code and will be fixed
in net-next.
For now, upon module removal, traverse the MID list and release each
one.
Fixes: 3a49b4fde2a1 ("mlxsw: Adding layer 2 multicast support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 4 Aug 2017 12:12:29 +0000 (14:12 +0200)]
mlxsw: spectrum_switchdev: Don't warn about valid situations
Some operations in the bridge driver such as MDB deletion are preformed
in an atomic context and thus deferred to a process context by the
switchdev infrastructure.
Therefore, by the time the operation is performed by the underlying
device driver it's possible the bridge port context is already gone.
This is especially true for removal flows, but theoretically can also be
invoked during addition.
Remove the warnings in such situations and return normally.
Fixes: c57529e1d5d8 ("mlxsw: spectrum: Replace vPorts with Port-VLAN")
Fixes: 3922285d96e7 ("net: bridge: Add support for offloading port attributes")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 3 Aug 2017 22:38:31 +0000 (15:38 -0700)]
Merge branch 'tcp-xmit-timer-rearming'
Neal Cardwell says:
====================
tcp: fix xmit timer rearming to avoid stalls
This patch series is a bug fix for a TCP loss recovery performance bug
reported independently in recent netdev threads:
(i) July 26, 2017: netdev thread "TCP fast retransmit issues"
(ii) July 26, 2017: netdev thread:
"[PATCH V2 net-next] TLP: Don't reschedule PTO when there's one
outstanding TLP retransmission"
Many thanks to Klavs Klavsen and Mao Wenan for the detailed reports,
traces, and packetdrill test cases, which enabled us to root-cause
this issue and verify the fix.
- v1 -> v2:
- In patch 2/3, changed an unclear comment in the pre-existing code
in tcp_schedule_loss_probe() to be more clear (thanks to Eric Dumazet
for suggesting we improve this).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Neal Cardwell [Thu, 3 Aug 2017 13:19:54 +0000 (09:19 -0400)]
tcp: fix xmit timer to only be reset if data ACKed/SACKed
Fix a TCP loss recovery performance bug raised recently on the netdev
list, in two threads:
(i) July 26, 2017: netdev thread "TCP fast retransmit issues"
(ii) July 26, 2017: netdev thread:
"[PATCH V2 net-next] TLP: Don't reschedule PTO when there's one
outstanding TLP retransmission"
The basic problem is that incoming TCP packets that did not indicate
forward progress could cause the xmit timer (TLP or RTO) to be rearmed
and pushed back in time. In certain corner cases this could result in
the following problems noted in these threads:
- Repeated ACKs coming in with bogus SACKs corrupted by middleboxes
could cause TCP to repeatedly schedule TLPs forever. We kept
sending TLPs after every ~200ms, which elicited bogus SACKs, which
caused more TLPs, ad infinitum; we never fired an RTO to fill in
the holes.
- Incoming data segments could, in some cases, cause us to reschedule
our RTO or TLP timer further out in time, for no good reason. This
could cause repeated inbound data to result in stalls in outbound
data, in the presence of packet loss.
This commit fixes these bugs by changing the TLP and RTO ACK
processing to:
(a) Only reschedule the xmit timer once per ACK.
(b) Only reschedule the xmit timer if tcp_clean_rtx_queue() deems the
ACK indicates sufficient forward progress (a packet was
cumulatively ACKed, or we got a SACK for a packet that was sent
before the most recent retransmit of the write queue head).
This brings us back into closer compliance with the RFCs, since, as
the comment for tcp_rearm_rto() notes, we should only restart the RTO
timer after forward progress on the connection. Previously we were
restarting the xmit timer even in these cases where there was no
forward progress.
As a side benefit, this commit simplifies and speeds up the TCP timer
arming logic. We had been calling inet_csk_reset_xmit_timer() three
times on normal ACKs that cumulatively acknowledged some data:
1) Once near the top of tcp_ack() to switch from TLP timer to RTO:
if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
tcp_rearm_rto(sk);
2) Once in tcp_clean_rtx_queue(), to update the RTO:
if (flag & FLAG_ACKED) {
tcp_rearm_rto(sk);
3) Once in tcp_ack() after tcp_fastretrans_alert() to switch from RTO
to TLP:
if (icsk->icsk_pending == ICSK_TIME_RETRANS)
tcp_schedule_loss_probe(sk);
This commit, by only rescheduling the xmit timer once per ACK,
simplifies the code and reduces CPU overhead.
This commit was tested in an A/B test with Google web server
traffic. SNMP stats and request latency metrics were within noise
levels, substantiating that for normal web traffic patterns this is a
rare issue. This commit was also tested with packetdrill tests to
verify that it fixes the timer behavior in the corner cases discussed
in the netdev threads mentioned above.
This patch is a bug fix patch intended to be queued for -stable
relases.
Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
Reported-by: Klavs Klavsen <kl@vsen.dk>
Reported-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neal Cardwell [Thu, 3 Aug 2017 13:19:53 +0000 (09:19 -0400)]
tcp: enable xmit timer fix by having TLP use time when RTO should fire
Have tcp_schedule_loss_probe() base the TLP scheduling decision based
on when the RTO *should* fire. This is to enable the upcoming xmit
timer fix in this series, where tcp_schedule_loss_probe() cannot
assume that the last timer installed was an RTO timer (because we are
no longer doing the "rearm RTO, rearm RTO, rearm TLP" dance on every
ACK). So tcp_schedule_loss_probe() must independently figure out when
an RTO would want to fire.
In the new TLP implementation following in this series, we cannot
assume that icsk_timeout was set based on an RTO; after processing a
cumulative ACK the icsk_timeout we see can be from a previous TLP or
RTO. So we need to independently recalculate the RTO time (instead of
reading it out of icsk_timeout). Removing this dependency on the
nature of icsk_timeout makes things a little easier to reason about
anyway.
Note that the old and new code should be equivalent, since they are
both saying: "if the RTO is in the future, but at an earlier time than
the normal TLP time, then set the TLP timer to fire when the RTO would
have fired".
Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neal Cardwell [Thu, 3 Aug 2017 13:19:52 +0000 (09:19 -0400)]
tcp: introduce tcp_rto_delta_us() helper for xmit timer fix
Pure refactor. This helper will be required in the xmit timer fix
later in the patch series. (Because the TLP logic will want to make
this calculation.)
Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 3 Aug 2017 06:13:46 +0000 (14:13 +0800)]
ipv6: set rt6i_protocol properly in the route when it is installed
After commit
c2ed1880fd61 ("net: ipv6: check route protocol when
deleting routes"), ipv6 route checks rt protocol when trying to
remove a rt entry.
It introduced a side effect causing 'ip -6 route flush cache' not
to work well. When flushing caches with iproute, all route caches
get dumped from kernel then removed one by one by sending DELROUTE
requests to kernel for each cache.
The thing is iproute sends the request with the cache whose proto
is set with RTPROT_REDIRECT by rt6_fill_node() when kernel dumps
it. But in kernel the rt_cache protocol is still 0, which causes
the cache not to be matched and removed.
So the real reason is rt6i_protocol in the route is not set when
it is allocated. As David Ahern's suggestion, this patch is to
set rt6i_protocol properly in the route when it is installed and
remove the codes setting rtm_protocol according to rt6i_flags in
rt6_fill_node.
This is also an improvement to keep rt6i_protocol consistent with
rtm_protocol.
Fixes: c2ed1880fd61 ("net: ipv6: check route protocol when deleting routes")
Reported-by: Jianlin Shi <jishi@redhat.com>
Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 3 Aug 2017 06:10:46 +0000 (23:10 -0700)]
net: fix keepalive code vs TCP_FASTOPEN_CONNECT
syzkaller was able to trigger a divide by 0 in TCP stack [1]
Issue here is that keepalive timer needs to be updated to not attempt
to send a probe if the connection setup was deferred using
TCP_FASTOPEN_CONNECT socket option added in linux-4.11
[1]
divide error: 0000 [#1] SMP
CPU: 18 PID: 0 Comm: swapper/18 Not tainted
task:
ffff986f62f4b040 ti:
ffff986f62fa2000 task.ti:
ffff986f62fa2000
RIP: 0010:[<
ffffffff8409cc0d>] [<
ffffffff8409cc0d>] __tcp_select_window+0x8d/0x160
Call Trace:
<IRQ>
[<
ffffffff8409d951>] tcp_transmit_skb+0x11/0x20
[<
ffffffff8409da21>] tcp_xmit_probe_skb+0xc1/0xe0
[<
ffffffff840a0ee8>] tcp_write_wakeup+0x68/0x160
[<
ffffffff840a151b>] tcp_keepalive_timer+0x17b/0x230
[<
ffffffff83b3f799>] call_timer_fn+0x39/0xf0
[<
ffffffff83b40797>] run_timer_softirq+0x1d7/0x280
[<
ffffffff83a04ddb>] __do_softirq+0xcb/0x257
[<
ffffffff83ae03ac>] irq_exit+0x9c/0xb0
[<
ffffffff83a04c1a>] smp_apic_timer_interrupt+0x6a/0x80
[<
ffffffff83a03eaf>] apic_timer_interrupt+0x7f/0x90
<EOI>
[<
ffffffff83fed2ea>] ? cpuidle_enter_state+0x13a/0x3b0
[<
ffffffff83fed2cd>] ? cpuidle_enter_state+0x11d/0x3b0
Tested:
Following packetdrill no longer crashes the kernel
`echo 0 >/proc/sys/net/ipv4/tcp_timestamps`
// Cache warmup: send a Fast Open cookie request
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
+0 setsockopt(3, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
+0 connect(3, ..., ...) = -1 EINPROGRESS (Operation is now in progress)
+0 > S 0:0(0) <mss 1460,nop,nop,sackOK,nop,wscale 8,FO,nop,nop>
+.01 < S. 123:123(0) ack 1 win 14600 <mss 1460,nop,nop,sackOK,nop,wscale 6,FO
abcd1234,nop,nop>
+0 > . 1:1(0) ack 1
+0 close(3) = 0
+0 > F. 1:1(0) ack 1
+0 < F. 1:1(0) ack 2 win 92
+0 > . 2:2(0) ack 2
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4
+0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
+0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
+0 setsockopt(4, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
+.01 connect(4, ..., ...) = 0
+0 setsockopt(4, SOL_TCP, TCP_KEEPIDLE, [5], 4) = 0
+10 close(4) = 0
`echo 1 >/proc/sys/net/ipv4/tcp_timestamps`
Fixes: 19f6d3f3c842 ("net/tcp-fastopen: Add new API support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Wei Wang <weiwan@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 3 Aug 2017 16:23:11 +0000 (09:23 -0700)]
Merge tag 'batadv-net-for-davem-
20170802' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here is a batman-adv bugfix:
- fix TT sync flag inconsistency problems, which can lead to excess packets,
by Linus Luessing
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Tue, 1 Aug 2017 20:22:32 +0000 (13:22 -0700)]
tcp: avoid setting cwnd to invalid ssthresh after cwnd reduction states
If the sender switches the congestion control during ECN-triggered
cwnd-reduction state (CA_CWR), upon exiting recovery cwnd is set to
the ssthresh value calculated by the previous congestion control. If
the previous congestion control is BBR that always keep ssthresh
to TCP_INIFINITE_SSTHRESH, cwnd ends up being infinite. The safe
step is to avoid assigning invalid ssthresh value when recovery ends.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Tue, 1 Aug 2017 20:04:36 +0000 (15:04 -0500)]
ibmvnic: Initialize SCRQ's during login renegotiation
SCRQ resources are freed during renegotiation, but they are not
re-allocated afterwards due to some changes in the initialization
process. Fix that by re-allocating the memory after renegotation.
SCRQ's can also be freed if a server capabilities request fails.
If this were encountered during a device reset for example,
SCRQ's may not be re-allocated. This operation is not necessary
anymore so remove it.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hector Martin [Tue, 1 Aug 2017 15:45:44 +0000 (00:45 +0900)]
usb: qmi_wwan: add D-Link DWM-222 device ID
Signed-off-by: Hector Martin <marcan@marcan.st>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 2 Aug 2017 17:44:10 +0000 (10:44 -0700)]
Merge branch 'mlx4-misc-fixes'
Tariq Toukan says:
====================
mlx4 misc fixes
This patchset contains misc bug fixes from the team
to the mlx4 Core and Eth drivers.
Patch 1 by Inbar fixes a wrong ethtool indication for Wake-on-LAN.
The other 3 patches by Jack add a missing capability description,
and fixes the off-by-1 misalignment for the following capabilities
descriptions.
Series generated against net commit:
cc75f8514db6 samples/bpf: fix bpf tunnel cleanup
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jack Morgenstein [Tue, 1 Aug 2017 13:43:46 +0000 (16:43 +0300)]
net/mlx4_core: Fixes missing capability bit in flags2 capability dump
The cited commit introduced the following new enum value in file
include/linux/mlx4/device.h:
QUERY_DEV_CAP_DIAG_RPRT_PER_PORT
However, it failed to introduce a corresponding entry in function
dump_dev_cap_flags2() for outputting a line in the message log
when this capability bit is set.
The change here fixes that omission.
Fixes: c7c122ed67e4 ("net/mlx4: Add diagnostic counters capability bit")
Reported-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jack Morgenstein [Tue, 1 Aug 2017 13:43:45 +0000 (16:43 +0300)]
net/mlx4_core: Fix namespace misalignment in QinQ VST support commit
The cited commit introduced the following new enum value in file
include/linux/mlx4/device.h:
MLX4_DEV_CAP_FLAG2_SVLAN_BY_QP
However the value of MLX4_DEV_CAP_FLAG2_SVLAN_BY_QP needs to stay
consistent with the value used in another namespace in
function dump_dev_cap_flags2(), which is manually kept in sync.
The change here restores that consistency.
Fixes: 7c3d21c8153c ("net/mlx4_core: Preparation for VF vlan protocol 802.1ad")
Reported-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jack Morgenstein [Tue, 1 Aug 2017 13:43:44 +0000 (16:43 +0300)]
net/mlx4_core: Fix sl_to_vl_change bit offset in flags2 dump
The index value in function dump_dev_cap_flags2() for outputting
"sl to vl mapping table change event support" needs to be
consistent with the value of the enumerated constant
MLX4_DEV_CAP_FLAG2_SL_TO_VL_CHANGE_EVENT defined in file
include/linux/mlx4_device.h
The change here restores that consistency.
Fixes: fd10ed8e6f42 ("IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets")
Reported-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Inbar Karmy [Tue, 1 Aug 2017 13:43:43 +0000 (16:43 +0300)]
net/mlx4_en: Fix wrong indication of Wake-on-LAN (WoL) support
Currently when WoL is supported but disabled, ethtool reports:
"Supports Wake-on: d".
Fix the indication of Wol support, so that the indication
remains "g" all the time if the NIC supports WoL.
Tested:
As accepted, when NIC supports WoL- ethtool reports:
Supports Wake-on: g
Wake-on: d
when NIC doesn't support WoL- ethtool reports:
Supports Wake-on: d
Wake-on: d
Fixes: 14c07b1358ed ("mlx4: Wake on LAN support")
Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 2 Aug 2017 17:39:58 +0000 (10:39 -0700)]
Merge branch 'lan78xx-Fixes'
Nisar Sayed says:
====================
lan78xx: Fixes to lan78xx driver
This series of patches are for lan78xx driver.
These patches fixes potential issues associated with lan78xx driver
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nisar Sayed [Tue, 1 Aug 2017 10:24:33 +0000 (10:24 +0000)]
lan78xx: Fix to handle hard_header_len update
Fix to handle hard_header_len update
When ifconfig up/down sequence is initiated hard_header_len
get updated incrementally for each ifconfig up /down sequence,
this leads invalid hard_header_len, moving to lan78xx_bind
to have one time update of hard_header_len addresses the issue.
Signed-off-by: Nisar Sayed <Nisar.Sayed@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nisar Sayed [Tue, 1 Aug 2017 10:24:17 +0000 (10:24 +0000)]
lan78xx: USB fast connect/disconnect crash fix
USB fast connect/disconnect crash fix
When USB plugged/unplugged at fast rate,
lan78xx_mdio_init() in lan78xx_bind() failing case is not handled.
Whenever lan78xx_mdio_init() failed, dev->mdiobus will be freed, however
since lan78xx_bind() not consider as error and try to proceed for
further initialization in lan78xx_probe() which leads system hung/crash.
Also when register_netdev() failed, netdev is freed without calling lan78xx_unbind().
Hence halting the failed cases right manner fixes the system crash/hung issue.
Signed-off-by: Nisar Sayed <Nisar.Sayed@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 2 Aug 2017 03:06:07 +0000 (20:06 -0700)]
Merge branch 'net-Fix-64-bit-statistics-seqcount-init'
Florian Fainelli says:
====================
drivers: net: Fix 64-bit statistics seqcount init
This patch series fixes a bunch of drivers to have their 64-bit statistics
seqcount cookie be initialized correctly. Most of these drivers (except b44,
gtp) are probably used on 64-bit only hosts and so the lockdep splat might have
never been seen.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 1 Aug 2017 19:11:13 +0000 (12:11 -0700)]
ipvlan: Fix 64-bit statistics seqcount initialization
On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a
lockdep splat indicating this seqcount is not correctly initialized, fix
that by using the proper helper function: netdev_alloc_pcpu_stats().
Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 1 Aug 2017 19:11:12 +0000 (12:11 -0700)]
netvsc: Initialize 64-bit stats seqcount
On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a
lockdep splat indicating this seqcount is not correctly initialized, fix
that. In commit
6c80f3fc2398 ("netvsc: report per-channel stats in
ethtool statistics") netdev_alloc_pcpu_stats() was removed in favor of
open-coding the 64-bits statistics, except that u64_stats_init() was
missed.
Fixes: 6c80f3fc2398 ("netvsc: report per-channel stats in ethtool statistics")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 1 Aug 2017 19:11:10 +0000 (12:11 -0700)]
gtp: Initialize 64-bit per-cpu stats correctly
On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a
lockdep splat indicating this seqcount is not correctly initialized, fix
that by using netdev_alloc_pcpu_stats() instead of an open coded
allocation.
Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 1 Aug 2017 19:11:09 +0000 (12:11 -0700)]
nfp: Initialize RX and TX ring 64-bit stats seqcounts
On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a
lockdep splat indicating this seqcount is not correctly initialized, fix
that.
Fixes: 4c3523623dc0 ("net: add driver for Netronome NFP4000/NFP6000 NIC VFs")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 1 Aug 2017 19:11:08 +0000 (12:11 -0700)]
ixgbe: Initialize 64-bit stats seqcounts
On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a
lockdep splat indicating this seqcount is not correctly initialized, fix
that.
Fixes: 4197aa7bb818 ("ixgbevf: provide 64 bit statistics")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 1 Aug 2017 19:11:07 +0000 (12:11 -0700)]
i40e: Initialize 64-bit statistics TX ring seqcount
On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a
lockdep splat indicating this seqcount is not correctly initialized, fix
that.
Fixes: 980e9b118642 ("i40e: Add support for 64 bit netstats")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 1 Aug 2017 19:11:06 +0000 (12:11 -0700)]
b44: Initialize 64-bit stats seqcount
On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a
lockdep splat indicating this seqcount is not correctly initialized, fix
that.
Fixes: eeda8585522b ("b44: add 64 bit stats")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
K. Den [Mon, 31 Jul 2017 16:05:39 +0000 (01:05 +0900)]
gue: fix remcsum when GRO on and CHECKSUM_PARTIAL boundary is outer UDP
In the case that GRO is turned on and the original received packet is
CHECKSUM_PARTIAL, if the outer UDP header is exactly at the last
csum-unnecessary point, which for instance could occur if the packet
comes from another Linux guest on the same Linux host, we have to do
either remcsum_adjust or set up CHECKSUM_PARTIAL again with its
csum_start properly reset considering RCO.
However, since
b7fe10e5ebac ("gro: Fix remcsum offload to deal with frags
in GRO") that barrier in such case could be skipped if GRO turned on,
hence we pass over it and the inner L4 validation mistakenly reckons
it as a bad csum.
This patch makes remcsum_offload being reset at the same time of GRO
remcsum cleanup, so as to make it work in such case as before.
Fixes: b7fe10e5ebac ("gro: Fix remcsum offload to deal with frags in GRO")
Signed-off-by: Koichiro Den <den@klaipeden.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
K. Den [Mon, 31 Jul 2017 16:05:20 +0000 (01:05 +0900)]
vxlan: fix remcsum when GRO on and CHECKSUM_PARTIAL boundary is outer UDP
In the case that GRO is turned on and the original received packet is
CHECKSUM_PARTIAL, if the outer UDP header is exactly at the last
csum-unnecessary point, which for instance could occur if the packet
comes from another Linux guest on the same Linux host, we have to do
either remcsum_adjust or set up CHECKSUM_PARTIAL again with its
csum_start properly reset considering RCO.
However, since
b7fe10e5ebac("gro: Fix remcsum offload to deal with frags
in GRO") that barrier in such case could be skipped if GRO turned on,
hence we pass over it and the inner L4 validation mistakenly reckons
it as a bad csum.
This patch makes remcsum_offload being reset at the same time of GRO
remcsum cleanup, so as to make it work in such case as before.
Fixes: b7fe10e5ebac ("gro: Fix remcsum offload to deal with frags in GRO")
Signed-off-by: Koichiro Den <den@klaipeden.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
yujuan.qi [Mon, 31 Jul 2017 03:23:01 +0000 (11:23 +0800)]
Cipso: cipso_v4_optptr enter infinite loop
in for(),if((optlen > 0) && (optptr[1] == 0)), enter infinite loop.
Test: receive a packet which the ip length > 20 and the first byte of ip option is 0, produce this issue
Signed-off-by: yujuan.qi <yujuan.qi@mediatek.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 1 Aug 2017 22:22:56 +0000 (15:22 -0700)]
Merge branch 'ethernet-ti-cpts-fix-tx-timestamping-timeout'
Grygorii Strashko says:
====================
net: ethernet: ti: cpts: fix tx timestamping timeout
With the low Ethernet connection speed cpdma notification about packet
processing can be received before CPTS TX timestamp event, which is set
when packet actually left CPSW while cpdma notification is sent when packet
pushed in CPSW fifo. As result, when connection is slow and CPU is fast
enough TX timestamping is not working properly.
Issue was discovered using timestamping tool on am57x boards with Ethernet link
speed forced to 100M and on am335x-evm with Ethernet link speed forced to 10M.
Patch3 - This series fixes it by introducing TX SKB queue to store PTP SKBs for
which Ethernet Transmit Event hasn't been received yet and then re-check this
queue with new Ethernet Transmit Events by scheduling CPTS overflow
work more often until TX SKB queue is not empty.
Patch 1,2 - As CPTS overflow work is time critical task it important to ensure
that its scheduling is not delayed. Unfortunately, There could be significant
delay in CPTS work schedule under high system load and on -RT which could cause
CPTS misbehavior due to internal counter overflow and there is no way to tune
CPTS overflow work execution policy and priority manually. The kthread_worker
can be used instead of workqueues, as it creates separate named kthread for
each worker and its its execution policy and priority can be configured
using chrt tool. Instead of modifying CPTS driver itself it was proposed to
it was proposed to add PTP auxiliary worker to the PHC subsystem [1], so
other drivers can benefit from this feature also.
[1] https://www.spinics.net/lists/netdev/msg445392.html
changes in v4:
- fixed memleak in ptp_clock_register()
- undocumented change in cpts_find_ts() moved to separate patch (minor fix)
changes in v3:
- patch 1: added proper error handling in ptp_clock_register.
minor comments applied.
changes in v2:
- added PTP auxiliary worker to the PHC subsystem
Links
v3: https://www.spinics.net/lists/netdev/msg446058.html
v2: https://www.spinics.net/lists/netdev/msg445859.html
v1: https://www.spinics.net/lists/netdev/msg445387.html
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Grygorii Strashko [Fri, 28 Jul 2017 22:30:05 +0000 (17:30 -0500)]
net: ethernet: ti: cpts: fix fifo read in cpts_find_ts
Now the call chain
cpts_find_ts()
|- cpts_fifo_read(cpts, CPTS_EV_PUSH)
will stop reading CPTS FIFO if PUSH event is found. But this is not
expected and CPTS FIFI should be completely drained here. This is most
probably copy-paste error and it has no negative impact as CPTS_EV_PUSH
should not be present in FIFO without TS_PUSH request and
cpts_systim_read() and cpts_find_ts() synchronized by spin_lock.
Correct above by calling cpts_fifo_read() with -1 parameter, so it will
read all CPTS event from FIFO.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Grygorii Strashko [Fri, 28 Jul 2017 22:30:04 +0000 (17:30 -0500)]
net: ethernet: ti: cpts: fix tx timestamping timeout
With the low speed Ethernet connection CPDMA notification about packet
processing can be received before CPTS TX timestamp event, which is set
when packet actually left CPSW while cpdma notification is sent when packet
pushed in CPSW fifo. As result, when connection is slow and CPU is fast
enough TX timestamping is not working properly.
Fix it, by introducing TX SKB queue to store PTP SKBs for which Ethernet
Transmit Event hasn't been received yet and then re-check this queue
with new Ethernet Transmit Events by scheduling CPTS overflow
work more often (every 1 jiffies) until TX SKB queue is not empty.
Side effect of this change is:
- User space tools require to take into account possible delay in TX
timestamp processing (for example ptp4l works with tx_timestamp_timeout=400
under net traffic and tx_timestamp_timeout=25 in idle).
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Grygorii Strashko [Fri, 28 Jul 2017 22:30:03 +0000 (17:30 -0500)]
net: ethernet: ti: cpts: convert to use ptp auxiliary worker
There could be significant delay in CPTS work schedule under high system
load and on -RT which could cause CPTS misbehavior due to internal counter
overflow. Usage of own kthread_worker allows to avoid such kind of issues
and makes it possible to tune priority of CPTS kthread_worker thread on -RT
(thread name "cpts").
Hence, the CPTS driver is converted to use PTP auxiliary worker as PHC
subsystem implements such functionality in a generic way now.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Grygorii Strashko [Fri, 28 Jul 2017 22:30:02 +0000 (17:30 -0500)]
ptp: introduce ptp auxiliary worker
Many PTP drivers required to perform some asynchronous or periodic work,
like periodically handling PHC counter overflow or handle delayed timestamp
for RX/TX network packets. In most of the cases, such work is implemented
using workqueues. Unfortunately, Kernel workqueues might introduce
significant delay in work scheduling under high system load and on -RT,
which could cause misbehavior of PTP drivers due to internal counter
overflow, for example, and there is no way to tune its execution policy and
priority manuallly.
Hence, The kthread_worker can be used insted of workqueues, as it create
separte named kthread for each worker and its its execution policy and
priority can be configured using chrt tool.
This prblem was reported for two drivers TI CPSW CPTS and dp83640, so
instead of modifying each of these driver it was proposed to add PTP
auxiliary worker to the PHC subsystem.
The patch adds PTP auxiliary worker in PHC subsystem using kthread_worker
and kthread_delayed_work and introduces two new PHC subsystem APIs:
- long (*do_aux_work)(struct ptp_clock_info *ptp) callback in
ptp_clock_info structure, which driver should assign if it require to
perform asynchronous or periodic work. Driver should return the delay of
the PTP next auxiliary work scheduling time (>=0) or negative value in case
further scheduling is not required.
- int ptp_schedule_worker(struct ptp_clock *ptp, unsigned long delay) which
allows schedule PTP auxiliary work.
The name of kthread_worker thread corresponds PTP PHC device name "ptp%d".
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 1 Aug 2017 05:36:42 +0000 (22:36 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Handle notifier registry failures properly in tun/tap driver, from
Tonghao Zhang.
2) Fix bpf verifier handling of subtraction bounds and add a testcase
for this, from Edward Cree.
3) Increase reset timeout in ftgmac100 driver, from Ben Herrenschmidt.
4) Fix use after free in prd_retire_rx_blk_timer_exired() in AF_PACKET,
from Cong Wang.
5) Fix SElinux regression due to recent UDP optimizations, from Paolo
Abeni.
6) We accidently increment IPSTATS_MIB_FRAGFAILS in the ipv6 code
paths, fix from Stefano Brivio.
7) Fix some mem leaks in dccp, from Xin Long.
8) Adjust MDIO_BUS kconfig deps to avoid build errors, from Arnd
Bergmann.
9) Mac address length check and buffer size fixes from Cong Wang.
10) Don't leak sockets in ipv6 udp early demux, from Paolo Abeni.
11) Fix return value when copy_from_user() fails in
bpf_prog_get_info_by_fd(), from Daniel Borkmann.
12) Handle PHY_HALTED properly in phy library state machine, from
Florian Fainelli.
13) Fix OOPS in fib_sync_down_dev(), from Ido Schimmel.
14) Fix truesize calculation in virtio_net which led to performance
regressions, from Michael S Tsirkin.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
samples/bpf: fix bpf tunnel cleanup
udp6: fix jumbogram reception
ppp: Fix a scheduling-while-atomic bug in del_chan
Revert "net: bcmgenet: Remove init parameter from bcmgenet_mii_config"
virtio_net: fix truesize for mergeable buffers
mv643xx_eth: fix of_irq_to_resource() error check
MAINTAINERS: Add more files to the PHY LIBRARY section
ipv4: fib: Fix NULL pointer deref during fib_sync_down_dev()
net: phy: Correctly process PHY_HALTED in phy_stop_machine()
sunhme: fix up GREG_STAT and GREG_IMASK register offsets
bpf: fix bpf_prog_get_info_by_fd to dump correct xlated_prog_len
tcp: avoid bogus gcc-7 array-bounds warning
net: tc35815: fix spelling mistake: "Intterrupt" -> "Interrupt"
bpf: don't indicate success when copy_from_user fails
udp6: fix socket leak on early demux
net: thunderx: Fix BGX transmit stall due to underflow
Revert "vhost: cache used event for better performance"
team: use a larger struct for mac address
net: check dev->addr_len for dev_set_mac_address()
phy: bcm-ns-usb3: fix MDIO_BUS dependency
...
William Tu [Mon, 31 Jul 2017 21:40:50 +0000 (14:40 -0700)]
samples/bpf: fix bpf tunnel cleanup
test_tunnel_bpf.sh fails to remove the vxlan11 tunnel device, causing the
next geneve tunnelling test case fails. In addition, the geneve reserved bit
in tcbpf2_kern.c should be zero, according to the RFC.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Mon, 31 Jul 2017 14:52:36 +0000 (16:52 +0200)]
udp6: fix jumbogram reception
Since commit
67a51780aebb ("ipv6: udp: leverage scratch area
helpers") udp6_recvmsg() read the skb len from the scratch area,
to avoid a cache miss.
But the UDP6 rx path support RFC 2675 UDPv6 jumbograms, and their
length exceeds the 16 bits available in the scratch area. As a side
effect the length returned by recvmsg() is:
<ingress datagram len> % (1<<16)
This commit addresses the issue allocating one more bit in the
IP6CB flags field and setting it for incoming jumbograms.
Such field is still in the first cacheline, so at recvmsg()
time we can check it and fallback to access skb->len if
required, without a measurable overhead.
Fixes: 67a51780aebb ("ipv6: udp: leverage scratch area helpers")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gao Feng [Mon, 31 Jul 2017 10:07:38 +0000 (18:07 +0800)]
ppp: Fix a scheduling-while-atomic bug in del_chan
The PPTP set the pptp_sock_destruct as the sock's sk_destruct, it would
trigger this bug when __sk_free is invoked in atomic context, because of
the call path pptp_sock_destruct->del_chan->synchronize_rcu.
Now move the synchronize_rcu to pptp_release from del_chan. This is the
only one case which would free the sock and need the synchronize_rcu.
The following is the panic I met with kernel 3.3.8, but this issue should
exist in current kernel too according to the codes.
BUG: scheduling while atomic
__schedule_bug+0x5e/0x64
__schedule+0x55/0x580
? ppp_unregister_channel+0x1cd5/0x1de0 [ppp_generic]
? dev_hard_start_xmit+0x423/0x530
? sch_direct_xmit+0x73/0x170
__cond_resched+0x16/0x30
_cond_resched+0x22/0x30
wait_for_common+0x18/0x110
? call_rcu_bh+0x10/0x10
wait_for_completion+0x12/0x20
wait_rcu_gp+0x34/0x40
? wait_rcu_gp+0x40/0x40
synchronize_sched+0x1e/0x20
0xf8417298
0xf8417484
? sock_queue_rcv_skb+0x109/0x130
__sk_free+0x16/0x110
? udp_queue_rcv_skb+0x1f2/0x290
sk_free+0x16/0x20
__udp4_lib_rcv+0x3b8/0x650
Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 31 Jul 2017 18:05:32 +0000 (11:05 -0700)]
Revert "net: bcmgenet: Remove init parameter from bcmgenet_mii_config"
This reverts commit
28b45910ccda ("net: bcmgenet: Remove init parameter
from bcmgenet_mii_config") because in the process of moving from
dev_info() to dev_info_once() we essentially lost the helpful printed
messages once the second instance of the driver is loaded.
dev_info_once() does not actually print the message once per device
instance, but once period.
Fixes: 28b45910ccda ("net: bcmgenet: Remove init parameter from bcmgenet_mii_config")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael S. Tsirkin [Mon, 31 Jul 2017 18:49:49 +0000 (21:49 +0300)]
virtio_net: fix truesize for mergeable buffers
Seth Forshee noticed a performance degradation with some workloads.
This turns out to be due to packet drops. Euan Kemp noticed that this
is because we drop all packets where length exceeds the truesize, but
for some packets we add in extra memory without updating the truesize.
This in turn was kept around unchanged from
ab7db91705e95 ("virtio-net:
auto-tune mergeable rx buffer size for improved performance"). That
commit had an internal reason not to account for the extra space: not
enough bits to do it. No longer true so let's account for the allocated
length exactly.
Many thanks to Seth Forshee for the report and bisecting and Euan Kemp
for debugging the issue.
Fixes: 680557cf79f8 ("virtio_net: rework mergeable buffer handling")
Reported-by: Euan Kemp <euan.kemp@coreos.com>
Tested-by: Euan Kemp <euan.kemp@coreos.com>
Reported-by: Seth Forshee <seth.forshee@canonical.com>
Tested-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sat, 29 Jul 2017 19:18:41 +0000 (22:18 +0300)]
mv643xx_eth: fix of_irq_to_resource() error check
of_irq_to_resource() has recently been fixed to return negative error #'s
along with 0 in case of failure, however the Marvell MV643xx Ethernet
driver still only regards 0 as invalid IRQ -- fix it up.
Fixes: 7a4228bbff76 ("of: irq: use of_irq_get() in of_irq_to_resource()")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 31 Jul 2017 16:47:50 +0000 (09:47 -0700)]
MAINTAINERS: Add more files to the PHY LIBRARY section
Include missing files that are provided by, used, or directly maintained
within the PHY LIBRARY, this include uapi header, header files used by
Device Tree code etc.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 28 Jul 2017 20:27:44 +0000 (23:27 +0300)]
ipv4: fib: Fix NULL pointer deref during fib_sync_down_dev()
Michał reported a NULL pointer deref during fib_sync_down_dev() when
unregistering a netdevice. The problem is that we don't check for
'in_dev' being NULL, which can happen in very specific cases.
Usually routes are flushed upon NETDEV_DOWN sent in either the netdev or
the inetaddr notification chains. However, if an interface isn't
configured with any IP address, then it's possible for host routes to be
flushed following NETDEV_UNREGISTER, after NULLing dev->ip_ptr in
inetdev_destroy().
To reproduce:
$ ip link add type dummy
$ ip route add local 1.1.1.0/24 dev dummy0
$ ip link del dev dummy0
Fix this by checking for the presence of 'in_dev' before referencing it.
Fixes: 982acb97560c ("ipv4: fib: Notify about nexthop status changes")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Tested-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 28 Jul 2017 18:58:36 +0000 (11:58 -0700)]
net: phy: Correctly process PHY_HALTED in phy_stop_machine()
Marc reported that he was not getting the PHY library adjust_link()
callback function to run when calling phy_stop() + phy_disconnect()
which does not indeed happen because we set the state machine to
PHY_HALTED but we don't get to run it to process this state past that
point.
Fix this with a synchronous call to phy_state_machine() in order to have
the state machine actually act on PHY_HALTED, set the PHY device's link
down, turn the network device's carrier off and finally call the
adjust_link() function.
Reported-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Cave-Ayland [Thu, 27 Jul 2017 16:26:00 +0000 (17:26 +0100)]
sunhme: fix up GREG_STAT and GREG_IMASK register offsets
Update the values to match those from the STP2002QFP documentation.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 31 Jul 2017 21:03:05 +0000 (14:03 -0700)]
Merge branch 'for-4.13-fixes' of git://git./linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"Several cgroup bug fixes.
- cgroup core was calling a migration callback on empty migrations,
which could make cpuset crash.
- There was a very subtle bug where the controller interface files
aren't created directly when cgroup2 is mounted. Because later
operations create them, this bug didn't get noticed earlier.
- Failed writes to cgroup.subtree_control were incorrectly returning
zero"
* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: fix error return value from cgroup_subtree_control()
cgroup: create dfl_root files on subsys registration
cgroup: don't call migration methods if there are no tasks to migrate
Linus Torvalds [Mon, 31 Jul 2017 20:37:28 +0000 (13:37 -0700)]
Merge branch 'for-4.13-fixes' of git://git./linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
"Two notable fixes.
- While adding NUMA affinity support to unbound workqueues, the
assumption that an unbound workqueue with max_active == 1 is
ordered was broken.
The plan was to use explicit alloc_ordered_workqueue() for those
cases. Unfortunately, I forgot to update the documentation properly
and we grew a handful of use cases which depend on that assumption.
While we want to convert them to alloc_ordered_workqueue(), we
don't really lose anything by enforcing ordered execution on
unbound max_active == 1 workqueues and it doesn't make sense to
risk subtle bugs. Restore the assumption.
- Workqueue assumes that CPU <-> NUMA node mapping remains static.
This is a general assumption - we don't have any synchronization
mechanism around CPU <-> node mapping. Unfortunately, powerpc may
change the mapping dynamically leading to crashes. Michael added a
workaround so that we at least don't crash while powerpc hotplug
code gets updated"
* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Work around edge cases for calc of pool's cpumask
workqueue: implicit ordered attribute should be overridable
workqueue: restore WQ_UNBOUND/max_active==1 to be ordered
Linus Torvalds [Mon, 31 Jul 2017 20:33:21 +0000 (13:33 -0700)]
Merge branch 'for-4.13-fixes' of git://git./linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
"Dan found a really old bug where libata hotplug code wasn't sanitizing
index value from userland and may end up indexing with a negative
number. It is scary but fortunately can only be triggered by root.
Other than that, minor fixes"
* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
libata: fix a couple of doc build warnings
libata: array underflow in ata_find_dev()
ata: sata_rcar: add gen[23] fallback compatibility strings
libata: remove unused rc in ata_eh_handle_port_resume
libata: Cleanup ata_read_log_page()
ata: fix gemini Kconfig dependencies
Jonathan Corbet [Sun, 30 Jul 2017 22:16:04 +0000 (16:16 -0600)]
libata: fix a couple of doc build warnings
The kerneldoc comments for a couple of functions in drivers/ata/libata-eh.c
had fallen behind the current implementation, resulting in these doc build
warnings:
./drivers/ata/libata-eh.c:1449: warning: No description found for parameter 'link'
./drivers/ata/libata-eh.c:1449: warning: Excess function parameter 'ap' description in 'ata_eh_done'
./drivers/ata/libata-eh.c:1590: warning: No description found for parameter 'qc'
./drivers/ata/libata-eh.c:1590: warning: Excess function parameter 'dev' description in 'ata_eh_request_sense'
Update the comments and make the warnings go away.
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
Linus Lüssing [Thu, 6 Jul 2017 05:02:25 +0000 (07:02 +0200)]
batman-adv: fix TT sync flag inconsistencies
This patch fixes an issue in the translation table code potentially
leading to a TT Request + Response storm. The issue may occur for nodes
involving BLA and an inconsistent configuration of the batman-adv AP
isolation feature. However, since the new multicast optimizations, a
single, malformed packet may lead to a mesh-wide, persistent
Denial-of-Service, too.
The issue occurs because nodes are currently OR-ing the TT sync flags of
all originators announcing a specific MAC address via the
translation table. When an intermediate node now receives a TT Request
and wants to answer this on behalf of the destination node, then this
intermediate node now responds with an altered flag field and broken
CRC. The next OGM of the real destination will lead to a CRC mismatch
and triggering a TT Request and Response again.
Furthermore, the OR-ing is currently never undone as long as at least
one originator announcing the according MAC address remains, leading to
the potential persistency of this issue.
This patch fixes this issue by storing the flags used in the CRC
calculation on a a per TT orig entry basis to be able to respond with
the correct, original flags in an intermediate TT Response for one
thing. And to be able to correctly unset sync flags once all nodes
announcing a sync flag vanish for another.
Fixes: e9c00136a475 ("batman-adv: fix tt_global_entries flags update")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Acked-by: Antonio Quartulli <a@unstable.cc>
[sw: typo in commit message]
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Linus Torvalds [Sun, 30 Jul 2017 19:40:36 +0000 (12:40 -0700)]
Linux 4.13-rc3
Linus Torvalds [Sun, 30 Jul 2017 19:19:35 +0000 (12:19 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A small set of x86 fixes:
- prevent the kernel from using the EFI reboot method when EFI is
disabled.
- two patches addressing clang issues"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Disable the address-of-packed-member compiler warning
x86/efi: Fix reboot_mode when EFI runtime services are disabled
x86/boot: #undef memcpy() et al in string.c
Linus Torvalds [Sun, 30 Jul 2017 18:54:08 +0000 (11:54 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
"Two patches addressing build warnings caused by inconsistent kernel
doc comments"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/wait: Clean up some documentation warnings
sched/core: Fix some documentation build warnings
Linus Torvalds [Sun, 30 Jul 2017 18:52:15 +0000 (11:52 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"A couple of fixes for performance counters and kprobes:
- a series of small patches which make the uncore performance
counters on Skylake server systems work correctly
- add a missing instruction slot release to the failure path of
kprobes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kprobes/x86: Release insn_slot in failure path
perf/x86/intel/uncore: Fix missing marker for skx_uncore_cha_extra_regs
perf/x86/intel/uncore: Fix SKX CHA event extra regs
perf/x86/intel/uncore: Remove invalid Skylake server CHA filter field
perf/x86/intel/uncore: Fix Skylake server CHA LLC_LOOKUP event umask
perf/x86/intel/uncore: Fix Skylake server PCU PMU event format
perf/x86/intel/uncore: Fix Skylake UPI PMU event masks
Linus Torvalds [Sun, 30 Jul 2017 18:27:33 +0000 (11:27 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"Fix for a regression caused by the conversion of x86 to the generic
hotplug code.
Instead of doing a plain single line revert, this adds a pile of
comments so the semantics of the force argument are clear"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/cpuhotplug: Revert "Set force affinity flag on hotplug migration"
Daniel Borkmann [Fri, 28 Jul 2017 15:05:25 +0000 (17:05 +0200)]
bpf: fix bpf_prog_get_info_by_fd to dump correct xlated_prog_len
bpf_prog_size(prog->len) is not the correct length we want to dump
back to user space. The code in bpf_prog_get_info_by_fd() uses this
to copy prog->insnsi to user space, but bpf_prog_size(prog->len) also
includes the size of struct bpf_prog itself plus program instructions
and is usually used either in context of accounting or for bpf_prog_alloc()
et al, thus we copy out of bounds in bpf_prog_get_info_by_fd()
potentially. Use the correct bpf_prog_insn_size() instead.
Fixes: 1e2709769086 ("bpf: Add BPF_OBJ_GET_INFO_BY_FD")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 28 Jul 2017 14:41:37 +0000 (16:41 +0200)]
tcp: avoid bogus gcc-7 array-bounds warning
When using CONFIG_UBSAN_SANITIZE_ALL, the TCP code produces a
false-positive warning:
net/ipv4/tcp_output.c: In function 'tcp_connect':
net/ipv4/tcp_output.c:2207:40: error: array subscript is below array bounds [-Werror=array-bounds]
tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;
^~
net/ipv4/tcp_output.c:2207:40: error: array subscript is below array bounds [-Werror=array-bounds]
tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
I have opened a gcc bug for this, but distros have already shipped
compilers with this problem, and it's not clear yet whether there is
a way for gcc to avoid the warning. As the problem is related to the
bitfield access, this introduces a temporary variable to store the old
enum value.
I did not notice this warning earlier, since UBSAN is disabled when
building with COMPILE_TEST, and that was always turned on in both
allmodconfig and randconfig tests.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81601
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 29 Jul 2017 22:30:08 +0000 (15:30 -0700)]
Merge tag 'wireless-drivers-for-davem-2017-07-28' of git://git./linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for 4.13
Two fixes for for brcmfmac, the crash was reported by two people
already so it's a high priority fix.
brcmfmac
* fix a crash in skb headroom handling in v4.13-rc1
* fix a memory leak due to a merge error in v4.6
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Thu, 27 Jul 2017 22:15:09 +0000 (23:15 +0100)]
net: tc35815: fix spelling mistake: "Intterrupt" -> "Interrupt"
Trivial fix to spelling mistake in printk message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 27 Jul 2017 19:02:46 +0000 (21:02 +0200)]
bpf: don't indicate success when copy_from_user fails
err in bpf_prog_get_info_by_fd() still holds 0 at that time from prior
check_uarg_tail_zero() check. Explicitly return -EFAULT instead, so
user space can be notified of buggy behavior.
Fixes: 1e2709769086 ("bpf: Add BPF_OBJ_GET_INFO_BY_FD")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 27 Jul 2017 12:45:09 +0000 (14:45 +0200)]
udp6: fix socket leak on early demux
When an early demuxed packet reaches __udp6_lib_lookup_skb(), the
sk reference is retrieved and used, but the relevant reference
count is leaked and the socket destructor is never called.
Beyond leaking the sk memory, if there are pending UDP packets
in the receive queue, even the related accounted memory is leaked.
In the long run, this will cause persistent forward allocation errors
and no UDP skbs (both ipv4 and ipv6) will be able to reach the
user-space.
Fix this by explicitly accessing the early demux reference before
the lookup, and properly decreasing the socket reference count
after usage.
Also drop the skb_steal_sock() in __udp6_lib_lookup_skb(), and
the now obsoleted comment about "socket cache".
The newly added code is derived from the current ipv4 code for the
similar path.
v1 -> v2:
fixed the __udp6_lib_rcv() return code for resubmission,
as suggested by Eric
Reported-by: Sam Edwards <CFSworks@gmail.com>
Reported-by: Marc Haber <mh+netdev@zugschlus.de>
Fixes: 5425077d73e0 ("net: ipv6: Add early demux handler for UDP unicast")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Thu, 27 Jul 2017 07:23:04 +0000 (12:53 +0530)]
net: thunderx: Fix BGX transmit stall due to underflow
For SGMII/RGMII/QSGMII interfaces when physical link goes down
while traffic is high is resulting in underflow condition being set
on that specific BGX's LMAC. Which assets a backpresure and VNIC stops
transmitting packets.
This is due to BGX being disabled in link status change callback while
packet is in transit. This patch fixes this issue by not disabling BGX
but instead just disables packet Rx and Tx.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Thu, 27 Jul 2017 03:22:05 +0000 (11:22 +0800)]
Revert "vhost: cache used event for better performance"
This reverts commit
809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it
was reported to break vhost_net. We want to cache used event and use
it to check for notification. The assumption was that guest won't move
the event idx back, but this could happen in fact when 16 bit index
wraps around after 64K entries.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 29 Jul 2017 18:26:45 +0000 (11:26 -0700)]
Merge tag 'mlx5-fixes-2017-07-27-V2' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2017-07-27
This series contains some misc fixes to the mlx5 driver.
Please pull and let me know if there's any problem.
V1->V2:
- removed redundant braces
for -stable:
4.7
net/mlx5: Fix command bad flow on command entry allocation failure
4.9
net/mlx5: Consider tx_enabled in all modes on remap
net/mlx5e: Fix outer_header_zero() check size
4.10
net/mlx5: Fix mlx5_add_flow_rules call with correct num of dests
4.11
net/mlx5: Fix mlx5_ifc_mtpps_reg_bits structure size
net/mlx5e: Add field select to MTPPS register
net/mlx5e: Fix broken disable 1PPS flow
net/mlx5e: Change 1PPS out scheme
net/mlx5e: Add missing support for PTP_CLK_REQ_PPS request
net/mlx5e: Fix wrong delay calculation for overflow check scheduling
net/mlx5e: Schedule overflow check work to mlx5e workqueue
4.12
net/mlx5: Fix command completion after timeout access invalid structure
net/mlx5e: IPoIB, Modify add/remove underlay QPN flows
I hope this is not too much, but most of the patches do apply cleanly on -stable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Wed, 26 Jul 2017 22:22:07 +0000 (15:22 -0700)]
team: use a larger struct for mac address
IPv6 tunnels use sizeof(struct in6_addr) as dev->addr_len,
but in many places especially bonding, we use struct sockaddr
to copy and set mac addr, this could lead to stack out-of-bounds
access.
Fix it by using a larger address storage like bonding.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Wed, 26 Jul 2017 22:22:06 +0000 (15:22 -0700)]
net: check dev->addr_len for dev_set_mac_address()
Historically, dev_ifsioc() uses struct sockaddr as mac
address definition, this is why dev_set_mac_address()
accepts a struct sockaddr pointer as input but now we
have various types of mac addresse whose lengths
are up to MAX_ADDR_LEN, longer than struct sockaddr,
and saved in dev->addr_len.
It is too late to fix dev_ifsioc() due to API
compatibility, so just reject those larger than
sizeof(struct sockaddr), otherwise we would read
and use some random bytes from kernel stack.
Fortunately, only a few IPv6 tunnel devices have addr_len
larger than sizeof(struct sockaddr) and they don't support
ndo_set_mac_addr(). But with team driver, in lb mode, they
can still be enslaved to a team master and make its mac addr
length as the same.
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 29 Jul 2017 00:21:41 +0000 (17:21 -0700)]
Merge tag 'devicetree-fixes-for-4.13' of git://git./linux/kernel/git/robh/linux
Pull DeviceTree fixes from Rob Herring:
"Two small DT fixes:
- Fix error handling in of_irq_to_resource_table() due to
of_irq_to_resource() error return changes.
- Fix dtx_diff script due to dts include path changes"
* tag 'devicetree-fixes-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of: irq: fix of_irq_to_resource() error check
scripts/dtc: dtx_diff - update include dts paths to match build
Linus Torvalds [Fri, 28 Jul 2017 21:44:56 +0000 (14:44 -0700)]
Merge tag 'nfs-for-4.13-3' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
"More NFS client bugfixes for 4.13.
Most of these fix locking bugs that Ben and Neil noticed, but I also
have a patch to fix one more access bug that was reported after last
week.
Stable fixes:
- Fix a race where CB_NOTIFY_LOCK fails to wake a waiter
- Invalidate file size when taking a lock to prevent corruption
Other fixes:
- Don't excessively generate tiny writes with fallocate
- Use the raw NFS access mask in nfs4_opendata_access()"
* tag 'nfs-for-4.13-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFSv4.1: Fix a race where CB_NOTIFY_LOCK fails to wake a waiter
NFS: Optimize fallocate by refreshing mapping when needed.
NFS: invalidate file size when taking a lock.
NFS: Use raw NFS access mask in nfs4_opendata_access()
Linus Torvalds [Fri, 28 Jul 2017 21:29:48 +0000 (14:29 -0700)]
Merge tag 'xfs-4.13-fixes-2' of git://git./fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
- fix firstfsb variables that we left uninitialized, which could lead
to locking problems.
- check for NULL metadata buffer pointers before using them.
- don't allow btree cursor manipulation if the btree block is corrupt.
Better to just shut down.
- fix infinite loop problems in quotacheck.
- fix buffer overrun when validating directory blocks.
- fix deadlock problem in bunmapi.
* tag 'xfs-4.13-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix multi-AG deadlock in xfs_bunmapi
xfs: check that dir block entries don't off the end of the buffer
xfs: fix quotacheck dquot id overflow infinite loop
xfs: check _alloc_read_agf buffer pointer before using
xfs: set firstfsb to NULLFSBLOCK before feeding it to _bmapi_write
xfs: check _btree_check_block value
Linus Torvalds [Fri, 28 Jul 2017 20:36:56 +0000 (13:36 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"s390:
- SRCU fix
PPC:
- host crash fixes
x86:
- bugfixes, including making nested posted interrupts really work
Generic:
- tweaks to kvm_stat and to uevents"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: LAPIC: Fix reentrancy issues with preempt notifiers
tools/kvm_stat: add '-f help' to get the available event list
tools/kvm_stat: use variables instead of hard paths in help output
KVM: nVMX: Fix loss of L2's NMI blocking state
KVM: nVMX: Fix posted intr delivery when vcpu is in guest mode
x86: irq: Define a global vector for nested posted interrupts
KVM: x86: do mask out upper bits of PAE CR3
KVM: make pid available for uevents without debugfs
KVM: s390: take srcu lock when getting/setting storage keys
KVM: VMX: remove unused field
KVM: PPC: Book3S HV: Fix host crash on changing HPT size
KVM: PPC: Book3S HV: Enable TM before accessing TM registers
Linus Torvalds [Fri, 28 Jul 2017 20:35:12 +0000 (13:35 -0700)]
Merge tag 'for-linus-4.13b-rc3-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Three minor cleanups for xen related drivers"
* tag 'for-linus-4.13b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: dont fiddle with event channel masking in suspend/resume
xen: selfballoon: remove unnecessary static in frontswap_selfshrink()
xen: Drop un-informative message during boot
Linus Torvalds [Fri, 28 Jul 2017 20:29:36 +0000 (13:29 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"I'd been collecting these whilst we debugged a CPU hotplug failure,
but we ended up diagnosing that one to tglx, who has taken a fix via
the -tip tree separately.
We're seeing some NFS issues that we haven't gotten to the bottom of
yet, and we've uncovered some issues with our backtracing too so there
might be another fixes pull before we're done.
Summary:
- Ensure we have a guard page after the kernel image in vmalloc
- Fix incorrect prefetch stride in copy_page
- Ensure irqs are disabled in die()
- Fix for event group validation in QCOM L2 PMU driver
- Fix requesting of PMU IRQs on AMD Seattle
- Minor cleanups and fixes"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mmu: Place guard page after mapping of kernel image
drivers/perf: arm_pmu: Request PMU SPIs with IRQF_PER_CPU
arm64: sysreg: Fix unprotected macro argmuent in write_sysreg
perf: qcom_l2: fix column exclusion check
arm64/lib: copy_page: use consistent prefetch stride
arm64/numa: Drop duplicate message
perf: Convert to using %pOF instead of full_name
arm64: Convert to using %pOF instead of full_name
arm64: traps: disable irq in die()
arm64: atomics: Remove '&' from '+&' asm constraint in lse atomics
arm64: uaccess: Remove redundant __force from addr cast in __range_ok
Linus Torvalds [Fri, 28 Jul 2017 20:25:15 +0000 (13:25 -0700)]
Merge tag 'powerpc-4.13-4' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"The highlight is Ben's patch to work around a host killing bug when
running KVM guests with the Radix MMU on Power9. See the long change
log of that commit for more detail.
And then three fairly minor fixes:
- fix of_node_put() underflow during reconfig remove, using old DLPAR
tools.
- fix recently introduced ld version check with 64-bit LE-only
toolchain.
- free the subpage_prot_table correctly, avoiding a memory leak.
Thanks to: Aneesh Kumar K.V, Benjamin Herrenschmidt, Laurent Vivier"
* tag 'powerpc-4.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm/hash: Free the subpage_prot_table correctly
powerpc/Makefile: Fix ld version check with 64-bit LE-only toolchain
powerpc/pseries: Fix of_node_put() underflow during reconfig remove
powerpc/mm/radix: Workaround prefetch issue with KVM
Benjamin Coddington [Fri, 28 Jul 2017 16:33:54 +0000 (12:33 -0400)]
NFSv4.1: Fix a race where CB_NOTIFY_LOCK fails to wake a waiter
nfs4_retry_setlk() sets the task's state to TASK_INTERRUPTIBLE within the
same region protected by the wait_queue's lock after checking for a
notification from CB_NOTIFY_LOCK callback. However, after releasing that
lock, a wakeup for that task may race in before the call to
freezable_schedule_timeout_interruptible() and set TASK_WAKING, then
freezable_schedule_timeout_interruptible() will set the state back to
TASK_INTERRUPTIBLE before the task will sleep. The result is that the task
will sleep for the entire duration of the timeout.
Since we've already set TASK_INTERRUPTIBLE in the locked section, just use
freezable_schedule_timout() instead.
Fixes: a1d617d8f134 ("nfs: allow blocking locks to be awoken by lock callbacks")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Linus Torvalds [Fri, 28 Jul 2017 19:31:49 +0000 (12:31 -0700)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
- remove broken dt bindings in inside-secure
- fix authencesn crash when used with digest_null
- fix cavium/nitrox firmware path
- fix SHA3 failure in brcm
- fix Kconfig dependency for brcm
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: authencesn - Fix digest_null crash
crypto: brcm - remove BCM_PDC_MBOX dependency in Kconfig
Documentation/bindings: crypto: remove the dma-mask property
crypto: inside-secure - do not parse the dma mask from dt
crypto: cavium/nitrox - Change in firmware path.
crypto: brcm - Fix SHA3-512 algorithm failure
Linus Torvalds [Fri, 28 Jul 2017 19:26:59 +0000 (12:26 -0700)]
Merge branch 'for-4.13-part3' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Fixes addressing problems reported by users, and there's one more
regression fix"
* 'for-4.13-part3' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: round down size diff when shrinking/growing device
Btrfs: fix early ENOSPC due to delalloc
btrfs: fix lockup in find_free_extent with read-only block groups
Btrfs: fix dir item validation when replaying xattr deletes
Linus Torvalds [Fri, 28 Jul 2017 19:24:21 +0000 (12:24 -0700)]
Merge branch 'for-next' of git://git./linux/kernel/git/shli/md
Pull MD fixes from Shaohua Li:
"This fixes several bugs, three of them are marked for stable:
- an initialization issue fixed by Ming
- a bio clone race issue fixed by me
- an async tx flush issue fixed by Ofer
- other cleanups"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
MD: fix warnning for UP case
md/raid5: add thread_group worker async_tx_issue_pending_all
md: simplify code with bio_io_error
md/raid1: fix writebehind bio clone
md: raid1-10: move raid1/raid10 common code into raid1-10.c
md: raid1/raid10: initialize bvec table via bio_add_page()
md: remove 'idx' from 'struct resync_pages'
Linus Torvalds [Fri, 28 Jul 2017 19:17:17 +0000 (12:17 -0700)]
Merge tag 'for-4.13/dm-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- a few DM integrity fixes that improve performance. One that address
inefficiencies in the on-disk journal device layout. Another that
makes use of the block layer's on-stack plugging when writing the
journal.
- a dm-bufio fix for the blk_status_t conversion that went in during
the merge window.
- a few DM raid fixes that address correctness when suspending the
device and a validation fix for validation that occurs during device
activation.
- a couple DM zoned target fixes. Important one being the fix to not
use GFP_KERNEL in the IO path due to concerns about deadlock in
low-memory conditions (e.g. swap over a DM zoned device, etc).
- a DM DAX device fix to make sure dm_dax_flush() is called if the
underlying DAX device is operating as a write cache.
* tag 'for-4.13/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm, dax: Make sure dm_dax_flush() is called if device supports it
dm verity fec: fix GFP flags used with mempool_alloc()
dm zoned: use GFP_NOIO in I/O path
dm zoned: remove test for impossible REQ_OP_FLUSH conditions
dm raid: bump target version
dm raid: avoid mddev->suspended access
dm raid: fix activation check in validate_raid_redundancy()
dm raid: remove WARN_ON() in raid10_md_layout_to_format()
dm bufio: fix error code in dm_bufio_write_dirty_buffers()
dm integrity: test for corrupted disk format during table load
dm integrity: WARN_ON if variables representing journal usage get out of sync
dm integrity: use plugging when writing the journal
dm integrity: fix inefficient allocation of journal space
Linus Torvalds [Fri, 28 Jul 2017 19:13:34 +0000 (12:13 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A small collection of fixes that should go into this series. This
contains:
- NVMe pull request from Christoph, with various fixes for nvme
proper and nvme-fc.
- disable runtime PM for blk-mq for now.
With scsi now defaulting to using blk-mq, this reared its head as
an issue. Longer term we'll fix up runtime PM for blk-mq, for now
just disable it to prevent a hang on laptop resume for some folks.
- blk-mq CPU <-> hw queue map fix from Christoph.
- xen/blkfront pull request from Konrad, with two small fixes for the
blkfront driver.
- a few fixups for nbd from Joseph.
- a stable fix for pblk from Javier"
* 'for-linus' of git://git.kernel.dk/linux-block:
lightnvm: pblk: advance bio according to lba index
nvme: validate admin queue before unquiesce
nbd: clear disconnected on reconnect
nvme-pci: fix HMB size calculation
nvme-fc: revise TRADDR parsing
nvme-fc: address target disconnect race conditions in fcp io submit
nvme: fabrics commands should use the fctype field for data direction
nvme: also provide a UUID in the WWID sysfs attribute
xen/blkfront: always allocate grants first from per-queue persistent grants
xen-blkfront: fix mq start/stop race
blk-mq: map queues to all present CPUs
block: disable runtime-pm for blk-mq
xen-blkfront: Fix handling of non-supported operations
nbd: only set sndtimeo if we have a timeout set
nbd: take tx_lock before disconnecting
nbd: allow multiple disconnects to be sent
Linus Torvalds [Fri, 28 Jul 2017 19:04:36 +0000 (12:04 -0700)]
Merge tag 'mmc-v4.13-rc1' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"Here are a couple of mmc fixes intended for v4.13-rc1.
I have also included a couple of cleanup patches in this pull request
for OMAP2+, related to the omap_hsmmc driver. The reason is because of
the changes are also depending on OMAP SoC specific code, so this
simplifies how to deal with this.
Summary:
MMC host:
- sunxi: Correct time phase settings
- omap_hsmmc: Clean up some dead code
- dw_mmc: Fix message printed for deprecated num-slots DT binding
- dw_mmc: Fix DT documentation"
* tag 'mmc-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
Documentation: dw-mshc: deprecate num-slots
mmc: dw_mmc: fix the wrong condition check of getting num-slots from DT
mmc: host: omap_hsmmc: remove unused platform callbacks
ARM: OMAP2+: hsmmc.c: Remove dead code
mmc: sunxi: Keep default timing phase settings for new timing mode
Michael Bringmann [Thu, 27 Jul 2017 21:27:14 +0000 (16:27 -0500)]
workqueue: Work around edge cases for calc of pool's cpumask
There is an underlying assumption/trade-off in many layers of the Linux
system that CPU <-> node mapping is static. This is despite the presence
of features like NUMA and 'hotplug' that support the dynamic addition/
removal of fundamental system resources like CPUs and memory. PowerPC
systems, however, do provide extensive features for the dynamic change
of resources available to a system.
Currently, there is little or no synchronization protection around the
updating of the CPU <-> node mapping, and the export/update of this
information for other layers / modules. In systems which can change
this mapping during 'hotplug', like PowerPC, the information is changing
underneath all layers that might reference it.
This patch attempts to ensure that a valid, usable cpumask attribute
is used by the workqueue infrastructure when setting up new resource
pools. It prevents a crash that has been observed when an 'empty'
cpumask is passed along to the worker/task scheduling code. It is
intended as a temporary workaround until a more fundamental review and
correction of the issue can be done.
[With additions to the patch provided by Tejun Hao <tj@kernel.org>]
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Javier González [Fri, 28 Jul 2017 13:13:16 +0000 (15:13 +0200)]
lightnvm: pblk: advance bio according to lba index
When a lba either hits the cache or corresponds to an empty entry in the
L2P table, we need to advance the bio according to the position in which
the lba is located. Otherwise, we will copy data in the wrong page, thus
causing data corruption for the application.
In case of a cache hit, we assumed that bio->bi_iter.bi_idx would
contain the correct index, but this is no necessarily true. Instead, use
the local bio advance counter and iterator. This guarantees that lbas
hitting the cache are copied into the right bv_page.
In case of an empty L2P entry, we omitted to advance the bio. In the
cases when the same I/O also contains a cache hit, data corresponding
to this lba will be copied to the wrong bv_page. Fix this by advancing
the bio as we do in the case of a cache hit.
Fixes: a4bd217b4326 lightnvm: physical block device (pblk) target
Signed-off-by: Javier González <javier@javigon.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Will Deacon [Mon, 24 Jul 2017 10:46:09 +0000 (11:46 +0100)]
arm64: mmu: Place guard page after mapping of kernel image
The vast majority of virtual allocations in the vmalloc region are followed
by a guard page, which can help to avoid overruning on vma into another,
which may map a read-sensitive device.
This patch adds a guard page to the end of the kernel image mapping (i.e.
following the data/bss segments).
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>