Baruch Siach [Tue, 21 Apr 2020 09:04:46 +0000 (12:04 +0300)]
net: phy: marvell10g: limit soft reset to 88x3310
The MV_V2_PORT_CTRL_SWRST bit in MV_V2_PORT_CTRL is reserved on
88E2110.
Setting SWRST on
88E2110 breaks packets transfer after interface down/up
cycle.
Fixes: 8f48c2ac85ed ("net: marvell10g: soft-reset the PHY when coming out of low power")
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 22 Apr 2020 21:40:20 +0000 (15:40 -0600)]
ipv4: Update fib_select_default to handle nexthop objects
A user reported [0] hitting the WARN_ON in fib_info_nh:
[ 8633.839816] ------------[ cut here ]------------
[ 8633.839819] WARNING: CPU: 0 PID: 1719 at include/net/nexthop.h:251 fib_select_path+0x303/0x381
...
[ 8633.839846] RIP: 0010:fib_select_path+0x303/0x381
...
[ 8633.839848] RSP: 0018:
ffffb04d407f7d00 EFLAGS:
00010286
[ 8633.839850] RAX:
0000000000000000 RBX:
ffff9460b9897ee8 RCX:
00000000000000fe
[ 8633.839851] RDX:
0000000000000000 RSI:
00000000ffffffff RDI:
0000000000000000
[ 8633.839852] RBP:
ffff946076049850 R08:
0000000059263a83 R09:
ffff9460840e4000
[ 8633.839853] R10:
0000000000000014 R11:
0000000000000000 R12:
ffffb04d407f7dc0
[ 8633.839854] R13:
ffffffffa4ce3240 R14:
0000000000000000 R15:
ffff9460b7681f60
[ 8633.839857] FS:
00007fcac2e02700(0000) GS:
ffff9460bdc00000(0000) knlGS:
0000000000000000
[ 8633.839858] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 8633.839859] CR2:
00007f27beb77e28 CR3:
0000000077734000 CR4:
00000000000006f0
[ 8633.839867] Call Trace:
[ 8633.839871] ip_route_output_key_hash_rcu+0x421/0x890
[ 8633.839873] ip_route_output_key_hash+0x5e/0x80
[ 8633.839876] ip_route_output_flow+0x1a/0x50
[ 8633.839878] __ip4_datagram_connect+0x154/0x310
[ 8633.839880] ip4_datagram_connect+0x28/0x40
[ 8633.839882] __sys_connect+0xd6/0x100
...
The WARN_ON is triggered in fib_select_default which is invoked when
there are multiple default routes. Update the function to use
fib_info_nhc and convert the nexthop checks to use fib_nh_common.
Add test case that covers the affected code path.
[0] https://github.com/FRRouting/frr/issues/6089
Fixes: 493ced1ac47c ("ipv4: Allow routes to use nexthop objects")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Salvatore Bonaccorso [Wed, 22 Apr 2020 19:07:53 +0000 (21:07 +0200)]
netlabel: Kconfig: Update reference for NetLabel Tools project
The NetLabel Tools project has moved from http://netlabel.sf.net to a
GitHub project. Update to directly refer to the new home for the tools.
Signed-off-by: Salvatore Bonaccorso <carnil@debian.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Ciornei [Wed, 22 Apr 2020 17:52:54 +0000 (20:52 +0300)]
MAINTAINERS: update dpaa2-eth maintainer list
Add myself as another maintainer of dpaa2-eth.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 22 Apr 2020 16:24:56 +0000 (18:24 +0200)]
mptcp: fix data_fin handing in RX path
The data fin flag is set only via a DSS option, but
mptcp_incoming_options() copies it unconditionally from the
provided RX options.
Since we do not clear all the mptcp sock RX options in a
socket free/alloc cycle, we can end-up with a stray data_fin
value while parsing e.g. MPC packets.
That would lead to mapping data corruption and will trigger
a few WARN_ON() in the RX path.
Instead of adding a costly memset(), fetch the data_fin flag
only for DSS packets - when we always explicitly initialize
such bit at option parsing time.
Fixes: 648ef4b88673 ("mptcp: Implement MPTCP receive path")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 21 Apr 2020 23:48:27 +0000 (17:48 -0600)]
vrf: Fix IPv6 with qdisc and xfrm
When a qdisc is attached to the VRF device, the packet goes down the ndo
xmit function which is setup to send the packet back to the VRF driver
which does a lookup to send the packet out. The lookup in the VRF driver
is not considering xfrm policies. Change it to use ip6_dst_lookup_flow
rather than ip6_route_output.
Fixes: 35402e313663 ("net: Add IPv6 support to VRF device")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Tue, 21 Apr 2020 20:34:48 +0000 (13:34 -0700)]
Documentation: add documentation of ping_group_range
Support for non-root users to send ICMP ECHO requests was added
back in Linux 3.0 kernel, but the documentation for the sysctl
to enable it has been missing.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 23 Apr 2020 02:27:40 +0000 (19:27 -0700)]
Merge branch 'sctp-fixes'
Jere Leppänen says:
====================
sctp: Fix problems with peer restart when in SHUTDOWN-PENDING state and socket is closed
These patches are related to the scenario described in commit
bdf6fa52f01b ("sctp: handle association restarts when the socket is
closed."). To recap, when our association is in SHUTDOWN-PENDING state
and we've closed our one-to-one socket, while the peer crashes without
being detected, restarts and reconnects using the same addresses and
ports, we start association shutdown.
In this case, Cumulative TSN Ack in the SHUTDOWN that we send has
always been incorrect. Additionally, bundling of the SHUTDOWN with the
COOKIE-ACK was broken by a later commit. This series fixes both of
these issues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jere Leppänen [Tue, 21 Apr 2020 19:03:42 +0000 (22:03 +0300)]
sctp: Fix SHUTDOWN CTSN Ack in the peer restart case
When starting shutdown in sctp_sf_do_dupcook_a(), get the value for
SHUTDOWN Cumulative TSN Ack from the new association, which is
reconstructed from the cookie, instead of the old association, which
the peer doesn't have anymore.
Otherwise the SHUTDOWN is either ignored or replied to with an ABORT
by the peer because CTSN Ack doesn't match the peer's Initial TSN.
Fixes: bdf6fa52f01b ("sctp: handle association restarts when the socket is closed.")
Signed-off-by: Jere Leppänen <jere.leppanen@nokia.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jere Leppänen [Tue, 21 Apr 2020 19:03:41 +0000 (22:03 +0300)]
sctp: Fix bundling of SHUTDOWN with COOKIE-ACK
When we start shutdown in sctp_sf_do_dupcook_a(), we want to bundle
the SHUTDOWN with the COOKIE-ACK to ensure that the peer receives them
at the same time and in the correct order. This bundling was broken by
commit
4ff40b86262b ("sctp: set chunk transport correctly when it's a
new asoc"), which assigns a transport for the COOKIE-ACK, but not for
the SHUTDOWN.
Fix this by passing a reference to the COOKIE-ACK chunk as an argument
to sctp_sf_do_9_2_start_shutdown() and onward to
sctp_make_shutdown(). This way the SHUTDOWN chunk is assigned the same
transport as the COOKIE-ACK chunk, which allows them to be bundled.
In sctp_sf_do_9_2_start_shutdown(), the void *arg parameter was
previously unused. Now that we're taking it into use, it must be a
valid pointer to a chunk, or NULL. There is only one call site where
it's not, in sctp_sf_autoclose_timer_expire(). Fix that too.
Fixes: 4ff40b86262b ("sctp: set chunk transport correctly when it's a new asoc")
Signed-off-by: Jere Leppänen <jere.leppanen@nokia.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 21 Apr 2020 17:18:53 +0000 (20:18 +0300)]
net: dsa: don't fail to probe if we couldn't set the MTU
There is no reason to fail the probing of the switch if the MTU couldn't
be configured correctly (either the switch port itself, or the host
port) for whatever reason. MTU-sized traffic probably won't work, sure,
but we can still probably limp on and support some form of communication
anyway, which the users would probably appreciate more.
Fixes: bfcb813203e6 ("net: dsa: configure the MTU for switch ports")
Reported-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 21 Apr 2020 17:00:28 +0000 (10:00 -0700)]
sched: etf: do not assume all sockets are full blown
skb->sk does not always point to a full blown socket,
we need to use sk_fullsock() before accessing fields which
only make sense on full socket.
BUG: KASAN: use-after-free in report_sock_error+0x286/0x300 net/sched/sch_etf.c:141
Read of size 1 at addr
ffff88805eb9b245 by task syz-executor.5/9630
CPU: 1 PID: 9630 Comm: syz-executor.5 Not tainted 5.7.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x188/0x20d lib/dump_stack.c:118
print_address_description.constprop.0.cold+0xd3/0x315 mm/kasan/report.c:382
__kasan_report.cold+0x35/0x4d mm/kasan/report.c:511
kasan_report+0x33/0x50 mm/kasan/common.c:625
report_sock_error+0x286/0x300 net/sched/sch_etf.c:141
etf_enqueue_timesortedlist+0x389/0x740 net/sched/sch_etf.c:170
__dev_xmit_skb net/core/dev.c:3710 [inline]
__dev_queue_xmit+0x154a/0x30a0 net/core/dev.c:4021
neigh_hh_output include/net/neighbour.h:499 [inline]
neigh_output include/net/neighbour.h:508 [inline]
ip6_finish_output2+0xfb5/0x25b0 net/ipv6/ip6_output.c:117
__ip6_finish_output+0x442/0xab0 net/ipv6/ip6_output.c:143
ip6_finish_output+0x34/0x1f0 net/ipv6/ip6_output.c:153
NF_HOOK_COND include/linux/netfilter.h:296 [inline]
ip6_output+0x239/0x810 net/ipv6/ip6_output.c:176
dst_output include/net/dst.h:435 [inline]
NF_HOOK include/linux/netfilter.h:307 [inline]
NF_HOOK include/linux/netfilter.h:301 [inline]
ip6_xmit+0xe1a/0x2090 net/ipv6/ip6_output.c:280
tcp_v6_send_synack+0x4e7/0x960 net/ipv6/tcp_ipv6.c:521
tcp_rtx_synack+0x10d/0x1a0 net/ipv4/tcp_output.c:3916
inet_rtx_syn_ack net/ipv4/inet_connection_sock.c:669 [inline]
reqsk_timer_handler+0x4c2/0xb40 net/ipv4/inet_connection_sock.c:763
call_timer_fn+0x1ac/0x780 kernel/time/timer.c:1405
expire_timers kernel/time/timer.c:1450 [inline]
__run_timers kernel/time/timer.c:1774 [inline]
__run_timers kernel/time/timer.c:1741 [inline]
run_timer_softirq+0x623/0x1600 kernel/time/timer.c:1787
__do_softirq+0x26c/0x9f7 kernel/softirq.c:292
invoke_softirq kernel/softirq.c:373 [inline]
irq_exit+0x192/0x1d0 kernel/softirq.c:413
exiting_irq arch/x86/include/asm/apic.h:546 [inline]
smp_apic_timer_interrupt+0x19e/0x600 arch/x86/kernel/apic/apic.c:1140
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829
</IRQ>
RIP: 0010:des_encrypt+0x157/0x9c0 lib/crypto/des.c:792
Code: 85 22 06 00 00 41 31 dc 41 8b 4d 04 44 89 e2 41 83 e4 3f 4a 8d 3c a5 60 72 72 88 81 e2 3f 3f 3f 3f 48 89 f8 48 c1 e8 03 31 d9 <0f> b6 34 28 48 89 f8 c1 c9 04 83 e0 07 83 c0 03 40 38 f0 7c 09 40
RSP: 0018:
ffffc90003b5f6c0 EFLAGS:
00000282 ORIG_RAX:
ffffffffffffff13
RAX:
1ffffffff10e4e55 RBX:
00000000d2f846d0 RCX:
00000000d2f846d0
RDX:
0000000012380612 RSI:
ffffffff839863ca RDI:
ffffffff887272a8
RBP:
dffffc0000000000 R08:
ffff888091d0a380 R09:
0000000000800081
R10:
0000000000000000 R11:
0000000000000000 R12:
0000000000000012
R13:
ffff8880a8ae8078 R14:
00000000c545c93e R15:
0000000000000006
cipher_crypt_one crypto/cipher.c:75 [inline]
crypto_cipher_encrypt_one+0x124/0x210 crypto/cipher.c:82
crypto_cbcmac_digest_update+0x1b5/0x250 crypto/ccm.c:830
crypto_shash_update+0xc4/0x120 crypto/shash.c:119
shash_ahash_update+0xa3/0x110 crypto/shash.c:246
crypto_ahash_update include/crypto/hash.h:547 [inline]
hash_sendmsg+0x518/0xad0 crypto/algif_hash.c:102
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:672
____sys_sendmsg+0x308/0x7e0 net/socket.c:2362
___sys_sendmsg+0x100/0x170 net/socket.c:2416
__sys_sendmmsg+0x195/0x480 net/socket.c:2506
__do_sys_sendmmsg net/socket.c:2535 [inline]
__se_sys_sendmmsg net/socket.c:2532 [inline]
__x64_sys_sendmmsg+0x99/0x100 net/socket.c:2532
do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
entry_SYSCALL_64_after_hwframe+0x49/0xb3
RIP: 0033:0x45c829
Code: 0d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:
00007f6d9528ec78 EFLAGS:
00000246 ORIG_RAX:
0000000000000133
RAX:
ffffffffffffffda RBX:
00000000004fc080 RCX:
000000000045c829
RDX:
0000000000000001 RSI:
0000000020002640 RDI:
0000000000000004
RBP:
000000000078bf00 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000246 R12:
00000000ffffffff
R13:
00000000000008d7 R14:
00000000004cb7aa R15:
00007f6d9528f6d4
Fixes: 4b15c7075352 ("net/sched: Make etf report drops on error_queue")
Fixes: 25db26a91364 ("net/sched: Introduce the ETF Qdisc")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 21 Apr 2020 14:47:24 +0000 (08:47 -0600)]
selftests: Fix suppress test in fib_tests.sh
fib_tests is spewing errors:
...
Cannot open network namespace "ns1": No such file or directory
Cannot open network namespace "ns1": No such file or directory
Cannot open network namespace "ns1": No such file or directory
Cannot open network namespace "ns1": No such file or directory
ping: connect: Network is unreachable
Cannot open network namespace "ns1": No such file or directory
Cannot open network namespace "ns1": No such file or directory
...
Each test entry in fib_tests is supposed to do its own setup and
cleanup. Right now the $IP commands in fib_suppress_test are
failing because there is no ns1. Add the setup/cleanup and logging
expected for each test.
Fixes: ca7a03c41753 ("ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule")
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 22 Apr 2020 19:50:41 +0000 (12:50 -0700)]
Merge branch 'net-dsa-b53-Various-ARL-fixes'
Florian Fainelli says:
====================
net: dsa: b53: Various ARL fixes
This patch series fixes a number of short comings in the existing b53
driver ARL management logic in particular:
- we were not looking up the {MAC,VID} tuples against their VID, despite
having VLANs enabled
- the MDB entries (multicast) would lose their validity as soon as a
single port in the vector would leave the entry
- the ARL was currently under utilized because we would always place new
entries in bin index #1, instead of using all possible bins available,
thus reducing the ARL effective size by 50% or 75% depending on the
switch generation
- it was possible to overwrite the ARL entries because no proper space
verification was done
This patch series addresses all of these issues.
Changes in v2:
- added a new patch to correctly flip invidual VLAN learning vs. shared
VLAN learning depending on the global VLAN state
- added Andrew's R-b tags for patches which did not change
- corrected some verbosity and minor issues in patch #4 to match caller
expectations, also avoid a variable length DECLARE_BITMAP() call
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 21 Apr 2020 03:26:55 +0000 (20:26 -0700)]
net: dsa: b53: b53_arl_rw_op() needs to select IVL or SVL
Flip the IVL_SVL_SELECT bit correctly based on the VLAN enable status,
the default is to perform Shared VLAN learning instead of Individual
learning.
Fixes: 1da6df85c6fb ("net: dsa: b53: Implement ARL add/del/dump operations")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 21 Apr 2020 03:26:54 +0000 (20:26 -0700)]
net: dsa: b53: Rework ARL bin logic
When asking the ARL to read a MAC address, we will get a number of bins
returned in a single read. Out of those bins, there can essentially be 3
states:
- all bins are full, we have no space left, and we can either replace an
existing address or return that full condition
- the MAC address was found, then we need to return its bin index and
modify that one, and only that one
- the MAC address was not found and we have a least one bin free, we use
that bin index location then
The code would unfortunately fail on all counts.
Fixes: 1da6df85c6fb ("net: dsa: b53: Implement ARL add/del/dump operations")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 21 Apr 2020 03:26:53 +0000 (20:26 -0700)]
net: dsa: b53: Fix ARL register definitions
The ARL {MAC,VID} tuple and the forward entry were off by 0x10 bytes,
which means that when we read/wrote from/to ARL bin index 0, we were
actually accessing the ARLA_RWCTRL register.
Fixes: 1da6df85c6fb ("net: dsa: b53: Implement ARL add/del/dump operations")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 21 Apr 2020 03:26:52 +0000 (20:26 -0700)]
net: dsa: b53: Fix valid setting for MDB entries
When support for the MDB entries was added, the valid bit was correctly
changed to be assigned depending on the remaining port bitmask, that is,
if there were no more ports added to the entry's port bitmask, the entry
now becomes invalid. There was another assignment a few lines below that
would override this which would invalidate entries even when there were
still multiple ports left in the MDB entry.
Fixes: 5d65b64a3d97 ("net: dsa: b53: Add support for MDB")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 21 Apr 2020 03:26:51 +0000 (20:26 -0700)]
net: dsa: b53: Lookup VID in ARL searches when VLAN is enabled
When VLAN is enabled, and an ARL search is issued, we also need to
compare the full {MAC,VID} tuple before returning a successful search
result.
Fixes: 1da6df85c6fb ("net: dsa: b53: Implement ARL add/del/dump operations")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 22 Apr 2020 19:32:11 +0000 (12:32 -0700)]
Merge branch 'vrf-looping'
David Ahern says:
====================
net: Fix looping with vrf, xfrms and qdisc on VRF
Trev reported that use of VRFs with xfrms is looping when a qdisc
is added to the VRF device. The combination of xfrm + qdisc is not
handled by the VRF driver which lost track that it has already
seen the packet.
The XFRM_TRANSFORMED flag is used by the netfilter code for a similar
purpose, so re-use for VRF. Patch 1 drops the #ifdef around setting
the flag in the xfrm output functions. Patch 2 adds a check to
the VRF driver for flag; if set the packet has already passed through
the VRF driver once and does not need to recirculated a second time.
This is a day 1 bug with VRFs; stable wise, I would only take this
back to 4.14. I have a set of test cases which I will submit to
net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Mon, 20 Apr 2020 23:13:52 +0000 (17:13 -0600)]
vrf: Check skb for XFRM_TRANSFORMED flag
To avoid a loop with qdiscs and xfrms, check if the skb has already gone
through the qdisc attached to the VRF device and then to the xfrm layer.
If so, no need for a second redirect.
Fixes: 193125dbd8eb ("net: Introduce VRF device driver")
Reported-by: Trev Larock <trev@larock.ca>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Mon, 20 Apr 2020 23:13:51 +0000 (17:13 -0600)]
xfrm: Always set XFRM_TRANSFORMED in xfrm{4,6}_output_finish
IPSKB_XFRM_TRANSFORMED and IP6SKB_XFRM_TRANSFORMED are skb flags set by
xfrm code to tell other skb handlers that the packet has been passed
through the xfrm output functions. Simplify the code and just always
set them rather than conditionally based on netfilter enabled thus
making the flag available for other users.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maciej Żenczykowski [Mon, 20 Apr 2020 18:25:07 +0000 (11:25 -0700)]
ipv6: ndisc: RFC-ietf-6man-ra-pref64-09 is now published as RFC8781
See:
https://www.rfc-editor.org/authors/rfc8781.txt
Cc: Erik Kline <ek@google.com>
Cc: Jen Linkova <furry@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Michael Haro <mharo@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Fixes: c24a77edc9a7 ("ipv6: ndisc: add support for 'PREF64' dns64 prefix identifier")
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuiko Oshino [Mon, 20 Apr 2020 15:51:41 +0000 (11:51 -0400)]
net: phy: microchip_t1: add lan87xx_phy_init to initialize the lan87xx phy.
lan87xx_phy_init() initializes the lan87xx phy hardware
including its TC10 Wake-up and Sleep features.
Fixes: 3e50d2da5850 ("Add driver for Microchip LAN87XX T1 PHYs")
Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
v0->v1:
- Add more details in the commit message and source comments.
- Update to the latest initialization sequences.
- Add access_ereg_modify_changed().
- Fix access_ereg() to access SMI bank correctly.
Signed-off-by: David S. Miller <davem@davemloft.net>
Voon Weifeng [Mon, 20 Apr 2020 15:42:52 +0000 (23:42 +0800)]
net: stmmac: Enable SERDES power up/down sequence
This patch is to enable Intel SERDES power up/down sequence. The SERDES
converts 8/10 bits data to SGMII signal. Below is an example of
HW configuration for SGMII mode. The SERDES is located in the PHY IF
in the diagram below.
<-----------------GBE Controller---------->|<--External PHY chip-->
+----------+ +----+ +---+ +----------+
| EQoS | <-GMII->| DW | < ------ > |PHY| <-SGMII-> | External |
| MAC | |xPCS| |IF | | PHY |
+----------+ +----+ +---+ +----------+
^ ^ ^ ^
| | | |
+---------------------MDIO-------------------------+
PHY IF configuration and status registers are accessible through
mdio address 0x15 which is defined as mdio_adhoc_addr. During D0,
The driver will need to power up PHY IF by changing the power state
to P0. Likewise, for D3, the driver sets PHY IF power state to P3.
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dejin Zheng [Mon, 20 Apr 2020 15:06:41 +0000 (23:06 +0800)]
net: broadcom: convert to devm_platform_ioremap_resource_byname()
Use the function devm_platform_ioremap_resource_byname() to simplify
source code which calls the functions platform_get_resource_byname()
and devm_ioremap_resource(). Remove also a few error messages which
became unnecessary with this software refactoring.
Suggested-by: Markus Elfring <Markus.Elfring@web.de>
Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Mon, 20 Apr 2020 13:29:40 +0000 (13:29 +0000)]
macvlan: fix null dereference in macvlan_device_event()
In the macvlan_device_event(), the list_first_entry_or_null() is used.
This function could return null pointer if there is no node.
But, the macvlan module doesn't check the null pointer.
So, null-ptr-deref would occur.
bond0
|
+----+-----+
| |
macvlan0 macvlan1
| |
dummy0 dummy1
The problem scenario.
If dummy1 is removed,
1. ->dellink() of dummy1 is called.
2. NETDEV_UNREGISTER of dummy1 notification is sent to macvlan module.
3. ->dellink() of macvlan1 is called.
4. NETDEV_UNREGISTER of macvlan1 notification is sent to bond module.
5. __bond_release_one() is called and it internally calls
dev_set_mac_address().
6. dev_set_mac_address() calls the ->ndo_set_mac_address() of macvlan1,
which is macvlan_set_mac_address().
7. macvlan_set_mac_address() calls the dev_set_mac_address() with dummy1.
8. NETDEV_CHANGEADDR of dummy1 is sent to macvlan module.
9. In the macvlan_device_event(), it calls list_first_entry_or_null().
At this point, dummy1 and macvlan1 were removed.
So, list_first_entry_or_null() will return NULL.
Test commands:
ip netns add nst
ip netns exec nst ip link add bond0 type bond
for i in {0..10}
do
ip netns exec nst ip link add dummy$i type dummy
ip netns exec nst ip link add macvlan$i link dummy$i \
type macvlan mode passthru
ip netns exec nst ip link set macvlan$i master bond0
done
ip netns del nst
Splat looks like:
[ 40.585687][ T146] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP DEI
[ 40.587249][ T146] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
[ 40.588342][ T146] CPU: 1 PID: 146 Comm: kworker/u8:2 Not tainted 5.7.0-rc1+ #532
[ 40.589299][ T146] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 40.590469][ T146] Workqueue: netns cleanup_net
[ 40.591045][ T146] RIP: 0010:macvlan_device_event+0x4e2/0x900 [macvlan]
[ 40.591905][ T146] Code: 00 00 00 00 00 fc ff df 80 3c 06 00 0f 85 45 02 00 00 48 89 da 48 b8 00 00 00 00 00 fc ff d2
[ 40.594126][ T146] RSP: 0018:
ffff88806116f4a0 EFLAGS:
00010246
[ 40.594783][ T146] RAX:
dffffc0000000000 RBX:
0000000000000000 RCX:
0000000000000000
[ 40.595653][ T146] RDX:
0000000000000000 RSI:
ffff88806547ddd8 RDI:
ffff8880540f1360
[ 40.596495][ T146] RBP:
ffff88804011a808 R08:
fffffbfff4fb8421 R09:
fffffbfff4fb8421
[ 40.597377][ T146] R10:
ffffffffa7dc2107 R11:
0000000000000000 R12:
0000000000000008
[ 40.598186][ T146] R13:
ffff88804011a000 R14:
ffff8880540f1000 R15:
1ffff1100c22de9a
[ 40.599012][ T146] FS:
0000000000000000(0000) GS:
ffff888067800000(0000) knlGS:
0000000000000000
[ 40.600004][ T146] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 40.600665][ T146] CR2:
00005572d3a807b8 CR3:
000000005fcf4003 CR4:
00000000000606e0
[ 40.601485][ T146] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 40.602461][ T146] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 40.603443][ T146] Call Trace:
[ 40.603871][ T146] ? nf_tables_dump_setelem+0xa0/0xa0 [nf_tables]
[ 40.604587][ T146] ? macvlan_uninit+0x100/0x100 [macvlan]
[ 40.605212][ T146] ? __module_text_address+0x13/0x140
[ 40.605842][ T146] notifier_call_chain+0x90/0x160
[ 40.606477][ T146] dev_set_mac_address+0x28e/0x3f0
[ 40.607117][ T146] ? netdev_notify_peers+0xc0/0xc0
[ 40.607762][ T146] ? __module_text_address+0x13/0x140
[ 40.608440][ T146] ? notifier_call_chain+0x90/0x160
[ 40.609097][ T146] ? dev_set_mac_address+0x1f0/0x3f0
[ 40.609758][ T146] dev_set_mac_address+0x1f0/0x3f0
[ 40.610402][ T146] ? __local_bh_enable_ip+0xe9/0x1b0
[ 40.611071][ T146] ? bond_hw_addr_flush+0x77/0x100 [bonding]
[ 40.611823][ T146] ? netdev_notify_peers+0xc0/0xc0
[ 40.612461][ T146] ? bond_hw_addr_flush+0x77/0x100 [bonding]
[ 40.613213][ T146] ? bond_hw_addr_flush+0x77/0x100 [bonding]
[ 40.613963][ T146] ? __local_bh_enable_ip+0xe9/0x1b0
[ 40.614631][ T146] ? bond_time_in_interval.isra.31+0x90/0x90 [bonding]
[ 40.615484][ T146] ? __bond_release_one+0x9f0/0x12c0 [bonding]
[ 40.616230][ T146] __bond_release_one+0x9f0/0x12c0 [bonding]
[ 40.616949][ T146] ? bond_enslave+0x47c0/0x47c0 [bonding]
[ 40.617642][ T146] ? lock_downgrade+0x730/0x730
[ 40.618218][ T146] ? check_flags.part.42+0x450/0x450
[ 40.618850][ T146] ? __mutex_unlock_slowpath+0xd0/0x670
[ 40.619519][ T146] ? trace_hardirqs_on+0x30/0x180
[ 40.620117][ T146] ? wait_for_completion+0x250/0x250
[ 40.620754][ T146] bond_netdev_event+0x822/0x970 [bonding]
[ 40.621460][ T146] ? __module_text_address+0x13/0x140
[ 40.622097][ T146] notifier_call_chain+0x90/0x160
[ 40.622806][ T146] rollback_registered_many+0x660/0xcf0
[ 40.623522][ T146] ? netif_set_real_num_tx_queues+0x780/0x780
[ 40.624290][ T146] ? notifier_call_chain+0x90/0x160
[ 40.624957][ T146] ? netdev_upper_dev_unlink+0x114/0x180
[ 40.625686][ T146] ? __netdev_adjacent_dev_unlink_neighbour+0x30/0x30
[ 40.626421][ T146] ? mutex_is_locked+0x13/0x50
[ 40.627016][ T146] ? unregister_netdevice_queue+0xf2/0x240
[ 40.627663][ T146] unregister_netdevice_many.part.134+0x13/0x1b0
[ 40.628362][ T146] default_device_exit_batch+0x2d9/0x390
[ 40.628987][ T146] ? unregister_netdevice_many+0x40/0x40
[ 40.629615][ T146] ? dev_change_net_namespace+0xcb0/0xcb0
[ 40.630279][ T146] ? prepare_to_wait_exclusive+0x2e0/0x2e0
[ 40.630943][ T146] ? ops_exit_list.isra.9+0x97/0x140
[ 40.631554][ T146] cleanup_net+0x441/0x890
[ ... ]
Fixes: e289fd28176b ("macvlan: fix the problem when mac address changes for passthru mode")
Reported-by: syzbot+5035b1f9dc7ea4558d5a@syzkaller.appspotmail.com
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Yan [Mon, 20 Apr 2020 12:35:06 +0000 (20:35 +0800)]
e1000: remove unneeded conversion to bool
The '==' expression itself is bool, no need to convert it to bool again.
This fixes the following coccicheck warning:
drivers/net/ethernet/intel/e1000/e1000_main.c:1479:44-49: WARNING:
conversion to bool not needed here
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Yan [Mon, 20 Apr 2020 12:34:48 +0000 (20:34 +0800)]
i40e: Remove unneeded conversion to bool
The '==' expression itself is bool, no need to convert it to bool again.
This fixes the following coccicheck warning:
drivers/net/ethernet/intel/i40e/i40e_main.c:1614:52-57: WARNING:
conversion to bool not needed here
drivers/net/ethernet/intel/i40e/i40e_main.c:11439:52-57: WARNING:
conversion to bool not needed here
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Yan [Mon, 20 Apr 2020 12:34:31 +0000 (20:34 +0800)]
ptp: Remove unneeded conversion to bool
The '==' expression itself is bool, no need to convert it to bool again.
This fixes the following coccicheck warning:
drivers/ptp/ptp_ines.c:403:55-60: WARNING: conversion to bool not
needed here
drivers/ptp/ptp_ines.c:404:55-60: WARNING: conversion to bool not
needed here
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Slaby [Mon, 20 Apr 2020 07:04:24 +0000 (09:04 +0200)]
cgroup, netclassid: remove double cond_resched
Commit
018d26fcd12a ("cgroup, netclassid: periodically release file_lock
on classid") added a second cond_resched to write_classid indirectly by
update_classid_task. Remove the one in write_classid.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Dmitry Yakunin <zeil@yandex-team.ru>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 21 Apr 2020 18:50:31 +0000 (11:50 -0700)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) flow_block_cb memleak in nf_flow_table_offload_del_cb(), from Roi Dayan.
2) Fix error path handling in nf_nat_inet_register_fn(), from Hillf Danton.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 20 Apr 2020 23:17:48 +0000 (16:17 -0700)]
Merge tag 'mlx5-fixes-2020-04-20' of git://git./linux/kernel/git/saeed/linux
mlx5-fixes-2020-04-20
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhu Yanjun [Wed, 8 Apr 2020 06:51:52 +0000 (14:51 +0800)]
net/mlx5e: Get the latest values from counters in switchdev mode
In the switchdev mode, when running "cat
/sys/class/net/NIC/statistics/tx_packets", the ppcnt register is
accessed to get the latest values. But currently this command can
not get the correct values from ppcnt.
From firmware manual, before getting the 802_3 counters, the 802_3
data layout should be set to the ppcnt register.
When the command "cat /sys/class/net/NIC/statistics/tx_packets" is
run, before updating 802_3 data layout with ppcnt register, the
monitor counters are tested. The test result will decide the
802_3 data layout is updated or not.
Actually the monitor counters do not support to monitor rx/tx
stats of 802_3 in switchdev mode. So the rx/tx counters change
will not trigger monitor counters. So the 802_3 data layout will
not be updated in ppcnt register. Finally this command can not get
the latest values from ppcnt register with 802_3 data layout.
Fixes: 5c7e8bbb0257 ("net/mlx5e: Use monitor counters for update stats")
Signed-off-by: Zhu Yanjun <yanjunz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Saeed Mahameed [Thu, 16 Apr 2020 22:04:02 +0000 (15:04 -0700)]
net/mlx5: Kconfig: convert imply usage to weak dependency
MLX5_CORE uses the 'imply' keyword to depend on VXLAN, PTP_1588_CLOCK,
MLXFW and PCI_HYPERV_INTERFACE.
This was useful to force vxlan, ptp, etc.. to be reachable to mlx5
regardless of their config states.
Due to the changes in the cited commit below, the semantics of 'imply'
was changed to not force any restriction on the implied config.
As a result of this change, the compilation of MLX5_CORE=y and VXLAN=m
would result in undefined references, as VXLAN now would stay as 'm'.
To fix this we change MLX5_CORE to have a weak dependency on
these modules/configs and make sure they are reachable, by adding:
depend on symbol || !symbol.
For example: VXLAN=m MLX5_CORE=y, this will force MLX5_CORE to m
Fixes: def2fbffe62c ("kconfig: allow symbols implied by y to become m")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Maxim Mikityanskiy [Tue, 11 Feb 2020 14:02:35 +0000 (16:02 +0200)]
net/mlx5e: Don't trigger IRQ multiple times on XSK wakeup to avoid WQ overruns
XSK wakeup function triggers NAPI by posting a NOP WQE to a special XSK
ICOSQ. When the application floods the driver with wakeup requests by
calling sendto() in a certain pattern that ends up in mlx5e_trigger_irq,
the XSK ICOSQ may overflow.
Multiple NOPs are not required and won't accelerate the process, so
avoid posting a second NOP if there is one already on the way. This way
we also avoid increasing the queue size (which might not help anyway).
Fixes: db05815b36cb ("net/mlx5e: Add XSK zero-copy support")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Paul Blakey [Mon, 6 Apr 2020 12:47:52 +0000 (15:47 +0300)]
net/mlx5: CT: Change idr to xarray to protect parallel tuple id allocation
After allowing parallel tuple insertion, we get the following trace:
[ 5505.142249] ------------[ cut here ]------------
[ 5505.148155] WARNING: CPU: 21 PID: 13313 at lib/radix-tree.c:581 delete_node+0x16c/0x180
[ 5505.295553] CPU: 21 PID: 13313 Comm: kworker/u50:22 Tainted: G OE 5.6.0+ #78
[ 5505.304824] Hardware name: Supermicro Super Server/X10DRT-P, BIOS 2.0b 03/30/2017
[ 5505.313740] Workqueue: nf_flow_table_offload flow_offload_work_handler [nf_flow_table]
[ 5505.323257] RIP: 0010:delete_node+0x16c/0x180
[ 5505.349862] RSP: 0018:
ffffb19184eb7b30 EFLAGS:
00010282
[ 5505.356785] RAX:
0000000000000000 RBX:
ffff904ac95b86d8 RCX:
ffff904b6f938838
[ 5505.365190] RDX:
0000000000000000 RSI:
ffff904ac954b908 RDI:
ffff904ac954b920
[ 5505.373628] RBP:
ffff904b4ac13060 R08:
0000000000000001 R09:
0000000000000000
[ 5505.382155] R10:
0000000000000000 R11:
0000000000000040 R12:
0000000000000000
[ 5505.390527] R13:
ffffb19184eb7bfc R14:
ffff904b6bef5800 R15:
ffff90482c1203c0
[ 5505.399246] FS:
0000000000000000(0000) GS:
ffff904c2fc80000(0000) knlGS:
0000000000000000
[ 5505.408621] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 5505.415739] CR2:
00007f5d27006010 CR3:
0000000058c10006 CR4:
00000000001626e0
[ 5505.424547] Call Trace:
[ 5505.428429] idr_alloc_u32+0x7b/0xc0
[ 5505.433803] mlx5_tc_ct_entry_add_rule+0xbf/0x950 [mlx5_core]
[ 5505.441354] ? mlx5_fc_create+0x23c/0x370 [mlx5_core]
[ 5505.448225] mlx5_tc_ct_block_flow_offload+0x874/0x10b0 [mlx5_core]
[ 5505.456278] ? mlx5_tc_ct_block_flow_offload+0x63d/0x10b0 [mlx5_core]
[ 5505.464532] nf_flow_offload_tuple.isra.21+0xc5/0x140 [nf_flow_table]
[ 5505.472286] ? __kmalloc+0x217/0x2f0
[ 5505.477093] ? flow_rule_alloc+0x1c/0x30
[ 5505.482117] flow_offload_work_handler+0x1d0/0x290 [nf_flow_table]
[ 5505.489674] ? process_one_work+0x17c/0x580
[ 5505.494922] process_one_work+0x202/0x580
[ 5505.500082] ? process_one_work+0x17c/0x580
[ 5505.505696] worker_thread+0x4c/0x3f0
[ 5505.510458] kthread+0x103/0x140
[ 5505.514989] ? process_one_work+0x580/0x580
[ 5505.520616] ? kthread_bind+0x10/0x10
[ 5505.525837] ret_from_fork+0x3a/0x50
[ 5505.570841] ---[ end trace
07995de9c56d6831 ]---
This happens from parallel deletes/adds to idr, as idr isn't protected.
Fix that by using xarray as the tuple_ids allocator instead of idr.
Fixes: 7da182a998d6 ("netfilter: flowtable: Use work entry per offload command")
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Niklas Schnelle [Thu, 9 Apr 2020 07:46:20 +0000 (09:46 +0200)]
net/mlx5: Fix failing fw tracer allocation on s390
On s390 FORCE_MAX_ZONEORDER is 9 instead of 11, thus a larger kzalloc()
allocation as done for the firmware tracer will always fail.
Looking at mlx5_fw_tracer_save_trace(), it is actually the driver itself
that copies the debug data into the trace array and there is no need for
the allocation to be contiguous in physical memory. We can therefor use
kvzalloc() instead of kzalloc() and get rid of the large contiguous
allcoation.
Fixes: f53aaa31cce7 ("net/mlx5: FW tracer, implement tracer logic")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Taehee Yoo [Mon, 20 Apr 2020 15:01:33 +0000 (15:01 +0000)]
team: fix hang in team_mode_get()
When team mode is changed or set, the team_mode_get() is called to check
whether the mode module is inserted or not. If the mode module is not
inserted, it calls the request_module().
In the request_module(), it creates a child process, which is
the "modprobe" process and waits for the done of the child process.
At this point, the following locks were used.
down_read(&cb_lock()); by genl_rcv()
genl_lock(); by genl_rcv_msc()
rtnl_lock(); by team_nl_cmd_options_set()
mutex_lock(&team->lock); by team_nl_team_get()
Concurrently, the team module could be removed by rmmod or "modprobe -r"
The __exit function of team module is team_module_exit(), which calls
team_nl_fini() and it tries to acquire following locks.
down_write(&cb_lock);
genl_lock();
Because of the genl_lock() and cb_lock, this process can't be finished
earlier than request_module() routine.
The problem secenario.
CPU0 CPU1
team_mode_get
request_module()
modprobe -r team_mode_roundrobin
team <--(B)
modprobe team <--(A)
team_mode_roundrobin
By request_module(), the "modprobe team_mode_roundrobin" command
will be executed. At this point, the modprobe process will decide
that the team module should be inserted before team_mode_roundrobin.
Because the team module is being removed.
By the module infrastructure, the same module insert/remove operations
can't be executed concurrently.
So, (A) waits for (B) but (B) also waits for (A) because of locks.
So that the hang occurs at this point.
Test commands:
while :
do
teamd -d &
killall teamd &
modprobe -rv team_mode_roundrobin &
done
The approach of this patch is to hold the reference count of the team
module if the team module is compiled as a module. If the reference count
of the team module is not zero while request_module() is being called,
the team module will not be removed at that moment.
So that the above scenario could not occur.
Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 20 Apr 2020 19:59:33 +0000 (12:59 -0700)]
Merge branch 'mptcp-fix-races-on-accept'
Paolo Abeni says:
====================
mptcp: fix races on accept()
This series includes some fixes for accept() races which may cause inconsistent
MPTCP socket status and oops. Please see the individual patches for the
technical details.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Mon, 20 Apr 2020 14:25:06 +0000 (16:25 +0200)]
mptcp: drop req socket remote_key* fields
We don't need them, as we can use the current ingress opt
data instead. Setting them in syn_recv_sock() may causes
inconsistent mptcp socket status, as per previous commit.
Fixes: cc7972ea1932 ("mptcp: parse and emit MP_CAPABLE option according to v1 spec")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Mon, 20 Apr 2020 14:25:05 +0000 (16:25 +0200)]
mptcp: avoid flipping mp_capable field in syn_recv_sock()
If multiple CPUs races on the same req_sock in syn_recv_sock(),
flipping such field can cause inconsistent child socket status.
When racing, the CPU losing the req ownership may still change
the mptcp request socket mp_capable flag while the CPU owning
the request is cloning the socket, leaving the child socket with
'is_mptcp' set but no 'mp_capable' flag.
Such socket will stay with 'conn' field cleared, heading to oops
in later mptcp callback.
Address the issue tracking the fallback status in a local variable.
Fixes: 58b09919626b ("mptcp: create msk early")
Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Mon, 20 Apr 2020 14:25:04 +0000 (16:25 +0200)]
mptcp: handle mptcp listener destruction via rcu
Following splat can occur during self test:
BUG: KASAN: use-after-free in subflow_data_ready+0x156/0x160
Read of size 8 at addr
ffff888100c35c28 by task mptcp_connect/4808
subflow_data_ready+0x156/0x160
tcp_child_process+0x6a3/0xb30
tcp_v4_rcv+0x2231/0x3730
ip_protocol_deliver_rcu+0x5c/0x860
ip_local_deliver_finish+0x220/0x360
ip_local_deliver+0x1c8/0x4e0
ip_rcv_finish+0x1da/0x2f0
ip_rcv+0xd0/0x3c0
__netif_receive_skb_one_core+0xf5/0x160
__netif_receive_skb+0x27/0x1c0
process_backlog+0x21e/0x780
net_rx_action+0x35f/0xe90
do_softirq+0x4c/0x50
[..]
This occurs when accessing subflow_ctx->conn.
Problem is that tcp_child_process() calls listen sockets'
sk_data_ready() notification, but it doesn't hold the listener
lock. Another cpu calling close() on the listener will then cause
transition of refcount to 0.
Fixes: 58b09919626bf ("mptcp: create msk early")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Lakkireddy [Mon, 20 Apr 2020 09:56:54 +0000 (15:26 +0530)]
cxgb4: fix large delays in PTP synchronization
Fetching PTP sync information from mailbox is slow and can take
up to 10 milliseconds. Reduce this unnecessary delay by directly
reading the information from the corresponding registers.
Fixes: 9c33e4208bce ("cxgb4: Add PTP Hardware Clock (PHC) support")
Signed-off-by: Manoj Malviya <manojmalviya@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Zyngier [Sat, 18 Apr 2020 18:14:57 +0000 (19:14 +0100)]
net: stmmac: dwmac-meson8b: Add missing boundary to RGMII TX clock array
Running with KASAN on a VIM3L systems leads to the following splat
when probing the Ethernet device:
==================================================================
BUG: KASAN: global-out-of-bounds in _get_maxdiv+0x74/0xd8
Read of size 4 at addr
ffffa000090615f4 by task systemd-udevd/139
CPU: 1 PID: 139 Comm: systemd-udevd Tainted: G E
5.7.0-rc1-00101-g8624b7577b9c #781
Hardware name: amlogic w400/w400, BIOS 2020.01-rc5 03/12/2020
Call trace:
dump_backtrace+0x0/0x2a0
show_stack+0x20/0x30
dump_stack+0xec/0x148
print_address_description.isra.12+0x70/0x35c
__kasan_report+0xfc/0x1d4
kasan_report+0x4c/0x68
__asan_load4+0x9c/0xd8
_get_maxdiv+0x74/0xd8
clk_divider_bestdiv+0x74/0x5e0
clk_divider_round_rate+0x80/0x1a8
clk_core_determine_round_nolock.part.9+0x9c/0xd0
clk_core_round_rate_nolock+0xf0/0x108
clk_hw_round_rate+0xac/0xf0
clk_factor_round_rate+0xb8/0xd0
clk_core_determine_round_nolock.part.9+0x9c/0xd0
clk_core_round_rate_nolock+0xf0/0x108
clk_core_round_rate_nolock+0xbc/0x108
clk_core_set_rate_nolock+0xc4/0x2e8
clk_set_rate+0x58/0xe0
meson8b_dwmac_probe+0x588/0x72c [dwmac_meson8b]
platform_drv_probe+0x78/0xd8
really_probe+0x158/0x610
driver_probe_device+0x140/0x1b0
device_driver_attach+0xa4/0xb0
__driver_attach+0xcc/0x1c8
bus_for_each_dev+0xf4/0x168
driver_attach+0x3c/0x50
bus_add_driver+0x238/0x2e8
driver_register+0xc8/0x1e8
__platform_driver_register+0x88/0x98
meson8b_dwmac_driver_init+0x28/0x1000 [dwmac_meson8b]
do_one_initcall+0xa8/0x328
do_init_module+0xe8/0x368
load_module+0x3300/0x36b0
__do_sys_finit_module+0x120/0x1a8
__arm64_sys_finit_module+0x4c/0x60
el0_svc_common.constprop.2+0xe4/0x268
do_el0_svc+0x98/0xa8
el0_svc+0x24/0x68
el0_sync_handler+0x12c/0x318
el0_sync+0x158/0x180
The buggy address belongs to the variable:
div_table.63646+0x34/0xfffffffffffffa40 [dwmac_meson8b]
Memory state around the buggy address:
ffffa00009061480: fa fa fa fa 00 00 00 01 fa fa fa fa 00 00 00 00
ffffa00009061500: 05 fa fa fa fa fa fa fa 00 04 fa fa fa fa fa fa
>
ffffa00009061580: 00 03 fa fa fa fa fa fa 00 00 00 00 00 00 fa fa
^
ffffa00009061600: fa fa fa fa 00 01 fa fa fa fa fa fa 01 fa fa fa
ffffa00009061680: fa fa fa fa 00 01 fa fa fa fa fa fa 04 fa fa fa
==================================================================
Digging into this indeed shows that the clock divider array is
lacking a final fence, and that the clock subsystems goes in the
weeds. Oh well.
Let's add the empty structure that indicates the end of the array.
Fixes: bd6f48546b9c ("net: stmmac: dwmac-meson8b: Fix the RGMII TX delay on Meson8b/8m2 SoCs")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Haxby [Sat, 18 Apr 2020 15:30:49 +0000 (16:30 +0100)]
ipv6: fix restrict IPV6_ADDRFORM operation
Commit
b6f6118901d1 ("ipv6: restrict IPV6_ADDRFORM operation") fixed a
problem found by syzbot an unfortunate logic error meant that it
also broke IPV6_ADDRFORM.
Rearrange the checks so that the earlier test is just one of the series
of checks made before moving the socket from IPv6 to IPv4.
Fixes: b6f6118901d1 ("ipv6: restrict IPV6_ADDRFORM operation")
Signed-off-by: John Haxby <john.haxby@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Tang Bin [Sat, 18 Apr 2020 08:51:06 +0000 (16:51 +0800)]
net: systemport: Omit superfluous error message in bcm_sysport_probe()
In the function bcm_sysport_probe(), when get irq failed, the function
platform_get_irq() logs an error message, so remove redundant message
here.
Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tonghao Zhang [Thu, 16 Apr 2020 18:57:31 +0000 (02:57 +0800)]
net: openvswitch: ovs_ct_exit to be done under ovs_lock
syzbot wrote:
| =============================
| WARNING: suspicious RCU usage
| 5.7.0-rc1+ #45 Not tainted
| -----------------------------
| net/openvswitch/conntrack.c:1898 RCU-list traversed in non-reader section!!
|
| other info that might help us debug this:
| rcu_scheduler_active = 2, debug_locks = 1
| ...
|
| stack backtrace:
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
| Workqueue: netns cleanup_net
| Call Trace:
| ...
| ovs_ct_exit
| ovs_exit_net
| ops_exit_list.isra.7
| cleanup_net
| process_one_work
| worker_thread
To avoid that warning, invoke the ovs_ct_exit under ovs_lock and add
lockdep_ovsl_is_held as optional lockdep expression.
Link: https://lore.kernel.org/lkml/000000000000e642a905a0cbee6e@google.com
Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit")
Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: Yi-Hung Wei <yihung.wei@gmail.com>
Reported-by: syzbot+7ef50afd3a211f879112@syzkaller.appspotmail.com
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hillf Danton [Sat, 18 Apr 2020 08:28:32 +0000 (16:28 +0800)]
netfilter: nat: fix error handling upon registering inet hook
A case of warning was reported by syzbot.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 19934 at net/netfilter/nf_nat_core.c:1106
nf_nat_unregister_fn+0x532/0x5c0 net/netfilter/nf_nat_core.c:1106
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 19934 Comm: syz-executor.5 Not tainted 5.6.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x188/0x20d lib/dump_stack.c:118
panic+0x2e3/0x75c kernel/panic.c:221
__warn.cold+0x2f/0x35 kernel/panic.c:582
report_bug+0x27b/0x2f0 lib/bug.c:195
fixup_bug arch/x86/kernel/traps.c:175 [inline]
fixup_bug arch/x86/kernel/traps.c:170 [inline]
do_error_trap+0x12b/0x220 arch/x86/kernel/traps.c:267
do_invalid_op+0x32/0x40 arch/x86/kernel/traps.c:286
invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
RIP: 0010:nf_nat_unregister_fn+0x532/0x5c0 net/netfilter/nf_nat_core.c:1106
Code: ff df 48 c1 ea 03 80 3c 02 00 75 75 48 8b 44 24 10 4c 89 ef 48 c7 00 00 00 00 00 e8 e8 f8 53 fb e9 4d fe ff ff e8 ee 9c 16 fb <0f> 0b e9 41 fe ff ff e8 e2 45 54 fb e9 b5 fd ff ff 48 8b 7c 24 20
RSP: 0018:
ffffc90005487208 EFLAGS:
00010246
RAX:
0000000000040000 RBX:
0000000000000004 RCX:
ffffc9001444a000
RDX:
0000000000040000 RSI:
ffffffff865c94a2 RDI:
0000000000000005
RBP:
ffff88808b5cf000 R08:
ffff8880a2620140 R09:
fffffbfff14bcd79
R10:
ffffc90005487208 R11:
fffffbfff14bcd78 R12:
0000000000000000
R13:
0000000000000001 R14:
0000000000000001 R15:
0000000000000000
nf_nat_ipv6_unregister_fn net/netfilter/nf_nat_proto.c:1017 [inline]
nf_nat_inet_register_fn net/netfilter/nf_nat_proto.c:1038 [inline]
nf_nat_inet_register_fn+0xfc/0x140 net/netfilter/nf_nat_proto.c:1023
nf_tables_register_hook net/netfilter/nf_tables_api.c:224 [inline]
nf_tables_addchain.constprop.0+0x82e/0x13c0 net/netfilter/nf_tables_api.c:1981
nf_tables_newchain+0xf68/0x16a0 net/netfilter/nf_tables_api.c:2235
nfnetlink_rcv_batch+0x83a/0x1610 net/netfilter/nfnetlink.c:433
nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:543 [inline]
nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:561
netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329
netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:672
____sys_sendmsg+0x6bf/0x7e0 net/socket.c:2362
___sys_sendmsg+0x100/0x170 net/socket.c:2416
__sys_sendmsg+0xec/0x1b0 net/socket.c:2449
do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
entry_SYSCALL_64_after_hwframe+0x49/0xb3
and to quiesce it, unregister NFPROTO_IPV6 hook instead of NFPROTO_INET
in case of failing to register NFPROTO_IPV4 hook.
Reported-by: syzbot <syzbot+33e06702fd6cffc24c40@syzkaller.appspotmail.com>
Fixes: d164385ec572 ("netfilter: nat: add inet family nat support")
Cc: Florian Westphal <fw@strlen.de>
Cc: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Eric Dumazet [Fri, 17 Apr 2020 14:10:23 +0000 (07:10 -0700)]
tcp: cache line align MAX_TCP_HEADER
TCP stack is dumb in how it cooks its output packets.
Depending on MAX_HEADER value, we might chose a bad ending point
for the headers.
If we align the end of TCP headers to cache line boundary, we
make sure to always use the smallest number of cache lines,
which always help.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 18 Apr 2020 22:43:20 +0000 (15:43 -0700)]
Merge branch 'mptcp-fixes'
Florian Westphal says:
====================
mptcp: fix 'attempt to release socket in state...' splats
These two patches fix error handling corner-cases where
inet_sock_destruct gets called for a mptcp_sk that is not in TCP_CLOSE
state. This results in unwanted error printks from the network stack.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Fri, 17 Apr 2020 07:28:23 +0000 (09:28 +0200)]
mptcp: fix 'Attempt to release TCP socket in state' warnings
We need to set sk_state to CLOSED, else we will get following:
IPv4: Attempt to release TCP socket in state 3
00000000b95f109e
IPv4: Attempt to release TCP socket in state 10
00000000b95f109e
First one is from inet_sock_destruct(), second one from
mptcp_sk_clone failure handling. Setting sk_state to CLOSED isn't
enough, we also need to orphan sk so it has DEAD flag set.
Otherwise, a very similar warning is printed from inet_sock_destruct().
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Fri, 17 Apr 2020 07:28:22 +0000 (09:28 +0200)]
mptcp: fix splat when incoming connection is never accepted before exit/close
Following snippet (replicated from syzkaller reproducer) generates
warning: "IPv4: Attempt to release TCP socket in state 1".
int main(void) {
struct sockaddr_in sin1 = { .sin_family = 2, .sin_port = 0x4e20,
.sin_addr.s_addr = 0x010000e0, };
struct sockaddr_in sin2 = { .sin_family = 2,
.sin_addr.s_addr = 0x0100007f, };
struct sockaddr_in sin3 = { .sin_family = 2, .sin_port = 0x4e20,
.sin_addr.s_addr = 0x0100007f, };
int r0 = socket(0x2, 0x1, 0x106);
int r1 = socket(0x2, 0x1, 0x106);
bind(r1, (void *)&sin1, sizeof(sin1));
connect(r1, (void *)&sin2, sizeof(sin2));
listen(r1, 3);
return connect(r0, (void *)&sin3, 0x4d);
}
Reason is that the newly generated mptcp socket is closed via the ulp
release of the tcp listener socket when its accept backlog gets purged.
To fix this, delay setting the ESTABLISHED state until after userspace
calls accept and via mptcp specific destructor.
Fixes: 58b09919626bf ("mptcp: create msk early")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/9
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 15 Apr 2020 16:46:52 +0000 (09:46 -0700)]
net/mlx4_en: avoid indirect call in TX completion
Commit
9ecc2d86171a ("net/mlx4_en: add xdp forwarding and data write support")
brought another indirect call in fast path.
Use INDIRECT_CALL_2() helper to avoid the cost of the indirect call
when/if CONFIG_RETPOLINE=y
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Willem de Bruijn <willemb@google.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Wed, 15 Apr 2020 13:06:53 +0000 (09:06 -0400)]
ipv6: rpl: fix full address compression
This patch makes it impossible that cmpri or cmpre values are set to the
value 16 which is not possible, because these are 4 bit values. We
currently run in an overflow when assigning the value 16 to it.
According to the standard a value of 16 can be interpreted as a full
elided address which isn't possible to set as compression value. A reason
why this cannot be set is that the current ipv6 header destination address
should never show up inside the segments of the rpl header. In this case we
run in a overflow and the address will have no compression at all. Means
cmpri or compre is set to 0.
As we handle cmpri and cmpre sometimes as unsigned char or 4 bit value
inside the rpl header the current behaviour ends in an invalid header
format. This patch simple use the best compression method if we ever run
into the case that the destination address is showed up inside the rpl
segments. We avoid the overflow handling and the rpl header is still valid,
even when we have the destination address inside the rpl segments.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julien Beraud [Wed, 15 Apr 2020 12:24:32 +0000 (14:24 +0200)]
net: stmmac: Fix sub-second increment
In fine adjustement mode, which is the current default, the sub-second
increment register is the number of nanoseconds that will be added to
the clock when the accumulator overflows. At each clock cycle, the
value of the addend register is added to the accumulator.
Currently, we use 20ns = 1e09ns / 50MHz as this value whatever the
frequency of the ptp clock actually is.
The adjustment is then done on the addend register, only incrementing
every X clock cycles X being the ratio between 50MHz and ptp_clock_rate
(addend = 2^32 * 50MHz/ptp_clock_rate).
This causes the following issues :
- In case the frequency of the ptp clock is inferior or equal to 50MHz,
the addend value calculation will overflow and the default
addend value will be set to 0, causing the clock to not work at
all. (For instance, for ptp_clock_rate = 50MHz, addend = 2^32).
- The resolution of the timestamping clock is limited to 20ns while it
is not needed, thus limiting the accuracy of the timestamping to
20ns.
Fix this by setting sub-second increment to 2e09ns / ptp_clock_rate.
It will allow to reach the minimum possible frequency for
ptp_clk_ref, which is 5MHz for GMII 1000Mps Full-Duplex by setting the
sub-second-increment to a higher value. For instance, for 25MHz, it
gives ssinc = 80ns and default_addend = 2^31.
It will also allow to use a lower value for sub-second-increment, thus
improving the timestamping accuracy with frequencies higher than
100MHz, for instance, for 200MHz, ssinc = 10ns and default_addend =
2^31.
v1->v2:
- Remove modifications to the calculation of default addend, which broke
compatibility with clock frequencies for which
2000000000 / ptp_clk_freq
is not an integer.
- Modify description according to discussions.
Signed-off-by: Julien Beraud <julien.beraud@orolia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julien Beraud [Wed, 15 Apr 2020 12:24:31 +0000 (14:24 +0200)]
net: stmmac: fix enabling socfpga's ptp_ref_clock
There are 2 registers to write to enable a ptp ref clock coming from the
fpga.
One that enables the usage of the clock from the fpga for emac0 and emac1
as a ptp ref clock, and the other to allow signals from the fpga to reach
emac0 and emac1.
Currently, if the dwmac-socfpga has phymode set to PHY_INTERFACE_MODE_MII,
PHY_INTERFACE_MODE_GMII, or PHY_INTERFACE_MODE_SGMII, both registers will
be written and the ptp ref clock will be set as coming from the fpga.
Separate the 2 register writes to only enable signals from the fpga to
reach emac0 or emac1 when ptp ref clock is not coming from the fpga.
Signed-off-by: Julien Beraud <julien.beraud@orolia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xiyu Yang [Wed, 15 Apr 2020 08:41:20 +0000 (16:41 +0800)]
wimax/i2400m: Fix potential urb refcnt leak
i2400mu_bus_bm_wait_for_ack() invokes usb_get_urb(), which increases the
refcount of the "notif_urb".
When i2400mu_bus_bm_wait_for_ack() returns, local variable "notif_urb"
becomes invalid, so the refcount should be decreased to keep refcount
balanced.
The issue happens in all paths of i2400mu_bus_bm_wait_for_ack(), which
forget to decrease the refcnt increased by usb_get_urb(), causing a
refcnt leak.
Fix this issue by calling usb_put_urb() before the
i2400mu_bus_bm_wait_for_ack() returns.
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xiyu Yang [Wed, 15 Apr 2020 08:40:28 +0000 (16:40 +0800)]
tipc: Fix potential tipc_node refcnt leak in tipc_rcv
tipc_rcv() invokes tipc_node_find() twice, which returns a reference of
the specified tipc_node object to "n" with increased refcnt.
When tipc_rcv() returns or a new object is assigned to "n", the original
local reference of "n" becomes invalid, so the refcount should be
decreased to keep refcount balanced.
The issue happens in some paths of tipc_rcv(), which forget to decrease
the refcnt increased by tipc_node_find() and will cause a refcnt leak.
Fix this issue by calling tipc_node_put() before the original object
pointed by "n" becomes invalid.
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xiyu Yang [Wed, 15 Apr 2020 08:39:56 +0000 (16:39 +0800)]
tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv
tipc_crypto_rcv() invokes tipc_aead_get(), which returns a reference of
the tipc_aead object to "aead" with increased refcnt.
When tipc_crypto_rcv() returns, the original local reference of "aead"
becomes invalid, so the refcount should be decreased to keep refcount
balanced.
The issue happens in one error path of tipc_crypto_rcv(). When TIPC
message decryption status is EINPROGRESS or EBUSY, the function forgets
to decrease the refcnt increased by tipc_aead_get() and causes a refcnt
leak.
Fix this issue by calling tipc_aead_put() on the error path when TIPC
message decryption status is EINPROGRESS or EBUSY.
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xiyu Yang [Wed, 15 Apr 2020 08:36:19 +0000 (16:36 +0800)]
net: netrom: Fix potential nr_neigh refcnt leak in nr_add_node
nr_add_node() invokes nr_neigh_get_dev(), which returns a local
reference of the nr_neigh object to "nr_neigh" with increased refcnt.
When nr_add_node() returns, "nr_neigh" becomes invalid, so the refcount
should be decreased to keep refcount balanced.
The issue happens in one normal path of nr_add_node(), which forgets to
decrease the refcnt increased by nr_neigh_get_dev() and causes a refcnt
leak. It should decrease the refcnt before the function returns like
other normal paths do.
Fix this issue by calling nr_neigh_put() before the nr_add_node()
returns.
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 16 Apr 2020 22:00:57 +0000 (15:00 -0700)]
Merge tag 'tag-chrome-platform-fixes-for-v5.7-rc2' of git://git./linux/kernel/git/chrome-platform/linux
Pull chrome-platform fixes from Benson Leung:
"Two small fixes for cros_ec_sensorhub_ring.c, addressing issues
introduced in the cros_ec_sensorhub FIFO support commit"
* tag 'tag-chrome-platform-fixes-for-v5.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
platform/chrome: cros_ec_sensorhub: Add missing '\n' in log messages
platform/chrome: cros_ec_sensorhub: Off by one in cros_sensorhub_send_sample()
Linus Torvalds [Thu, 16 Apr 2020 21:52:29 +0000 (14:52 -0700)]
Merge git://git./linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
1) Disable RISCV BPF JIT builds when !MMU, from Björn Töpel.
2) nf_tables leaves dangling pointer after free, fix from Eric Dumazet.
3) Out of boundary write in __xsk_rcv_memcpy(), fix from Li RongQing.
4) Adjust icmp6 message source address selection when routes have a
preferred source address set, from Tim Stallard.
5) Be sure to validate HSR protocol version when creating new links,
from Taehee Yoo.
6) CAP_NET_ADMIN should be sufficient to manage l2tp tunnels even in
non-initial namespaces, from Michael Weiß.
7) Missing release firmware call in mlx5, from Eran Ben Elisha.
8) Fix variable type in macsec_changelink(), caught by KASAN. Fix from
Taehee Yoo.
9) Fix pause frame negotiation in marvell phy driver, from Clemens
Gruber.
10) Record RX queue early enough in tun packet paths such that XDP
programs will see the correct RX queue index, from Gilberto Bertin.
11) Fix double unlock in mptcp, from Florian Westphal.
12) Fix offset overflow in ARM bpf JIT, from Luke Nelson.
13) marvell10g needs to soft reset PHY when coming out of low power
mode, from Russell King.
14) Fix MTU setting regression in stmmac for some chip types, from
Florian Fainelli.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (101 commits)
amd-xgbe: Use __napi_schedule() in BH context
mISDN: make dmril and dmrim static
net: stmmac: dwmac-sunxi: Provide TX and RX fifo sizes
net: dsa: mt7530: fix tagged frames pass-through in VLAN-unaware mode
tipc: fix incorrect increasing of link window
Documentation: Fix tcp_challenge_ack_limit default value
net: tulip: make early_486_chipsets static
dt-bindings: net: ethernet-phy: add desciption for ethernet-phy-id1234.d400
ipv6: remove redundant assignment to variable err
net/rds: Use ERR_PTR for rds_message_alloc_sgs()
net: mscc: ocelot: fix untagged packet drops when enslaving to vlan aware bridge
selftests/bpf: Check for correct program attach/detach in xdp_attach test
libbpf: Fix type of old_fd in bpf_xdp_set_link_opts
libbpf: Always specify expected_attach_type on program load if supported
xsk: Add missing check on user supplied headroom size
mac80211: fix channel switch trigger from unknown mesh peer
mac80211: fix race in ieee80211_register_hw()
net: marvell10g: soft-reset the PHY when coming out of low power
net: marvell10g: report firmware version
net/cxgb4: Check the return from t4_query_params properly
...
Sebastian Andrzej Siewior [Thu, 16 Apr 2020 15:57:40 +0000 (17:57 +0200)]
amd-xgbe: Use __napi_schedule() in BH context
The driver uses __napi_schedule_irqoff() which is fine as long as it is
invoked with disabled interrupts by everybody. Since the commit
mentioned below the driver may invoke xgbe_isr_task() in tasklet/softirq
context. This may lead to list corruption if another driver uses
__napi_schedule_irqoff() in IRQ context.
Use __napi_schedule() which safe to use from IRQ and softirq context.
Fixes: 85b85c853401d ("amd-xgbe: Re-issue interrupt if interrupt status not cleared")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Yan [Wed, 15 Apr 2020 08:42:26 +0000 (16:42 +0800)]
mISDN: make dmril and dmrim static
Fix the following sparse warning:
drivers/isdn/hardware/mISDN/mISDNisar.c:746:12: warning: symbol 'dmril'
was not declared. Should it be static?
drivers/isdn/hardware/mISDN/mISDNisar.c:749:12: warning: symbol 'dmrim'
was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 14 Apr 2020 22:39:52 +0000 (15:39 -0700)]
net: stmmac: dwmac-sunxi: Provide TX and RX fifo sizes
After commit
bfcb813203e619a8960a819bf533ad2a108d8105 ("net: dsa:
configure the MTU for switch ports") my Lamobo R1 platform which uses
an allwinner,sun7i-a20-gmac compatible Ethernet MAC started to fail
by rejecting a MTU of 1536. The reason for that is that the DMA
capabilities are not readable on this version of the IP, and there
is also no 'tx-fifo-depth' property being provided in Device Tree. The
property is documented as optional, and is not provided.
Chen-Yu indicated that the FIFO sizes are 4KB for TX and 16KB for RX, so
provide these values through platform data as an immediate fix until
various Device Tree sources get updated accordingly.
Fixes: eaf4fac47807 ("net: stmmac: Do not accept invalid MTU values")
Suggested-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
DENG Qingfang [Tue, 14 Apr 2020 06:34:08 +0000 (14:34 +0800)]
net: dsa: mt7530: fix tagged frames pass-through in VLAN-unaware mode
In VLAN-unaware mode, the Egress Tag (EG_TAG) field in Port VLAN
Control register must be set to Consistent to let tagged frames pass
through as is, otherwise their tags will be stripped.
Fixes: 83163f7dca56 ("net: dsa: mediatek: add VLAN support for MT7530")
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: René van Dorst <opensource@vdorst.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 16 Apr 2020 17:45:47 +0000 (10:45 -0700)]
Merge tag 'selinux-pr-
20200416' of git://git./linux/kernel/git/pcmoore/selinux
Pull SELinux fix from Paul Moore:
"One small SELinux fix to ensure we cleanup properly on an error
condition"
* tag 'selinux-pr-
20200416' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: free str on error in str_read()
Linus Torvalds [Thu, 16 Apr 2020 17:29:34 +0000 (10:29 -0700)]
Merge tag 'ceph-for-5.7-rc2' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
- a set of patches for a deadlock on "rbd map" error path
- a fix for invalid pointer dereference and uninitialized variable use
on asynchronous create and unlink error paths.
* tag 'ceph-for-5.7-rc2' of git://github.com/ceph/ceph-client:
ceph: fix potential bad pointer deref in async dirops cb's
rbd: don't mess with a page vector in rbd_notify_op_lock()
rbd: don't test rbd_dev->opts in rbd_dev_image_release()
rbd: call rbd_dev_unprobe() after unwatching and flushing notifies
rbd: avoid a deadlock on header_rwsem when flushing notifies
Linus Torvalds [Thu, 16 Apr 2020 17:14:22 +0000 (10:14 -0700)]
Merge tag 'trace-v5.7-rc1' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"This fixes a small race between allocating a snapshot buffer and
setting the snapshot trigger.
On a slow machine, the trigger can occur before the snapshot is
allocated causing a warning to be displayed in the ring buffer, and no
snapshot triggering. Reversing the allocation and the enabling of the
trigger fixes the problem"
* tag 'trace-v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix the race between registering 'snapshot' event trigger and triggering 'snapshot' operation
Vasily Averin [Tue, 14 Apr 2020 20:33:16 +0000 (21:33 +0100)]
keys: Fix proc_keys_next to increase position index
If seq_file .next function does not change position index,
read after some lseek can generate unexpected output:
$ dd if=/proc/keys bs=1 # full usual output
0f6bfdf5 I--Q--- 2 perm
3f010000 1000 1000 user
4af2f79ab8848d0a: 740
1fb91b32 I--Q--- 3 perm
1f3f0000 1000 65534 keyring _uid.1000: 2
27589480 I--Q--- 1 perm
0b0b0000 0 0 user invocation_id: 16
2f33ab67 I--Q--- 152 perm
3f030000 0 0 keyring _ses: 2
33f1d8fa I--Q--- 4 perm
3f030000 1000 1000 keyring _ses: 1
3d427fda I--Q--- 2 perm
3f010000 1000 1000 user
69ec44aec7678e5a: 740
3ead4096 I--Q--- 1 perm
1f3f0000 1000 65534 keyring _uid_ses.1000: 1
521+0 records in
521+0 records out
521 bytes copied, 0,
00123769 s, 421 kB/s
But a read after lseek in middle of last line results in the partial
last line and then a repeat of the final line:
$ dd if=/proc/keys bs=500 skip=1
dd: /proc/keys: cannot skip to specified offset
g _uid_ses.1000: 1
3ead4096 I--Q--- 1 perm
1f3f0000 1000 65534 keyring _uid_ses.1000: 1
0+1 records in
0+1 records out
97 bytes copied, 0,
000135035 s, 718 kB/s
and a read after lseek beyond end of file results in the last line being
shown:
$ dd if=/proc/keys bs=1000 skip=1 # read after lseek beyond end of file
dd: /proc/keys: cannot skip to specified offset
3ead4096 I--Q--- 1 perm
1f3f0000 1000 65534 keyring _uid_ses.1000: 1
0+1 records in
0+1 records out
76 bytes copied, 0,
000119981 s, 633 kB/s
See https://bugzilla.kernel.org/show_bug.cgi?id=206283
Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code ...")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 16 Apr 2020 00:37:48 +0000 (17:37 -0700)]
Merge tag 'efi-urgent-2020-04-15' of git://git./linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
"Misc EFI fixes, including the boot failure regression caused by the
BSS section not being cleared by the loaders"
* tag 'efi-urgent-2020-04-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/x86: Revert struct layout change to fix kexec boot regression
efi/x86: Don't remap text<->rodata gap read-only for mixed mode
efi/x86: Fix the deletion of variables in mixed mode
efi/libstub/file: Merge file name buffers to reduce stack usage
Documentation/x86, efi/x86: Clarify EFI handover protocol and its requirements
efi/arm: Deal with ADR going out of range in efi_enter_kernel()
efi/x86: Always relocate the kernel for EFI handover entry
efi/x86: Move efi stub globals from .bss to .data
efi/libstub/x86: Remove redundant assignment to pointer hdr
efi/cper: Use scnprintf() for avoiding potential buffer overflow
Tuong Lien [Wed, 15 Apr 2020 11:34:49 +0000 (18:34 +0700)]
tipc: fix incorrect increasing of link window
In commit
16ad3f4022bb ("tipc: introduce variable window congestion
control"), we allow link window to change with the congestion avoidance
algorithm. However, there is a bug that during the slow-start if packet
retransmission occurs, the link will enter the fast-recovery phase, set
its window to the 'ssthresh' which is never less than 300, so the link
window suddenly increases to that limit instead of decreasing.
Consequently, two issues have been observed:
- For broadcast-link: it can leave a gap between the link queues that a
new packet will be inserted and sent before the previous ones, i.e. not
in-order.
- For unicast: the algorithm does not work as expected, the link window
jumps to the slow-start threshold whereas packet retransmission occurs.
This commit fixes the issues by avoiding such the link window increase,
but still decreasing if the 'ssthresh' is lowered.
Fixes: 16ad3f4022bb ("tipc: introduce variable window congestion control")
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cambda Zhu [Wed, 15 Apr 2020 09:54:04 +0000 (17:54 +0800)]
Documentation: Fix tcp_challenge_ack_limit default value
The default value of tcp_challenge_ack_limit has been changed from
100 to 1000 and this patch fixes its documentation.
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Yan [Wed, 15 Apr 2020 08:42:48 +0000 (16:42 +0800)]
net: tulip: make early_486_chipsets static
Fix the following sparse warning:
drivers/net/ethernet/dec/tulip/tulip_core.c:1280:28: warning: symbol
'early_486_chipsets' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Jonker [Wed, 15 Apr 2020 20:01:49 +0000 (22:01 +0200)]
dt-bindings: net: ethernet-phy: add desciption for ethernet-phy-id1234.d400
The description below is already in use in
'rk3228-evb.dts', 'rk3229-xms6.dts' and 'rk3328.dtsi'
but somehow never added to a document, so add
"ethernet-phy-id1234.d400", "ethernet-phy-ieee802.3-c22"
for ethernet-phy nodes on Rockchip platforms to
'ethernet-phy.yaml'.
Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Wed, 15 Apr 2020 23:16:30 +0000 (00:16 +0100)]
ipv6: remove redundant assignment to variable err
The variable err is being initialized with a value that is never read
and it is being updated later with a new value. The initialization is
redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ondrej Mosnacek [Tue, 14 Apr 2020 14:23:51 +0000 (16:23 +0200)]
selinux: free str on error in str_read()
In [see "Fixes:"] I missed the fact that str_read() may give back an
allocated pointer even if it returns an error, causing a potential
memory leak in filename_trans_read_one(). Fix this by making the
function free the allocated string whenever it returns a non-zero value,
which also makes its behavior more obvious and prevents repeating the
same mistake in the future.
Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Addresses-Coverity-ID:
1461665 ("Resource leaks")
Fixes: c3a276111ea2 ("selinux: optimize storage of filename transitions")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Jason Gunthorpe [Tue, 14 Apr 2020 23:02:07 +0000 (20:02 -0300)]
net/rds: Use ERR_PTR for rds_message_alloc_sgs()
Returning the error code via a 'int *ret' when the function returns a
pointer is very un-kernely and causes gcc 10's static analysis to choke:
net/rds/message.c: In function ‘rds_message_map_pages’:
net/rds/message.c:358:10: warning: ‘ret’ may be used uninitialized in this function [-Wmaybe-uninitialized]
358 | return ERR_PTR(ret);
Use a typical ERR_PTR return instead.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 14 Apr 2020 19:36:15 +0000 (22:36 +0300)]
net: mscc: ocelot: fix untagged packet drops when enslaving to vlan aware bridge
To rehash a previous explanation given in commit
1c44ce560b4d ("net:
mscc: ocelot: fix vlan_filtering when enslaving to bridge before link is
up"), the switch driver operates the in a mode where a single VLAN can
be transmitted as untagged on a particular egress port. That is the
"native VLAN on trunk port" use case.
The configuration for this native VLAN is driven in 2 ways:
- Set the egress port rewriter to strip the VLAN tag for the native
VID (as it is egress-untagged, after all).
- Configure the ingress port to drop untagged and priority-tagged
traffic, if there is no native VLAN. The intention of this setting is
that a trunk port with no native VLAN should not accept untagged
traffic.
Since both of the above configurations for the native VLAN should only
be done if VLAN awareness is requested, they are actually done from the
ocelot_port_vlan_filtering function, after the basic procedure of
toggling the VLAN awareness flag of the port.
But there's a problem with that simplistic approach: we are trying to
juggle with 2 independent variables from a single function:
- Native VLAN of the port - its value is held in port->vid.
- VLAN awareness state of the port - currently there are some issues
here, more on that later*.
The actual problem can be seen when enslaving the switch ports to a VLAN
filtering bridge:
0. The driver configures a pvid of zero for each port, when in
standalone mode. While the bridge configures a default_pvid of 1 for
each port that gets added as a slave to it.
1. The bridge calls ocelot_port_vlan_filtering with vlan_aware=true.
The VLAN-filtering-dependent portion of the native VLAN
configuration is done, considering that the native VLAN is 0.
2. The bridge calls ocelot_vlan_add with vid=1, pvid=true,
untagged=true. The native VLAN changes to 1 (change which gets
propagated to hardware).
3. ??? - nobody calls ocelot_port_vlan_filtering again, to reapply the
VLAN-filtering-dependent portion of the native VLAN configuration,
for the new native VLAN of 1. One can notice that after toggling "ip
link set dev br0 type bridge vlan_filtering 0 && ip link set dev br0
type bridge vlan_filtering 1", the new native VLAN finally makes it
through and untagged traffic finally starts flowing again. But
obviously that shouldn't be needed.
So it is clear that 2 independent variables need to both re-trigger the
native VLAN configuration. So we introduce the second variable as
ocelot_port->vlan_aware.
*Actually both the DSA Felix driver and the Ocelot driver already had
each its own variable:
- Ocelot: ocelot_port_private->vlan_aware
- Felix: dsa_port->vlan_filtering
but the common Ocelot library needs to work with a single, common,
variable, so there is some refactoring done to move the vlan_aware
property from the private structure into the common ocelot_port
structure.
Fixes: 97bb69e1e36e ("net: mscc: ocelot: break apart ocelot_vlan_port_apply")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 15 Apr 2020 18:35:42 +0000 (11:35 -0700)]
Merge git://git./pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2020-04-15
The following pull-request contains BPF updates for your *net* tree.
We've added 10 non-merge commits during the last 3 day(s) which contain
a total of 11 files changed, 238 insertions(+), 95 deletions(-).
The main changes are:
1) Fix offset overflow for BPF_MEM BPF_DW insn mapping on arm32 JIT,
from Luke Nelson and Xi Wang.
2) Prevent mprotect() to make frozen & mmap()'ed BPF map writeable
again, from Andrii Nakryiko and Jann Horn.
3) Fix type of old_fd in bpf_xdp_set_link_opts to int in libbpf and add
selftests, from Toke Høiland-Jørgensen.
4) Fix AF_XDP to check that headroom cannot be larger than the available
space in the chunk, from Magnus Karlsson.
5) Fix reset of XDP prog when expected_fd is set, from David Ahern.
6) Fix a segfault in bpftool's struct_ops command when BTF is not
available, from Daniel T. Lee.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 15 Apr 2020 18:27:23 +0000 (11:27 -0700)]
Merge tag 'mac80211-for-net-2020-04-15' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
A couple of fixes:
* FTM responder policy netlink validation fix
(but the only user validates again later)
* kernel-doc fixes
* a fix for a race in mac80211 radio registration vs. userspace
* a mesh channel switch fix
* a fix for a syzbot reported kasprintf() issue
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Toke Høiland-Jørgensen [Tue, 14 Apr 2020 14:50:25 +0000 (16:50 +0200)]
selftests/bpf: Check for correct program attach/detach in xdp_attach test
David Ahern noticed that there was a bug in the EXPECTED_FD code so
programs did not get detached properly when that parameter was supplied.
This case was not included in the xdp_attach tests; so let's add it to be
sure that such a bug does not sneak back in down.
Fixes: 87854a0b57b3 ("selftests/bpf: Add tests for attaching XDP programs")
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200414145025.182163-2-toke@redhat.com
Toke Høiland-Jørgensen [Tue, 14 Apr 2020 14:50:24 +0000 (16:50 +0200)]
libbpf: Fix type of old_fd in bpf_xdp_set_link_opts
The 'old_fd' parameter used for atomic replacement of XDP programs is
supposed to be an FD, but was left as a u32 from an earlier iteration of
the patch that added it. It was converted to an int when read, so things
worked correctly even with negative values, but better change the
definition to correctly reflect the intention.
Fixes: bd5ca3ef93cd ("libbpf: Add function to set link XDP fd while specifying old program")
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200414145025.182163-1-toke@redhat.com
Andrii Nakryiko [Tue, 14 Apr 2020 18:26:45 +0000 (11:26 -0700)]
libbpf: Always specify expected_attach_type on program load if supported
For some types of BPF programs that utilize expected_attach_type, libbpf won't
set load_attr.expected_attach_type, even if expected_attach_type is known from
section definition. This was done to preserve backwards compatibility with old
kernels that didn't recognize expected_attach_type attribute yet (which was
added in
5e43f899b03a ("bpf: Check attach type at prog load time"). But this
is problematic for some BPF programs that utilize newer features that require
kernel to know specific expected_attach_type (e.g., extended set of return
codes for cgroup_skb/egress programs).
This patch makes libbpf specify expected_attach_type by default, but also
detect support for this field in kernel and not set it during program load.
This allows to have a good metadata for bpf_program
(e.g., bpf_program__get_extected_attach_type()), but still work with old
kernels (for cases where it can work at all).
Additionally, due to expected_attach_type being always set for recognized
program types, bpf_program__attach_cgroup doesn't have to do extra checks to
determine correct attach type, so remove that additional logic.
Also adjust section_names selftest to account for this change.
More detailed discussion can be found in [0].
[0] https://lore.kernel.org/bpf/
20200412003604.GA15986@rdna-mbp.dhcp.thefacebook.com/
Fixes: 5cf1e9145630 ("bpf: cgroup inet skb programs can return 0 to 3")
Fixes: 5e43f899b03a ("bpf: Check attach type at prog load time")
Reported-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Andrey Ignatov <rdna@fb.com>
Link: https://lore.kernel.org/bpf/20200414182645.1368174-1-andriin@fb.com
Magnus Karlsson [Tue, 14 Apr 2020 07:35:15 +0000 (09:35 +0200)]
xsk: Add missing check on user supplied headroom size
Add a check that the headroom cannot be larger than the available
space in the chunk. In the current code, a malicious user can set the
headroom to a value larger than the chunk size minus the fixed XDP
headroom. That way packets with a length larger than the supported
size in the umem could get accepted and result in an out-of-bounds
write.
Fixes: c0c77d8fb787 ("xsk: add user memory registration support sockopt")
Reported-by: Bui Quang Minh <minhquangbui99@gmail.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=207225
Link: https://lore.kernel.org/bpf/1586849715-23490-1-git-send-email-magnus.karlsson@intel.com
Tamizh chelvam [Sat, 28 Mar 2020 13:53:24 +0000 (19:23 +0530)]
mac80211: fix channel switch trigger from unknown mesh peer
Previously mesh channel switch happens if beacon contains
CSA IE without checking the mesh peer info. Due to that
channel switch happens even if the beacon is not from
its own mesh peer. Fixing that by checking if the CSA
originated from the same mesh network before proceeding
for channel switch.
Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org>
Link: https://lore.kernel.org/r/1585403604-29274-1-git-send-email-tamizhr@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Sumit Garg [Tue, 7 Apr 2020 10:10:55 +0000 (15:40 +0530)]
mac80211: fix race in ieee80211_register_hw()
A race condition leading to a kernel crash is observed during invocation
of ieee80211_register_hw() on a dragonboard410c device having wcn36xx
driver built as a loadable module along with a wifi manager in user-space
waiting for a wifi device (wlanX) to be active.
Sequence diagram for a particular kernel crash scenario:
user-space ieee80211_register_hw() ieee80211_tasklet_handler()
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| | |
|<---phy0----wiphy_register() |
|-----iwd if_add---->| |
| |<---IRQ----(RX packet)
| Kernel crash |
| due to unallocated |
| workqueue. |
| | |
| alloc_ordered_workqueue() |
| | |
| Misc wiphy init. |
| | |
| ieee80211_if_add() |
| | |
As evident from above sequence diagram, this race condition isn't specific
to a particular wifi driver but rather the initialization sequence in
ieee80211_register_hw() needs to be fixed. So re-order the initialization
sequence and the updated sequence diagram would look like:
user-space ieee80211_register_hw() ieee80211_tasklet_handler()
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| | |
| alloc_ordered_workqueue() |
| | |
| Misc wiphy init. |
| | |
|<---phy0----wiphy_register() |
|-----iwd if_add---->| |
| |<---IRQ----(RX packet)
| | |
| ieee80211_if_add() |
| | |
Cc: stable@vger.kernel.org
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Link: https://lore.kernel.org/r/1586254255-28713-1-git-send-email-sumit.garg@linaro.org
[Johannes: fix rtnl imbalances]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Xiao Yang [Tue, 14 Apr 2020 01:51:45 +0000 (09:51 +0800)]
tracing: Fix the race between registering 'snapshot' event trigger and triggering 'snapshot' operation
Traced event can trigger 'snapshot' operation(i.e. calls snapshot_trigger()
or snapshot_count_trigger()) when register_snapshot_trigger() has completed
registration but doesn't allocate buffer for 'snapshot' event trigger. In
the rare case, 'snapshot' operation always detects the lack of allocated
buffer so make register_snapshot_trigger() allocate buffer first.
trigger-snapshot.tc in kselftest reproduces the issue on slow vm:
-----------------------------------------------------------
cat trace
...
ftracetest-3028 [002] .... 236.784290: sched_process_fork: comm=ftracetest pid=3028 child_comm=ftracetest child_pid=3036
<...>-2875 [003] .... 240.460335: tracing_snapshot_instance_cond: *** SNAPSHOT NOT ALLOCATED ***
<...>-2875 [003] .... 240.460338: tracing_snapshot_instance_cond: *** stopping trace here! ***
-----------------------------------------------------------
Link: http://lkml.kernel.org/r/20200414015145.66236-1-yangx.jy@cn.fujitsu.com
Cc: stable@vger.kernel.org
Fixes: 93e31ffbf417a ("tracing: Add 'snapshot' event trigger command")
Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
David S. Miller [Tue, 14 Apr 2020 23:48:09 +0000 (16:48 -0700)]
Merge branch 'Fix-88x3310-leaving-power-save-mode'
Russell King says:
====================
Fix 88x3310 leaving power save mode
This series fixes a problem with the 88x3310 PHY on Macchiatobin
coming out of powersave mode noticed by Matteo Croce. It seems
that certain PHY firmwares do not properly exit powersave mode,
resulting in a fibre link not coming up.
The solution appears to be to soft-reset the PHY after clearing
the powersave bit.
We add support for reporting the PHY firmware version to the kernel
log, and use it to trigger this new behaviour if we have v0.3.x.x
or more recent firmware on the PHY. This, however, is a guess as
the firmware revision documentation does not mention this issue,
and we know that v0.2.1.0 works without this fix but v0.3.3.0 and
later does not.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 14 Apr 2020 19:49:08 +0000 (20:49 +0100)]
net: marvell10g: soft-reset the PHY when coming out of low power
Soft-reset the PHY when coming out of low power mode, which seems to
be necessary with firmware versions 0.3.3.0 and 0.3.10.0.
This depends on ("net: marvell10g: report firmware version")
Fixes: c9cc1c815d36 ("net: phy: marvell10g: place in powersave mode at probe")
Reported-by: Matteo Croce <mcroce@redhat.com>
Tested-by: Matteo Croce <mcroce@redhat.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 14 Apr 2020 19:49:03 +0000 (20:49 +0100)]
net: marvell10g: report firmware version
Report the firmware version when probing the PHY to allow issues
attributable to firmware to be diagnosed.
Tested-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Gunthorpe [Tue, 14 Apr 2020 15:27:08 +0000 (12:27 -0300)]
net/cxgb4: Check the return from t4_query_params properly
Positive return values are also failures that don't set val,
although this probably can't happen. Fixes gcc 10 warning:
drivers/net/ethernet/chelsio/cxgb4/t4_hw.c: In function ‘t4_phy_fw_ver’:
drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:3747:14: warning: ‘val’ may be used uninitialized in this function [-Wmaybe-uninitialized]
3747 | *phy_fw_ver = val;
Fixes: 01b6961410b7 ("cxgb4: Add PHY firmware support for T420-BT cards")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Atsushi Nemoto [Tue, 14 Apr 2020 01:12:34 +0000 (10:12 +0900)]
net: stmmac: socfpga: Allow all RGMII modes
Allow all the RGMII modes to be used. (Not only "rgmii", "rgmii-id"
but "rgmii-txid", "rgmii-rxid")
Signed-off-by: Atsushi Nemoto <atsushi.nemoto@sord.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 14 Apr 2020 23:33:26 +0000 (16:33 -0700)]
Merge branch 'mv88e6xxx-fixed-link-fixes'
Andrew Lunn says:
====================
mv88e6xxx fixed link fixes
Recent changes for how the MAC is configured broke fixed links, as
used by CPU/DSA ports, and for SFPs when phylink cannot be used. The
first fix is unchanged from v1. The second fix takes a different
solution than v1. If a CPU or DSA port is known to have a PHYLINK
instance, configure the port down before instantiating the PHYLINK, so
it is in the down state as expected by PHYLINK.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Tue, 14 Apr 2020 00:34:39 +0000 (02:34 +0200)]
net: dsa: Down cpu/dsa ports phylink will control
DSA and CPU ports can be configured in two ways. By default, the
driver should configure such ports to there maximum bandwidth. For
most use cases, this is sufficient. When this default is insufficient,
a phylink instance can be bound to such ports, and phylink will
configure the port, e.g. based on fixed-link properties. phylink
assumes the port is initially down. Given that the driver should have
already configured it to its maximum speed, ask the driver to down
the port before instantiating the phylink instance.
Fixes: 30c4a5b0aad8 ("net: mv88e6xxx: use resolved link config in mac_link_up()")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Tue, 14 Apr 2020 00:34:38 +0000 (02:34 +0200)]
net: dsa: mv88e6xxx: Configure MAC when using fixed link
The
88e6185 is reporting it has detected a PHY, when a port is
connected to an SFP. As a result, the fixed-phy configuration is not
being applied. That then breaks packet transfer, since the port is
reported as being down.
Add additional conditions to check the interface mode, and if it is
fixed always configure the port on link up/down, independent of the
PPU status.
Fixes: 30c4a5b0aad8 ("net: mv88e6xxx: use resolved link config in mac_link_up()")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 14 Apr 2020 23:30:14 +0000 (16:30 -0700)]
Merge branch 'ionic-address-automated-build-complaints'
Shannon Nelson says:
====================
ionic: address automated build complaints
Kernel build checks found a couple of things to improve.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Mon, 13 Apr 2020 17:33:11 +0000 (10:33 -0700)]
ionic: fix unused assignment
Remove an unused initialized value.
Fixes: 7e4d47596b68 ("ionic: replay filters after fw upgrade")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Mon, 13 Apr 2020 17:33:10 +0000 (10:33 -0700)]
ionic: add dynamic_debug header
Add the appropriate header for using dynamic_hex_dump(), which
seems to be incidentally included in some configurations but
not all.
Fixes: 7e4d47596b68 ("ionic: replay filters after fw upgrade")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>