git.openwrt.org Git - openwrt/staging/blogic.git/log

ipv6: route: make rtm_getroute not assume rtnl is locked

__dev_get_by_index assumes RTNL is held, use _rcu version instead.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: add 'ip get' to rtnetlink.sh

exercise ip/ip6 RTM_GETROUTE doit() callpath.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

dsa: fix flow disector null pointer

A recent change to fix up DSA device behavior made the assumption that
all skbs passing through the flow disector will be associated with a
device. This does not appear to be a safe assumption.  Syzkaller found
the crash below by attaching a BPF socket filter that tries to find the
payload offset of a packet passing between two unix sockets.

kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 2940 Comm: syzkaller872007 Not tainted 4.13.0-rc4-next-20170811 #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801d1b425c0 task.stack: ffff8801d0bc0000
RIP: 0010:__skb_flow_dissect+0xdcd/0x3ae0 net/core/flow_dissector.c:445
RSP: 0018:ffff8801d0bc7340 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000060 RSI: ffffffff856dc080 RDI: 0000000000000300
RBP: ffff8801d0bc7870 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000008 R11: ffffed003a178f1e R12: 0000000000000000
R13: 0000000000000000 R14: ffffffff856dc080 R15: ffff8801ce223140
FS:  00000000016ed880(0000) GS:ffff8801dc000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020008000 CR3: 00000001ce22d000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
skb_flow_dissect_flow_keys include/linux/skbuff.h:1176 [inline]
skb_get_poff+0x9a/0x1a0 net/core/flow_dissector.c:1079
______skb_get_pay_offset net/core/filter.c:114 [inline]
__skb_get_pay_offset+0x15/0x20 net/core/filter.c:112
Code: 80 3c 02 00 44 89 6d 10 0f 85 44 2b 00 00 4d 8b 67 20 48 b8 00 00 00 00 00 fc ff df 49 8d bc 24 00 03 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 13 2b 00 00 4d 8b a4 24 00 03 00 00 4d 85 e4
RIP: __skb_flow_dissect+0xdcd/0x3ae0 net/core/flow_dissector.c:445 RSP: ffff8801d0bc7340

Fixes: 43e665287f93 ("net-next: dsa: fix flow dissection")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Craig Gallek <kraig@google.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Use correct config option

I made an embarrassing mistake and used 'IPV6' instead of 'CONFIG_IPV6'
around the function that updates the kernel about IPv6 neighbours
activity. This can be a problem if the kernel has more neighbours than a
certain threshold and it starts deleting those that are supposedly
inactive.

Fixes: b5f3e0d43012 ("mlxsw: spectrum_router: Fix build when IPv6 isn't enabled")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: fib: Provide offload indication using nexthop flags

IPv6 routes currently lack nexthop flags as in IPv4. This has several
implications.

In the forwarding path, it requires us to check the carrier state of the
nexthop device and potentially ignore a linkdown route, instead of
checking for RTNH_F_LINKDOWN.

It also requires capable drivers to use the user facing IPv6-specific
route flags to provide offload indication, instead of using the nexthop
flags as in IPv4.

Add nexthop flags to IPv6 routes in the 40 bytes hole and use it to
provide offload indication instead of the RTF_OFFLOAD flag, which is
removed while it's still not part of any official kernel release.

In the near future we would like to use the field for the
RTNH_F_{LINKDOWN,DEAD} flags, but this change is more involved and might
not be ready in time for the current cycle.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx5: remove unnecessary pci_set_drvdata()

The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not necessary to manually clear the
device driver data to NULL.

Cc: Joe Jin <joe.jin@oracle.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4: remove unnecessary pci_set_drvdata()

The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not necessary to manually clear the
device driver data to NULL.

Cc: Joe Jin <joe.jin@oracle.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf/verifier: track liveness for pruning

State of a register doesn't matter if it wasn't read in reaching an exit;
a write screens off all reads downstream of it from all explored_states
upstream of it.
This allows us to prune many more branches; here are some processed insn
counts for some Cilium programs:
Program                  before  after
bpf_lb_opt_-DLB_L3.o       6515   3361
bpf_lb_opt_-DLB_L4.o       8976   5176
bpf_lb_opt_-DUNKNOWN.o     2960   1137
bpf_lxc_opt_-DDROP_ALL.o  95412  48537
bpf_lxc_opt_-DUNKNOWN.o  141706  78718
bpf_netdev.o              24251  17995
bpf_overlay.o             10999   9385

The runtime is also improved; here are 'time' results in ms:
Program                  before  after
bpf_lb_opt_-DLB_L3.o         24      6
bpf_lb_opt_-DLB_L4.o         26     11
bpf_lb_opt_-DUNKNOWN.o       11      2
bpf_lxc_opt_-DDROP_ALL.o   1288    139
bpf_lxc_opt_-DUNKNOWN.o    1768    234
bpf_netdev.o                 62     31
bpf_overlay.o                15     13

Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 's390-next'

Julian Wiedmann says:

====================
s390/net: updates for 4.14

a mixed bag of minor fixes, cleanups and refactors for net-next. Please apply.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: fix using of ref counter for rxip addresses

IP-address setting and removal are delayed when the device is not yet in
state SOFTSETUP or UP. ref_counter has been implemented only for
ip-address with type normal. In this patch ref_counter logic is also used
for ip-address with type rxip to allow appropriate handling of multiple
postponed rxip add and del calls.

Signed-off-by: Kittipon Meesompop <kmeesomp@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: fix trace-messages for deleting rxip addresses

change trace-messages:
- from addrxip4 to delrxip4
- from addrxip6 to delrxip6

Signed-off-by: Kittipon Meesompop <kmeesomp@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: reject multicast rxip addresses

There exist different commands to add unicast and multicast addresses on
the OSA card. rxip addresses are always set as unicast addresses and
thus just unicast addresses should be allowed.

Adding a multicast address now fails and a grace message is generated.

Signed-off-by: Kittipon Meesompop <kmeesomp@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: extract bridgeport cmd builder

Consolidation of duplicated code, no functional change.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/net: reduce inlining

Clean up the inline cruft in s390 net drivers. Many of the inlined
functions had only one caller anyway.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: make more use of skb API

Replace some open-coded parts with their proper API calls.

Also remove two skb_[re]set_mac_header() calls in the L2
xmit paths that are clearly no longer required, since at least
commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()").

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: clean up fill_buffer() offset logic

For some xmit paths we pass down a data offset to qeth_fill_buffer(),
to indicate that the first k bytes of the skb should be skipped when
mapping it into buffer elements.
Commit acd9776b5c45 ("s390/qeth: no ETH header for outbound AF_IUCV")
recently switched the offset for the IUCV-over-HiperSockets path
from 0 to ETH_HLEN, and now we have

device offset
OSA = 0
IQD > 0

for all xmit paths.

OSA would previously pass down -1 from do_send_packet(), to distinguish
between 1) OSA and 2) IQD with offset 0. That's no longer needed now,
so have it pass 0, make the offset unsigned and clean up how we apply
the offset in __qeth_fill_buffer().

No change of behaviour for any of our current xmit paths.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: straighten out fill_buffer() interface

1. for adjusting the buffer's next_element_to_fill in __fill_buffer(),
just pass the full qeth_qdio_out_buffer struct
2. when adding a header element, be consistent about passing
a hint ('is_first_elem') to __fill_buffer()

No functional change.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: simplify fragment type selection

Improve readability of the code that determines a buffer element's
fragment type, and reduce the number of cases down from 5 to 3.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: remove extra L3 adapterparms query

qeth_l3_setadapter_parms() queries the device for supported
adapterparms, even though they already have been queried as part of the
device's high-level setup. Remove that extra call.

The only call chain for qeth_l3_setadapter_parms() is
__qeth_l3_set_online()
qeth_core_hardsetup_card()
qeth_query_setadapterparms()
qeth_l3_setadapter_parms()
qeth_query_setadapterparms()

, and we only reach qeth_l3_setadapter_parms() if the first
adapterparms query succeeds. Hence removing the second query results in
no loss of functionality.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: remove extra L2 adapterparms query

qeth_l2_request_initial_mac() queries the device for its supported
adapterparms, even though they already have been queried as part of the
device's high-level setup. Remove that extra call.

The only call chain for qeth_l2_request_initial_mac() is
__qeth_l2_set_online()
qeth_core_hardsetup_card()
qeth_query_setadapterparms()
qeth_l2_setup_netdev()
qeth_l2_request_initial_mac()
qeth_query_setadapterparms()

, and we only reach qeth_l2_request_initial_mac() if the first
adapterparms query succeeds. Hence removing the second query results in
no loss of functionality.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: don't access skb after transmission

After transmitting a skb via send_packet[_fast](), the statistics
code accesses the skb once more to account for transmitted page frags.
This has a (theoretical?) race against the TX completion - if the TX
completion is processed and frees the skb before hard_start_xmit()
gets to the statistics part, we access random memory.

Fix this by caching the # of page frags, before the skb is transmitted.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio: fix issues with fw_type module parameter

The fw_type module parameter isn't showing up in the
/sys/module/liquidio/parameters directory.  Fix it by setting the read
permission bits for user, group, other in module_param_string().  Revise
the description of fw_type.  Initialize the fw_type static char array with
the default value to conform to the module parameter description.

Signed-off-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-Add-support-for-nexthop-group-consolidation-for-IPv6'

Jiri Pirko says:

====================
mlxsw: Add support for nexthop group consolidation for IPv6

Arkadi says:

Due to limited ASIC resources the maximum number of routes is limited by
the nexthop resource. In order to improve the routing scale nexthop
consolidation should be performed.

In case of IPv4, the kernel does the consolidation of nexthops in the form
of the fib_info struct. In that case, the driver uses the fib_info's
address as a key for the internal nexthop group representative struct
lookup. In case of IPv6, the kernel doesn't do consolidation, thus the
driver should implement it by itself.

The hash value is calculated based on the nexthop set, by performing
bitwise xor on the ifindexs of the nexthops, in a similar way to IPV4's
kernel implementation. In case of collision a full match is performed
between the sets which include address and ifindex comparison.

In order to use the same hash table in both cases (IPv4/6), the rhashtable
is changed to operate on variable length key.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Add support for nexthop group consolidation for IPv6

Due to limited ASIC resources the maximum number of routes is limited by
the nexthop resource. In order to improve the routing scale nexthop
consolidation should be performed.

This patch adds support for IPv6 neighbor consolidation. The hash value
is calculated based on the nexthop set, by performing bitwise xor on the
ifindexs of the nexthops, in a similar way to IPv4's kernel implementation.
In case of collision a full match is performed between the sets which
include address and ifindex comparison.

Non gateway nexthop groups are not inserted to the hash table due to
lack of nexthop device (ifindex).

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Prepare nexthop group's hash table for IPv6

This patch does preparation before introducing IPv6 nexthop group
consolidation. Currently the nexthop group hash table is used only by
IPv4 and uses fixed key size. In order to support the IPv6's variable
length key the current table is changed.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'liquidio-adding-support-for-ethtool-set-ring-feature'

Intiyaz Basha says:

====================
liquidio: adding support for ethtool --set-ring feature

Code reorganization is required for adding ethtool --set-ring feature.
First seven patches are for code reorganization. The last patch is for
adding this feature.

Change Log:
V1 -> V2
Only patch #8 was changed: unnecessary parentheses were removed in two
if-statements in lio_ethtool_set_ringparam().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio: added support for ethtool --set-ring feature

added support for ethtool --set-ring feature

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio: moved liquidio_setup_io_queues to lio_core.c

Moving common liquidio_setup_io_queues to lio_core.c

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio: moved liquidio_napi_poll to lio_core.c

Moving common liquidio_napi_poll to lio_core.c

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio: moved liquidio_napi_drv_callback to lio_core.c

Moving common liquidio_napi_drv_callback to lio_core.c

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio: moved liquidio_push_packet to lio_core.c

Moving common liquidio_push_packet to lio_core.c

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio: moved octeon_setup_droq to lio_core.c

Moving common octeon_setup_droq to lio_core.c

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio: moved update_txq_status to lio_core.c

Moving common update_txq_status to lio_core.c

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio: moved wait_for_pending_requests to octeon_network.h

Moving common function wait_for_pending_requests to octeon_network.h

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlnx-i2c'

Ohad Oz says:

====================
Enable Mellanox switch device in I2C mode

The following patch set updates global to Mellanox Kconfig files to support
configuration of Mellanox Switch (mlxsw) without PCI and with I2C only.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Change Kconfig description

This patch apply Mellanox network vendor which includes:
- Mellanox card devices: ConnectX-4, ConnectX-5 and Connect-IB cards.
- Mellanox switch device: SwitchX-2 Switch-IB, Spectrum.
Therefore rephrasing help.

Signed-off-by: Ohad Oz <ohado@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Allow Mellanox switch devices to be configured if only I2C bus is set

Mellanox switches (mlxsw) supports I2C systems without PCI, in order to
give the ability to the users to use such functionality, there is need
to update Kconfig.

Signed-off-by: Ohad Oz <ohado@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Use tab for indentation in Kconfig

Using tabs instead of space for indentation.

Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-spectrum_router-Increase-VRF-scale'

Jiri Pirko says:

====================
mlxsw: spectrum_router: Increase VRF scale

Ido says:

The purpose of this set is to increase the maximum number of supported VRF
devices on top of the Spectrum ASIC under different workloads.

This is achieved by sharing the same LPM tree across all the virtual
routers for a given L3 protocol (IPv4 / IPv6). The change is explained in
detail in the third patch. First two patches are small changes to make
review easier.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Use one LPM tree for all virtual routers

The number of LPM trees available for lookup is much smaller than the
number of virtual routers, which are used to implement VRFs. In
addition, an LPM tree can only be used by one protocol - either IPv4 or
IPv6.

Therefore, in order to increase the number of supported virtual routers
to the maximum we need to be able to share LPM trees across virtual
routers instead of trying to find an optimized tree for each.

Do that by allocating one LPM tree for each protocol, but make sure it
will only include prefixes that are actually used, so as to not perform
unnecessary lookups.

Since changing the structure of a bound tree isn't recommended, whenever
a new tree it required, it's first created and then bound to each
virtual router, replacing the old one.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Pass argument explicitly

Instead of relying on the LPM tree to be assigned to the virtual router
before binding the two, lets pass it explicitly.

This will later allow us to return upon binding error instead of having
to perform a rollback of the assignment.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Return void from deletion functions

There is no point in returning a value from function whose return value
is never checked.

Even if the return value was checked, there wouldn't be anything to do
about it, as these functions are either called from error or deletion
paths.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlge: fix duplicated code for different branches

Refactor code in order to avoid identical code for different branches.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio: fix duplicated code for different branches

Refactor code in order to avoid identical code for different branches.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio: update debug console logging mechanism

- remove logging dependency upon global func octeon_console_debug_enabled()
- abstract debug console logging using console structure (via function ptr)
to allow for more flexible logging

Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ti: cpsw:: constify platform_device_id

platform_device_id are not supposed to change at runtime. All functions
working with platform_device_id provided by <linux/platform_device.h>
work with const platform_device_id. So mark the non-const structs as
const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sh_eth: constify platform_device_id

platform_device_id are not supposed to change at runtime. All functions
working with platform_device_id provided by <linux/platform_device.h>
work with const platform_device_id. So mark the non-const structs as
const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dpaa_eth: constify platform_device_id

platform_device_id are not supposed to change at runtime. All functions
working with platform_device_id provided by <linux/platform_device.h>
work with const platform_device_id. So mark the non-const structs as
const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

can: constify platform_device_id

platform_device_id are not supposed to change at runtime. All functions
working with platform_device_id provided by <linux/platform_device.h>
work with const platform_device_id. So mark the non-const structs as
const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tap: make struct tap_fops static

The structure tap_fops is local to the source and does not need to
be in global scope, so make it static.

Cleans up sparse warning:
symbol 'tap_fops' was not declared. Should it be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio-net: make array guest_offloads static

The array guest_offloads is local to the source and does not need to
be in global scope, so make it static. Also tweak formatting.

Cleans up sparse warnings:
symbol 'guest_offloads' was not declared. Should it be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

of_mdio: merge branch tails in of_phy_register_fixed_link()

Looks like gcc isn't always able to figure out that 3 *if* branches in
of_phy_register_fixed_link() calling fixed_phy_register() at their ends
are similar enough and thus can be merged. The "manual" merge saves 40
bytes of the object code (AArch64 gcc 4.8.5), and still saves 12 bytes
even if gcc was able to merge the branch tails (ARM gcc 4.8.5)...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'vrf-Support-for-local-traffic-with-sockets-bound-to-enslaved-devices'

David Ahern says:

====================
net: vrf: Support for local traffic with sockets bound to enslaved devices

This set gets local traffic working for sockets bound to enslaved
devices. The local rtable and rt6_info added in June 2016 to get
local traffic in VRFs working is no longer needed and actually
keeps local traffic for sockets bound to an enslaved device from
working. Patch 1 removes them.

Patch 2 adds a fix up for IPv4 IP_PKTINFO to return rt_iif for
packets sent over the VRF device. This is similar to the handling
of loopback.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv4: add check for l3slave for index returned in IP_PKTINFO

Similar to the loopback device, for packets sent through a VRF device
the index returned in ipi_ifindex needs to be the saved index in
rt_iif.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: vrf: Drop local rtable and rt6_info

The VRF cached rtable and rt6_info for local traffic are no longer
needed and actually prevent local traffic through enslaved devices.
Remove them.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv4: remove unnecessary check on orig_oif

rt_iif is going to be set to either 0 or orig_oif. If orig_oif
is 0 it amounts to the same end result so remove the check.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: change vxlan_[config_]validate() to use netlink_ext_ack for error reporting

The kernel log is not where users expect error messages for netlink
requests; as we have extended acks now, we can replace pr_debug() with
NL_SET_ERR_MSG_ATTR().

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tap-XDP-support'

Jason Wang says:

====================
XDP support for tap

This series tries to implement XDP support for tap. Two path were
implemented:

- fast path: small & non-gso packet, For performance reason we do it
  at page level and use build_skb() to create skb if necessary.
- slow path: big or gso packet, we don't want to lose the capability
  compared to generic XDP, so we export some generic xdp helpers and
  do it after skb was created.

xdp1 shows about 41% improvement, xdp_redirect shows about 60%
improvement.

Changes from V1:
- fix the race between xdp set and free
- don't hold extra refcount
- add XDP_REDIRECT support

Please review.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tap: XDP support

This patch tries to implement XDP for tun. The implementation was
split into two parts:

- fast path: small and no gso packet. We try to do XDP at page level
  before build_skb(). For XDP_TX, since creating/destroying queues
  were completely under control of userspace, it was implemented
  through generic XDP helper after skb has been built. This could be
  optimized in the future.
- slow path: big or gso packet. We try to do it after skb was created
  through generic XDP helpers.

Test were done through pktgen with small packets.

xdp1 test shows ~41.1% improvement:

Before: ~1.7Mpps
After:  ~2.3Mpps

xdp_redirect to ixgbe shows ~60% improvement:

Before: ~0.8Mpps
After:  ~1.38Mpps

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: export some generic xdp helpers

This patch tries to export some generic xdp helpers to drivers. This
can let driver to do XDP for a specific skb. This is useful for the
case when the packet is hard to be processed at page level directly
(e.g jumbo/GSO frame).

With this patch, there's no need for driver to forbid the XDP set when
configuration is not suitable. Instead, it can defer the XDP for
packets that is hard to be processed directly after skb is created.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tap: use build_skb() for small packet

We use tun_alloc_skb() which calls sock_alloc_send_pskb() to allocate
skb in the past. This socket based method is not suitable for high
speed userspace like virtualization which usually:

- ignore sk_sndbuf (INT_MAX) and expect to receive the packet as fast as
possible
- don't want to be block at sendmsg()

To eliminate the above overheads, this patch tries to use build_skb()
for small packet. We will do this only when the following conditions
are all met:

- TAP instead of TUN
- sk_sndbuf is INT_MAX
- caller don't want to be blocked
- zerocopy is not used
- packet size is smaller enough to use build_skb()

Pktgen from guest to host shows ~11% improvement for rx pps of tap:

Before: ~1.70Mpps
After : ~1.88Mpps

What's more important, this makes it possible to implement XDP for tap
before creating skbs.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnelink: Move link dump consistency check out of the loop

Calls to rtnl_dump_ifinfo() are protected by RTNL lock. So are the
{list,unlist}_netdevice() calls where we bump the net->dev_base_seq
number.

For this reason net->dev_base_seq can't change under out feet while
we're looping over links in rtnl_dump_ifinfo(). So move the check for
net->dev_base_seq change (since the last time we were called) out of the
loop.

This way we avoid giving a wrong impression that there are concurrent
updates to the link list going on while we're iterating over them.

Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio: moved ptp_enable to octeon_device structure

ptp_enable was a global static variable. Moved this global variable to
octeon_device structure and removed extra device id check.

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: bpf: add check for ip XDP redirect

Kernel test robot reports error when running test_xdp_redirect.sh.
Check if ip tool supports xdpgeneric, if not, skip the test.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: make mlxsw_config_profile const

Make these structures const as they only stored in the profile field of
a mlxsw_driver structure, which is of type const.
Done using Coccinelle.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Remove unnecessary newlines from OVS_NLERR uses

OVS_NLERR already adds a newline so these just add blank
lines to the logging.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: send control message when MAC representors are created

The firmware expects a MAC_REPR control message when a MAC representor
is created. The driver should expect a PORTMOD message to follow which
will provide the link states of the physical port associated with the MAC
representor.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skb_needs_check() removes CHECKSUM_UNNECESSARY check for tx.

Because we remove the UFO support, we will also remove the
CHECKSUM_UNNECESSARY check in skb_needs_check().

Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

wan: dscc4: convert to plain DMA API

Make use the dma_*() interfaces rather than the pci_*() interfaces.

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

wan: dscc4: add checks for dma mapping errors

The driver does not check if mapping dma memory succeed.
The patch adds the checks and failure handling.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv4: set orig_oif based on fib result for local traffic

Attempts to connect to a local address with a socket bound
to a device with the local address hangs if there is no listener:

  $ ip addr sh dev eth1
  3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 02:e0:f9:1c:00:37 brd ff:ff:ff:ff:ff:ff
    inet 10.100.1.4/24 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 2001:db8:1::4/120 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::e0:f9ff:fe1c:37/64 scope link
       valid_lft forever preferred_lft forever

  $ vrf-test -I eth1 -r 10.100.1.4
  <hangs when there is no server>

(don't let the command name fool you; vrf-test works without vrfs.)

The problem is that the original intended device, eth1 in this case, is
lost when the tcp reset is sent, so the socket lookup does not find a
match for the reset and the connect attempt hangs. Fix by adjusting
orig_oif for local traffic to the device from the fib lookup result.

With this patch you get the more user friendly:
  $ vrf-test -I eth1 -r 10.100.1.4
  connect failed: 111: Connection refused

orig_oif is saved to the newly created rtable as rt_iif and when set
it is used as the dif for socket lookups. It is set based on flowi4_oif
passed in to ip_route_output_key_hash_rcu and will be set to either
the loopback device, an l3mdev device, nothing (flowi4_oif = 0 which
is the case in the example above) or a netdev index depending on the
lookup path. In each case, resetting orig_oif to the device in the fib
result for the RTN_LOCAL case allows the actual device to be preserved
as the skb tx and rx is done over the loopback or VRF device.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fsl/fman: implement several errata workarounds

Implemented workarounds for the following dTSEC Erratum:
A002, A004, A0012, A0014, A004839 on several operations
that involve MAC CFG register changes: adjust link,
rx pause frames, modify MAC address.

Signed-off-by: Florinel Iordache <florinel.iordache@nxp.com>
Acked-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hns3pf: Fix some harmless copy and paste bugs

These were copy and paste bugs, but I believe they are harmless.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hns3pf: fix hns3_del_tunnel_port()

This function has a copy and paste bug so it accidentally calls the add
function instead of the delete function.

Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'rockchip-internal-phy'

David Wu says:

====================
rockchip: Add the integrated PHY support

The rk3228 and rk3328 support integrated PHY inside, let's enable
it to work. And the integrated PHY need to do some special setting,
so register the rockchip integrated PHY driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ARM64: dts: rockchip: Enable gmac2phy for rk3328-evb

Enable the gmac2phy, make the gmac2phy work on
the rk3328-evb board.

Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ARM64: dts: rockchip: Add gmac2phy node support for rk3328

The gmac2phy controller of rk3328 is connected to integrated PHY
directly inside, add the node for the integrated PHY support.

Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: dts: rk3228-evb: Enable the integrated PHY for gmac

This patch enables the integrated PHY for rk3228 evb board
by default.
To use the external 1000M PHY on evb board, need to make
some switch of evb board to be on.

Signed-off-by: David Wu <david.wu@rock-chips.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: dwmac-rk: Add integrated PHY supprot for rk3328

There are two mac controllers in the rk3328, the one connects
to external PHY, and the other one connects to integrated PHY.
Like the mac of external PHY, the integrated PHY's mac also
needs to configure the related mac registers at GRF.

Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: dwmac-rk: Add integrated PHY support for rk3228

There is only one mac controller in rk3228, which could connect to
external PHY or integrated PHY, use the grf_com_mux bit15 to route
external/integrated PHY.

Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: dwmac-rk: Add integrated PHY support

To make integrated PHY work, need to configure the PHY clock,
PHY cru reset and related registers.

Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Documentation: net: phy: Add phy-is-integrated binding

Add the documentation for integrated PHY. A boolean property indicates
the PHY is integrated into the same physical package as the Ethernet
MAC. If needed, muxers should be configured to ensure the integrated
PHY is used. The absence of this property indicates the muxers should
be configured so that the external PHY is used.

Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: dwmac-rk: Remove unwanted code for rk3328_set_to_rmii()

This is wrong setting for rk3328_set_to_rmii(), so remove it.

Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

arm64: defconfig: Enable CONFIG_ROCKCHIP_PHY

Make the rockchip PHY driver built into the kernel.

Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

multi_v7_defconfig: Make rockchip PHY built-in

Enable the rockchip PHY driver for multi_v7_defconfig builds.

Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Add rockchip PHY driver support

Support integrated ethernet PHY currently.

Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

forcedeth: replace init_timer_deferrable with setup_deferrable_timer

Replace init_timer_deferrable with setup_deferrable_timer to simplify
the source code.

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers: net: davinci_mdio: print bus frequency

Frequency can be adjusted in DT it make sense to
print current used value on driver init.

Signed-off-by: Max Uvarov <muvarov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers: net: davinci_mdio: remove busy loop on wait user access

Polling 14 mdio devices on single mdio bus eats 30% of 1Ghz cpu time
due to busy loop in wait(). Add small delay to relax cpu.

Signed-off-by: Max Uvarov <muvarov@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'netvsc-minor-fixes-and-improvements'

Stephen Hemminger says:

====================
netvsc: minor fixes and improvements

These are non-critical bug fixes, related to functionality now in net-next.
1. delaying the automatic bring up of VF device to allow udev to change name.
2. performance improvement
3. handle MAC address change with VF; mostly propogate the error that VF gives.
4. minor cleanups
5. allow setting send/receive buffer size with ethtool.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

netvsc: keep track of some non-fatal overload conditions

Add ethtool statistics for case where send chimmeny buffer is
exhausted and driver has to fall back to doing scatter/gather
send. Also, add statistic for case where ring buffer is full and
receive completions are delayed.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netvsc: allow controlling send/recv buffer size

Control the size of the buffer areas via ethtool ring settings.
They aren't really traditional hardware rings, but host API breaks
receive and send buffer into chunks. The final size of the chunks are
controlled by the host.

The default value of send and receive buffer area for host DMA
is much larger than it needs to be. Experimentation shows that
4M receive and 1M send is sufficient.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netvsc: remove unnecessary check for NULL hdr

The function init_page_array is always called with a valid pointer
to RNDIS header. No check for NULL is needed.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netvsc: remove unnecessary cast of void pointer

Assignment to a typed pointer is sufficient in C.
No cast is needed.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netvsc: whitespace cleanup

Fix some minor indentation issues.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netvsc: no need to allocate send/receive on numa node

The send and receive buffers are both per-device (not per-channel).
The associated NUMA node is a property of the CPU which is per-channel
therefore it makes no sense to force the receive/send buffer to be
allocated on a particular node (since it is a shared resource).

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netvsc: check error return when restoring channels and mtu

If setting new values fails, and the attempt to restore original
settings fails. Then log an error and leave device down.
This should never happen, but if it does don't go down in flames.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netvsc: propagate MAC address change to VF slave

If VF is slaved to synthetic device, then any change to netvsc
MAC address should be propagated to the slave device.

If slave device doesn't support MAC address change then it
should also be an error to attempt to change synthetic NIC MAC
address.

It also fixes the error unwind in the original code.
If give a bad address, the old code would change the device
MAC address anyway.

Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netvsc: don't signal host twice if empty

When hv_pkt_iter_next() returns NULL, it has already called
hv_pkt_iter_close(). Calling it twice can lead to extra host signal.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netvsc: delay setup of VF device

When VF device is discovered, delay bring it automatically up in
order to allow userspace to some simple changes (like renaming).

Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>