git.openwrt.org Git - openwrt/staging/blogic.git/log

net: allow to kill a task which waits net_mutex in copy_new_ns

net_mutex can be locked for a long time. It may be because many
namespaces are being destroyed or many processes decide to create
a network namespace.

Both these operations are heavy, so it is better to have an ability to
kill a process which is waiting net_mutex.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: em_meta: Fix 'meta vlan' to correctly recognize zero VID frames

META_COLLECTOR int_vlan_tag() assumes that if the accel tag (vlan_tci)
is zero, then no vlan accel tag is present.

This is incorrect for zero VID vlan accel packets, making the following
match fail:
tc filter add ... basic match 'meta(vlan mask 0xfff eq 0)' ...

Apparently 'int_vlan_tag' was implemented prior VLAN_TAG_PRESENT was
introduced in 05423b2 "vlan: allow null VLAN ID to be used"
(and at time introduced, the 'vlan_tx_tag_get' call in em_meta was not
adapted).

Fix, testing skb_vlan_tag_present instead of testing skb_vlan_tag_get's
value.

Fixes: 05423b2413 ("vlan: allow null VLAN ID to be used")
Fixes: 1a31f2042e ("netsched: Allow meta match on vlan tag on receive")
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-cosmetics-plus-res-mgmt-rewrite'

Jiri Pirko says:

====================
mlxsw: Driver update

Mostly cosmetics and small resource values management rewrite.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Convert resources into array

Since the number of resources is going to get much bigger, ease up the
addition by simly defining IDs. Convert the existing structure members
to a set array, one for validity, one for values. Introduce a set of
getters and setters for easy access.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: cmd: Push resource query defines to cmd.h

Push cmd resource query related defines to cmd.h where they belong.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: reg: Generare register names automatically

Extend the MLXSW_REG_DEFINE macro to store register name in string form.
Use this string later on instead of hard coded string values.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: reg: Use helper macro to define registers

Save some code and also prepare to easily carry name in string form.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: item: Make char *buf arg constant for getters

Enforce const for getter buf args.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: item: Make struct mlxsw_item args const

These should be const, so enforce it.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bpf-numa-id'

Daniel Borkmann says:

====================
Add BPF numa id helper

This patch set adds a helper for retrieving current numa node
id and a test case for SO_REUSEPORT.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

reuseport, bpf: add test case for bpf_get_numa_node_id

The test case is very similar to reuseport_bpf_cpu, only that here
we select socket members based on current numa node id.

  # numactl -H
  available: 2 nodes (0-1)
  node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17
  node 0 size: 128867 MB
  node 0 free: 120080 MB
  node 1 cpus: 6 7 8 9 10 11 18 19 20 21 22 23
  node 1 size: 96765 MB
  node 1 free: 87504 MB
  node distances:
  node   0   1
    0:  10  20
    1:  20  10

  # ./reuseport_bpf_numa
  ---- IPv4 UDP ----
  send node 0, receive socket 0
  send node 1, receive socket 1
  send node 1, receive socket 1
  send node 0, receive socket 0
  ---- IPv6 UDP ----
  send node 0, receive socket 0
  send node 1, receive socket 1
  send node 1, receive socket 1
  send node 0, receive socket 0
  ---- IPv4 TCP ----
  send node 0, receive socket 0
  send node 1, receive socket 1
  send node 1, receive socket 1
  send node 0, receive socket 0
  ---- IPv6 TCP ----
  send node 0, receive socket 0
  send node 1, receive socket 1
  send node 1, receive socket 1
  send node 0, receive socket 0
  SUCCESS

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: add helper for retrieving current numa node id

Use case is mainly for soreuseport to select sockets for the local
numa node, but since generic, lets also add this for other networking
and tracing program types.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'udpmem'

Paolo Abeni says:

====================
udp: refactor memory accounting

This patch series refactor the udp memory accounting, replacing the
generic implementation with a custom one, in order to remove the needs for
locking the socket on the enqueue and dequeue operations. The socket backlog
usage is dropped, as well.

The first patch factor out pieces of some queue and memory management
socket helpers, so that they can later be used by the udp memory accounting
functions.
The second patch adds the memory account helpers, without using them.
The third patch replacse the old rx memory accounting path for udp over ipv4 and
udp over ipv6. In kernel UDP users are updated, as well.

The memory accounting schema is described in detail in the individual patch
commit message.

The performance gain depends on the specific scenario; with few flows (and
little contention in the original code) the differences are in the noise range,
while with several flows contending the same socket, the measured speed-up
is relevant (e.g. even over 100% in case of extreme contention)

Many thanks to Eric Dumazet for the reiterated reviews and suggestions.

v5 -> v6:
- do not orphan the skb on enqueue, skb_steal_sock() already did
the work for us

v4 -> v5:
- use the receive queue spin lock to protect the memory accounting
- several minor clean-up

v3 -> v4:
- simplified the locking schema, always use a plain spinlock

v2 -> v3:
- do not set the now unsed backlog_rcv callback

v1 -> v2:
- changed slighly the memory accounting schema, we now perform lazy reclaim
- fixed forward_alloc updating issue
- fixed memory counter integer overflows
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

udp: use it's own memory accounting schema

Completely avoid default sock memory accounting and replace it
with udp-specific accounting.

Since the new memory accounting model encapsulates completely
the required locking, remove the socket lock on both enqueue and
dequeue, and avoid using the backlog on enqueue.

Be sure to clean-up rx queue memory on socket destruction, using
udp its own sk_destruct.

Tested using pktgen with random src port, 64 bytes packet,
wire-speed on a 10G link as sender and udp_sink as the receiver,
using an l4 tuple rxhash to stress the contention, and one or more
udp_sink instances with reuseport.

nr readers      Kpps (vanilla)  Kpps (patched)
1               170             440
3               1250            2150
6               3000            3650
9               4200            4450
12              5700            6250

v4 -> v5:
  - avoid unneeded test in first_packet_length

v3 -> v4:
  - remove useless sk_rcvqueues_full() call

v2 -> v3:
  - do not set the now unsed backlog_rcv callback

v1 -> v2:
  - add memory pressure support
  - fixed dropwatch accounting for ipv6

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: implement memory accounting helpers

Avoid using the generic helpers.
Use the receive queue spin lock to protect the memory
accounting operation, both on enqueue and on dequeue.

On dequeue perform partial memory reclaiming, trying to
leave a quantum of forward allocated memory.

On enqueue use a custom helper, to allow some optimizations:
- use a plain spin_lock() variant instead of the slightly
  costly spin_lock_irqsave(),
- avoid dst_force check, since the calling code has already
  dropped the skb dst
- avoid orphaning the skb, since skb_steal_sock() already did
  the work for us

The above needs custom memory reclaiming on shutdown, provided
by the udp_destruct_sock().

v5 -> v6:
  - don't orphan the skb on enqueue

v4 -> v5:
  - replace the mem_lock with the receive queue spin lock
  - ensure that the bh is always allowed to enqueue at least
    a skb, even if sk_rcvbuf is exceeded

v3 -> v4:
  - reworked memory accunting, simplifying the schema
  - provide an helper for both memory scheduling and enqueuing

v1 -> v2:
  - use a udp specific destrctor to perform memory reclaiming
  - remove a couple of helpers, unneeded after the above cleanup
  - do not reclaim memory on dequeue if not under memory
    pressure
  - reworked the fwd accounting schema to avoid potential
    integer overflow

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/socket: factor out helpers for memory and queue manipulation

Basic sock operations that udp code can use with its own
memory accounting schema. No functional change is introduced
in the existing APIs.

v4 -> v5:
  - avoid whitespace changes

v2 -> v4:
  - avoid exporting __sock_enqueue_skb

v1 -> v2:
  - avoid export sock_rmem_free

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: remove MTU limits on a few ether_setup callers

These few drivers call ether_setup(), but have no ndo_change_mtu, and thus
were overlooked for changes to MTU range checking behavior. They
previously had no range checks, so for feature-parity, set their min_mtu
to 0 and max_mtu to ETH_MAX_MTU (65535), instead of the 68 and 1500
inherited from the ether_setup() changes. Fine-tuning can come after we get
back to full feature-parity here.

CC: netdev@vger.kernel.org
Reported-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>
CC: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>
CC: R Parameswaran <parameswaran.r7@gmail.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hv_netvsc: fix a race between netvsc_send() and netvsc_init_buf()

Fix in commit 880988348270 ("hv_netvsc: set nvdev link after populating
chn_table") turns out to be incomplete. A crash in
netvsc_get_next_send_section() is observed on mtu change when the device
is under load. The race I identified is: if we get to netvsc_send() after
we set net_device_ctx->nvdev link in netvsc_device_add() but before we
finish netvsc_connect_vsp()->netvsc_init_buf() send_section_map is not
allocated and we crash. Unfortunately we can't set net_device_ctx->nvdev
link after the netvsc_init_buf() call as during the negotiation we need
to receive packets and on the receive path we check for it. It would
probably be possible to split nvdev into a pair of nvdev_in and nvdev_out
links and check them accordingly in get_outbound_net_device()/
get_inbound_net_device() but this looks like an overkill.

Check that send_section_map is allocated in netvsc_send().

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'MTU-core-range-checking-more'

Jarod Wilson says:

====================
net: use core MTU range checking everywhere

This stack of patches should get absolutely everything in the kernel
converted from doing their own MTU range checking to the core MTU range
checking. This second spin includes alterations to hopefully fix all
concerns raised with the first, as well as including some additional
changes to drivers and infrastructure where I completely missed necessary
updates.

These have all been built through the 0-day build infrastructure via the
(rebasing) master branch at https://github.com/jarodwilson/linux-muck, which
at the time of the most recent compile across 147 configs, was based on
net-next at commit 7b1536ef0aa0.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4/6: use core net MTU range checking

ipv4/ip_tunnel:
- min_mtu = 68, max_mtu = 0xFFF8 - dev->hard_header_len - t_hlen
- preserve all ndo_change_mtu checks for now to prevent regressions

ipv6/ip6_tunnel:
- min_mtu = 68, max_mtu = 0xFFF8 - dev->hard_header_len
- preserve all ndo_change_mtu checks for now to prevent regressions

ipv6/ip6_vti:
- min_mtu = 1280, max_mtu = 65535
- remove redundant vti6_change_mtu

ipv6/sit:
- min_mtu = 1280, max_mtu = 0xFFF8 - t_hlen
- remove redundant ipip6_tunnel_change_mtu

CC: netdev@vger.kernel.org
CC: "David S. Miller" <davem@davemloft.net>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
CC: James Morris <jmorris@namei.org>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/net: use net core MTU range checking

ctcm:
- min_mtu = 576, max_mtu = 65527

netiucv:
- min_mtu = 576, max_mtu = 65535

qeth:
- min_mtu = 64, max_mtu = 65535

CC: netdev@vger.kernel.org
CC: linux-s390@vger.kernel.org
CC: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: use core MTU range checking in misc drivers

firewire-net:
- set min/max_mtu
- remove fwnet_change_mtu

nes:
- set max_mtu
- clean up nes_netdev_change_mtu

xpnet:
- set min/max_mtu
- remove xpnet_dev_change_mtu

hippi:
- set min/max_mtu
- remove hippi_change_mtu

batman-adv:
- set max_mtu
- remove batadv_interface_change_mtu
- initialization is a little async, not 100% certain that max_mtu is set
  in the optimal place, don't have hardware to test with

rionet:
- set min/max_mtu
- remove rionet_change_mtu

slip:
- set min/max_mtu
- streamline sl_change_mtu

um/net_kern:
- remove pointless ndo_change_mtu

hsi/clients/ssi_protocol:
- use core MTU range checking
- remove now redundant ssip_pn_set_mtu

ipoib:
- set a default max MTU value
- Note: ipoib's actual max MTU can vary, depending on if the device is in
  connected mode or not, so we'll just set the max_mtu value to the max
  possible, and let the ndo_change_mtu function continue to validate any new
  MTU change requests with checks for CM or not. Note that ipoib has no
  min_mtu set, and thus, the network core's mtu > 0 check is the only lower
  bounds here.

mptlan:
- use net core MTU range checking
- remove now redundant mpt_lan_change_mtu

fddi:
- min_mtu = 21, max_mtu = 4470
- remove now redundant fddi_change_mtu (including export)

fjes:
- min_mtu = 8192, max_mtu = 65536
- The max_mtu value is actually one over IP_MAX_MTU here, but the idea is to
  get past the core net MTU range checks so fjes_change_mtu can validate a
  new MTU against what it supports (see fjes_support_mtu in fjes_hw.c)

hsr:
- min_mtu = 0 (calls ether_setup, max_mtu is 1500)

f_phonet:
- min_mtu = 6, max_mtu = 65541

u_ether:
- min_mtu = 14, max_mtu = 15412

phonet/pep-gprs:
- min_mtu = 576, max_mtu = 65530
- remove redundant gprs_set_mtu

CC: netdev@vger.kernel.org
CC: linux-rdma@vger.kernel.org
CC: Stefan Richter <stefanr@s5r6.in-berlin.de>
CC: Faisal Latif <faisal.latif@intel.com>
CC: linux-rdma@vger.kernel.org
CC: Cliff Whickman <cpw@sgi.com>
CC: Robin Holt <robinmholt@gmail.com>
CC: Jes Sorensen <jes@trained-monkey.org>
CC: Marek Lindner <mareklindner@neomailbox.ch>
CC: Simon Wunderlich <sw@simonwunderlich.de>
CC: Antonio Quartulli <a@unstable.cc>
CC: Sathya Prakash <sathya.prakash@broadcom.com>
CC: Chaitra P B <chaitra.basappa@broadcom.com>
CC: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com>
CC: MPT-FusionLinux.pdl@broadcom.com
CC: Sebastian Reichel <sre@kernel.org>
CC: Felipe Balbi <balbi@kernel.org>
CC: Arvid Brodin <arvid.brodin@alten.se>
CC: Remi Denis-Courmont <courmisch@gmail.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: use core MTU range checking in virt drivers

hyperv_net:
- set min/max_mtu, per Haiyang, after rndis_filter_device_add

virtio_net:
- set min/max_mtu
- remove virtnet_change_mtu

vmxnet3:
- set min/max_mtu

xen-netback:
- min_mtu = 0, max_mtu = 65517

xen-netfront:
- min_mtu = 0, max_mtu = 65535

unisys/visor:
- clean up defines a little to not clash with network core or add
redundat definitions

CC: netdev@vger.kernel.org
CC: virtualization@lists.linux-foundation.org
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: Shrikrishna Khare <skhare@vmware.com>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Paul Durrant <paul.durrant@citrix.com>
CC: David Kershner <david.kershner@unisys.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: use core MTU range checking in core net infra

geneve:
- Merge __geneve_change_mtu back into geneve_change_mtu, set max_mtu
- This one isn't quite as straight-forward as others, could use some
  closer inspection and testing

macvlan:
- set min/max_mtu

tun:
- set min/max_mtu, remove tun_net_change_mtu

vxlan:
- Merge __vxlan_change_mtu back into vxlan_change_mtu
- Set max_mtu to IP_MAX_MTU and retain dynamic MTU range checks in
  change_mtu function
- This one is also not as straight-forward and could use closer inspection
  and testing from vxlan folks

bridge:
- set max_mtu of IP_MAX_MTU and retain dynamic MTU range checks in
  change_mtu function

openvswitch:
- set min/max_mtu, remove internal_dev_change_mtu
- note: max_mtu wasn't checked previously, it's been set to 65535, which
  is the largest possible size supported

sch_teql:
- set min/max_mtu (note: max_mtu previously unchecked, used max of 65535)

macsec:
- min_mtu = 0, max_mtu = 65535

macvlan:
- min_mtu = 0, max_mtu = 65535

ntb_netdev:
- min_mtu = 0, max_mtu = 65535

veth:
- min_mtu = 68, max_mtu = 65535

8021q:
- min_mtu = 0, max_mtu = 65535

CC: netdev@vger.kernel.org
CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
CC: Tom Herbert <tom@herbertland.com>
CC: Daniel Borkmann <daniel@iogearbox.net>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Paolo Abeni <pabeni@redhat.com>
CC: Jiri Benc <jbenc@redhat.com>
CC: WANG Cong <xiyou.wangcong@gmail.com>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
CC: Pravin B Shelar <pshelar@ovn.org>
CC: Sabrina Dubroca <sd@queasysnail.net>
CC: Patrick McHardy <kaber@trash.net>
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: Pravin Shelar <pshelar@nicira.com>
CC: Maxim Krasnyansky <maxk@qti.qualcomm.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: use core MTU range checking in WAN drivers

- set min/max_mtu in all hdlc drivers, remove hdlc_change_mtu
- sent max_mtu in lec driver, remove lec_change_mtu
- set min/max_mtu in x25_asy driver

CC: netdev@vger.kernel.org
CC: Krzysztof Halasa <khc@pm.waw.pl>
CC: Krzysztof Halasa <khalasa@piap.pl>
CC: Jan "Yenya" Kasprzak <kas@fi.muni.cz>
CC: Francois Romieu <romieu@fr.zoreil.com>
CC: Kevin Curtis <kevin.curtis@farsite.co.uk>
CC: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: use core MTU range checking in wireless drivers

- set max_mtu in wil6210 driver
- set max_mtu in atmel driver
- set min/max_mtu in cisco airo driver, remove airo_change_mtu
- set min/max_mtu in ipw2100/ipw2200 drivers, remove libipw_change_mtu
- set min/max_mtu in p80211netdev, remove wlan_change_mtu
- set min/max_mtu in net/mac80211/iface.c and remove ieee80211_change_mtu
- set min/max_mtu in wimax/i2400m and remove i2400m_change_mtu
- set min/max_mtu in intersil/hostap and remove prism2_change_mtu
- set min/max_mtu in intersil/orinoco
- set min/max_mtu in tty/n_gsm and remove gsm_change_mtu

CC: netdev@vger.kernel.org
CC: linux-wireless@vger.kernel.org
CC: Maya Erez <qca_merez@qca.qualcomm.com>
CC: Simon Kelley <simon@thekelleys.org.uk>
CC: Stanislav Yakovlev <stas.yakovlev@gmail.com>
CC: Johannes Berg <johannes@sipsolutions.net>
CC: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: use core MTU range checking in USB NIC drivers

usbnet:
- Remove stale new_mtu <= 0 check in usbnet.c
- Set min_mtu = 0, max_mtu = 65535 (sub-drivers must set their own
  max_mtu and/or min_mtu as needed)

r8152:
- Set appropriate max_mtu for different variants (1500 or 9194)

lan78xx:
- Set max_mtu = 9000

asix_driver:
- max_mtu = 16384 for ax88178 variant

ax88179:
- max_mtu = 4088

cdc_ncm:
- max_mtu from hardware

cdc-phonet:
- min_mtu = 6, max_mtu = 65541

sierra_net:
- max_mtu = 1500, call usbnet_change_mtu directly
- sierra_net_change_mtu checked for MTU > 1500, then called
  usbnet_change_mtu, but if we set max_mtu to let the network core handle
  the range check, then we can simply call usbnet_change_mtu directly

smsc75xx:
- max_mtu = 9000

CC: netdev@vger.kernel.org
CC: Woojung Huh <woojung.huh@microchip.com>
CC: Microchip Linux Driver Support <UNGLinuxDriver@microchip.com>
CC: Hayes Wang <hayeswang@realtek.com>
CC: Oliver Neukum <oneukum@suse.com>
CC: Steve Glendinning <steve.glendinning@shawell.net>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: use net core MTU range checking in more drivers

Somehow, I missed a healthy number of ethernet drivers in the last pass.
Most of these drivers either were in need of an updated max_mtu to make
jumbo frames possible to enable again. In a few cases, also setting a
different min_mtu to match previous lower bounds. There are also a few
drivers that had no upper bounds checking, so they're getting a brand new
ETH_MAX_MTU that is identical to IP_MAX_MTU, but accessible by includes
all ethernet and ethernet-like drivers all have already.

acenic:
- min_mtu = 0, max_mtu = 9000

amazon/ena:
- min_mtu = 128, max_mtu = adapter->max_mtu

amd/xgbe:
- min_mtu = 0, max_mtu = 9000

sb1250:
- min_mtu = 0, max_mtu = 1518

cxgb3:
- min_mtu = 81, max_mtu = 65535

cxgb4:
- min_mtu = 81, max_mtu = 9600

cxgb4vf:
- min_mtu = 81, max_mtu = 65535

benet:
- min_mtu = 256, max_mtu = 9000

ibmveth:
- min_mtu = 68, max_mtu = 65535

ibmvnic:
- min_mtu = adapter->min_mtu, max_mtu = adapter->max_mtu
- remove now redundant ibmvnic_change_mtu

jme:
- min_mtu = 1280, max_mtu = 9202

mv643xx_eth:
- min_mtu = 64, max_mtu = 9500

mlxsw:
- min_mtu = 0, max_mtu = 65535
- Basically bypassing the core checks, and instead relying on dynamic
  checks in the respective switch drivers' ndo_change_mtu functions

ns83820:
- min_mtu = 0
- remove redundant ns83820_change_mtu, only checked for mtu > 1500

netxen:
- min_mtu = 0, max_mtu = 8000 (P2), max_mtu = 9600 (P3)

qlge:
- min_mtu = 1500, max_mtu = 9000
- driver only supports setting mtu to 1500 or 9000, so the core check only
  rules out < 1500 and > 9000, qlge_change_mtu still needs to check that
  the value is 1500 or 9000

qualcomm/emac:
- min_mtu = 46, max_mtu = 9194

xilinx_axienet:
- min_mtu = 64, max_mtu = 9000

Fixes: 61e84623ace3 ("net: centralize net_device min/max MTU checking")
CC: netdev@vger.kernel.org
CC: Jes Sorensen <jes@trained-monkey.org>
CC: Netanel Belgazal <netanel@annapurnalabs.com>
CC: Tom Lendacky <thomas.lendacky@amd.com>
CC: Santosh Raspatur <santosh@chelsio.com>
CC: Hariprasad S <hariprasad@chelsio.com>
CC: Sathya Perla <sathya.perla@broadcom.com>
CC: Ajit Khaparde <ajit.khaparde@broadcom.com>
CC: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
CC: Somnath Kotur <somnath.kotur@broadcom.com>
CC: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
CC: John Allen <jallen@linux.vnet.ibm.com>
CC: Guo-Fu Tseng <cooldavid@cooldavid.org>
CC: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
CC: Jiri Pirko <jiri@mellanox.com>
CC: Ido Schimmel <idosch@mellanox.com>
CC: Manish Chopra <manish.chopra@qlogic.com>
CC: Sony Chacko <sony.chacko@qlogic.com>
CC: Rajesh Borundia <rajesh.borundia@qlogic.com>
CC: Timur Tabi <timur@codeaurora.org>
CC: Anirudha Sarangi <anirudh@xilinx.com>
CC: John Linn <John.Linn@xilinx.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

myri10ge: fix typo in parameter description

Fix typo in parameter description.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: mediatek: use dev_kfree_skb_any instead of dev_kfree_skb

Replace dev_kfree_skb with dev_kfree_skb_any in mtk_start_xmit()
which can be called from hard irq context (netpoll) and from
other contexts. mtk_start_xmit() only frees skbs that it has
dropped.

This is detected by Coccinelle semantic patch.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dwc_eth_qos: use dev_kfree_skb_any instead of dev_kfree_skb

Replace dev_kfree_skb with dev_kfree_skb_any in dwceqos_start_xmit()
which can be called from hard irq context (netpoll) and from
other contexts. dwceqos_start_xmit() only frees skbs that it has
dropped.

This is detected by Coccinelle semantic patch.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: aquantia: add PHY ID of AQR106 and AQR107

The AQR106 and AQR107 can use the existing driver.

Signed-off-by: Shaohui Xie <Shaohui.Xie@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: drop check for clk==NULL before calling clk_*

clk_prepare, clk_enable and their counterparts (at least the common clk
ones, but also most others) do check for the clk being NULL anyhow (and
return 0 then), so there is no gain when the caller checks, too.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: relax listening_hash operations

softirq handlers use RCU protection to lookup listeners,
and write operations all happen from process context.
We do not need to block BH for dump operations.

Also SYN_RECV since request sockets are stored in the ehash table :

1) inet_diag_dump_icsk() no longer need to clear
    cb->args[3] and cb->args[4] that were used as cursors while
    iterating the old per listener hash table.

2) Also factorize a test : No need to scan listening_hash[]
    if r->id.idiag_dport is not zero.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: smc91x: fix neponset breakage by pxa u16 writes

The patch isolating the u16 writes for pxa assumed all machine_is_*()
calls were removed, and therefore removed the mach-types.h include which
provided them.

Unfortunately 2 machine_is_*() remained in smc91x.c file including
smc91x.h from which the include was removed, triggering the error:
drivers/net/ethernet/smsc/smc91x.c: In function ‘smc_drv_probe’:
drivers/net/ethernet/smsc/smc91x.c:2380:2: error: implicit declaration
of function ‘machine_is_assabet’
[-Werror=implicit-function-declaration]
if (machine_is_assabet() && machine_has_neponset())

This adds back the wrongly removed include.

Fixes: d09d747ae4c2 ("net: smc91x: isolate u16 writes alignment workaround")
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

ila: Fix tailroom allocation of lwtstate

Tailroom is supposed to be of length sizeof(struct ila_lwt) but
sizeof(struct ila_params) is currently allocated.

This leads to the dst_cache and connected member of ila_lwt being
referenced out of bounds.

struct ila_lwt {
struct ila_params p;
struct dst_cache dst_cache;
u32 connected : 1;
};

Fixes: 65d7ab8de582 ("net: Identifier Locator Addressing module")
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'macb-ethtool-ringparam'

Zach Brown says:

====================
macb: Add ethtool get_ringparam and set_ringparam to cadence

There are use cases like RT that would benefit from being able to tune the
macb rx/tx ring sizes. The ethtool set_ringparam function is the standard way
of doing so.

The first patch changes the hardcoded tx/rx ring sizes to variables that are
set to a hardcoded default.

The second patch implements the get_ringparam and set_ringparam fucntions.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: macb: Add ethtool get_ringparam and set_ringparam functionality

Some applications want to tune the size of the macb rx/tx ring buffers.
The ethtool set_ringparam function is the standard way of doing it.

Signed-off-by: Zach Brown <zach.brown@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: macb: Use variables with defaults for tx/rx ring sizes instead of hardcoded values

The macb driver hardcoded the tx/rx ring sizes. This made it
impossible to change the sizes at run time.

Add tx_ring_size, and rx_ring_size variables to macb object, which
are initilized with default vales during macb_init. Change all
references to RX_RING_SIZE and TX_RING_SIZE to their respective
replacements.

Signed-off-by: Zach Brown <zach.brown@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: arc_emac: use dev_kfree_skb_any instead of dev_kfree_skb

Replace dev_kfree_skb with dev_kfree_skb_any in arc_emac_tx()
which can be called from hard irq context (netpoll) and from
other contexts. arc_emac_tx() only frees skbs that it has
dropped.

This is detected by Coccinelle semantic patch.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ovs-remove-unused'

Jiri Benc says:

====================
openvswitch: remove unused code

Removed unused functions and unnecessary EXPORT_SYMBOLs from openvswitch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: remove unnecessary EXPORT_SYMBOLs

Some symbols exported to other modules are really used only by
openvswitch.ko. Remove the exports.

Tested by loading all 4 openvswitch modules, nothing breaks.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: remove unused functions

ovs_vport_deferred_free is not used anywhere. It's the only caller of
free_vport_rcu thus this one can be removed, too.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers

A BPF program is required to check the return register of a
map_elem_lookup() call before accessing memory. The verifier keeps
track of this by converting the type of the result register from
PTR_TO_MAP_VALUE_OR_NULL to PTR_TO_MAP_VALUE after a conditional
jump ensures safety. This check is currently exclusively performed
for the result register 0.

In the event the compiler reorders instructions, BPF_MOV64_REG
instructions may be moved before the conditional jump which causes
them to keep their type PTR_TO_MAP_VALUE_OR_NULL to which the
verifier objects when the register is accessed:

0: (b7) r1 = 10
1: (7b) *(u64 *)(r10 -8) = r1
2: (bf) r2 = r10
3: (07) r2 += -8
4: (18) r1 = 0x59c00000
6: (85) call 1
7: (bf) r4 = r0
8: (15) if r0 == 0x0 goto pc+1
R0=map_value(ks=8,vs=8) R4=map_value_or_null(ks=8,vs=8) R10=fp
9: (7a) *(u64 *)(r4 +0) = 0
R4 invalid mem access 'map_value_or_null'

This commit extends the verifier to keep track of all identical
PTR_TO_MAP_VALUE_OR_NULL registers after a map_elem_lookup() by
assigning them an ID and then marking them all when the conditional
jump is observed.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fs_enet: Use net_device_stats from struct net_device

Instead of using a private copy of struct net_device_stats in struct
fs_enet_private, use stats from struct net_device. Also remove the now
unnecessary .ndo_get_stats function.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Remove useless set memory to zero use memset()

The memory return by kzalloc() has already be set to zero, so
remove useless memset(0).

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: fix non static symbol warning

Fixes the following sparse warning:

drivers/net/dsa/mv88e6xxx/chip.c:2866:5: warning:
symbol 'mv88e6xxx_g1_set_switch_mac' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: add new products of Lenovo

Add the following four products of Lenovo and sort the order of the list.

VID PID
0x17ef 0x3062
0x17ef 0x3069
0x17ef 0x720c
0x17ef 0x7214

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: vlan: Use sizeof instead of literal number

Use sizeof variable instead of literal number to enhance the readability.

Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'smc91x-dt'

Robert Jarzmik says:

====================
support smc91x on mainstone and devicetree

This series aims at bringing support to mainstone board on a
device-tree based build, as what is already in place for legacy
mainstone.

The bulk of the mainstone "specific" behavior is that a u16 write
doesn't work on a address of the form 4*n + 2, while it works on 4*n.

The legacy workaround was in SMC_outw(), with calls to
machine_is_mainstone(). These calls don't work with a pxa27x-dt
machine type, which is used when a generic device-tree pxa27x machine
is used to boot the mainstone board.

Therefore, this series enables the smc91c111 adapter of the mainstone
board to work on a device-tree build, exaclty as it's been working for
years with the legacy arch/arm/mach-pxa/mainstone.c definition.

As a sum up, this extends an existing mechanism to device-tree based
pxa platforms.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: smsc91x: add u16 workaround for pxa platforms

Add a workaround for mainstone, idp and stargate2 boards, for u16 writes
which must be aligned on 32 bits addresses.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: smc91x: take into account half-word workaround

For device-tree builds, platforms such as mainstone, idp and stargate2
must have their u16 writes all aligned on 32 bit boundaries. This is
already enabled in platform data builds, and this patch adds it to
device-tree builds.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: smc91x: isolate u16 writes alignment workaround

Writes to u16 has a special handling on 3 PXA platforms, where the
hardware wiring forces these writes to be u32 aligned.

This patch isolates this handling for PXA platforms as before, but
enables this "workaround" to be set up dynamically, which will be the
case in device-tree build types.

This patch was tested on 2 PXA platforms : mainstone, which relies on
the workaround, and lubbock, which doesn't.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: pxa: enhance smc91x platform data

Instead of having the smc91x driver relying on machine_is_*() calls,
provide this data through platform data, ie. idp, mainstone and
stargate.

This way, the driver doesn't need anymore machine_is_*() calls, which
wouldn't work anymore with a device-tree build.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet/sfc: use core min/max MTU checking

Fixes: 61e84623ace3 ("net: centralize net_device min/max MTU checking")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'phy-led-triggers'

Zach Brown says:

====================
Add support for led triggers on phy link state change

Fix skge driver that declared enum contants that conflicted with enum
constants in linux/leds.h

Create function that encapsulates actions taken during the adjust phy link step
of phy state changes.

Create function that provides list of speeds currently supported by the phy.

Add support for led triggers on phy link state changes by adding
a config option. When set the config option will create a set of led triggers
for each phy device. Users can use the led triggers to represent link state
changes on the phy.

v2:
* New patch that creates phy_adjust_link function to encapsulate actions taken
   when adjusting phy link during phy state changes
* led trigger speed strings changed to match existing phy speed strings
* New function that maps speeds to led triggers
* Replace magic constants with definitions when declaring trigger name
   buffer and number of triggers.
v3:
* Changed LED_ON to LED_REG_ON in skge driver to avoid possible future
   conflict and improve consistency.
* Dropped rtl8712 patch that was accepted separately.
v4:
* tweaked commit message
v5
* Changed commit message to explain relationship between the new triggers and
   leds driven by phys.
* Added new patch that creates phy_supported_speeds function.
* Moved phy_leds_triggers_register and phy_leds_triggers_unregister to
   phy_attach and phy_detach respectively. This change is so the
   phydev->supported field will be filled by the time the triggers are
   registered.
* Changed hardcoded list of triggers to dynamic list determined by speeds
   return by phy_supported_speeds.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: leds: add support for led triggers on phy link state change

Create an option CONFIG_LED_TRIGGER_PHY (default n), which will create a
set of led triggers for each instantiated PHY device. There is one LED
trigger per link-speed, per-phy.
The triggers are registered during phy_attach and unregistered during
phy_detach.

This allows for a user to configure their system to allow a set of LEDs
not controlled by the phy to represent link state changes on the phy.
LEDS controlled by the phy are unaffected.

For example, we have a board where some of the leds in the
RJ45 socket are controlled by the phy, but others are not. Using the
triggers provided by this patch the leds not controlled by the phy can
be configured to show the current speed of the ethernet connection. The
leds controlled by the phy are unaffected.

Signed-off-by: Josh Cartwright <josh.cartwright@ni.com>
Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
Signed-off-by: Zach Brown <zach.brown@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Create phy_supported_speeds function which lists speeds currently supported by a phydevice

phy_supported_speeds provides a means to get a list of all the speeds a
phy device currently supports.

Signed-off-by: Zach Brown <zach.brown@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Encapsulate actions performed during link state changes into function phy_adjust_link

During phy state machine state transitions some set of actions should
occur whenever the link state changes. These actions should be
encapsulated into a single function

This patch adds the phy_adjust_link function, which is called whenever
phydev->adjust_link would have been called before. Actions that should
occur whenever the phy link is adjusted can now be added to the
phy_adjust_link function.

Signed-off-by: Zach Brown <zach.brown@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

skge: Rename LED_OFF and LED_ON in marvel skge driver to avoid conflicts with leds namespace

Adding led support for phy causes namespace conflicts for some
phy drivers.

The marvel skge driver declared an enum for representing the states of
Link LED Register. The enum contained constant LED_OFF which conflicted
with declartation found in linux/leds.h.
LED_OFF changed to LED_REG_OFF
Also changed LED_ON to LED_REG_ON to avoid possible future conflict and
for consistency.

Signed-off-by: Zach Brown <zach.brown@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'netdev-adjacency'

David Ahern says:

====================
net: Fix netdev adjacency tracking

The netdev adjacency tracking is failing to create proper dependencies
for some topologies. For example this topology

        +--------+
        |  myvrf |
        +--------+
          |    |
          |  +---------+
          |  | macvlan |
          |  +---------+
          |    |
      +----------+
      |  bridge  |
      +----------+
          |
      +--------+
      | bond1  |
      +--------+
          |
      +--------+
      |  eth3  |
      +--------+

hits 1 of 2 problems depending on the order of enslavement. The base set of
commands for both cases:

    ip link add bond1 type bond
    ip link set bond1 up
    ip link set eth3 down
    ip link set eth3 master bond1
    ip link set eth3 up

    ip link add bridge type bridge
    ip link set bridge up
    ip link add macvlan link bridge type macvlan
    ip link set macvlan up

    ip link add myvrf type vrf table 1234
    ip link set myvrf up

    ip link set bridge master myvrf

Case 1 enslave macvlan to the vrf before enslaving the bond to the bridge:

    ip link set macvlan master myvrf
    ip link set bond1 master bridge

Attempts to delete the VRF:
    ip link delete myvrf

trigger the BUG in __netdev_adjacent_dev_remove:

[  587.405260] tried to remove device eth3 from myvrf
[  587.407269] ------------[ cut here ]------------
[  587.408918] kernel BUG at /home/dsa/kernel.git/net/core/dev.c:5661!
[  587.411113] invalid opcode: 0000 [#1] SMP
[  587.412454] Modules linked in: macvlan bridge stp llc bonding vrf
[  587.414765] CPU: 0 PID: 726 Comm: ip Not tainted 4.8.0+ #109
[  587.416766] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[  587.420241] task: ffff88013ab6eec0 task.stack: ffffc90000628000
[  587.422163] RIP: 0010:[<ffffffff813cef03>]  [<ffffffff813cef03>] __netdev_adjacent_dev_remove+0x40/0x12c
...
[  587.446053] Call Trace:
[  587.446424]  [<ffffffff813d1542>] __netdev_adjacent_dev_unlink+0x20/0x3c
[  587.447390]  [<ffffffff813d16a3>] netdev_upper_dev_unlink+0xfa/0x15e
[  587.448297]  [<ffffffffa00003a3>] vrf_del_slave+0x13/0x2a [vrf]
[  587.449153]  [<ffffffffa00004a4>] vrf_dev_uninit+0xea/0x114 [vrf]
[  587.450036]  [<ffffffff813d19b0>] rollback_registered_many+0x22b/0x2da
[  587.450974]  [<ffffffff813d1aac>] unregister_netdevice_many+0x17/0x48
[  587.451903]  [<ffffffff813de444>] rtnl_delete_link+0x3c/0x43
[  587.452719]  [<ffffffff813dedcd>] rtnl_dellink+0x180/0x194

When the BUG is converted to a WARN_ON it shows 4 missing adjacencies:
  eth3 - myvrf, mvrf - eth3, bond1 - myvrf and myvrf - bond1

All of those are because the __netdev_upper_dev_link function does not
properly link macvlan lower devices to myvrf when it is enslaved.

The second case just flips the ordering of the enslavements:
    ip link set bond1 master bridge
    ip link set macvlan master myvrf

Then run:
    ip link delete bond1
    ip link delete myvrf

The vrf delete command hangs because myvrf has a reference that has not
been released. In this case the removal code does not account for 2 paths
between eth3 and myvrf - one from bridge to vrf and the other through the
macvlan.

Rather than try to maintain a linked list of all upper and lower devices
per netdevice, only track the direct neighbors. The remaining stack can
be determined by recursively walking the neighbors.

The existing netdev_for_each_all_upper_dev_rcu,
netdev_for_each_all_lower_dev and netdev_for_each_all_lower_dev_rcu macros
are replaced with APIs that walk the upper and lower device lists. The
new APIs take a callback function and a data arg that is passed to the
callback for each device in the list. Drivers using the old macros are
converted in separate patches to make it easier on reviewers. It is an
API conversion only; no functional change is intended.

v3
- address Stephen's comment to simplify logic and remove typecasts

v2
- fixed bond0 references in cover-letter
- fixed definition of netdev_next_lower_dev_rcu to mirror the upper_dev
  version.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dev: Improve debug statements for adjacency tracking

Adjacency code only has debugs for the insert case. Add debugs for
the remove path and make both consistently worded to make it easier
to follow the insert and removal with reference counts.

In addition, change the BUG to a WARN_ON. A missing adjacency at
removal time is not cause for a panic.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add warning if any lower device is still in adjacency list

Lower list should be empty just like upper.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Remove all_adj_list and its references

Only direct adjacencies are maintained. All upper or lower devices can
be learned via the new walk API which recursively walks the adj_list for
upper devices or lower devices.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rocker: Flip to the new dev walk API

Convert rocker to the new dev walk API. This is just a code conversion;
no functional change is intended.

v2
- removed typecast of data

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Flip to the new dev walk API

Convert mlxsw users to new dev walk API. This is just a code conversion;
no functional change is intended.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: Flip to the new dev walk API

Convert ixgbe users to new dev walk API. This is just a code conversion;
no functional change is intended.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

IB/ipoib: Flip to new dev walk API

Convert ipoib_get_net_dev_match_addr to the new upper device walk API.
This is just a code conversion; no functional change is intended.

v2
- removed typecast of data

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

IB/core: Flip to the new dev walk API

Convert rdma_is_upper_dev_rcu, handle_netdev_upper and
ipoib_get_net_dev_match_addr to the new upper device walk API.
This is just a code conversion; no functional change is intended.

v2
- removed typecast of data

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bonding: Flip to the new dev walk API

Convert alb_send_learning_packets and bond_has_this_ip to use the new
netdev_walk_all_upper_dev_rcu API. In both cases this is just a code
conversion; no functional change is intended.

v2
- removed typecast of data and simplified bond_upper_dev_walk

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Introduce new api for walking upper and lower devices

This patch introduces netdev_walk_all_upper_dev_rcu,
netdev_walk_all_lower_dev and netdev_walk_all_lower_dev_rcu. These
functions recursively walk the adj_list of devices to determine all upper
and lower devices.

The functions take a callback function that is invoked for each device
in the list. If the callback returns non-0, the walk is terminated and
the functions return that code back to callers.

v3
- simplified netdev_has_upper_dev_all_rcu and __netdev_has_upper_dev and
removed typecast as suggested by Stephen

v2
- fixed definition of netdev_next_lower_dev_rcu to mirror the upper_dev
version.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Remove refnr arg when inserting link adjacencies

Commit 93409033ae65 ("net: Add netdev all_adj_list refcnt propagation to
fix panic") propagated the refnr to insert and remove functions tracking
the netdev adjacency graph. However, for the insert path the refnr can
only be 1. Accordingly, remove the refnr argument to make that clear.
ie., the refnr arg in 93409033ae65 was only needed for the remove path.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bpf-selftests'

Daniel Borkmann says:

====================
Move to BPF selftests

This set improves the test_verifier and test_maps suite and moves
it over to a new BPF selftest directory, so we can keep improving
it under kernel selftest umbrella. This also integrates a test
script for checking test_bpf.ko under various JIT options.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: add initial suite for selftests

Add a start of a test suite for kernel selftests. This moves test_verifier
and test_maps over to tools/testing/selftests/bpf/ along with various
code improvements and also adds a script for invoking test_bpf module.
The test suite can simply be run via selftest framework, f.e.:

  # cd tools/testing/selftests/bpf/
  # make
  # make run_tests

Both test_verifier and test_maps were kind of misplaced in samples/bpf/
directory and we were looking into adding them to selftests for a while
now, so it can be picked up by kbuild bot et al and hopefully also get
more exposure and thus new test case additions.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: add various tests around spill/fill of regs

Add several spill/fill tests. Besides others, one that performs xadd
on the spilled register, one ldx/stx test where different types are
spilled from two branches and read out from common path. Verfier does
handle all correctly.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ethernet-use-core-min-max-mtu'

Jarod Wilson says:

====================
ethernet: use core min/max MTU checking

Now that the network stack core min/max MTU checking infrastructure is in
place, time to start making drivers use it. We'll start with the easiest
ones, the ethernet drivers, split roughly by vendor, with a catch-all
patch at the end.

For the most part, every patch does the same essential thing: removes the
MTU range checking from the drivers' ndo_change_mtu function, puts those
ranges into the core net_device min_mtu and max_mtu fields, and where
possible, removes ndo_change_mtu functions entirely.

These patches have all been built through the 0-day build infrastructure
provided by Intel, on top of net-next as of October 17.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: use core min/max MTU checking

et131x: min_mtu 64, max_mtu 9216

altera_tse: min_mtu 64, max_mtu 1500

amd8111e: min_mtu 60, max_mtu 9000

bnad: min_mtu 46, max_mtu 9000

macb: min_mtu 68, max_mtu 1500 or 10240 depending on hardware capability

xgmac: min_mtu 46, max_mtu 9000

cxgb2: min_mtu 68, max_mtu 9582 (pm3393) or 9600 (vsc7326)

enic: min_mtu 68, max_mtu 9000

gianfar: min_mtu 50, max_mu 9586

hns_enet: min_mtu 68, max_mtu 9578 (v1) or 9706 (v2)

ksz884x: min_mtu 60, max_mtu 1894

myri10ge: min_mtu 68, max_mtu 9000

natsemi: min_mtu 64, max_mtu 2024

nfp: min_mtu 68, max_mtu hardware-specific

forcedeth: min_mtu 64, max_mtu 1500 or 9100, depending on hardware

pch_gbe: min_mtu 46, max_mtu 10300

pasemi_mac: min_mtu 64, max_mtu 9000

qcaspi: min_mtu 46, max_mtu 1500
- remove qcaspi_netdev_change_mtu as it is now redundant

rocker: min_mtu 68, max_mtu 9000

sxgbe: min_mtu 68, max_mtu 9000

stmmac: min_mtu 46, max_mtu depends on hardware

tehuti: min_mtu 60, max_mtu 16384
- driver had no max mtu checking, but product docs say 16k jumbo packets
are supported by the hardware

netcp: min_mtu 68, max_mtu 9486
- remove netcp_ndo_change_mtu as it is now redundant

via-velocity: min_mtu 64, max_mtu 9000

octeon: min_mtu 46, max_mtu 65370

CC: netdev@vger.kernel.org
CC: Mark Einon <mark.einon@gmail.com>
CC: Vince Bridgers <vbridger@opensource.altera.com>
CC: Rasesh Mody <rasesh.mody@qlogic.com>
CC: Nicolas Ferre <nicolas.ferre@atmel.com>
CC: Santosh Raspatur <santosh@chelsio.com>
CC: Hariprasad S <hariprasad@chelsio.com>
CC: Christian Benvenuti <benve@cisco.com>
CC: Sujith Sankar <ssujith@cisco.com>
CC: Govindarajulu Varadarajan <_govind@gmx.com>
CC: Neel Patel <neepatel@cisco.com>
CC: Claudiu Manoil <claudiu.manoil@freescale.com>
CC: Yisen Zhuang <yisen.zhuang@huawei.com>
CC: Salil Mehta <salil.mehta@huawei.com>
CC: Hyong-Youb Kim <hykim@myri.com>
CC: Jakub Kicinski <jakub.kicinski@netronome.com>
CC: Olof Johansson <olof@lixom.net>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Byungho An <bh74.an@samsung.com>
CC: Girish K S <ks.giri@samsung.com>
CC: Vipul Pandya <vipul.pandya@samsung.com>
CC: Giuseppe Cavallaro <peppe.cavallaro@st.com>
CC: Alexandre Torgue <alexandre.torgue@st.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Wingman Kwok <w-kwok2@ti.com>
CC: Murali Karicheri <m-karicheri2@ti.com>
CC: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet/toshiba: use core min/max MTU checking

gelic_net: min_mtu 64, max_mtu 1518
- remove gelic_net_change_mtu now that it is redundant

spidernet: min_Mtu 64, max_mtu 2294
- remove spiter_net_change_mtu now that it is redundant

CC: netdev@vger.kernel.org
CC: Geoff Levand <geoff@infradead.org>
CC: Ishizaki Kou <kou.ishizaki@toshiba.co.jp>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet/tile: use core min/max MTU checking

tilegx: min_mtu 68, max_mtu 1500 or 9000, depending on modparam
- remove tile_net_change_mtu now that it is fully redundant

tilepro: min_mtu 68, max_mtu 1500
- hardware supports jumbo packets up to 10226, but it's not implemented or
tested yet, according to code comments

CC: netdev@vger.kernel.org
CC: Chris Metcalf <cmetcalf@mellanox.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet/ibm: use core min/max MTU checking

ehea: min_mtu 68, max_mtu 9022
- remove ehea_change_mtu, it's now redundant

emac: min_mtu 46, max_mtu 1500 or whatever gets read from OF

CC: netdev@vger.kernel.org
CC: Douglas Miller <dougmill@linux.vnet.ibm.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet/cavium: use core min/max MTU checking

liquidio: min_mtu 68, max_mtu 16000

thunder: min_mtu 64, max_mtu 9200

CC: netdev@vger.kernel.org
CC: Sunil Goutham <sgoutham@cavium.com>
CC: Robert Richter <rric@kernel.org>
CC: Derek Chickles <derek.chickles@caviumnetworks.com>
CC: Satanand Burla <satananda.burla@caviumnetworks.com>
CC: Felix Manlunas <felix.manlunas@caviumnetworks.com>
CC: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet/neterion: use core min/max MTU checking

s2io: min_mtu 46, max_mtu 9600

vxge: min_mtu 68, max_mtu 9600

CC: netdev@vger.kernel.org
CC: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet/dlink: use core min/max MTU checking

dl2k: min_mtu 68, max_mtu 1536 or 8000, depending on hardware
- Removed change_mtu, does nothing productive anymore

sundance: min_mtu 68, max_mtu 8191

CC: netdev@vger.kernel.org
CC: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet/sun: use core min/max MTU checking

cassini: min_mtu 60, max_mtu 9000

niu: min_mtu 68, max_mtu 9216

sungem: min_mtu 68, max_mtu 1500 (comments say jumbo mode is broken)

sunvnet: min_mtu 68, max_mtu 65535
- removed sunvnet_change_mut_common as it does nothing now

CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet/realtek: use core min/max MTU checking

8139cp: min_mtu 60, max_mtu 4096

8139too: min_mtu 68, max_mtu 1770

r8169: min_mtu 60, max_mtu depends on chipset, 1500 to 9k-ish

CC: netdev@vger.kernel.org
CC: Realtek linux nic maintainers <nic_swsd@realtek.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet/qlogic: use core min/max MTU checking

qede: min_mtu 46, max_mtu 9600
- Put define for max in qede.h

qlcnic: min_mtu 68, max_mtu 9600

CC: netdev@vger.kernel.org
CC Dept-GELinuxNICDev@qlogic.com
CC: Yuval Mintz <Yuval.Mintz@qlogic.com>
CC: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet/mellanox: use core min/max MTU checking

mlx4: min_mtu 46, max_mtu depends on hardware

mlx5: min_mtu 68, max_mtu depends on hardware

CC: netdev@vger.kernel.org
CC: Tariq Toukan <tariqt@mellanox.com>
CC: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet/marvell: use core min/max MTU checking

mvneta: min_mtu 68, max_mtu 9676
- mtu validation routine mostly did range check, merge back into
mvneta_change_mtu for simplicity

mvpp2: min_mtu 68, max_mtu 9676
- mtu validation routine mostly did range check, merge back into
mvpp2_change_mtu for simplicity

pxa168_eth: min_mtu 68, max_mtu 9500

skge: min_mtu 60, max_mtu 9000

sky2: min_mtu 68, max_mtu 1500 or 9000, depending on hw

CC: netdev@vger.kernel.org
CC: Mirko Lindner <mlindner@marvell.com>
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet/intel: use core min/max MTU checking

e100: min_mtu 68, max_mtu 1500
- remove e100_change_mtu entirely, is identical to old eth_change_mtu,
  and no longer serves a purpose. No need to set min_mtu or max_mtu
  explicitly, as ether_setup() will already set them to 68 and 1500.

e1000: min_mtu 46, max_mtu 16110

e1000e: min_mtu 68, max_mtu varies based on adapter

fm10k: min_mtu 68, max_mtu 15342
- remove fm10k_change_mtu entirely, does nothing now

i40e: min_mtu 68, max_mtu 9706

i40evf: min_mtu 68, max_mtu 9706

igb: min_mtu 68, max_mtu 9216
- There are two different "max" frame sizes claimed and both checked in
  the driver, the larger value wasn't relevant though, so I've set max_mtu
  to the smaller of the two values here to retain identical behavior.

igbvf: min_mtu 68, max_mtu 9216
- Same issue as igb duplicated

ixgb: min_mtu 68, max_mtu 16114
- Also remove pointless old == new check, as that's done in dev_set_mtu

ixgbe: min_mtu 68, max_mtu 9710

ixgbevf: min_mtu 68, max_mtu dependent on hardware/firmware
- Some hw can only handle up to max_mtu 1504 on a vf, others 9710

CC: netdev@vger.kernel.org
CC: intel-wired-lan@lists.osuosl.org
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet/broadcom: use core min/max MTU checking

tg3: min_mtu 60, max_mtu 9000/1500

bnxt: min_mtu 60, max_mtu 9000

bnx2x: min_mtu 46, max_mtu 9600
- Fix up ETH_OVREHEAD -> ETH_OVERHEAD while we're in here, remove
duplicated defines from bnx2x_link.c.

bnx2: min_mtu 46, max_mtu 9000
- Use more standard ETH_* defines while we're at it.

bcm63xx_enet: min_mtu 46, max_mtu 2028
- compute_hw_mtu was made largely pointless, and thus merged back into
bcm_enet_change_mtu.

b44: min_mtu 60, max_mtu 1500

CC: netdev@vger.kernel.org
CC: Michael Chan <michael.chan@broadcom.com>
CC: Sony Chacko <sony.chacko@qlogic.com>
CC: Ariel Elior <ariel.elior@qlogic.com>
CC: Dept-HSGLinuxNICDev@qlogic.com
CC: Siva Reddy Kallam <siva.kallam@broadcom.com>
CC: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet/atheros: use core min/max MTU checking

atl2: min_mtu 40, max_mtu 1504

- Remove a few redundant defines that already have equivalents in
  if_ether.h.

atl1: min_mtu 42, max_mtu 10218

atl1e: min_mtu 42, max_mtu 8170

atl1c: min_mtu 42, max_mtu 6122/1500

- GbE hardware gets a max_mtu of 6122, slower hardware gets 1500.

alx: min_mtu 34, max_mtu 9256

- Not so sure that minimum MTU number is really what was intended, but
  that's what the math actually makes it out to be, due to max_frame
  manipulations and comparison in alx_change_mtu, rather than just
  comparing new_mtu. (I think 68 was the intended min_mtu value).

CC: netdev@vger.kernel.org
CC: Jay Cliburn <jcliburn@gmail.com>
CC: Chris Snook <chris.snook@gmail.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dp83867-impedance-control'

Mugunthan V N says:

====================
add support for impedance control for TI dp83867 phy and fix 2nd ethernet on dra72 rev C evm

Add support for configurable impedance control for TI dp83867
phy via devicetree. More documentation in [1].
CPSW second ethernet is not working, fix it by enabling
impedance configuration on the phy.

Verified the patch on DRA72 Rev C evm, logs at [2]. Also pushed
a branch [3] for others to test.

Changes from v3:
* Fixup change log text and no code changes.

Changes from v2:
* Fixed a typo in dts and driver.

Changes from initial version:
* As per Sekhar's comment, instead of passing impedance values,
change to max and min impedance from DT
* Adopted phy_read_mmd_indirect() to cunnrent implementation.
* Corrected the phy delay timings to the optimal value.

[1] - http://www.ti.com/lit/ds/symlink/dp83867ir.pdf
[2] - http://pastebin.ubuntu.com/23343139/
[3] - git://git.ti.com/~mugunthanvnm/ti-linux-kernel/linux.git dp83867-v4
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: dts: dra72-evm-revc: fix correct phy delay

The current delay settings of the phy are not the optimal value,
fix it with correct values.

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: dts: dra72-evm-revc: add phy impedance settings

The default impedance settings of the phy is not the optimal
value, due to this the second ethernet is not working. Fix it
with correct values which makes the second ethernet port to work.

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: dp83867: add support for MAC impedance configuration

Add support for programmable MAC impedance configuration

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: dp83867: Add documentation for optional impedance control

Add documention of ti,min-output-impedance and ti,max-output-impedance
which can be used to correct MAC impedance mismatch using phy extended
registers.

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: Remove unnecessary comparison of unsigned against 0

args.u.name_type is of type unsigned int and is always >= 0.

This fixes the following GCC warning:

net/8021q/vlan.c: In function ‘vlan_ioctl_handler’:
net/8021q/vlan.c:574:14: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

fsl/fman: fix error return code in mac_probe()

Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: 3933961682a3 ("fsl/fman: Add FMan MAC driver")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: report right mtu value in error message

Check is for max_mtu but message reports min_mtu.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: nb8800: fix error return code in nb8800_open()

Fix to return error code -ENODEV from the of_phy_connect() error
handling case instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: David S. Miller <davem@davemloft.net>