openwrt/staging/blogic.git
5 years agodsa: Keep link list of tag drivers
Andrew Lunn [Sun, 28 Apr 2019 17:37:16 +0000 (19:37 +0200)]
dsa: Keep link list of tag drivers

Let the tag drivers register themselves with the DSA core, keeping
them in a linked list.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
v2

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodsa: Add boilerplate helper to register DSA tag driver modules
Andrew Lunn [Sun, 28 Apr 2019 17:37:15 +0000 (19:37 +0200)]
dsa: Add boilerplate helper to register DSA tag driver modules

A DSA tag driver module will need to register the tag protocols it
implements with the DSA core. Add macros containing this boiler plate.

The registration/unregistration code is currently just a stub. A Later
patch will add the real implementation.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
v2
Fix indent of #endif
Rewrite to move list pointer into a new structure
v3
Move kdoc next to macro
Fix THIS_MODULE indentation

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodsa: Add TAG protocol to tag ops
Andrew Lunn [Sun, 28 Apr 2019 17:37:14 +0000 (19:37 +0200)]
dsa: Add TAG protocol to tag ops

In order that we can match the tagging protocol a switch driver
request to the tagger, we need to know what protocol the tagger
supports. Add this information to the ops structure.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
v2
More tag protocol to end of structure to keep hot members at the beginning.

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodsa: Add MODULE_LICENSE to tag drivers
Andrew Lunn [Sun, 28 Apr 2019 17:37:13 +0000 (19:37 +0200)]
dsa: Add MODULE_LICENSE to tag drivers

All the tag drivers are some variant of GPL. Add a MODULE_LICENSE()
indicating this, so the drivers can later be compiled as modules.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodsa: Add MODULE_ALIAS to taggers in preparation to become modules
Andrew Lunn [Sun, 28 Apr 2019 17:37:12 +0000 (19:37 +0200)]
dsa: Add MODULE_ALIAS to taggers in preparation to become modules

When the tag drivers become modules, we will need to dynamically load
them based on what the switch drivers need. Add aliases to map between
the TAG protocol and the driver.

In order to do this, we need the tag protocol number as something
which the C pre-processor can stringinfy. Only the compiler knows the
value of an enum, CPP cannot use them. So add #defines.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodsa: Move tagger name into its ops structure
Andrew Lunn [Sun, 28 Apr 2019 17:37:11 +0000 (19:37 +0200)]
dsa: Move tagger name into its ops structure

Rather than keep a list to map a tagger ops to a name, place the name
into the ops structure. This removes the hard coded list, a step
towards making the taggers more dynamic.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
v2:
Move name to end of structure, keeping the hot entries at the beginning.

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodsa: Add SPDX header to tag drivers.
Andrew Lunn [Sun, 28 Apr 2019 17:37:10 +0000 (19:37 +0200)]
dsa: Add SPDX header to tag drivers.

Add an SPDX header, and remove the license boilerplate text.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
David S. Miller [Sun, 28 Apr 2019 12:42:41 +0000 (08:42 -0400)]
Merge git://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2019-04-28

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Introduce BPF socket local storage map so that BPF programs can store
   private data they associate with a socket (instead of e.g. separate hash
   table), from Martin.

2) Add support for bpftool to dump BTF types. This is done through a new
   `bpftool btf dump` sub-command, from Andrii.

3) Enable BPF-based flow dissector for skb-less eth_get_headlen() calls which
   was currently not supported since skb was used to lookup netns, from Stanislav.

4) Add an opt-in interface for tracepoints to expose a writable context
   for attached BPF programs, used here for NBD sockets, from Matt.

5) BPF xadd related arm64 JIT fixes and scalability improvements, from Daniel.

6) Change the skb->protocol for bpf_skb_adjust_room() helper in order to
   support tunnels such as sit. Add selftests as well, from Willem.

7) Various smaller misc fixes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agocxgb4: Delete all hash and TCAM filters before resource cleanup
Vishal Kulkarni [Fri, 26 Apr 2019 08:28:48 +0000 (13:58 +0530)]
cxgb4: Delete all hash and TCAM filters before resource cleanup

During driver unload, hash/TCAM filter deletion doesn't wait for
completion.This patch deletes all the filters with completion before
clearing the resources.

Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: mv88e6xxx: Remove legacy probe support
Andrew Lunn [Sat, 27 Apr 2019 17:19:10 +0000 (19:19 +0200)]
net: dsa: mv88e6xxx: Remove legacy probe support

Remove the legacy method of probing the mv88e6xxx driver, now that all
the mainline boards have been converted to use mdio based probing for
a number of cycles.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'mv88e6060-cleanups'
David S. Miller [Sun, 28 Apr 2019 00:23:05 +0000 (20:23 -0400)]
Merge branch 'mv88e6060-cleanups'

Andrew Lunn says:

====================
mv88e6060 cleanups

This patchset performs some cleanups of the mv88e6060 DSA driver, as a
step towards making it an MDIO device, rather than use the old probing
method. The changes here are all pretty mechanical and only compile
tested.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: mv88e6060: Replace REG_READ macro
Andrew Lunn [Sat, 27 Apr 2019 17:32:59 +0000 (19:32 +0200)]
net: dsa: mv88e6060: Replace REG_READ macro

The REG_READ macro contains a return statement, making it not very
safe. Remove it by inlining the code.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: mv88e6060: Replace REG_WRITE macro
Andrew Lunn [Sat, 27 Apr 2019 17:32:58 +0000 (19:32 +0200)]
net: dsa: mv88e6060: Replace REG_WRITE macro

The REG_WRITE macro contains a return statement, making it not very
safe. Remove it by inlining the code.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: mv88e6060: Replace ds with priv
Andrew Lunn [Sat, 27 Apr 2019 17:32:57 +0000 (19:32 +0200)]
net: dsa: mv88e6060: Replace ds with priv

Pass around priv, not ds. This will help with changing to an mdio
driver, and makes this driver more like mv88e6xxx.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: mv88e6060: Add SPDX header
Andrew Lunn [Sat, 27 Apr 2019 17:32:56 +0000 (19:32 +0200)]
net: dsa: mv88e6060: Add SPDX header

Add an SPDX header, and remove the license text.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoibmvnic: Add device identification to requested IRQs
Murilo Fossa Vicentini [Thu, 25 Apr 2019 14:02:33 +0000 (11:02 -0300)]
ibmvnic: Add device identification to requested IRQs

The ibmvnic driver currently uses the same fixed name when using
request_irq, this makes it hard to parse when multiple VNIC devices are
available at the same time. This patch adds the unit_address as the device
identification along with an id for each queue.

The original idea was to use the interface name as an identifier, but it
is not feasible given these requests happen at adapter probe, and at this
point netdev is not yet registered so it doesn't have the proper name
assigned to it.

Signed-off-by: Murilo Fossa Vicentini <muvic@linux.ibm.com>
Reviewed-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com>
Reviewed-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agocpsw: Put back cpsw_ndo_poll_controller()
David S. Miller [Sun, 28 Apr 2019 00:08:25 +0000 (20:08 -0400)]
cpsw: Put back cpsw_ndo_poll_controller()

To fix the build.

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-ethernet-ti-clean-up-and-optimizations'
David S. Miller [Sat, 27 Apr 2019 21:11:49 +0000 (17:11 -0400)]
Merge branch 'net-ethernet-ti-clean-up-and-optimizations'

Grygorii Strashko says:

====================
net: ethernet: ti: clean up and optimizations

This is a preparation series for introducing new switchbase TI CPSW driver which
was originally introduced [1][2] by Ilias Apalodimas <ilias.apalodimas@linaro.org>
and also discussed in private mails and at Netdev x13 confernce.

Following discussions and suggestions (mostly by Andrew and Ivan) we going
to introduce the new driver which is operating in dual-emac mode
by default, thus working as 2 individual network interfaces.
When both interfaces joined the bridge - CPSW driver will enter a switch
mode and discard dual_mac configuration. The CPSW will be switched back
to dual_mac mode if any port leaves the bridge. All configuration is going to be
implemented via switchdev API.

Hence overall change is already very big I'm sending prerequisite patches which
are mostly minor fixes/clean ups and code refactoring to separate common parts
to be reused by both drivers.
Probably the most serious change from functional point of view is Patch 11.

These patches were NFS boot tetested on TI AM335x/AM437x/AM5xx boards.

These patches can be found at:
 git@git.ti.com:~gragst/ti-linux-kernel/gragsts-ti-linux-kernel.git
 branch: lkml-5.1-cpsw-clean-up-v2

changes in v2:
- added new patch 16 to get rid of force type conversation
- other chages metioned in patches
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: cpsw: move ethtool func in separate file
Grygorii Strashko [Fri, 26 Apr 2019 17:12:42 +0000 (20:12 +0300)]
net: ethernet: ti: cpsw: move ethtool func in separate file

As a preparatory patch to add support for a switchdev based cpsw driver,
move common ethtool functions to separate cpsw-ethtool.c file so that they
can be used across both drivers. It will simplify CPSW driver code
maintenance also.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: cpsw: switch to use mac sl api
Grygorii Strashko [Fri, 26 Apr 2019 17:12:41 +0000 (20:12 +0300)]
net: ethernet: ti: cpsw: switch to use mac sl api

Switch CPSW driver to use the new MAC SL API.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: cpsw: introduce mac sl module api
Grygorii Strashko [Fri, 26 Apr 2019 17:12:40 +0000 (20:12 +0300)]
net: ethernet: ti: cpsw: introduce mac sl module api

The MAC SL submodule has a lot of common functions between many of TI SoCs
AM335x/AM437x/DRA7(AM57xx), Keystone 2 66AK2HK/E/L/G and K3 AM654, but
there are also differences especially in registers offsets and sets of
supported functions.

This patch introduces the MAC SL submodule API which is intended to provide
a common way to access the MAC SL submodule and hide HW integrations
details.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: cpsw: move common hw init code in separate func
Grygorii Strashko [Fri, 26 Apr 2019 17:12:39 +0000 (20:12 +0300)]
net: ethernet: ti: cpsw: move common hw init code in separate func

move common hw init code in separate function as preparation for adding new
switchdev driver.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: davinci_cpdma: use dma_addr_t for desc_mem_phys and desc_hw_addr
Grygorii Strashko [Fri, 26 Apr 2019 17:12:38 +0000 (20:12 +0300)]
net: ethernet: ti: davinci_cpdma: use dma_addr_t for desc_mem_phys and desc_hw_addr

Use dma_addr_t for desc_mem_phys and desc_hw_addr to avoid types
conversions.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: cpsw: move cpsw definitions in priv header
Grygorii Strashko [Fri, 26 Apr 2019 17:12:37 +0000 (20:12 +0300)]
net: ethernet: ti: cpsw: move cpsw definitions in priv header

As a preparatory patch to add a switchdev based cpsw driver move the common
header definitions to cpsw_priv.h. The plan is to develop a new driver on
switchdev driver model and obsolete the current cpsw driver after all
required functions are added to the new driver. This patch allows the same
header file to be re-used on both drivers during the transition period.

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: cpsw: refactor probe to group common hw initialization
Grygorii Strashko [Fri, 26 Apr 2019 17:12:36 +0000 (20:12 +0300)]
net: ethernet: ti: cpsw: refactor probe to group common hw initialization

Rework probe to group common hw initialization:
- group resources request at the beginning of the probe
- move net device initialization and registration at the end of the probe
- drop cpsw_slave_init
as preparation of refactoring of common hw initialization code to
separate function.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: davinci_mdio: use devm_ioremap()
Grygorii Strashko [Fri, 26 Apr 2019 17:12:35 +0000 (20:12 +0300)]
net: ethernet: ti: davinci_mdio: use devm_ioremap()

The Davinci MDIO in most of the case implemented as module inside of TI
CPSW subsystem and fully depends on CPSW to be enabled, but historically
it's implemented as separate Platform device/driver and defined in DT files
in two ways:
- as standalone node
- as child node of CPSW subsystem.

In later case it's required to split CPSW subsystem "reg" property to
exclude MDIO I/O range which is not useful.

Hence, replace devm_ioremap_resource() with devm_ioremap() to allow define
full I/O range in parent CPSW subsystem without spliting.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: ale: do not auto delete mcast super entries
Grygorii Strashko [Fri, 26 Apr 2019 17:12:34 +0000 (20:12 +0300)]
net: ethernet: ti: ale: do not auto delete mcast super entries

Do not delete multicast supervisory packet's (SUPER) entries while flushing
multicast addresses from ALE table cpsw_ale_flush_multicast(). Those
entries have to be added/removed only explicitly.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: cpsw: fix allmulti cfg in dual_mac mode
Grygorii Strashko [Fri, 26 Apr 2019 17:12:33 +0000 (20:12 +0300)]
net: ethernet: ti: cpsw: fix allmulti cfg in dual_mac mode

Now CPSW ALE will set/clean Host port bit in Unregistered Multicast Flood
Mask (UNREG_MCAST_FLOOD_MASK) for every VLAN without checking if this port
belongs to VLAN or not when ALLMULTI mode flag is set for nedev. This is
working in non dual_mac mode, but in dual_mac - it causes
enabling/disabling ALLMULTI flag for both ports.

Hence fix it by adding additional parameter to cpsw_ale_set_allmulti() to
specify ALE port number for which ALLMULTI has to be enabled and check if
port belongs to VLAN before modifying UNREG_MCAST_FLOOD_MASK.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: ale: use define for host port in cpsw_ale_set_allmulti()
Grygorii Strashko [Fri, 26 Apr 2019 17:12:32 +0000 (20:12 +0300)]
net: ethernet: ti: ale: use define for host port in cpsw_ale_set_allmulti()

Use ALE_PORT_HOST define for host port in cpsw_ale_set_allmulti() instead
of constants.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: ale: fix mcast super setting
Grygorii Strashko [Fri, 26 Apr 2019 17:12:31 +0000 (20:12 +0300)]
net: ethernet: ti: ale: fix mcast super setting

Use correct define ALE_SUPER for ALE Multicast Address Table Entry
Supervisory Packet (SUPER) bit setting instead of ALE_BLOCKED. No issues
were observed till now as it have never been set, but it's going to be used
by new CPSW switch driver.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: cpsw: drop cpsw_tx_packet_submit()
Grygorii Strashko [Fri, 26 Apr 2019 17:12:30 +0000 (20:12 +0300)]
net: ethernet: ti: cpsw: drop cpsw_tx_packet_submit()

Drop unnecessary wrapper function cpsw_tx_packet_submit() which is used
only in one place.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: cpsw: use devm_alloc_etherdev_mqs()
Grygorii Strashko [Fri, 26 Apr 2019 17:12:29 +0000 (20:12 +0300)]
net: ethernet: ti: cpsw: use devm_alloc_etherdev_mqs()

Use devm_alloc_etherdev_mqs() and simplify code.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: cpsw: drop pinctrl_pm_select_default_state call
Grygorii Strashko [Fri, 26 Apr 2019 17:12:28 +0000 (20:12 +0300)]
net: ethernet: ti: cpsw: drop pinctrl_pm_select_default_state call

Drop pinctrl_pm_select_default_state call from probe as default
pinctrl state is set by DD core.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: cpsw: use local var dev in probe
Grygorii Strashko [Fri, 26 Apr 2019 17:12:27 +0000 (20:12 +0300)]
net: ethernet: ti: cpsw: use local var dev in probe

Use local variable struct device *dev in probe to simplify code.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: cpsw: update cpsw_split_res() to accept cpsw_common
Grygorii Strashko [Fri, 26 Apr 2019 17:12:26 +0000 (20:12 +0300)]
net: ethernet: ti: cpsw: update cpsw_split_res() to accept cpsw_common

Update cpsw_split_res() to accept struct cpsw_common instead of
struct net_device to simplify code.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: cpsw: drop CONFIG_TI_CPSW_ALE config option
Grygorii Strashko [Fri, 26 Apr 2019 17:12:25 +0000 (20:12 +0300)]
net: ethernet: ti: cpsw: drop CONFIG_TI_CPSW_ALE config option

All TI drivers CPSW/NETCP can't work without ALE, hence simplify
build of those drivers by always linking cpsw_ale and drop
CONFIG_TI_CPSW_ALE config option.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: cpsw: drop TI_DAVINCI_CPDMA config option
Grygorii Strashko [Fri, 26 Apr 2019 17:12:24 +0000 (20:12 +0300)]
net: ethernet: ti: cpsw: drop TI_DAVINCI_CPDMA config option

Both drivers CPSW and EMAC can't work without CPDMA, hence simplify build
of those drivers by always linking davinci_cpdma and drop TI_DAVINCI_CPDMA
config option.
Note. the davinci_emac driver module was changed to "ti_davinci_emac" to
make build work.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: convert to SPDX license identifiers
Grygorii Strashko [Fri, 26 Apr 2019 17:12:23 +0000 (20:12 +0300)]
net: ethernet: ti: convert to SPDX license identifiers

Replace textual license with SPDX-License-Identifier.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'strict-netlink-validation'
David S. Miller [Sat, 27 Apr 2019 21:07:22 +0000 (17:07 -0400)]
Merge branch 'strict-netlink-validation'

Johannes Berg says:

====================
strict netlink validation

Here's a respin, with the following changes:
 * change message when rejecting unknown attribute types (David Ahern)
 * drop nl80211 patch - I'll apply it separately
 * remove NL_VALIDATE_POLICY - we have a lot of calls to nla_parse()
   that really should be without a policy as it has previously been
   validated - need to find a good way to handle this later
 * include the correct generic netlink change (d'oh, sorry)
====================

Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agogenetlink: optionally validate strictly/dumps
Johannes Berg [Fri, 26 Apr 2019 12:07:31 +0000 (14:07 +0200)]
genetlink: optionally validate strictly/dumps

Add options to strictly validate messages and dump messages,
sometimes perhaps validating dump messages non-strictly may
be required, so add an option for that as well.

Since none of this can really be applied to existing commands,
set the options everwhere using the following spatch:

    @@
    identifier ops;
    expression X;
    @@
    struct genl_ops ops[] = {
    ...,
     {
            .cmd = X,
    +       .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
            ...
     },
    ...
    };

For new commands one should just not copy the .validate 'opt-out'
flags and thus get strict validation.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonetlink: add strict parsing for future attributes
Johannes Berg [Fri, 26 Apr 2019 12:07:30 +0000 (14:07 +0200)]
netlink: add strict parsing for future attributes

Unfortunately, we cannot add strict parsing for all attributes, as
that would break existing userspace. We currently warn about it, but
that's about all we can do.

For new attributes, however, the story is better: nobody is using
them, so we can reject bad sizes.

Also, for new attributes, we need not accept them when the policy
doesn't declare their usage.

David Ahern and I went back and forth on how to best encode this, and
the best way we found was to have a "boundary type", from which point
on new attributes have all possible validation applied, and NLA_UNSPEC
is rejected.

As we didn't want to add another argument to all functions that get a
netlink policy, the workaround is to encode that boundary in the first
entry of the policy array (which is for type 0 and thus probably not
really valid anyway). I put it into the validation union for the rare
possibility that somebody is actually using attribute 0, which would
continue to work fine unless they tried to use the extended validation,
which isn't likely. We also didn't find any in-tree users with type 0.

The reason for setting the "start strict here" attribute is that we
never really need to start strict from 0, which is invalid anyway (or
in legacy families where that isn't true, it cannot be set to strict),
so we can thus reserve the value 0 for "don't do this check" and don't
have to add the tag to all policies right now.

Thus, policies can now opt in to this validation, which we should do
for all existing policies, at least when adding new attributes.

Note that entirely *new* policies won't need to set it, as the use
of that should be using nla_parse()/nlmsg_parse() etc. which anyway
do fully strict validation now, regardless of this.

So in effect, this patch only covers the "existing command with new
attribute" case.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonetlink: re-add parse/validate functions in strict mode
Johannes Berg [Fri, 26 Apr 2019 12:07:29 +0000 (14:07 +0200)]
netlink: re-add parse/validate functions in strict mode

This re-adds the parse and validate functions like nla_parse()
that are now actually strict after the previous rename and were
just split out to make sure everything is converted (and if not
compilation of the previous patch would fail.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonetlink: make validation more configurable for future strictness
Johannes Berg [Fri, 26 Apr 2019 12:07:28 +0000 (14:07 +0200)]
netlink: make validation more configurable for future strictness

We currently have two levels of strict validation:

 1) liberal (default)
     - undefined (type >= max) & NLA_UNSPEC attributes accepted
     - attribute length >= expected accepted
     - garbage at end of message accepted
 2) strict (opt-in)
     - NLA_UNSPEC attributes accepted
     - attribute length >= expected accepted

Split out parsing strictness into four different options:
 * TRAILING     - check that there's no trailing data after parsing
                  attributes (in message or nested)
 * MAXTYPE      - reject attrs > max known type
 * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
 * STRICT_ATTRS - strictly validate attribute size

The default for future things should be *everything*.
The current *_strict() is a combination of TRAILING and MAXTYPE,
and is renamed to _deprecated_strict().
The current regular parsing has none of this, and is renamed to
*_parse_deprecated().

Additionally it allows us to selectively set one of the new flags
even on old policies. Notably, the UNSPEC flag could be useful in
this case, since it can be arranged (by filling in the policy) to
not be an incompatible userspace ABI change, but would then going
forward prevent forgetting attribute entries. Similar can apply
to the POLICY flag.

We end up with the following renames:
 * nla_parse           -> nla_parse_deprecated
 * nla_parse_strict    -> nla_parse_deprecated_strict
 * nlmsg_parse         -> nlmsg_parse_deprecated
 * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
 * nla_parse_nested    -> nla_parse_nested_deprecated
 * nla_validate_nested -> nla_validate_nested_deprecated

Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

For this patch, don't actually add the strict, non-renamed versions
yet so that it breaks compile if I get it wrong.

Also, while at it, make nla_validate and nla_parse go down to a
common __nla_validate_parse() function to avoid code duplication.

Ultimately, this allows us to have very strict validation for every
new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
next patch, while existing things will continue to work as is.

In effect then, this adds fully strict validation for any new command.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonetlink: add NLA_MIN_LEN
Johannes Berg [Fri, 26 Apr 2019 12:07:27 +0000 (14:07 +0200)]
netlink: add NLA_MIN_LEN

Rather than using NLA_UNSPEC for this type of thing, use NLA_MIN_LEN
so we can make NLA_UNSPEC be NLA_REJECT under certain conditions for
future attributes.

While at it, also use NLA_EXACT_LEN for the struct example.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'nla_nest_start'
David S. Miller [Sat, 27 Apr 2019 21:03:44 +0000 (17:03 -0400)]
Merge branch 'nla_nest_start'

Michal Kubecek says:

====================
make nla_nest_start() add NLA_F_NESTED flag

One of the comments in recent review of the ethtool netlink series pointed
out that proposed ethnl_nest_start() helper which adds NLA_F_NESTED to
second argument of nla_nest_start() is not really specific to ethtool
netlink code. That is hard to argue with as closer inspection revealed that
exactly the same helper already exists in ipset code (except it's a macro
rather than an inline function).

Another observation was that even if NLA_F_NESTED flag was introduced in
2007, only few netlink based interfaces set it in kernel generated messages
and even many recently added APIs omit it. That is unfortunate as without
the flag, message parsers not familiar with attribute semantics cannot
recognize nested attributes and do not see message structure; this affects
e.g. wireshark dissector or mnl_nlmsg_fprintf() from libmnl.

This is why I'm suggesting to rename existing nla_nest_start() to different
name (nla_nest_start_noflag) and reintroduce nla_nest_start() as a wrapper
adding NLA_F_NESTED flag. This is implemented in first patch which is
mostly generated by spatch. Second patch drops ipset helper macros which
lose their purpose. Third patch cleans up minor coding style issues found
by checkpatch.pl in first patch.

We could leave nla_nest_start() untouched and simply add a wrapper adding
NLA_F_NESTED but that would probably preserve the state when even most new
code doesn't set the flag.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: fix two coding style issues
Michal Kubecek [Fri, 26 Apr 2019 09:13:12 +0000 (11:13 +0200)]
net: fix two coding style issues

This is a simple cleanup addressing two coding style issues found by
checkpatch.pl in an earlier patch. It's submitted as a separate patch to
keep the original patch as it was generated by spatch.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipset: drop ipset_nest_start() and ipset_nest_end()
Michal Kubecek [Fri, 26 Apr 2019 09:13:09 +0000 (11:13 +0200)]
ipset: drop ipset_nest_start() and ipset_nest_end()

After the previous commit, both ipset_nest_start() and ipset_nest_end() are
just aliases for nla_nest_start() and nla_nest_end() so that there is no
need to keep them.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonetlink: make nla_nest_start() add NLA_F_NESTED flag
Michal Kubecek [Fri, 26 Apr 2019 09:13:06 +0000 (11:13 +0200)]
netlink: make nla_nest_start() add NLA_F_NESTED flag

Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
netlink based interfaces (including recently added ones) are still not
setting it in kernel generated messages. Without the flag, message parsers
not aware of attribute semantics (e.g. wireshark dissector or libmnl's
mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
the structure of their contents.

Unfortunately we cannot just add the flag everywhere as there may be
userspace applications which check nlattr::nla_type directly rather than
through a helper masking out the flags. Therefore the patch renames
nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
are rewritten to use nla_nest_start().

Except for changes in include/net/netlink.h, the patch was generated using
this semantic patch:

@@ expression E1, E2; @@
-nla_nest_start(E1, E2)
+nla_nest_start_noflag(E1, E2)

@@ expression E1, E2; @@
-nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
+nla_nest_start(E1, E2)

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-tls-small-code-cleanup'
David S. Miller [Sat, 27 Apr 2019 20:52:22 +0000 (16:52 -0400)]
Merge branch 'net-tls-small-code-cleanup'

Jakub Kicinski says:

====================
net/tls: small code cleanup

This small patch set cleans up tls (mostly offload parts).
Other than avoiding unnecessary error messages - no functional
changes here.

v2 (Saeed):
 - fix up Review tags;
 - remove the warning on failure completely.
====================

Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/tls: byte swap device req TCP seq no upon setting
Jakub Kicinski [Thu, 25 Apr 2019 19:32:04 +0000 (12:32 -0700)]
net/tls: byte swap device req TCP seq no upon setting

To avoid a sparse warning byteswap the be32 sequence number
before it's stored in the atomic value.  While at it drop
unnecessary brackets and use kernel's u64 type.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/tls: move definition of tls ops into net/tls.h
Jakub Kicinski [Thu, 25 Apr 2019 19:32:03 +0000 (12:32 -0700)]
net/tls: move definition of tls ops into net/tls.h

There seems to be no reason for tls_ops to be defined in netdevice.h
which is included in a lot of places.  Don't wrap the struct/enum
declaration in ifdefs, it trickles down unnecessary ifdefs into
driver code.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/tls: remove old exports of sk_destruct functions
Jakub Kicinski [Thu, 25 Apr 2019 19:32:02 +0000 (12:32 -0700)]
net/tls: remove old exports of sk_destruct functions

tls_device_sk_destruct being set on a socket used to indicate
that socket is a kTLS device one.  That is no longer true -
now we use sk_validate_xmit_skb pointer for that purpose.
Remove the export.  tls_device_attach() needs to be moved.

While at it, remove the dead declaration of tls_sk_destruct().

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet/tls: don't log errors every time offload can't proceed
Jakub Kicinski [Thu, 25 Apr 2019 19:32:01 +0000 (12:32 -0700)]
net/tls: don't log errors every time offload can't proceed

Currently when CONFIG_TLS_DEVICE is set each time kTLS
connection is opened and the offload is not successful
(either because the underlying device doesn't support
it or e.g. it's tables are full) a rate limited error
will be printed to the logs.

There is nothing wrong with failing TLS offload.  SW
path will process the packets just fine, drop the
noisy messages.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'sk-local-storage'
Alexei Starovoitov [Sat, 27 Apr 2019 16:07:49 +0000 (09:07 -0700)]
Merge branch 'sk-local-storage'

Martin KaFai Lau says:

====================
v4:
- Move checks to map_alloc_check in patch 1 (Stanislav Fomichev)
- Refactor BTF encoding macros to test_btf.h at
  a new patch 4 (Stanislav Fomichev)
- Refactor getenv and add print PASS message at the
  end of the test in patch 6 (Yonghong Song)

v3:
- Replace spinlock_types.h with spinlock.h in patch 1
  (kbuild test robot <lkp@intel.com>)

v2:
- Add the "test_maps.h" file in patch 5

This series introduces the BPF sk local storage.  The
details is in the patch 1 commit message.
====================

Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agobpf: Add ene-to-end test for bpf_sk_storage_* helpers
Martin KaFai Lau [Fri, 26 Apr 2019 23:39:54 +0000 (16:39 -0700)]
bpf: Add ene-to-end test for bpf_sk_storage_* helpers

This patch rides on an existing BPF_PROG_TYPE_CGROUP_SKB test
(test_sock_fields.c) to do a TCP end-to-end test on the new
bpf_sk_storage_* helpers.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agobpf: Add BPF_MAP_TYPE_SK_STORAGE test to test_maps
Martin KaFai Lau [Fri, 26 Apr 2019 23:39:52 +0000 (16:39 -0700)]
bpf: Add BPF_MAP_TYPE_SK_STORAGE test to test_maps

This patch adds BPF_MAP_TYPE_SK_STORAGE test to test_maps.
The src file is rather long, so it is put into another dir map_tests/
and compile like the current prog_tests/ does.  Other existing
tests in test_maps can also be re-factored into map_tests/ in the
future.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agobpf: Add verifier tests for the bpf_sk_storage
Martin KaFai Lau [Fri, 26 Apr 2019 23:39:49 +0000 (16:39 -0700)]
bpf: Add verifier tests for the bpf_sk_storage

This patch adds verifier tests for the bpf_sk_storage:
1. ARG_PTR_TO_MAP_VALUE_OR_NULL
2. Map and helper compatibility (e.g. disallow bpf_map_loookup_elem)

It also takes this chance to remove the unused struct btf_raw_data
and uses the BTF encoding macros from "test_btf.h".

Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agobpf: Refactor BTF encoding macro to test_btf.h
Martin KaFai Lau [Fri, 26 Apr 2019 23:39:46 +0000 (16:39 -0700)]
bpf: Refactor BTF encoding macro to test_btf.h

Refactor common BTF encoding macros for other tests to use.
The libbpf may reuse some of them in the future  which requires
some more thoughts before publishing as a libbpf API.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agobpf: Support BPF_MAP_TYPE_SK_STORAGE in bpf map probing
Martin KaFai Lau [Fri, 26 Apr 2019 23:39:44 +0000 (16:39 -0700)]
bpf: Support BPF_MAP_TYPE_SK_STORAGE in bpf map probing

This patch supports probing for the new BPF_MAP_TYPE_SK_STORAGE.
BPF_MAP_TYPE_SK_STORAGE enforces BTF usage, so the new probe
requires to create and load a BTF also.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agobpf: Sync bpf.h to tools
Martin KaFai Lau [Fri, 26 Apr 2019 23:39:42 +0000 (16:39 -0700)]
bpf: Sync bpf.h to tools

This patch sync the bpf.h to tools/.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agobpf: Introduce bpf sk local storage
Martin KaFai Lau [Fri, 26 Apr 2019 23:39:39 +0000 (16:39 -0700)]
bpf: Introduce bpf sk local storage

After allowing a bpf prog to
- directly read the skb->sk ptr
- get the fullsock bpf_sock by "bpf_sk_fullsock()"
- get the bpf_tcp_sock by "bpf_tcp_sock()"
- get the listener sock by "bpf_get_listener_sock()"
- avoid duplicating the fields of "(bpf_)sock" and "(bpf_)tcp_sock"
  into different bpf running context.

this patch is another effort to make bpf's network programming
more intuitive to do (together with memory and performance benefit).

When bpf prog needs to store data for a sk, the current practice is to
define a map with the usual 4-tuples (src/dst ip/port) as the key.
If multiple bpf progs require to store different sk data, multiple maps
have to be defined.  Hence, wasting memory to store the duplicated
keys (i.e. 4 tuples here) in each of the bpf map.
[ The smallest key could be the sk pointer itself which requires
  some enhancement in the verifier and it is a separate topic. ]

Also, the bpf prog needs to clean up the elem when sk is freed.
Otherwise, the bpf map will become full and un-usable quickly.
The sk-free tracking currently could be done during sk state
transition (e.g. BPF_SOCK_OPS_STATE_CB).

The size of the map needs to be predefined which then usually ended-up
with an over-provisioned map in production.  Even the map was re-sizable,
while the sk naturally come and go away already, this potential re-size
operation is arguably redundant if the data can be directly connected
to the sk itself instead of proxy-ing through a bpf map.

This patch introduces sk->sk_bpf_storage to provide local storage space
at sk for bpf prog to use.  The space will be allocated when the first bpf
prog has created data for this particular sk.

The design optimizes the bpf prog's lookup (and then optionally followed by
an inline update).  bpf_spin_lock should be used if the inline update needs
to be protected.

BPF_MAP_TYPE_SK_STORAGE:
-----------------------
To define a bpf "sk-local-storage", a BPF_MAP_TYPE_SK_STORAGE map (new in
this patch) needs to be created.  Multiple BPF_MAP_TYPE_SK_STORAGE maps can
be created to fit different bpf progs' needs.  The map enforces
BTF to allow printing the sk-local-storage during a system-wise
sk dump (e.g. "ss -ta") in the future.

The purpose of a BPF_MAP_TYPE_SK_STORAGE map is not for lookup/update/delete
a "sk-local-storage" data from a particular sk.
Think of the map as a meta-data (or "type") of a "sk-local-storage".  This
particular "type" of "sk-local-storage" data can then be stored in any sk.

The main purposes of this map are mostly:
1. Define the size of a "sk-local-storage" type.
2. Provide a similar syscall userspace API as the map (e.g. lookup/update,
   map-id, map-btf...etc.)
3. Keep track of all sk's storages of this "type" and clean them up
   when the map is freed.

sk->sk_bpf_storage:
------------------
The main lookup/update/delete is done on sk->sk_bpf_storage (which
is a "struct bpf_sk_storage").  When doing a lookup,
the "map" pointer is now used as the "key" to search on the
sk_storage->list.  The "map" pointer is actually serving
as the "type" of the "sk-local-storage" that is being
requested.

To allow very fast lookup, it should be as fast as looking up an
array at a stable-offset.  At the same time, it is not ideal to
set a hard limit on the number of sk-local-storage "type" that the
system can have.  Hence, this patch takes a cache approach.
The last search result from sk_storage->list is cached in
sk_storage->cache[] which is a stable sized array.  Each
"sk-local-storage" type has a stable offset to the cache[] array.
In the future, a map's flag could be introduced to do cache
opt-out/enforcement if it became necessary.

The cache size is 16 (i.e. 16 types of "sk-local-storage").
Programs can share map.  On the program side, having a few bpf_progs
running in the networking hotpath is already a lot.  The bpf_prog
should have already consolidated the existing sock-key-ed map usage
to minimize the map lookup penalty.  16 has enough runway to grow.

All sk-local-storage data will be removed from sk->sk_bpf_storage
during sk destruction.

bpf_sk_storage_get() and bpf_sk_storage_delete():
------------------------------------------------
Instead of using bpf_map_(lookup|update|delete)_elem(),
the bpf prog needs to use the new helper bpf_sk_storage_get() and
bpf_sk_storage_delete().  The verifier can then enforce the
ARG_PTR_TO_SOCKET argument.  The bpf_sk_storage_get() also allows to
"create" new elem if one does not exist in the sk.  It is done by
the new BPF_SK_STORAGE_GET_F_CREATE flag.  An optional value can also be
provided as the initial value during BPF_SK_STORAGE_GET_F_CREATE.
The BPF_MAP_TYPE_SK_STORAGE also supports bpf_spin_lock.  Together,
it has eliminated the potential use cases for an equivalent
bpf_map_update_elem() API (for bpf_prog) in this patch.

Misc notes:
----------
1. map_get_next_key is not supported.  From the userspace syscall
   perspective,  the map has the socket fd as the key while the map
   can be shared by pinned-file or map-id.

   Since btf is enforced, the existing "ss" could be enhanced to pretty
   print the local-storage.

   Supporting a kernel defined btf with 4 tuples as the return key could
   be explored later also.

2. The sk->sk_lock cannot be acquired.  Atomic operations is used instead.
   e.g. cmpxchg is done on the sk->sk_bpf_storage ptr.
   Please refer to the source code comments for the details in
   synchronization cases and considerations.

3. The mem is charged to the sk->sk_omem_alloc as the sk filter does.

Benchmark:
---------
Here is the benchmark data collected by turning on
the "kernel.bpf_stats_enabled" sysctl.
Two bpf progs are tested:

One bpf prog with the usual bpf hashmap (max_entries = 8192) with the
sk ptr as the key. (verifier is modified to support sk ptr as the key
That should have shortened the key lookup time.)

Another bpf prog is with the new BPF_MAP_TYPE_SK_STORAGE.

Both are storing a "u32 cnt", do a lookup on "egress_skb/cgroup" for
each egress skb and then bump the cnt.  netperf is used to drive
data with 4096 connected UDP sockets.

BPF_MAP_TYPE_HASH with a modifier verifier (152ns per bpf run)
27: cgroup_skb  name egress_sk_map  tag 74f56e832918070b run_time_ns 58280107540 run_cnt 381347633
    loaded_at 2019-04-15T13:46:39-0700  uid 0
    xlated 344B  jited 258B  memlock 4096B  map_ids 16
    btf_id 5

BPF_MAP_TYPE_SK_STORAGE in this patch (66ns per bpf run)
30: cgroup_skb  name egress_sk_stora  tag d4aa70984cc7bbf6 run_time_ns 25617093319 run_cnt 390989739
    loaded_at 2019-04-15T13:47:54-0700  uid 0
    xlated 168B  jited 156B  memlock 4096B  map_ids 17
    btf_id 6

Here is a high-level picture on how are the objects organized:

       sk
    ┌──────┐
    │      │
    │      │
    │      │
    │*sk_bpf_storage─────▶ bpf_sk_storage
    └──────┘                 ┌───────┐
                 ┌───────────┤ list  │
                 │           │       │
                 │           │       │
                 │           │       │
                 │           └───────┘
                 │
                 │     elem
                 │  ┌────────┐
                 ├─▶│ snode  │
                 │  ├────────┤
                 │  │  data  │          bpf_map
                 │  ├────────┤        ┌─────────┐
                 │  │map_node│◀─┬─────┤  list   │
                 │  └────────┘  │     │         │
                 │              │     │         │
                 │     elem     │     │         │
                 │  ┌────────┐  │     └─────────┘
                 └─▶│ snode  │  │
                    ├────────┤  │
   bpf_map          │  data  │  │
 ┌─────────┐        ├────────┤  │
 │  list   ├───────▶│map_node│  │
 │         │        └────────┘  │
 │         │                    │
 │         │           elem     │
 └─────────┘        ┌────────┐  │
                 ┌─▶│ snode  │  │
                 │  ├────────┤  │
                 │  │  data  │  │
                 │  ├────────┤  │
                 │  │map_node│◀─┘
                 │  └────────┘
                 │
                 │
                 │          ┌───────┐
     sk          └──────────│ list  │
  ┌──────┐                  │       │
  │      │                  │       │
  │      │                  │       │
  │      │                  └───────┘
  │*sk_bpf_storage───────▶bpf_sk_storage
  └──────┘

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agoMerge branch 'writeable-bpf-tracepoints'
Alexei Starovoitov [Sat, 27 Apr 2019 02:05:44 +0000 (19:05 -0700)]
Merge branch 'writeable-bpf-tracepoints'

Matt Mullins says:

====================
This adds an opt-in interface for tracepoints to expose a writable context to
BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE programs that are attached, while
supporting read-only access from existing BPF_PROG_TYPE_RAW_TRACEPOINT
programs, as well as from non-BPF-based tracepoints.

The initial motivation is to support tracing that can be observed from the
remote end of an NBD socket, e.g. by adding flags to the struct nbd_request
header.  Earlier attempts included adding an NBD-specific tracepoint fd, but in
code review, I was recommended to implement it more generically -- as a result,
this patchset is far simpler than my initial try.

v4->v5:
  * rebased onto bpf-next/master and fixed merge conflicts
  * "tools: sync bpf.h" also syncs comments that have previously changed
    in bpf-next

v3->v4:
  * fixed a silly copy/paste typo in include/trace/events/bpf_test_run.h
    (_TRACE_NBD_H -> _TRACE_BPF_TEST_RUN_H)
  * fixed incorrect/misleading wording in patch 1's commit message,
    since the pointer cannot be directly dereferenced in a
    BPF_PROG_TYPE_RAW_TRACEPOINT
  * cleaned up the error message wording if the prog_tests fail
  * Addressed feedback from Yonghong
    * reject non-pointer-sized accesses to the buffer pointer
    * use sizeof(struct nbd_request) as one-byte-past-the-end in
      raw_tp_writable_reject_nbd_invalid.c
    * use BPF_MOV64_IMM instead of BPF_LD_IMM64

v2->v3:
  * Andrew addressed Josef's comments:
    * C-style commenting in nbd.c
    * Collapsed identical events into a single DECLARE_EVENT_CLASS.
      This saves about 2kB of kernel text

v1->v2:
  * add selftests
    * sync tools/include/uapi/linux/bpf.h
  * reject variable offset into the buffer
  * add string representation of PTR_TO_TP_BUFFER to reg_type_str
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agoselftests: bpf: test writable buffers in raw tps
Matt Mullins [Fri, 26 Apr 2019 18:49:51 +0000 (11:49 -0700)]
selftests: bpf: test writable buffers in raw tps

This tests that:
  * a BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE cannot be attached if it
    uses either:
    * a variable offset to the tracepoint buffer, or
    * an offset beyond the size of the tracepoint buffer
  * a tracer can modify the buffer provided when attached to a writable
    tracepoint in bpf_prog_test_run

Signed-off-by: Matt Mullins <mmullins@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agotools: sync bpf.h
Matt Mullins [Fri, 26 Apr 2019 18:49:50 +0000 (11:49 -0700)]
tools: sync bpf.h

This adds BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE, and fixes up the

error: enumeration value ‘BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE’ not handled in switch [-Werror=switch-enum]

build errors it would otherwise cause in libbpf.

Signed-off-by: Matt Mullins <mmullins@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agonbd: add tracepoints for send/receive timing
Andrew Hall [Fri, 26 Apr 2019 18:49:49 +0000 (11:49 -0700)]
nbd: add tracepoints for send/receive timing

This adds four tracepoints to nbd, enabling separate tracing of payload
and header sending/receipt.

In the send path for headers that have already been sent, we also
explicitly initialize the handle so it can be referenced by the later
tracepoint.

Signed-off-by: Andrew Hall <hall@fb.com>
Signed-off-by: Matt Mullins <mmullins@fb.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agonbd: trace sending nbd requests
Matt Mullins [Fri, 26 Apr 2019 18:49:48 +0000 (11:49 -0700)]
nbd: trace sending nbd requests

This adds a tracepoint that can both observe the nbd request being sent
to the server, as well as modify that request , e.g., setting a flag in
the request that will cause the server to collect detailed tracing data.

The struct request * being handled is included to permit correlation
with the block tracepoints.

Signed-off-by: Matt Mullins <mmullins@fb.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agobpf: add writable context for raw tracepoints
Matt Mullins [Fri, 26 Apr 2019 18:49:47 +0000 (11:49 -0700)]
bpf: add writable context for raw tracepoints

This is an opt-in interface that allows a tracepoint to provide a safe
buffer that can be written from a BPF_PROG_TYPE_RAW_TRACEPOINT program.
The size of the buffer must be a compile-time constant, and is checked
before allowing a BPF program to attach to a tracepoint that uses this
feature.

The pointer to this buffer will be the first argument of tracepoints
that opt in; the pointer is valid and can be bpf_probe_read() by both
BPF_PROG_TYPE_RAW_TRACEPOINT and BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE
programs that attach to such a tracepoint, but the buffer to which it
points may only be written by the latter.

Signed-off-by: Matt Mullins <mmullins@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agobpf, arm64: use more scalable stadd over ldxr / stxr loop in xadd
Daniel Borkmann [Fri, 26 Apr 2019 19:48:22 +0000 (21:48 +0200)]
bpf, arm64: use more scalable stadd over ldxr / stxr loop in xadd

Since ARMv8.1 supplement introduced LSE atomic instructions back in 2016,
lets add support for STADD and use that in favor of LDXR / STXR loop for
the XADD mapping if available. STADD is encoded as an alias for LDADD with
XZR as the destination register, therefore add LDADD to the instruction
encoder along with STADD as special case and use it in the JIT for CPUs
that advertise LSE atomics in CPUID register. If immediate offset in the
BPF XADD insn is 0, then use dst register directly instead of temporary
one.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agobpf, arm64: remove prefetch insn in xadd mapping
Daniel Borkmann [Fri, 26 Apr 2019 19:48:21 +0000 (21:48 +0200)]
bpf, arm64: remove prefetch insn in xadd mapping

Prefetch-with-intent-to-write is currently part of the XADD mapping in
the AArch64 JIT and follows the kernel's implementation of atomic_add.
This may interfere with other threads executing the LDXR/STXR loop,
leading to potential starvation and fairness issues. Drop the optional
prefetch instruction.

Fixes: 85f68fe89832 ("bpf, arm64: implement jiting of BPF_XADD")
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
5 years agoMerge tag 'mac80211-next-for-davem-2019-04-26' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Fri, 26 Apr 2019 20:05:52 +0000 (16:05 -0400)]
Merge tag 'mac80211-next-for-davem-2019-04-26' of git://git./linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Various updates, notably:
 * extended key ID support (from 802.11-2016)
 * per-STA TX power control support
 * mac80211 TX performance improvements
 * HE (802.11ax) updates
 * mesh link probing support
 * enhancements of multi-BSSID support (also related to HE)
 * OWE userspace processing support
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'hns3-next'
David S. Miller [Fri, 26 Apr 2019 16:13:29 +0000 (12:13 -0400)]
Merge branch 'hns3-next'

Huazhong Tan says:

====================
code optimizations & bugfixes for HNS3 driver

This patch-set includes code optimizations and bugfixes for the HNS3
ethernet controller driver.

[patch 1/11 - 3/11] fixes some bugs about the IO path

[patch 4/11 - 6/11] includes some optimization and bugfixes
about mailbox handling

[patch 7/11 - 11/11] adds misc code optimizations and bugfixes.

Change log:
V2->V3: fixes comments from Neil Horman, removes [patch 8/12]
V1->V2: adds modification on [patch 8/12]
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: remove reset after command send failed
Weihang Li [Thu, 25 Apr 2019 12:42:55 +0000 (20:42 +0800)]
net: hns3: remove reset after command send failed

It's meaningless to trigger reset when failed to send command to IMP,
because the failure is usually caused by no authority, illegal command
and so on. When that happened, we just need to return the status code
for further debugging.

Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: prevent double free in hns3_put_ring_config()
Huazhong Tan [Thu, 25 Apr 2019 12:42:54 +0000 (20:42 +0800)]
net: hns3: prevent double free in hns3_put_ring_config()

This patch adds a check for the hns3_put_ring_config() to prevent
double free, and for more readable, move the NULL assignment of
priv->ring_data into the hns3_put_ring_config().

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: extend the loopback state acquisition time
liuzhongzhu [Thu, 25 Apr 2019 12:42:53 +0000 (20:42 +0800)]
net: hns3: extend the loopback state acquisition time

The test results show that the maximum time of hardware return
to mac link state is 500MS.The software needs to set twice the
maximum time of hardware return state (1000MS).

If not modified, the loopback test returns probability failure.

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: fix pause configure fail problem
Huazhong Tan [Thu, 25 Apr 2019 12:42:52 +0000 (20:42 +0800)]
net: hns3: fix pause configure fail problem

When configure pause, current implementation returns directly
after setup PFC without setup BP, which is not sufficient.

So this patch fixes it, only return while setting PFC failed.

Fixes: 44e59e375bf7 ("net: hns3: do not return GE PFC setting err when initializing")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: not reset TQP in the DOWN while VF resetting
Huazhong Tan [Thu, 25 Apr 2019 12:42:51 +0000 (20:42 +0800)]
net: hns3: not reset TQP in the DOWN while VF resetting

Since the hardware does not handle mailboxes and the hardware
reset include TQP reset, so it is unnecessary to reset TQP
in the hclgevf_ae_stop() while doing VF reset. Also it is
unnecessary to reset the remaining TQP when one reset fails.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: use a reserved byte to identify need_resp flag
Huazhong Tan [Thu, 25 Apr 2019 12:42:50 +0000 (20:42 +0800)]
net: hns3: use a reserved byte to identify need_resp flag

This patch uses a reserved byte in the hclge_mbx_vf_to_pf_cmd
to save the need_resp flag, so when PF received the mailbox,
it can use it to decise whether send a response to VF.

For hclge_set_vf_uc_mac_addr(), it should use mbx_need_resp flag
to decide whether send response to VF.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: use atomic_t replace u32 for arq's count
Huazhong Tan [Thu, 25 Apr 2019 12:42:49 +0000 (20:42 +0800)]
net: hns3: use atomic_t replace u32 for arq's count

Since irq handler and mailbox task will both update arq's count,
so arq's count should use atomic_t instead of u32, otherwise
its value may go wrong finally.

Fixes: 07a0556a3a73 ("net: hns3: Changes to support ARQ(Asynchronous Receive Queue)")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: stop sending keep alive msg when VF command queue needs reinit
Huazhong Tan [Thu, 25 Apr 2019 12:42:48 +0000 (20:42 +0800)]
net: hns3: stop sending keep alive msg when VF command queue needs reinit

HCLGEVF_STATE_CMD_DISABLE is more suitable than
HCLGEVF_STATE_RST_HANDLING to stop sending keep alive msg,
since HCLGEVF_STATE_RST_HANDLING only be set when the reset
task is running.

Fixes: c59a85c07e77 ("net: hns3: stop sending keep alive msg to PF when VF is resetting")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: handle the BD info on the last BD of the packet
Yunsheng Lin [Thu, 25 Apr 2019 12:42:47 +0000 (20:42 +0800)]
net: hns3: handle the BD info on the last BD of the packet

The bdinfo handled in hns3_handle_bdinfo is only valid on the
last BD of the current packet, currently the bd info may be handled
based on the first BD if the packet has more than two BDs, which
may cause rx error.

This patch fixes it by using the last BD of the current packet in
hns3_handle_bdinfo.

Also, hns3_set_rx_skb_rss_type has used RSS hash value from the last
BD of the current packet, so remove the same last BD calculation in
hns3_set_rx_skb_rss_type and call it from hns3_handle_bdinfo.

Fixes: e55970950556 ("net: hns3: Add handling of GRO Pkts not fully RX'ed in NAPI poll")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: fix for TX clean num when cleaning TX BD
Yunsheng Lin [Thu, 25 Apr 2019 12:42:46 +0000 (20:42 +0800)]
net: hns3: fix for TX clean num when cleaning TX BD

hns3_desc_unused() returns how many BD have been cleaned, but new
buffer has not been attached to them. The register of
HNS3_RING_RX_RING_FBDNUM_REG returns how many BD need allocating new
buffer to or need to cleaned. So the remaining BD need to be clean
is HNS3_RING_RX_RING_FBDNUM_REG - hns3_desc_unused().

Also, new buffer can not attach to the pending BD when the last BD is
not handled, because memcpy has not been done on the first pending BD.

This patch fixes by subtracting the pending BD num from unused_count
after 'HNS3_RING_RX_RING_FBDNUM_REG - unused_count' is used to calculate
the BD bum need to be clean.

Fixes: e55970950556 ("net: hns3: Add handling of GRO Pkts not fully RX'ed in NAPI poll")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: fix data race between ring->next_to_clean
Yunsheng Lin [Thu, 25 Apr 2019 12:42:45 +0000 (20:42 +0800)]
net: hns3: fix data race between ring->next_to_clean

hns3_clean_tx_ring calls hns3_nic_reclaim_one_desc to clean
buffers and set ring->next_to_clean, then hns3_nic_net_xmit
reuses the cleaned buffers. But there are no memory barriers
when buffers gets recycled, so the recycled buffers can be
corrupted.

This patch uses smp_store_release to update ring->next_to_clean
and smp_load_acquire to read ring->next_to_clean to properly
hand off buffers from hns3_clean_tx_ring to hns3_nic_net_xmit.

Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonfp: implement PCI driver shutdown callback
Dirk van der Merwe [Thu, 25 Apr 2019 03:18:02 +0000 (20:18 -0700)]
nfp: implement PCI driver shutdown callback

Device may be shutdown without the hardware being reinitialized, in
which case we want to ensure we cleanup properly.

This is especially important for kexec with traffic flowing.

The shutdown procedures resembles the remove procedures, so we can reuse
those common tasks.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoNFC: st95hf: remove set but not used variables 'dev, nfcddev'
YueHaibing [Thu, 25 Apr 2019 02:07:20 +0000 (02:07 +0000)]
NFC: st95hf: remove set but not used variables 'dev, nfcddev'

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/nfc/st95hf/core.c: In function 'st95hf_irq_thread_handler':
drivers/nfc/st95hf/core.c:786:26: warning:
 variable 'nfcddev' set but not used [-Wunused-but-set-variable]

drivers/nfc/st95hf/core.c:784:17: warning:
 variable 'dev' set but not used [-Wunused-but-set-variable]

They are never used since introduction in
commit cab47333f0f7 ("NFC: Add STMicroelectronics ST95HF driver")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotipc: remove rcu_read_unlock() left in tipc_udp_recv()
Eric Dumazet [Thu, 25 Apr 2019 00:21:40 +0000 (17:21 -0700)]
tipc: remove rcu_read_unlock() left in tipc_udp_recv()

I forgot to remove one rcu_read_unlock() before a return statement.

Joy of mixing goto and return styles in a function :)

Fixes: 4109a2c3b91e ("tipc: tipc_udp_recv() cleanup vs rcu verbs")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: improve genphy_soft_reset
Heiner Kallweit [Wed, 24 Apr 2019 19:41:06 +0000 (21:41 +0200)]
net: phy: improve genphy_soft_reset

PHY's behave differently when being reset. Some reset registers to
defaults, some don't. Some trigger an autoneg restart, some don't.

So let's also set the autoneg restart bit when resetting. Then PHY
behavior should be more consistent. Clearing BMCR_ISOLATE serves the
same purpose and is borrowed from genphy_restart_aneg.

BMCR holds the speed / duplex settings in fixed mode. Therefore
we may have an issue if a soft reset resets BMCR to its default.
So better call genphy_setup_forced() afterwards in fixed mode.
We've seen no related complaint in the last >10 yrs, so let's
treat it as an improvement.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agousbnet: ipheth: Simplify device detection
Guenter Roeck [Wed, 24 Apr 2019 17:58:24 +0000 (10:58 -0700)]
usbnet: ipheth: Simplify device detection

All Apple products use the same protocol for tethering over USB.
To simplify the code and make it future proof, use
USB_VENDOR_AND_INTERFACE_INFO() instead of
USB_DEVICE_AND_INTERFACE_INFO() to automatically detect and support
all existing and future Apple products using the same interface.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv6: Initialize fib6_result in bpf_ipv6_fib_lookup
David Ahern [Wed, 24 Apr 2019 17:36:06 +0000 (10:36 -0700)]
ipv6: Initialize fib6_result in bpf_ipv6_fib_lookup

fib6_result is not initialized in bpf_ipv6_fib_lookup and potentially
passses garbage to the fib lookup which triggers a KASAN warning:

[  262.055450] ==================================================================
[  262.057640] BUG: KASAN: user-memory-access in fib6_rule_suppress+0x4b/0xce
[  262.059488] Read of size 8 at addr 00000a20000000b0 by task swapper/1/0
[  262.061238]
[  262.061673] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.1.0-rc5+ #56
[  262.063493] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
[  262.065593] Call Trace:
[  262.066277]  <IRQ>
[  262.066848]  dump_stack+0x7e/0xbb
[  262.067764]  kasan_report+0x18b/0x1b5
[  262.069921]  __asan_load8+0x7f/0x81
[  262.070879]  fib6_rule_suppress+0x4b/0xce
[  262.071980]  fib_rules_lookup+0x275/0x2cd
[  262.073090]  fib6_lookup+0x119/0x218
[  262.076457]  bpf_ipv6_fib_lookup+0x39d/0x664
...

Initialize fib6_result to 0.

Fixes: b1d40991506aa ("ipv6: Rename fib6_multipath_select and pass fib6_result")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agocnic: Refactor code and mark expected switch fall-through
Gustavo A. R. Silva [Wed, 24 Apr 2019 16:37:32 +0000 (11:37 -0500)]
cnic: Refactor code and mark expected switch fall-through

In preparation to enabling -Wimplicit-fallthrough, refactor code a
bit and mark switch cases where we are expecting to fall through.

This patch fixes the following warning:

drivers/net/ethernet/broadcom/cnic.c: In function ‘cnic_cm_process_kcqe’:
drivers/net/ethernet/broadcom/cnic.c:4044:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
    opcode = L4_KCQE_OPCODE_VALUE_CLOSE_COMP;
drivers/net/ethernet/broadcom/cnic.c:4050:2: note: here
  case L4_KCQE_OPCODE_VALUE_RESET_RECEIVED:
  ^~~~

Warning level 3 was used: -Wimplicit-fallthrough=3

Notice that, in this particular case, the code comment is modified
in accordance with what GCC is expecting to find.

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agocxgb4/cxgb4vf_main: Mark expected switch fall-through
Gustavo A. R. Silva [Wed, 24 Apr 2019 16:27:42 +0000 (11:27 -0500)]
cxgb4/cxgb4vf_main: Mark expected switch fall-through

In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.

This patch fixes the following warning:

drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c: In function ‘fwevtq_handler’:
drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c:520:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
   cpl = (void *)p;
   ~~~~^~~~~~~~~~~
drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c:524:2: note: here
  case CPL_SGE_EGR_UPDATE: {
  ^~~~

Warning level 3 was used: -Wimplicit-fallthrough=3

Notice that, in this particular case, the code comment is modified
in accordance with what GCC is expecting to find.

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agowimax/i2400m/control: Mark expected switch fall-through
Gustavo A. R. Silva [Wed, 24 Apr 2019 16:20:16 +0000 (11:20 -0500)]
wimax/i2400m/control: Mark expected switch fall-through

In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.

This patch fixes the following warning:

In file included from drivers/net/wimax/i2400m/debug-levels.h:30,
                 from drivers/net/wimax/i2400m/control.c:86:
drivers/net/wimax/i2400m/control.c: In function ‘i2400m_report_tlv_system_state’:
./include/linux/wimax/debug.h:200:4: warning: this statement may fall through [-Wimplicit-fallthrough=]
 do {         \
    ^
./include/linux/wimax/debug.h:404:36: note: in expansion of macro ‘_d_printf’
 #define d_printf(l, _dev, f, a...) _d_printf(l, "", _dev, f, ## a)
                                    ^~~~~~~~~
drivers/net/wimax/i2400m/control.c:354:3: note: in expansion of macro ‘d_printf’
   d_printf(1, dev, "entering BS-negotiated idle mode\n");
   ^~~~~~~~
drivers/net/wimax/i2400m/control.c:355:2: note: here
  case I2400M_SS_DISCONNECTING:
  ^~~~

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoamd-xgbe: Mark expected switch fall-throughs
Gustavo A. R. Silva [Wed, 24 Apr 2019 16:08:24 +0000 (11:08 -0500)]
amd-xgbe: Mark expected switch fall-throughs

In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.

This patch fixes the following warnings:

In file included from drivers/net/ethernet/amd/xgbe/xgbe-drv.c:129:
drivers/net/ethernet/amd/xgbe/xgbe-drv.c: In function ‘xgbe_set_hwtstamp_settings’:
drivers/net/ethernet/amd/xgbe/xgbe-common.h:1392:9: warning: this statement may fall through [-Wimplicit-fallthrough=]
  (_var) |= (((_val) & ((0x1 << (_width)) - 1)) << (_index)); \
  ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/amd/xgbe/xgbe-common.h:1419:2: note: in expansion of macro ‘SET_BITS’
  SET_BITS((_var),      \
  ^~~~~~~~
drivers/net/ethernet/amd/xgbe/xgbe-drv.c:1614:3: note: in expansion of macro ‘XGMAC_SET_BITS’
   XGMAC_SET_BITS(mac_tscr, MAC_TSCR, TSVER2ENA, 1);
   ^~~~~~~~~~~~~~
drivers/net/ethernet/amd/xgbe/xgbe-drv.c:1616:2: note: here
  case HWTSTAMP_FILTER_PTP_V1_L4_EVENT:
  ^~~~
In file included from drivers/net/ethernet/amd/xgbe/xgbe-drv.c:129:
drivers/net/ethernet/amd/xgbe/xgbe-common.h:1392:9: warning: this statement may fall through [-Wimplicit-fallthrough=]
  (_var) |= (((_val) & ((0x1 << (_width)) - 1)) << (_index)); \
  ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/amd/xgbe/xgbe-common.h:1419:2: note: in expansion of macro ‘SET_BITS’
  SET_BITS((_var),      \
  ^~~~~~~~
drivers/net/ethernet/amd/xgbe/xgbe-drv.c:1625:3: note: in expansion of macro ‘XGMAC_SET_BITS’
   XGMAC_SET_BITS(mac_tscr, MAC_TSCR, TSVER2ENA, 1);
   ^~~~~~~~~~~~~~
drivers/net/ethernet/amd/xgbe/xgbe-drv.c:1627:2: note: here
  case HWTSTAMP_FILTER_PTP_V1_L4_SYNC:
  ^~~~
In file included from drivers/net/ethernet/amd/xgbe/xgbe-drv.c:129:
drivers/net/ethernet/amd/xgbe/xgbe-common.h:1392:9: warning: this statement may fall through [-Wimplicit-fallthrough=]
  (_var) |= (((_val) & ((0x1 << (_width)) - 1)) << (_index)); \
  ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/amd/xgbe/xgbe-common.h:1419:2: note: in expansion of macro ‘SET_BITS’
  SET_BITS((_var),      \
  ^~~~~~~~
drivers/net/ethernet/amd/xgbe/xgbe-drv.c:1636:3: note: in expansion of macro ‘XGMAC_SET_BITS’
   XGMAC_SET_BITS(mac_tscr, MAC_TSCR, TSVER2ENA, 1);
   ^~~~~~~~~~~~~~
drivers/net/ethernet/amd/xgbe/xgbe-drv.c:1638:2: note: here
  case HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ:
  ^~~~

Warning level 3 was used: -Wimplicit-fallthrough=3

Notice that, in this particular case, the code comments are modified
in accordance with what GCC is expecting to find.

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: socket: Fix missing break in switch statement
Gustavo A. R. Silva [Wed, 24 Apr 2019 15:31:24 +0000 (10:31 -0500)]
net: socket: Fix missing break in switch statement

Add missing break statement in order to prevent the code from falling
through to cases SIOCGSTAMP_NEW and SIOCGSTAMPNS_NEW.

This bug was found thanks to the ongoing efforts to enable
-Wimplicit-fallthrough.

Fixes: 0768e17073dc ("net: socket: implement 64-bit timestamps")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 's390-qeth-cleanups'
David S. Miller [Fri, 26 Apr 2019 15:14:06 +0000 (11:14 -0400)]
Merge branch 's390-qeth-cleanups'

Julian Wiedmann says:

====================
s390/qeth: updates 2019-04-25

please apply one more patch series for qeth to net-next. Nothing special,
just a bunch of cleanups.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agos390/qeth: trust non-IP cast type in qeth_l3_fill_header()
Julian Wiedmann [Thu, 25 Apr 2019 16:26:01 +0000 (18:26 +0200)]
s390/qeth: trust non-IP cast type in qeth_l3_fill_header()

When building the L3 HW header for non-IP packets, trust the cast type
that was passed as parameter. qeth_l3_get_cast_type() has most likely
also used h_dest to determine the cast type, so we get the same
result, and can remove that duplicated code.
In the unlikely case that we would get a _different_ cast type, then
that's based off a route lookup and should be considered authoritative.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agos390/qeth: extract helper to determine L2 cast type
Julian Wiedmann [Thu, 25 Apr 2019 16:26:00 +0000 (18:26 +0200)]
s390/qeth: extract helper to determine L2 cast type

This de-duplicates the L2 and L3 cast-type code, and makes the L2 code
a bit more robust by removing the fragile assumption that skb->data
always points to the Ethernet Header. This would break in code paths
where we pushed the HW header onto the skb.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agos390/qeth: cache max number of available buffer elements
Julian Wiedmann [Thu, 25 Apr 2019 16:25:59 +0000 (18:25 +0200)]
s390/qeth: cache max number of available buffer elements

The QETH_MAX_BUFFER_ELEMENTS() macro effectively returns a constant
value. To avoid some redundant pointer chasing and computations in the
xmit hot path, cache this value in the queue struct.

Take this as opportunity to shrink some of the queue struct's fields to
their appropriate value range, slightly reducing its total size.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agos390/qeth: don't clear Output buffers on every queue init
Julian Wiedmann [Thu, 25 Apr 2019 16:25:58 +0000 (18:25 +0200)]
s390/qeth: don't clear Output buffers on every queue init

On the first initialization of a queue, its Output Buffers are in a
clean state with no attached resources. On every subsequent
initialization, qeth_l?_stop_card() has previously put them in a clean
state via qeth_drain_output_queues(). So the call to
qeth_clear_output_buffer() is redundant and can be removed.

While at it, move the initialization of the queue's card pointer into
the queue allocation. It never changes afterwards.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agos390/qeth: use IS_* helpers for checking device type
Julian Wiedmann [Thu, 25 Apr 2019 16:25:57 +0000 (18:25 +0200)]
s390/qeth: use IS_* helpers for checking device type

We have helper macros for all possible device types, replace all
remaining open-coded accesses to the type fields.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agos390/qeth: clean up stale buffer state documentation
Julian Wiedmann [Thu, 25 Apr 2019 16:25:56 +0000 (18:25 +0200)]
s390/qeth: clean up stale buffer state documentation

We don't keep track of Input Buffer states, so remove the comments that
make it sound like the qeth_qdio_buffer_states enum applies to
Input Buffers.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>