git.openwrt.org Git - openwrt/staging/blogic.git/log

net/mlx5e: IPSec, Innova IPSec offload infrastructure

Add Innova IPSec ESP crypto offload configuration paths.
Detect Innova IPSec device and set the NETIF_F_HW_ESP flag.
Configure Security Associations using the API introduced in a previous
patch.

Add Software-parser hardware descriptor layout
Software-Parser (swp) is a hardware feature in ConnectX which allows the
host software to specify protocol header offsets in the TX path, thus
overriding the hardware parser.
This is useful for protocols that the ASIC may not be able to parse on
its own.

Note that due to inline metadata, XDP is not supported in Innova IPSec.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Accel, Add IPSec acceleration interface

Add routines for manipulating the hardware IPSec SA database (SADB).

In Innova IPSec, a Security Association (SA) is added or deleted
via a command message over the SBU connection.
The HW then sends a response message over the same connection.

Add implementation for Innova IPSec (FPGA-based) hardware.

These routines will be used by the IPSec offload support in a later patch
However they may also be used by others such as RDMA and RoCE IPSec.

mlx5/accel is a middle acceleration layer to allow mlx5e and other ULPs
to work directly with mlx5_core rather than Innova FPGA or other mlx5
acceleration providers.

In this patchset we add Innova IPSec support and mlx5/accel delegates
IPSec offloads to Innova routines.

In the future, when IPSec/TLS or any other acceleration gets integrated
into ConnectX chip, mlx5/accel layer will provide the integrated
acceleration, rather than the Innova one.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: FPGA, Add SBU infrastructure

Add interface to initialize and interact with Innova FPGA SBU
connections.
A client driver may use these functions to set up a high-speed DMA
connection with its SBU hardware logic, and send/receive messages
over this connection.

A later patch in this patchset will make use of these functions for
Innova IPSec offload in mlx5 Ethernet driver.

Add commands to retrieve Innova FPGA SBU capabilities, and to
read/write Innova FPGA configuration space registers and memory,
over internal I2C.

At high level, the FPGA configuration space is divided such:
0x00000000 - 0x007fffff is reserved for the SBU
0x00800000 - 0xffffffff is reserved for the Shell
0x400000000 - ... is DDR memory

A later patchset will add support for accessing FPGA CrSpace and memory
over a high-speed connection. This is the reason for the ACCESS_TYPE
enumeration, which currently only supports I2C.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: FPGA, Add SBU bypass and reset flows

The Innova FPGA includes shell hardware and Sandbox-Unit (SBU) hardware.
The shell hardware is handled by mlx5_core itself, while the SBU is
handled by a client driver.

Reset the SBU to a well-known initial state when initializing a new
device, and set the FPGA to bypass mode when uninitializing a device.
This allows the client driver to assume that its device has been
reset when a new device is detected.

During SBU reset, the FPGA is put into SBU-bypass mode. In this mode
packets do not pass through the SBU, so it cannot affect the network
data stream at all.

A factory-image does not have an SBU, so skip these flows.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: FPGA, Add high-speed connection routines

An FPGA high-speed connection has two endpoints, an FPGA QP and a
ConnectX QP.
Add library routines to create and connect the endpoints of an
FPGA high-speed connection.

These routines allow creating and interacting with both types of
connections: Shell and Sandbox Unit (SBU).

Shell connection provides an interface to the FPGA's address space,
which includes the configuration space and the DDR.
Use of the shell connection will be introduced in a later patchset.

SBU connection provides a command and/or data interface to the
application-specific logic within the FPGA.
Use of the SBU connection will be introduced in a later patch in
this patchset.

Some struct definitions are added to a new header file sdk.h, which
will be extended in later patches in the patchset.
This header file will contain the in-kernel FPGA client driver API.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: FPGA, Add FW commands for FPGA QPs

The FPGA QP is a high-bandwidth communication channel between the host
CPU and the FPGA device. It allows performing DMA operations between
host memory and the FPGA logic via the ConnectX chip.

Add ConnectX FW commands which create and manipulate FPGA QPs.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: FPGA, Move FPGA init/cleanup to init_once

The FPGA init and cleanup routines should be called just once per
device.
Move them to the init_once and cleanup_once routines.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Add QP WQ support

A QP in ConnectX is a concatenation of RQ and SQ which share a QP-number
and work together.
Add support for allocating and managing the work-queue buffer for a QP, in
a similar way to how SQs and RQs are already supported.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Make get_cqe routine not ethernet-specific

Move mlx5e_get_cqe routine to wq.h and rename it to
mlx5_cqwq_get_cqe.

This allows it to be used by other CQ users outside of the
ethernet driver code.

A later patch in this patchset will make use of it from
FPGA code for the FPGA high-speed connection.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

IB/mlx5: Respect mlx5_core reserved GIDs

Reserved gids are taken by the mlx5_core, report smaller GID table
size to IB core.

Set mlx5_query_roce_port's return value back to int. In case of
error, return an indication. This rolls back some of the change
in commit 50f22fd8ecf9 ("IB/mlx5: Set mlx5_query_roce_port's return value to void")

Change set_roce_addr to use gid_set function, instead of directly
sending the command.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Add support for multiple RoCE enable

Previously, only mlx5_ib enabled RoCE on the port, but FPGA needs it as
well.
Add support for counting number of enables, so that FPGA and IB can work
in parallel and independently.
Program the HW to enable RoCE on the first enable call, and program to
disable RoCE on the last disable call.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Add reserved-gids support

Reserved GIDs are entries in the GID table in use by the mlx5_core
and its submodules (e.g. FPGA, SRIOV, E-Swtich, netdev).
The entries are reserved at the high indexes of the GID table.

A mlx5 submodule may reserve a certain amount of GIDs for its own use
during the load sequence by calling mlx5_core_reserve_gids, and must
also take care to un-reserve these GIDs when it closes.
Reservation is only allowed during the load sequence and before any
interfaces (e.g. mlx5_ib or mlx5_en) are up.

After reservation, a submodule may call mlx5_core_reserved_gid_alloc/
free to allocate entries from the reserved GIDs pool.

Reserve a GID table entry for every supported FPGA QP.

A later patch in the patchset will remove them from being reported to
IB core.
Another such patch will make use of these for FPGA QPs in Innova NIC.

Added lib/mlx5.h to serve as a library for mlx5 submodlues, and to
expose only public mlx5 API, more mlx5 library files will be added in
future submissions.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Set interface flags before cleanup in unload_one

In load_one, the interface flags are changed from down to up,
only after initializing the interfaces.
In unload_one, the flags are changed from up to down before the
interface cleanup.

Change the cleanup order to be opposite to initialization order.

This fixes flag consistency between init and cleanup.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx4: fix spelling mistake: "coalesing" -> "coalescing"

Trivial fix to spelling mistake in en_dbg debug message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-add-netlink_ext_ack-support-to-rtnl_link_ops'

Matthias Schiffer says:

====================
net: add netlink_ext_ack support to rtnl_link_ops

Same changes as http://patchwork.ozlabs.org/patch/780351/ , split into
separate patches for each rtnl_link_ops field as requested.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: add netlink_ext_ack argument to rtnl_link_ops.slave_validate

Add support for extended error reporting.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add netlink_ext_ack argument to rtnl_link_ops.slave_changelink

Add support for extended error reporting.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add netlink_ext_ack argument to rtnl_link_ops.validate

Add support for extended error reporting.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add netlink_ext_ack argument to rtnl_link_ops.changelink

Add support for extended error reporting.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add netlink_ext_ack argument to rtnl_link_ops.newlink

Add support for extended error reporting.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: macb: add fixed-link node support

In case the MACB is directly connected to a
non-mdio PHY/device, it should be possible to provide
a fixed link configuration in the DT.

Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'wireless-drivers-next-for-davem-2017-06-25' of git://git./linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 4.13

New features and bug fixes to quite a few different drivers, but
nothing really special standing out.

What makes me happy that we have now more vendors actively
contributing to upstream drivers. In this pull request we have patches
from Broadcom, Intel, Qualcomm, Realtek and Redpine Signals, and I
still have patches from Marvell and Quantenna pending in patchwork. Now
that's something comparing to how things looked 11 years ago in Jeff
Garzik's "State of the Union: Wireless" email:

https://lkml.org/lkml/2006/1/5/671

Major changes:

wil6210

* add low level RF sector interface via nl80211 vendor commands

* add module parameter ftm_mode to load separate firmware for factory
testing

* support devices with different PCIe bar size

* add support for PCIe D3hot in system suspend

* remove ioctl interface which should not be in a wireless driver

ath10k

* go back to using dma_alloc_coherent() for firmware scratch memory

* add per chain RSSI reporting

brcmfmac

* add support multi-scheduled scan

* add scheduled scan support for specified BSSIDs

* add support for brcm43430 revision 0

wlcore

* add wil1285 compatible

rsi

* add RS9113 USB support

iwlwifi

* FW API documentation improvements (for tools and htmldoc)

* continuing work for the new A000 family

* bump the maximum supported FW API to 31

* improve the differentiation between 8000, 9000 and A000 families
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sctp-RFC-4960-Errata-fixes'

Marcelo Ricardo Leitner says:

====================
sctp: RFC 4960 Errata fixes

This patchset contains fixes for 4 Errata topics from
https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01
Namely, sections:
3.12. Order of Adjustments of partial_bytes_acked and cwnd
3.22. Increase of partial_bytes_acked in Congestion Avoidance
3.26. CWND Increase in Congestion Avoidance Phase
3.27. Refresh of cwnd and ssthresh after Idle Period

Tests performed with netperf using net namespaces, with drop rates at
0%, 0.5% and 1% by netem, IPv4 and IPv6, 10 runs for each combination.
I couldn't spot differences on the stats. With and without these patches
the results vary in a similar way in terms of throughput and
retransmissions.

Tests with 20ms delay and 20ms delay + drops at 0.5% and 1% also had
results in a similar way, no noticeable difference.

Looking at cwnd, it was possible to notice slightly lower values being
used while still sustaining same throughput profile.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: adjust ssthresh when transport is idle

RFC 4960 Errata 3.27 identifies that ssthresh should be adjusted to cwnd
because otherwise it could cause the transport to lock into congestion
avoidance phase specially if ssthresh was previously reduced by some
packet drop, leading to poor performance.

The Errata says to adjust ssthresh to cwnd only once, though the same
goal is achieved by updating it every time we update cwnd too. The
caveat is that we could take longer to get back up to speed but that
should be compensated by the fact that we don't adjust on RTO basis (as
RFC says) but based on Heartbeats, which are usually way longer.

See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.27
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: adjust cwnd increase in Congestion Avoidance phase

RFC4960 Errata 3.26 identified that at the same time RFC4960 states that
cwnd should never grow more than 1*MTU per RTT, Section 7.2.2 was
underspecified and as described could allow increasing cwnd more than
that.

This patch updates it so partial_bytes_acked is maxed to cwnd if
flight_size doesn't reach cwnd, protecting it from such case.

See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.26
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: allow increasing cwnd regardless of ctsn moving or not

As per RFC4960 Errata 3.22, this condition is not needed anymore as it
could cause the partial_bytes_acked to not consider the TSNs acked in
the Gap Ack Blocks although they were received by the peer successfully.

This patch thus drops the check for new Cumulative TSN Ack Point,
leaving just the flight_size < cwnd one.

See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.22
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: update order of adjustments of partial_bytes_acked and cwnd

RFC4960 Errata 3.12 says RFC4960 is unclear about the order of
adjustments applied to partial_bytes_acked and cwnd in the congestion
avoidance phase, and that the actual order should be:
partial_bytes_acked is reset to (partial_bytes_acked - cwnd). Next, cwnd
is increased by MTU.

We were first increasing cwnd, and then subtracting the new value pba,
which leads to a different result as pba is smaller than what it should
and could cause cwnd to not grow as much.

See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.12
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Remove ndo_dfwd_start_xmit

Looks like commit f663dd9aaf9e ("net: core: explicitly select a txq before doing l2 forwarding")
has removed the need for this dedicated xmit function [it even explicitly
states so in its commit log message] but it hasn't removed the definition
of the ndo.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
CC: Jason Wang <jasowang@redhat.com>
CC: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'qcom-emac-various-minor-improvements'

Timur Tabi says:

====================
net: qcom/emac: various minor improvements

A collection of minor fixes and features to the Qualcomm Technologies
EMAC network driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: qcom/emac: add support for emulation systems

On emulation systems, the EMAC's internal PHY ("SGMII") is not present,
but is not needed for network functionality. So just display a warning
message and ignore the SGMII.

Tested-by: Philip Elcan <pelcan@codeaurora.org>
Tested-by: Adam Wallis <awallis@codeaurora.org>
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qcom/emac: do not reset the EMAC during initialization

On ACPI systems, the driver depends on firmware pre-initializing the
EMAC because we don't have access to the clocks, and the EMAC has specific
clock programming requirements. Therefore, we don't want to reset the
EMAC while we are completing the initialization.

Tested-by: Richard Ruigrok <rruigrok@codeaurora.org>
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qcom/emac: add shutdown function

The shutdown function halts all DMA and interrupts, so that all
operations are discontinued when the system shuts down, e.g. via
kexec or a forced reboot.

Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

af_iucv: Move sockaddr length checks to before accessing sa_family in bind and connect handlers

Verify that the caller-provided sockaddr structure is large enough to
contain the sa_family field, before accessing it in bind() and connect()
handlers of the AF_IUCV socket. Since neither syscall enforces a minimum
size of the corresponding memory region, very short sockaddrs (zero or
one byte long) result in operating on uninitialized memory while
referencing .sa_family.

Fixes: 52a82e23b9f2 ("af_iucv: Validate socket address length in iucv_sock_bind()")
Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com>
[jwi: removed unneeded null-check for addr]
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/iucv: improve endianness handling

Use proper endianness conversion for an skb protocol assignment. Given
that IUCV is only available on big endian systems (s390), this simply
avoids an endianness warning reported by sparse.

Signed-off-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: fix error code in mv88e6390_serdes_power()

We're accidentally returning the wrong variable. "cmode" is
uninitialized at this point so it causes a static checker warning.

Fixes: 6335e9f2446b ("net: dsa: mv88e6xxx: mv88e6390X SERDES support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'nfp-add-flower-app-with-representors'

Simon Horman says:

====================
nfp: add flower app with representors

this series adds a flower app to the NFP driver.
It initialises four types of netdevs:

* PF netdev - lower-device for communication of packets to device
* PF representor netdev
* VF representor netdevs
* Phys port representor netdevs

The PF netdev acts as a lower-device which sends and receives packets to
and from the firmware. The representors act as upper-devices. For TX
representors attach a metadata dst to the skb which is used by the PF
netdev to prepend metadata to the packet before forwarding the firmware. On
RX the PF netdev looks up the representor based on the prepended metadata
received from the firmware and forwards the skb to the representor after
removing the metadata.

Control queues are used to send and receive control messages which are
used to communicate configuration information with the firmware. These
are in separate vNIC to the queues belonging to the PF netdev. The control
queues are not exposed to use-space via a netdev or any other means.

The first 9 patches of this series provide app-independent infrastructure
to instantiate representors and the remaining 3 patches provide an app
which uses this infrastructure.

As the name implies this app is targeted at providing offload of TC flower.
Flower offload - allowing classifiers to be attached to representor netdevs
- is intended to be provided by follow-up patches at which point it will
become the dominant feature of the app.

Minor changes since v2 noted in changelogs of individual patches.
Review of v1 and v2 of this patchset have been addressed either
through discussion on-list or changes in this patchset.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: add VF and PF representors to flower app

Initialise VF and PF representors in flower app.

Based in part on work by Benjamin LaHaise, Bert van Leeuwen and
Jakub Kicinski.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: add flower app

Add app for flower offload. At this point the PF netdev and phys port
representor netdevs are initialised. Follow-up work will add support for
VF and PF representors and beyond that offloading the flower classifier.

Based in part on work by Benjamin LaHaise and Bert van Leeuwen.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: add support for control messages for flower app

In preparation for adding a new flower app - targeted at offloading
the flower classifier - provide support for control message that it will
use to communicate with the NFP.

Based in part on work by Bert van Leeuwen.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: add support for tx/rx with metadata portid

Allow tx/rx with metadata port id. This will be used for tx/rx of
representor netdevs acting as upper-devices while a pf netdev acts
as a lower-device.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: provide nfp_port to of nfp_net_get_mac_addr()

Provide port rather than vNIC as parameter of nfp_net_get_mac_addr.
This is to allow this function to be used by representor netdevs where
a vNIC may have more than one physical port none of which are associated
with the vNIC.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: app callbacks for SRIOV

Add app-callbacks for app-specific initialisation of SRIOV.

Disabling SRIOV is brought forward in nfp_pci_remove()
so that nfp_app_sriov_disable is called while the app still exists.

This is intended to be used to implement representor netdevs for virtual
ports.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: add stats and xmit helpers for representors

Provide helpers for stats and xmit on representor netdevs.

Parts based on work by Bert van Leeuwen, Benjamin LaHaise and
Jakub Kicinski.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: general representor implementation

Provide infrastructure to create and destroy representors of a given type.

Parts based on work by Bert van Leeuwen, Benjamin LaHaise,
and Jakub Kicinski.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: map mac_stats and vf_cfg BARs

If present map mac_stats and vf_cfg BARs. These will be used by
representor netdevs to read statistics for phys port and vf representors.

Also provide defines describing the layout of the mac_stats area.
Similar defines are already present for the cf_cfg area.

Based in part on work by Jakub Kicinski.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: move physical port init into a helper

Move MAC/PHY port init into a helper to make it easier to reuse
it in the representor code.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: devlink add support for getting eswitch mode

Add app callback for reporting eswitch mode. Non-SRIOV apps
should not implement this callback, nfp_app code will then
respond with -EOPNOTSUPP.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: store port/representator id in metadata_dst

Switches and modern SR-IOV enabled NICs may multiplex traffic from Port
representators and control messages over single set of hardware queues.
Control messages and muxed traffic may need ordered delivery.

Those requirements make it hard to comfortably use TC infrastructure today
unless we have a way of attaching metadata to skbs at the upper device.
Because single set of queues is used for many netdevs stopping TC/sched
queues of all of them reliably is impossible and lower device has to
retreat to returning NETDEV_TX_BUSY and usually has to take extra locks on
the fastpath.

This patch attempts to enable port/representative devs to attach metadata
to skbs which carry port id. This way representatives can be queueless and
all queuing can be performed at the lower netdev in the usual way.

Traffic arriving on the port/representative interfaces will be have
metadata attached and will subsequently be queued to the lower device for
transmission. The lower device should recognize the metadata and translate
it to HW specific format which is most likely either a special header
inserted before the network headers or descriptor/metadata fields.

Metadata is associated with the lower device by storing the netdev pointer
along with port id so that if TC decides to redirect or mirror the new
netdev will not try to interpret it.

This is mostly for SR-IOV devices since switches don't have lower netdevs
today.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'phy-internal'

Florian Fainelli says:

====================
net: phy: Support "internal" PHY interface

This makes the "internal" phy-mode property generally available and
documented and this allows us to remove some custom parsing code
we had for bcmgenet and bcm_sf2 which both used that specific value.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: bcm_sf2: Remove special handling of "internal" phy-mode

The PHY library now supports an "internal" phy-mode, thus making our
custom parsing code now unnecessary.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: Remove special handling of "internal" phy-mode

The PHY library now supports an "internal" phy-mode, thus making our
custom parsing code now unnecessary.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Support "internal" PHY interface

Now that the Device Tree binding has been updated, update the PHY
library phy_interface_t and phy_modes to support the "internal" PHY
interface type.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: Add "internal" as a valid 'phy-mode' property

A number of Ethernet MACs have internal Ethernet PHYs and the internal
wiring makes it so that this knowledge needs to be available using the
standard 'phy-mode' property.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mlx5-updates-2017-06-23' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2017-06-23

This series provides some updates to the mlx5 core and netdevice drivers.

Three patches from Tariq, Introduces page reuse mechanism in non-Striding
RQ RX datapath, we allow the the RX descriptor to reuse its allocated page
as much as it could, until the page is fully consumed. RX page reuse
reduces the stress on page allocator and improves RX performance especially
with high speeds (100Gb/s).

Next four patches of the series from Or allows to offload tc flower matching
on ttl/hoplimit and header re-write of hoplimit.

The rest of the series from Yotam and Or enhances mlx5 to support FW flashing
through the mlxfw module, in a similar manner done by the mlxsw driver.
Currently, only ethtool based flashing is implemented, where both Eth and IB ports
are supported.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Use Firmware params to get buffer-group map

Buffer group mappings can be obtained using FW_PARAMs cmd for newer FW.

Since some of the bg_maps are obtained in atomic context, created another
t4_query_params_ns(), that wont sleep when awaiting mbox cmd completion.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Update T6 Buffer Group and Channel Mappings

We were using t4_get_mps_bg_map() for both t4_get_port_stats()
to determine which MPS Buffer Groups to report statistics on for a given
Port, and also for t4_sge_alloc_rxq() to provide a TP Ingress Channel
Congestion Map. For T4/T5 these are actually the same values (because they
are ~somewhat~ related), but for T6 they should return different values
(T6 has Port 0 associated with MPS Buffer Group 0 (with MPS Buffer Group 1
silently cascading off) and Port 1 is associated with MPS Buffer Group 2
(with 3 cascading off)).

Based on the original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tls: return -EFAULT if copy_to_user() fails

The copy_to_user() function returns the number of bytes remaining but we
want to return -EFAULT here.

Fixes: 3c4d7559159b ("tls: kernel TLS support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2017-06-23

1) Use memdup_user to spmlify xfrm_user_policy.
   From Geliang Tang.

2) Make xfrm_dev_register static to silence a sparse warning.
   From Wei Yongjun.

3) Use crypto_memneq to check the ICV in the AH protocol.
   From Sabrina Dubroca.

4) Remove some unused variables in esp6.
   From Stephen Hemminger.

5) Extend XFRM MIGRATE to allow to change the UDP encapsulation port.
   From Antony Antony.

6) Include the UDP encapsulation port to km_migrate announcements.
   From Antony Antony.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ena-new-features-and-improvements'

Netanel Belgazal says:

====================
net: update ena ethernet driver to version 1.2.0

This patchset contains some new features/improvements that were added
to the ENA driver to increase its robustness and are based on
experience of wide ENA deployment.

Change log:

V2:
* Remove patch that add inline to C-file static function (contradict coding style).
* Remove patch that moves MTU parameter validation in ena_change_mtu() instead of
using the network stack.
* Use upper_32_bits()/lower_32_bits() instead of casting.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: update ena driver to version 1.2.0

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: update driver's rx drop statistics

rx drop counter is reported by the device in the keep-alive
event.
update the driver's counter with the device counter.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: use lower_32_bits()/upper_32_bits() to split dma address

In ena_com_mem_addr_set(), use the above functions to split dma address
to the lower 32 bits and the higher 16 bits.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: separate skb allocation to dedicated function

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: use napi_schedule_irqoff when possible

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: allow the driver to work with small number of msix vectors

Current driver tries to allocate msix vectors as the number of the
negotiated io queues. (with another msix vector for management).
If pci_alloc_irq_vectors() fails, the driver aborts the probe
and the ENA network device is never brought up.

With this patch, the driver's logic will reduce the number of IO
queues to the number of allocated msix vectors (minus one for management)
instead of failing probe().

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: add support for out of order rx buffers refill

ENA driver post Rx buffers through the Rx submission queue
for the ENA device to fill them with receive packets.
Each Rx buffer is marked with req_id in the Rx descriptor.

Newer ENA devices could consume the posted Rx buffer in out of order,
and as result the corresponding Rx completion queue will have Rx
completion descriptors with non contiguous req_id(s)

In this change the driver holds two rings.
The first ring (called free_rx_ids) is a mapping ring.
It holds all the unused request ids.
The values in this ring are from 0 to ring_size -1.

When the driver wants to allocate a new Rx buffer it uses the head of
free_rx_ids and uses it's value as the index for rx_buffer_info ring.
The req_id is also written to the Rx descriptor

Upon Rx completion,
The driver took the req_id from the completion descriptor and uses it
as index in rx_buffer_info.
The req_id is then return to the free_rx_ids ring.

This patch also adds statistics to inform when the driver receive out
of range or unused req_id.

Note:
free_rx_ids is only accessible from the napi handler, so no locking is
required

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: add reset reason for each device FLR

For each device reset, log to the device what is the cause
the reset occur.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: change sizeof() argument to be the type pointer

Instead of using:
memset(ptr, 0x0, sizeof(struct ...))
use:
memset(ptr, 0x0, sizeor(*ptr))

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: add hardware hints capability to the driver

With this patch, ENA device can update the ena driver about
the desired timeout values:
These values are part of the "hardware hints" which are transmitted
to the driver as Asynchronous event through ENA async
event notification queue.

In case the ENA device does not support this capability,
the driver will use its own default values.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: change return value for unsupported features unsupported return value

return -EOPNOTSUPP instead of -EPERM.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: fix out-of-bounds access in ULP sysctl

KASAN reports out-of-bound access in proc_dostring() coming from
proc_tcp_available_ulp() because in case TCP ULP list is empty
the buffer allocated for the response will not have anything
printed into it. Set the first byte to zero to avoid strlen()
going out-of-bounds.

Fixes: 734942cc4ea6 ("tcp: ULP infrastructure")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: possibly avoid extra masking for narrower load in verifier

Commit 31fd85816dbe ("bpf: permits narrower load from bpf program
context fields") permits narrower load for certain ctx fields.
The commit however will already generate a masking even if
the prog-specific ctx conversion produces the result with
narrower size.

For example, for __sk_buff->protocol, the ctx conversion
loads the data into register with 2-byte load.
A narrower 2-byte load should not generate masking.
For __sk_buff->vlan_present, the conversion function
set the result as either 0 or 1, essentially a byte.
The narrower 2-byte or 1-byte load should not generate masking.

To avoid unnecessary masking, prog-specific *_is_valid_access
now passes converted_op_size back to verifier, which indicates
the valid data width after perceived future conversion.
Based on this information, verifier is able to avoid
unnecessary marking.

Since we want more information back from prog-specific
*_is_valid_access checking, all of them are packed into
one data structure for more clarity.

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: make some functions static

The functions dwmac4_dma_init_rx_chan, dwmac4_dma_init_tx_chan and
dwmac4_dma_init_channel do not need to be in global scope, so them
static.

Cleans up sparse warnings:
"symbol 'dwmac4_dma_init_rx_chan' was not declared. Should it be static?"
"symbol 'dwmac4_dma_init_tx_chan' was not declared. Should it be static?"
"symbol 'dwmac4_dma_init_channel' was not declared. Should it be static?"

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'xdp-offload-mode'

Jakub Kicinski says:

====================
xdp: offload mode

While we discuss the representors.. :)

This set adds XDP flag for forcing offload and a attachment mode
for reporting to user space that program has been offloaded.  The
nfp driver is modified to make use of the new flags, but also to
adhere to the DRV_MODE flag which should disable the HW offload.

The intended driver behaviour is:
            DRV mode   offload
no flags      yes     attempted
DRV_MODE      yes        no
HW_MODE      no         yes

Where 'yes' means required, and error will be returned if setup fails.
'Attempted' means the offload will only happen automatically if HW is
capable and offloading the program will cause no change in system
behaviour (e.g. maps don't have to bound).

Thanks to loading the program both to the driver and HW by default we
can fallback to the driver mode without disruption in case user replaces
the program with one which cannot be offloaded later.

Note that the NFP driver currently claims XDP offload support but
lacks most basic features like direct packet access.

Only change compared to the RFC is fixing the double bpf_prog_put()
which Daniel has spotted (patch 5).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: xdp: report if program is offloaded

Make use of just added XDP_ATTACHED_HW.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xdp: add reporting of offload mode

Extend the XDP_ATTACHED_* values to include offloaded mode.
Let drivers report whether program is installed in the driver
or the HW by changing the prog_attached field from bool to
u8 (type of the netlink attribute).

Exploit the fact that the value of XDP_ATTACHED_DRV is 1,
therefore since all drivers currently assign the mode with
double negation:
mode = !!xdp_prog;
no drivers have to be modified.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: bpf: add support for XDP_FLAGS_HW_MODE

Respect the XDP_FLAGS_HW_MODE. When it's set install the program
on the NIC and skip enabling XDP in the driver.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: bpf: release the reference on offloaded programs

The xdp_prog member of the adapter's data path structure is used
for XDP in driver mode.  In case a XDP program is loaded with in
HW-only mode, we need to store it somewhere else.  Add a new XDP
prog pointer in the main structure and use that when we need to
know whether any XDP program is loaded, not only a driver mode
one.  Only release our reference on adapter free instead of
immediately after netdev unregister to allow offload to be disabled
first.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: bpf: don't offload XDP programs in DRV_MODE

DRV_MODE means that user space wants the program to be run in
the driver. Do not try to offload. Only offload if no mode
flags have been specified.

Remember what the mode is when the program is installed and refuse
new setup requests if there is already a program loaded in a
different mode. This should leave it open for us to implement
simultaneous loading of two programs - one in the drv path and
another to the NIC later.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: xdp: move driver XDP setup into a separate function

In preparation of XDP offload flags move the driver setup into
a function. Otherwise the number of conditions in one function
would make it slightly hard to follow. The offload handler may
now be called with NULL prog, even if no offload is currently
active, but that's fine, offload code can handle that.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xdp: add HW offload mode flag for installing programs

Add an installation-time flag for requesting that the program
be installed only if it can be offloaded to HW.

Internally new command for ndo_xdp is added, this way we avoid
putting checks into drivers since they all return -EINVAL on
an unknown command.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

xdp: pass XDP flags into install handlers

Pass XDP flags to the xdp ndo. This will allow drivers to look
at the mode flags and make decisions about offload.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: fix poll()

Michael reported an UDP breakage caused by the commit b65ac44674dd
("udp: try to avoid 2 cache miss on dequeue").
The function __first_packet_length() can update the checksum bits
of the pending skb, making the scratched area out-of-sync, and
setting skb->csum, if the skb was previously in need of checksum
validation.

On later recvmsg() for such skb, checksum validation will be
invoked again - due to the wrong udp_skb_csum_unnecessary()
value - and will fail, causing the valid skb to be dropped.

This change addresses the issue refreshing the scratch area in
__first_packet_length() after the possible checksum update.

Fixes: b65ac44674dd ("udp: try to avoid 2 cache miss on dequeue")
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp/v6: prefetch rmem_alloc in udp6_queue_rcv_skb()

very similar to commit dd99e425be23 ("udp: prefetch
rmem_alloc in udp_queue_rcv_skb()"), this allows saving a cache
miss when the BH is bottle-neck for UDP over ipv6 packet
processing, e.g. for small packets when a single RX NIC ingress
queue is in use.

Performances under flood when multiple NIC RX queues used are
unaffected, but when a single NIC rx queue is in use, this
gives ~8% performance improvement.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-mvpp2-misc-improvements'

Thomas Petazzoni says:

====================
net: mvpp2: misc improvements

Here are a few patches making various small improvements/refactoring
in the mvpp2 driver. They are based on today's net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: remove mvpp2_pool_refill()

When all a function does is calling another function with the exact same
arguments, in the exact same order, you know it's time to remove said
function. Which is exactly what this commit does.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: remove unused mvpp2_bm_cookie_pool_set() function

This function is not used in the driver, remove it.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: add comments about smp_processor_id() usage

A previous commit modified a number of smp_processor_id() used in
migration-enabled contexts into get_cpu/put_cpu sections. However, a few
smp_processor_id() calls remain in the driver, and this commit adds
comments explaining why they can be kept.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'stmmac-pci-Refactor-DMI-probing'

Jan Kiszka says:

====================
stmmac: pci: Refactor DMI probing

Some cleanups of the way we probe DMI platforms in the driver. Reduces
a bit of open-coding and makes the logic easier reusable for any
potential DMI platform != Quark.

Tested on IOT2000 and Galileo Gen2.

Changes in v5:
- fixed a remaining issue in patch 5
- dropped patch 6 for now
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: pci: Use dmi_system_id table for retrieving PHY addresses

Avoids reimplementation of DMI matching in stmmac_pci_find_phy_addr.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: pci: Select quark_pci_dmi_data from quark_default_data

No need to carry this reference in stmmac_pci_info - the Quark-specific
setup handler knows that it needs to use the Quark-specific DMI table.
This also allows to drop the stmmac_pci_info reference from the setup
handler parameter list.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: pci: Make stmmac_pci_find_phy_addr truly generic

Move the special case for the early Galileo firmware into
quark_default_setup. This allows to use stmmac_pci_find_phy_addr for
non-quark cases.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: pci: Use stmmac_pci_info for all devices

Make stmmac_default_data compatible with stmmac_pci_info.setup and use
an info structure for all devices. This allows to make the probing more
regular.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: pci: Make stmmac_pci_info structure constant

By removing the PCI device reference from the structure and passing it
as parameters to the interested functions, we can make quark_pci_info
const.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hv_netvsc: Fix the carrier state error when data path is off

When the VF NIC is opened, the synthetic NIC's carrier state is set to
off. This tells the host to transitions data path to the VF device. But
if startup script or user manipulates the admin state of the netvsc
device directly for example:
# ifconfig eth0 down
# ifconfig eth0 up
Then the carrier state of the synthetic NIC would be on, even though the
data path was still over the VF NIC. This patch sets the carrier state
of synthetic NIC with consideration of the related VF state.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hv_netvsc: Remove unnecessary var link_state from struct netvsc_device_info

We simply use rndis_device->link_state in the netdev_dbg. The variable,
link_state from struct netvsc_device_info, is not used anywhere else.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

samples/bpf: fix a build problem

tracex5_kern.c build failed with the following error message:
../samples/bpf/tracex5_kern.c:12:10: fatal error: 'syscall_nrs.h' file not found
#include "syscall_nrs.h"
The generated file syscall_nrs.h is put in build/samples/bpf directory,
but this directory is not in include path, hence build failed.

The fix is to add $(obj) into the clang compilation path.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'rds-tcp-fixes'

Sowmini Varadhan says:

====================
rds: tcp: fixes

Patch1 is a bug fix for correct reconnect when a connection
is restarted. Patch 2 accelerates cleanup by setting linger
to 1 and sending a RST to the peer.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

rds: tcp: set linger to 1 when unloading a rds-tcp

If we are unloading the rds_tcp module, we can set linger to 1
and drop pending packets to accelerate reconnect. The peer will
end up resetting the connection based on new generation numbers
of the new incarnation, so hanging on to unsent TCP packets via
linger is mostly pointless in this case.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Tested-by: Jenny Xu <jenny.x.xu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rds: tcp: send handshake ping-probe from passive endpoint

The RDS handshake ping probe added by commit 5916e2c1554f
("RDS: TCP: Enable multipath RDS for TCP") is sent from rds_sendmsg()
before the first data packet is sent to a peer. If the conversation
is not bidirectional (i.e., one side is always passive and never
invokes rds_sendmsg()) and the passive side restarts its rds_tcp
module, a new HS ping probe needs to be sent, so that the number
of paths can be re-established.

This patch achieves that by sending a HS ping probe from
rds_tcp_accept_one() when c_npaths is 0 (i.e., we have not done
a handshake probe with this peer yet).

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Tested-by: Jenny Xu <jenny.x.xu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>