git.openwrt.org Git - openwrt/staging/blogic.git/log

cxgb4: fix to bring link down after adapter crash

Use PORT_REG for T4 and T5_PORT_REG for > T4 to write to correct
register to bring down link during shutdown after adapter crash.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipmr: add getlink support

Currently there's no way to dump the VIF table for an ipmr table other
than the default (via proc). This is a major issue when debugging ipmr
issues and in general it is good to know which interfaces are
configured. This patch adds support for RTM_GETLINK for the ipmr family
so we can dump the VIF table and the ipmr table's current config for
each table. We're protected by rtnl so no need to acquire RCU or
mrt_lock.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-Remove-compatibility-with-old-firmware'

Jiri Pirko says:

====================
mlxsw: Remove compatibility with old firmware

Up until recently we couldn't enforce a minimal firmware version, which
forced us to be compatible with old firmware versions. This patchset
removes this code and simplifies the driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Pass port argument to module mapping functions

Previous patch made it unnecessary to map ports to modules before we
allocate their struct. We can now therefore pass the port struct to
these functions, thereby making them consistent with other functions
that operate on ports.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Simplify port split flow

In commit be94535f9531 ("mlxsw: spectrum: Make split flow match firmware
requirements") we had to modify the port split flow to overcome quirks
in the device's firmware. This resulted in asymmetrical code with
regards to port creation and removal.

The problem in the firmware is long gone and since we can now enforce a
minimal firmware version, we can simplify the code and make it symmetric
again.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Mark only first LPM tree as reserved

In new firmware versions (that we can now enforce via
request_firmware()), only the first LPM tree is reserved and not the
first two as in older versions.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-Remove-support-from-bridge-bypass-for-mlxsw-rocker-drivers'

Jiri Pirko says:

===================
net: Remove support from bridge bypass for mlxsw/rocker drivers

Currently setting bridge port attributes and adding FDBs are done via
setting the SELF flag which implies unconsistent offloading model. This
patch-set fixes this behavior by making the bridge and drivers which are
using it to be totally in sync.

This implies several changes:
- Offloading bridge flags from the bridge code.
- Sending notification about FDB add/del to the software bridge in a
similiar way it is done for the hardware externally learned FDBs.

By making the offloading model more consistent a cleanup is done in
the drivers supporting it. This is done in order to remove un-needed
logic related to dump operation which is redundant.

First add missing functionality to bridge, then clean up the mlxsw/rocker
drivers.

v1->v2
- Move bridge-switchdev related stuff to br_switchdev.c as suggested by Nik
===================

Signed-off-by: David S. Miller <davem@davemloft.net>

rocker: Remove support bridge bypass FDB

The FDB add/delete are now done through the notification chain. The FDBs
are synced with the bridge and there is no need for extra dumping.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rocker: Remove support for bypass bridge port attributes/vlan set

The bridge port attributes/vlan for mlxsw devices should be set only
from bridge code. The vlans are synced totally with the bridge so
there is no need to special dump support.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rocker: Add support for learning FDB through notification

Add support for learning FDB through notification. The driver defers
the hardware update via ordered work queue.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rocker: Change world_ops API and implementation to be switchdev independant

Currently the switchdev_trans struct is embedded in the world_ops API.
In order to add support for adding FDB via a notfication chain the API should
be switchdev independent.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rocker: Add support for querying supported bridge flags

Add support for querying supported bridge flags.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rocker: Remove support for bridge FDB learning sync

Currently the rocker driver supports an option for disabling syncing
the hardware learned FDBs with the software bridge. This behavior
breaks the bridge offload model and thus it is removed.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Remove support for bridge bypass ndos from stacked devices

Remove support for bridge bypass ndos from stacked devices. At this point
no driver which supports stack device behavior offload supports operation
with SELF flag. The case for upper device is already taken care of in both
of the following cases:

1. FDB add/del - driver should check at the notification cb if the
stacked device contains his ports.

2. Port attribute - calls switchdev code directly which checks
for case of stack device.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Remove support for bridge bypass FDB add/del

The FDB add/del are now done through the notification chain. The FDBs
are synced with the bridge and there is no need for extra dumping.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_switchdev: Add support for learning FDB through notification

Add support for learning FDB through notification. The driver defers
the hardware update via ordered work queue. Support for stacked devices
is also provided. In case of a successful FDB add a notification is
sent back to bridge.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_switchdev: Change switchdev notifier API

The current API for sending switchdev notifications implies only FDB
add/del. In order to support notification about successful FDB offload
the API is changed.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Remove support for bypass bridge port attributes/vlan set

The bridge port attributes/vlan for mlxsw devices should be set only
from bridge code. The vlans are synced totally with the bridge so
there is no need to special dump support.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_switchdev: Add support for querying supported bridge flags

Add support for querying supported bridge flags.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Remove support for bridge FDB learning sync

Currently the mlxsw driver supports an option for disabling syncing
the hardware learned FDBs with the software bridge. This behavior
breaks the bridge offload model and thus it is removed.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: Receive notification about successful FDB offload

When a new static FDB is added to the bridge a notification is sent to
the driver for offload. In case of successful offload the driver should
notify the bridge back, which in turn should mark the FDB as offloaded.

Currently, externally learned is equivalent for being offloaded which is
not correct due to the fact that FDBs which are added from user-space are
also marked as externally learned. In order to specify if an FDB was
successfully offloaded a new flag is introduced.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: Add support for notifying devices about FDB add/del

Currently the bridge doesn't notify the underlying devices about new
FDBs learned. The FDB sync is placed on the switchdev notifier chain
because devices may potentially learn FDB that are not directly related
to their ports, for example:

1. Mixed SW/HW bridge - FDBs that point to the ASICs external devices
should be offloaded as CPU traps in order to
perform forwarding in slow path.
2. EVPN - Externally learned FDBs for the vtep device.

Notification is sent only about static FDB add/del. This is done due
to fact that currently this is the only scenario supported by switch
drivers.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: switchdev: Change notifier chain to be atomic

In order to use the switchdev notifier chain for FDB sync with the
device it has to be changed to atomic. The is done because the bridge
can learn new FDBs in atomic context.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: Add support for calling FDB external learning under rcu

This is done as a preparation to moving the switchdev notifier chain
to be atomic. The FDB external learning should be called under rtnl
or rcu.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: Add support for offloading port attributes

Currently the flood, learning and learning_sync port attributes are
offloaded by setting the SELF flag. Add support for offloading the
flood and learning attribute through the bridge code. In case of
setting an unsupported flag on a offloded port the operation will
fail.

The learning_sync attribute doesn't have any software representation
and cannot be offloaded through the bridge code.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: switchdev: Add support for querying supported bridge flags by hardware

This is done as a preparation stage before setting the bridge port flags
from the bridge code. Currently the device can be queried for the bridge
flags state, but the querier cannot distinguish if the flag is disabled
or if it is not supported at all. Thus, add new attr and a bit-mask which
include information regarding the support on a per-flag basis.

Drivers that support bridge offload but not support bridge flags should
return zeroed bitmask.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dsa-add-cross-chip-VLAN-support'

Vivien Didelot says:

====================
net: dsa: add cross-chip VLAN support

The current code in DSA does not support cross-chip VLAN. This means
that in a multi-chip environment such as this one (similar to ZII Rev B)

         [CPU].................... (mdio)
    (eth0) |   :       :          :
          _|_____    _______    _______
         [__sw0__]--[__sw1__]--[__sw2__]
          |  |  |    |  |  |    |  |  |
          v  v  v    v  v  v    v  v  v
          p1 p2 p3   p4 p5 p6   p7 p8 p9

adding a VLAN to p9 won't be enough to reach the CPU, until at least one
port of sw0 and sw1 join the VLAN as well and become aware of the VID.

This patchset makes the DSA core program the VLAN on the CPU and DSA
links itself, which brings seamlessly cross-chip VLAN support to DSA.

With this series applied*, the hardware VLAN tables of a 3-switch setup
look like this after adding a VLAN to only one port of the end switch:

    # cat /sys/class/net/br0/bridge/default_pvid
    42
    # cat /sys/kernel/debug/mv88e6xxx/sw{0,1,2}/vtu
    # ip link set up master br0 dev lan6
    # cat /sys/kernel/debug/mv88e6xxx/sw{0,1,2}/vtu
     VID  FID  SID  0  1  2  3  4  5  6
      42    1    0  x  x  x  x  x  =  =
     VID  FID  SID  0  1  2  3  4  5  6
      42    1    0  x  x  x  x  x  =  =
     VID  FID  SID  0  1  2  3  4  5  6  7  8  9
      42    1    0  u  x  x  x  x  x  x  x  x  =

('x' is excluded, 'u' is untagged, '=' is unmodified DSA and CPU ports.)

Completely removing a VLAN entry (which is currently the responsibility
of drivers anyway) is not supported yet since it requires some caching.

(*) the output is shown from this out-of-tree debugfs patch:
https://github.com/vivien/linux/commit/7b61a684b9d6b6a499135a587c7f62a1fddceb8b.patch

Changes in v2:
  - canonical incrementation (port++ instead of ++port)
  - check CPU and DSA ports before purging a VLAN
  - add Reviewed-by tags
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: do not skip ports on VLAN del

The mv88e6xxx driver currently tries to be smart and remove by itself a
VLAN entry from the VTU when the driven switch sees no user ports as
members of the VLAN.

This is bad in a multi-chip switch fabric, since a chip in between
others may have no bridge port members, but still needs to be aware of
the VID in order to correctly pass frames in the data path.

Now that the DSA core explicitly manages DSA and CPU ports, do not skip
them when checking remaining VLAN members.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: exclude all ports in new VLAN

Now that the DSA core adds the CPU and DSA ports itself to the new VLAN
entry, there is no need to include them as members of this VLAN when
initializing a new VTU entry.

As of now, initialize a new VTU entry with all ports excluded.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: add CPU and DSA ports as VLAN members

In a multi-chip switch fabric, it is currently the responsibility of the
driver to add the CPU or DSA (interconnecting chips together) ports as
members of a new VLAN entry. This makes the drivers more complicated.

We want the DSA drivers to be stupid and the DSA core being the one
responsible for caring about the abstracted switch logic and topology.

Make the DSA core program the CPU and DSA ports as part of the VLAN.

This makes all chips of the data path to be aware of VIDs spanning the
the whole fabric and thus, seamlessly add support for cross-chip VLAN.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: check VLAN capability of every switch

Now that the VLAN object is propagated to every switch chip of the
switch fabric, we can easily ensure that they all support the required
VLAN operations before modifying an entry on a single switch.

To achieve that, remove the condition skipping other target switches,
and add a bitmap of VLAN members, eventually containing the target port,
if we are programming the switch target.

This will allow us to easily add other VLAN members, such as the DSA or
CPU ports (to introduce cross-chip VLAN support) or the other port
members if we want to reduce hardware accesses later.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: define membership on VLAN add

Define the target port membership of the VLAN entry in
mv88e6xxx_port_vlan_add where ds is scoped.

Allow the DSA core to call later the port_vlan_add operation for CPU or
DSA ports, by using the Unmodified membership for these ports, as in the
current behavior.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'rxrpc-rewrite-20170607-v2' of git://git./linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Tx length parameter

Here's a set of patches that allows someone initiating a client call with
AF_RXRPC to indicate upfront the total amount of data that will be
transmitted.  This will allow AF_RXRPC to encrypt directly from source
buffer to packet rather than having to copy into the buffer and only
encrypt when it's full (the encrypted portion of the packet starts with a
length and so we can't encrypt until we know what the length will be).

The three patches are:

(1) Provide a means of finding out what control message types are actually
     supported.  EINVAL is reported if an unsupported cmsg type is seen, so
     we don't want to set the new cmsg unless we know it will be accepted.

(2) Consolidate some stuff into a struct to reduce the parameter count on
     the function that parses the cmsg buffer.

(3) Introduce the RXRPC_TX_LENGTH cmsg.  This can be provided on the first
     sendmsg() that contributes data to a client call request or a service
     call reply.  If provided, the user must provide exactly that amount of
     data or an error will be incurred.

Changes in version 2:

(*) struct rxrpc_send_params::tx_total_len should be s64 not u64.  Thanks to
     Julia Lawall for reporting this.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'qrtr-features'

Bjorn Andersson says:

====================
Missing QRTR features

The QMUX specification covers packet routing as well as service life cycle and
discovery. The current implementation of qrtr supports the prior part, but in
order to fully implement service management on-top a few more parts are needed.

The first patch in the series serves the purpose of reducing duplication in
patch two and three.

The second and third patch adds two qrtr-level notifications required by the
specification, in order to notify local and remote service controllers about
dying clients.

The last patch serves the purpose of notifying local clients about the presence
of a local service register, allowing them to register services as well as
querying for remote registered services.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: qrtr: Inform open sockets about new controller

As the higher level communication only deals with "services" the
a service directory is required to keep track of local and remote
services. In order for qrtr clients to be informed about when the
service directory implementation is available some event needs to be
passed to them.

Rather than introducing support for broadcasting such a message in-band
to all open local sockets we flag each socket with ENETRESET, as there
are no other expected operations that would benefit from having support
from locally broadcasting messages.

Cc: Courtney Cavin <ccavin@gmail.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qrtr: Broadcast DEL_CLIENT message when endpoint is closed

Per the QMUXv2 protocol specificiation a DEL_CLIENT message should be
broadcasted when an endpoint is disconnected.

The protocol specification does suggest that the router can keep track
of which nodes the endpoint has been communicating with to not wake up
sleeping remotes unecessarily, but implementation of this suggestion is
left for the future.

Cc: Courtney Cavin <ccavin@gmail.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qrtr: Inject BYE on remote termination

Per the QMUX protocol specification a terminating node can send a BYE
control message to signal that the link is going down, upon receiving
this all information about remote services should be discarded and local
clients should be notified.

In the event that the link was brought down abruptly the router is
supposed to act like a BYE message has arrived. As there is no harm in
receiving an extra BYE from the remote this patch implements the latter
by injecting a BYE when the link to the remote is unregistered.

The name service will receive the BYE and can implement the notification
to the local clients.

Cc: Courtney Cavin <ccavin@gmail.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qrtr: Refactor packet allocation

Extract the allocation and filling in the control message header fields
to a separate function in order to reuse this in subsequent patches.

Cc: Courtney Cavin <ccavin@gmail.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

mISDN: remove unnecessary variable assignments

Remove unnecessary variable assignments.

Addresses-Coverity-ID: 1226917
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: add TCPMemoryPressuresChrono counter

DRAM supply shortage and poor memory pressure tracking in TCP
stack makes any change in SO_SNDBUF/SO_RCVBUF (or equivalent autotuning
limits) and tcp_mem[] quite hazardous.

TCPMemoryPressures SNMP counter is an indication of tcp_mem sysctl
limits being hit, but only tracking number of transitions.

If TCP stack behavior under stress was perfect :
1) It would maintain memory usage close to the limit.
2) Memory pressure state would be entered for short times.

We certainly prefer 100 events lasting 10ms compared to one event
lasting 200 seconds.

This patch adds a new SNMP counter tracking cumulative duration of
memory pressure events, given in ms units.

$ cat /proc/sys/net/ipv4/tcp_mem
3088    4117    6176
$ grep TCP /proc/net/sockstat
TCP: inuse 180 orphan 0 tw 2 alloc 234 mem 4140
$ nstat -n ; sleep 10 ; nstat |grep Pressure
TcpExtTCPMemoryPressures        1700
TcpExtTCPMemoryPressuresChrono  5209

v2: Used EXPORT_SYMBOL_GPL() instead of EXPORT_SYMBOL() as David
instructed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tcp-Namespaceify-3-sysctls'

Eric Dumazet says:

====================
tcp: Namespaceify 3 sysctls

Move tcp_sack, tcp_window_scaling and tcp_timestamps
sysctls to network namespaces.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Namespaceify sysctl_tcp_timestamps

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Namespaceify sysctl_tcp_window_scaling

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Namespaceify sysctl_tcp_sack

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: add a struct net parameter to tcp_parse_options()

We want to move some TCP sysctls to net namespaces in the future.

tcp_window_scaling, tcp_sack and tcp_timestamps being fetched
from tcp_parse_options(), we need to pass an extra parameter.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: propagate tc filter chain index down the ndo_setup_tc call

We need to push the chain index down to the drivers, so they have the
information to which chain the rule belongs. For now, no driver supports
multichain offload, so only chain 0 is supported. This is needed to
prevent chain squashes during offload for now. Later this will be used
to implement multichain offload.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx4-drivers-version-update'

Tariq Toukan says:

====================
mlx4 drivers: version update

This patchset contains version updates for the MLX4 drivers:
Core, EN, and IB.

Just like we've done in mlx5, we modify the outdated driver
version (reported in ethtool for example).
This better reflects the current driver state, and removes the
redundant date string.
We are not going to change this frequently or even use it.

I include the IB patch in this series as it has similar subject
and content.
It does not cause any kind of conflict with Doug's tree.
The rdma mailing list is CCed.
Please let me know if I need to submit this differently.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

IB/mlx4: Bump driver version

Remove date and bump version for mlx4_ib driver.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Bump driver version

Remove date and bump version for mlx4_en driver.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Bump driver version

Remove date and bump version for mlx4_core driver.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Have 6161/6123 use EDSA tags

The mv88e6161 and mv88e6123 are capable of using EDSA tags when
passing frames from the host to the switch and back.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: use a more suitable function when assigning NULL

When stopping the vxlan interface we detach it from the socket.
Use RCU_INIT_POINTER() and not rcu_assign_pointer() to do so.

Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Fix tids count for ipv6 offload connection

the adapter consumes two tids for every ipv6 offload
connection be it active or passive, calculate tid usage
count accordingly.

Also change the signatures of relevant functions to get
the address family.

Signed-off-by: Rizwan Ansari <rizwana@chelsio.com>
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'nfp-ctrl-vNIC'

Jakub Kicinski says:

====================
nfp: ctrl vNIC

This series adds the ability to use one vNIC as a control channel
for passing messages to and from the application firmware.  The
implementation restructures the existing netdev vNIC code to be able
to deal with nfp_nets with netdev pointer set to NULL.  Control vNICs
are not visible to userspace (other than for dumping ring state), and
since they don't have netdevs we use a tasklet for RX and simple skb
list for TX queuing.

Due to special status of the control vNIC we have to reshuffle the
init code a bit to make sure control vNIC will be fully brought up
(and therefore communication with app FW can happen) before any netdev
or port is visible to user space.

FW will designate which vNIC is supposed to be used as control one
by setting _pf%u_net_ctrl_bar symbol.  Some FWs depend on metadata
being prepended to control message, some prefer to look at queue ID
to decide that something is a control message.  Our implementation
can cater to both.

First two users of this code will be eBPF maps and flower offloads.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: advertise support for NFD ABI 0.5

NFD ABI 0.5 is equivalent to NFD ABI 3.0 but requires that the
driver checks the APP id symbol and makes sure it can support
given app. Most advanced apps will likely require control vNIC
(ability to exchange control messages between the driver and
app FW). Detailed app version checking and capability exchange
is left to app-specific code.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: create control vNICs and wire up rx/tx

When driver encounters an nfp_app which has a control message handler
defined, allocate a control vNIC. This control channel will be used
to exchange data with the application FW such as flow table programming,
statistics and global datapath control.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: allow non-equal distribution of IRQs

Thus far the code assumed all vNICs will request similar number of IRQs.
This will be no longer true with control vNICs (where 1 IRQ will suffice).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: slice the netdev spawning function

We want to be able to create a special vNIC for control messages.
This vNIC should be created before any netdev is registered to allow
nfp_app logic to exchange messages with the FW app before any netdev
is visible to user space. Unfortunately we can't enable IRQs until
we know how many vNICs we will need to spawn.

Divide the function which spawns netdevs for vNICs into three parts:
- vNIC/memory allocation;
- IRQ allocation;
- netdev init and register.

This will help us insert the initialization of the control channel
after IRQ allocation but before netdev init and register.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: don't clutter init code passing fw_ver around

Reading fw version from the BAR is trivial.  Don't pass it around
through layers of init functions, simply read it again where needed.

This commit has the side effect of each vNIC having the exact NFD
version from its own control memory, rather than all data vNICs
assuming the version of the first one.  This should not result in
user-visible changes, though.  Capabilities of data vNICs of trival
apps are identical.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: map all queue controllers at once

RX and TX queue controllers are interleaved. Instead of creating
two mappings which map the same area at slightly different offset,
create only one mapping. Always map all queue controllers to simplify
the code and allow reusing the mapping for non-data vNICs.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: make vNIC ctrl memory mapping function reusable

We will soon need to map control vNIC PCI memory as well as data vNIC
memory. Make the function for mapping areas pointed to by an RTsym
reusable.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: add control vNIC datapath

Since control vNICs don't have a netdev, they can't use napi and
queuing stack provides. Add simple tasklet-based data receive
and send of control messages with queuing on a skb_list.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: prepare config and enable for working without netdevs

Out of the three stages of ifup/ifdown (allocate, configure, start)
- this commit prepares the configuration stage for working with
control vNICs.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: allow allocation and initialization of netdev-less vNICs

vNICs used for sending and receiving control messages shouldn't
really have a netdev. Add the ability to initialize vNICs for
netdev-less operation.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: make sure debug accesses don't depend on netdevs

We want to be able to inspect the state of descriptor rings of
the control vNIC, so it will use the same interface as data vNICs.

Make sure the code doesn't use netdevs to determine state
of the rings and names things appropriately.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: prepare print macros for use without netdev

To be able to reuse print macros easily with control vNICs make the
macros check if netdev pointer is populated and use dev_* print
functions otherwise.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: move nfp_net_vecs_init()

Move nfp_net_vecs_init() after all datapath functions. We will need
to init poll() callbacks from this function soon.

No functional changes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: reuse ring free code on close

On the close path reuse the ring free helpers introduced for runtime
reconfiguration.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: split out the allocation part of open

Our open/close implementations have 3 stages:
- allocation/freeing of ring resources, irqs etc.,
- device config,
- device/stack enable (can't fail).

Right now all of those stages are placed in separate functions,
apart from allocation during open. Fix that. It will make it
easier for us to allocate resources for netdev-less vNICs.
Because we want to reuse allocation code in netdev-less vNICs
leave the netif_set_real_num_[rt]x_queues() calls inside open.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: reorder open and close functions

We will soon reuse parts of .ndo_stop() for clean up after errors
in .ndo_open(). Reorder the associated functions to make that possible.

No functional changes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rxrpc: Provide a cmsg to specify the amount of Tx data for a call

Provide a control message that can be specified on the first sendmsg() of a
client call or the first sendmsg() of a service response to indicate the
total length of the data to be transmitted for that call.

Currently, because the length of the payload of an encrypted DATA packet is
encrypted in front of the data, the packet cannot be encrypted until we
know how much data it will hold.

By specifying the length at the beginning of the transmit phase, each DATA
packet length can be set before we start loading data from userspace (where
several sendmsg() calls may contribute to a particular packet).

An error will be returned if too little or too much data is presented in
the Tx phase.

Signed-off-by: David Howells <dhowells@redhat.com>

rxrpc: Consolidate sendmsg parameters

Consolidate the sendmsg control message parameters into a struct rather
than passing them individually through the argument list of
rxrpc_sendmsg_cmsg(). This makes it easier to add more parameters.

Signed-off-by: David Howells <dhowells@redhat.com>

rxrpc: Provide a getsockopt call to query what cmsgs types are supported

Provide a getsockopt() call that can query what cmsg types are supported by
AF_RXRPC.

net: fec: Clear and enable MIB counters on imx51

Both the IMX51 and IMX53 datasheet indicates that the MIB counters
should be cleared during setup. Otherwise random numbers are returned
via ethtool -S. Add a quirk and a function to do this.

Tested on an IMX51.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Just some simple overlapping changes in marvell PHY driver
and the DSA core code.

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'phylib-support-for-MV88X3310-10G-phy'

Russell King says:

====================
net: Add phylib support for MV88X3310 10G phy

This patch series adds support for the Marvell 88x3310 PHY found on
the SolidRun Macchiatobin board.

The first patch introduces a set of generic Clause 45 PHY helpers that
C45 PHY drivers can make use of if they wish.

Patch 2 ensures that the Clause 22 aneg_done function will not be
called for incompatible Clause 45 PHYs.

Patch 3 fixes the aneg restart to be compatible with C45 PHYs - it can
currently only cope with C22 PHYs.

Patch 4 moves the "gen10g" driver into the Clause 45 code, grouping all
core clause 45 code together.

Patch 5 adds the phy_interface_t types for XAUI and 10GBase-KR links.
As 10GBase-KR appears to be compatible with XFI and SFI, XFI and SFI,
I currently see no reason to add XFI and SFI interface modes.  There
seems to be vendor code out there using these, but they all alias back
to the same hardware settings.

Patch 6 adds support for the MV88X3310 PHY, which supports both the
copper and fiber interfaces.  It should be noted that the MV88X3310
automatically switches its MAC facing interface between 10GBase-KR
and SGMII depending on the negotiated speed.  This was discussed with
Florian, and we agreed to update the phy interface mode depending on
the properties of the actual link mode to the PHY.

v2:
- update sysfs-class-net-phydev documentation
- avoid genphy_aneg_done for non-C22 PHYs
- expand comment about 0x30 constant
- add comment about lack of reset
- configure driver using MARVELL_10G_PHY
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: add Marvell Alaska X 88X3310 10Gigabit PHY support

Add phylib support for the Marvell Alaska X 10 Gigabit PHY (MV88X3310).
This phy is able to operate at 10G, 1G, 100M and 10M speeds, and only
supports Clause 45 accesses.

The PHY appears (based on the vendor IDs) to be two different vendors
IP, with each devad containing several instances.

This PHY driver has only been tested with the RJ45 copper port, fiber
port and a Marvell Armada 8040-based ethernet interface.

It should be noted that to use the full range of speeds, MAC drivers
need to also reconfigure the link mode as per phydev->interface, since
the PHY automatically changes its interface mode depending on the
negotiated speed.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: add XAUI and 10GBASE-KR PHY connection types

XAUI allows XGMII to reach an extended distance by using a XGXS layer at
each end of the MAC to PHY link, operating over four Serdes lanes.

10GBASE-KR is a single lane Serdes backplane ethernet connection method
with autonegotiation on the link. Some PHYs use this to connect to the
ethernet interface at 10G speeds, switching to other connection types
when utilising slower speeds.

10GBASE-KR is also used for XFI and SFI to connect to XFP and SFP fiber
modules.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: split out 10G genphy support

Move the old 10G genphy support to sit beside the new clause 45 library
functions, so all the 10G phy code is together.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: hook up clause 45 autonegotiation restart

genphy_restart_aneg() can only restart autonegotiation on clause 22
PHYs. Add a phy_restart_aneg() function which selects between the
clause 22 and clause 45 restart functionality depending on the PHY
type and whether the Clause 45 PHY supports the Clause 22 register set.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: avoid genphy_aneg_done() for PHYs without clause 22 support

Avoid calling genphy_aneg_done() for PHYs that do not implement the
Clause 22 register set.

Clause 45 PHYs may implement the Clause 22 register set along with the
Clause 22 extension MMD. Hence, we can't simply block access to the
Clause 22 functions based on the PHY being a Clause 45 PHY.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: add 802.3 clause 45 support to phylib

Add generic helpers for 802.3 clause 45 PHYs for >= 10Gbps support.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Made TCP congestion control documentation match current reality,
    from Anmol Sarma.

2) Various build warning and failure fixes from Arnd Bergmann.

3) Fix SKB list leak in ipv6_gso_segment().

4) Use after free in ravb driver, from Eugeniu Rosca.

5) Don't use udp_poll() in ping protocol driver, from Eric Dumazet.

6) Don't crash in PCI error recovery of cxgb4 driver, from Guilherme
    Piccoli.

7) _SRC_NAT_DONE_BIT needs to be cleared using atomics, from Liping
    Zhang.

8) Use after free in vxlan deletion, from Mark Bloch.

9) Fix ordering of NAPI poll enabled in ethoc driver, from Max
    Filippov.

10) Fix stmmac hangs with TSO, from Niklas Cassel.

11) Fix crash in CALIPSO ipv6, from Richard Haines.

12) Clear nh_flags properly on mpls link up. From Roopa Prabhu.

13) Fix regression in sk_err socket error queue handling, noticed by
    ping applications. From Soheil Hassas Yeganeh.

14) Update mlx4/mlx5 MAINTAINERS information.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (78 commits)
  net: stmmac: fix a broken u32 less than zero check
  net: stmmac: fix completely hung TX when using TSO
  net: ethoc: enable NAPI before poll may be scheduled
  net: bridge: fix a null pointer dereference in br_afspec
  ravb: Fix use-after-free on `ifconfig eth0 down`
  net/ipv6: Fix CALIPSO causing GPF with datagram support
  net: stmmac: ensure jumbo_frm error return is correctly checked for -ve value
  Revert "sit: reload iphdr in ipip6_rcv"
  i40e/i40evf: proper update of the page_offset field
  i40e: Fix state flags for bit set and clean operations of PF
  iwlwifi: fix host command memory leaks
  iwlwifi: fix min API version for 7265D, 3168, 8000 and 8265
  iwlwifi: mvm: clear new beacon command template struct
  iwlwifi: mvm: don't fail when removing a key from an inexisting sta
  iwlwifi: pcie: only use d0i3 in suspend/resume if system_pm is set to d0i3
  iwlwifi: mvm: fix firmware debug restart recording
  iwlwifi: tt: move ucode_loaded check under mutex
  iwlwifi: mvm: support ibss in dqa mode
  iwlwifi: mvm: Fix command queue number on d0i3 flow
  iwlwifi: mvm: rs: start using LQ command color
  ...

Merge git://git./linux/kernel/git/davem/sparc

Pull sparc fixes from David Miller:

1) Fix TLB context wrap races, from Pavel Tatashin.

2) Cure some gcc-7 build issues.

3) Handle invalid setup_hugepagesz command line values properly, from
    Liam R Howlett.

4) Copy TSB using the correct address shift for the huge TSB, from Mike
    Kravetz.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: delete old wrap code
  sparc64: new context wrap
  sparc64: add per-cpu mm of secondary contexts
  sparc64: redefine first version
  sparc64: combine activate_mm and switch_mm
  sparc64: reset mm cpumask after wrap
  sparc/mm/hugepages: Fix setup_hugepagesz for invalid values.
  sparc: Machine description indices can vary
  sparc64: mm: fix copy_tsb to correctly copy huge page TSBs
  arch/sparc: support NR_CPUS = 4096
  sparc64: Add __multi3 for gcc 7.x and later.
  sparc64: Fix build warnings with gcc 7.
  arch/sparc: increase CONFIG_NODES_SHIFT on SPARC64 to 5

compiler, clang: suppress warning for unused static inline functions

GCC explicitly does not warn for unused static inline functions for
-Wunused-function. The manual states:

Warn whenever a static function is declared but not defined or
a non-inline static function is unused.

Clang does warn for static inline functions that are unused.

It turns out that suppressing the warnings avoids potentially complex
#ifdef directives, which also reduces LOC.

Suppress the warning for clang.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'sparc64-context-wrap-fixes'

Pavel Tatashin says:

====================
sparc64: context wrap fixes

This patch series contains fixes for context wrap: when we are out of
context ids, and need to get a new version.

It fixes memory corruption issues which happen when more than number of
context ids (currently set to 8K) number of processes are started
simultaneously, and processes can get a wrong context.

sparc64: new context wrap:
- contains explanation of new wrap method, and also explanation of races
that it solves
sparc64: reset mm cpumask after wrap
- explains issue of not reseting cpu mask on a wrap
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

sparc64: delete old wrap code

The old method that is using xcall and softint to get new context id is
deleted, as it is replaced by a method of using per_cpu_secondary_mm
without xcall to perform the context wrap.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sparc64: new context wrap

The current wrap implementation has a race issue: it is called outside of
the ctx_alloc_lock, and also does not wait for all CPUs to complete the
wrap.  This means that a thread can get a new context with a new version
and another thread might still be running with the same context. The
problem is especially severe on CPUs with shared TLBs, like sun4v. I used
the following test to very quickly reproduce the problem:
- start over 8K processes (must be more than context IDs)
- write and read values at a  memory location in every process.

Very quickly memory corruptions start happening, and what we read back
does not equal what we wrote.

Several approaches were explored before settling on this one:

Approach 1:
Move smp_new_mmu_context_version() inside ctx_alloc_lock, and wait for
every process to complete the wrap. (Note: every CPU must WAIT before
leaving smp_new_mmu_context_version_client() until every one arrives).

This approach ends up with deadlocks, as some threads own locks which other
threads are waiting for, and they never receive softint until these threads
exit smp_new_mmu_context_version_client(). Since we do not allow the exit,
deadlock happens.

Approach 2:
Handle wrap right during mondo interrupt. Use etrap/rtrap to enter into
into C code, and issue new versions to every CPU.
This approach adds some overhead to runtime: in switch_mm() we must add
some checks to make sure that versions have not changed due to wrap while
we were loading the new secondary context. (could be protected by PSTATE_IE
but that degrades performance as on M7 and older CPUs as it takes 50 cycles
for each access). Also, we still need a global per-cpu array of MMs to know
where we need to load new contexts, otherwise we can change context to a
thread that is going way (if we received mondo between switch_mm() and
switch_to() time). Finally, there are some issues with window registers in
rtrap() when context IDs are changed during CPU mondo time.

The approach in this patch is the simplest and has almost no impact on
runtime.  We use the array with mm's where last secondary contexts were
loaded onto CPUs and bump their versions to the new generation without
changing context IDs. If a new process comes in to get a context ID, it
will go through get_new_mmu_context() because of version mismatch. But the
running processes do not need to be interrupted. And wrap is quicker as we
do not need to xcall and wait for everyone to receive and complete wrap.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sparc64: add per-cpu mm of secondary contexts

The new wrap is going to use information from this array to figure out
mm's that currently have valid secondary contexts setup.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sparc64: redefine first version

CTX_FIRST_VERSION defines the first context version, but also it defines
first context. This patch redefines it to only include the first context
version.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sparc64: combine activate_mm and switch_mm

The only difference between these two functions is that in activate_mm we
unconditionally flush context. However, there is no need to keep this
difference after fixing a bug where cpumask was not reset on a wrap. So, in
this patch we combine these.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sparc64: reset mm cpumask after wrap

After a wrap (getting a new context version) a process must get a new
context id, which means that we would need to flush the context id from
the TLB before running for the first time with this ID on every CPU. But,
we use mm_cpumask to determine if this process has been running on this CPU
before, and this mask is not reset after a wrap. So, there are two possible
fixes for this issue:

1. Clear mm cpumask whenever mm gets a new context id
2. Unconditionally flush context every time process is running on a CPU

This patch implements the first solution

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sparc/mm/hugepages: Fix setup_hugepagesz for invalid values.

hugetlb_bad_size needs to be called on invalid values. Also change the
pr_warn to a pr_err to better align with other platforms.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sparc: Machine description indices can vary

VIO devices were being looked up by their index in the machine
description node block, but this often varies over time as devices are
added and removed. Instead, store the ID and look up using the type,
config handle and ID.

Signed-off-by: James Clarke <jrtc27@jrtc27.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112541
Signed-off-by: David S. Miller <davem@davemloft.net>

sparc64: mm: fix copy_tsb to correctly copy huge page TSBs

When a TSB grows beyond its current capacity, a new TSB is allocated
and copy_tsb is called to copy entries from the old TSB to the new.
A hash shift based on page size is used to calculate the index of an
entry in the TSB.  copy_tsb has hard coded PAGE_SHIFT in these
calculations.  However, for huge page TSBs the value REAL_HPAGE_SHIFT
should be used.  As a result, when copy_tsb is called for a huge page
TSB the entries are placed at the incorrect index in the newly
allocated TSB.  When doing hardware table walk, the MMU does not
match these entries and we end up in the TSB miss handling code.
This code will then create and write an entry to the correct index
in the TSB.  We take a performance hit for the table walk miss and
recreation of these entries.

Pass a new parameter to copy_tsb that is the page size shift to be
used when copying the TSB.

Suggested-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

arch/sparc: support NR_CPUS = 4096

Linux SPARC64 limits NR_CPUS to 4064 because init_cpu_send_mondo_info()
only allocates a single page for NR_CPUS mondo entries. Thus we cannot
use all 4096 CPUs on some SPARC platforms.

To fix, allocate (2^order) pages where order is set according to the size
of cpu_list for possible cpus. Since cpu_list_pa and cpu_mondo_block_pa
are not used in asm code, there are no imm13 offsets from the base PA
that will break because they can only reach one page.

Orabug: 25505750

Signed-off-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Atish Patra <atish.patra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: cgroup skb progs cannot access ld_abs/ind

Commit fb9a307d11d6 ("bpf: Allow CGROUP_SKB eBPF program to
access sk_buff") enabled programs of BPF_PROG_TYPE_CGROUP_SKB
type to use ld_abs/ind instructions. However, at this point,
we cannot use them, since offsets relative to SKF_LL_OFF will
end up pointing skb_mac_header(skb) out of bounds since in the
egress path it is not yet set at that point in time, but only
after __dev_queue_xmit() did a general reset on the mac header.
bpf_internal_load_pointer_neg_helper() will then end up reading
data from a wrong offset.

BPF_PROG_TYPE_CGROUP_SKB programs can use bpf_skb_load_bytes()
already to access packet data, which is also more flexible than
the insns carried over from cBPF.

Fixes: fb9a307d11d6 ("bpf: Allow CGROUP_SKB eBPF program to access sk_buff")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Chenbo Feng <fengc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: fix a broken u32 less than zero check

The check that queue is less or equal to zero is always true
because queue is a u32; queue is decremented and will wrap around
and never go -ve. Fix this by making queue an int.

Detected by CoverityScan, CID#1428988 ("Unsigned compared against 0")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: fix completely hung TX when using TSO

stmmac_tso_allocator can fail to set the Last Descriptor bit
on a descriptor that actually was the last descriptor.

This happens when the buffer of the last descriptor ends
up having a size of exactly TSO_MAX_BUFF_SIZE.

When the IP eventually reaches the next last descriptor,
which actually has the bit set, the DMA will hang.

When the DMA hangs, we get a tx timeout, however,
since stmmac does not do a complete reset of the IP
in stmmac_tx_timeout, we end up in a state with
completely hung TX.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Acked-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: use symmetric hash

Tun actually expects a symmetric hash for queue selecting to work
correctly, otherwise packets belongs to a single flow may be
redirected to the wrong queue. So this patch switch to use
__skb_get_hash_symmetric().

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>