git.openwrt.org Git - openwrt/staging/blogic.git/log

ionic: support longer tx sg lists

The version 1 Tx queues can use longer SG lists than the
original version 0 queues, but we need to check to see if the
firmware supports the v1 Tx queues. This implements the queue
type query for all queue types, and uses the information to
set up for using the longer Tx SG lists.

Because the Tx SG list can be longer, we need to limit the
max ring length to be sure we stay inside the boundaries of a
DMA allocation max size, so we lower the max Tx ring size.

The driver sets its highest known version in the Q_IDENTITY
command, and the FW returns the highest version that it knows,
bounded by the driver's version. The negotiated version number
is later used in the Q_INIT commands.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

checkpatch: warn about uses of ENOTSUPP

ENOTSUPP often feels like the right error code to use, but it's
in fact not a standard Unix error. E.g.:

$ python
>>> import errno
>>> errno.errorcode[errno.ENOTSUPP]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'errno' has no attribute 'ENOTSUPP'

There were numerous commits converting the uses back to EOPNOTSUPP
but in some cases we are stuck with the high error code for backward
compatibility reasons.

Let's try prevent more ENOTSUPPs from getting into the kernel.

Recent example:
https://lore.kernel.org/netdev/20200510182252.GA411829@lunn.ch/

v3 (Joe):
- fix the "not file" condition.

v2 (Joe):
- add a link to recent discussion,
- don't match when scanning files, not patches to avoid sudden
influx of conversion patches.
https://lore.kernel.org/netdev/20200511165319.2251678-1-kuba@kernel.org/

v1:
https://lore.kernel.org/netdev/20200510185148.2230767-1-kuba@kernel.org/

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'improve-msg_control-kernel-vs-user-pointer-handling'

Christoph Hellwig says:

====================
improve msg_control kernel vs user pointer handling

this series replace the msg_control in the kernel msghdr structure
with an anonymous union and separate fields for kernel vs user
pointers. In addition to helping a bit with type safety and reducing
sparse warnings, this also allows to remove the set_fs() in
kernel_recvmsg, helping with an eventual entire removal of set_fs().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: cleanly handle kernel vs user buffers for ->msg_control

The msg_control field in struct msghdr can either contain a user
pointer when used with the recvmsg system call, or a kernel pointer
when used with sendmsg. To complicate things further kernel_recvmsg
can stuff a kernel pointer in and then use set_fs to make the uaccess
helpers accept it.

Replace it with a union of a kernel pointer msg_control field, and
a user pointer msg_control_user one, and allow kernel_recvmsg operate
on a proper kernel pointer using a bitfield to override the normal
choice of a user pointer for recvmsg.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/scm: cleanup scm_detach_fds

Factor out two helpes to keep the code tidy.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add a CMSG_USER_DATA macro

Add a variant of CMSG_DATA that operates on user pointer to avoid
sparse warnings about casting to/from user pointers. Also fix up
CMSG_DATA to rely on the gcc extension that allows void pointer
arithmetics to cut down on the amount of casts.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-dsa-Constify-two-tagger-ops'

Florian Fainelli says:

====================
net: dsa: Constify two tagger ops

This patch series constifies the dsa_device_ops for ocelot and sja1105
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: tag_sja1105: Constify dsa_device_ops

sja1105_netdev_ops should be const since that is what the DSA layer
expects.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: ocelot: Constify dsa_device_ops

ocelot_netdev_ops should be const since that is what the DSA layer
expects.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sfc-remove-nic_data-usage-in-common-code'

Edward Cree says:

====================
sfc: remove nic_data usage in common code

efx->nic_data should only be used from NIC-specific code (i.e. nic_type
functions and things they call), in files like ef10[_sriov].c and
siena.c. This series refactors several nic_data usages from common
code (mainly in mcdi_filters.c) into nic_type functions, in preparation
for the upcoming ef100 driver which will use those functions but have
its own struct layout for efx->nic_data distinct from ef10's.
After this series, one nic_data usage (in ptp.c) remains; it wasn't
clear to me how to fix it, and ef100 devices don't yet have PTP support
(so the initial ef100 driver will not call that code).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: make firmware-variant printing a nic_type function

Instead of having efx_mcdi_print_fwver() look at efx_nic_rev and
conditionally poke around inside ef10-specific nic_data, add a new
efx->type->print_additional_fwver() method to do this work.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: make filter table probe caller responsible for adding VLANs

By making the caller of efx_mcdi_filter_table_probe() loop over the
vlan_list calling efx_mcdi_filter_add_vlan(), instead of doing it in
efx_mcdi_filter_table_probe(), the latter avoids looking in ef10-
specific nic_data.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: move rx_rss_context_exclusive into struct efx_mcdi_filter_table

It's both set and used solely by mcdi_filters.c, so there's no reason
for it to be in ef10-specific nic_data.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: rework handling of (firmware) multicast chaining state

Store the mc_chaining bit in struct efx_mcdi_filter_table, so that common
code in mcdi_filters.c doesn't need to get it from ef10-specific nic_data.
Also, probe the firmware workaround just before the call to
efx_mcdi_filter_table_probe(), rather than in a random other part of the
driver bringup, to ensure that (a) it gets probed in time and (b) it gets
reprobed as necessary on resets, no matter how the surrounding code gets
reorganised and reordered.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: move 'must restore' flags out of ef10-specific nic_data

Common code in mcdi_filters.c uses these flags, so by moving them to
either struct efx_nic (in the case of must_realloc_vis) or struct
efx_mcdi_filter_table (for must_restore_rss_contexts and
must_restore_filters), decouple this code from ef10's nic_data.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: use efx_has_cap for capability checks outside of NIC-specific code

Removes some efx_ef10_nic_data references from common code.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: make capability checking a nic_type function

Various MCDI functions (especially in filter handling) need to check the
datapath caps, but those live in nic_data (since they don't exist on
Siena). Decouple from ef10-specific data structures by adding check_caps
to the nic_type, to allow using these functions from non-ef10 drivers.

Also add a convenience macro efx_has_cap() to reduce the amount of
boilerplate involved in calling it.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: move vport_id to struct efx_nic

Remove some usage of ef10-specific nic_data structs from common MCDI
functions, in preparation for using them from a non-EF10 driver.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-Optimize-the-qed-allocations-inside-kdump-kernel'

Bhupesh Sharma says:

====================
net: Optimize the qed* allocations inside kdump kernel

Changes since v1:
----------------
- v1 can be seen here: http://lists.infradead.org/pipermail/kexec/2020-May/024935.html
- Addressed review comments received on v1:
  * Removed unnecessary paranthesis.
  * Used a different macro for minimum RX/TX ring count value in kdump
    kernel.

Since kdump kernel(s) run under severe memory constraint with the
basic idea being to save the crashdump vmcore reliably when the primary
kernel panics/hangs, large memory allocations done by a network driver
can cause the crashkernel to panic with OOM.

The qed* drivers take up approximately 214MB memory when run in the
kdump kernel with the default configuration settings presently used in
the driver. With an usual crashkernel size of 512M, this allocation
is equal to almost half of the total crashkernel size allocated.

See some logs obtained via memstrack tool (see [1]) below:
dracut-pre-pivot[676]: ======== Report format module_summary: ========
dracut-pre-pivot[676]: Module qed using 149.6MB (2394 pages), peak allocation 149.6MB (2394 pages)
dracut-pre-pivot[676]: Module qede using 65.3MB (1045 pages), peak allocation 65.3MB (1045 pages)

This patchset tries to reduce the overall memory allocation profile of
the qed* driver when they run in the kdump kernel. With these
optimization we can see a saving of approx 85M in the kdump kernel:
dracut-pre-pivot[671]: ======== Report format module_summary: ========
dracut-pre-pivot[671]: Module qed using 124.6MB (1993 pages), peak allocation 124.7MB (1995 pages)
<..snip..>
dracut-pre-pivot[671]: Module qede using 4.6MB (73 pages), peak allocation 4.6MB (74 pages)

And the kdump kernel can save vmcore successfully via both ssh and nfs
interfaces.

This patchset contains two patches:
[PATCH 1/2] - Reduces the default TX and RX ring count in kdump kernel.
[PATCH 2/2] - Disables qed SRIOV feature in kdump kernel (as it is
              normally not a supported kdump target for saving
      vmcore).

[1]. Memstrack tool: https://github.com/ryncsn/memstrack
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: qed: Disable SRIOV functionality inside kdump kernel

Since we have kdump kernel(s) running under severe memory constraint
it makes sense to disable the qed SRIOV functionality when running the
kdump kernel as kdump configurations on several distributions don't
support SRIOV targets for saving the vmcore (see [1] for example).

Currently the qed SRIOV functionality ends up consuming memory in
the kdump kernel, when we don't really use the same.

An example log seen in the kdump kernel with the SRIOV functionality
enabled can be seen below (obtained via memstrack tool, see [2]):
dracut-pre-pivot[676]: ======== Report format module_summary: ========
dracut-pre-pivot[676]: Module qed using 149.6MB (2394 pages), peak allocation 149.6MB (2394 pages)

This patch disables the SRIOV functionality inside kdump kernel and with
the same applied the memory consumption goes down:
dracut-pre-pivot[671]: ======== Report format module_summary: ========
dracut-pre-pivot[671]: Module qed using 124.6MB (1993 pages), peak allocation 124.7MB (1995 pages)

[1]. https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/installing-and-configuring-kdump_managing-monitoring-and-updating-the-kernel#supported-kdump-targets_supported-kdump-configurations-and-targets
[2]. Memstrack tool: https://github.com/ryncsn/memstrack

Cc: kexec@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: Ariel Elior <aelior@marvell.com>
Cc: GR-everest-linux-l2@marvell.com
Cc: Manish Chopra <manishc@marvell.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qed*: Reduce RX and TX default ring count when running inside kdump kernel

Normally kdump kernel(s) run under severe memory constraint with the
basic idea being to save the crashdump vmcore reliably when the primary
kernel panics/hangs.

Currently the qed* ethernet driver ends up consuming a lot of memory in
the kdump kernel, leading to kdump kernel panic when one tries to save
the vmcore via ssh/nfs (thus utilizing the services of the underlying
qed* network interfaces).

An example OOM message log seen in the kdump kernel can be seen here
[1], with crashkernel size reservation of 512M.

Using tools like memstrack (see [2]), we can track the modules taking up
the bulk of memory in the kdump kernel and organize the memory usage
output as per 'highest allocator first'. An example log for the OOM case
indicates that the qed* modules end up allocating approximately 216M
memory, which is a large part of the total crashkernel size:

dracut-pre-pivot[676]: ======== Report format module_summary: ========
dracut-pre-pivot[676]: Module qed using 149.6MB (2394 pages), peak allocation 149.6MB (2394 pages)
dracut-pre-pivot[676]: Module qede using 65.3MB (1045 pages), peak allocation 65.3MB (1045 pages)

This patch reduces the default RX and TX ring count from 1024 to 64
when running inside kdump kernel, which leads to a significant memory
saving.

An example log with the patch applied shows the reduced memory
allocation in the kdump kernel:
dracut-pre-pivot[674]: ======== Report format module_summary: ========
dracut-pre-pivot[674]: Module qed using 141.8MB (2268 pages), peak allocation 141.8MB (2268 pages)
<..snip..>
[dracut-pre-pivot[674]: Module qede using 4.8MB (76 pages), peak allocation 4.9MB (78 pages)

Tested crashdump vmcore save via ssh/nfs protocol using underlying qed*
network interface after applying this patch.

[1] OOM log:
------------

kworker/0:6: page allocation failure: order:6,
mode:0x60c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null)
kworker/0:6 cpuset=/ mems_allowed=0
CPU: 0 PID: 145 Comm: kworker/0:6 Not tainted 4.18.0-109.el8.aarch64 #1
Hardware name: To be filled by O.E.M. Saber/Saber, BIOS 0ACKL025
01/18/2019
Workqueue: events work_for_cpu_fn
Call trace:
  dump_backtrace+0x0/0x188
  show_stack+0x24/0x30
  dump_stack+0x90/0xb4
  warn_alloc+0xf4/0x178
  __alloc_pages_nodemask+0xcac/0xd58
  alloc_pages_current+0x8c/0xf8
  kmalloc_order_trace+0x38/0x108
  qed_iov_alloc+0x40/0x248 [qed]
  qed_resc_alloc+0x224/0x518 [qed]
  qed_slowpath_start+0x254/0x928 [qed]
   __qede_probe+0xf8/0x5e0 [qede]
  qede_probe+0x68/0xd8 [qede]
  local_pci_probe+0x44/0xa8
  work_for_cpu_fn+0x20/0x30
  process_one_work+0x1ac/0x3e8
  worker_thread+0x44/0x448
  kthread+0x130/0x138
  ret_from_fork+0x10/0x18
  Cannot start slowpath
  qede: probe of 0000:05:00.1 failed with error -12

[2]. Memstrack tool: https://github.com/ryncsn/memstrack

Cc: kexec@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: Ariel Elior <aelior@marvell.com>
Cc: GR-everest-linux-l2@marvell.com
Cc: Manish Chopra <manishc@marvell.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hinic: add link_ksettings ethtool_ops support

add set_link_ksettings implementation and improve the implementation
of get_link_ksettings

Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

team: Replace zero-length array with flexible-array

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
int stuff;
struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: atarilance: Replace zero-length array with flexible-array

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
int stuff;
struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Replace zero-length array with flexible-array

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
int stuff;
struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'cross-chip-bridging-for-disjoint-dsa-trees'

Vladimir Oltean says:

====================
This series adds support for boards where DSA switches of multiple types
are cascaded together. Actually this type of setup was brought up before
on netdev, and it looks like utilizing disjoint trees is the way to go:

https://lkml.org/lkml/2019/7/7/225

The trouble with disjoint trees (prior to this patch series) is that only
bridging of ports within the same hardware switch can be offloaded.
After scratching my head for a while, it looks like the easiest way to
support hardware bridging between different DSA trees is to bridge their
DSA masters and extend the crosschip bridging operations.

I have given some thought to bridging the DSA masters with the slaves
themselves, but given the hardware topology described in the commit
message of patch 4/4, virtually any number (and combination) of bridges
(forwarding domains) can be created on top of those 3x4-port front-panel
switches. So it becomes a lot less obvious, when the front-panel ports
are enslaved to more than 1 bridge, which bridge should the DSA masters
be enslaved to.

So the least awkward approach was to just create a completely separate
bridge for the DSA masters, whose entire purpose is to permit hardware
forwarding between the discrete switches beneath it.

This is a direct resend of v3, which was deferred due to lack of review.
In the meantime Florian has reviewed and tested some of them.

v1 was submitted here:
https://patchwork.ozlabs.org/project/netdev/cover/20200429161952.17769-1-olteanv@gmail.com/

v2 was submitted here:
https://patchwork.ozlabs.org/project/netdev/cover/20200430202542.11797-1-olteanv@gmail.com/

v3 was submitted here:
https://patchwork.ozlabs.org/project/netdev/cover/20200503221228.10928-1-olteanv@gmail.com/
====================

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: sja1105: implement cross-chip bridging operations

sja1105 uses dsa_8021q for DSA tagging, a format which is VLAN at heart
and which is compatible with cascading. A complete description of this
tagging format is in net/dsa/tag_8021q.c, but a quick summary is that
each external-facing port tags incoming frames with a unique pvid, and
this special VLAN is transmitted as tagged towards the inside of the
system, and as untagged towards the exterior. The tag encodes the switch
id and the source port index.

This means that cross-chip bridging for dsa_8021q only entails adding
the dsa_8021q pvids of one switch to the RX filter of the other
switches. Everything else falls naturally into place, as long as the
bottom-end of ports (the leaves in the tree) is comprised exclusively of
dsa_8021q-compatible (i.e. sja1105 switches). Otherwise, there would be
a chance that a front-panel switch transmits a packet tagged with a
dsa_8021q header, header which it wouldn't be able to remove, and which
would hence "leak" out.

The only use case I tested (due to lack of board availability) was when
the sja1105 switches are part of disjoint trees (however, this doesn't
change the fact that multiple sja1105 switches still need unique switch
identifiers in such a system). But in principle, even "true" single-tree
setups (with DSA links) should work just as fine, except for a small
change which I can't test: dsa_towards_port should be used instead of
dsa_upstream_port (I made the assumption that the routing port that any
sja1105 should use towards its neighbours is the CPU port. That might
not hold true in other setups).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: introduce a dsa_switch_find function

Somewhat similar to dsa_tree_find, dsa_switch_find returns a dsa_switch
structure pointer by searching for its tree index and switch index (the
parameters from dsa,member). To be used, for example, by drivers who
implement .crosschip_bridge_join and need a reference to the other
switch indicated to by the tree_index and sw_index arguments.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: permit cross-chip bridging between all trees in the system

One way of utilizing DSA is by cascading switches which do not all have
compatible taggers. Consider the following real-life topology:

      +---------------------------------------------------------------+
      | LS1028A                                                       |
      |               +------------------------------+                |
      |               |      DSA master for Felix    |                |
      |               |(internal ENETC port 2: eno2))|                |
      |  +------------+------------------------------+-------------+  |
      |  | Felix embedded L2 switch                                |  |
      |  |                                                         |  |
      |  | +--------------+   +--------------+   +--------------+  |  |
      |  | |DSA master for|   |DSA master for|   |DSA master for|  |  |
      |  | |  SJA1105 1   |   |  SJA1105 2   |   |  SJA1105 3   |  |  |
      |  | |(Felix port 1)|   |(Felix port 2)|   |(Felix port 3)|  |  |
      +--+-+--------------+---+--------------+---+--------------+--+--+

+-----------------------+ +-----------------------+ +-----------------------+
|   SJA1105 switch 1    | |   SJA1105 switch 2    | |   SJA1105 switch 3    |
+-----+-----+-----+-----+ +-----+-----+-----+-----+ +-----+-----+-----+-----+
|sw1p0|sw1p1|sw1p2|sw1p3| |sw2p0|sw2p1|sw2p2|sw2p3| |sw3p0|sw3p1|sw3p2|sw3p3|
+-----+-----+-----+-----+ +-----+-----+-----+-----+ +-----+-----+-----+-----+

The above can be described in the device tree as follows (obviously not
complete):

mscc_felix {
dsa,member = <0 0>;
ports {
port@4 {
ethernet = <&enetc_port2>;
};
};
};

sja1105_switch1 {
dsa,member = <1 1>;
ports {
port@4 {
ethernet = <&mscc_felix_port1>;
};
};
};

sja1105_switch2 {
dsa,member = <2 2>;
ports {
port@4 {
ethernet = <&mscc_felix_port2>;
};
};
};

sja1105_switch3 {
dsa,member = <3 3>;
ports {
port@4 {
ethernet = <&mscc_felix_port3>;
};
};
};

Basically we instantiate one DSA switch tree for every hardware switch
in the system, but we still give them globally unique switch IDs (will
come back to that later). Having 3 disjoint switch trees makes the
tagger drivers "just work", because net devices are registered for the
3 Felix DSA master ports, and they are also DSA slave ports to the ENETC
port. So packets received on the ENETC port are stripped of their
stacked DSA tags one by one.

Currently, hardware bridging between ports on the same sja1105 chip is
possible, but switching between sja1105 ports on different chips is
handled by the software bridge. This is fine, but we can do better.

In fact, the dsa_8021q tag used by sja1105 is compatible with cascading.
In other words, a sja1105 switch can correctly parse and route a packet
containing a dsa_8021q tag. So if we could enable hardware bridging on
the Felix DSA master ports, cross-chip bridging could be completely
offloaded.

Such as system would be used as follows:

ip link add dev br0 type bridge && ip link set dev br0 up
for port in sw0p0 sw0p1 sw0p2 sw0p3 \
    sw1p0 sw1p1 sw1p2 sw1p3 \
    sw2p0 sw2p1 sw2p2 sw2p3; do
ip link set dev $port master br0
done

The above makes switching between ports on the same row be performed in
hardware, and between ports on different rows in software. Now assume
the Felix switch ports are called swp0, swp1, swp2. By running the
following extra commands:

ip link add dev br1 type bridge && ip link set dev br1 up
for port in swp0 swp1 swp2; do
ip link set dev $port master br1
done

the CPU no longer sees packets which traverse sja1105 switch boundaries
and can be forwarded directly by Felix. The br1 bridge would not be used
for any sort of traffic termination.

For this to work, we need to give drivers an opportunity to listen for
bridging events on DSA trees other than their own, and pass that other
tree index as argument. I have made the assumption, for the moment, that
the other existing DSA notifiers don't need to be broadcast to other
trees. That assumption might turn out to be incorrect. But in the
meantime, introduce a dsa_broadcast function, similar in purpose to
dsa_port_notify, which is used only by the bridging notifiers.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bridge: allow enslaving some DSA master network devices

Commit 8db0a2ee2c63 ("net: bridge: reject DSA-enabled master netdevices
as bridge members") added a special check in br_if.c in order to check
for a DSA master network device with a tagging protocol configured. This
was done because back then, such devices, once enslaved in a bridge
would become inoperative and would not pass DSA tagged traffic anymore
due to br_handle_frame returning RX_HANDLER_CONSUMED.

But right now we have valid use cases which do require bridging of DSA
masters. One such example is when the DSA master ports are DSA switch
ports themselves (in a disjoint tree setup). This should be completely
equivalent, functionally speaking, from having multiple DSA switches
hanging off of the ports of a switchdev driver. So we should allow the
enslaving of DSA tagged master network devices.

Instead of the regular br_handle_frame(), install a new function
br_handle_frame_dummy() on these DSA masters, which returns
RX_HANDLER_PASS in order to call into the DSA specific tagging protocol
handlers, and lift the restriction from br_add_if.

Suggested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-hns3-misc-updates-for-next'

Huazhong Tan says:

====================
net: hns3: misc updates for -next

This patchset includes some misc updates for the HNS3 ethernet driver.

#1 & #2 add two cleanups.
#3 provides an interface for the client to query the CMDQ's status.
#4 adds a little optimization about debugfs.
#5 prevents 1000M auto-negotiation off setting.
====================

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: hns3: disable auto-negotiation off with 1000M setting in ethtool

The 802.3 specification does not specify the behavior of
auto-negotiation off with 1000M in PHY. Therefore, some PHY
compatibility issues occur. This patch forbids the setting of
this unreasonable mode by ethtool in driver.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: hns3: optimized the judgment of the input parameters of dump ncl config

This patch optimizes the judgment of the input parameters of dump ncl
config by checking the number and value of the input parameters apart.
It's clearer and more reasonable.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: hns3: provide .get_cmdq_stat interface for the client

This patch provides a new interface for the client to query
whether CMDQ is ready to work.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: hns3: modify two uncorrect macro names

According to the UM, command 0x0B03 and 0x0B13 are used to
query the statistics about TX and RX, not the status, so
modifies the unsuitable macro name of these two command.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: hns3: remove a redundant register macro definition

HCLGE_MISC_VECTOR_INT_STS and HCLGE_VECTOR_PF_OTHER_INT_STS_REG
both represent the misc interrupt status register(0x20800), so
removes HCLGE_VECTOR_PF_OTHER_INT_STS_REG and replaces it with
HCLGE_MISC_VECTOR_INT_STS.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'Ethernet-Cable-test-support'

Andrew Lunn says:

====================
Ethernet Cable test support

any copper Ethernet PHY have support for performing diagnostics of
the cable. Are the cable shorted, broken, not plugged into anything at
the other end? And they can report roughly how far along the cable any
fault is.

Add infrastructure in ethtool and phylib support for triggering a
cable test and reporting the results. The Marvell 1G PHY driver is
then extended to make use of this infrastructure.

For testing, a modified ethtool(1) can be found here:
https://github.com/lunn/ethtool.git feature/cable-test-v4. This also
contains extra code for TDR dump, which will be added to the kernel in
a later patch series.

Thanks to Chris Healy for extensive testing.

v2:
See individual patches but:

Remove _REPLY messages
Change length into a u32
Grammar fixes
Rename functions for consistency
Extack for cable test already running
Remove ethnl_cable_test_act_ops
Add status attributes
Rename pairs from numbers to letters

v3:

See individual patches but:
Remove ETHTOOL_MSG_CABLE_TEST_ACT_REPLY from documentation
Remove unused cable_test_get_policy
Fixed example in document
Add ETHTOOL_A_CABLE_NEST_* enum
Add ETHTOOL_MSG_CABLE_TEST_NTF to documentation
Poison phydev->skb
Return -EMSGSIZE when ethnl_bcastmsg_put() fails
Return valid error code when nla_nest_start() fails
Use u8 for results
Actually put u32 length into message
s/mavell/marvell/g
Remove include of <uapi/linux/ethtool_netlink.h>
EMSGSIZE when ethnl_bcastmsg_put() fails
Print an error message on failure, since this is a void function.

v4:
See individual patches but:
Remove unwanted blank line
ENOTSUPP->EOPNOTSUPP
Move EINVAL->EMSGSIZE fix to correct patch
====================

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: Send notifier when starting the cable test

Given that it takes time to run a cable test, send a notify message at
the start, as well as when it is completed.

v3:
EMSGSIZE when ethnl_bcastmsg_put() fails
Print an error message on failure, since this is a void function.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: Put interface into oper testing during cable test

Since running a cable test is disruptive, put the interface into
operative state testing while the test is running.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: marvell: Add cable test support

The Marvell PHYs have a couple of different register sets for
performing cable tests. Page 7 provides the simplest to use.

v3:
s/mavell/marvell/g
Remove include of <uapi/linux/ethtool_netlink.h>

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethtool: Add helpers for reporting test results

The PHY drivers can use these helpers for reporting the results. The
results get translated into netlink attributes which are added to the
pre-allocated skbuf.

v3:
Poison phydev->skb
Return -EMSGSIZE when ethnl_bcastmsg_put() fails
Return valid error code when nla_nest_start() fails
Use u8 for results
Actually put u32 length into message

v4:
s/ENOTSUPP/EOPNOTSUPP/g

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethtool: Add infrastructure for reporting cable test results

Provide infrastructure for PHY drivers to report the cable test
results. A netlink skb is associated to the phydev. Helpers will be
added which can add results to this skb. Once the test has finished
the results are sent to user space.

When netlink ethtool is not part of the kernel configuration stubs are
provided. It is also impossible to trigger a cable test, so the error
code returned by the alloc function is of no consequence.

v2:
Include the status complete in the netlink notification message

v4:
Replace -EINVAL with -EMSGSIZE

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethtool: Make helpers public

Make some helpers for building ethtool netlink messages available
outside the compilation unit, so they can be used for building
messages which are not simple get/set.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethtool: Add attributes for cable test reports

Add the attributes needed to report cable test results to userspace.
The reports are expected to be per twisted pair. A nested property per
pair can report the result of the cable test. A nested property can
also report the length of the cable to any fault.

v2:
Grammar fixes
Change length from u16 to u32
s/DEV/HEADER/g
Add status attributes
Rename pairs from numbers to letters.

v3:
Fixed example in document
Add ETHTOOL_A_CABLE_NEST_* enum
Add ETHTOOL_MSG_CABLE_TEST_NTF to documentation

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethtool: netlink: Add support for triggering a cable test

Add new ethtool netlink calls to trigger the starting of a PHY cable
test.

Add Kconfig'ury to ETHTOOL_NETLINK so that PHYLIB is not a module when
ETHTOOL_NETLINK is builtin, which would result in kernel linking errors.

v2:
Remove unwanted white space change
Remove ethnl_cable_test_act_ops and use doit handler
Rename cable_test_set_policy cable_test_act_policy
Remove ETHTOOL_MSG_CABLE_TEST_ACT_REPLY

v3:
Remove ETHTOOL_MSG_CABLE_TEST_ACT_REPLY from documentation
Remove unused cable_test_get_policy
Add Reviewed-by tags

v4:
Remove unwanted blank line

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: Add support for polling cable test

Some PHYs are not capable of generating interrupts when a cable test
finished. They do however support interrupts for normal operations,
like link up/down. As such, the PHY state machine would normally not
poll the PHY.

Add support for indicating the PHY state machine must poll the PHY
when performing a cable test.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: phy: Add cable test support to state machine

Running a cable test is desruptive to normal operation of the PHY and
can take a 5 to 10 seconds to complete. The RTNL lock cannot be held
for this amount of time, and add a new state to the state machine for
running a cable test.

The driver is expected to implement two functions. The first is used
to start a cable test. Once the test has started, it should return.

The second function is called once per second, or on interrupt to
check if the cable test is complete, and to allow the PHY to report
the status.

v2:
Rename phy_cable_test_abort to phy_abort_cable_test
Return different extack when already running test
Use phy_init_hw() to reset the PHY

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: usb: qmi_wwan: remove redundant assignment to variable status

The variable status is being initializeed with a value that is never read
and it is being updated later with a new value. The initialization
is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: huawei_cdc_ncm: remove redundant assignment to variable ret

The variable ret is being initializeed with a value that is never read
and it is being updated later with a new value. The initialization
is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: usb: ax88179_178a: remove redundant assignment to variable ret

The variable ret is being initializeed with a value that is never read
and it is being updated later with a new value. The initialization
is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dsa: sja1105: fix semicolon.cocci warnings

drivers/net/dsa/sja1105/sja1105_ethtool.c:481:11-12: Unneeded semicolon

Remove unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

Fixes: ae1804de93f6 ("dsa: sja1105: dynamically allocate stats structure")
CC: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

octeontx2-pf: Use the napi_alloc_frag() to alloc the pool buffers

In the current codes, the octeontx2 uses its own method to allocate
the pool buffers, but there are some issues in this implementation.
1. We have to run the otx2_get_page() for each allocation cycle and
   this is pretty error prone. As I can see there is no invocation
   of the otx2_get_page() in otx2_pool_refill_task(), this will leave
   the allocated pages have the wrong refcount and may be freed wrongly.
2. It wastes memory. For example, if we only receive one packet in a
   NAPI RX cycle, and then allocate a 2K buffer with otx2_alloc_rbuf()
   to refill the pool buffers and leave the remain area of the allocated
   page wasted. On a kernel with 64K page, 62K area is wasted.

IMHO it is really unnecessary to implement our own method for the
buffers allocate, we can reuse the napi_alloc_frag() to simplify
our code.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

IB/mlx4: Replace zero-length array with flexible-array

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
int stuff;
struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mlxsw-spectrum-Enforce-some-HW-limitations-for-matchall-TC-offload'

Ido Schimmel says:

====================
mlxsw: spectrum: Enforce some HW limitations for matchall TC offload

Jiri says:

There are some limitations for TC matchall classifier offload that are
given by the mlxsw HW dataplane. It is not possible to do sampling on
egress and also the mirror/sample vs. ACL (flower) ordering is fixed. So
check this and forbid to offload incorrect setup.
====================

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mlxsw: tc_restrictions: add couple of test for the correct matchall-flower ordering

Make sure that the drive restricts incorrect order of inserted matchall
vs. flower rules.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mlxsw: tc_restrictions: add test to check sample action restrictions

Check that matchall rules with sample actions are not possible to be
inserted to egress.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mlxsw: rename tc_flower_restrictions.sh to tc_restrictions.sh

The file is about to contain matchall restrictions too, so change the
name to make it more generic.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: spectrum_flower: Forbid to insert flower rules in collision with matchall rules

On ingress, the matchall rules doing mirroring and sampling are offloaded
into hardware blocks that are processed before any flower rules.
On egress, the matchall mirroring rules are offloaded into hardware
block that is processed after all flower rules.

Therefore check the priorities of inserted flower rules against
existing matchall rules and ensure the correct ordering.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: spectrum_matchall: Forbid to insert matchall rules in collision with flower rules

On ingress, the matchall rules doing mirroring and sampling are offloaded
into hardware blocks that are processed before any flower rules.
On egress, the matchall mirroring rules are offloaded into hardware
block that is processed after all flower rules.

Therefore check the priorities of inserted matchall rules against
existing flower rules and ensure the correct ordering.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: spectrum_matchall: Expose a function to get min and max rule priority

Introduce an infrastructure that allows to get minimum and maximum
rule priority for specified chain. This is going to be used by
a subsequent patch to enforce ordering between flower and
matchall filters.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: spectrum_matchall: Put matchall list into substruct of flow struct

As there are going to be other matchall specific fields in flow
structure, put the existing list field into matchall substruct.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: spectrum_flower: Expose a function to get min and max rule priority

Introduce an infrastructure that allows to get minimum and maximum
rule priority for specified chain. This is going to be used by
a subsequent patch to enforce ordering between flower and
matchall filters.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: spectrum_matchall: Restrict sample action to be allowed only on ingress

HW supports packet sampling on ingress only. Check and fail if user
is adding sample on egress.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

hinic: add three net_device_ops of vf

adds ndo_set_vf_rate/ndo_set_vf_spoofchk/ndo_set_vf_link_state
to configure netdev of virtual function

Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'mlx5-updates-2020-05-09' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2020-05-09

This series includes updates to mlx5 netdev driver and bonding updates
to support getting the next active tx slave.

1) merge commit with mlx5-next that includes bonding updates from Maor
Bonding: Add support to get xmit slave
2) Maxim makes some general code improvements to TX data path
3) Tariq makes some general code improvements to kTLS and mlx5 accel layer
in preparation for mlx5 TLS RX.
====================

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-atlantic-driver-updates'

Mark Starovoytov says:

====================
net: atlantic: driver updates

This patch series contains several minor cleanups for the previously
submitted series.

We also add Marvell copyrights on newly touched files.

v2:
* accommodated review comments related to the last patch in series
(MAC generation)

v1: https://patchwork.ozlabs.org/cover/1285011/
====================

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: atlantic: unify MAC generation

This patch unifies invalid MAC address handling with other drivers.

Basically we've switched to using standard APIs (is_valid_ether_addr /
eth_hw_addr_random) where possible.
It's worth noting that some of engineering Aquantia NICs might be
provisioned with a partially zeroed out MAC, which is still invalid,
but not caught by is_valid_ether_addr(), so we've added a special
handling for this case.

Also adding a warning in case of fallback to random MAC, because
this shouldn't be needed on production NICs, they should all be
provisioned with unique MAC.

NB! Default systemd/udevd configuration is 'MACAddressPolicy=persistent'.
This causes MAC address to be persisted across driver reloads and
reboots. We had to change it to 'none' for verification purposes.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: atlantic: remove check for boot code survivability before reset request

This patch removes unnecessary check for boot code survivability before
reset request.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: atlantic: remove hw_atl_b0_hw_rss_set call from A2 code

No need to call hw_atl_b0_hw_rss_set from hw_atl2_hw_rss_set

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: atlantic: remove TPO2 check from A0 code

TPO2 was introduced in B0 only, no reason to check for it in A0 code.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: atlantic: rename AQ_NIC_RATE_2GS to AQ_NIC_RATE_2G5

This patch changes the constant name to a more logical "2G5"
(for 2.5G speeds).

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: atlantic: minor MACSec code cleanup

This patch fixes a couple of minor merge issues found in macsec_api.c
after corresponding patch series has been applied.

These are not real bugs, so pushing to net-next.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: atlantic: use __packed instead of the full expansion.

This patches fixes the review comment made by Jakub Kicinski
in the "net: atlantic: A2 support" patch series.

Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Enhance ICOSQ WQE info fields

The same WQE opcode might be used in different ICOSQ flows
and WQE types.
To have a better distinguishability, replace it with an enum that
better indicates the WQE type and flow it is used for.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Accel, Remove unnecessary header include

The include of Ethernet driver header in core is not needed
and actually wrong.
Remove it.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Use struct assignment for WQE info updates

Struct assignment looks more clean, and implies resetting
the not assigned fields to zero, instead of holding values
from older ring cycles.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Take TX WQE info structures out of general EN header

Into the txrx header file.
The mlx5e_sq_wqe_info structure describes WQE info for the ICOSQ,
rename it to better reflect this.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: kTLS, Do not fill edge for the DUMP WQEs in TX flow

Every single DUMP WQE resides in a single WQEBB.
As the pi is calculated per each one separately, there is
no real need for a contiguous room for them, allow them to populate
different WQ fragments.
This reduces WQ waste and improves its utilization.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: kTLS, Fill work queue edge separately in TX flow

For the static and progress context params WQEs, do the edge
filling separately.
This improves the WQ utilization, code readability, and reduces
the chance of future bugs.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Split TX acceleration offloads into two phases

After previous modifications, the offloads are no longer called one by
one, the pi is calculated and the wqe is cleared on between of TLS and
IPSEC offloads, which doesn't quite fit mlx5e_accel_handle_tx's purpose.

This patch splits mlx5e_accel_handle_tx into two functions that
correspond to two logical phases of running offloads:

1. Before fetching a WQE. Here runs the code that can post WQEs on its
own, before the main WQE is fetched. It's the main part of TLS offload.

2. After fetching a WQE. Here runs the code that updates the WQE's
fields, but can't post other WQEs any more. It's a minor part of TLS
offload that sets the tisn field in the cseg, and eseg-based offloads
(currently IPSEC, and later patches will move GENEVE and checksum
offloads there, too).

It allows to make mlx5e_xmit take care of all actions needed to transmit
a packet in the right order, improve the structure of the code and
reduce unnecessary operations. The structure will be further improved in
the following patches (all eseg-based offloads will be moved to a single
place, and reserving space for the main WQE will happen between phase 1
and phase 2 of offloads to eliminate unneeded data movements).

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Update UDP fields of the SKB for GSO first

mlx5e_udp_gso_handle_tx_skb updates the length field in the UDP header
in case of GSO. It doesn't interfere with other offloads, so do it first
to simplify further restructuring of the code. This way we'll make all
independent modifications to the SKB before starting to work with WQEs.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Make TLS offload independent of wqe and pi

TLS offload may write a 32-bit field (tisn) to the cseg of the WQE. To
do that, it receives pi and wqe pointers. As TLS offload may also send
additional WQEs, it has to update pi and wqe, and in many cases it even
doesn't use pi calculated before and wqe zeroed before and does it
itself. Also, mlx5e_sq_xmit has to copy the whole cseg if it goes to the
mlx5e_fill_sq_frag_edge flow. This all is not efficient.

It's more efficient to do the following:

1. Just return tisn from TLS offload and make the caller fill it in a
more appropriate place.

2. Calculate pi and clear wqe after calling TLS offload.

3. If TLS offload has to send WQEs, calculate pi and clear wqe just
before that. It's already done in all places anyway, so this commit
allows to remove some redundant memsets and calls.

Copying of cseg will be eliminated in one of the following commits, and
all other stuff is done here.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Pass only eseg to IPSEC offload

IPSEC offload needs to modify the eseg of the WQE that is being filled,
but it receives a pointer to the whole WQE. To make the contract
stricter, pass only the pointer to the eseg of that WQE. This commit is
preparation for the following refactoring of offloads in the TX path and
for the MPWQE support.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Return void from mlx5e_sq_xmit and mlx5i_sq_xmit

mlx5e_sq_xmit and mlx5i_sq_xmit always return NETDEV_TX_OK. Drop the
return value to simplify the code.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Unify checks of TLS offloads

Both INNOVA and ConnectX TLS offloads perform the same checks in the
beginning. Unify them to reduce repeating code. Do WARN_ON_ONCE on
netdev mismatch and finish with an error in both offloads, not only in
the ConnectX one.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Return bool from TLS and IPSEC offloads

TLS and IPSEC offloads currently return struct sk_buff *, but the value
is either NULL or the same skb that was passed as a parameter. Return
bool instead to provide stronger guarantees to the calling code (it
won't need to support handling a different SKB that could be potentially
returned before this change) and to simplify restructuring this code in
the following commits.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

Merge branch 'mlx5-next' of git://git./linux/kernel/git/mellanox/linux

This merge includes updates to bonding driver needed for the rdma stack,
to avoid conflicts with the RDMA branch.

Maor Gottlieb Says:

====================
Bonding: Add support to get xmit slave

The following series adds support to get the LAG master xmit slave by
introducing new .ndo - ndo_get_xmit_slave. Every LAG module can
implement it and it first implemented in the bond driver.
This is follow-up to the RFC discussion [1].

The main motivation for doing this is for drivers that offload part
of the LAG functionality. For example, Mellanox Connect-X hardware
implements RoCE LAG which selects the TX affinity when the resources
are created and port is remapped when it goes down.

The first part of this patchset introduces the new .ndo and add the
support to the bonding module.

The second part adds support to get the RoCE LAG xmit slave by building
skb of the RoCE packet based on the AH attributes and call to the new
.ndo.

The third part change the mlx5 driver driver to set the QP's affinity
port according to the slave which found by the .ndo.
====================

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net: lio_core: remove redundant assignment to variable tx_done

The variable tx_done is being assigned with a value that is never read
as the function returns a few statements later. The assignment is
redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/atheros: remove redundant assignment to variable size

The variable size is being assigned with a value that is never read,
the assignment is redundant and cab be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

cnic: remove redundant assignment to variable ret

The variable ret is being assigned with a value that is never read,
the assignment is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: tg3: tidy up loop, remove need to compute off with a multiply

Currently the value for 'off' is computed using a multiplication and
a couple of statements later off is being incremented by len and
this value is never read. Clean up the code by removing the
multiplication and just increment off by len on each iteration.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'r8169-sync-few-functionalities-with-vendor-driver'

Heiner Kallweit says:

====================
r8169: sync few functionalities with vendor driver

Add few helpers (with names copied from vendor drivers) to make clearer
what the respective code is doing. In addition improve reset preparation
for chips from RTL8168g.
====================

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

r8169: improve reset handling for chips from RTL8168g

Sync the reset preparation for chips from RTL8168g with the r8168 and
r8125 vendor drivers.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

r8169: add helper rtl_wait_txrx_fifo_empty

Add a helper for waiting for FIFO's to be empty, again the name is
borrowed from the vendor driver.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

r8169: add helper rtl_enable_rxdvgate

Add a helper for setting RXDV_GATED_EN, the 2ms delay is copied from
the vendor driver.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

r8169: add helper r8168g_wait_ll_share_fifo_ready

Create a helper for this waiting function, name of the helper is
borrowed from the vendor driver. In addition don't return in the two
hw_init functions if the first wait runs into a timeout, there's no
benefit in doing so.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: Remove ipa_endpoint_stop{,_rx_dma} again

When building arm64 allyesconfig:

drivers/net/ipa/ipa_endpoint.c: In function 'ipa_endpoint_stop_rx_dma':
drivers/net/ipa/ipa_endpoint.c:1274:13: error: 'IPA_ENDPOINT_STOP_RX_SIZE' undeclared (first use in this function)
drivers/net/ipa/ipa_endpoint.c:1274:13: note: each undeclared identifier is reported only once for each function it appears in
drivers/net/ipa/ipa_endpoint.c:1289:2: error: implicit declaration of function 'ipa_cmd_dma_task_32b_addr_add' [-Werror=implicit-function-declaration]
drivers/net/ipa/ipa_endpoint.c:1291:45: error: 'ENDPOINT_STOP_DMA_TIMEOUT' undeclared (first use in this function)
drivers/net/ipa/ipa_endpoint.c: In function 'ipa_endpoint_stop':
drivers/net/ipa/ipa_endpoint.c:1309:16: error: 'IPA_ENDPOINT_STOP_RX_RETRIES' undeclared (first use in this function)

These functions were removed in a series, merged in as
commit 33395f4a5c1b ("Merge branch 'net-ipa-kill-endpoint-stop-workaround'").

Remove them again so that the build works properly.

Fixes: 3793faad7b5b ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: use DST_NOCOUNT in ip6_rt_pcpu_alloc()

We currently have to adjust ipv6 route gc_thresh/max_size depending
on number of cpus on a server, this makes very little sense.

If the kernels sets /proc/sys/net/ipv6/route/gc_thresh to 1024
and /proc/sys/net/ipv6/route/max_size to 4096, then we better
not track the percpu dst that our implementation uses.

Only routes not added (directly or indirectly) by the admin
should be tracked and limited.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: David Ahern <dsahern@kernel.org>
Cc: Maciej Żenczykowski <maze@google.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: sja1105: remove set but not used variable 'prev_time'

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/dsa/sja1105/sja1105_vl.c:468:6: warning: variable ‘prev_time’ set but not used [-Wunused-but-set-variable]
u32 prev_time = 0;
^~~~~~~~~

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Samuel Zou <zou_wei@huawei.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: vsc73xx: convert to devm_platform_ioremap_resource

Use the helper function that wraps the calls to platform_get_resource()
and devm_ioremap_resource() together.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>