git.openwrt.org Git - openwrt/staging/blogic.git/log

netfilter: flowtable: restrict flow dissector match on meta ingress device

Set on FLOW_DISSECTOR_KEY_META meta key using flow tuple ingress interface.

Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: flowtable: fetch stats only if flow is still alive

Do not fetch statistics if flow has expired since it might not in
hardware anymore. After this update, remove the FLOW_OFFLOAD_HW_DYING
check from nf_flow_offload_stats() since this flag is never set on.

Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: wenxu <wenxu@ucloud.cn>

netfilter: nft_bitwise: correct uapi header comment.

The comment documenting how bitwise expressions work includes a table
which summarizes the mask and xor arguments combined to express the
supported boolean operations.  However, the row for OR:

mask    xor
0       x

is incorrect.

  dreg = (sreg & 0) ^ x

is not equivalent to:

  dreg = sreg | x

What the code actually does is:

  dreg = (sreg & ~x) ^ x

Update the documentation to match.

Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

sfc: remove duplicated include from efx.c

Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: fix typos in qed documentation

Review of the recently added documentation file for the qed driver
noticed a couple of typos. Fix them now.

Noticed-by: Michal Kalderon <mkalderon@marvell.com>
Fixes: 0f261c3ca09e ("devlink: add a driver-specific file for the qed driver")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pptp: support sockets bound to an interface

use sk_bound_dev_if for route lookup as already done
in most of the other ip_route_output_ports() calls.

Since most PPPoA providers use 10.0.0.138 as default gateway IP
this will allow connections to multiple PPTP providers with the
same IP address over different interfaces.

Signed-off-by: Ulrich Weber <ulrich.weber@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'batadv-next-for-davem-20200114' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This feature/cleanup patchset includes the following patches:

- bump version strings, by Simon Wunderlich

- fix typo and kerneldocs, by Sven Eckelmann

- use WiFi txbitrate for B.A.T.M.A.N. V as fallback, by René Treffer

- silence some endian sparse warnings by adding annotations,
by Sven Eckelmann

- Update copyright years to 2020, by Sven Eckelmann

- Disable deprecated sysfs configuration by default, by Sven Eckelmann
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bridge-add-vlan-notifications-and-rtm-support'

Nikolay Aleksandrov says:

====================
net: bridge: add vlan notifications and rtm support

This patch-set is a prerequisite for adding per-vlan options support
because we need to be able to send vlan-only notifications and do larger
vlan netlink dumps. Per-vlan options are needed as we move the control
more to vlans and would like to add per-vlan state (needed for per-vlan
STP and EVPN), per-vlan multicast options and control, and I'm sure
there would be many more per-vlan options coming.
Now we create/delete/dump vlans with the device AF_SPEC attribute which is
fine since we support vlan ranges or use a compact bridge_vlan_info
structure, but that cannot really be extended to support per-vlan options
well. The biggest issue is dumping them - we tried using the af_spec with
a new vlan option attribute but that led to insufficient message size
quickly, also another minor problem with that is we have to dump all vlans
always when notifying which, with options present, can be huge if they have
different options set, so we decided to add new rtm message types
specifically for vlans and register handlers for them and a new bridge vlan
notification nl group for vlan-only notifications.
The new RTM NEW/DEL/GETVLAN types introduced match the current af spec
bridge functionality and in fact use the same helpers.
The new nl format is:
[BRIDGE_VLANDB_ENTRY]
    [BRIDGE_VLANDB_ENTRY_INFO] - bridge_vlan_info (either 1 vlan or
                                                   range start)
    [BRIDGE_VLANDB_ENTRY_RANGE] - range end

This allows to encapsulate a range in a single attribute and also to
create vlans and immediately set options on all of them with a single
attribute. The GETVLAN dump can span multiple messages and dump all the
necessary information. The vlan-only notifications are sent on
NEW/DELVLAN events or when vlan options change (currently only flags),
we try hard to compress the vlans into ranges in the notifications as
well. When the per-vlan options are added we'll add helpers to check for
option equality between neighbor vlans and will keep compressing them
when possible.

Note patch 02 is not really required, it's just a nice addition to have
human-readable error messages from the different vlan checks.

iproute2 changes and selftests will be sent with the next set which adds
the first per-vlan option - per-vlan state similar to the port state.

v2: changed patch 03 and patch 04 to use nlmsg_parse() in order to
    strictly validate the msg and make sure there are no remaining bytes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: vlan: notify on vlan add/delete/change flags

Now that we can notify, send a notification on add/del or change of flags.
Notifications are also compressed when possible to reduce their number
and relieve user-space of extra processing, due to that we have to
manually notify after each add/del in order to avoid double
notifications. We try hard to notify only about the vlans which actually
changed, thus a single command can result in multiple notifications
about disjoint ranges if there were vlans which didn't change inside.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: vlan: add rtnetlink group and notify support

Add a new rtnetlink group for bridge vlan notifications - RTNLGRP_BRVLAN
and add support for sending vlan notifications (both single and ranges).
No functional changes intended, the notification support will be used by
later patches.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: vlan: add rtm range support

Add a new vlandb nl attribute - BRIDGE_VLANDB_ENTRY_RANGE which causes
RTM_NEWVLAN/DELVAN to act on a range. Dumps now automatically compress
similar vlans into ranges. This will be also used when per-vlan options
are introduced and vlans' options match, they will be put into a single
range which is encapsulated in one netlink attribute. We need to run
similar checks as br_process_vlan_info() does because these ranges will
be used for options setting and they'll be able to skip
br_process_vlan_info().

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: vlan: add del rtm message support

Adding RTM_DELVLAN support similar to RTM_NEWVLAN is simple, just need to
map DELVLAN to DELLINK and register the handler.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: vlan: add new rtm message support

Add initial RTM_NEWVLAN support which can only create vlans, operating
similar to the current br_afspec(). We will use it later to also change
per-vlan options. Old-style (flag-based) vlan ranges are not allowed
when using RTM messages, we will introduce vlan ranges later via a new
nested attribute which would allow us to have all the information about a
range encapsulated into a single nl attribute.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: vlan: add rtm definitions and dump support

This patch adds vlan rtm definitions:
- NEWVLAN: to be used for creating vlans, setting options and
notifications
- DELVLAN: to be used for deleting vlans
- GETVLAN: used for dumping vlan information

Dumping vlans which can span multiple messages is added now with basic
information (vid and flags). We use nlmsg_parse() to validate the header
length in order to be able to extend the message with filtering
attributes later.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: netlink: add extack error messages when processing vlans

Add extack messages on vlan processing errors. We need to move the flags
missing check after the "last" check since we may have "last" set but
lack a range end flag in the next entry.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: vlan: add helpers to check for vlan id/range validity

Add helpers to check if a vlan id or range are valid. The range helper
must be called when range start or end are detected.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-Add-route-offload-indication'

Ido Schimmel says:

====================
net: Add route offload indication

This patch set adds offload indication to IPv4 and IPv6 routes. So far
offload indication was only available for the nexthop via
'RTNH_F_OFFLOAD', which is problematic as a nexthop is usually shared
between multiple routes.

Based on feedback from Roopa and David on the RFC [1], the indication is
split to 'offload' and 'trap'. This is done because not all the routes
present in hardware actually offload traffic from the kernel. For
example, host routes merely trap packets to the kernel. The two flags
are dumped to user space via the 'rtm_flags' field in the ancillary
header of the rtnetlink message.

In addition, the patch set uses the new flags in order to test the FIB
offload API by adding a dummy FIB offload implementation to netdevsim.
The new tests are added to a shared library and can be therefore shared
between different drivers.

Patches #1-#3 add offload indication to IPv4 routes.
Patches #4 adds offload indication to IPv6 routes.
Patches #5-#6 add support for the offload indication in mlxsw.
Patch #7 adds dummy FIB offload implementation in netdevsim.
Patches #8-#10 add selftests.

v2 (feedback from David Ahern):
* Patch #2: Name last argument of fib_dump_info()
* Patch #2: Move 'struct fib_rt_info' to include/net/ip_fib.h so that it
could later be passed to fib_alias_hw_flags_set()
* Patch #3: Make use of 'struct fib_rt_info' in fib_alias_hw_flags_set()
* Patch #6: Convert to new fib_alias_hw_flags_set() interface
* Patch #7: Convert to new fib_alias_hw_flags_set() interface

[1] https://patchwork.ozlabs.org/cover/1170530/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: Add test for FIB offload API

The test reuses the common FIB offload tests in order to make sure that
mlxsw correctly implements FIB offload.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: netdevsim: Add test for FIB offload API

Test various aspects of the FIB offload API on top of the netdevsim
implementation. Both good and bad flows are tested.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Add helpers and tests for FIB offload

Implement a set of common helpers and tests for FIB offload that can be
used by multiple drivers to check their FIB offload implementations.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdevsim: fib: Add dummy implementation for FIB offload

Implement dummy IPv4 and IPv6 FIB "offload" in the driver by storing
currently "programmed" routes in a hash table. Each route in the hash
table is marked with "trap" indication. The indication is cleared when
the route is replaced or when the netdevsim instance is deleted.

This will later allow us to test the route offload API on top of
netdevsim.

v2:
* Convert to new fib_alias_hw_flags_set() interface

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Set hardware flags for routes

Previous patches added support for two hardware flags for IPv4 and IPv6
routes: 'RTM_F_OFFLOAD' and 'RTM_F_TRAP'. Both indicate the presence of
the route in hardware. The first indicates that traffic is actually
offloaded from the kernel, whereas the second indicates that packets
hitting such routes are trapped to the kernel for processing (e.g., host
routes).

Use these two flags in mlxsw. The flags are modified in two places.
Firstly, whenever a route is updated in the device's table. This
includes the addition, deletion or update of a route. For example, when
a host route is promoted to perform NVE decapsulation, its action in the
device is updated, the 'RTM_F_OFFLOAD' flag set and the 'RTM_F_TRAP'
flag cleared.

Secondly, when a route is replaced and overwritten by another route, its
flags are cleared.

v2:
* Convert to new fib_alias_hw_flags_set() interface

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Separate nexthop offload indication from route

The driver currently uses the 'RTNH_F_OFFLOAD' flag for both routes and
nexthops, which is cumbersome and unnecessary now that we have separate
flag for the route itself.

Separate the offload indication for nexthops from routes and call it
whenever the offload state within the nexthop group changes.

Note that IPv6 (unlike IPv4) does not share the same nexthop group
between different routes, whereas mlxsw does. Therefore, whenever the
offload indication within an IPv6 nexthop group changes, all the linked
routes need to be updated.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Add "offload" and "trap" indications to routes

In a similar fashion to previous patch, add "offload" and "trap"
indication to IPv6 routes.

This is done by using two unused bits in 'struct fib6_info' to hold
these indications. Capable drivers are expected to set these when
processing the various in-kernel route notifications.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Add "offload" and "trap" indications to routes

When performing L3 offload, routes and nexthops are usually programmed
into two different tables in the underlying device. Therefore, the fact
that a nexthop resides in hardware does not necessarily mean that all
the associated routes also reside in hardware and vice-versa.

While the kernel can signal to user space the presence of a nexthop in
hardware (via 'RTNH_F_OFFLOAD'), it does not have a corresponding flag
for routes. In addition, the fact that a route resides in hardware does
not necessarily mean that the traffic is offloaded. For example,
unreachable routes (i.e., 'RTN_UNREACHABLE') are programmed to trap
packets to the CPU so that the kernel will be able to generate the
appropriate ICMP error packet.

This patch adds an "offload" and "trap" indications to IPv4 routes, so
that users will have better visibility into the offload process.

'struct fib_alias' is extended with two new fields that indicate if the
route resides in hardware or not and if it is offloading traffic from
the kernel or trapping packets to it. Note that the new fields are added
in the 6 bytes hole and therefore the struct still fits in a single
cache line [1].

Capable drivers are expected to invoke fib_alias_hw_flags_set() with the
route's key in order to set the flags.

The indications are dumped to user space via a new flags (i.e.,
'RTM_F_OFFLOAD' and 'RTM_F_TRAP') in the 'rtm_flags' field in the
ancillary header.

v2:
* Make use of 'struct fib_rt_info' in fib_alias_hw_flags_set()

[1]
struct fib_alias {
        struct hlist_node  fa_list;                      /*     0    16 */
        struct fib_info *          fa_info;              /*    16     8 */
        u8                         fa_tos;               /*    24     1 */
        u8                         fa_type;              /*    25     1 */
        u8                         fa_state;             /*    26     1 */
        u8                         fa_slen;              /*    27     1 */
        u32                        tb_id;                /*    28     4 */
        s16                        fa_default;           /*    32     2 */
        u8                         offload:1;            /*    34: 0  1 */
        u8                         trap:1;               /*    34: 1  1 */
        u8                         unused:6;             /*    34: 2  1 */

        /* XXX 5 bytes hole, try to pack */

        struct callback_head rcu __attribute__((__aligned__(8))); /*    40    16 */

        /* size: 56, cachelines: 1, members: 12 */
        /* sum members: 50, holes: 1, sum holes: 5 */
        /* sum bitfield members: 8 bits (1 bytes) */
        /* forced alignments: 1, forced holes: 1, sum forced holes: 5 */
        /* last cacheline: 56 bytes */
} __attribute__((__aligned__(8)));

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Encapsulate function arguments in a struct

fib_dump_info() is used to prepare RTM_{NEW,DEL}ROUTE netlink messages
using the passed arguments. Currently, the function takes 11 arguments,
6 of which are attributes of the route being dumped (e.g., prefix, TOS).

The next patch will need the function to also dump to user space an
indication if the route is present in hardware or not. Instead of
passing yet another argument, change the function to take a struct
containing the different route attributes.

v2:
* Name last argument of fib_dump_info()
* Move 'struct fib_rt_info' to include/net/ip_fib.h so that it could
later be passed to fib_alias_hw_flags_set()

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Replace route in list before notifying

Subsequent patches will add an offload / trap indication to routes which
will signal if the route is present in hardware or not.

After programming the route to the hardware, drivers will have to ask
the IPv4 code to set the flags by passing the route's key.

In the case of route replace, the new route is notified before it is
actually inserted into the FIB alias list. This can prevent simple
drivers (e.g., netdevsim) that program the route to the hardware in the
same context it is notified in from being able to set the flag.

Solve this by first inserting the new route to the list and rollback the
operation in case the route was vetoed.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: socionext: get rid of huge dma sync in netsec_alloc_rx_data

Socionext driver can run on dma coherent and non-coherent devices.
Get rid of huge dma_sync_single_for_device in netsec_alloc_rx_data since
now the driver can let page_pool API to managed needed DMA sync

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'QRTR-flow-control-improvements'

Bjorn Andersson says:

====================
QRTR flow control improvements

In order to prevent overconsumption of resources on the remote side QRTR
implements a flow control mechanism.

Move the handling of the incoming confirm_rx to the receiving process to
ensure incoming flow is controlled. Then implement outgoing flow
control, using the recommended algorithm of counting outstanding
non-confirmed messages and blocking when hitting a limit. The last three
patches refactors the node assignment and port lookup, in order to
remove the worker in the receive path.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: qrtr: Remove receive worker

Rather than enqueuing messages and scheduling a worker to deliver them
to the individual sockets we can now, thanks to the previous work, move
this directly into the endpoint callback.

This saves us a context switch per incoming message and removes the
possibility of an opportunistic suspend to happen between the message is
coming from the endpoint until it ends up in the socket's receive
buffer.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qrtr: Make qrtr_port_lookup() use RCU

The important part of qrtr_port_lookup() wrt synchronization is that the
function returns a reference counted struct qrtr_sock, or fail.

As such we need only to ensure that an decrement of the object's
refcount happens inbetween the finding of the object in the idr and
qrtr_port_lookup()'s own increment of the object.

By using RCU and putting a synchronization point after we remove the
mapping from the idr, but before it can be released we achieve this -
with the benefit of not having to hold the mutex in qrtr_port_lookup().

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qrtr: Migrate node lookup tree to spinlock

Move operations on the qrtr_nodes radix tree under a separate spinlock
and make the qrtr_nodes tree GFP_ATOMIC, to allow operation from atomic
context in a subsequent patch.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qrtr: Implement outgoing flow control

In order to prevent overconsumption of resources on the remote side QRTR
implements a flow control mechanism.

The mechanism works by the sender keeping track of the number of
outstanding unconfirmed messages that has been transmitted to a
particular node/port pair.

Upon count reaching a low watermark (L) the confirm_rx bit is set in the
outgoing message and when the count reaching a high watermark (H)
transmission will be blocked upon the reception of a resume_tx message
from the remote, that resets the counter to 0.

This guarantees that there will be at most 2H - L messages in flight.
Values chosen for L and H are 5 and 10 respectively.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qrtr: Move resume-tx transmission to recvmsg

The confirm-rx bit is used to implement a per port flow control, in
order to make sure that no messages are dropped due to resource
exhaustion. Move the resume-tx transmission to recvmsg to only confirm
messages as they are consumed by the application.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

pktgen: Allow configuration of IPv6 source address range

Pktgen can use only one IPv6 source address from output device or src6
command setting. In pressure test we need create lots of sessions more
than 65535. So add src6_min and src6_max command to set the range.

Signed-off-by: Niu Xilei <niu_xilei@163.com>
Changes since v3:
- function set_src_in6_addr use static instead of static inline
- precompute min_in6_l,min_in6_h,max_in6_h,max_in6_l in setup time
Changes since v2:
- reword subject line
Changes since v1:
- only create IPv6 source address over least significant 64 bit range

Signed-off-by: David S. Miller <davem@davemloft.net>

nfc: No need to set .owner platform_driver_register

the i2c_add_driver will set the .owner to THIS_MODULE

Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'skb_list_walk_safe-refactoring'

Jason A. Donenfeld says:

====================
skb_list_walk_safe refactoring for net/*'s skb_gso_segment usage

This patchset adjusts all return values of skb_gso_segment in net/* to
use the new skb_list_walk_safe helper.

First we fix a minor bug in the helper macro that didn't come up in the
last patchset's uses. Then we adjust several cases throughout net/. The
xfrm changes were a bit hairy, but doable. Reading and thinking about
the code in mac80211 indicates a memory leak, which the commit
addresses. All the other cases were pretty trivial.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: mac80211: use skb_list_walk_safe helper for gso segments

This is a conversion case for the new function, keeping the flow of the
existing code as intact as possible. We also switch over to using
skb_mark_not_on_list instead of a null write to skb->next.

Finally, this code appeared to have a memory leak in the case where
header building fails before the last gso segment. In that case, the
remaining segments are not freed. So this commit also adds the proper
kfree_skb_list call for the remainder of the skbs.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netfilter: use skb_list_walk_safe helper for gso segments

This is a straight-forward conversion case for the new function, keeping
the flow of the existing code as intact as possible.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv4: use skb_list_walk_safe helper for gso segments

This is a straight-forward conversion case for the new function, keeping
the flow of the existing code as intact as possible.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: use skb_list_walk_safe helper for gso segments

This is a straight-forward conversion case for the new function, keeping
the flow of the existing code as intact as possible.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: openvswitch: use skb_list_walk_safe helper for gso segments

This is a straight-forward conversion case for the new function, keeping
the flow of the existing code as intact as possible.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: xfrm: use skb_list_walk_safe helper for gso segments

This is converts xfrm segment iteration to use the new function, keeping
the flow of the existing code as intact as possible. One case is very
straight-forward, whereas the other case has some more subtle code that
likes to peak at ->next and relink skbs. By keeping the variables the
same as before, we can upgrade this code with minimal surgery required.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: udp: use skb_list_walk_safe helper for gso segments

This is a straight-forward conversion case for the new function,
iterating over the return value from udp_rcv_segment, which actually is
a wrapper around skb_gso_segment.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skbuff: disambiguate argument and member for skb_list_walk_safe helper

This worked before, because we made all callers name their next pointer
"next". But in trying to be more "drop-in" ready, the silliness here is
revealed. This commit fixes the problem by making the macro argument and
the member use different names.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'macsec-hw-offload'

Antoine Tenart says:

====================
net: macsec: initial support for hardware offloading

This series intends to add support for offloading MACsec transformations
to hardware enabled devices. The series adds the necessary
infrastructure for offloading MACsec configurations to hardware drivers,
in patches 1 to 5; then introduces MACsec offloading support in the
Microsemi MSCC PHY driver, in patches 6 to 10.

The series can also be found at:
https://github.com/atenart/linux/tree/net-next/macsec

IProute2 modifications can be found at:
https://github.com/atenart/iproute2/tree/macsec

MACsec hardware offloading infrastructure
-----------------------------------------

Linux has a software implementation of the MACsec standard. There are
hardware engines supporting MACsec operations, such as the Intel ixgbe
NIC and some Microsemi PHYs (the one we use in this series). This means
the MACsec offloading infrastructure should support networking PHY and
MAC drivers. Note that MAC driver preliminary support is part of this
series, but should not be merged before we actually have a provider for
this.

We do intend in this series to re-use the logic, netlink API and data
structures of the existing MACsec software implementation. This allows
not to duplicate definitions and structure storing the same information;
as well as using the same userspace tools to configure both software or
hardware offloaded MACsec flows (with `ip macsec`).

When adding a new MACsec virtual interface the existing logic is kept:
offloading is disabled by default. A user driven configuration choice is
needed to switch to offloading mode (a patch in iproute2 is needed for
this). A single MACsec interface can be offloaded for now, and some
limitations are there: no flow can be moved from one implementation to
the other so the decision needs to be done before configuring the
interface.

MACsec offloading ops are called in 2 steps: a preparation one, and a
commit one. The first step is allowed to fail and should be used to
check if a provided configuration is compatible with a given MACsec
capable hardware. The second step is not allowed to fail and should
only be used to enable a given MACsec configuration.

A limitation as of now is the counters and statistics are not reported
back from the hardware to the software MACsec implementation. This
isn't an issue when using offloaded MACsec transformations, but it
should be added in the future so that the MACsec state can be reported
to the user (which would also improve the debug).

Microsemi PHY MACsec support
----------------------------

In order to add support for the MACsec offloading feature in the
Microsemi MSCC PHY driver, the __phy_read_page and __phy_write_page
helpers had to be exported. This is because the initialization of the
PHY is done while holding the MDIO bus lock, and we need to change the
page to configure the MACsec block.

The support itself is then added in three patches. The first one adds
support for configuring the MACsec block within the PHY, so that it is
up, running and available for future configuration, but is not doing any
modification on the traffic passing through the PHY. The second patch
implements the phy_device MACsec ops in the Microsemi MSCC PHY driver,
and introduce helpers to configure MACsec transformations and flows to
match specific packets. The last one adds support for PN rollover.

Thanks!
Antoine

Since v5:
  - Fixed a compilation issue due to an inclusion from an UAPI header.
  - Added an EXPORT_SYMBOL_GPL for the PN rollover helper, to fix module
    compilation issues.
  - Added a dependency for the MSCC driver on MACSEC || MACSEC=n.
  - Removed the patches including the MAC offloading support as they are
    not to be applied for now.

Since v4:
  - Reworked the MACsec read and write functions in the MSCC PHY driver
    to remove the conditional locking.

Since v3:
  - Fixed a check when enabling offloading that was too restrictive.
  - Fixed the propagation of the changelink event to the underlying
    device drivers.

Since v2:
  - Allow selection the offloading from userspace, defaulting to the
    software implementation when adding a new MACsec interface. The
    offloading mode is now also reported through netlink.
  - Added support for letting MKA packets in and out when using MACsec
    (there are rules to let them bypass the MACsec h/w engine within the
    PHY).
  - Added support for PN rollover (following what's currently done in
    the software implementation: the flow is disabled).
  - Split patches to remove MAC offloading support for now, as there are
    no current provider for this (patches are still included).
  - Improved a few parts of the MACsec support within the MSCC PHY
    driver (e.g. default rules now block non-MACsec traffic, depending
    on the configuration).
  - Many cosmetic fixes & small improvements.

Since v1:
  - Reworked the MACsec offloading API, moving from a single helper
    called for all MACsec configuration operations, to a per-operation
    function that is provided by the underlying hardware drivers.
  - Those functions now contain a verb to describe the configuration
    action they're offloading.
  - Improved the error handling in the MACsec genl helpers to revert
    the configuration to its previous state when the offloading call
    failed.
  - Reworked the file inclusions.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mscc: PN rollover support

This patch adds support for handling MACsec PN rollover in the mscc PHY
driver. When a flow rolls over, an interrupt is fired. This patch adds
the logic to check all flows and identify the one rolling over in the
handle_interrupt PHY helper, then disables the flow and report the event
to the MACsec core.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: macsec: PN wrap callback

Allow to call macsec_pn_wrapped from hardware drivers to notify when a
PN rolls over. Some drivers might used an interrupt to implement this.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mscc: macsec support

This patch adds MACsec offloading support to some Microsemi PHYs, to
configure flows and transformations so that matched packets can be
processed by the MACsec engine, either at egress, or at ingress.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mscc: macsec initialization

This patch adds support for initializing the MACsec engine found within
some Microsemi PHYs. The engine is initialized in a passthrough mode and
does not modify any incoming or outgoing packet. But thanks to this it
now can be configured to perform MACsec transformations on packets,
which will be supported by a future patch.

The MACsec read and write functions are wrapped into two versions: one
called during the init phase, and the other one later on. This is
because the init functions in the Microsemi PHY driver are called while
the MDIO bus lock is taken.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: macsec: add nla support for changing the offloading selection

MACsec offloading to underlying hardware devices is disabled by default
(the software implementation is used). This patch adds support for
changing this setting through the MACsec netlink interface. Many checks
are done when enabling offloading on a given MACsec interface as there
are limitations (it must be supported by the hardware, only a single
interface can be offloaded on a given physical device at a time, rules
can't be moved for now).

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: macsec: hardware offloading infrastructure

This patch introduces the MACsec hardware offloading infrastructure.

The main idea here is to re-use the logic and data structures of the
software MACsec implementation. This allows not to duplicate definitions
and structure storing the same kind of information. It also allows to
use a unified genlink interface for both MACsec implementations (so that
the same userspace tool, `ip macsec`, is used with the same arguments).
The MACsec offloading support cannot be disabled if an interface
supports it at the moment.

The MACsec configuration is passed to device drivers supporting it
through macsec_ops which are called from the MACsec genl helpers. Those
functions call the macsec ops of PHY and Ethernet drivers in two steps:
a preparation one, and a commit one. The first step is allowed to fail
and should be used to check if a provided configuration is compatible
with the features provided by a MACsec engine, while the second step is
not allowed to fail and should only be used to enable a given MACsec
configuration. Two extra calls are made: when a virtual MACsec interface
is created and when it is deleted, so that the hardware driver can stay
in sync.

The Rx and TX handlers are modified to take in account the special case
were the MACsec transformation happens in the hardware, whether in a PHY
or in a MAC, as the packets seen by the networking stack on both the
physical and MACsec virtual interface are exactly the same. This leads
to some limitations: the hardware and software implementations can't be
used on the same physical interface, as the policies would be impossible
to fulfill (such as strict validation of the frames). Also only a single
virtual MACsec interface can be offloaded to a physical port supporting
hardware offloading as it would be impossible to guess onto which
interface a given packet should go (for ingress traffic).

Another limitation as of now is that the counters and statistics are not
reported back from the hardware to the software MACsec implementation.
This isn't an issue when using offloaded MACsec transformations, but it
should be added in the future so that the MACsec state can be reported
to the user (which would also improve the debug).

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: add MACsec ops in phy_device

This patch adds a reference to MACsec ops in the phy_device, to allow
PHYs to support offloading MACsec operations. The phydev lock will be
held while calling those helpers.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: macsec: introduce MACsec ops

This patch introduces MACsec ops for drivers to support offloading
MACsec operations.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: macsec: introduce the macsec_context structure

This patch introduces the macsec_context structure. It will be used
in the kernel to exchange information between the common MACsec
implementation (macsec.c) and the MACsec hardware offloading
implementations. This structure contains pointers to MACsec specific
structures which contain the actual MACsec configuration, and to the
underlying device (phydev for now).

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: macsec: move some definitions in a dedicated header

This patch moves some structure, type and identifier definitions into a
MACsec specific header. This patch does not modify how the MACsec code
is running and only move things around. This is a preparation for the
future MACsec hardware offloading support, which will re-use those
definitions outside macsec.c.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'netns-Optimise-netns-ID-lookups'

Guillaume Nault says:

====================
netns: Optimise netns ID lookups

Netns ID lookups can be easily protected by RCU, rather than by holding
a spinlock.

Patch 1 prepares the code, patch 2 does the RCU conversion, and finally
patch 3 stops disabling BHs on updates (patch 2 makes that unnecessary).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

netns: don't disable BHs when locking "nsid_lock"

When peernet2id() had to lock "nsid_lock" before iterating through the
nsid table, we had to disable BHs, because VXLAN can call peernet2id()
from the xmit path:
vxlan_xmit() -> vxlan_fdb_miss() -> vxlan_fdb_notify()
-> __vxlan_fdb_notify() -> vxlan_fdb_info() -> peernet2id().

Now that peernet2id() uses RCU protection, "nsid_lock" isn't used in BH
context anymore. Therefore, we can safely use plain
spin_lock()/spin_unlock() and let BHs run when holding "nsid_lock".

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns: protect netns ID lookups with RCU

__peernet2id() can be protected by RCU as it only calls idr_for_each(),
which is RCU-safe, and never modifies the nsid table.

rtnl_net_dumpid() can also do lockless lookups. It does two nested
idr_for_each() calls on nsid tables (one direct call and one indirect
call because of rtnl_net_dumpid_one() calling __peernet2id()). The
netnsid tables are never updated. Therefore it is safe to not take the
nsid_lock and run within an RCU-critical section instead.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns: Remove __peernet2id_alloc()

__peernet2id_alloc() was used for both plain lookups and for netns ID
allocations (depending the value of '*alloc'). Let's separate lookups
from allocations instead. That is, integrate the lookup code into
__peernet2id() and make peernet2id_alloc() responsible for allocating
new netns IDs when necessary.

This makes it clear that __peernet2id() doesn't modify the idr and
prepares the code for lockless lookups.

Also, mark the 'net' argument of __peernet2id() as 'const', since we're
modifying this line.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mdio_bus: Simplify reset handling and extend to non-DT systems

Convert mdiobus_register_reset() from open-coded DT-only optional reset
handling to reset_control_get_optional_exclusive(). This not only
simplifies the code, but also adds support for lookup-based resets on
non-DT systems.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Added IRQ print to phylink_bringup_phy()

The information about the PHY attached to the PHYLINK instance is useful
but is missing the IRQ prints that phy_attached_info() adds.
phy_attached_info() is a bit long and it would not be possible to use
phylink_info() anyway.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'stmmac-ETF-support'

Jose Abreu says:

====================
net: stmmac: ETF support

This series adds the support for ETF scheduler in stmmac.

1) Starts adding the support by implementing Enhanced Descriptors in stmmac
main core. This is needed for ETF feature in XGMAC and QoS cores.

2) Integrates the ETF logic into stmmac TC core.

3) and 4) adds the HW specific support for ETF in XGMAC and QoS cores. The
IP feature is called TBS (Time Based Scheduling).

5) Enables ETF in GMAC5 IPK PCI entry for all Queues except Queue 0.

6) Adds the new TBS feature and even more information into the debugFS
HW features file.
====================

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: selftests: Add a test for TBS feature

Add a new test for TBS feature which is used in ETF scheduler. In this
test, we send a packet with a launch time specified as now + 500ms and
check if the packet was transmitted on that time frame.

Changes from v2:
- Use the TBS bitfield
- Remove debug message

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: selftests: Switch to dev_direct_xmit()

In the upcoming commit for TBS selftest we will need to send a packet on
a specific Queue. As stmmac fallsback to netdev_pick_tx() on the select
Queue callback, we need to switch all selftests logic to
dev_direct_xmit() so that we can send the given SKB on a specific Queue.

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: Add missing information in DebugFS capabilities file

Adds more information regarding HW Capabilities in the corresponding
DebugFS file.

Changes from v2:
- Remove the TX/RX queues in use (Jakub)

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: pci: Enable TBS on GMAC5 IPK PCI entry

Enable TBS support on GMAC5 PCI entry for all Queues except Queue 0.

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: gmac4+: Add TBS support

Adds all the necessary HW hooks to support TBS feature in QoS cores.

Changes from v1:
- Remove unneeded LT shift as the IP already does this.

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: xgmac: Add TBS support

Adds all the necessary HW hooks to support TBS feature in XGMAC cores.

Changes from v1:
- Remove unneeded LT shift as the IP already does this.

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: tc: Add support for ETF Scheduler using TBS

Adds the support for ETF scheduler using TBS feature which is available
in XGMAC and QoS IPs.

Changes from v2:
- Fix checkpatch issues (Jakub)
- Use the TBS bitfield

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: Initial support for TBS

Adds the initial hooks for TBS support. This needs a 32 byte descriptor
in order for it to work with current HW. Adds all the logic for Enhanced
Descriptors in main core but no HW related logic for now.

Changes from v2:
- Use bitfield for TBS status / support (Jakub)
- Remove unneeded cache alignment (Jakub)
- Fix checkpatch issues

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

amd-xgbe: remove unnecessary conversion to bool

The conversion to bool is not needed, remove it.

Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ptr_ring: add include of linux/mm.h

Commit 0bf7800f1799 ("ptr_ring: try vmalloc() when kmalloc() fails")
started to use kvmalloc_array and kvfree, which are defined in mm.h,
the previous functions kcalloc and kfree, which are defined in slab.h.

Add the missing include of linux/mm.h. This went unnoticed as other
include files happened to include mm.h.

Fixes: 0bf7800f1799 ("ptr_ring: try vmalloc() when kmalloc() fails")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mvneta: change page pool nid to NUMA_NO_NODE

With 'commit 44768decb7c0 ("page_pool: handle page recycle for NUMA_NO_NODE
condition")' we can safely change nid to NUMA_NO_NODE and accommodate
future NUMA aware hardware using mvneta network interface

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc/ethtool_common: Make some function to static

Fix sparse warning:

drivers/net/ethernet/sfc/ethtool_common.c
  warning: symbol 'efx_fill_test' was not declared. Should it be static?
  warning: symbol 'efx_fill_loopback_test' was not declared.
           Should it be static?
  warning: symbol 'efx_describe_per_queue_stats' was not declared.
           Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Reviewed-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: amd: a2065: Use print_hex_dump_debug() helper

Use the print_hex_dump_debug() helper, instead of open-coding the same
operations.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: amd: a2065: Kill Sun LANCE relics

Remove unused fields, copied from the Sun LANCE driver eons ago.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'IXP4xx-networking-cleanups'

Linus Walleij says:

====================
IXP4xx networking cleanups

This is a patch series which jams together Arnds and mine
cleanups for the IXP4xx networking.

I also have patches for device tree support but that
requires more elaborate work, this series is some of
mine and some of Arnds patches that is a good foundation
for his multiplatform work and my device tree work.

These are for application to the networking tree so
that can be taken in one separate sweep.

I have tested the patches for a bit using zeroday builds
and some boots on misc IXP4xx devices and haven't run
into any major problems. We might find some new stuff
as a result from the new compiler coverage.

I had to depromote enabling compiler coverage at one
point in the v2 set because it depended on other patches
making the code more generic.

The change in v3 was simply dropping one offending
patch hardcoding base addresses into the driver.

The change in v4 drops a stable@ tag that was
unnecessary.

This v5 is a rebase of the v4 patch set on top of
net-next.
====================

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: ixp4xx: Use parent dev for DMA pool

Use the netdevice struct device .parent field when calling
dma_pool_create(): the .dma_coherent_mask and .dma_mask
pertains to the bus device on the hardware (platform)
bus in this case, not the struct device inside the network
device. This makes the pool allocation work.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ARM/net: ixp4xx: Pass ethernet physical base as resource

In order to probe this ethernet interface from the device tree
all physical MMIO regions must be passed as resources. Begin
this rewrite by first passing the port base address as a
resource for all platforms using this driver, remap it in
the driver and avoid using any reference of the statically
mapped virtual address in the driver.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ehernet: ixp4xx: Use netdev_* messages

Simplify and correct a bunch of messages using printk
directly to use the netdev_* macros. I have not changed
all of them, just the low-hanging fruit.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: ixp4xx: Use distinct local variable

Use "ndev" for the struct net_device and "dev" for the
struct device in probe() and remove(). Add the local
"dev" pointer for later use in refactoring.

Take this opportunity to fix inverse christmas tree
coding style.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: ixp4xx: Standard module init

The IXP4xx driver was initializing the MDIO bus before even
probing, in the callbacks supposed to be used for setting up
the module itself, and with the side effect of trying to
register the MDIO bus as soon as this module was loaded or
compiled into the kernel whether the device was discovered
or not.

This does not work with multiplatform environments.

To get rid of this: set up the MDIO bus from the probe()
callback and remove it in the remove() callback. Rename
the probe() and remove() calls to reflect the most common
conventions.

Since there is a bit of checking for the ethernet feature
to be present in the MDIO registering function, making the
whole module not even be registered if we can't find an
MDIO bus, we need something similar: register the MDIO
bus when the corresponding ethernet is probed, and
return -EPROBE_DEFER on the other interfaces until this
happens. If no MDIO bus is present on any of the
registered interfaces we will eventually bail out.

None of the platforms I've seen has e.g. MDIO on EthB
and only uses EthC, there is always a Ethernet hardware
on the NPE (B, C) that has the MDIO bus, we just might
have to wait for it.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ixp4xx_eth: move platform_data definition

The platform data is needed to compile the driver as standalone,
so move it to a global location along with similar files.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ptp: ixp46x: move adjacent to ethernet driver

The ixp46x ptp driver has a somewhat unusual setup, where the ptp
driver and the ethernet driver are in different directories but
access the same registers that are defined a platform specific
header file.

Moving everything into drivers/net/ makes it look more like most
other ptp drivers and allows compile-testing this driver on
other targets.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

wan: ixp4xx_hss: prepare compile testing

The ixp4xx_hss driver needs the platform data definition and the
system clock rate to be compiled. Move both into a new platform_data
header file.

This is a prerequisite for compile testing, but turning on compile
testing requires further patches to isolate the SoC headers.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

wan: ixp4xx_hss: fix compile-testing on 64-bit

Change the driver to use portable integer types to avoid
warnings during compile testing:

drivers/net/wan/ixp4xx_hss.c:863:21: error: cast to 'u32 *' (aka 'unsigned int *') from smaller integer type 'int' [-Werror,-Wint-to-pointer-cast]
        memcpy_swab32(mem, (u32 *)((int)skb->data & ~3), bytes / 4);
                           ^
drivers/net/wan/ixp4xx_hss.c:979:12: error: incompatible pointer types passing 'u32 *' (aka 'unsigned int *') to parameter of type 'dma_addr_t *' (aka 'unsigned long long *') [-Werror,-Wincompatible-pointer-types]
                                              &port->desc_tab_phys)))
                                              ^~~~~~~~~~~~~~~~~~~~
include/linux/dmapool.h:27:20: note: passing argument to parameter 'handle' here
                     dma_addr_t *handle);
                                 ^

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlx4: Bump up MAX_MSIX from 64 to 128

On modern hardware with a large number of cpus and using XDP,
the current MSIX limit is insufficient. Bump the limit in
order to allow more queues.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Reviewed-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'hns3-next'

Huazhong Tan says:

====================
net: hns3: add some misc update about reset issue

This series includes some misc update relating to reset issue.
[patch 1/7] & [patch 2/7] splits hclge_reset()/hclgevf_reset()
into two parts: preparing and rebuilding. Since the procedure
of FLR should be separated out from the reset task([patch 3/7 &
patch 3/7]), then the FLR's processing can reuse these codes.

pci_error_handlers.reset_prepare() is void type function, so
[patch 6/7] & [patch 7/7] factor some codes related to PF
function reset to make the preparing done before .reset_prepare()
return.

BTW, [patch 5/7] enlarges the waiting time of reset for matching
the hardware's.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: refactor the notification scheme of PF reset

hclge_reset_prepare_down() is only used to inform VF that PF is
going to do function reset, then using hclge_func_reset_sync_vf()
in hclge_reset_prepare_wait() to query whether VF is ready before
asserting PF function reset. To make the code more readable,
this patch uses a new function hclge_function_reset_notify_vf()
to do this job.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: modify hclge_func_reset_sync_vf()'s return type to void

When synchronizes with VFs fail before PF function reset,
PF driver should go on its function reset, otherwise it
can not run normally anymore. So, hclge_func_reset_sync_vf()
should not affect the processing of PF reset, this patch
modifies its return type to void.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: enlarge HCLGE_RESET_WAIT_CNT

When the load of firmware is high, its reset task may takes
more time(which will be as long as 35 seconds). So this
patch modifies HCLGE_RESET_WAIT_CNT to match the firmware's.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: refactor the procedure of VF FLR

Currently, the actual work of VF FLR is handled in the reset task,
which is asynchronous. So in some case, if the preparing and
rebuilding are not done, then the VF FLR will trigger some problems,
for example, makes hardware go into chaos.

So this patch separates the process of VF FLR from reset task, and
adds a semaphore to serialize this reset and others.

When FLR's preparing fails, if there has other higher level reset
pending or failing times less than the HCLGE_FLR_RETRY_CNT, this
preparing should be retried, otherwise it will get into a wrong state.

BTW, while the hardware reports misc interrupt during pcie_flr(),
the driver can not receive this interrupt anymore, so disable it
when hclgevf_flr_prepare() return, and re-enable it when enter
hclgevf_flr_done().

Avoid declaring internal function hclgevf_enable_vector(), this patch
also moves its definition forward, and removes unused enum
hnae3_flr_state.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: refactor the precedure of PF FLR

Currently, the actual work of PF FLR is handled in the reset task,
which is asynchronous. So in some case, if the preparing and
rebuilding are not done, then the PF FLR will trigger some problems,
for example, makes hardware go into chaos.

So this patch separates the process of PF FLR from reset task, and
adds a semaphore to serialize this reset and others.

When FLR's preparing fails, if there has other higher level reset
pending or failing times less than the HCLGE_FLR_RETRY_CNT, this
preparing should be retried, otherwise PF and its VF may get into
wrong state.

BTW, while the hardware reports misc interrupt during pcie_flr(),
the driver can not receive this interrupt anymore, so disable it
when hclge_flr_prepare() return, and re-enable it when enter
hclge_flr_done().

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: split hclgevf_reset() into preparing and rebuilding part

hclgevf_reset() is a little bloated, and the process of VF FLR will
be separated from the reset task later. So this patch splits
hclgevf_reset() into hclgevf_reset_prepare() and hclge_reset_rebuild(),
then FLR can also reuse these two functions. Also moves HNAE3_UP_CLIENT
into hclgevf_reset_stack().

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: split hclge_reset() into preparing and rebuilding part

hclge_reset() is a little bloated, and the process of PF FLR will
be separated from the reset task later. So this patch splits
hclge_reset() into hclge_reset_prepare() and hclge_reset_rebuild(),
then FLR can also reuse these two functions.

BTW, since hclge_clear_reset_cause() and hclge_reset_prepare_up()
will not affect the device, so in hclge_reset_rebuild(), these
functions are called without rtnl_lock.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: remove set but not used variable 'nic_data'

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/sfc/mcdi_functions.c: In function 'efx_mcdi_ev_init':
drivers/net/ethernet/sfc/mcdi_functions.c:79:28: warning:
variable 'nic_data' set but not used [-Wunused-but-set-variable]

commit 4438b587fe4b ("sfc: move MCDI event queue management code")
introduces this unused variable.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: remove duplicated include from ef10.c

Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt: Detach page from page pool before sending up the stack

When running in XDP mode, pages come from the page pool, and should
be freed back to the same pool or specifically detached. Currently,
when the driver re-initializes, the page pool destruction is delayed
forever since it thinks there are oustanding pages.

Fixes: 322b87ca55f2 ("bnxt_en: add page_pool support")
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'devlink-documentation-refactor'

Jacob Keller says:

====================
devlink documentation refactor

This series updates the devlink documentation, with a few primary goals

* move all of the devlink documentation into a dedicated subfolder
* convert that documentation to the reStructuredText format
* merge driver-specific documentations into a single file per driver
* add missing documentation, including per-driver and devlink generally

For each driver, I took the time to review the code and add further
documentation on the various features it currently supports. Additionally, I
added new documentation files for some of the features such as
devlink-dpipe, devlink-resource, and devlink-regions.

Note for the region snapshot triggering, I kept that as a separate patch as
that is based on work that has not yet been merged to net-next, and may
change.

I also improved the existing documentation for devlink-info and
devlink-param by adding a bit more of an introduction when converting it to
the rst format.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>