openwrt/staging/blogic.git
4 years agonet: phylink: extend clause 45 PHY validation workaround
Russell King [Fri, 13 Dec 2019 18:22:07 +0000 (18:22 +0000)]
net: phylink: extend clause 45 PHY validation workaround

Commit e45d1f5288b8 ("net: phylink: support Clause 45 PHYs on SFP+
modules") added a workaround to support clause 45 PHYs which
dynamically switch their interface mode on SFP+ modules.  This was
implemented by validating the PHYs supported/advertising using
PHY_INTERFACE_MODE_NA, rather than the specific interface mode that
we attached the PHY with.

However, we already have a situation where phylink is used to connect
a Marvell 88X3310 PHY which also behaves in exactly the same way, but
which seemingly doesn't need this.  The reason seems to be that the
mvpp2 driver sets a whole bunch of link modes for
PHY_INTERFACE_MODE_10GKR down to 10Mb/s, despite 10GBASE-R not actually
supporting anything but 10Gb/s speeds.

When testing with drivers that (correctly) take the mvneta approach,
where the validate() method only returns what can be supported /
advertised for the specified link mode, we find that Clause 45 PHYs do
not behave as we expect: their advertisement is restricted to what
the current link will support, rather than what the PHY supports
through its dynamic switching.

Extend this workaround to all such cases; if we have a Clause 45 PHY
attaching via any means, except in USXGMII, XAUI and RXAUI which are
all unable to support this dynamic switching or have other solutions
to it, then we need to validate using PHY_INTERFACE_MODE_NA.

This should allow mvpp2 to switch to a more conformant validate()
implementation.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phylink: improve clause 45 PHY ksettings_set implementation
Russell King [Fri, 13 Dec 2019 18:22:02 +0000 (18:22 +0000)]
net: phylink: improve clause 45 PHY ksettings_set implementation

While testing ethtool with the Methode DM7052 module, it was noticed
that attempting to set the advertising mask results in the mask being
truncated to the support offered by the currently chosen PHY interface
mode.

When a PHY dynamically changes the PHY interface mode, limiting the
advertising mask in this way is not correct - if the PHY happened to
negotiate 10GBASE-T, and selected 10GBASE-R as the host interface, we
don't want to restrict the advertisement to just 10GBASE-* modes.

Rework setting the advertisement to take account of this; do not pass
the requested advertisement through phylink_validate(), but rely on
the advertisement restriction (supported mask) set when the PHY was
initially setup.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'WireGuard-CI-and-housekeeping'
David S. Miller [Tue, 17 Dec 2019 03:22:22 +0000 (19:22 -0800)]
Merge branch 'WireGuard-CI-and-housekeeping'

Jason A. Donenfeld says:

====================
WireGuard CI and housekeeping

This is a collection of commits gathered during the last 1.5 weeks since
merging WireGuard. If you'd prefer, I can send tree pull requests
instead, but I figure it might be best for now to just send things as
full patch sets to netdev.

The first part of this adds in the CI test harness that we've been using
for quite some time with success. You can type `make` and get the
selftests running in a fresh VM immediately. This has been an
instrumental tool in developing WireGuard, and I think it'd benefit most
from being in-tree alongside the selftests that are already there. Once
this lands, I plan to get build.wireguard.com building wireguard-
linux.git and net-next.git on every single commit pushed, and do so on a
bunch of different architectures. As this migrates into Linus' tree
eventually and then into net.git, I'll get net.git building there too on
every commit. Future work with this involves generalizing it to include
more networking subsystem tests beyond just WireGuard, but one step at a
time. In the process of porting this to the tree, the builder uncovered
a mistake in the config menu file, which the second commit fixes.

The last three commits are small housekeeping things, fixing spelling
mistakes, replacing call_rcu with kfree_rcu, and removing an unused
include.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agowireguard: allowedips: use kfree_rcu() instead of call_rcu()
Wei Yongjun [Sun, 15 Dec 2019 21:08:04 +0000 (22:08 +0100)]
wireguard: allowedips: use kfree_rcu() instead of call_rcu()

The callback function of call_rcu() just calls a kfree(), so we
can use kfree_rcu() instead of call_rcu() + callback function.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agowireguard: main: remove unused include <linux/version.h>
YueHaibing [Sun, 15 Dec 2019 21:08:03 +0000 (22:08 +0100)]
wireguard: main: remove unused include <linux/version.h>

Remove <linux/version.h> from the includes for main.c, which is unused.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
[Jason: reworded commit message]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agowireguard: global: fix spelling mistakes in comments
Josh Soref [Sun, 15 Dec 2019 21:08:02 +0000 (22:08 +0100)]
wireguard: global: fix spelling mistakes in comments

This fixes two spelling errors in source code comments.

Signed-off-by: Josh Soref <jsoref@gmail.com>
[Jason: rewrote commit message]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agowireguard: Kconfig: select parent dependency for crypto
Jason A. Donenfeld [Sun, 15 Dec 2019 21:08:01 +0000 (22:08 +0100)]
wireguard: Kconfig: select parent dependency for crypto

This fixes the crypto selection submenu depenencies. Otherwise, we'd
wind up issuing warnings in which certain dependencies we also select
couldn't be satisfied. This condition was triggered by the addition of
the test suite autobuilder in the previous commit.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agowireguard: selftests: import harness makefile for test suite
Jason A. Donenfeld [Sun, 15 Dec 2019 21:08:00 +0000 (22:08 +0100)]
wireguard: selftests: import harness makefile for test suite

WireGuard has been using this on build.wireguard.com for the last
several years with considerable success. It allows for very quick and
iterative development cycles, and supports several platforms.

To run the test suite on your current platform in QEMU:

  $ make -C tools/testing/selftests/wireguard/qemu -j$(nproc)

To run it with KASAN and such turned on:

  $ DEBUG_KERNEL=yes make -C tools/testing/selftests/wireguard/qemu -j$(nproc)

To run it emulated for another platform in QEMU:

  $ ARCH=arm make -C tools/testing/selftests/wireguard/qemu -j$(nproc)

At the moment, we support aarch64_be, aarch64, arm, armeb, i686, m68k,
mips64, mips64el, mips, mipsel, powerpc64le, powerpc, and x86_64.

The system supports incremental rebuilding, so it should be very fast to
change a single file and then test it out and have immediate feedback.

This requires for the right toolchain and qemu to be installed prior.
I've had success with those from musl.cc.

This is tailored for WireGuard at the moment, though later projects
might generalize it for other network testing.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: caif: replace BUG_ON with recovery code
Aditya Pakki [Sun, 15 Dec 2019 17:51:30 +0000 (11:51 -0600)]
net: caif: replace BUG_ON with recovery code

In caif_xmit, there is a crash if the ptr dev is NULL. However, by
returning the error to the callers, the error can be handled. The
patch fixes this issue.

Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agofore200e: Fix incorrect checks of NULL pointer dereference
Aditya Pakki [Sun, 15 Dec 2019 16:14:51 +0000 (10:14 -0600)]
fore200e: Fix incorrect checks of NULL pointer dereference

In fore200e_send and fore200e_close, the pointers from the arguments
are dereferenced in the variable declaration block and then checked
for NULL. The patch fixes these issues by avoiding NULL pointer
dereferences.

Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'Simplify-IPv4-route-offload-API'
David S. Miller [Tue, 17 Dec 2019 00:14:43 +0000 (16:14 -0800)]
Merge branch 'Simplify-IPv4-route-offload-API'

Ido Schimmel says:

====================
Simplify IPv4 route offload API

Motivation
==========

The aim of this patch set is to simplify the IPv4 route offload API by
making the stack a bit smarter about the notifications it is generating.
This allows driver authors to focus on programming the underlying device
instead of having to duplicate the IPv4 route insertion logic in their
driver, which is error-prone.

This is the first patch set out of a series of four. Subsequent patch
sets will simplify the IPv6 API, add offload/trap indication to routes
and add tests for all the code paths (including error paths). Available
here [1].

Details
=======

Today, whenever an IPv4 route is added or deleted a notification is sent
in the FIB notification chain and it is up to offload drivers to decide
if the route should be programmed to the hardware or not. This is not an
easy task as in hardware routes are keyed by {prefix, prefix length,
table id}, whereas the kernel can store multiple such routes that only
differ in metric / TOS / nexthop info.

This series makes sure that only routes that are actually used in the
data path are notified to offload drivers. This greatly simplifies the
work these drivers need to do, as they are now only concerned with
programming the hardware and do not need to replicate the IPv4 route
insertion logic and store multiple identical routes.

The route that is notified is the first FIB alias in the FIB node with
the given {prefix, prefix length, table ID}. In case the route is
deleted and there is another route with the same key, a replace
notification is emitted. Otherwise, a delete notification is emitted.

The above means that in the case of multiple routes with the same key,
but different TOS, only the route with the highest TOS is notified.
While the kernel can route a packet based on its TOS, this is not
supported by any hardware devices I am familiar with. Moreover, this is
not supported by IPv6 nor by BIRD/FRR from what I could see. Offload
drivers should therefore use the presence of a non-zero TOS as an
indication to trap packets matching the route and let the kernel route
them instead. mlxsw has been doing it for the past two years.

Testing
=======

To ensure there is no degradation in route insertion rates, I averaged
the insertion rate of 512k routes (/24 and /32) over 50 runs. Did not
observe any degradation.

Functional tests are available here [1]. They rely on route trap
indication, which is only added in the last patch set.

In addition, I have been running syzkaller for the past week with all
four patch sets and debug options enabled. Did not observe any problems.

Patch set overview
==================

Patches #1-#8 gradually introduce the new FIB notifications
Patch #9 converts mlxsw to use the new notifications
Patch #10 converts the remaining listeners and removes the old
notifications

v2:
* Extend fib_find_alias() with another argument instead of introducing a
  new function (David Ahern)

RFC: https://patchwork.ozlabs.org/cover/1170530/

[1] https://github.com/idosch/linux/tree/fib-notifier
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoipv4: Remove old route notifications and convert listeners
Ido Schimmel [Sat, 14 Dec 2019 15:53:15 +0000 (17:53 +0200)]
ipv4: Remove old route notifications and convert listeners

Unlike mlxsw, the other listeners to the FIB notification chain do not
require any special modifications as they never considered multiple
identical routes.

This patch removes the old route notifications and converts all the
listeners to use the new replace / delete notifications.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomlxsw: spectrum_router: Start using new IPv4 route notifications
Ido Schimmel [Sat, 14 Dec 2019 15:53:14 +0000 (17:53 +0200)]
mlxsw: spectrum_router: Start using new IPv4 route notifications

With the new notifications mlxsw does not need to handle identical
routes itself, as this is taken care of by the core IPv4 code.

Instead, mlxsw only needs to take care of inserting and removing routes
from the device.

Convert mlxsw to use the new IPv4 route notifications and simplify the
code.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoipv4: Only Replay routes of interest to new listeners
Ido Schimmel [Sat, 14 Dec 2019 15:53:13 +0000 (17:53 +0200)]
ipv4: Only Replay routes of interest to new listeners

When a new listener is registered to the FIB notification chain it
receives a dump of all the available routes in the system. Instead, make
sure to only replay the IPv4 routes that are actually used in the data
path and are of any interest to the new listener.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoipv4: Handle route deletion notification during flush
Ido Schimmel [Sat, 14 Dec 2019 15:53:12 +0000 (17:53 +0200)]
ipv4: Handle route deletion notification during flush

In a similar fashion to previous patch, when a route is deleted as part
of table flushing, promote the next route in the list, if exists.
Otherwise, simply emit a delete notification.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoipv4: Handle route deletion notification
Ido Schimmel [Sat, 14 Dec 2019 15:53:11 +0000 (17:53 +0200)]
ipv4: Handle route deletion notification

When a route is deleted we potentially need to promote the next route in
the FIB alias list (e.g., with an higher metric). In case we find such a
route, a replace notification is emitted. Otherwise, a delete
notification for the deleted route.

v2:
* Convert to use fib_find_alias() instead of fib_find_first_alias()

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoipv4: Notify newly added route if should be offloaded
Ido Schimmel [Sat, 14 Dec 2019 15:53:10 +0000 (17:53 +0200)]
ipv4: Notify newly added route if should be offloaded

When a route is added, it should only be notified in case it is the
first route in the FIB alias list with the given {prefix, prefix length,
table ID}. Otherwise, it is not used in the data path and should not be
considered by switch drivers.

v2:
* Convert to use fib_find_alias() instead of fib_find_first_alias()

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoipv4: Notify route if replacing currently offloaded one
Ido Schimmel [Sat, 14 Dec 2019 15:53:09 +0000 (17:53 +0200)]
ipv4: Notify route if replacing currently offloaded one

When replacing a route, its replacement should only be notified in case
the replaced route is of any interest to listeners. In other words, if
the replaced route is currently used in the data path, which means it is
the first route in the FIB alias list with the given {prefix, prefix
length, table ID}.

v2:
* Convert to use fib_find_alias() instead of fib_find_first_alias()

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoipv4: Extend FIB alias find function
Ido Schimmel [Sat, 14 Dec 2019 15:53:08 +0000 (17:53 +0200)]
ipv4: Extend FIB alias find function

Extend the function with another argument, 'find_first'. When set, the
function returns the first FIB alias with the matching {prefix, prefix
length, table ID}. The TOS and priority parameters are ignored. Current
callers are converted to pass 'false' in order to maintain existing
behavior.

This will be used by subsequent patches in the series.

v2:
* New patch

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoipv4: Notify route after insertion to the routing table
Ido Schimmel [Sat, 14 Dec 2019 15:53:07 +0000 (17:53 +0200)]
ipv4: Notify route after insertion to the routing table

Currently, a new route is notified in the FIB notification chain before
it is inserted to the FIB alias list.

Subsequent patches will use the placement of the new route in the
ordered FIB alias list in order to determine if the route should be
notified or not.

As a preparatory step, change the order so that the route is first
inserted into the FIB alias list and only then notified.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: fib_notifier: Add temporary events to the FIB notification chain
Ido Schimmel [Sat, 14 Dec 2019 15:53:06 +0000 (17:53 +0200)]
net: fib_notifier: Add temporary events to the FIB notification chain

Subsequent patches are going to simplify the IPv4 route offload API,
which will only use two events - replace and delete.

Introduce a temporary version of these two events in order to make the
conversion easier to review.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'hns3-next'
David S. Miller [Tue, 17 Dec 2019 00:12:25 +0000 (16:12 -0800)]
Merge branch 'hns3-next'

Huazhong Tan says:

====================
net: hns3: some optimizaions related to work task

This series refactors the work task of the HNS3 ethernet driver.

[patch 1/5] uses delayed workqueue to replace the timer for
hclgevf_service task, make the code simpler.

[patch 2/5] & [patch 3/5] unifies current mailbox, reset and
service work into one.

[patch 4/5] allocates a private work queue with WQ_MEM_RECLAIM
for the HNS3 driver.

[patch 5/5] adds a new flag to indicate whether reset fails,
and prevent scheduling service task to handle periodic task
when this flag has been set.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: do not schedule the periodic task when reset fail
Guojia Liao [Sat, 14 Dec 2019 02:06:41 +0000 (10:06 +0800)]
net: hns3: do not schedule the periodic task when reset fail

service_task will be scheduled  per second to do some periodic
jobs. When reset fails, it means this device is not available
now, so the periodic jobs do not need to be handled.

This patch adds flag HCLGE_STATE_RST_FAIL/HCLGEVF_STATE_RST_FAIL
to indicate that reset fails, and checks this flag before
schedule periodic task.

Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: allocate WQ with WQ_MEM_RECLAIM flag
Yunsheng Lin [Sat, 14 Dec 2019 02:06:40 +0000 (10:06 +0800)]
net: hns3: allocate WQ with WQ_MEM_RECLAIM flag

The hns3 driver may be used in memory reclaim path when it
is the low level transport of a network file system, so it
needs to guarantee forward progress even under memory pressure.

This patch allocates a private WQ with WQ_MEM_RECLAIM set for
both hclge_main and hclgevf_main modules.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: remove unnecessary work in hclgevf_main
Yunsheng Lin [Sat, 14 Dec 2019 02:06:39 +0000 (10:06 +0800)]
net: hns3: remove unnecessary work in hclgevf_main

There are four work (mbx_service_task, service_task,
rst_service_task and keep_alive_task)in the hclgevf module,
mbx_service_task is for handling mailbox issue, service_task is
for periodic management issue and rst_service_task is for reset
related issue, keep_alive_task is used to keepalive between PF
and VF, which can be done in a single work.

This patch removes the mbx_service_task, rst_service_task and
keep_alive_task, and moves the related handling to the
service_task work in order to remove concurrency between the four
work and to improve efficiency.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: remove mailbox and reset work in hclge_main
Yunsheng Lin [Sat, 14 Dec 2019 02:06:38 +0000 (10:06 +0800)]
net: hns3: remove mailbox and reset work in hclge_main

There are three work (mbx_service_task, service_task,
rst_service_task) in the HNS3 driver, mbx_service_task is for
handling mailbox work, service_task is for periodic management
issue and rst_service_task is for reset related issue, which can
be handled in a single work.

This patch removes the mbx_service_task and rst_service_task
work, and moves the related handling to the service_task work
in order to remove concurrency between the three work and to
improve efficiency.

BTW, since stats_timer in struct hclge_hw_stats is not needed
anymore, so removes the definition of struct hclge_hw_stats,
and moves mac_stats into struct hclge_dev.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: hns3: schedule hclgevf_service by using delayed workqueue
Yunsheng Lin [Sat, 14 Dec 2019 02:06:37 +0000 (10:06 +0800)]
net: hns3: schedule hclgevf_service by using delayed workqueue

Currently, a timer is defined to schedule hclgevf_service per
second. To simplify the code, this patch uses the delayed work
instead of timer to schedule hclgevf_serive.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoipv6: Annotate ipv6_addr_is_* bitwise pointer casts
Sven Eckelmann [Fri, 13 Dec 2019 20:24:28 +0000 (21:24 +0100)]
ipv6: Annotate ipv6_addr_is_* bitwise pointer casts

The sparse commit 6002ded74587 ("add a flag to warn on casts to/from
bitwise pointers") introduced a check for non-direct casts from/to
restricted datatypes (when -Wbitwise-pointer is enabled).

This triggered a warning in the 64 bit optimized ipv6_addr_is_*() functions
because sparse doesn't know that the buffer already points to some data in
the correct bitwise integer format. But these were correct and can
therefore be marked with __force to signalize sparse an intended cast to a
specific bitwise type.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoipv6: Annotate bitwise IPv6 dsfield pointer cast
Sven Eckelmann [Fri, 13 Dec 2019 20:24:27 +0000 (21:24 +0100)]
ipv6: Annotate bitwise IPv6 dsfield pointer cast

The sparse commit 6002ded74587 ("add a flag to warn on casts to/from
bitwise pointers") introduced a check for non-direct casts from/to
restricted datatypes (when -Wbitwise-pointer is enabled).

This triggered a warning in ipv6_get_dsfield() because sparse doesn't know
that the buffer already points to some data in the correct bitwise integer
format. This was already fixed in ipv6_change_dsfield() by the __force
attribute and can be fixed here the same way.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agor8169: check that Realtek PHY driver module is loaded
Heiner Kallweit [Fri, 13 Dec 2019 15:53:37 +0000 (16:53 +0100)]
r8169: check that Realtek PHY driver module is loaded

Some users complained about problems with r8169 and it turned out that
the generic PHY driver was used instead instead of the dedicated one.
In all cases reason was that r8169.ko was in initramfs, but realtek.ko
not. Manually adding realtek.ko to initramfs fixed the issues.
Root cause seems to be that tools like dracut and genkernel don't
consider softdeps. Add a check for loaded Realtek PHY driver module
and provide the user with a hint if it's not loaded.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'dpaa2-ptp-support-external-trigger-event'
David S. Miller [Mon, 16 Dec 2019 23:56:41 +0000 (15:56 -0800)]
Merge branch 'dpaa2-ptp-support-external-trigger-event'

Yangbo Lu says:

====================
dpaa2-ptp: support external trigger event

This patch-set is to add external trigger event support for
dpaa2-ptp driver since MC firmware has supported external
trigger interrupt with a new v2 dprtc_set_irq_mask() API.
And extts_clean_up() function in ptp_qoriq driver needs to be
exported with minor fixes for dpaa2-ptp reusing.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agodpaa2-ptp: add external trigger event support
Yangbo Lu [Thu, 12 Dec 2019 10:08:06 +0000 (18:08 +0800)]
dpaa2-ptp: add external trigger event support

This patch is to add external trigger event support
for dpaa2-ptp driver with v2 dprtc_set_irq_mask() API.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoptp_qoriq: export extts_clean_up() function
Yangbo Lu [Thu, 12 Dec 2019 10:08:05 +0000 (18:08 +0800)]
ptp_qoriq: export extts_clean_up() function

Export extts_clean_up() function so that dpaa2-ptp
driver is able to reuse it.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoptp_qoriq: check valid status before reading extts fifo
Yangbo Lu [Thu, 12 Dec 2019 10:08:04 +0000 (18:08 +0800)]
ptp_qoriq: check valid status before reading extts fifo

For PTP timer supporting external trigger timestamp FIFO,
there is a valid bit in TMR_STAT register indicating the
timestamp is available. For PTP timer without FIFO, there
is not TMR_STAT register.
This patch is to check the valid bit for the FIFO before
reading timestamp, and to avoid operating TMR_STAT register
for PTP timer without the FIFO.

Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agotcp: Set rcv zerocopy hint correctly if skb last frag is < PAGE_SIZE
Arjun Roy [Sun, 15 Dec 2019 19:54:51 +0000 (11:54 -0800)]
tcp: Set rcv zerocopy hint correctly if skb last frag is < PAGE_SIZE

At present, if the last frag of paged data in a skb has < PAGE_SIZE
data, we compute the recv_skip_hint as being equal to the size of that
frag and the entire next skb.

Instead, just return the runt frag size as the hint.

recv_skip_hint is used by the application to skip over
bytes that can not be mmaped, so returning a too big
chunk is pessimistic and forces more bytes to be copied.

Signed-off-by: Arjun Roy <arjunroy@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agonet/smc: shorten lgr_cnt initialization
Ursula Braun [Thu, 12 Dec 2019 21:35:41 +0000 (22:35 +0100)]
net/smc: shorten lgr_cnt initialization

Save a line of code by making use of ATOMIC_INIT() for lgr_cnt.

Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agonet: phylink: propagate phy_attach_direct() return code
Russell King [Thu, 12 Dec 2019 17:16:12 +0000 (17:16 +0000)]
net: phylink: propagate phy_attach_direct() return code

of_phy_attach() hides the return value of phy_attach_direct(), forcing
us to return a "generic" ENODEV error code that is indistinguishable
from the lack-of-phy-property case.

Switch to using of_phy_find_device() to find the PHY device, and then
propagating any phy_attach_direct() error back to the caller.

Link: https://lore.kernel.org/lkml/20191210113829.GT25745@shell.armlinux.org.uk
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agonet: bridge: add STP xstats
Vivien Didelot [Thu, 12 Dec 2019 01:07:10 +0000 (20:07 -0500)]
net: bridge: add STP xstats

This adds rx_bpdu, tx_bpdu, rx_tcn, tx_tcn, transition_blk,
transition_fwd xstats counters to the bridge ports copied over via
netlink, providing useful information for STP.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agoselftests/net: make so_txtime more robust to timer variance
Willem de Bruijn [Thu, 12 Dec 2019 16:36:46 +0000 (11:36 -0500)]
selftests/net: make so_txtime more robust to timer variance

The SO_TXTIME test depends on accurate timers. In some virtualized
environments the test has been reported to be flaky. This is easily
reproduced by disabling kvm acceleration in Qemu.

Allow greater variance in a run and retry to further reduce flakiness.

Observed errors are one of two kinds: either the packet arrives too
early or late at recv(), or it was dropped in the qdisc itself and the
recv() call times out.

In the latter case, the qdisc queues a notification to the error
queue of the send socket. Also explicitly report this cause.

Link: https://lore.kernel.org/netdev/CA+FuTSdYOnJCsGuj43xwV1jxvYsaoa_LzHQF9qMyhrkLrivxKw@mail.gmail.com
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agonet: phy: dp83869: Remove unneeded semicolon
zhengbin [Sat, 14 Dec 2019 10:17:24 +0000 (18:17 +0800)]
net: phy: dp83869: Remove unneeded semicolon

Fixes coccicheck warning:

drivers/net/phy/dp83869.c:337:2-3: Unneeded semicolon

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agobonding: move 802.3ad port state flags to uapi
Andy Roulin [Wed, 11 Dec 2019 22:30:58 +0000 (14:30 -0800)]
bonding: move 802.3ad port state flags to uapi

The bond slave actor/partner operating state is exported as
bitfield to userspace, which lacks a way to interpret it, e.g.,
iproute2 only prints the state as a number:

ad_actor_oper_port_state 15

For userspace to interpret the bitfield, the bitfield definitions
should be part of the uapi. The bitfield itself is defined in the
802.3ad standard.

This commit moves the 802.3ad bitfield definitions to uapi.

Related iproute2 patches, soon to be posted upstream, use the new uapi
headers to pretty-print bond slave state, e.g., with ip -d link show

ad_actor_oper_port_state_str <active,short_timeout,aggregating,in_sync>

Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
4 years agoRevert "nfp: abm: fix memory leak in nfp_abm_u32_knode_replace"
Jakub Kicinski [Tue, 10 Dec 2019 18:20:32 +0000 (10:20 -0800)]
Revert "nfp: abm: fix memory leak in nfp_abm_u32_knode_replace"

This reverts commit 78beef629fd9 ("nfp: abm: fix memory leak in
nfp_abm_u32_knode_replace").

The quoted commit does not fix anything and resulted in a bogus
CVE-2019-19076.

If match is NULL then it is known there is no matching entry in
list, hence, calling nfp_abm_u32_knode_delete() is pointless.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
4 years agoMerge branch 'netdev-ndo_tx_timeout-cleanup'
David S. Miller [Fri, 13 Dec 2019 05:39:14 +0000 (21:39 -0800)]
Merge branch 'netdev-ndo_tx_timeout-cleanup'

Michael S. Tsirkin says:

====================
netdev: ndo_tx_timeout cleanup

Yet another forward declaration I missed. Hopfully the last one ...

A bunch of drivers want to know which tx queue triggered a timeout,
and virtio wants to do the same.
We actually have the info to hand, let's just pass it on to drivers.
Note: tested with an experimental virtio patch by Julio.
That patch itself isn't ready yet though, so not included.
Other drivers compiled only.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonetronome: use the new txqueue timeout argument
Michael S. Tsirkin [Tue, 10 Dec 2019 14:26:00 +0000 (09:26 -0500)]
netronome: use the new txqueue timeout argument

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agomlx4: use new txqueue timeout argument
Michael S. Tsirkin [Tue, 10 Dec 2019 14:23:55 +0000 (09:23 -0500)]
mlx4: use new txqueue timeout argument

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonetdev: pass the stuck queue to the timeout handler
Michael S. Tsirkin [Tue, 10 Dec 2019 14:23:51 +0000 (09:23 -0500)]
netdev: pass the stuck queue to the timeout handler

This allows incrementing the correct timeout statistic without any mess.
Down the road, devices can learn to reset just the specific queue.

The patch was generated with the following script:

use strict;
use warnings;

our $^I = '.bak';

my @work = (
["arch/m68k/emu/nfeth.c", "nfeth_tx_timeout"],
["arch/um/drivers/net_kern.c", "uml_net_tx_timeout"],
["arch/um/drivers/vector_kern.c", "vector_net_tx_timeout"],
["arch/xtensa/platforms/iss/network.c", "iss_net_tx_timeout"],
["drivers/char/pcmcia/synclink_cs.c", "hdlcdev_tx_timeout"],
["drivers/infiniband/ulp/ipoib/ipoib_main.c", "ipoib_timeout"],
["drivers/infiniband/ulp/ipoib/ipoib_main.c", "ipoib_timeout"],
["drivers/message/fusion/mptlan.c", "mpt_lan_tx_timeout"],
["drivers/misc/sgi-xp/xpnet.c", "xpnet_dev_tx_timeout"],
["drivers/net/appletalk/cops.c", "cops_timeout"],
["drivers/net/arcnet/arcdevice.h", "arcnet_timeout"],
["drivers/net/arcnet/arcnet.c", "arcnet_timeout"],
["drivers/net/arcnet/com20020.c", "arcnet_timeout"],
["drivers/net/ethernet/3com/3c509.c", "el3_tx_timeout"],
["drivers/net/ethernet/3com/3c515.c", "corkscrew_timeout"],
["drivers/net/ethernet/3com/3c574_cs.c", "el3_tx_timeout"],
["drivers/net/ethernet/3com/3c589_cs.c", "el3_tx_timeout"],
["drivers/net/ethernet/3com/3c59x.c", "vortex_tx_timeout"],
["drivers/net/ethernet/3com/3c59x.c", "vortex_tx_timeout"],
["drivers/net/ethernet/3com/typhoon.c", "typhoon_tx_timeout"],
["drivers/net/ethernet/8390/8390.h", "ei_tx_timeout"],
["drivers/net/ethernet/8390/8390.h", "eip_tx_timeout"],
["drivers/net/ethernet/8390/8390.c", "ei_tx_timeout"],
["drivers/net/ethernet/8390/8390p.c", "eip_tx_timeout"],
["drivers/net/ethernet/8390/ax88796.c", "ax_ei_tx_timeout"],
["drivers/net/ethernet/8390/axnet_cs.c", "axnet_tx_timeout"],
["drivers/net/ethernet/8390/etherh.c", "__ei_tx_timeout"],
["drivers/net/ethernet/8390/hydra.c", "__ei_tx_timeout"],
["drivers/net/ethernet/8390/mac8390.c", "__ei_tx_timeout"],
["drivers/net/ethernet/8390/mcf8390.c", "__ei_tx_timeout"],
["drivers/net/ethernet/8390/lib8390.c", "__ei_tx_timeout"],
["drivers/net/ethernet/8390/ne2k-pci.c", "ei_tx_timeout"],
["drivers/net/ethernet/8390/pcnet_cs.c", "ei_tx_timeout"],
["drivers/net/ethernet/8390/smc-ultra.c", "ei_tx_timeout"],
["drivers/net/ethernet/8390/wd.c", "ei_tx_timeout"],
["drivers/net/ethernet/8390/zorro8390.c", "__ei_tx_timeout"],
["drivers/net/ethernet/adaptec/starfire.c", "tx_timeout"],
["drivers/net/ethernet/agere/et131x.c", "et131x_tx_timeout"],
["drivers/net/ethernet/allwinner/sun4i-emac.c", "emac_timeout"],
["drivers/net/ethernet/alteon/acenic.c", "ace_watchdog"],
["drivers/net/ethernet/amazon/ena/ena_netdev.c", "ena_tx_timeout"],
["drivers/net/ethernet/amd/7990.h", "lance_tx_timeout"],
["drivers/net/ethernet/amd/7990.c", "lance_tx_timeout"],
["drivers/net/ethernet/amd/a2065.c", "lance_tx_timeout"],
["drivers/net/ethernet/amd/am79c961a.c", "am79c961_timeout"],
["drivers/net/ethernet/amd/amd8111e.c", "amd8111e_tx_timeout"],
["drivers/net/ethernet/amd/ariadne.c", "ariadne_tx_timeout"],
["drivers/net/ethernet/amd/atarilance.c", "lance_tx_timeout"],
["drivers/net/ethernet/amd/au1000_eth.c", "au1000_tx_timeout"],
["drivers/net/ethernet/amd/declance.c", "lance_tx_timeout"],
["drivers/net/ethernet/amd/lance.c", "lance_tx_timeout"],
["drivers/net/ethernet/amd/mvme147.c", "lance_tx_timeout"],
["drivers/net/ethernet/amd/ni65.c", "ni65_timeout"],
["drivers/net/ethernet/amd/nmclan_cs.c", "mace_tx_timeout"],
["drivers/net/ethernet/amd/pcnet32.c", "pcnet32_tx_timeout"],
["drivers/net/ethernet/amd/sunlance.c", "lance_tx_timeout"],
["drivers/net/ethernet/amd/xgbe/xgbe-drv.c", "xgbe_tx_timeout"],
["drivers/net/ethernet/apm/xgene-v2/main.c", "xge_timeout"],
["drivers/net/ethernet/apm/xgene/xgene_enet_main.c", "xgene_enet_timeout"],
["drivers/net/ethernet/apple/macmace.c", "mace_tx_timeout"],
["drivers/net/ethernet/atheros/ag71xx.c", "ag71xx_tx_timeout"],
["drivers/net/ethernet/atheros/alx/main.c", "alx_tx_timeout"],
["drivers/net/ethernet/atheros/atl1c/atl1c_main.c", "atl1c_tx_timeout"],
["drivers/net/ethernet/atheros/atl1e/atl1e_main.c", "atl1e_tx_timeout"],
["drivers/net/ethernet/atheros/atlx/atl.c", "atlx_tx_timeout"],
["drivers/net/ethernet/atheros/atlx/atl1.c", "atlx_tx_timeout"],
["drivers/net/ethernet/atheros/atlx/atl2.c", "atl2_tx_timeout"],
["drivers/net/ethernet/broadcom/b44.c", "b44_tx_timeout"],
["drivers/net/ethernet/broadcom/bcmsysport.c", "bcm_sysport_tx_timeout"],
["drivers/net/ethernet/broadcom/bnx2.c", "bnx2_tx_timeout"],
["drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h", "bnx2x_tx_timeout"],
["drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c", "bnx2x_tx_timeout"],
["drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c", "bnx2x_tx_timeout"],
["drivers/net/ethernet/broadcom/bnxt/bnxt.c", "bnxt_tx_timeout"],
["drivers/net/ethernet/broadcom/genet/bcmgenet.c", "bcmgenet_timeout"],
["drivers/net/ethernet/broadcom/sb1250-mac.c", "sbmac_tx_timeout"],
["drivers/net/ethernet/broadcom/tg3.c", "tg3_tx_timeout"],
["drivers/net/ethernet/calxeda/xgmac.c", "xgmac_tx_timeout"],
["drivers/net/ethernet/cavium/liquidio/lio_main.c", "liquidio_tx_timeout"],
["drivers/net/ethernet/cavium/liquidio/lio_vf_main.c", "liquidio_tx_timeout"],
["drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c", "lio_vf_rep_tx_timeout"],
["drivers/net/ethernet/cavium/thunder/nicvf_main.c", "nicvf_tx_timeout"],
["drivers/net/ethernet/cirrus/cs89x0.c", "net_timeout"],
["drivers/net/ethernet/cisco/enic/enic_main.c", "enic_tx_timeout"],
["drivers/net/ethernet/cisco/enic/enic_main.c", "enic_tx_timeout"],
["drivers/net/ethernet/cortina/gemini.c", "gmac_tx_timeout"],
["drivers/net/ethernet/davicom/dm9000.c", "dm9000_timeout"],
["drivers/net/ethernet/dec/tulip/de2104x.c", "de_tx_timeout"],
["drivers/net/ethernet/dec/tulip/tulip_core.c", "tulip_tx_timeout"],
["drivers/net/ethernet/dec/tulip/winbond-840.c", "tx_timeout"],
["drivers/net/ethernet/dlink/dl2k.c", "rio_tx_timeout"],
["drivers/net/ethernet/dlink/sundance.c", "tx_timeout"],
["drivers/net/ethernet/emulex/benet/be_main.c", "be_tx_timeout"],
["drivers/net/ethernet/ethoc.c", "ethoc_tx_timeout"],
["drivers/net/ethernet/faraday/ftgmac100.c", "ftgmac100_tx_timeout"],
["drivers/net/ethernet/fealnx.c", "fealnx_tx_timeout"],
["drivers/net/ethernet/freescale/dpaa/dpaa_eth.c", "dpaa_tx_timeout"],
["drivers/net/ethernet/freescale/fec_main.c", "fec_timeout"],
["drivers/net/ethernet/freescale/fec_mpc52xx.c", "mpc52xx_fec_tx_timeout"],
["drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c", "fs_timeout"],
["drivers/net/ethernet/freescale/gianfar.c", "gfar_timeout"],
["drivers/net/ethernet/freescale/ucc_geth.c", "ucc_geth_timeout"],
["drivers/net/ethernet/fujitsu/fmvj18x_cs.c", "fjn_tx_timeout"],
["drivers/net/ethernet/google/gve/gve_main.c", "gve_tx_timeout"],
["drivers/net/ethernet/hisilicon/hip04_eth.c", "hip04_timeout"],
["drivers/net/ethernet/hisilicon/hix5hd2_gmac.c", "hix5hd2_net_timeout"],
["drivers/net/ethernet/hisilicon/hns/hns_enet.c", "hns_nic_net_timeout"],
["drivers/net/ethernet/hisilicon/hns3/hns3_enet.c", "hns3_nic_net_timeout"],
["drivers/net/ethernet/huawei/hinic/hinic_main.c", "hinic_tx_timeout"],
["drivers/net/ethernet/i825xx/82596.c", "i596_tx_timeout"],
["drivers/net/ethernet/i825xx/ether1.c", "ether1_timeout"],
["drivers/net/ethernet/i825xx/lib82596.c", "i596_tx_timeout"],
["drivers/net/ethernet/i825xx/sun3_82586.c", "sun3_82586_timeout"],
["drivers/net/ethernet/ibm/ehea/ehea_main.c", "ehea_tx_watchdog"],
["drivers/net/ethernet/ibm/emac/core.c", "emac_tx_timeout"],
["drivers/net/ethernet/ibm/emac/core.c", "emac_tx_timeout"],
["drivers/net/ethernet/ibm/ibmvnic.c", "ibmvnic_tx_timeout"],
["drivers/net/ethernet/intel/e100.c", "e100_tx_timeout"],
["drivers/net/ethernet/intel/e1000/e1000_main.c", "e1000_tx_timeout"],
["drivers/net/ethernet/intel/e1000e/netdev.c", "e1000_tx_timeout"],
["drivers/net/ethernet/intel/fm10k/fm10k_netdev.c", "fm10k_tx_timeout"],
["drivers/net/ethernet/intel/i40e/i40e_main.c", "i40e_tx_timeout"],
["drivers/net/ethernet/intel/iavf/iavf_main.c", "iavf_tx_timeout"],
["drivers/net/ethernet/intel/ice/ice_main.c", "ice_tx_timeout"],
["drivers/net/ethernet/intel/ice/ice_main.c", "ice_tx_timeout"],
["drivers/net/ethernet/intel/igb/igb_main.c", "igb_tx_timeout"],
["drivers/net/ethernet/intel/igbvf/netdev.c", "igbvf_tx_timeout"],
["drivers/net/ethernet/intel/ixgb/ixgb_main.c", "ixgb_tx_timeout"],
["drivers/net/ethernet/intel/ixgbe/ixgbe_debugfs.c", "adapter->netdev->netdev_ops->ndo_tx_timeout(adapter->netdev);"],
["drivers/net/ethernet/intel/ixgbe/ixgbe_main.c", "ixgbe_tx_timeout"],
["drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c", "ixgbevf_tx_timeout"],
["drivers/net/ethernet/jme.c", "jme_tx_timeout"],
["drivers/net/ethernet/korina.c", "korina_tx_timeout"],
["drivers/net/ethernet/lantiq_etop.c", "ltq_etop_tx_timeout"],
["drivers/net/ethernet/marvell/mv643xx_eth.c", "mv643xx_eth_tx_timeout"],
["drivers/net/ethernet/marvell/pxa168_eth.c", "pxa168_eth_tx_timeout"],
["drivers/net/ethernet/marvell/skge.c", "skge_tx_timeout"],
["drivers/net/ethernet/marvell/sky2.c", "sky2_tx_timeout"],
["drivers/net/ethernet/marvell/sky2.c", "sky2_tx_timeout"],
["drivers/net/ethernet/mediatek/mtk_eth_soc.c", "mtk_tx_timeout"],
["drivers/net/ethernet/mellanox/mlx4/en_netdev.c", "mlx4_en_tx_timeout"],
["drivers/net/ethernet/mellanox/mlx4/en_netdev.c", "mlx4_en_tx_timeout"],
["drivers/net/ethernet/mellanox/mlx5/core/en_main.c", "mlx5e_tx_timeout"],
["drivers/net/ethernet/micrel/ks8842.c", "ks8842_tx_timeout"],
["drivers/net/ethernet/micrel/ksz884x.c", "netdev_tx_timeout"],
["drivers/net/ethernet/microchip/enc28j60.c", "enc28j60_tx_timeout"],
["drivers/net/ethernet/microchip/encx24j600.c", "encx24j600_tx_timeout"],
["drivers/net/ethernet/natsemi/sonic.h", "sonic_tx_timeout"],
["drivers/net/ethernet/natsemi/sonic.c", "sonic_tx_timeout"],
["drivers/net/ethernet/natsemi/jazzsonic.c", "sonic_tx_timeout"],
["drivers/net/ethernet/natsemi/macsonic.c", "sonic_tx_timeout"],
["drivers/net/ethernet/natsemi/natsemi.c", "ns_tx_timeout"],
["drivers/net/ethernet/natsemi/ns83820.c", "ns83820_tx_timeout"],
["drivers/net/ethernet/natsemi/xtsonic.c", "sonic_tx_timeout"],
["drivers/net/ethernet/neterion/s2io.h", "s2io_tx_watchdog"],
["drivers/net/ethernet/neterion/s2io.c", "s2io_tx_watchdog"],
["drivers/net/ethernet/neterion/vxge/vxge-main.c", "vxge_tx_watchdog"],
["drivers/net/ethernet/netronome/nfp/nfp_net_common.c", "nfp_net_tx_timeout"],
["drivers/net/ethernet/nvidia/forcedeth.c", "nv_tx_timeout"],
["drivers/net/ethernet/nvidia/forcedeth.c", "nv_tx_timeout"],
["drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c", "pch_gbe_tx_timeout"],
["drivers/net/ethernet/packetengines/hamachi.c", "hamachi_tx_timeout"],
["drivers/net/ethernet/packetengines/yellowfin.c", "yellowfin_tx_timeout"],
["drivers/net/ethernet/pensando/ionic/ionic_lif.c", "ionic_tx_timeout"],
["drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c", "netxen_tx_timeout"],
["drivers/net/ethernet/qlogic/qla3xxx.c", "ql3xxx_tx_timeout"],
["drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c", "qlcnic_tx_timeout"],
["drivers/net/ethernet/qualcomm/emac/emac.c", "emac_tx_timeout"],
["drivers/net/ethernet/qualcomm/qca_spi.c", "qcaspi_netdev_tx_timeout"],
["drivers/net/ethernet/qualcomm/qca_uart.c", "qcauart_netdev_tx_timeout"],
["drivers/net/ethernet/rdc/r6040.c", "r6040_tx_timeout"],
["drivers/net/ethernet/realtek/8139cp.c", "cp_tx_timeout"],
["drivers/net/ethernet/realtek/8139too.c", "rtl8139_tx_timeout"],
["drivers/net/ethernet/realtek/atp.c", "tx_timeout"],
["drivers/net/ethernet/realtek/r8169_main.c", "rtl8169_tx_timeout"],
["drivers/net/ethernet/renesas/ravb_main.c", "ravb_tx_timeout"],
["drivers/net/ethernet/renesas/sh_eth.c", "sh_eth_tx_timeout"],
["drivers/net/ethernet/renesas/sh_eth.c", "sh_eth_tx_timeout"],
["drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c", "sxgbe_tx_timeout"],
["drivers/net/ethernet/seeq/ether3.c", "ether3_timeout"],
["drivers/net/ethernet/seeq/sgiseeq.c", "timeout"],
["drivers/net/ethernet/sfc/efx.c", "efx_watchdog"],
["drivers/net/ethernet/sfc/falcon/efx.c", "ef4_watchdog"],
["drivers/net/ethernet/sgi/ioc3-eth.c", "ioc3_timeout"],
["drivers/net/ethernet/sgi/meth.c", "meth_tx_timeout"],
["drivers/net/ethernet/silan/sc92031.c", "sc92031_tx_timeout"],
["drivers/net/ethernet/sis/sis190.c", "sis190_tx_timeout"],
["drivers/net/ethernet/sis/sis900.c", "sis900_tx_timeout"],
["drivers/net/ethernet/smsc/epic100.c", "epic_tx_timeout"],
["drivers/net/ethernet/smsc/smc911x.c", "smc911x_timeout"],
["drivers/net/ethernet/smsc/smc9194.c", "smc_timeout"],
["drivers/net/ethernet/smsc/smc91c92_cs.c", "smc_tx_timeout"],
["drivers/net/ethernet/smsc/smc91x.c", "smc_timeout"],
["drivers/net/ethernet/stmicro/stmmac/stmmac_main.c", "stmmac_tx_timeout"],
["drivers/net/ethernet/sun/cassini.c", "cas_tx_timeout"],
["drivers/net/ethernet/sun/ldmvsw.c", "sunvnet_tx_timeout_common"],
["drivers/net/ethernet/sun/niu.c", "niu_tx_timeout"],
["drivers/net/ethernet/sun/sunbmac.c", "bigmac_tx_timeout"],
["drivers/net/ethernet/sun/sungem.c", "gem_tx_timeout"],
["drivers/net/ethernet/sun/sunhme.c", "happy_meal_tx_timeout"],
["drivers/net/ethernet/sun/sunqe.c", "qe_tx_timeout"],
["drivers/net/ethernet/sun/sunvnet.c", "sunvnet_tx_timeout_common"],
["drivers/net/ethernet/sun/sunvnet_common.c", "sunvnet_tx_timeout_common"],
["drivers/net/ethernet/sun/sunvnet_common.h", "sunvnet_tx_timeout_common"],
["drivers/net/ethernet/synopsys/dwc-xlgmac-net.c", "xlgmac_tx_timeout"],
["drivers/net/ethernet/ti/cpmac.c", "cpmac_tx_timeout"],
["drivers/net/ethernet/ti/cpsw.c", "cpsw_ndo_tx_timeout"],
["drivers/net/ethernet/ti/cpsw_priv.c", "cpsw_ndo_tx_timeout"],
["drivers/net/ethernet/ti/cpsw_priv.h", "cpsw_ndo_tx_timeout"],
["drivers/net/ethernet/ti/davinci_emac.c", "emac_dev_tx_timeout"],
["drivers/net/ethernet/ti/netcp_core.c", "netcp_ndo_tx_timeout"],
["drivers/net/ethernet/ti/tlan.c", "tlan_tx_timeout"],
["drivers/net/ethernet/toshiba/ps3_gelic_net.h", "gelic_net_tx_timeout"],
["drivers/net/ethernet/toshiba/ps3_gelic_net.c", "gelic_net_tx_timeout"],
["drivers/net/ethernet/toshiba/ps3_gelic_wireless.c", "gelic_net_tx_timeout"],
["drivers/net/ethernet/toshiba/spider_net.c", "spider_net_tx_timeout"],
["drivers/net/ethernet/toshiba/tc35815.c", "tc35815_tx_timeout"],
["drivers/net/ethernet/via/via-rhine.c", "rhine_tx_timeout"],
["drivers/net/ethernet/wiznet/w5100.c", "w5100_tx_timeout"],
["drivers/net/ethernet/wiznet/w5300.c", "w5300_tx_timeout"],
["drivers/net/ethernet/xilinx/xilinx_emaclite.c", "xemaclite_tx_timeout"],
["drivers/net/ethernet/xircom/xirc2ps_cs.c", "xirc_tx_timeout"],
["drivers/net/fjes/fjes_main.c", "fjes_tx_retry"],
["drivers/net/slip/slip.c", "sl_tx_timeout"],
["include/linux/usb/usbnet.h", "usbnet_tx_timeout"],
["drivers/net/usb/aqc111.c", "usbnet_tx_timeout"],
["drivers/net/usb/asix_devices.c", "usbnet_tx_timeout"],
["drivers/net/usb/asix_devices.c", "usbnet_tx_timeout"],
["drivers/net/usb/asix_devices.c", "usbnet_tx_timeout"],
["drivers/net/usb/ax88172a.c", "usbnet_tx_timeout"],
["drivers/net/usb/ax88179_178a.c", "usbnet_tx_timeout"],
["drivers/net/usb/catc.c", "catc_tx_timeout"],
["drivers/net/usb/cdc_mbim.c", "usbnet_tx_timeout"],
["drivers/net/usb/cdc_ncm.c", "usbnet_tx_timeout"],
["drivers/net/usb/dm9601.c", "usbnet_tx_timeout"],
["drivers/net/usb/hso.c", "hso_net_tx_timeout"],
["drivers/net/usb/int51x1.c", "usbnet_tx_timeout"],
["drivers/net/usb/ipheth.c", "ipheth_tx_timeout"],
["drivers/net/usb/kaweth.c", "kaweth_tx_timeout"],
["drivers/net/usb/lan78xx.c", "lan78xx_tx_timeout"],
["drivers/net/usb/mcs7830.c", "usbnet_tx_timeout"],
["drivers/net/usb/pegasus.c", "pegasus_tx_timeout"],
["drivers/net/usb/qmi_wwan.c", "usbnet_tx_timeout"],
["drivers/net/usb/r8152.c", "rtl8152_tx_timeout"],
["drivers/net/usb/rndis_host.c", "usbnet_tx_timeout"],
["drivers/net/usb/rtl8150.c", "rtl8150_tx_timeout"],
["drivers/net/usb/sierra_net.c", "usbnet_tx_timeout"],
["drivers/net/usb/smsc75xx.c", "usbnet_tx_timeout"],
["drivers/net/usb/smsc95xx.c", "usbnet_tx_timeout"],
["drivers/net/usb/sr9700.c", "usbnet_tx_timeout"],
["drivers/net/usb/sr9800.c", "usbnet_tx_timeout"],
["drivers/net/usb/usbnet.c", "usbnet_tx_timeout"],
["drivers/net/vmxnet3/vmxnet3_drv.c", "vmxnet3_tx_timeout"],
["drivers/net/wan/cosa.c", "cosa_net_timeout"],
["drivers/net/wan/farsync.c", "fst_tx_timeout"],
["drivers/net/wan/fsl_ucc_hdlc.c", "uhdlc_tx_timeout"],
["drivers/net/wan/lmc/lmc_main.c", "lmc_driver_timeout"],
["drivers/net/wan/x25_asy.c", "x25_asy_timeout"],
["drivers/net/wimax/i2400m/netdev.c", "i2400m_tx_timeout"],
["drivers/net/wireless/intel/ipw2x00/ipw2100.c", "ipw2100_tx_timeout"],
["drivers/net/wireless/intersil/hostap/hostap_main.c", "prism2_tx_timeout"],
["drivers/net/wireless/intersil/hostap/hostap_main.c", "prism2_tx_timeout"],
["drivers/net/wireless/intersil/hostap/hostap_main.c", "prism2_tx_timeout"],
["drivers/net/wireless/intersil/orinoco/main.c", "orinoco_tx_timeout"],
["drivers/net/wireless/intersil/orinoco/orinoco_usb.c", "orinoco_tx_timeout"],
["drivers/net/wireless/intersil/orinoco/orinoco.h", "orinoco_tx_timeout"],
["drivers/net/wireless/intersil/prism54/islpci_dev.c", "islpci_eth_tx_timeout"],
["drivers/net/wireless/intersil/prism54/islpci_eth.c", "islpci_eth_tx_timeout"],
["drivers/net/wireless/intersil/prism54/islpci_eth.h", "islpci_eth_tx_timeout"],
["drivers/net/wireless/marvell/mwifiex/main.c", "mwifiex_tx_timeout"],
["drivers/net/wireless/quantenna/qtnfmac/core.c", "qtnf_netdev_tx_timeout"],
["drivers/net/wireless/quantenna/qtnfmac/core.h", "qtnf_netdev_tx_timeout"],
["drivers/net/wireless/rndis_wlan.c", "usbnet_tx_timeout"],
["drivers/net/wireless/wl3501_cs.c", "wl3501_tx_timeout"],
["drivers/net/wireless/zydas/zd1201.c", "zd1201_tx_timeout"],
["drivers/s390/net/qeth_core.h", "qeth_tx_timeout"],
["drivers/s390/net/qeth_core_main.c", "qeth_tx_timeout"],
["drivers/s390/net/qeth_l2_main.c", "qeth_tx_timeout"],
["drivers/s390/net/qeth_l2_main.c", "qeth_tx_timeout"],
["drivers/s390/net/qeth_l3_main.c", "qeth_tx_timeout"],
["drivers/s390/net/qeth_l3_main.c", "qeth_tx_timeout"],
["drivers/staging/ks7010/ks_wlan_net.c", "ks_wlan_tx_timeout"],
["drivers/staging/qlge/qlge_main.c", "qlge_tx_timeout"],
["drivers/staging/rtl8192e/rtl8192e/rtl_core.c", "_rtl92e_tx_timeout"],
["drivers/staging/rtl8192u/r8192U_core.c", "tx_timeout"],
["drivers/staging/unisys/visornic/visornic_main.c", "visornic_xmit_timeout"],
["drivers/staging/wlan-ng/p80211netdev.c", "p80211knetdev_tx_timeout"],
["drivers/tty/n_gsm.c", "gsm_mux_net_tx_timeout"],
["drivers/tty/synclink.c", "hdlcdev_tx_timeout"],
["drivers/tty/synclink_gt.c", "hdlcdev_tx_timeout"],
["drivers/tty/synclinkmp.c", "hdlcdev_tx_timeout"],
["net/atm/lec.c", "lec_tx_timeout"],
["net/bluetooth/bnep/netdev.c", "bnep_net_timeout"]
);

for my $p (@work) {
my @pair = @$p;
my $file = $pair[0];
my $func = $pair[1];
print STDERR $file , ": ", $func,"\n";
our @ARGV = ($file);
while (<ARGV>) {
if (m/($func\s*\(struct\s+net_device\s+\*[A-Za-z_]?[A-Za-z-0-9_]*)(\))/) {
print STDERR "found $1+$2 in $file\n";
}
if (s/($func\s*\(struct\s+net_device\s+\*[A-Za-z_]?[A-Za-z-0-9_]*)(\))/$1, unsigned int txqueue$2/) {
print STDERR "$func found in $file\n";
}
print;
}
}

where the list of files and functions is simply from:

git grep ndo_tx_timeout, with manual addition of headers
in the rare cases where the function is from a header,
then manually changing the few places which actually
call ndo_tx_timeout.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Martin Habets <mhabets@solarflare.com>
changes from v9:
fixup a forward declaration
changes from v9:
more leftovers from v3 change
changes from v8:
        fix up a missing direct call to timeout
        rebased on net-next
changes from v7:
fixup leftovers from v3 change
changes from v6:
fix typo in rtl driver
changes from v5:
add missing files (allow any net device argument name)
changes from v4:
add a missing driver header
changes from v3:
        change queue # to unsigned
Changes from v2:
        added headers
Changes from v1:
        Fix errors found by kbuild:
        generalize the pattern a bit, to pick up
        a couple of instances missed by the previous
        version.

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'Introduce-XDP-to-ena'
David S. Miller [Fri, 13 Dec 2019 01:14:01 +0000 (17:14 -0800)]
Merge branch 'Introduce-XDP-to-ena'

Sameeh Jubran says:

====================
Introduce XDP to ena

This patchset includes 3 patches:
* XDP_DROP implementation
* XDP_TX implementation
* A fix for an issue which might occur due to the XDP_TX patch. I see fit
  to place it as a standalone patch for clarity.

Difference from v2:
* Fixed the usage of rx headroom (XDP_PACKET_HEADROOM)
* Aligned the page_offset of the packet when passing it to the stack
* Switched to using xdp_frame in xdp xmit queue
* Dropped the print for unsupported commands
* Cosmetic changes

Difference from RFC v1 (XDP_DROP patch):
* Initialized xdp.rxq pointer
* Updated max_mtu on attachment of xdp and removed the check from
  ena_change_mtu()
* Moved the xdp execution from ena_rx_skb() to ena_clean_rx_irq()
* Moved xdp buff (struct xdp_buff) from rx_ring to the local stack
* Started using netlink's extack mechanism to deliver error messages to
  the user
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: Add first_interrupt field to napi struct
Sameeh Jubran [Tue, 10 Dec 2019 13:12:14 +0000 (15:12 +0200)]
net: ena: Add first_interrupt field to napi struct

The first_interrupt field is accessed in ena_intr_msix_io() upon
receiving an interrupt.The rx_ring and tx_ring fields of napi can
be NULL when receiving interrupt for xdp queues. This patch fixes
the issue by moving the field to the ena_napi struct.

Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: Implement XDP_TX action
Sameeh Jubran [Tue, 10 Dec 2019 13:12:13 +0000 (15:12 +0200)]
net: ena: Implement XDP_TX action

This commit implements the XDP_TX action in the ena driver. We allocate
separate tx queues for the XDP_TX. We currently allow xdp only when
there is enough queues to allocate for xdp.

Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: implement XDP drop support
Sameeh Jubran [Tue, 10 Dec 2019 13:12:12 +0000 (15:12 +0200)]
net: ena: implement XDP drop support

This commit implements the basic functionality of drop/pass logic in the
ena driver.

Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'ethtool-netlink-interface-preliminary-part'
David S. Miller [Fri, 13 Dec 2019 01:07:06 +0000 (17:07 -0800)]
Merge branch 'ethtool-netlink-interface-preliminary-part'

Michal Kubecek says:

====================
ethtool netlink interface, preliminary part

As Jakub Kicinski suggested in ethtool netlink v7 discussion, this
submission consists only of preliminary patches which raised no objections;
first four patches already have Acked-by or Reviewed-by.

- patch 1 exposes permanent hardware address (as shown by "ethtool -P")
  via rtnetlink
- patch 2 is renames existing netlink helper to a better name
- patch 3 and 4 reorganize existing ethtool code (no functional change)
- patch 5 makes the table of link mode names available as an ethtool string
  set (will be needed for the netlink interface)

Once we get these out of the way, v8 of the first part of the ethtool
netlink interface will follow.

Changes from v2 to v3: fix SPDX licence identifiers (patch 3 and 5).

Changes from v1 to v2: restore build time check that all link modes have
assigned a name (patch 5).
====================

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoethtool: provide link mode names as a string set
Michal Kubecek [Wed, 11 Dec 2019 09:58:34 +0000 (10:58 +0100)]
ethtool: provide link mode names as a string set

Unlike e.g. netdev features, the ethtool ioctl interface requires link mode
table to be in sync between kernel and userspace for userspace to be able
to display and set all link modes supported by kernel. The way arbitrary
length bitsets are implemented in netlink interface, this will be no longer
needed.

To allow userspace to access all link modes running kernel supports, add
table of ethernet link mode names and make it available as a string set to
userspace GET_STRSET requests. Add build time check to make sure names
are defined for all modes declared in enum ethtool_link_mode_bit_indices.

Once the string set is available, make it also accessible via ioctl.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoethtool: move string arrays into common file
Michal Kubecek [Wed, 11 Dec 2019 09:58:29 +0000 (10:58 +0100)]
ethtool: move string arrays into common file

Introduce file net/ethtool/common.c for code shared by ioctl and netlink
ethtool interface. Move name tables of features, RSS hash functions,
tunables and PHY tunables into this file.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoethtool: move to its own directory
Michal Kubecek [Wed, 11 Dec 2019 09:58:24 +0000 (10:58 +0100)]
ethtool: move to its own directory

The ethtool netlink interface is going to be split into multiple files so
that it will be more convenient to put all of them in a separate directory
net/ethtool. Start by moving current ethtool.c with ioctl interface into
this directory and renaming it to ioctl.c.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonetlink: rename nl80211_validate_nested() to nla_validate_nested()
Michal Kubecek [Wed, 11 Dec 2019 09:58:19 +0000 (10:58 +0100)]
netlink: rename nl80211_validate_nested() to nla_validate_nested()

Function nl80211_validate_nested() is not specific to nl80211, it's
a counterpart to nla_validate_nested_deprecated() with strict validation.
For consistency with other validation and parse functions, rename it to
nla_validate_nested().

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agortnetlink: provide permanent hardware address in RTM_NEWLINK
Michal Kubecek [Wed, 11 Dec 2019 09:58:14 +0000 (10:58 +0100)]
rtnetlink: provide permanent hardware address in RTM_NEWLINK

Permanent hardware address of a network device was traditionally provided
via ethtool ioctl interface but as Jiri Pirko pointed out in a review of
ethtool netlink interface, rtnetlink is much more suitable for it so let's
add it to the RTM_NEWLINK message.

Add IFLA_PERM_ADDRESS attribute to RTM_NEWLINK messages unless the
permanent address is all zeros (i.e. device driver did not fill it). As
permanent address is not modifiable, reject userspace requests containing
IFLA_PERM_ADDRESS attribute.

Note: we already provide permanent hardware address for bond slaves;
unfortunately we cannot drop that attribute for backward compatibility
reasons.

v5 -> v6: only add the attribute if permanent address is not zero

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'unix-Show-number-of-scm-files-in-fdinfo'
David S. Miller [Fri, 13 Dec 2019 01:04:55 +0000 (17:04 -0800)]
Merge branch 'unix-Show-number-of-scm-files-in-fdinfo'

Kirill Tkhai says:

====================
unix: Show number of scm files in fdinfo

v2: Pass correct argument to locked in patch [2/2].

Unix sockets like a block box. You never know what is pending there:
there may be a file descriptor holding a mount or a block device,
or there may be whole universes with namespaces, sockets with receive
queues full of sockets etc.

The patchset makes number of pending scm files be visible in fdinfo.
This may be useful to determine, that socket should be investigated
or which task should be killed to put a reference counter on a resourse.

$cat /proc/[pid]/fdinfo/[unix_sk_fd] | grep scm_fds
scm_fds: 1
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agounix: Show number of pending scm files of receive queue in fdinfo
Kirill Tkhai [Mon, 9 Dec 2019 10:03:46 +0000 (13:03 +0300)]
unix: Show number of pending scm files of receive queue in fdinfo

Unix sockets like a block box. You never know what is stored there:
there may be a file descriptor holding a mount or a block device,
or there may be whole universes with namespaces, sockets with receive
queues full of sockets etc.

The patch adds a little debug and accounts number of files (not recursive),
which is in receive queue of a unix socket. Sometimes this is useful
to determine, that socket should be investigated or which task should
be killed to put reference counter on a resourse.

v2: Pass correct argument to lockdep

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: Allow to show socket-specific information in /proc/[pid]/fdinfo/[fd]
Kirill Tkhai [Mon, 9 Dec 2019 10:03:40 +0000 (13:03 +0300)]
net: Allow to show socket-specific information in /proc/[pid]/fdinfo/[fd]

This adds .show_fdinfo to socket_file_ops, so protocols will be able
to print their specific data in fdinfo.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'vsock-add-local-transport-support'
David S. Miller [Wed, 11 Dec 2019 23:01:23 +0000 (15:01 -0800)]
Merge branch 'vsock-add-local-transport-support'

Stefano Garzarella says:

====================
vsock: add local transport support

v2:
 - style fixes [Dave]
 - removed RCU sync and changed 'the_vsock_loopback' in a global
   static variable [Stefan]
 - use G2H transport when local transport is not loaded and remote cid
   is VMADDR_CID_LOCAL [Stefan]
 - rebased on net-next

v1: https://patchwork.kernel.org/cover/11251735/

This series introduces a new transport (vsock_loopback) to handle
local communication.
This could be useful to test vsock core itself and to allow developers
to test their applications without launching a VM.

Before this series, vmci and virtio transports allowed this behavior,
but only in the guest.
We are moving the loopback handling in a new transport, because it
might be useful to provide this feature also in the host or when
no H2G/G2H transports (hyperv, virtio, vmci) are loaded.

The user can use the loopback with the new VMADDR_CID_LOCAL (that
replaces VMADDR_CID_RESERVED) in any condition.
Otherwise, if the G2H transport is loaded, it can also use the guest
local CID as previously supported by vmci and virtio transports.
If G2H transport is not loaded, the user can also use VMADDR_CID_HOST
for local communication.

Patch 1 is a cleanup to build virtio_transport_common without virtio
Patch 2 adds the new VMADDR_CID_LOCAL, replacing VMADDR_CID_RESERVED
Patch 3 adds a new feature flag to register a loopback transport
Patch 4 adds the new vsock_loopback transport based on the loopback
        implementation of virtio_transport
Patch 5 implements the logic to use the local transport for loopback
        communication
Patch 6 removes the loopback from virtio_transport
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agovsock/virtio: remove loopback handling
Stefano Garzarella [Tue, 10 Dec 2019 10:43:07 +0000 (11:43 +0100)]
vsock/virtio: remove loopback handling

We can remove the loopback handling from virtio_transport,
because now the vsock core is able to handle local communication
using the new vsock_loopback device.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agovsock: use local transport when it is loaded
Stefano Garzarella [Tue, 10 Dec 2019 10:43:06 +0000 (11:43 +0100)]
vsock: use local transport when it is loaded

Now that we have a transport that can handle the local communication,
we can use it when it is loaded.

A socket will use the local transport (loopback) when the remote
CID is:
- equal to VMADDR_CID_LOCAL
- or equal to transport_g2h->get_local_cid(), if transport_g2h
  is loaded (this allows us to keep the same behavior implemented
  by virtio and vmci transports)
- or equal to VMADDR_CID_HOST, if transport_g2h is not loaded

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agovsock: add vsock_loopback transport
Stefano Garzarella [Tue, 10 Dec 2019 10:43:05 +0000 (11:43 +0100)]
vsock: add vsock_loopback transport

This patch adds a new vsock_loopback transport to handle local
communication.
This transport is based on the loopback implementation of
virtio_transport, so it uses the virtio_transport_common APIs
to interface with the vsock core.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agovsock: add local transport support in the vsock core
Stefano Garzarella [Tue, 10 Dec 2019 10:43:04 +0000 (11:43 +0100)]
vsock: add local transport support in the vsock core

This patch allows to register a transport able to handle
local communication (loopback).

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agovsock: add VMADDR_CID_LOCAL definition
Stefano Garzarella [Tue, 10 Dec 2019 10:43:03 +0000 (11:43 +0100)]
vsock: add VMADDR_CID_LOCAL definition

The VMADDR_CID_RESERVED (1) was used by VMCI, but now it is not
used anymore, so we can reuse it for local communication
(loopback) adding the new well-know CID: VMADDR_CID_LOCAL.

Cc: Jorgen Hansen <jhansen@vmware.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agovsock/virtio_transport_common: remove unused virtio header includes
Stefano Garzarella [Tue, 10 Dec 2019 10:43:02 +0000 (11:43 +0100)]
vsock/virtio_transport_common: remove unused virtio header includes

We can remove virtio header includes, because virtio_transport_common
doesn't use virtio API, but provides common functions to interface
virtio/vhost transports with the af_vsock core, and to handle
the protocol.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'sfp-slow-to-probe-copper'
David S. Miller [Wed, 11 Dec 2019 20:46:13 +0000 (12:46 -0800)]
Merge branch 'sfp-slow-to-probe-copper'

Russell King says:

====================
Add support for slow-to-probe-PHY copper SFP modules

This series, following on from the previous adding SFP+ copper support,
adds support for a range of Copper SFP modules, made by a variety of
companies, all of which have a Marvell 88E1111 PHY on them, but take
far longer than the Marvell spec'd 15ms to start communicating on the
I2C bus.

Researching the Champion One 1000SFPT module reveals that TX_DISABLE is
routed through a MAX1971 switching regulator and reset IC which adds a
175ms delay to releasing the 88E1111 reset.

It is not known whether other modules use a similar setup, but there
are a range of modules that are slow for the Marvell PHY to appear.

This patch series adds support for these modules by repeatedly trying
to probe the PHY for up to 600ms.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: sfp: re-attempt probing for phy
Russell King [Mon, 9 Dec 2019 14:16:11 +0000 (14:16 +0000)]
net: sfp: re-attempt probing for phy

Some 1000BASE-T PHY modules take a while for the PHY to wake up.
Retry the probe a number of times before deciding that the module has
no PHY.

Tested with:
 Sourcephotonics SPGBTXCNFC - PHY takes less than 50ms to respond.
 Champion One 1000SFPT - PHY takes about 200ms to respond.
 Mikrotik S-RJ01 - no PHY

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: sfp: error handling for phy probe
Russell King [Mon, 9 Dec 2019 14:16:05 +0000 (14:16 +0000)]
net: sfp: error handling for phy probe

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: sfp: rename sm_retries
Russell King [Mon, 9 Dec 2019 14:16:00 +0000 (14:16 +0000)]
net: sfp: rename sm_retries

Rename sm_retries as sm_fault_retries, as this is what this member is
tracking.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: sfp: use a definition for the fault recovery attempts
Russell King [Mon, 9 Dec 2019 14:15:55 +0000 (14:15 +0000)]
net: sfp: use a definition for the fault recovery attempts

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'sfp-copper-modules'
David S. Miller [Wed, 11 Dec 2019 19:53:42 +0000 (11:53 -0800)]
Merge branch 'sfp-copper-modules'

Russell King says:

====================
Add support for SFP+ copper modules

This series adds support for Copper SFP+ modules with Clause 45 PHYs.
Specifically the patches:

1. drop support for the probably never tested 100BASE-*X modules.
2. drop EEPROM ID from sfp_select_interface()
3. add more compliance code definitions from SFF-8024, renaming the
   existing definitions.
4. add module start/stop methods so phylink knows when a module is
   about to become active. The module start method is called after
   we have probed for a PHY on the module.
5. move start/stop of module PHY down into phylink using the new
   module start/stop methods.
6. add support for Clause 45 I2C accesses, tested with Methode DM7052.
   Other modules appear to use the same protocol, but slight
   differences, but I do not have those modules to test with.
   (if someone does, please holler!)
7. rearrange how we attach to PHYs so that we can support Clause 45
   PHYs with indeterminant interface modes.  (Clause 45 PHYs appear
   to like to change their PHY interface mode depending on the
   negotiated speed.)
8. add support for phylink to connect to a clause 45 PHY on a SFP
   module.
9. split the link_an_mode between the configured value and the
   currently selected mode value; some clause 45 PHYs have no
   capability to provide in-band negotiation.
10. split the link configuration on SFP module insertion in phylink
    so we can use it in other code paths.
11. delay MAC configuration for copper modules without a PHY to the
    module start method - after any module PHY has been probed.  If
    the module has a PHY, then we setup the MAC when the PHY is
    detected.
12. the Broadcom 84881 PHY does not support in-band negotiation even
    though it uses SGMII and 2500BASE-X.  Having the MAC operating
    with in-band negotiation enabled, even with AN bypass enabled,
    results in no link - Broadcom say that the host MAC must always
    be forced.
13. add support for the Broadcom 84881 PHY found on the Methode
    DM7052 module.
14. add support to SFP to probe for a Clause 45 PHY on copper SFP+
    modules.

v3: now bisectable!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: sfp: add support for Clause 45 PHYs
Russell King [Wed, 11 Dec 2019 10:57:01 +0000 (10:57 +0000)]
net: sfp: add support for Clause 45 PHYs

Some SFP+ modules have a Clause 45 PHY onboard, which is accessible via
the normal I2C address.  Detect 10G BASE-T PHYs which may have an
accessible PHY and probe for it.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phy: add Broadcom BCM84881 PHY driver
Russell King [Wed, 11 Dec 2019 10:56:56 +0000 (10:56 +0000)]
net: phy: add Broadcom BCM84881 PHY driver

Add a rudimentary Clause 45 driver for the BCM84881 PHY, found on
Methode DM7052 SFPs.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phylink: make Broadcom BCM84881 based SFPs work
Russell King [Wed, 11 Dec 2019 10:56:50 +0000 (10:56 +0000)]
net: phylink: make Broadcom BCM84881 based SFPs work

The Broadcom BCM84881 does not appear to send the SGMII control word
when operating in SGMII mode, which causes network adapters to fail
to link with the PHY, or decide to operate at fixed 1G speed, even if
the PHY negotiated 100M.

Work around this by detecting the Broadcom BCM84881 and switch to phy
mode rather than inband mode.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phylink: delay MAC configuration for copper SFP modules
Russell King [Wed, 11 Dec 2019 10:56:45 +0000 (10:56 +0000)]
net: phylink: delay MAC configuration for copper SFP modules

Knowing whether we need to delay the MAC configuration because a module
may have a PHY is useful to phylink to allow NBASE-T modules to work on
systems supporting no more than 2.5G speeds.

This commit allows us to delay such configuration until after the PHY
has been probed by recording the parsed capabilities, and if the module
may have a PHY, doing no more until the module_start() notification is
called.  At that point, we either have a PHY, or we don't.

We move the PHY-based setup a little later, and use the PHYs support
capabilities rather than the EEPROM parsed capabilities to determine
whether we can support the PHY.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phylink: split phylink_sfp_module_insert()
Russell King [Wed, 11 Dec 2019 10:56:40 +0000 (10:56 +0000)]
net: phylink: split phylink_sfp_module_insert()

Split out the configuration step from phylink_sfp_module_insert() so
we can re-use this later.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phylink: split link_an_mode configured and current settings
Russell King [Wed, 11 Dec 2019 10:56:35 +0000 (10:56 +0000)]
net: phylink: split link_an_mode configured and current settings

Split link_an_mode between the configured setting and the current
operating setting.  This is an important distinction to make when we
need to configure PHY mode for a plugged SFP+ module that does not
use in-band signalling.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phylink: support Clause 45 PHYs on SFP+ modules
Russell King [Wed, 11 Dec 2019 10:56:30 +0000 (10:56 +0000)]
net: phylink: support Clause 45 PHYs on SFP+ modules

Some SFP+ modules have Clause 45 PHYs embedded on them, which need a
little more handling in order to ensure that they are correctly setup,
as they switch the PHY link mode according to the negotiated speed.

With Clause 22 PHYs, we assumed that they would operate in SGMII mode,
but this assumption is now false.  Adapt phylink to support Clause 45
PHYs on SFP+ modules.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phylink: re-split __phylink_connect_phy()
Russell King [Wed, 11 Dec 2019 10:56:25 +0000 (10:56 +0000)]
net: phylink: re-split __phylink_connect_phy()

In order to support Clause 45 PHYs on SFP+ modules, which have an
indeterminant phy interface mode, we need to be able to call
phylink_bringup_phy() with a different interface mode to that used when
binding the PHY. Reduce __phylink_connect_phy() to an attach operation,
and move the call to phylink_bringup_phy() to its call sites.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: mdio-i2c: add support for Clause 45 accesses
Russell King [Wed, 11 Dec 2019 10:56:20 +0000 (10:56 +0000)]
net: mdio-i2c: add support for Clause 45 accesses

Some SFP+ modules have PHYs on them just like SFP modules do, except
they are Clause 45 PHYs.  The I2C protocol used to access them is
modified slightly in order to send the device address and 16-bit
register index.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: sfp: move phy_start()/phy_stop() to phylink
Russell King [Wed, 11 Dec 2019 10:56:14 +0000 (10:56 +0000)]
net: sfp: move phy_start()/phy_stop() to phylink

Move phy_start() and phy_stop() into the module_start and module_stop
notifications in phylink, rather than having them in the SFP code.
This gives phylink responsibility for controlling the PHY, rather
than having SFP start and stop the PHY state machine.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: sfp: add module start/stop upstream notifications
Russell King [Wed, 11 Dec 2019 10:56:09 +0000 (10:56 +0000)]
net: sfp: add module start/stop upstream notifications

When dealing with some copper modules, we can't positively know the
module capabilities are until we have probed the PHY. Without the full
capabilities, we may end up failing a module that we could otherwise
drive with a restricted set of capabilities.

An example of this would be a module with a NBASE-T PHY plugged into
a host that supports phy interface modes 2500BASE-X and SGMII. The
PHY supports 10GBASE-R, 5000BASE-X, 2500BASE-X, SGMII interface modes,
which means a subset of the capabilities are compatible with the host.

However, reading the module EEPROM leads us to believe that the module
only supports ethtool link mode 10GBASE-T, which is incompatible with
the host - and thus results in the module being rejected.

This patch adds an extra notification which are triggered after the
SFP module's PHY probe, and a corresponding notification just before
the PHY is removed.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: sfp: add more extended compliance codes
Russell King [Wed, 11 Dec 2019 10:56:04 +0000 (10:56 +0000)]
net: sfp: add more extended compliance codes

SFF-8024 is used to define various constants re-used in several SFF
SFP-related specifications.  Split these constants from the enum, and
rename them to indicate that they're defined by SFF-8024.

Add and use updated SFF-8024 extended compliance code definitions for
10GBASE-T, 5GBASE-T and 2.5GBASE-T modules.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: sfp: derive interface mode from ethtool link modes
Russell King [Wed, 11 Dec 2019 10:55:59 +0000 (10:55 +0000)]
net: sfp: derive interface mode from ethtool link modes

We don't need the EEPROM ID to derive the phy interface mode as we can
derive it merely from the ethtool link modes.  Remove the EEPROM ID
argument to sfp_select_interface().

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: sfp: remove incomplete 100BASE-FX and 100BASE-LX support
Russell King [Wed, 11 Dec 2019 10:55:54 +0000 (10:55 +0000)]
net: sfp: remove incomplete 100BASE-FX and 100BASE-LX support

The 100BASE-FX and 100BASE-LX support assumes a PHY is present; this
is probably an incorrect assumption. In any case, sfp_parse_support()
will fail such a module. Let's stop pretending we support these
modules.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agocxgb4: add support for high priority filters
Shahjada Abul Husain [Tue, 10 Dec 2019 10:55:33 +0000 (16:25 +0530)]
cxgb4: add support for high priority filters

T6 has a separate region known as high priority filter region
that allows classifying packets going through ULD path. So,
query firmware for HPFILTER resources and enable the high
priority offload filter support when it is available.

Signed-off-by: Shahjada Abul Husain <shahjada@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoenetc: remove variable 'tc_max_sized_frame' set but not used
Chen Wandun [Tue, 10 Dec 2019 10:24:50 +0000 (18:24 +0800)]
enetc: remove variable 'tc_max_sized_frame' set but not used

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/freescale/enetc/enetc_qos.c: In function enetc_setup_tc_cbs:
drivers/net/ethernet/freescale/enetc/enetc_qos.c:195:6: warning: variable tc_max_sized_frame set but not used [-Wunused-but-set-variable]

Fixes: c431047c4efe ("enetc: add support Credit Based Shaper(CBS) for hardware offload")
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonfp: add support for TLV device stats
Jakub Kicinski [Tue, 10 Dec 2019 05:04:01 +0000 (21:04 -0800)]
nfp: add support for TLV device stats

Device stats are currently hard coded in the PCI BAR0 layout.
Add a ability to read them from the TLV area instead.
Names for the stats are maintained by the driver, and their
meaning documented. This allows us to more easily add and
remove device stats.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agotcp: Cleanup duplicate initialization of sk->sk_state.
Kuniyuki Iwashima [Tue, 10 Dec 2019 02:41:48 +0000 (02:41 +0000)]
tcp: Cleanup duplicate initialization of sk->sk_state.

When a TCP socket is created, sk->sk_state is initialized twice as
TCP_CLOSE in sock_init_data() and tcp_init_sock(). The tcp_init_sock() is
always called after the sock_init_data(), so it is not necessary to update
sk->sk_state in the tcp_init_sock().

Before v2.1.8, the code of the two functions was in the inet_create(). In
the patch of v2.1.8, the tcp_v4/v6_init_sock() were added and the code of
initialization of sk->state was duplicated.

Signed-off-by: Kuniyuki Iwashima <kuni1840@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoenetc: add software timestamping
Michael Walle [Tue, 10 Dec 2019 00:15:37 +0000 (01:15 +0100)]
enetc: add software timestamping

Provide a software TX timestamp and add it to the ethtool query
interface.

skb_tx_timestamp() is also needed if one would like to use PHY
timestamping.

Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'tipc-introduce-variable-window-congestion-control'
David S. Miller [Wed, 11 Dec 2019 01:31:15 +0000 (17:31 -0800)]
Merge branch 'tipc-introduce-variable-window-congestion-control'

Jon Maloy says:

====================
tipc: introduce variable window congestion control

We improve thoughput greatly by introducing a variety of the Reno
congestion control algorithm at the link level.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agotipc: introduce variable window congestion control
Jon Maloy [Mon, 9 Dec 2019 23:52:46 +0000 (00:52 +0100)]
tipc: introduce variable window congestion control

We introduce a simple variable window congestion control for links.
The algorithm is inspired by the Reno algorithm, covering both 'slow
start', 'congestion avoidance', and 'fast recovery' modes.

- We introduce hard lower and upper window limits per link, still
  different and configurable per bearer type.

- We introduce a 'slow start theshold' variable, initially set to
  the maximum window size.

- We let a link start at the minimum congestion window, i.e. in slow
  start mode, and then let is grow rapidly (+1 per rceived ACK) until
  it reaches the slow start threshold and enters congestion avoidance
  mode.

- In congestion avoidance mode we increment the congestion window for
  each window-size number of acked packets, up to a possible maximum
  equal to the configured maximum window.

- For each non-duplicate NACK received, we drop back to fast recovery
  mode, by setting the both the slow start threshold to and the
  congestion window to (current_congestion_window / 2).

- If the timeout handler finds that the transmit queue has not moved
  since the previous timeout, it drops the link back to slow start
  and forces a probe containing the last sent sequence number to the
  sent to the peer, so that this can discover the stale situation.

This change does in reality have effect only on unicast ethernet
transport, as we have seen that there is no room whatsoever for
increasing the window max size for the UDP bearer.
For now, we also choose to keep the limits for the broadcast link
unchanged and equal.

This algorithm seems to give a 50-100% throughput improvement for
messages larger than MTU.

Suggested-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agotipc: eliminate more unnecessary nacks and retransmissions
Jon Maloy [Mon, 9 Dec 2019 23:52:45 +0000 (00:52 +0100)]
tipc: eliminate more unnecessary nacks and retransmissions

When we increase the link tranmsit window we often observe the following
scenario:

1) A STATE message bypasses a sequence of traffic packets and arrives
   far ahead of those to the receiver. STATE messages contain a
   'peers_nxt_snt' field to indicate which was the last packet sent
   from the peer. This mechanism is intended as a last resort for the
   receiver to detect missing packets, e.g., during very low traffic
   when there is no packet flow to help early loss detection.
3) The receiving link compares the 'peer_nxt_snt' field to its own
   'rcv_nxt', finds that there is a gap, and immediately sends a
   NACK message back to the peer.
4) When this NACKs arrives at the sender, all the requested
   retransmissions are performed, since it is a first-time request.

Just like in the scenario described in the previous commit this leads
to many redundant retransmissions, with decreased throughput as a
consequence.

We fix this by adding two more conditions before we send a NACK in
this sitution. First, the deferred queue must be empty, so we cannot
assume that the potential packet loss has already been detected by
other means. Second, we check the 'peers_snd_nxt' field only in probe/
probe_reply messages, thus turning this into a true mechanism of last
resort as it was really meant to be.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agotipc: eliminate gap indicator from ACK messages
Jon Maloy [Mon, 9 Dec 2019 23:52:44 +0000 (00:52 +0100)]
tipc: eliminate gap indicator from ACK messages

When we increase the link send window we sometimes observe the
following scenario:

1) A packet #N arrives out of order far ahead of a sequence of older
   packets which are still under way. The packet is added to the
   deferred queue.
2) The missing packets arrive in sequence, and for each 16th of them
   an ACK is sent back to the receiver, as it should be.
3) When building those ACK messages, it is checked if there is a gap
   between the link's 'rcv_nxt' and the first packet in the deferred
   queue. This is always the case until packet number #N-1 arrives, and
   a 'gap' indicator is added, effectively turning them into NACK
   messages.
4) When those NACKs arrive at the sender, all the requested
   retransmissions are done, since it is a first-time request.

This sometimes leads to a huge amount of redundant retransmissions,
causing a drop in max throughput. This problem gets worse when we
in a later commit introduce variable window congestion control,
since it drops the link back to 'fast recovery' much more often
than necessary.

We now fix this by not sending any 'gap' indicator in regular ACK
messages. We already have a mechanism for sending explicit NACKs
in place, and this is sufficient to keep up the packet flow.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoppp: Adjust indentation into ppp_async_input
Nathan Chancellor [Mon, 9 Dec 2019 22:38:59 +0000 (15:38 -0700)]
ppp: Adjust indentation into ppp_async_input

Clang warns:

../drivers/net/ppp/ppp_async.c:877:6: warning: misleading indentation;
statement is not part of the previous 'if' [-Wmisleading-indentation]
                                ap->rpkt = skb;
                                ^
../drivers/net/ppp/ppp_async.c:875:5: note: previous statement is here
                                if (!skb)
                                ^
1 warning generated.

This warning occurs because there is a space before the tab on this
line. Clean up this entire block's indentation so that it is consistent
with the Linux kernel coding style and clang no longer warns.

Fixes: 6722e78c9005 ("[PPP]: handle misaligned accesses")
Link: https://github.com/ClangBuiltLinux/linux/issues/800
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: smc911x: Adjust indentation in smc911x_phy_configure
Nathan Chancellor [Mon, 9 Dec 2019 21:50:27 +0000 (14:50 -0700)]
net: smc911x: Adjust indentation in smc911x_phy_configure

Clang warns:

../drivers/net/ethernet/smsc/smc911x.c:939:3: warning: misleading
indentation; statement is not part of the previous 'if'
[-Wmisleading-indentation]
         if (!lp->ctl_rfduplx)
         ^
../drivers/net/ethernet/smsc/smc911x.c:936:2: note: previous statement
is here
        if (lp->ctl_rspeed != 100)
        ^
1 warning generated.

This warning occurs because there is a space after the tab on this line.
Remove it so that the indentation is consistent with the Linux kernel
coding style and clang no longer warns.

Fixes: 0a0c72c9118c ("[PATCH] RE: [PATCH 1/1] net driver: Add support for SMSC LAN911x line of ethernet chips")
Link: https://github.com/ClangBuiltLinux/linux/issues/796
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: tulip: Adjust indentation in {dmfe, uli526x}_init_module
Nathan Chancellor [Mon, 9 Dec 2019 21:16:23 +0000 (14:16 -0700)]
net: tulip: Adjust indentation in {dmfe, uli526x}_init_module

Clang warns:

../drivers/net/ethernet/dec/tulip/uli526x.c:1812:3: warning: misleading
indentation; statement is not part of the previous 'if'
[-Wmisleading-indentation]
        switch (mode) {
        ^
../drivers/net/ethernet/dec/tulip/uli526x.c:1809:2: note: previous
statement is here
        if (cr6set)
        ^
1 warning generated.

../drivers/net/ethernet/dec/tulip/dmfe.c:2217:3: warning: misleading
indentation; statement is not part of the previous 'if'
[-Wmisleading-indentation]
        switch(mode) {
        ^
../drivers/net/ethernet/dec/tulip/dmfe.c:2214:2: note: previous
statement is here
        if (cr6set)
        ^
1 warning generated.

This warning occurs because there is a space before the tab on these
lines. Remove them so that the indentation is consistent with the Linux
kernel coding style and clang no longer warns.

While we are here, adjust the default block in dmfe_init_module to have
a proper break between the label and assignment and add a space between
the switch and opening parentheses to avoid a checkpatch warning.

Fixes: e1c3e5014040 ("[PATCH] initialisation cleanup for ULI526x-net-driver")
Link: https://github.com/ClangBuiltLinux/linux/issues/795
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'dp83867-fix-fifo-depth'
David S. Miller [Tue, 10 Dec 2019 04:19:10 +0000 (20:19 -0800)]
Merge branch 'dp83867-fix-fifo-depth'

Dan Murphy says:

====================
Fix Tx/Rx FIFO depth for DP83867

The DP83867 supports both the RGMII and SGMII modes.  The Tx and Rx FIFO depths
are configurable in these modes but may not applicable for both modes.

When the device is configured for RGMII mode the Tx FIFO depth is applicable
and for SGMII mode both Tx and Rx FIFO depth settings are applicable.  When
the driver was originally written only the RGMII device was available and there
were no standard fifo-depth DT properties.

The patchset converts the special ti,fifo-depth property to the standard
tx-fifo-depth property while still allowing the ti,fifo-depth property to be
set as to maintain backward compatibility.

In addition to this change the rx-fifo-depth property support was added and only
written when the device is configured for SGMII mode.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: phy: dp83867: Add rx-fifo-depth and tx-fifo-depth
Dan Murphy [Mon, 9 Dec 2019 20:10:25 +0000 (14:10 -0600)]
net: phy: dp83867: Add rx-fifo-depth and tx-fifo-depth

This code changes the TI specific ti,fifo-depth to the common
tx-fifo-depth property.  The tx depth is applicable for both RGMII and
SGMII modes of operation.

rx-fifo-depth was added as well but this is only applicable for SGMII
mode.

So in summary
if RGMII mode write tx fifo depth only
if SGMII mode write both rx and tx fifo depths

If the property is not populated in the device tree then set the value
to the default values.

Signed-off-by: Dan Murphy <dmurphy@ti.com>
Reported-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>