David S. Miller [Tue, 6 Aug 2019 19:37:56 +0000 (12:37 -0700)]
Merge branch 'drop_monitor-Various-improvements-and-cleanups'
Ido Schimmel says:
====================
drop_monitor: Various improvements and cleanups
This patchset performs various improvements and cleanups in drop monitor
with no functional changes intended. There are no changes in these
patches relative to the RFC I sent two weeks ago [1].
A followup patchset will extend drop monitor with a packet alert mode in
which the dropped packet is notified to user space instead of just a
summary of recent drops. Subsequent patchsets will add the ability to
monitor hardware originated drops via drop monitor.
[1] https://patchwork.ozlabs.org/cover/
1135226/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 6 Aug 2019 13:19:56 +0000 (16:19 +0300)]
drop_monitor: Use pre_doit / post_doit hooks
Each operation from user space should be protected by the global drop
monitor mutex. Use the pre_doit / post_doit hooks to take / release the
lock instead of doing it explicitly in each function.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 6 Aug 2019 13:19:55 +0000 (16:19 +0300)]
drop_monitor: Add extack support
Add various extack messages to make drop_monitor more user friendly.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 6 Aug 2019 13:19:54 +0000 (16:19 +0300)]
drop_monitor: Avoid multiple blank lines
Remove multiple blank lines which are visually annoying and useless.
This suppresses the "Please don't use multiple blank lines" checkpatch
messages.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 6 Aug 2019 13:19:53 +0000 (16:19 +0300)]
drop_monitor: Document scope of spinlock
While 'per_cpu_dm_data' is a per-CPU variable, its 'skb' and
'send_timer' fields can be accessed concurrently by the CPU sending the
netlink notification to user space from the workqueue and the CPU
tracing kfree_skb(). This spinlock is meant to protect against that.
Document its scope and suppress the checkpatch message "spinlock_t
definition without comment".
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 6 Aug 2019 13:19:52 +0000 (16:19 +0300)]
drop_monitor: Rename and document scope of mutex
The 'trace_state_mutex' does not only protect the global 'trace_state'
variable, but also the global 'hw_stats_list'.
Subsequent patches are going add more operations from user space to
drop_monitor and these all need to be mutually exclusive.
Rename 'trace_state_mutex' to the more fitting 'net_dm_mutex' name and
document its scope.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 6 Aug 2019 13:19:51 +0000 (16:19 +0300)]
drop_monitor: Use correct error code
The error code 'ENOTSUPP' is reserved for use with NFS. Use 'EOPNOTSUPP'
instead.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Tue, 6 Aug 2019 13:06:09 +0000 (15:06 +0200)]
net: dsa: ksz: Drop NET_DSA_TAG_KSZ9477
This Kconfig option is unused, drop it.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Tue, 6 Aug 2019 13:06:08 +0000 (15:06 +0200)]
net: dsa: ksz: Merge ksz_priv.h into ksz_common.h
Merge the two headers into one, no functional change.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Tue, 6 Aug 2019 13:06:07 +0000 (15:06 +0200)]
net: dsa: ksz: Remove dead code and fix warnings
Remove ksz_port_cleanup(), which is unused. Add missing include
"ksz_common.h", which fixes the following warning when built with
make ... W=1
drivers/net/dsa/microchip/ksz_common.c:23:6: warning: no previous prototype for ‘...’ [-Wmissing-prototypes]
Note that the order of the headers cannot be swapped, as that would
trigger missing forward declaration errors, which would indicate the
way forward is to merge the two headers into one.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Tristram Ha <Tristram.Ha@microchip.com>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 2 Aug 2019 06:17:51 +0000 (02:17 -0400)]
cnic: Explicitly initialize all reference counts to 0.
The driver is relying on zero'ed allocated memory and does not
explicitly call atomic_set() to initialize the ref counts to 0. Add
these atomic_set() calls so that it will be more straight forward
to convert atomic ref counts to refcount_t.
Reported-by: Chuhong Yuan <hslester96@gmail.com>
Cc: Rasesh Mody <rmody@marvell.com>
Cc: <GR-Linux-NIC-Dev@marvell.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Aug 2019 22:18:08 +0000 (15:18 -0700)]
ipv6: have a single rcu unlock point in __ip6_rt_update_pmtu
Simplify the unlock path in __ip6_rt_update_pmtu by using a
single point where rcu_read_unlock is called.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 5 Aug 2019 17:50:05 +0000 (10:50 -0700)]
Merge tag 'mlx5-updates-2019-08-01' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2019-08-01
Misc updates for mlx5 netdev driver:
1) Ingress rate support for E-Switch vports from Eli.
2) Gavi introduces flow counters bulk allocation and pool,
To improve the performance of flow counter acquisition.
3) From Tariq, micro improvements for tx path
4) From Shay, small improvement for XDP TX MPWQE inline flow.
5) Aya provides some cleanups for tx devlink health reporters.
6) Saeed, refactor checksum handling into a single function.
7) Tonghao, allows dropping specific tunnel packets.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Mon, 5 Aug 2019 10:52:11 +0000 (11:52 +0100)]
][next] selftests: nettest: fix spelling mistake: "potocol" -> "protocol"
There is a spelling mistake in an error messgae. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 3 Aug 2019 17:42:05 +0000 (10:42 -0700)]
Merge branch 'net-l3-l4-functional-tests'
David Ahern says:
====================
net: Add functional tests for L3 and L4
This is a port the functional test cases created during the development
of the VRF feature. It covers various permutations of icmp, tcp and udp
for IPv4 and IPv6 including negative tests.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Aug 2019 18:56:48 +0000 (11:56 -0700)]
selftests: Add use case section to fcnal-test
Add use case section to fcnal-test.
Initial test is VRF based with a bridge and vlans. The commands
stem from bug reports fixed by:
a173f066c7cf ("netfilter: bridge: Don't sabotage nf_hook calls from an l3mdev")
cd6428988bf4 ("netfilter: bridge: Don't sabotage nf_hook calls for an l3mdev slave")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Aug 2019 18:56:47 +0000 (11:56 -0700)]
selftests: Add ipv6 netfilter tests to fcnal-test
Add IPv6 netfilter tests to send tcp reset or icmp unreachable for a
port. Initial tests are VRF only.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Aug 2019 18:56:46 +0000 (11:56 -0700)]
selftests: Add ipv4 netfilter tests to fcnal-test
Add netfilter tests to send tcp reset or icmp unreachable for a port.
Initial tests are VRF only.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Aug 2019 18:56:45 +0000 (11:56 -0700)]
selftests: Add ipv6 runtime tests to fcnal-test
Add IPv6 runtime tests where passive (no traffic flowing) and active
(with traffic) sockets are expected to be reset on device deletes.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Aug 2019 18:56:44 +0000 (11:56 -0700)]
selftests: Add ipv4 runtime tests to fcnal-test
Add runtime tests where passive (no traffic flowing) and active (with
traffic) sockets are expected to be reset on device deletes.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Aug 2019 18:56:43 +0000 (11:56 -0700)]
selftests: Add ipv6 address bind tests to fcnal-test
Add IPv6 address bind tests to fcnal-test.sh. Verifies socket binding to
local addresses for raw, tcp and udp including device and VRF cases.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Aug 2019 18:56:42 +0000 (11:56 -0700)]
selftests: Add ipv4 address bind tests to fcnal-test
Add address bind tests to fcnal-test.sh. Verifies socket binding to
local addresses for raw, tcp and udp including device and VRF cases.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Aug 2019 18:56:41 +0000 (11:56 -0700)]
selftests: Add ipv6 udp tests to fcnal-test
Add IPv6 udp tests to fcnal-test.sh. Covers the permutations of directly
connected addresses, routed destinations, VRF and non-VRF, and expected
failures for both clients and servers. Includes permutations with
net.ipv4.udp_l3mdev_accept set to 0 and 1.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Aug 2019 18:56:40 +0000 (11:56 -0700)]
selftests: Add ipv4 udp tests to fcnal-test
Add udp tests to fcnal-test.sh. Covers the permutations of directly
connected addresses, routed destinations, VRF and non-VRF, and expected
failures for both clients and servers. Includes permutations with
net.ipv4.udp_l3mdev_accept set to 0 and 1.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Aug 2019 18:56:39 +0000 (11:56 -0700)]
selftests: Add ipv6 tcp tests to fcnal-test
Add IPv6 tcp tests to fcnal-test.sh. Covers the permutations of directly
connected addresses, routed destinations, VRF and non-VRF, and expected
failures for both clients and servers. Includes permutations with
net.ipv4.tcp_l3mdev_accept set to 0 and 1.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Aug 2019 18:56:38 +0000 (11:56 -0700)]
selftests: Add ipv4 tcp tests to fcnal-test
Add tcp tests to fcnal-test.sh. Covers the permutations of directly
connected addresses, routed destinations, VRF and non-VRF, and expected
failures for both clients and servers. Includes permutations with
net.ipv4.tcp_l3mdev_accept set to 0 and 1.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Aug 2019 18:56:37 +0000 (11:56 -0700)]
selftests: Add ipv6 ping tests to fcnal-test
Add IPv6 ping tests to fcnal-test.sh. Covers the permutations of directly
connected addresses, routed destinations, VRF and non-VRF, and expected
failures.
Setup includes unreachable routes and fib rules blocking traffic.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Aug 2019 18:56:36 +0000 (11:56 -0700)]
selftests: Add ipv4 ping tests to fcnal-test
Add ping tests to fcnal-test.sh. Covers the permutations of directly
connected addresses, routed destinations, VRF and non-VRF, and expected
failures.
Setup includes unreachable routes and fib rules blocking traffic.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Aug 2019 18:56:35 +0000 (11:56 -0700)]
selftests: Setup for functional tests for fib and socket lookups
Initial commit for functional test suite for fib and socket lookups.
This commit contains the namespace setup, networking config, test options
and other basic infrastructure.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Aug 2019 18:56:34 +0000 (11:56 -0700)]
selftests: Add nettest
Add nettest - a simple program with an implementation for various networking
APIs. nettest is used for tcp, udp and raw functional tests for both IPv4
and IPv6.
Point of this command versus existing utilities:
- controlled implementation of the APIs and the order in which they
are called,
- ability to verify ingress device, local and remote addresses,
- timeout for controlled test length,
- ability to discriminate a timeout from a system call failure, and
- simplicity with test scripts.
The command returns:
0 on success,
1 for any system call failure, and
2 on timeout.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 3 Aug 2019 17:33:01 +0000 (10:33 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
100GbE Intel Wired LAN Driver Updates 2019-08-01
This series for fm10k, by Jake Keller, reduces the scope of local variables
where possible.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 3 Aug 2019 01:22:18 +0000 (18:22 -0700)]
Merge branch 'enetc-PCIe-MDIO'
Claudiu Manoil says:
====================
enetc: Add mdio bus driver for the PCIe MDIO endpoint
First patch fixes a sparse issue and cleans up accessors to avoid
casting to __iomem. The second one cleans up the Makefile, to make
it easier to add new entries.
Third patch just registers the PCIe endpoint device containing
the MDIO registers as a standalone MDIO bus driver, to provide
an alternative way to control the MDIO bus. The same code used
by the ENETC ports (eth controllers) to manage MDIO via local
registers applies and is reused.
Bindings are provided for the new MDIO node, similarly to ENETC
port nodes bindings.
Last patch enables the ENETC port 1 and its RGMII PHY on the
LS1028A QDS board, where the MDIO muxing configuration relies
on the MDIO support provided in the first patch.
Changes since v0:
v1 - fixed mdio bus allocation
v2 - cleaned up accessors to avoid casting
v3 - fixed spelling (mostly commit message)
v4 - fixed err path check blunder
v5 - fixed loadble module build, provided separate kbuild module
for the driver
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Thu, 1 Aug 2019 11:52:53 +0000 (14:52 +0300)]
arm64: dts: fsl: ls1028a: Enable eth port1 on the ls1028a QDS board
LS1028a has one Ethernet management interface. On the QDS board, the
MDIO signals are multiplexed to either on-board AR8035 PHY device or
to 4 PCIe slots allowing for SGMII cards.
To enable the Ethernet ENETC Port 1, which can only be connected to a
RGMII PHY, the multiplexer needs to be configured to route the MDIO to
the AR8035 PHY. The MDIO/MDC routing is controlled by bits 7:4 of FPGA
board config register 0x54, and value 0 selects the on-board RGMII PHY.
The FPGA board config registers are accessible on the i2c bus, at address
0x66.
The PF3 MDIO PCIe integrated endpoint device allows for centralized access
to the MDIO bus. Add the corresponding devicetree node and set it to be
the MDIO bus parent.
Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Thu, 1 Aug 2019 11:52:52 +0000 (14:52 +0300)]
dt-bindings: net: fsl: enetc: Add bindings for the central MDIO PCIe endpoint
The on-chip PCIe root complex that integrates the ENETC ethernet
controllers also integrates a PCIe endpoint for the MDIO controller
providing for centralized control of the ENETC mdio bus.
Add bindings for this "central" MDIO Integrated PCIe Endpoint.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Thu, 1 Aug 2019 11:52:51 +0000 (14:52 +0300)]
enetc: Add mdio bus driver for the PCIe MDIO endpoint
ENETC ports can manage the MDIO bus via local register
interface. However there's also a centralized way
to manage the MDIO bus, via the MDIO PCIe endpoint
device integrated by the same root complex that also
integrates the ENETC ports (eth controllers).
Depending on board design and use case, centralized
access to MDIO may be better than using local ENETC
port registers. For instance, on the LS1028A QDS board
where MDIO muxing is required. Also, the LS1028A on-chip
switch doesn't have a local MDIO register interface.
The current patch registers the above PCIe endpoint as a
separate MDIO bus and provides a driver for it by re-using
the code used for local MDIO access. It also allows the
ENETC port PHYs to be managed by this driver if the local
"mdio" node is missing from the ENETC port node.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Thu, 1 Aug 2019 11:52:50 +0000 (14:52 +0300)]
enetc: Clean up makefile
Clean up overcomplicated makefile to make it more maintainable.
Basically, there's a set of common objects shared between
the PF and VF driver modules. This can be implemented in a
simpler way, without conditionals, less repetition, allowing
also for easier updates in the future.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Thu, 1 Aug 2019 11:52:49 +0000 (14:52 +0300)]
enetc: Clean up local mdio bus allocation
What's needed is basically a pointer to the mdio registers.
This is one way to store it inside bus->priv allocated space,
without upsetting sparse.
Reworked accessors to avoid __iomem casting.
Used devm_* variant to further clean up the init error /
remove paths.
Fixes following sparse warning:
warning: incorrect type in assignment (different address spaces)
expected void *priv
got struct enetc_mdio_regs [noderef] <asn:2>*[assigned] regs
Fixes: ebfcb23d62ab ("enetc: Add ENETC PF level external MDIO support")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 3 Aug 2019 00:58:53 +0000 (17:58 -0700)]
Merge branch 'net-dsa-mv88e6xxx-add-support-for-MV88E6220'
Hubert Feurstein says:
====================
net: dsa: mv88e6xxx: add support for MV88E6220
This patch series adds support for the MV88E6220 chip to the mv88e6xxx driver.
The MV88E6220 is almost the same as MV88E6250 except that the ports 2-4 are
not routed to pins.
Furthermore, PTP support is added to the MV88E6250 family.
v2:
- insert all 6220 entries in correct numerical order
- introduce invalid_port_mask
- move ptp_cc_mult* to ptp_ops and restored original ptp_adjfine code
- added Andrews Reviewed-By to patch 2 and 4
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hubert Feurstein [Wed, 31 Jul 2019 08:23:51 +0000 (10:23 +0200)]
net: dsa: mv88e6xxx: add PTP support for MV88E6250 family
This adds PTP support for the MV88E6250 family.
Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hubert Feurstein [Wed, 31 Jul 2019 08:23:50 +0000 (10:23 +0200)]
net: dsa: mv88e6xxx: order ptp structs numerically ascending
As it is done for all the other structs within this driver.
Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hubert Feurstein [Wed, 31 Jul 2019 08:23:49 +0000 (10:23 +0200)]
net: dsa: mv88e6xxx: setup message port is not supported in the 6250 familiy
The MV88E6250 family doesn't support the MV88E6XXX_PORT_CTL1_MESSAGE_PORT
bit.
Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hubert Feurstein [Wed, 31 Jul 2019 08:23:48 +0000 (10:23 +0200)]
net: dsa: mv88e6xxx: introduce invalid_port_mask in mv88e6xxx_info
With this it is possible to mark certain chip ports as invalid. This is
required for example for the MV88E6220 (which is in general a MV88E6250
with 7 ports) but the ports 2-4 are not routed to pins.
If a user configures an invalid port, an error is returned.
Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hubert Feurstein [Wed, 31 Jul 2019 08:23:47 +0000 (10:23 +0200)]
dt-bindings: net: dsa: marvell: add 6220 model to the 6250 family
The MV88E6220 is part of the MV88E6250 family.
Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hubert Feurstein [Wed, 31 Jul 2019 08:23:46 +0000 (10:23 +0200)]
net: dsa: mv88e6xxx: add support for MV88E6220
The MV88E6220 is almost the same as MV88E6250 except that the ports 2-4 are
not routed to pins. So the usable ports are 0, 1, 5 and 6.
Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 3 Aug 2019 00:56:37 +0000 (17:56 -0700)]
Merge branch 'net-phy-Add-AST2600-MDIO-support'
Andrew Jeffery says:
====================
net: phy: Add AST2600 MDIO support
v2 of the ASPEED MDIO series addresses comments from Rob on the devicetree
bindings and Andrew on the driver itself.
v1 of the series can be found here:
http://patchwork.ozlabs.org/cover/
1138140/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Jeffery [Wed, 31 Jul 2019 05:39:59 +0000 (15:09 +0930)]
net: ftgmac100: Select ASPEED MDIO driver for the AST2600
Ensures we can talk to a PHY via MDIO on the AST2600, as the MDIO
controller is now separate from the MAC.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Jeffery [Wed, 31 Jul 2019 05:39:58 +0000 (15:09 +0930)]
net: ftgmac100: Add support for DT phy-handle property
phy-handle is necessary for the AST2600 which separates the MDIO
controllers from the MAC.
I've tried to minimise the intrusion of supporting the AST2600 to the
FTGMAC100 by leaving in place the existing MDIO support for the embedded
MDIO interface. The AST2400 and AST2500 continue to be supported this
way, as it avoids breaking/reworking existing devicetrees.
The AST2600 support by contrast requires the presence of the phy-handle
property in the MAC devicetree node to specify the appropriate PHY to
associate with the MAC. In the event that someone wants to specify the
MDIO bus topology under the MAC node on an AST2400 or AST2500, the
current auto-probe approach is done conditional on the absence of an
"mdio" child node of the MAC.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Jeffery [Wed, 31 Jul 2019 05:39:57 +0000 (15:09 +0930)]
net: phy: Add mdio-aspeed
The AST2600 design separates the MDIO controllers from the MAC, which is
where they were placed in the AST2400 and AST2500. Further, the register
interface is reworked again, so now we have three possible different
interface implementations, however this driver only supports the
interface provided by the AST2600. The AST2400 and AST2500 will continue
to be supported by the MDIO support embedded in the FTGMAC100 driver.
The hardware supports both C22 and C45 mode, but for the moment only C22
support is implemented.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Jeffery [Wed, 31 Jul 2019 05:39:56 +0000 (15:09 +0930)]
dt-bindings: net: Add aspeed, ast2600-mdio binding
The AST2600 splits out the MDIO bus controller from the MAC into its own
IP block and rearranges the register layout. Add a new binding to
describe the new hardware.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Tue, 30 Jul 2019 14:23:18 +0000 (16:23 +0200)]
tipc: reduce risk of wakeup queue starvation
In commit
365ad353c256 ("tipc: reduce risk of user starvation during
link congestion") we allowed senders to add exactly one list of extra
buffers to the link backlog queues during link congestion (aka
"oversubscription"). However, the criteria for when to stop adding
wakeup messages to the input queue when the overload abates is
inaccurate, and may cause starvation problems during very high load.
Currently, we stop adding wakeup messages after 10 total failed attempts
where we find that there is no space left in the backlog queue for a
certain importance level. The counter for this is accumulated across all
levels, which may lead the algorithm to leave the loop prematurely,
although there may still be plenty of space available at some levels.
The result is sometimes that messages near the wakeup queue tail are not
added to the input queue as they should be.
We now introduce a more exact algorithm, where we keep adding wakeup
messages to a level as long as the backlog queue has free slots for
the corresponding level, and stop at the moment there are no more such
slots or when there are no more wakeup messages to dequeue.
Fixes: 365ad35 ("tipc: reduce risk of user starvation during link congestion")
Reported-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Mon, 8 Jul 2019 23:12:28 +0000 (16:12 -0700)]
fm10k: reduce scope of the ring variable
Reduce the scope of the ring local variable in the fm10k_assign_l2_accel
function.
This was detected by cppcheck and resolves the following warning
produced by that tool:
[fm10k_netdev.c:1447]: (style) The scope of the variable 'ring' can be
reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Mon, 8 Jul 2019 23:12:27 +0000 (16:12 -0700)]
fm10k: reduce the scope of the result local variable
Reduce the scope of the result local variable in the
fm10k_iov_msg_lport_state_pf function.
This was detected by cppcheck and resolves the following warning
produced by that tool:
[fm10k_pf.c:1435]: (style) The scope of the variable 'result' can be
reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Mon, 8 Jul 2019 23:12:26 +0000 (16:12 -0700)]
fm10k: reduce the scope of the local msg variable
The msg variable in the fm10k_mbx_validate_msg_size and
fm10k_sm_mbx_transmit functions is only used within the do {} loop
scope. Reduce its scope only to where it is used.
This was detected by cppcheck, and resolves the following warnings
produced by that tool:
[fm10k_mbx.c:299]: (style) The scope of the variable 'msg' can be reduced.
[fm10k_mbx.c:2004]: (style) The scope of the variable 'msg' can be reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Mon, 8 Jul 2019 23:12:25 +0000 (16:12 -0700)]
fm10k: reduce the scope of the local i variable
Reduce the scope of the local loop variable in the
fm10k_check_hang_subtask function.
This was detected by cppcheck and resolves the following warning
produced by that tool:
[driver/fm10k_pci.c:852]: (style) The scope of the variable 'i' can be
reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Mon, 8 Jul 2019 23:12:24 +0000 (16:12 -0700)]
fm10k: reduce the scope of the err variable
Reduce the scope of the local variable err in the fm10k_detach_subtask
function.
This was detected by cppcheck and resolves the following warning
produced by that tool:
[fm10k_pci.c:403]: (style) The scope of the variable 'err' can be reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Mon, 8 Jul 2019 23:12:23 +0000 (16:12 -0700)]
fm10k: reduce the scope of the tx_buffer variable
The tx_buffer local variable in the function fm10k_clean_tx_ring is not
used except inside a smaller block scope. Reduce the scope to its point
of use.
This was detected by cppcheck and resolves the following style warning
produced by that tool:
[fm10k_netdev.c:179]: (style) The scope of the variable 'tx_buffer' can
be reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Mon, 8 Jul 2019 23:12:22 +0000 (16:12 -0700)]
fm10k: reduce the scope of the q_idx local variable
Reduce the scope of the q_idx local variable in the fm10k_cache_ring_qos
function.
This was detected by cppcheck and resolves the following style warning
produced by that tool:
[fm10k_main.c:2016]: (style) The scope of the variable 'q_idx' can be
reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Mon, 8 Jul 2019 23:12:21 +0000 (16:12 -0700)]
fm10k: reduce the scope of local err variable
Reduce the scope of the local err variable in the fm10k_iov_alloc_data
function.
This was detected by cppcheck and resolves the following style warning
produced by that tool:
[fm10k_iov.c:426]: (style) The scope of the variable 'err' can be reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Mon, 8 Jul 2019 23:12:20 +0000 (16:12 -0700)]
fm10k: reduce the scope of qv local variable
Reduce the scope of the qv vector pointer local variable in the
fm10k_set_coalesce function.
This was detected by cppcheck and resolves the following style warning
produced by that tool:
[fm10k_ethtool.c:658]: (style) The scope of the variable 'qv' can be
reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Mon, 8 Jul 2019 23:12:19 +0000 (16:12 -0700)]
fm10k: reduce scope of *p local variable
Reduce the scope of the char *p local variable to only the block where
it is used.
This was detected by cppcheck and resolves the following style warning
produced by that tool:
[fm10k_ethtool.c:229]: (style) The scope of the variable 'p' can be
reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Mon, 8 Jul 2019 23:12:18 +0000 (16:12 -0700)]
fm10k: reduce scope of the err variable
Reduce the scope of the err local variable in the fm10k_dcbnl_ieee_setets
function.
This was detected using cppcheck, and resolves the following style
warning:
[fm10k_dcbnl.c:37]: (style) The scope of the variable 'err' can be reduced.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Thu, 1 Aug 2019 20:43:09 +0000 (16:43 -0400)]
Merge branch 'net-dsa-mv88e6xxx-avoid-some-redundant-VTU-operations'
Vivien Didelot says:
====================
net: dsa: mv88e6xxx: avoid some redundant VTU operations
The mv88e6xxx driver currently uses a mv88e6xxx_vtu_get wrapper to get a
single entry and uses a boolean to eventually initialize a fresh one.
However the fresh entry is only needed in one place and mv88e6xxx_vtu_getnext
is simple enough to call it directly. Doing so makes the code easier to read,
especially for the return code expected by switchdev to honor software VLANs.
In addition to not loading the VTU again when an entry is already correctly
programmed, this also allows to avoid programming the broadcast entries
again when updating a port's membership, from e.g. tagged to untagged.
This patch series removes the mv88e6xxx_vtu_get wrapper in favor of direct
calls to mv88e6xxx_vtu_getnext, and also renames the _mv88e6xxx_port_vlan_add
and _mv88e6xxx_port_vlan_del helpers using an old underscore prefix convention.
In case the port's membership is already correctly programmed in hardware,
the following debug message may be printed:
[ 745.989884] mv88e6085
2188000.ethernet-1:00: p4: already a member of VLAN 42
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 1 Aug 2019 18:36:37 +0000 (14:36 -0400)]
net: dsa: mv88e6xxx: call vtu_getnext directly in vlan_add
Wrapping mv88e6xxx_vtu_getnext makes the code less easy to read and
_mv88e6xxx_port_vlan_add is the only function requiring the preparation
of a new VLAN entry.
To simplify things up, remove the mv88e6xxx_vtu_get wrapper and
explicit the VLAN lookup in _mv88e6xxx_port_vlan_add. This rework
also avoids programming the broadcast entries again when changing a
port's membership, e.g. from tagged to untagged.
At the same time, rename the helper using an old underscore convention.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 1 Aug 2019 18:36:36 +0000 (14:36 -0400)]
net: dsa: mv88e6xxx: call vtu_getnext directly in vlan_del
Wrapping mv88e6xxx_vtu_getnext makes the code less easy to read.
Explicit the call to mv88e6xxx_vtu_getnext in _mv88e6xxx_port_vlan_del
and the return value expected by switchdev in case of software VLANs.
At the same time, rename the helper using an old underscore convention.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 1 Aug 2019 18:36:35 +0000 (14:36 -0400)]
net: dsa: mv88e6xxx: call vtu_getnext directly in db load/purge
mv88e6xxx_vtu_getnext is simple enough to call it directly in the
mv88e6xxx_port_db_load_purge function and explicit the return code
expected by switchdev for software VLANs when an hardware VLAN does
not exist.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 1 Aug 2019 18:36:34 +0000 (14:36 -0400)]
net: dsa: mv88e6xxx: explicit entry passed to vtu_getnext
mv88e6xxx_vtu_getnext interprets two members from the input
mv88e6xxx_vtu_entry structure: the (excluded) vid member to start
the iteration from, and the valid argument specifying whether the VID
must be written or not (only required once at the start of a loop).
Explicit the assignation of these two fields right before calling
mv88e6xxx_vtu_getnext, as it is done in the mv88e6xxx_vtu_get wrapper.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 1 Aug 2019 18:36:33 +0000 (14:36 -0400)]
net: dsa: mv88e6xxx: lock mutex in vlan_prepare
Lock the mutex in the mv88e6xxx_port_vlan_prepare function
called by the DSA stack, instead of doing it in the internal
mv88e6xxx_port_check_hw_vlan helper.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tonghao Zhang [Thu, 1 Aug 2019 08:40:59 +0000 (16:40 +0800)]
net/mlx5e: Allow dropping specific tunnel packets
In some case, we don't want to allow specific tunnel packets
to host that can avoid to take up high CPU (e.g network attacks).
But other tunnel packets which not matched in hardware will be
sent to host too.
$ tc filter add dev vxlan_sys_4789 \
protocol ip chain 0 parent ffff: prio 1 handle 1 \
flower dst_ip 1.1.1.100 ip_proto tcp dst_port 80 \
enc_dst_ip 2.2.2.100 enc_key_id 100 enc_dst_port 4789 \
action tunnel_key unset pipe action drop
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Aya Levin [Mon, 24 Jun 2019 17:33:52 +0000 (20:33 +0300)]
net/mlx5e: TX reporter cleanup
Remove redundant include files.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Aya Levin [Mon, 24 Jun 2019 16:34:42 +0000 (19:34 +0300)]
net/mlx5e: Set tx reporter only on successful creation
When failing to create tx reporter, don't set the reporter's pointer.
Creating a reporter is not mandatory for driver load, avoid
garbage/error pointer.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Aya Levin [Wed, 3 Jul 2019 06:16:52 +0000 (09:16 +0300)]
net/mlx5e: Fix mlx5e_tx_reporter_create return value
Return error when failing to create a reporter in devlink. Since
NET_DEVLINK mandatory to MLX5_CORE in Kconfig, returned pointer
can't be NULL and can only hold an error in bad path.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Saeed Mahameed [Fri, 3 May 2019 22:12:46 +0000 (15:12 -0700)]
net/mlx5e: Rx, checksum handling refactoring
Move vlan checksum fixup flow into mlx5e_skb_padding_csum(), which is
supposed to fixup SKB checksum if needed. And rename
mlx5e_skb_padding_csum() to mlx5e_skb_csum_fixup().
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Mon, 1 Jul 2019 09:08:08 +0000 (12:08 +0300)]
net/mlx5e: Tx, Soften inline mode VLAN dependencies
If capable, use zero inline mode in TX WQE for non-VLAN packets.
For VLAN ones, keep the enforcement of at least L2 inline mode,
unless the WQE VLAN insertion offload cap is on.
Performance:
Tested single core packet rate of 64Bytes.
NIC: ConnectX-5
CPU: Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
pktgen:
Before: 12.46 Mpps
After: 14.65 Mpps (+17.5%)
XDP_TX:
The MPWQE flow is not affected, as it already has this optimization.
So we test with priv-flag xdp_tx_mpwqe: off.
Before: 9.90 Mpps
After: 10.20 Mpps (+3%)
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Tested-by: Noam Stolero <noams@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Sun, 14 Jul 2019 14:50:51 +0000 (17:50 +0300)]
net/mlx5e: XDP, Slight enhancement for WQE fetch function
Instead of passing an output param, let function return the
WQE pointer.
In addition, pass &pi so it gets its value in the function,
and save the redundant assignment that comes after it.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Shay Agroskin [Sun, 12 May 2019 15:28:27 +0000 (18:28 +0300)]
net/mlx5e: XDP, Close TX MPWQE session when no room for inline packet left
In MPWQE mode, when transmitting packets with XDP, a packet that is smaller
than a certain size (set to 256 bytes) would be sent inline within its WQE
TX descriptor (mem-copied), in case the hardware tx queue is congested
beyond a pre-defined water-mark.
If a MPWQE cannot contain an additional inline packet, we close this
MPWQE session, and send the packet inlined within the next MPWQE.
To save some MPWQE session close+open operations, we don't open MPWQE
sessions that are contiguously smaller than certain size (set to the
HW MPWQE maximum size). If there isn't enough contiguous room in the
send queue, we fill it with NOPs and wrap the send queue index around.
This way, qualified packets are always sent inline.
Perf tests:
Tested packet rate for UDP 64Byte multi-stream
over two dual port ConnectX-5 100Gbps NICs.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
XDP_TX:
With 24 channels:
| ------ | bounced packets | inlined packets | inline ratio |
| before | 113.6Mpps | 96.3Mpps | 84% |
| after | 115Mpps | 99.5Mpps | 86% |
With one channel:
| ------ | bounced packets | inlined packets | inline ratio |
| before | 6.7Mpps | 0pps | 0% |
| after | 6.8Mpps | 0pps | 0% |
As we can see, there is improvement in both inline ratio and overall
packet rate for 24 channels. Also, we see no degradation for the
one-channel case.
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Thu, 11 Jul 2019 08:20:22 +0000 (11:20 +0300)]
net/mlx5e: Tx, Strict the room needed for SQ edge NOPs
We use NOPs to populate the WQ fragment edge if the WQE does not fit
in frag, to avoid WQEs crossing a page boundary (or wrap-around the WQ).
The upper bound on the needed number of NOPs is one WQEBB less than
the largest possible WQE, for otherwise the WQE would certainly fit.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Gavi Teitz [Thu, 27 Jun 2019 17:53:03 +0000 (20:53 +0300)]
net/mlx5: Add flow counter pool
Add a pool of flow counters, based on flow counter bulks, removing the
need to allocate a new counter via a costly FW command during the flow
creation process. The time it takes to acquire/release a flow counter
is cut from ~50 [us] to ~50 [ns].
The pool is part of the mlx5 driver instance, and provides flow
counters for aging flows. mlx5_fc_create() was modified to provide
counters for aging flows from the pool by default, and
mlx5_destroy_fc() was modified to release counters back to the pool
for later reuse. If bulk allocation is not supported or fails, and for
non-aging flows, the fallback behavior is to allocate and free
individual counters.
The pool is comprised of three lists of flow counter bulks, one of
fully used bulks, one of partially used bulks, and one of unused
bulks. Counters are provided from the partially used bulks first, to
help limit bulk fragmentation.
The pool maintains a threshold, and strives to maintain the amount of
available counters below it. The pool is increased in size when a
counter acquisition request is made and there are no available
counters, and it is decreased in size when the last counter in a bulk
is released and there are more available counters than the threshold.
All pool size changes are done in the context of the
acquiring/releasing process.
The value of the threshold is directly correlated to the amount of
used counters the pool is providing, while constrained by a hard
maximum, and is recalculated every time a bulk is allocated/freed.
This ensures that the pool only consumes large amounts of memory for
available counters if the pool is being used heavily. When fully
populated and at the hard maximum, the buffer of available counters
consumes ~40 [MB].
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Gavi Teitz [Thu, 27 Jun 2019 10:58:56 +0000 (13:58 +0300)]
net/mlx5: Add flow counter bulk infrastructure
Add infrastructure to track bulks of flow counters, providing
the means to allocate and deallocate bulks, and to acquire and
release individual counters from the bulks.
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eli Cohen [Wed, 8 May 2019 08:44:56 +0000 (11:44 +0300)]
net/mlx5: E-Switch, add ingress rate support
Use the scheduling elements to implement ingress rate limiter on an
eswitch ports ingress traffic. Since the ingress of eswitch port is the
egress of VF port, we control eswitch ingress by controlling VF egress.
Configuration is done using the ports' representor net devices.
Please note that burst size configuration is not supported by devices
ConnectX-5 and earlier generations.
Configuration examples:
tc:
tc filter add dev enp59s0f0_0 root protocol ip matchall action police rate 1mbit burst 20k
ovs:
ovs-vsctl set interface eth0 ingress_policing_rate=1000
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Saeed Mahameed [Thu, 1 Aug 2019 18:21:20 +0000 (11:21 -0700)]
Merge branch 'mlx5-next' of git://git./linux/kernel/git/mellanox/linux
Misc updates from mlx5-next branch.
1) Eli improves the handling of the support for QoS element type
2) Gavi refactors and prepares mlx5 flow counters for bulk allocation
support
3) Parav, refactors and improves E-Switch load/unload flows
4) Saeed, two misc cleanups
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Parav Pandit [Mon, 29 Jul 2019 21:13:12 +0000 (21:13 +0000)]
net/mlx5: E-switch, Tide up eswitch config sequence
Currently for PF and ECPF vports, representors are created before
their eswitch hardware ports are initialized in below flow.
mlx5_eswitch_enable()
esw_offloads_init()
esw_offloads_load_all_reps()
[..]
esw_enable_vport()
However for VFs, vports are initialized before creating their
respective netdev represnetors in event handling context.
Similarly while disabling eswitch, first hardware vports are disabled,
followed by destroying their representors.
Here while underlying vports gets destroyed but its respective user
facing netdevice can still exist on which user can continue to perform
more offload operations.
Instead, its more accurate to do
enable_eswitch switchdev mode:
1. perform FDB tables initialization
2. initialize hw vport
3. create and publish representor for this vport
disable_eswitch switchdev mode:
1. destroy user facing representor for the vport
2. disable hw vport
3. perform FDB tables cleanup
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Parav Pandit [Mon, 29 Jul 2019 21:13:10 +0000 (21:13 +0000)]
net/mlx5: E-Switch, Remove redundant mc_promisc NULL check
mc_promisc pointer points to an instance of struct esw_mc_addr allocated
as part of the esw structure.
Hence it cannot be NULL.
Removed such redundant check and assign where it is actually used.
While at it, add comment around legacy mode fields and move mc_promisc
close to other legacy mode structures to improve code redability.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Saeed Mahameed [Mon, 29 Jul 2019 21:13:08 +0000 (21:13 +0000)]
net/mlx5: E-Switch, remove redundant error handling
We don't need to handle error flow of esw_create_legacy_table() in the
same branch, it is already being handled directly after the if statement,
for both legacy and switchdev modes in one place.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Parav Pandit [Mon, 29 Jul 2019 21:13:06 +0000 (21:13 +0000)]
net/mlx5: E-switch, Introduce helper function to enable/disable vports
vports needs to be enabled in switchdev and legacy mode.
In switchdev mode, vports should be enabled after initializing
the FDB tables and before creating their represntors so that
representor works on an initialized vport object.
Prepare a helper function which can be called when enabling either of
the eswitch modes.
Similarly, have disable vports helper function.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Parav Pandit [Mon, 29 Jul 2019 21:13:04 +0000 (21:13 +0000)]
net/mlx5: E-switch, Initialize TSAR Qos hardware block before its user vports
First enable TSAR Qos hardware block in device before enabling its
user vports.
This refactor is needed so that vports can be enabled before their
representor netdevice can be created.
While at it, esw_create_tsar() returns error code which was used only to
print error. However esw_create_tsar() already prints warning if it hits
an error.
Hence, remove the redundant warning.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Parav Pandit [Mon, 29 Jul 2019 21:13:02 +0000 (21:13 +0000)]
net/mlx5: E-switch, Combine metadata enable/disable functionality
Except bit toggling code, rest of the code is same to enable/disable
metadata passing functionality.
Hence, combine them to single function and control using enable flag.
Also instead of checking metadata supported at multiple places,
fold into the helper function.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eli Cohen [Mon, 29 Jul 2019 21:13:00 +0000 (21:13 +0000)]
net/mlx5: E-Switch, Verify support QoS element type
Check if firmware supports the requested element type before
attempting to create the element type.
In addition, explicitly specify the request element type and tsar type.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Parav Pandit [Mon, 29 Jul 2019 21:12:58 +0000 (21:12 +0000)]
net/mlx5: Make load_one() and unload_one() symmetric
Currently mlx5_load_one() perform device registration using
mlx5_register_device(). But mlx5_unload_one() doesn't unregister.
Make them symmetric by doing device unregistration in
mlx5_unload_one().
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Saeed Mahameed [Mon, 29 Jul 2019 21:12:56 +0000 (21:12 +0000)]
net/mlx5: Fix offset of tisc bits reserved field
First reserved field is off by one instead of reserved_at_1 it should be
reserved_at_2, fix that.
Fixes: a12ff35e0fb7 ("net/mlx5: Introduce TLS TX offload hardware bits and structures")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Gavi Teitz [Mon, 29 Jul 2019 21:12:54 +0000 (21:12 +0000)]
net/mlx5: Add flow counter bulk allocation hardware bits and command
Add a handle to invoke the new FW capability of allocating a bulk of
flow counters.
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Gavi Teitz [Mon, 29 Jul 2019 21:12:52 +0000 (21:12 +0000)]
net/mlx5: Refactor and optimize flow counter bulk query
Towards introducing the ability to allocate bulks of flow counters,
refactor the flow counter bulk query process, removing functions and
structs whose names indicated being used for flow counter bulk
allocation FW commands, despite them actually only being used to
support bulk querying, and migrate their functionality to correctly
named functions in their natural location, fs_counters.c.
Additionally, optimize the bulk query process by:
* Extracting the memory used for the query to mlx5_fc_stats so
that it is only allocated once, and not for each bulk query.
* Querying all the counters in one function call.
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
David S. Miller [Thu, 1 Aug 2019 17:41:43 +0000 (13:41 -0400)]
Merge branch '100GbE' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
100GbE Intel Wired LAN Driver Updates 2019-07-31
This series contains updates to ice driver only.
Paul adds support for reporting what the link partner is advertising for
flow control settings.
Jake fixes the hardware statistics register which is prone to rollover
since the statistic registers are either 32 or 40 bits wide, depending
on which register is being read. So use a 64 bit software statistic to
store off the hardware statistics to track past when it rolls over.
Fixes an issue with the locking of the control queue, where locks were
being destroyed at run time.
Tony fixes an issue that was created when interrupt tracking was
refactored and the call to ice_vsi_setup_vector_base() was removed from
the PF VSI instead of the VF VSI. Adds a check before trying to
configure a port to ensure that media is attached.
Brett fixes an issue in the receive queue configuration where prefena
(Prefetch Enable) was being set to 0 which caused the hardware to only
fetch descriptors when there are none free in the cache for a received
packet. Updates the driver to only bump the receive tail once per
napi_poll call, instead of the current model of bumping the tail up to 4
times per napi_poll call. Adds statistics for receive drops at the port
level to ethtool/netlink. Cleans up duplicate code in the allocation of
receive buffer code.
Akeem updates the driver to ensure that VFs stay disabled until the
setup or reset is completed. Modifies the driver to use the allocated
number of transmit queues per VSI to set up the scheduling tree versus
using the total number of available transmit queues. Also fix the
driver to update the total number of configured queues, after a
successful VF request to change its number of queues before updating the
corresponding VSI for that VF. Cleaned up unnecessary flags that are no
longer needed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 1 Aug 2019 17:32:13 +0000 (13:32 -0400)]
Merge branch 'net-hns3-some-code-optimizations-bugfixes-features'
Huazhong Tan says:
====================
net: hns3: some code optimizations & bugfixes & features
This patch-set includes code optimizations, bugfixes and features for
the HNS3 ethernet controller driver.
[patch 01/12] adds support for reporting link change event.
[patch 02/12] adds handler for NCSI error.
[patch 03/12] fixes bug related to debugfs.
[patch 04/12] adds a code optimization for setting ring parameters.
[patch 05/12 - 09/12] adds some cleanups.
[patch 10/12 - 12/12] adds some patches related to reset issue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Thu, 1 Aug 2019 03:55:45 +0000 (11:55 +0800)]
net: hns3: activate reset timer when calling reset_event
When calling hclge_reset_event() within HCLGE_RESET_INTERVAL,
it returns directly now. If no one call it again, then the
error which needs a reset to fix it can not be fixed.
So this patch activates the reset timer for this case, and
adds checking in the end of the reset procedure to make this
error fixed earlier.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Reviewed-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Thu, 1 Aug 2019 03:55:44 +0000 (11:55 +0800)]
net: hns3: clear reset interrupt status in hclge_irq_handle()
Currently, the reset interrupt is cleared in the reset task, which
is too late. Since, when the hardware finish the previous reset,
it can begin to do a new global/IMP reset, if this new coming reset
type is same as the previous one, the driver will clear them together,
then driver can not get that there is another reset, but the hardware
still wait for the driver to deal with the second one.
So this patch clears PF's reset interrupt status in the
hclge_irq_handle(), the hardware waits for handshaking from
driver before doing reset, so the driver and hardware deal with reset
one by one.
BTW, when VF doing global/IMP reset, it reads PF's reset interrupt
register to get that whether PF driver's re-initialization is done,
since VF's re-initialization should be done after PF's. So we add
a new command and a register bit to do that. When VF receive reset
interrupt, it sets up this bit, and PF finishes re-initialization
send command to clear this bit, then VF do re-initialization.
Fixes: 4ed340ab8f49 ("net: hns3: Add reset process in hclge_main")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Thu, 1 Aug 2019 03:55:43 +0000 (11:55 +0800)]
net: hns3: fix some reset handshake issue
Currently, the driver sets handshake status to tell the hardware
that the driver have downed the netdev and it can continue with
reset process. The driver will clear the handshake status when
re-initializing the CMDQ, and does not recover this status
when reset fail, which may cause the hardware to wait for
the handshake status to be set and not being able to continue
with reset process.
So this patch delays clearing handshake status just before UP,
and recovers this status when reset fail.
BTW, this patch adds a new function hclge(vf)_reset_handshake() to
deal with the reset handshake issue, and renames
HCLGE(VF)_NIC_CMQ_ENABLE to HCLGE(VF)_NIC_SW_RST_RDY which
represents this register bit more accurately.
Fixes: ada13ee3db7b ("net: hns3: add handshake with hardware while doing reset")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Reviewed-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guojia Liao [Thu, 1 Aug 2019 03:55:42 +0000 (11:55 +0800)]
net: hns3: rename a member in struct hclge_mac_ethertype_idx_rd_cmd
The member 'mac_add' defined in hclge_mac_ethertype_idx_rd_cmd
means MAC address, so 'mac_addr' is a better name for it.
Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Weihang Li [Thu, 1 Aug 2019 03:55:41 +0000 (11:55 +0800)]
net: hns3: simplify hclge_cmd_query_error()
The 4th and 5th parameter of hclge_cmd_query_error is useless, so this
patch removes them.
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Reviewed-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Thu, 1 Aug 2019 03:55:40 +0000 (11:55 +0800)]
net: hns3: minior error handling change for hclge_tm_schd_info_init
When hclge_tm_schd_info_update calls hclge_tm_schd_info_init to
initialize the schedule info, hdev->tm_info.num_pg and
hdev->tx_sch_mode is not changed, which makes the checking in
hclge_tm_schd_info_init unnecessary.
So this patch moves the hdev->tm_info.num_pg and hdev->tx_sch_mode
checking into hclge_tm_schd_init and changes the return type of
hclge_tm_schd_info_init from int to void.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Thu, 1 Aug 2019 03:55:39 +0000 (11:55 +0800)]
net: hns3: minor cleanup in hns3_clean_rx_ring
The unused_count variable is used to indicate how many
RX BD need attaching new buffer in hns3_clean_rx_ring,
and the clean_count variable has the similar meaning.
This patch removes the clean_count variable and use
unused_count to uniformly indicate the RX BD that need
attaching new buffer.
This patch also clean up some coding style related to
variable assignment in hns3_clean_rx_ring.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>