David S. Miller [Wed, 18 Mar 2020 23:46:20 +0000 (16:46 -0700)]
Merge branch 'mlxsw-spectrum_cnt-Expose-counter-resources'
Ido Schimmel says:
====================
mlxsw: spectrum_cnt: Expose counter resources
Jiri says:
The capacity and utilization of the existing flow and RIF counters are
currently not visible to the user. Use the existing devlink resources
API to expose the information:
$ sudo devlink resource show pci/0000:00:10.0 -v
pci/0000:00:10.0:
  name kvd resource_path /kvd size 524288 unit entry dpipe_tables none
  name span_agents resource_path /span_agents size 8 occ 0 unit entry dpipe_tables none
  name counters resource_path /counters size 79872 occ 44 unit entry dpipe_tables none
    resources:
      name flow resource_path /counters/flow size 61440 occ 4 unit entry dpipe_tables none
      name rif resource_path /counters/rif size 18432 occ 40 unit entry dpipe_tables none
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
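A minimal sketch, not part of the series, of how the devlink output in the cover letter above could be post-processed. It assumes the flat "key value" layout shown there; the helper name parse_resources() is hypothetical. It checks that the flow and rif sub-pools add up to the parent counters pool.

```python
# Illustrative only: parse the key/value pairs printed by
# `devlink resource show ... -v` in the cover letter above.
SAMPLE = """\
name kvd resource_path /kvd size 524288 unit entry dpipe_tables none
name span_agents resource_path /span_agents size 8 occ 0 unit entry dpipe_tables none
name counters resource_path /counters size 79872 occ 44 unit entry dpipe_tables none
name flow resource_path /counters/flow size 61440 occ 4 unit entry dpipe_tables none
name rif resource_path /counters/rif size 18432 occ 40 unit entry dpipe_tables none
"""


def parse_resources(text):
    """Return {resource_path: {"size": int, "occ": int or None}}."""
    resources = {}
    for line in text.splitlines():
        tokens = line.split()
        # Tokens alternate key, value: name kvd resource_path /kvd size ...
        fields = dict(zip(tokens[0::2], tokens[1::2]))
        if "resource_path" not in fields:
            continue
        occ = fields.get("occ")
        resources[fields["resource_path"]] = {
            "size": int(fields["size"]),
            "occ": int(occ) if occ is not None else None,
        }
    return resources


res = parse_resources(SAMPLE)
# Sub-pools are the entries nested under /counters.
children = [v for k, v in res.items() if k.startswith("/counters/")]
```

With the sample above, both the sizes and the occupancies of the two sub-pools sum to the values reported for /counters.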
Jiri Pirko [Wed, 18 Mar 2020 13:48:57 +0000 (15:48 +0200)]
selftests: mlxsw: Add tc action hw_stats tests
Add tests for mlxsw hw_stats types.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 18 Mar 2020 13:48:56 +0000 (15:48 +0200)]
mlxsw: spectrum_cnt: Expose devlink resource occupancy for counters
Implement occupancy counting for counters and expose over devlink
resource API.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 18 Mar 2020 13:48:55 +0000 (15:48 +0200)]
mlxsw: spectrum_cnt: Consolidate subpools initialization
Put all init operations related to subpools into
mlxsw_sp_counter_sub_pools_init().
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 18 Mar 2020 13:48:54 +0000 (15:48 +0200)]
mlxsw: spectrum_cnt: Move config validation along with resource register
Move the validation of the subpool configuration, which guards against
possible over-commitment, into resource registration. Add a WARN_ON to
flag a bug in the code.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 18 Mar 2020 13:48:53 +0000 (15:48 +0200)]
mlxsw: spectrum_cnt: Expose subpool sizes over devlink resources
Implement devlink resources support for counter pools. Move the subpool
sizes calculations into the new resources register function.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 18 Mar 2020 13:48:52 +0000 (15:48 +0200)]
mlxsw: spectrum_cnt: Add entry_size_res_id for each subpool and use it to query entry size
Add new field to subpool struct that would indicate which
resource id should be used to query the entry size for
the subpool from the device.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 18 Mar 2020 13:48:51 +0000 (15:48 +0200)]
mlxsw: spectrum_cnt: Move sub_pools under per-instance pool struct
Currently, the global static array of subpools is used. Make it
per-instance as multiple instances of the mlxsw driver can have
different values.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 18 Mar 2020 13:48:50 +0000 (15:48 +0200)]
selftests: spectrum-2: Adjust tc_flower_scale limit according to current counter count
With the change that made the code query the counter bank size from the device
instead of using a hard-coded value, the number of available counters
changed for Spectrum-2. Adjust the limit in the selftests.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 18 Mar 2020 13:48:49 +0000 (15:48 +0200)]
mlxsw: spectrum_cnt: Query bank size from FW resources
The bank size is different between Spectrum versions. Also it is
a resource that can be queried. So instead of hard coding the value in
code, query it from the firmware.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Lakkireddy [Wed, 18 Mar 2020 10:54:51 +0000 (16:24 +0530)]
cxgb4: rework TC filter rule insertion across regions
Chelsio NICs have 3 filter regions, in the following order of priority:
1. High Priority (HPFILTER) region (Highest Priority).
2. HASH region.
3. Normal FILTER region (Lowest Priority).
Currently, there's a 1-to-1 mapping between the prio value passed
by TC and the filter region index. However, it's possible to have
multiple TC rules with the same prio value. In this case, if a region
is exhausted, no attempt is made to try inserting the rule in the
next available region.
So, rework and remove the 1-to-1 mapping. Instead, dynamically select
the region to insert the filter rule, as long as the new rule's prio
value doesn't conflict with existing rules across all the 3 regions.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
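A rough model of the region selection described in the commit above, written from the commit message alone. The function name pick_region(), the data layout, and the exact conflict rule (a smaller TC prio value may not sit below a larger one in hardware priority order) are assumptions; the driver may apply further constraints that this sketch ignores.

```python
REGIONS = ["HPFILTER", "HASH", "FILTER"]  # highest hardware priority first


def pick_region(prio, used, capacity):
    """Pick a region for a new rule.

    used:     {region: [prio of each installed rule]}
    capacity: {region: maximum number of rules}
    Returns the region name, or None if no region fits.
    """
    for i, region in enumerate(REGIONS):
        if len(used[region]) >= capacity[region]:
            continue  # region exhausted, try the next one
        higher = [p for r in REGIONS[:i] for p in used[r]]
        lower = [p for r in REGIONS[i + 1:] for p in used[r]]
        # Assumed rule: everything in higher-priority regions must have a
        # prio value <= the new rule, everything below a value >= it.
        if all(p <= prio for p in higher) and all(p >= prio for p in lower):
            return region
    return None
```

The point of the sketch is the fallback: when the preferred region is full, the next region is tried instead of failing the insertion.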
Daniel Borkmann [Wed, 18 Mar 2020 09:33:22 +0000 (10:33 +0100)]
netfilter: revert introduction of egress hook
This reverts the following commits:
8537f78647c0 ("netfilter: Introduce egress hook")
5418d3881e1f ("netfilter: Generalize ingress hook")
b030f194aed2 ("netfilter: Rename ingress hook include file")
From the discussion in [0], the author's main motivation to add a hook
in fast path is for an out of tree kernel module, which is a red flag
to begin with. Other mentioned potential use cases like NAT{64,46}
are future extensions w/o concrete code in the tree yet. Revert as
suggested [1] given the weak justification to add more hooks to critical
fast-path.
[0] https://lore.kernel.org/netdev/cover.1583927267.git.lukas@wunner.de/
[1] https://lore.kernel.org/netdev/20200318.011152.72770718915606186.davem@davemloft.net/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Miller <davem@davemloft.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Nacked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 Mar 2020 23:33:36 +0000 (16:33 -0700)]
Merge branch 's390-qeth-next'
Julian Wiedmann says:
====================
s390/qeth: updates 2020-03-18
please apply the following patch series for qeth to netdev's net-next
tree.
This consists of three parts:
1) support for __GFP_MEMALLOC,
2) several ethtool enhancements (.set_channels, SW Timestamping),
3) the usual cleanups.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Wed, 18 Mar 2020 12:54:55 +0000 (13:54 +0100)]
s390/qeth: use dev->reg_state
To check whether a netdevice has already been registered, look at
NETREG_REGISTERED to replace some hacks I added a while ago.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Wed, 18 Mar 2020 12:54:54 +0000 (13:54 +0100)]
s390/qeth: remove gratuitous NULL checks
qeth_do_ioctl() is only reached through our own net_device_ops, so we
can trust that dev->ml_priv still contains what we put there earlier.
qeth_bridgeport_an_set() is an internal function that doesn't require
such sanity checks.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Wed, 18 Mar 2020 12:54:53 +0000 (13:54 +0100)]
s390/qeth: add phys_to_virt() translation for AOB
Data addresses in the AOB are absolute, and need to be translated before
being fed into kmem_cache_free(). Currently this phys_to_virt() is a no-op.
Also see commit 2db01da8d25f ("s390/qdio: fill SBALEs with absolute addresses").
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Wed, 18 Mar 2020 12:54:52 +0000 (13:54 +0100)]
s390/qeth: don't report hard-coded driver version
Versions are meaningless for an in-kernel driver.
Instead use the UTS_RELEASE that is set by ethtool_get_drvinfo().
Cc: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Wed, 18 Mar 2020 12:54:51 +0000 (13:54 +0100)]
s390/qeth: add SW timestamping support for IQD devices
This adds support for SOF_TIMESTAMPING_TX_SOFTWARE.
No support for non-IQD devices, since they orphan the skb in their xmit
path.
To play nice with TX bulking, set the timestamp when the buffer that
contains the skb(s) is actually flushed out to HW.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Wed, 18 Mar 2020 12:54:50 +0000 (13:54 +0100)]
s390/qeth: balance the TX queue selection for IQD devices
For ucast traffic, qeth_iqd_select_queue() falls back to
netdev_pick_tx(). This will potentially use skb_tx_hash() to distribute
the flow over all active TX queues - so txq 0 is a valid selection, and
qeth_iqd_select_queue() needs to check for this and put it on some other
queue. As a result, the distribution for ucast flows is unbalanced and
hits QETH_IQD_MIN_UCAST_TXQ heavier than the other queues.
Open-coding a custom variant of skb_tx_hash() isn't an option, since
netdev_pick_tx() also gives us eg. access to XPS. But we can pull a
little trick: add a single TC class that excludes the mcast txq, and
thus encourage skb_tx_hash() to not pick the mcast txq.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
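The imbalance described in the commit above can be illustrated with a toy model. This is a sketch under assumptions: a plain modulo stands in for skb_tx_hash(), and the value 1 for QETH_IQD_MIN_UCAST_TXQ is assumed for illustration.

```python
from collections import Counter

MCAST_TXQ = 0
MIN_UCAST_TXQ = 1  # stands in for QETH_IQD_MIN_UCAST_TXQ; value assumed


def old_select(flow_hash, num_txq):
    """Hash over all queues, then move hits on the mcast queue elsewhere."""
    txq = flow_hash % num_txq
    return MIN_UCAST_TXQ if txq == MCAST_TXQ else txq


def new_select(flow_hash, num_txq):
    """Hash only over the ucast queues, as a TC class excluding txq 0 would."""
    return MIN_UCAST_TXQ + flow_hash % (num_txq - 1)


old = Counter(old_select(h, 4) for h in range(1200))
new = Counter(new_select(h, 4) for h in range(1200))
```

In this model the old scheme puts twice as many flows on the first ucast queue as on each of the others, while the new scheme spreads them evenly.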
Julian Wiedmann [Wed, 18 Mar 2020 12:54:49 +0000 (13:54 +0100)]
s390/qeth: allow configuration of TX queues for IQD devices
Similar to the support for z/VM NICs, but we need to take extra care
about the dedicated mcast queue:
1. netdev_pick_tx() is unaware of this limitation and might select the
mcast txq. Catch this.
2. require at least _two_ TX queues - one for ucast, one for mcast.
3. when reducing the number of TX queues, there's a potential race
where netdev_cap_txqueue() over-rules the selected txq index and
falls back to index 0. This would place ucast traffic on the mcast
queue, and result in TX errors.
So for IQD, reject a reduction while the interface is running.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
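A sketch of the two checks listed in the commit above for IQD devices. The function name and the specific error codes are assumptions made for illustration; only the conditions themselves come from the commit message.

```python
import errno


def iqd_check_tx_channels(requested, current, running):
    """Return 0 if the new TX queue count is acceptable, else a negative errno."""
    if requested < 2:
        # One queue for ucast plus the dedicated mcast queue.
        return -errno.EINVAL
    if running and requested < current:
        # Shrinking while running could race with netdev_cap_txqueue()
        # and land ucast traffic on the mcast queue.
        return -errno.EBUSY
    return 0
```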
Julian Wiedmann [Wed, 18 Mar 2020 12:54:48 +0000 (13:54 +0100)]
s390/qeth: allow configuration of TX queues for z/VM NICs
Add support for ETHTOOL_SCHANNELS to change the count of active
TX queues.
Since all TX queue structs are pre-allocated and -registered, we just
need to trivially adjust dev->real_num_tx_queues.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Wed, 18 Mar 2020 12:54:47 +0000 (13:54 +0100)]
s390/qeth: remove prio-queueing support for z/VM NICs
z/VM NICs don't offer HW QoS for TX rings. So just use netdev_pick_tx()
to distribute the connections equally over all enabled TX queues.
We start with just 1 enabled TX queue (this matches the typical
configuration without prio-queueing). A follow-on patch will allow users
to enable additional TX queues.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Wed, 18 Mar 2020 12:54:46 +0000 (13:54 +0100)]
s390/qeth: use memory reserves in TX slow path
When falling back to an allocation from the HW header cache, check if
the skb is eligible for using memory reserves.
This only makes a difference if the cache is empty and needs to be
refilled.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Wed, 18 Mar 2020 12:54:45 +0000 (13:54 +0100)]
s390/qeth: use memory reserves to back RX buffers
Use dev_alloc_page() for backing the RX buffers with pages. This way we
pick up __GFP_MEMALLOC.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 Mar 2020 06:51:31 +0000 (23:51 -0700)]
Merge git://git./linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Use nf_flow_offload_tuple() to fetch flow stats, from Paul Blakey.
2) Add new xt_IDLETIMER hard mode, from Manoj Basapathi.
Follow up patch to clean up this new mode, from Dan Carpenter.
3) Add support for geneve tunnel options, from Xin Long.
4) Make sets built-in and remove modular infrastructure for sets,
from Florian Westphal.
5) Remove unused TEMPLATE_NULLS_VAL, from Li RongQing.
6) Statify nft_pipapo_get, from Chen Wandun.
7) Use C99 flexible-array member, from Gustavo A. R. Silva.
8) More descriptive variable names for bitwise, from Jeremy Sowden.
9) Four patches to add tunnel device hardware offload to the flowtable
infrastructure, from wenxu.
10) pipapo set support for 8-bit grouping, from Stefano Brivio.
11) pipapo can switch between nibble and byte grouping, also from
Stefano.
12) Add AVX2 vectorized version of pipapo, from Stefano Brivio.
13) Update pipapo to use it for single ranges, from Stefano.
14) Add stateful expression support to elements via control plane,
eg. counter per element.
15) Re-visit sysctls in unprivileged namespaces, from Florian Westphal.
16) Add new egress hook, from Lukas Wunner.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Tue, 17 Mar 2020 14:53:34 +0000 (15:53 +0100)]
mptcp: move msk state update to subflow_syn_recv_sock()
After commit 58b09919626b ("mptcp: create msk early"), the
msk socket is already available at subflow_syn_recv_sock()
time. Let's move the state update there, to mirror more
closely the first subflow state.
The above will also help multiple subflow support.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 Mar 2020 05:51:16 +0000 (22:51 -0700)]
Merge branch 'net-add-phylink-support-for-PCS'
Russell King says:
====================
net: add phylink support for PCS
This series adds support for IEEE 802.3 register set compliant PCS
for phylink. In order to do this, we:
1. convert BUG_ON() in existing accessors to WARN_ON_ONCE() and return
an error.
2. add accessors for modifying a MDIO device register, and use them in
phylib, rather than duplicating the code from phylib.
3. add support for decoding the advertisement from clause 22 compatible
register sets for clause 37 advertisements and SGMII advertisements.
4. add support for clause 45 register sets for 10GBASE-R PCS.
These have been tested on the LX2160A Clearfog-CX platform.
v2: eliminate use of BUG_ON() in the accessors.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 17 Mar 2020 14:52:41 +0000 (14:52 +0000)]
net: phylink: pcs: add 802.3 clause 45 helpers
Implement helpers for PCS accessed via the MII bus using 802.3 clause
45 cycles for 10GBASE-R. Only link up/down is supported, 10G full
duplex is assumed.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 17 Mar 2020 14:52:36 +0000 (14:52 +0000)]
net: phylink: pcs: add 802.3 clause 22 helpers
Implement helpers for PCS accessed via the MII bus using 802.3 clause
22 cycles, conforming to 802.3 clause 37 and Cisco SGMII specifications
for the advertisement word.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
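For orientation, a sketch of decoding a Cisco SGMII advertisement word as mentioned in the commit above. The bit positions used here (link in bit 15, duplex in bit 12, speed in bits 11:10) are my understanding of the SGMII layout and should be treated as assumptions; this is not the phylink helper itself.

```python
SGMII_LINK = 0x8000
SGMII_FULL_DUPLEX = 0x1000
SGMII_SPEED_MASK = 0x0C00
SGMII_SPEEDS = {0x0000: 10, 0x0400: 100, 0x0800: 1000}  # Mbps


def decode_sgmii(word):
    """Return (link_up, speed_mbps, full_duplex); speed is None if reserved."""
    link = bool(word & SGMII_LINK)
    speed = SGMII_SPEEDS.get(word & SGMII_SPEED_MASK)
    duplex = bool(word & SGMII_FULL_DUPLEX)
    return link, speed, duplex
```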
Russell King [Tue, 17 Mar 2020 14:52:31 +0000 (14:52 +0000)]
net: mdiobus: add APIs for modifying a MDIO device register
Add APIs for modifying a MDIO device register, similar to the existing
phy_modify() group of functions, but at mdiobus level instead. Adapt
__phy_modify_changed() to use the new mdiobus level helper.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
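The read-modify-write pattern behind the "modify" helpers named in the commit above can be sketched as follows. A dict stands in for the MDIO register file, and the function name is hypothetical; bus locking and error returns are left out.

```python
def modify_changed(regs, regnum, mask, set_bits):
    """Clear `mask`, set `set_bits`, and write back only on change.

    Returns 1 if the register was rewritten, 0 if it already held the value.
    """
    old = regs[regnum]
    new = (old & ~mask) | set_bits
    if new == old:
        return 0  # skip the bus write
    regs[regnum] = new & 0xFFFF  # MDIO registers are 16 bits wide
    return 1
```

Reporting whether anything changed lets a caller skip follow-up work when the register already had the requested value.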
Russell King [Tue, 17 Mar 2020 14:52:26 +0000 (14:52 +0000)]
net: mdiobus: avoid BUG_ON() in mdiobus accessors
Avoid using BUG_ON() in the mdiobus accessors, preferring instead to use
WARN_ON_ONCE() and returning an error.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 Mar 2020 05:47:13 +0000 (22:47 -0700)]
Merge branch 'net-bridge-vlan-options-add-support-for-tunnel-mapping'
Nikolay Aleksandrov says:
====================
net: bridge: vlan options: add support for tunnel mapping
In order to bring the new vlan API on par with the old one and be able
to completely migrate to the new one we need to support vlan tunnel mapping
and statistics. This patch-set takes care of the former by making it a
vlan option. There are two notable issues to deal with:
- vlan range to tunnel range mapping
* The tunnel ids are globally unique for the vlan code and a vlan can
be mapped to one tunnel, so the old API took care of ranges by
taking the starting tunnel id value and incrementally mapping
vlan id(i) -> tunnel id(i). This set takes the same approach and
uses one new attribute - BRIDGE_VLANDB_ENTRY_TUNNEL_ID. If used
with a vlan range then it's the starting tunnel id to map.
- tunnel mapping removal
* Since there are no reserved/special tunnel ids defined, we can't
encode mapping removal within the new attribute, in order to be
able to remove a mapping we add a vlan flag which makes the new
tunnel option remove the mapping
The rest is pretty straight-forward, in fact we directly re-use the old
code for manipulating tunnels by just mapping the command (set/del). In
order to be able to keep detecting vlan ranges we check that the current
vlan has a tunnel and it's extending the current vlan range end's tunnel
id.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Tue, 17 Mar 2020 12:08:36 +0000 (14:08 +0200)]
net: bridge: vlan options: add support for tunnel mapping set/del
This patch adds support for manipulating vlan/tunnel mappings. The
tunnel ids are globally unique and are one per-vlan. There were two
trickier issues - first in order to support vlan ranges we have to
compute the current tunnel id in the following way:
- base tunnel id (attr) + current vlan id - starting vlan id
This is in line with how the old API does vlan/tunnel mapping with ranges. We
already have the vlan range present, so it's redundant to add another
attribute for the tunnel range end. It's simply base tunnel id + vlan
range. And second to support removing mappings we need an out-of-band way
to tell the option manipulating function because there are no
special/reserved tunnel id values, so we use a vlan flag to denote the
operation is tunnel mapping removal.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Tue, 17 Mar 2020 12:08:35 +0000 (14:08 +0200)]
net: bridge: vlan options: add support for tunnel id dumping
Add a new option - BRIDGE_VLANDB_ENTRY_TUNNEL_ID which is used to dump
the tunnel id mapping. Since they're unique per vlan they can enter a
vlan range if they're consecutive, thus we can calculate the tunnel id
range map simply as: vlan range end id - vlan range start id. The
starting point is the tunnel id in BRIDGE_VLANDB_ENTRY_TUNNEL_ID. This
is similar to how the tunnel entries can be created in a range via the
old API (a vlan range maps to a tunnel range).
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
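A sketch of how consecutive vlan/tunnel pairs collapse into ranges on dump, as the commit above describes. It models only the tunnel id condition; the other option comparisons done by the bridge code are ignored, and the function name is hypothetical.

```python
def compress(vlan_to_tunnel):
    """Group {vid: tunnel_id} into (vid_start, vid_end, tunnel_start) runs."""
    ranges = []
    for vid in sorted(vlan_to_tunnel):
        tid = vlan_to_tunnel[vid]
        if ranges:
            start, end, tstart = ranges[-1]
            range_end_tid = tstart + (end - start)
            # The current vlan joins only if both its vid and its tunnel id
            # extend the range end by exactly one.
            if vid == end + 1 and tid == range_end_tid + 1:
                ranges[-1] = (start, vid, tstart)
                continue
        ranges.append((vid, vid, tid))
    return ranges
```

For each run, the tunnel range end is tunnel_start + (vid_end - vid_start).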
Nikolay Aleksandrov [Tue, 17 Mar 2020 12:08:34 +0000 (14:08 +0200)]
net: bridge: vlan tunnel: constify bridge and port arguments
The vlan tunnel code changes vlan options, it shouldn't touch port or
bridge options so we can constify the port argument. This would later help
us to re-use these functions from the vlan options code.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Tue, 17 Mar 2020 12:08:33 +0000 (14:08 +0200)]
net: bridge: vlan options: rename br_vlan_opts_eq to br_vlan_opts_eq_range
It is a more appropriate name as it shows the intent of why we need to
check the options' state. It also allows us to give meaning to the two
arguments of the function: the first is the current vlan (v_curr) being
checked if it could enter the range ending in the second one (range_end).
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 Mar 2020 04:37:25 +0000 (21:37 -0700)]
Merge branch 'stmmac-100GB-Enterprise-MAC-support'
Jose Abreu says:
====================
net: stmmac: 100GB Enterprise MAC support
Adds the support for Enterprise MAC IP version which allows operating
speeds up to 100GB.
Patch 1/4, adds the support in XPCS for XLGMII interface that is used in
this kind of Enterprise MAC IPs.
Patch 2/4, adds the XLGMII interface support in stmmac.
Patch 3/4, adds the HW specific support for Enterprise MAC.
We end in patch 4/4, by updating stmmac documentation to mention the
support for this new IP version.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Tue, 17 Mar 2020 09:18:53 +0000 (10:18 +0100)]
Documentation: networking: stmmac: Mention new XLGMAC support
Add the Enterprise MAC support to the list of supported IP versions and
the newly added XLGMII interface support.
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Tue, 17 Mar 2020 09:18:52 +0000 (10:18 +0100)]
net: stmmac: Add support for Enterprise MAC version
Adds the support for Enterprise MAC IP version which is very similar to
XGMAC. It's so similar that we just need to check the device id and add
new speeds definitions and some minor callbacks.
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Tue, 17 Mar 2020 09:18:51 +0000 (10:18 +0100)]
net: stmmac: Add XLGMII support
Add XLGMII support for stmmac including the list of speeds and defines
for them.
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Tue, 17 Mar 2020 09:18:50 +0000 (10:18 +0100)]
net: phy: xpcs: Add XLGMII support
Add XLGMII support for XPCS. This does not include Autoneg feature.
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 Mar 2020 04:18:25 +0000 (21:18 -0700)]
Merge branch 'ionic-bits-and-bytes'
Shannon Nelson says:
====================
ionic bits and bytes
These are a few little updates to the ionic driver while we are in between
other feature work. While these are mostly Fixes, they are almost all low
priority and needn't be promoted to net. The one higher need is patch 1,
but it is fixing something that hasn't made it out of net-next yet.
v3: allow decode of unknown transceiver and use type
codes from sfp.h
v2: add Fixes tags to patches 1-4, and a little
description for patch 5
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 17 Mar 2020 03:22:10 +0000 (20:22 -0700)]
ionic: add decode for IONIC_RC_ENOSUPP
Add decoding for a new firmware error code.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 17 Mar 2020 03:22:09 +0000 (20:22 -0700)]
ionic: print data for unknown xcvr type
If we don't recognize the transceiver type, set the xcvr type
and data length such that ethtool can at least print the first
256 bytes and the reader can figure out why the transceiver
is not recognized.
While we're here, we can update the phy_id type values to use
the enum values in sfp.h.
Fixes: 4d03e00a2140 ("ionic: Add initial ethtool support")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 17 Mar 2020 03:22:08 +0000 (20:22 -0700)]
ionic: remove adminq napi instance
Remove the adminq's napi struct when tearing down
the adminq.
Fixes: 1d062b7b6f64 ("ionic: Add basic adminq support")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 17 Mar 2020 03:22:07 +0000 (20:22 -0700)]
ionic: deinit rss only if selected
Don't bother de-initing RSS if it wasn't selected.
Fixes: aa3198819bea ("ionic: Add RSS support")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 17 Mar 2020 03:22:06 +0000 (20:22 -0700)]
ionic: stop devlink warn on mgmt device
If we don't set a port type, the devlink code will eventually
print a WARN in the kernel log. Because the mgmt device is
not really a useful port, don't register it as a devlink port.
Fixes: b3f064e9746d ("ionic: add support for device id 0x1004")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 Mar 2020 04:16:35 +0000 (21:16 -0700)]
Merge branch 'net_sched-allow-use-of-hrtimer-slack'
Eric Dumazet says:
====================
net_sched: allow use of hrtimer slack
Packet schedulers have used hrtimers with exact expiry times.
Some of them can afford having a slack, in order to reduce
the number of timer interrupts and feed bigger batches
to increase efficiency.
FQ for example does not care if throttled packets are
sent with an additional (small) delay.
Original observation of having maybe too many interrupts
was made by Willem de Bruijn.
v2: added strict netlink checking (Jakub Kicinski)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 17 Mar 2020 02:12:51 +0000 (19:12 -0700)]
net_sched: sch_fq: enable use of hrtimer slack
Add a new attribute to control the fq qdisc hrtimer slack.
Default is set to 10 usec.
When/if packets are throttled, fq sets up an hrtimer that can
lead to one interrupt per packet in the throttled queue.
By using a timer slack, we allow better use of timer interrupts,
by giving them a chance to call multiple timer callbacks
at each hardware interrupt.
Also, giving a slack allows FQ to dequeue batches of packets
instead of a single one, thus increasing xmit_more efficiency.
This has no negative effect on the rate a TCP flow can sustain,
since each TCP flow maintains its own precise vtime (tp->tcp_wstamp_ns).
v2: added strict netlink checking (as feedback from Jakub Kicinski)
Tested:
1000 concurrent flows all using paced packets.
1,000,000 packets sent per second.
Before the patch :
$ vmstat 2 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b swpd     free  buff   cache si so  bi  bo      in    cs us sy id wa st
 0  0    0 60726784 23628 3485992  0  0 138   1     977   535  0 12 87  0  0
 0  0    0 60714700 23628 3485628  0  0   0   0 1568827 26462  0 22 78  0  0
 1  0    0 60716012 23628 3485656  0  0   0   0 1570034 26216  0 22 78  0  0
 0  0    0 60722420 23628 3485492  0  0   0   0 1567230 26424  0 22 78  0  0
 0  0    0 60727484 23628 3485556  0  0   0   0 1568220 26200  0 22 78  0  0
 2  0    0 60718900 23628 3485380  0  0   0  40 1564721 26630  0 22 78  0  0
 2  0    0 60718096 23628 3485332  0  0   0   0 1562593 26432  0 22 78  0  0
 0  0    0 60719608 23628 3485064  0  0   0   0 1563806 26238  0 22 78  0  0
 1  0    0 60722876 23628 3485236  0  0   0 130 1565874 26566  0 22 78  0  0
 1  0    0 60722752 23628 3484908  0  0   0   0 1567646 26247  0 22 78  0  0
After the patch, slack of 10 usec, we can see a reduction of interrupts
per second, and a small decrease of reported cpu usage.
$ vmstat 2 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b swpd     free  buff   cache si so  bi  bo      in    cs us sy id wa st
 1  0    0 60722564 23628 3484728  0  0 133   1     696   545  0 13 87  0  0
 1  0    0 60722568 23628 3484824  0  0   0   0  977278 25469  0 20 80  0  0
 0  0    0 60716396 23628 3484764  0  0   0   0  979997 25326  0 20 80  0  0
 0  0    0 60713844 23628 3484960  0  0   0   0  981394 25249  0 20 80  0  0
 2  0    0 60720468 23628 3484916  0  0   0   0  982860 25062  0 20 80  0  0
 1  0    0 60721236 23628 3484856  0  0   0   0  982867 25100  0 20 80  0  0
 1  0    0 60722400 23628 3484456  0  0   0   8  982698 25303  0 20 80  0  0
 0  0    0 60715396 23628 3484428  0  0   0   0  981777 25176  0 20 80  0  0
 0  0    0 60716520 23628 3486544  0  0   0  36  978965 27857  0 21 79  0  0
 0  0    0 60719592 23628 3486516  0  0   0  22  977318 25106  0 20 80  0  0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
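To put a number on the reduction reported in the commit above, the "in" (interrupts per second) column of the two vmstat runs can be averaged. This is a small worked calculation, not part of the commit. It skips the first sample of each run, since the first line printed by vmstat reports averages since boot.

```python
from statistics import mean

# "in" column, samples 2 to 10 of each run.
before = [1568827, 1570034, 1567230, 1568220, 1564721,
          1562593, 1563806, 1565874, 1567646]
after = [977278, 979997, 981394, 982860, 982867,
         982698, 981777, 978965, 977318]

# Fraction of interrupts saved by the 10 usec slack.
reduction = 1 - mean(after) / mean(before)
print(f"interrupts/s: {mean(before):.0f} -> {mean(after):.0f} "
      f"({reduction:.1%} fewer)")
```

On these samples the interrupt rate drops by roughly 37%, while the sy column moves from 22 to about 20.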
Eric Dumazet [Tue, 17 Mar 2020 02:12:50 +0000 (19:12 -0700)]
net_sched: do not reprogram a timer about to expire
qdisc_watchdog_schedule_range_ns() can use the newly added slack
and avoid rearming the hrtimer a bit earlier than the current
value. This patch has no effect if delta_ns parameter
is zero.
Note that this means the max slack is potentially doubled.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
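A toy model of the check described in the commit above. The class and attribute names are hypothetical, and the real helper works on an hrtimer rather than a counter. The idea is that a timer already armed inside [expires, expires + delta_ns] is left alone.

```python
class Watchdog:
    """Stand-in for a qdisc watchdog; counts how often the timer is armed."""

    def __init__(self):
        self.queued = False
        self.last_expires = 0
        self.rearm_count = 0

    def schedule_range_ns(self, expires, delta_ns):
        if self.queued and 0 <= self.last_expires - expires <= delta_ns:
            return  # armed timer already fires inside the window
        self.last_expires = expires
        self.queued = True
        self.rearm_count += 1
```

The kept timer may itself fire up to delta_ns after its own expiry, which already lies up to delta_ns after the requested one. That is why the commit notes that the maximum slack is potentially doubled.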
Eric Dumazet [Tue, 17 Mar 2020 02:12:49 +0000 (19:12 -0700)]
net_sched: add qdisc_watchdog_schedule_range_ns()
Some packet schedulers might want to add a slack
when programming hrtimers. This can reduce number
of interrupts and increase batch sizes and thus
give good xmit_more savings.
This commit adds qdisc_watchdog_schedule_range_ns()
helper, with an extra delta_ns parameter.
Legacy qdisc_watchdog_schedule_ns() becomes an inline
passing a zero slack.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 Mar 2020 04:12:40 +0000 (21:12 -0700)]
Merge branch 'nfp-type'
Jakub Kicinski says:
====================
net: rename flow_action stats and set NFP type
Jiri, I hope this is okay with you, I just dropped the "type" from
the helper and value names, and now things should be able to fit
on a line, within 80 characters.
Second patch makes the NFP able to offload DELAYED stats, which
is the type it supports.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 17 Mar 2020 01:42:12 +0000 (18:42 -0700)]
nfp: allow explicitly selected delayed stats
NFP flower offload uses delayed stats. Kernel recently gained
the ability to specify stats types. Make nfp accept DELAYED
stats, not just the catch all "any".
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 17 Mar 2020 01:42:11 +0000 (18:42 -0700)]
net: rename flow_action_hw_stats_types* -> flow_action_hw_stats*
flow_action_hw_stats_types_check() helper takes one of the
FLOW_ACTION_HW_STATS_*_BIT values as input. If we align
the arguments to the opening bracket of the helper there
is no way to call this helper and stay under 80 characters.
Remove the "types" part from the new flow_action helpers
and enum values.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 Mar 2020 03:58:22 +0000 (20:58 -0700)]
Merge branch 'net-phy-improve-phy_driver-callback-handle_interrupt'
Heiner Kallweit says:
====================
net: phy: improve phy_driver callback handle_interrupt
did_interrupt() clears the interrupt, therefore handle_interrupt() can
not check which event triggered the interrupt. To overcome this
constraint and allow more flexibility for custom interrupt handlers,
let's decouple handle_interrupt() from parts of the phylib interrupt
handling. Custom interrupt handlers now have to implement the
did_interrupt() functionality in handle_interrupt() if needed.
Fortunately we have just one custom interrupt handler so far (in the
mscc PHY driver), convert it to the changed API and make use of the
benefits.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 16 Mar 2020 21:33:31 +0000 (22:33 +0100)]
net: phy: mscc: consider interrupt source in interrupt handler
Trigger the respective interrupt handler functionality only if the
related interrupt source bit is set.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 16 Mar 2020 21:32:33 +0000 (22:32 +0100)]
net: phy: improve phy_driver callback handle_interrupt
did_interrupt() clears the interrupt, therefore handle_interrupt() can
not check which event triggered the interrupt. To overcome this
constraint and allow more flexibility for custom interrupt handlers,
let's decouple handle_interrupt() from parts of the phylib interrupt
handling. Custom interrupt handlers now have to implement the
did_interrupt() functionality in handle_interrupt() if needed.
Fortunately we have just one custom interrupt handler so far (in the
mscc PHY driver), convert it to the changed API.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 Mar 2020 03:56:58 +0000 (20:56 -0700)]
Merge branch 'ethtool-consolidate-irq-coalescing-last-part'
Jakub Kicinski says:
====================
ethtool: consolidate irq coalescing - last part
Convert remaining drivers following the groundwork laid in a recent
patch set [1] and continued in [2], [3], [4], [5]. The aim of
the effort is to consolidate irq coalescing parameter validation
in the core.
This set is the sixth and last installment. It converts the remaining
8 drivers in drivers/net/ethernet. The last patch makes declaring
supported IRQ coalescing parameters a requirement.
[1] https://lore.kernel.org/netdev/20200305051542.991898-1-kuba@kernel.org/
[2] https://lore.kernel.org/netdev/20200306010602.1620354-1-kuba@kernel.org/
[3] https://lore.kernel.org/netdev/20200310021512.1861626-1-kuba@kernel.org/
[4] https://lore.kernel.org/netdev/20200311223302.2171564-1-kuba@kernel.org/
[5] https://lore.kernel.org/netdev/20200313040803.2367590-1-kuba@kernel.org/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 16 Mar 2020 20:47:12 +0000 (13:47 -0700)]
net: ethtool: require drivers to set supported_coalesce_params
Now that all in-tree drivers have been updated we can
make the supported_coalesce_params mandatory.
To save debugging time in case some driver was missed
(or is out of tree) add a warning when netdev is registered
with set_coalesce but without supported_coalesce_params.
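The check described above can be sketched as a small Python model. This is only an illustration of the idea, not the kernel code: the flag constants are hypothetical stand-ins for the kernel's ETHTOOL_COALESCE_* bits, and both function names are invented for this sketch.

```python
# Hypothetical stand-ins for the kernel's ETHTOOL_COALESCE_* bits.
RX_USECS = 1 << 0
RX_MAX_FRAMES = 1 << 1
TX_USECS = 1 << 2
USE_ADAPTIVE_TX = 1 << 3


def coalesce_supported(supported, requested):
    """Return True if every non-zero requested parameter is in the mask.

    supported: bitmask declared by the driver.
    requested: dict mapping a flag bit to the value the user asked for.
    """
    nonzero = 0
    for flag, value in requested.items():
        if value:
            nonzero |= flag
    # Any non-zero parameter outside the supported mask is rejected.
    return (nonzero & ~supported) == 0


def needs_registration_warning(has_set_coalesce, supported):
    """Model of the new warning: set_coalesce without a declared mask."""
    return has_set_coalesce and supported == 0


if __name__ == "__main__":
    mask = RX_USECS | TX_USECS
    print(coalesce_supported(mask, {RX_USECS: 50, USE_ADAPTIVE_TX: 0}))
    print(coalesce_supported(mask, {USE_ADAPTIVE_TX: 1}))
    print(needs_registration_warning(True, 0))
```

With the mask declared by the driver, the rejection logic lives in one place, which is why most of the conversions in this series need no other driver change.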
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 16 Mar 2020 20:47:11 +0000 (13:47 -0700)]
net: axienet: let core reject the unsupported coalescing parameters
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.
This driver already correctly rejected all unsupported
parameters. No functional changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 16 Mar 2020 20:47:10 +0000 (13:47 -0700)]
net: ll_temac: let core reject the unsupported coalescing parameters
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.
This driver already correctly rejected all unsupported
parameters. No functional changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 16 Mar 2020 20:47:09 +0000 (13:47 -0700)]
net: davinci_emac: reject unsupported coalescing params
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.
This driver did not previously reject unsupported parameters.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 16 Mar 2020 20:47:08 +0000 (13:47 -0700)]
net: cpsw: reject unsupported coalescing params
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.
This driver did not previously reject unsupported parameters.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 16 Mar 2020 20:47:07 +0000 (13:47 -0700)]
net: tehuti: reject unsupported coalescing params
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.
This driver did not previously reject unsupported parameters.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 16 Mar 2020 20:47:06 +0000 (13:47 -0700)]
net: dwc-xlgmac: let core reject the unsupported coalescing parameters
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.
This driver already correctly rejected all unsupported
parameters.
While at it remove unnecessary zeroing on get.
No functional changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 16 Mar 2020 20:47:05 +0000 (13:47 -0700)]
net: socionext: reject unsupported coalescing params
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.
This driver did not previously reject unsupported parameters.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 16 Mar 2020 20:47:04 +0000 (13:47 -0700)]
net: sfc: reject unsupported coalescing params
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.
This driver did not previously reject unsupported parameters.
The check for use_adaptive_tx_coalesce will now be done by
the core.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lukas Wunner [Wed, 11 Mar 2020 11:59:03 +0000 (12:59 +0100)]
netfilter: Introduce egress hook
Commit e687ad60af09 ("netfilter: add netfilter ingress hook after
handle_ing() under unique static key") introduced the ability to
classify packets on ingress.
Allow the same on egress. Position the hook immediately before a packet
is handed to tc and then sent out on an interface, thereby mirroring the
ingress order. This order allows marking packets in the netfilter
egress hook and subsequently using the mark in tc. Another benefit of
this order is consistency with a lot of existing documentation which
says that egress tc is performed after netfilter hooks.
Egress hooks already exist for the most common protocols, such as
NF_INET_LOCAL_OUT or NF_ARP_OUT, and those are to be preferred because
they are executed earlier during packet processing. However for more
exotic protocols, there is currently no provision to apply netfilter on
egress. A common workaround is to enslave the interface to a bridge and
use ebtables, or to resort to tc. But when the ingress hook was
introduced, consensus was that users should be given the choice to use
netfilter or tc, whichever tool suits their needs best:
https://lore.kernel.org/netdev/20150430153317.GA3230@salvia/
This hook is also useful for NAT46/NAT64, tunneling and filtering of
locally generated af_packet traffic such as dhclient.
There have also been occasional user requests for a netfilter egress
hook in the past, e.g.:
https://www.spinics.net/lists/netfilter/msg50038.html
Performance measurements with pktgen surprisingly show a speedup rather
than a slowdown with this commit:
* Without this commit:
Result: OK: 34240933(c34238375+d2558) usec, 100000000 (60byte,0frags)
2920481pps 1401Mb/sec (1401830880bps) errors: 0
* With this commit:
Result: OK: 33997299(c33994193+d3106) usec, 100000000 (60byte,0frags)
2941410pps 1411Mb/sec (1411876800bps) errors: 0
* Without this commit + tc egress:
Result: OK: 39022386(c39019547+d2839) usec, 100000000 (60byte,0frags)
2562631pps 1230Mb/sec (1230062880bps) errors: 0
* With this commit + tc egress:
Result: OK: 37604447(c37601877+d2570) usec, 100000000 (60byte,0frags)
2659259pps 1276Mb/sec (1276444320bps) errors: 0
* With this commit + nft egress:
Result: OK: 41436689(c41434088+d2600) usec, 100000000 (60byte,0frags)
2413320pps 1158Mb/sec (1158393600bps) errors: 0
Tested on a bare-metal Core i7-3615QM, each measurement was performed
three times to verify that the numbers are stable.
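To put the pktgen figures above in relative terms, the following Python sketch computes the percentage change between runs and cross-checks that each reported bps value equals pps times the 60-byte packet size in bits. The numbers are copied from the results above; the dictionary keys and helper name are only labels chosen for this sketch.

```python
PACKET_BITS = 60 * 8  # pktgen reports 60-byte packets

# (packets per second, bits per second) as reported above
RESULTS = {
    "plain, without commit": (2920481, 1401830880),
    "plain, with commit": (2941410, 1411876800),
    "tc egress, without commit": (2562631, 1230062880),
    "tc egress, with commit": (2659259, 1276444320),
    "nft egress, with commit": (2413320, 1158393600),
}


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


def pps(label):
    return RESULTS[label][0]


if __name__ == "__main__":
    for label, (rate, bps) in RESULTS.items():
        # Sanity check: the bps column is pps * 480 bits.
        assert rate * PACKET_BITS == bps, label
    print("plain: %+.2f%%" % percent_change(
        pps("plain, without commit"), pps("plain, with commit")))
    print("tc egress: %+.2f%%" % percent_change(
        pps("tc egress, without commit"), pps("tc egress, with commit")))
    print("nft egress vs plain: %+.2f%%" % percent_change(
        pps("plain, with commit"), pps("nft egress, with commit")))
```

The speedup is under 1% without tc and under 4% with tc egress, so it is small compared to the cost of actually running an nft egress rule.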
Commands to perform a measurement:
modprobe pktgen
echo "add_device lo@3" > /proc/net/pktgen/kpktgend_3
samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i 'lo@3' -n 100000000
Commands for testing tc egress:
tc qdisc add dev lo clsact
tc filter add dev lo egress protocol ip prio 1 u32 match ip dst 4.3.2.1/32
Commands for testing nft egress:
nft add table netdev t
nft add chain netdev t co \{ type filter hook egress device lo priority 0 \; \}
nft add rule netdev t co ip daddr 4.3.2.1/32 drop
All testing was performed on the loopback interface to avoid distorting
measurements by the packet handling in the low-level Ethernet driver.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Lukas Wunner [Wed, 11 Mar 2020 11:59:02 +0000 (12:59 +0100)]
netfilter: Generalize ingress hook
Prepare for addition of a netfilter egress hook by generalizing the
ingress hook introduced by commit e687ad60af09 ("netfilter: add
netfilter ingress hook after handle_ing() under unique static key").
In particular, rename and refactor the ingress hook's static inlines
such that they can be reused for an egress hook.
No functional change intended.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Lukas Wunner [Wed, 11 Mar 2020 11:59:01 +0000 (12:59 +0100)]
netfilter: Rename ingress hook include file
Prepare for addition of a netfilter egress hook by renaming
<linux/netfilter_ingress.h> to <linux/netfilter_netdev.h>.
The egress hook also necessitates a refactoring of the include file,
but that is done in a separate commit to ease reviewing.
No functional change intended.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
David S. Miller [Tue, 17 Mar 2020 01:26:55 +0000 (18:26 -0700)]
Merge branch 'tcp-fix-stretch-ACK-bugs-in-congestion-control-modules'
Pengcheng Yang says:
====================
tcp: fix stretch ACK bugs in congestion control modules
"stretch ACKs" (caused by LRO, GRO, delayed ACKs or middleboxes)
can cause serious performance shortfalls in common congestion
control algorithms. Neal Cardwell submitted a series of patches
starting with commit e73ebb0881ea ("tcp: stretch ACK fixes prep")
to handle stretch ACKs and fixed stretch ACK bugs in Reno and
CUBIC congestion control algorithms.
This patch series continues to fix bic, scalable, veno and yeah
congestion control algorithms to handle stretch ACKs.
Changes in v2:
- Provide [PATCH 0/N] to describe the modifications of this patch series
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pengcheng Yang [Mon, 16 Mar 2020 06:35:11 +0000 (14:35 +0800)]
tcp: fix stretch ACK bugs in Yeah
Change Yeah to properly handle stretch ACKs in additive
increase mode by passing in the count of ACKed packets
to tcp_cong_avoid_ai().
In addition, we re-implemented the scalable path using
tcp_cong_avoid_ai() and removed the pkts_acked variable.
Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pengcheng Yang [Mon, 16 Mar 2020 06:35:10 +0000 (14:35 +0800)]
tcp: fix stretch ACK bugs in Veno
Change Veno to properly handle stretch ACKs in additive
increase mode by passing in the count of ACKed packets
to tcp_cong_avoid_ai().
Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pengcheng Yang [Mon, 16 Mar 2020 06:35:09 +0000 (14:35 +0800)]
tcp: stretch ACK fixes in Veno prep
No code logic has been changed in this patch.
Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pengcheng Yang [Mon, 16 Mar 2020 06:35:08 +0000 (14:35 +0800)]
tcp: fix stretch ACK bugs in Scalable
Change Scalable to properly handle stretch ACKs in additive
increase mode by passing in the count of ACKed packets to
tcp_cong_avoid_ai().
In addition, because we are now precisely accounting for
stretch ACKs, including delayed ACKs, we can now change
TCP_SCALABLE_AI_CNT to 100.
Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pengcheng Yang [Mon, 16 Mar 2020 06:35:07 +0000 (14:35 +0800)]
tcp: fix stretch ACK bugs in BIC
Change BIC to properly handle stretch ACKs in additive
increase mode by passing in the count of ACKed packets
to tcp_cong_avoid_ai().
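Why the ACKed packet count matters can be seen in a small Python model. It is a simplified sketch written to mirror the counter logic of tcp_cong_avoid_ai() as commonly described, not the kernel source; the Sock class and grow() helper are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sock:
    snd_cwnd: int
    snd_cwnd_cnt: int = 0
    snd_cwnd_clamp: int = 1 << 30


def cong_avoid_ai(tp, w, acked):
    """Additive increase: grow cwnd by one for every w packets ACKed."""
    # Credits accumulated at a higher w are applied gently.
    if tp.snd_cwnd_cnt >= w:
        tp.snd_cwnd_cnt = 0
        tp.snd_cwnd += 1
    tp.snd_cwnd_cnt += acked
    if tp.snd_cwnd_cnt >= w:
        delta = tp.snd_cwnd_cnt // w
        tp.snd_cwnd_cnt -= delta * w
        tp.snd_cwnd += delta
    tp.snd_cwnd = min(tp.snd_cwnd, tp.snd_cwnd_clamp)


def grow(acks, w=10, cwnd=10):
    """Feed a sequence of per-ACK packet counts and return the final cwnd."""
    tp = Sock(snd_cwnd=cwnd)
    for acked in acks:
        cong_avoid_ai(tp, w, acked)
    return tp.snd_cwnd


if __name__ == "__main__":
    print(grow([1] * 10))  # ten ACKs, one packet each
    print(grow([10]))      # one stretch ACK covering ten packets
    print(grow([1]))       # same stretch ACK miscounted as one packet
```

Ten single-packet ACKs and one stretch ACK covering ten packets produce the same window growth, whereas counting the stretch ACK as a single packet leaves the window unchanged. That shortfall is what the series removes.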
Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer [Fri, 13 Mar 2020 13:25:19 +0000 (14:25 +0100)]
sfc: fix XDP-redirect in this driver
XDP-redirect is broken in the sfc driver. XDP_REDIRECT requires
tailroom for skb_shared_info when creating an SKB based on the
redirected xdp_frame (both in cpumap and veth).
The fix requires some initial explaining. The driver uses RX page-split
when possible. It reserves the top 64 bytes in the RX-page for storing
dma_addr (struct efx_rx_page_state). It also has the XDP recommended
headroom of XDP_PACKET_HEADROOM (256 bytes). As it doesn't reserve any
tailroom, it can still fit two standard MTU (1500) frames into one page.
The size of struct skb_shared_info is 320 bytes. Thus drivers like ixgbe
and i40e reduce their XDP headroom to 192 bytes, which allows them to
fit two frames with max 1536 bytes into a 4K page (192+1536+320=2048).
The fix is to reduce this driver's headroom to 128 bytes and add the 320
bytes tailroom. This accounts for the reserved top 64 bytes in the page,
and still fits two frames in a page for normal MTUs.
We must never go below 128 bytes of headroom for XDP, as one cacheline
is for xdp_frame area and next cacheline is reserved for metadata area.
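The page budget above can be checked with a few lines of Python. This is a back-of-the-envelope sketch using only the sizes quoted in this message; it ignores any alignment padding the driver may add, and max_frame() is a helper invented for the example.

```python
PAGE_SIZE = 4096
PAGE_STATE = 64  # top of the page reserved for struct efx_rx_page_state
SHINFO = 320     # size of struct skb_shared_info quoted above


def max_frame(headroom, tailroom, reserved=PAGE_STATE, buffers=2):
    """Largest frame that fits when a page is split into equal buffers."""
    per_buffer = (PAGE_SIZE - reserved) // buffers
    return per_buffer - headroom - tailroom


if __name__ == "__main__":
    # ixgbe/i40e style: no reserved page state, 192 bytes of headroom
    print(max_frame(192, SHINFO, reserved=0))
    # sfc before the fix: 256 bytes headroom, no tailroom
    print(max_frame(256, 0))
    # sfc keeping 256 bytes headroom and adding the tailroom
    print(max_frame(256, SHINFO))
    # sfc after the fix: 128 bytes headroom plus tailroom
    print(max_frame(128, SHINFO))
```

Keeping the 256-byte headroom while adding the tailroom would leave less than 1500 bytes per half page, which is why the headroom is reduced to 128 bytes.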
Fixes: eb9a36be7f3e ("sfc: perform XDP processing on received packets")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Elder [Mon, 16 Mar 2020 22:51:21 +0000 (17:51 -0500)]
remoteproc: clean up notification config
Rearrange the config files for remoteproc and IPA to fix their
interdependencies.
First, have CONFIG_QCOM_Q6V5_MSS select QCOM_Q6V5_IPA_NOTIFY so the
notification code is built regardless of whether IPA needs it.
Next, represent QCOM_IPA as being dependent on QCOM_Q6V5_MSS rather
than setting its value to match QCOM_Q6V5_COMMON (which is selected
by QCOM_Q6V5_MSS).
Drop all dependencies from QCOM_Q6V5_IPA_NOTIFY. The notification
code will be built whenever QCOM_Q6V5_MSS is set, and it has no other
dependencies.
Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madhuparna Bhowmik [Mon, 16 Mar 2020 17:13:52 +0000 (22:43 +0530)]
net: kcm: kcmproc.c: Fix RCU list suspicious usage warning
This patch fixes the suspicious RCU usage warning reported by
kernel test robot.
net/kcm/kcmproc.c:#RCU-list_traversed_in_non-reader_section
There is no need to use list_for_each_entry_rcu() in
kcm_stats_seq_show() as the list is always traversed with
knet->mutex held.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zheng Zengkai [Mon, 16 Mar 2020 13:05:24 +0000 (21:05 +0800)]
qede: remove some unused code in function qede_selftest_receive_traffic
Remove set but not used variables 'sw_comp_cons' and 'hw_comp_cons'
to fix gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/qlogic/qede/qede_ethtool.c: In function qede_selftest_receive_traffic:
drivers/net/ethernet/qlogic/qede/qede_ethtool.c:1569:20:
warning: variable sw_comp_cons set but not used [-Wunused-but-set-variable]
drivers/net/ethernet/qlogic/qede/qede_ethtool.c: In function qede_selftest_receive_traffic:
drivers/net/ethernet/qlogic/qede/qede_ethtool.c:1569:6:
warning: variable hw_comp_cons set but not used [-Wunused-but-set-variable]
After removing 'hw_comp_cons', the memory barrier 'rmb()' and its comments become useless,
so remove them as well.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 16 Mar 2020 08:03:25 +0000 (09:03 +0100)]
net: sched: set the hw_stats_type in pedit loop
For a single pedit action, multiple offload entries may be used. Set the
hw_stats_type for all of them.
Fixes: 44f865801741 ("sched: act: allow user to specify type of HW stats for a filter")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 16 Mar 2020 09:10:10 +0000 (02:10 -0700)]
Merge branch 'net-stmmac-Use-readl_poll_timeout-to-simplify-the-code'
Dejin Zheng says:
====================
net: stmmac: Use readl_poll_timeout() to simplify the code
This patch set just replaces the open-coded loops with the
readl_poll_timeout() helper macro to simplify the code in the
stmmac driver.
v2 -> v3:
- return whatever error code readl_poll_timeout() returned.
v1 -> v2:
- no changes. I am a newbie and sent this patch a month
ago (February 6th). So far, I have not received any comments or
suggestions. I think it may have been lost somewhere in the world, so
I am resending it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dejin Zheng [Mon, 16 Mar 2020 02:32:54 +0000 (10:32 +0800)]
net: stmmac: use readl_poll_timeout() function in dwmac4_dma_reset()
The dwmac4_dma_reset() function uses an open-coded version of readl_poll_timeout().
Replace the open coded handling with the proper function.
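The pattern being replaced can be illustrated with a small Python model. It is a sketch of the general poll-until-condition idea only: the real kernel macro is time based (sleep and timeout in microseconds), whereas this model uses a poll budget to stay deterministic. SFT_RESET, the fake register and the function names are assumptions made for the example.

```python
ETIMEDOUT = 110
SFT_RESET = 0x1  # hypothetical "reset in progress" bit


def poll_timeout(read, cond, max_polls):
    """Poll read() until cond(value) holds or the budget is exhausted.

    Returns (0, value) on success and (-ETIMEDOUT, value) otherwise.
    """
    value = read()
    for _ in range(max_polls):
        if cond(value):
            break
        value = read()
    return (0 if cond(value) else -ETIMEDOUT), value


def make_reset_register(busy_reads):
    """Fake register whose reset bit stays set for busy_reads reads."""
    state = {"reads": 0}

    def read():
        state["reads"] += 1
        return SFT_RESET if state["reads"] <= busy_reads else 0

    return read


def reset_done(value):
    return not (value & SFT_RESET)


if __name__ == "__main__":
    print(poll_timeout(make_reset_register(3), reset_done, 10))
    print(poll_timeout(make_reset_register(100), reset_done, 10))
```

The caller only has to propagate the returned error code, which is what the v3 change in this series does.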
Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dejin Zheng [Mon, 16 Mar 2020 02:32:53 +0000 (10:32 +0800)]
net: stmmac: use readl_poll_timeout() function in init_systime()
The init_systime() function uses an open-coded version of readl_poll_timeout().
Replace the open coded handling with the proper function.
Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Sat, 14 Mar 2020 10:51:20 +0000 (18:51 +0800)]
chcr: remove set but not used variable 'status'
drivers/crypto/chelsio/chcr_ktls.c: In function chcr_ktls_cpl_set_tcb_rpl:
drivers/crypto/chelsio/chcr_ktls.c:662:11: warning:
variable status set but not used [-Wunused-but-set-variable]
Commit 8a30923e1598 ("cxgb4/chcr: Save tx keys and handle HW response")
introduced this unused variable; remove it.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Era Mayflower [Mon, 9 Mar 2020 19:47:02 +0000 (19:47 +0000)]
macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)
Netlink support of extended packet number cipher suites,
allows adding and updating XPN macsec interfaces.
Added support in:
* Creating interfaces with GCM-AES-XPN-128 and GCM-AES-XPN-256 suites.
* Setting and getting 64-bit packet numbers of SAs.
* Setting (only on SA creation) and getting ssci of SAs.
* Setting salt when installing a SAK.
Added 2 cipher suite identifiers according to 802.1AE-2018 table 14-1:
* MACSEC_CIPHER_ID_GCM_AES_XPN_128
* MACSEC_CIPHER_ID_GCM_AES_XPN_256
In addition, added 2 new netlink attribute types:
* MACSEC_SA_ATTR_SSCI
* MACSEC_SA_ATTR_SALT
Depends on: macsec: Support XPN frame handling - IEEE 802.1AEbw.
Signed-off-by: Era Mayflower <mayflowerera@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Era Mayflower [Mon, 9 Mar 2020 19:47:01 +0000 (19:47 +0000)]
macsec: Support XPN frame handling - IEEE 802.1AEbw
Support extended packet number cipher suites (802.1AEbw) frames handling.
This does not include the needed netlink patches.
* Added xpn boolean field to `struct macsec_secy`.
* Added ssci field to `struct macsec_tx_sa` (802.1AE figure 10-5).
* Added ssci field to `struct macsec_rx_sa` (802.1AE figure 10-5).
* Added salt field to `struct macsec_key` (802.1AE 10.7 NOTE 1).
* Created pn_t type for easy access to lower and upper halves.
* Created salt_t type for easy access to the "ssci" and "pn" parts.
* Created `macsec_fill_iv_xpn` function to create IV in XPN mode.
* Support in PN recovery and preliminary replay check in XPN mode.
In addition, according to IEEE 802.1AEbw figure 10-5, the PN of incoming
frame can be 0 when XPN cipher suite is used, so fixed the function
`macsec_validate_skb` to fail on PN=0 only if XPN is off.
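As an illustration of the pieces listed above, the following Python sketch models the 64-bit PN halves, the XPN IV and the PN=0 check. It assumes the IV layout as understood from 802.1AEbw (upper 32 bits are the SSCI XORed with the upper 32 bits of the salt, lower 64 bits are the PN XORed with the lower 64 bits of the salt). It works on plain integers, ignores byte order, and every name in it is invented for the example.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def split_pn(pn):
    """Return the (upper, lower) 32-bit halves of a 64-bit packet number."""
    return (pn >> 32) & MASK32, pn & MASK32


def xpn_iv(ssci, pn, salt):
    """Build the 96-bit IV as an integer from a 32-bit SSCI, a 64-bit PN
    and a 96-bit salt (assumed layout, see the paragraph above)."""
    salt_hi = (salt >> 64) & MASK32
    salt_lo = salt & MASK64
    return ((ssci ^ salt_hi) << 64) | (pn ^ salt_lo)


def header_pn_valid(header_pn, xpn):
    """The frame carries only the lower 32 bits of the PN. With XPN these
    can legitimately be 0, so PN=0 is rejected only when XPN is off."""
    return header_pn != 0 or xpn


if __name__ == "__main__":
    pn = 0x100000000  # lower half has just wrapped to 0
    upper, lower = split_pn(pn)
    print(hex(upper), hex(lower))
    print(header_pn_valid(lower, xpn=True))
    print(header_pn_valid(lower, xpn=False))
    print(hex(xpn_iv(ssci=1, pn=pn, salt=0)))
```

The example PN shows why the PN=0 check had to be relaxed: a valid 64-bit PN can have an all-zero lower half.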
Signed-off-by: Era Mayflower <mayflowerera@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 16 Mar 2020 00:11:13 +0000 (17:11 -0700)]
Merge branch 'net-dsa-improve-serdes-integration'
Russell King says:
====================
net: dsa: improve serdes integration
Depends on "net: mii clause 37 helpers".
Andrew Lunn mentioned that the Serdes PCS found in Marvell DSA switches
does not automatically update the switch MACs with the link parameters.
Currently, the DSA code implements a work-around for this.
This series improves the Serdes integration, making use of the recent
phylink changes to support split MAC/PCS setups. One noticeable
improvement for userspace is that ethtool can now report the link
partner's advertisement.
This repost has no changes compared to the previous posting; however,
the regression Andrew had found which exists even without this patch
set has now been fixed by Andrew and merged into the net-next tree.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Sat, 14 Mar 2020 10:16:03 +0000 (10:16 +0000)]
net: dsa: mv88e6xxx: use PHY_DETECT in mac_link_up/mac_link_down
Use the status of the PHY_DETECT bit to determine whether we need to
force the MAC settings in mac_link_up() and mac_link_down().
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Sat, 14 Mar 2020 10:15:58 +0000 (10:15 +0000)]
net: dsa: mv88e6xxx: remove port_link_state functions
The port_link_state method is only used by mv88e6xxx_port_setup_mac(),
which is now only called during port setup, rather than also being
called via phylink's mac_config method.
Remove this now unnecessary optimisation, which allows us to remove the
port_link_state methods as well.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Sat, 14 Mar 2020 10:15:53 +0000 (10:15 +0000)]
net: dsa: mv88e6xxx: combine port_set_speed and port_set_duplex
Setting the speed independently of duplex makes little sense; the two
parameters result from negotiation or fixed setup, and may have
interdependencies. Moreover, they are always controlled via the same
register - having them split means we have to read-modify-write this
register twice.
Combine the two operations into a single port_set_speed_duplex()
operation. Not only is this more efficient, it reduces the size of the
code as well.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Sat, 14 Mar 2020 10:15:48 +0000 (10:15 +0000)]
net: dsa: mv88e6xxx: fix Serdes link changes
phylink_mac_change() is supposed to be called with a 'false' argument
if the link has gone down since it was last reported up; this is to
ensure that link events along with renegotiation events are always
correctly reported to userspace.
Read the BMSR once when we have an interrupt, and report the link
latched status to phylink via phylink_mac_change(). phylink will deal
automatically with re-reading the link state once it has processed the
link-down event.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Sat, 14 Mar 2020 10:15:43 +0000 (10:15 +0000)]
net: dsa: mv88e6xxx: extend phylink to Serdes PHYs
Extend the mv88e6xxx phylink implementation down to Serdes PHYs, which
handle the PCS layer of such links.
- Implement phylink PCS link state reading, so that we can provide
ethtool with the linkmodes and link speed in the expected manner.
Note: this will only be called for in-band negotiation, which is
only supported by the serdes interfaces.
- Implement phylink PCS configuration, so that the in-band AN and
advertisement can be configured.
- Implement phylink PCS negotiation restart, so that the in-band AN
can be restarted.
- Implement phylink PCS link up, so that when operating out-of-band,
the Serdes can be configured for the appropriate fixed speed mode.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Sat, 14 Mar 2020 10:15:38 +0000 (10:15 +0000)]
net: dsa: mv88e6xxx: configure interface settings in mac_config
Only configure the interface settings in mac_config(), leaving the
speed and duplex settings to mac_link_up to deal with.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Sat, 14 Mar 2020 10:15:33 +0000 (10:15 +0000)]
net: dsa: mv88e6xxx: use BMCR definitions for serdes control register
The SGMII/1000base-X serdes register set is a clause 22 register set
offset at 0x2000 in the PHYXS device. Rather than inventing our own
definitions, use those that already exist, and name the register
MV88E6390_SGMII_BMCR. Also remove the unused MV88E6390_SGMII_STATUS
definitions.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Sat, 14 Mar 2020 10:15:28 +0000 (10:15 +0000)]
net: dsa: warn if phylink_mac_link_state returns error
Issue a warning to the kernel log if phylink_mac_link_state() returns
an error. This should not occur, but let's make it visible.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 16 Mar 2020 00:10:14 +0000 (17:10 -0700)]
Merge branch 'net-mii-clause-37-helpers'
Russell King says:
====================
net: mii clause 37 helpers
This is a re-post of two patches that are common to two series that
I've sent in recent weeks; I'm re-posting them separately in the hope
that they can be merged. No changes from either of the previous
postings.
These patches:
1. convert the existing (unused) mii_lpa_to_ethtool_lpa_x() function
to a linkmode variant.
2. add a helper for clause 37 advertisements, supporting both the
1000baseX and de facto 2500baseX variants. Note that ethtool does
not support half duplex for either of these, and we make no effort
to do so.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Sat, 14 Mar 2020 10:09:58 +0000 (10:09 +0000)]
net: mii: add linkmode_adv_to_mii_adv_x()
Add a helper to convert a linkmode advertisement to a clause 37
advertisement value for 1000base-x and 2500base-x.
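What such a conversion looks like can be sketched in Python. The register bit values are the ADVERTISE_1000X* constants from the MII definitions; the link mode names are plain strings standing in for the ethtool link mode bits, and the function itself is a model written for this example rather than the kernel helper.

```python
# Clause 37 advertisement register bits (MII ADVERTISE_1000X* values)
ADVERTISE_1000XFULL = 0x0020
ADVERTISE_1000XPAUSE = 0x0080
ADVERTISE_1000XPSE_ASYM = 0x0100


def adv_to_mii_adv_x(linkmodes, fdx_mode):
    """Translate a set of link mode names into a clause 37 advertisement.

    linkmodes: set of advertised mode names (strings, for illustration).
    fdx_mode: the full-duplex mode to look for, e.g. "1000baseX_Full".
    Half duplex is never advertised, matching the note in the cover letter.
    """
    result = 0
    if fdx_mode in linkmodes:
        result |= ADVERTISE_1000XFULL
    if "Pause" in linkmodes:
        result |= ADVERTISE_1000XPAUSE
    if "Asym_Pause" in linkmodes:
        result |= ADVERTISE_1000XPSE_ASYM
    return result


if __name__ == "__main__":
    adv = adv_to_mii_adv_x({"1000baseX_Full", "Pause"}, "1000baseX_Full")
    print(hex(adv))
    print(hex(adv_to_mii_adv_x({"2500baseX_Full"}, "2500baseX_Full")))
```

Passing the full-duplex mode as a parameter lets the same routine serve both the 1000base-x and 2500base-x cases.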
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Sat, 14 Mar 2020 10:09:53 +0000 (10:09 +0000)]
net: mii: convert mii_lpa_to_ethtool_lpa_x() to linkmode variant
Add a LPA to linkmode decoder for 1000BASE-X protocols; this decoder
only provides the modify semantics similar to other such decoders.
This replaces the unused mii_lpa_to_ethtool_lpa_x() helper.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Wed, 11 Mar 2020 19:52:01 +0000 (20:52 +0100)]
netfilter: conntrack: re-visit sysctls in unprivileged namespaces
Since commit b884fa46177659 ("netfilter: conntrack: unify sysctl handling")
conntrack no longer exposes most of its sysctls (e.g. tcp timeouts
settings) to network namespaces that are not owned by the initial user
namespace.
This patch exposes all sysctls even if the namespace is unprivileged.
Compared to a 4.19 kernel, the newly visible and writeable sysctls are:
net.netfilter.nf_conntrack_acct
net.netfilter.nf_conntrack_timestamp
.. to allow enabling the accounting and timestamp extensions.
net.netfilter.nf_conntrack_events
.. to turn off conntrack event notifications.
net.netfilter.nf_conntrack_checksum
.. to disable checksum validation.
net.netfilter.nf_conntrack_log_invalid
.. to enable logging of packets deemed invalid by conntrack.
newly visible sysctls that are only exported as read-only:
net.netfilter.nf_conntrack_count
.. current number of conntrack entries living in this netns.
net.netfilter.nf_conntrack_max
.. global upper limit (maximum size of the table).
net.netfilter.nf_conntrack_buckets
.. size of the conntrack table (hash buckets).
net.netfilter.nf_conntrack_expect_max
.. maximum number of permitted expectations in this netns.
net.netfilter.nf_conntrack_helper
.. conntrack helper auto assignment.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>