Tariq Toukan [Fri, 5 Jul 2019 15:30:11 +0000 (18:30 +0300)]
net/mlx5: Accel, Expose accel wrapper for IPsec FPGA function
Do not directly call fpga version of IPsec function from main.c.
Wrap it by an accel version, and call the wrapper.
This will allow deprecating the FPGA IPsec stubs in downstream
patch.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 5 Jul 2019 23:24:27 +0000 (16:24 -0700)]
Merge tag 'mlx5-updates-2019-07-04-v2' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-update-2019-07-04
This series adds mlx5 support for devlink fw versions query.
1) Implement the required low level firmware commands
2) Implement the devlink knobs and callbacks for fw versions query.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sebastian Andrzej Siewior [Thu, 4 Jul 2019 15:38:02 +0000 (17:38 +0200)]
nfp: Use spinlock_t instead of struct spinlock
For spinlocks the type spinlock_t should be used instead of "struct
spinlock".
Use spinlock_t for spinlock's definition.
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: oss-drivers@netronome.com
Cc: netdev@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilias Apalodimas [Thu, 4 Jul 2019 14:11:09 +0000 (17:11 +0300)]
net: netsec: Sync dma for device on buffer allocation
Quoting Arnd,
We have to do a sync_single_for_device /somewhere/ before the
buffer is given to the device. On a non-cache-coherent machine with
a write-back cache, there may be dirty cache lines that get written back
after the device DMA's data into it (e.g. from a previous memset
from before the buffer got freed), so you absolutely need to flush any
dirty cache lines on it first.
Since the coherency is configurable in this device make sure we cover
all configurations by explicitly syncing the allocated buffer for the
device before refilling it's descriptors
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 5 Jul 2019 22:39:39 +0000 (15:39 -0700)]
Merge branch 'hns3-next'
Huazhong Tan says:
====================
net: hns3: some cleanups & bugfixes
This patch-set includes cleanups and bugfixes for
the HNS3 ethernet controller driver.
[patch 1/9] fixes VF's broadcast promisc mode not enabled after
initializing.
[patch 2/9] adds hints for fibre port not support flow control.
[patch 3/9] fixes a port capbility updating issue.
[patch 4/9 - 9/9] adds some cleanups for HNS3 driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Thu, 4 Jul 2019 14:04:28 +0000 (22:04 +0800)]
net: hns3: set maximum length to resp_data_len for exceptional case
If HCLGE_MBX_MAX_RESP_DATA_SIZE > HCLGE_MBX_MAX_RESP_DATA_SIZE,
the memcpy will cause out of memory. So this patch just set
resp_data_len to the maximum length for this case.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonglong Liu [Thu, 4 Jul 2019 14:04:27 +0000 (22:04 +0800)]
net: hns3: bitwise operator should use unsigned type
There are some bitwise operator used signed type, this patch fixes
them with unsigned type.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Thu, 4 Jul 2019 14:04:26 +0000 (22:04 +0800)]
net: hns3: add default value for tc_size and tc_offset
This patch adds default value for tc_size and tc_offset, or it may
get random value and used later.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Weihang Li [Thu, 4 Jul 2019 14:04:25 +0000 (22:04 +0800)]
net: hns3: check msg_data before memcpy in hclgevf_send_mbx_msg
The value of msg_data may be NULL in some cases, which will cause
errors reported by some compiler.
So this patch adds a check to fix it.
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Thu, 4 Jul 2019 14:04:24 +0000 (22:04 +0800)]
net: hns3: set default value for param "type" in hclgevf_bind_ring_to_vector
The value of param type is always not changed in
hclgevf_bind_ring_to_vector, move the assignment to
front of "for {}" can reduce the redundant assignment.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Thu, 4 Jul 2019 14:04:23 +0000 (22:04 +0800)]
net: hns3: add all IMP return code
Currently, the HNS3 driver just defines part of IMP return code,
This patch supplements all the remaining IMP return code, and adds
a function to convert this code to the error number.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Thu, 4 Jul 2019 14:04:22 +0000 (22:04 +0800)]
net: hns3: fix port capbility updating issue
Currently, the driver queries the media port information, and
updates the port capability periodically. But it sets an error
mac->speed_type value, which stops update port capability.
Fixes: 88d10bd6f730 ("net: hns3: add support for multiple media type")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Thu, 4 Jul 2019 14:04:21 +0000 (22:04 +0800)]
net: hns3: fix flow control configure issue for fibre port
Flow control autoneg is unsupported for fibre port. It takes no
effect for flow control when restart autoneg. This patch fixes
it, return -EOPNOTSUPP when user tries to enable flow control
autoneg.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Thu, 4 Jul 2019 14:04:20 +0000 (22:04 +0800)]
net: hns3: enable broadcast promisc mode when initializing VF
For revision 0x20, the broadcast promisc is enabled by firmware,
it's unnecessary to enable it when initializing VF.
For revision 0x21, it's necessary to enable broadcast promisc mode
when initializing or re-initializing VF, otherwise, it will be
unable to send and receive promisc packets.
Fixes: f01f5559cac8 ("net: hns3: don't allow vf to enable promisc mode")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Thu, 4 Jul 2019 09:03:26 +0000 (17:03 +0800)]
net: remove unused parameter from skb_checksum_try_convert
the check parameter is never used
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 5 Jul 2019 22:28:57 +0000 (15:28 -0700)]
Merge branch 'mlxsw-Enable-disable-PTP-shapers'
Ido Schimmel says:
====================
mlxsw: Enable/disable PTP shapers
Shalom says:
In order to get more accurate hardware time stamping in Spectrum-1, the
driver needs to apply a shaper on the port for speeds lower than 40Gbps.
This shaper is called a PTP shaper and it is applied on hierarchy 0,
which is the port hierarchy. This shaper may affect the shaper rates of
all hierarchies.
This patchset adds the ability to enable or disable the PTP shaper on
the port in two scenarios:
1. When the user wants to enable/disable the hardware time stamping
2. When the port is brought up or down (including port speed change)
Patch #1 adds the QEEC.ptps field that is used for enabling or disabling
the PTP shaper on a port.
Patch #2 adds a note about disabling the PTP shaper when calling to
mlxsw_sp_port_ets_maxrate_set().
Patch #3 adds the QPSC register that is responsible for configuring the
PTP shaper parameters per speed.
Patch #4 sets the PTP shaper parameters during the ptp_init().
Patch #5 adds new operation for getting the port's speed.
Patch #6 enables/disables the PTP shaper when turning on or off the
hardware time stamping.
Patch #7 enables/disables the PTP shaper when the port's status has
changed (including port speed change).
Patch #8 applies the PTP shaper enable/disable logic by filling the PTP
shaper parameters array.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shalom Toledo [Thu, 4 Jul 2019 07:07:40 +0000 (10:07 +0300)]
mlxsw: spectrum_ptp: Apply the PTP shaper enable/disable logic
Apply by filling the PTP shaper parameters array.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shalom Toledo [Thu, 4 Jul 2019 07:07:39 +0000 (10:07 +0300)]
mlxsw: spectrum: Set up PTP shaper when port status has changed
When getting port up down event (PUDE), change the PTP shaper
configuration based on hardware time stamping on/off and the port's
speed.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shalom Toledo [Thu, 4 Jul 2019 07:07:38 +0000 (10:07 +0300)]
mlxsw: spectrum_ptp: Enable/disable PTP shaper on a port when getting HWTSTAMP on/off
In order to get more accurate hardware time stamping, the driver needs to
enable PTP shaper on the port, for speeds lower than 40 Gbps.
Enable the PTP shaper on the port when the user turns on the hardware
time stamping, and disable it when the user turns off the hardware time
stamping.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shalom Toledo [Thu, 4 Jul 2019 07:07:37 +0000 (10:07 +0300)]
mlxsw: spectrum: Add new operation for getting the port's speed
New operation for getting the port's speed as part of port-type-speed
operations.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shalom Toledo [Thu, 4 Jul 2019 07:07:36 +0000 (10:07 +0300)]
mlxsw: spectrum_ptp: Set the PTP shaper parameters
Set the PTP shaper parameters during the ptp_init(). For different
speeds, there are different parameters.
When the port's speed changes and PTP shaper is enabled, the firmware
changes the ETS shaper values according to the PTP shaper parameters for
this new speed.
The PTP shaper parameters array is left empty for now, will be filled in
a follow-up patch.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shalom Toledo [Thu, 4 Jul 2019 07:07:35 +0000 (10:07 +0300)]
mlxsw: reg: Add QoS PTP Shaper Configuration Register
The QPSC allows advanced configuration of the PTP shapers.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shalom Toledo [Thu, 4 Jul 2019 07:07:34 +0000 (10:07 +0300)]
mlxsw: spectrum: Add note about the PTP shaper
Add note about disabling the PTP shaper when calling to
mlxsw_sp_port_ets_maxrate_set().
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shalom Toledo [Thu, 4 Jul 2019 07:07:33 +0000 (10:07 +0300)]
mlxsw: reg: Add ptps field in QoS ETS Element Configuration Register
The PTP Shaper field is used for enabling and disabling of port-rate based
shaper which is slightly lower than port rate.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Thu, 4 Jul 2019 03:37:45 +0000 (03:37 +0000)]
net: socionext: remove set but not used variable 'pkts'
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/socionext/netsec.c: In function 'netsec_clean_tx_dring':
drivers/net/ethernet/socionext/netsec.c:637:15: warning:
variable 'pkts' set but not used [-Wunused-but-set-variable]
It is not used since commit
ba2b232108d3 ("net: netsec: add XDP support")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Kelam [Thu, 4 Jul 2019 02:59:06 +0000 (08:29 +0530)]
net: ethernet: allwinner: Remove unneeded memset
Remove unneeded memset as alloc_etherdev is using kvzalloc which uses
__GFP_ZERO flag
Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fuqian Huang [Thu, 4 Jul 2019 02:36:40 +0000 (10:36 +0800)]
net/ethernet: using dev_get_drvdata directly
Several drivers cast a struct device pointer to a struct
platform_device pointer only to then call platform_get_drvdata().
To improve readability, these constructs can be simplified
by using dev_get_drvdata() directly.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fuqian Huang [Thu, 4 Jul 2019 02:36:33 +0000 (10:36 +0800)]
net/can: using dev_get_drvdata directly
Several drivers cast a struct device pointer to a struct
platform_device pointer only to then call platform_get_drvdata().
To improve readability, these constructs can be simplified
by using dev_get_drvdata() directly.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 5 Jul 2019 22:01:15 +0000 (15:01 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2019-07-05
1) A lot of work to remove indirections from the xfrm code.
From Florian Westphal.
2) Fix a WARN_ON with ipv6 that triggered because of a
forgotten break statement. From Florian Westphal.
3) Remove xfrmi_init_net, it is not needed.
From Li RongQing.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shay Agroskin [Tue, 2 Jul 2019 23:55:11 +0000 (23:55 +0000)]
net/mlx5: Added devlink info callback
The callback is invoked using 'devlink dev info <pci>' command and returns
the running and pending firmware version of the HCA and the name of the
kernel driver.
If there is a pending firmware version (a new version is burned but the
HCA still runs with the previous) it is returned as the stored
firmware version. Otherwise, the running version is returned for this
field.
Output example:
$ devlink dev info pci/0000:00:06.0
pci/0000:00:06.0:
driver mlx5_core
versions:
fixed:
fw.psid MT_0000000009
running:
fw.version 16.26.0100
stored:
fw.version 16.26.0100
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Shay Agroskin [Tue, 2 Jul 2019 23:55:09 +0000 (23:55 +0000)]
net/mlx5: Added fw version query command
Using the MCQI and MCQS registers, we query the running and pending
fw version of the HCA.
The MCQS is queried with sequentially increasing component index, until
a component of type BOOT_IMG is found. Querying this component's version
using the MCQI register yields the running and pending fw version of the
HCA.
Querying MCQI for the pending fw version should be done only after
validating that such fw version exists. This is done my checking
'component update state' field in MCQS output.
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Saeed Mahameed [Thu, 4 Jul 2019 20:40:32 +0000 (16:40 -0400)]
Merge branch 'mlx5-next' of git://git./linux/kernel/git/mellanox/linux
Misc updates from mlx5-next branch:
1) Add the required HW definitions and structures for upcoming TLS
support.
2) Add support for MCQI and MCQS hardware registers for fw version query.
3) Added hardware bits and structures definitions for sub-functions
4) Small code cleanup and improvement for PF pci driver.
5) Bluefield (ECPF) updates and refactoring for better E-Switch
management on ECPF embedded CPU NIC:
5.1) Consolidate querying eswitch number of VFs
5.2) Register event handler at the correct E-Switch init stage
5.3) Setup PF's inline mode and vlan pop when the ECPF is the
E-Swtich manager ( the host PF is basically a VF ).
5.4) Handle Vport UC address changes in switchdev mode.
6) Cleanup the rep and netdev reference when unloading IB rep.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
i# All conflicts fixed but you are still merging.
David S. Miller [Thu, 4 Jul 2019 19:48:21 +0000 (12:48 -0700)]
Merge git://git./linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-07-03
The following pull-request contains BPF updates for your *net-next* tree.
There is a minor merge conflict in mlx5 due to
8960b38932be ("linux/dim:
Rename externally used net_dim members") which has been pulled into your
tree in the meantime, but resolution seems not that bad ... getting current
bpf-next out now before there's coming more on mlx5. ;) I'm Cc'ing Saeed
just so he's aware of the resolution below:
** First conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c:
<<<<<<< HEAD
static int mlx5e_open_cq(struct mlx5e_channel *c,
struct dim_cq_moder moder,
struct mlx5e_cq_param *param,
struct mlx5e_cq *cq)
=======
int mlx5e_open_cq(struct mlx5e_channel *c, struct net_dim_cq_moder moder,
struct mlx5e_cq_param *param, struct mlx5e_cq *cq)
>>>>>>>
e5a3e259ef239f443951d401db10db7d426c9497
Resolution is to take the second chunk and rename net_dim_cq_moder into
dim_cq_moder. Also the signature for mlx5e_open_cq() in ...
drivers/net/ethernet/mellanox/mlx5/core/en.h +977
... and in mlx5e_open_xsk() ...
drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c +64
... needs the same rename from net_dim_cq_moder into dim_cq_moder.
** Second conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c:
<<<<<<< HEAD
int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
struct dim_cq_moder icocq_moder = {0, 0};
struct net_device *netdev = priv->netdev;
struct mlx5e_channel *c;
unsigned int irq;
=======
struct net_dim_cq_moder icocq_moder = {0, 0};
>>>>>>>
e5a3e259ef239f443951d401db10db7d426c9497
Take the second chunk and rename net_dim_cq_moder into dim_cq_moder
as well.
Let me know if you run into any issues. Anyway, the main changes are:
1) Long-awaited AF_XDP support for mlx5e driver, from Maxim.
2) Addition of two new per-cgroup BPF hooks for getsockopt and
setsockopt along with a new sockopt program type which allows more
fine-grained pass/reject settings for containers. Also add a sock_ops
callback that can be selectively enabled on a per-socket basis and is
executed for every RTT to help tracking TCP statistics, both features
from Stanislav.
3) Follow-up fix from loops in precision tracking which was not propagating
precision marks and as a result verifier assumed that some branches were
not taken and therefore wrongly removed as dead code, from Alexei.
4) Fix BPF cgroup release synchronization race which could lead to a
double-free if a leaf's cgroup_bpf object is released and a new BPF
program is attached to the one of ancestor cgroups in parallel, from Roman.
5) Support for bulking XDP_TX on veth devices which improves performance
in some cases by around 9%, from Toshiaki.
6) Allow for lookups into BPF devmap and improve feedback when calling into
bpf_redirect_map() as lookup is now performed right away in the helper
itself, from Toke.
7) Add support for fq's Earliest Departure Time to the Host Bandwidth
Manager (HBM) sample BPF program, from Lawrence.
8) Various cleanups and minor fixes all over the place from many others.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
René van Dorst [Wed, 3 Jul 2019 18:42:04 +0000 (20:42 +0200)]
net: ethernet: mediatek: Fix overlapping capability bits.
Both MTK_TRGMII_MT7621_CLK and MTK_PATH_BIT are defined as bit 10.
This can causes issues on non-MT7621 devices which has the
MTK_PATH_BIT(MTK_ETH_PATH_GMAC1_RGMII) and MTK_TRGMII capability set.
The wrong TRGMII setup code can be executed. The current wrongly executed
code doesn’t do any harm on MT7623 and the TRGMII setup for the MT7623
SOC side is done in MT7530 driver So it wasn’t noticed in the test.
Move all capability bits in one enum so that they are all unique and easy
to expand in the future.
Because mtk_eth_path enum is merged in to mkt_eth_capabilities, the
variable path value is no longer between 0 to number of paths,
mtk_eth_path_name can’t be used anymore in this form. Convert the
mtk_eth_path_name array to a function to lookup the pathname.
The old code walked thru the mtk_eth_path enum, which is also merged
with mkt_eth_capabilities. Expand array mtk_eth_muxc so it can store the
name and capability bit of the mux. Convert the code so it can walk thru
the mtk_eth_muxc array.
Fixes: 8efaa653a8a5 ("net: ethernet: mediatek: Add MT7621 TRGMII mode support")
Signed-off-by: René van Dorst <opensource@vdorst.com>
v1->v2:
- Move all capability bits in one enum, suggested by Willem de Bruijn
- Convert the mtk_eth_path_name array to a function to lookup the pathname
- Expand array mtk_eth_muxc so it can also store the name and capability
bit of the mux
- Updated commit message
Signed-off-by: David S. Miller <davem@davemloft.net>
Weifeng Voon [Wed, 3 Jul 2019 16:59:10 +0000 (00:59 +0800)]
net: stmmac: Enable dwmac4 jumbo frame more than 8KiB
Enable GMAC v4.xx and beyond to support 16KiB buffer.
Signed-off-by: Weifeng Voon <weifeng.voon@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vincent Bernat [Tue, 2 Jul 2019 17:43:54 +0000 (19:43 +0200)]
bonding: add an option to specify a delay between peer notifications
Currently, gratuitous ARP/ND packets are sent every `miimon'
milliseconds. This commit allows a user to specify a custom delay
through a new option, `peer_notif_delay'.
Like for `updelay' and `downdelay', this delay should be a multiple of
`miimon' to avoid managing an additional work queue. The configuration
logic is copied from `updelay' and `downdelay'. However, the default
value cannot be set using a module parameter: Netlink or sysfs should
be used to configure this feature.
When setting `miimon' to 100 and `peer_notif_delay' to 500, we can
observe the 500 ms delay is respected:
20:30:19.354693 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28
20:30:19.874892 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28
20:30:20.394919 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28
20:30:20.914963 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28
In bond_mii_monitor(), I have tried to keep the lock logic readable.
The change is due to the fact we cannot rely on a notification to
lower the value of `bond->send_peer_notif' as `NETDEV_NOTIFY_PEERS' is
only triggered once every N times, while we need to decrement the
counter each time.
iproute2 also needs to be updated to be able to specify this new
attribute through `ip link'.
Signed-off-by: Vincent Bernat <vincent@bernat.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Thu, 4 Jul 2019 12:36:51 +0000 (13:36 +0100)]
net: ethernet: sun: remove redundant assignment to variable err
The variable err is being assigned with a value that is never
read and it is being updated in the next statement with a new value.
The assignment is redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Zhang [Tue, 2 Jul 2019 10:02:30 +0000 (13:02 +0300)]
net/mlx5: Add rts2rts_qp_counters_set_id field in hca cap
Add rts2rts_qp_counters_set_id field in hca cap so that RTS2RTS
qp modification can be used to change the counter of a QP.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Colin Ian King [Wed, 3 Jul 2019 16:50:37 +0000 (17:50 +0100)]
gve: fix -ENOMEM null check on a page allocation
Currently the check to see if a page is allocated is incorrect
and is checking if the pointer page is null, not *page as
intended. Fix this.
Addresses-Coverity: ("Dereference before null check")
Fixes: f5cedc84a30d ("gve: Add transmit and receive support")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 3 Jul 2019 20:51:54 +0000 (13:51 -0700)]
Merge branch 'net-ICW-sendmsg-recvmsg'
Paolo Abeni says:
====================
net: use ICW for sk_proto->{send,recv}msg
This series extends ICW usage to one of the few remaining spots in fast-path
still hitting per packet retpoline overhead, namely the sk_proto->{send,recv}msg
calls.
The first 3 patches in this series refactor the existing code so that applying
the ICW macros is straight-forward: we demux inet_{recv,send}msg in ipv4 and
ipv6 variants so that each of them can easily select the appropriate TCP or UDP
direct call. While at it, a new helper is created to avoid excessive code
duplication, and the current ICWs for inet_{recv,send}msg are adjusted
accordingly.
The last 2 patches really introduce the new ICW use-case, respectively for the
ipv6 and the ipv4 code path.
This gives up to 5% performance improvement under UDP flood, and smaller but
measurable gains for TCP RR workloads.
v1 -> v2:
- drop inet6_{recv,send}msg declaration from header file,
prefer ICW macro instead
- avoid unneeded reclaration for udp_sendmsg, as suggested by Willem
====================
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 3 Jul 2019 14:06:56 +0000 (16:06 +0200)]
ipv4: use indirect call wrappers for {tcp, udp}_{recv, send}msg()
This avoids an indirect call per syscall for common ipv4 transports
v1 -> v2:
- avoid unneeded reclaration for udp_sendmsg, as suggested by Willem
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 3 Jul 2019 14:06:55 +0000 (16:06 +0200)]
ipv6: use indirect call wrappers for {tcp, udpv6}_{recv, send}msg()
This avoids an indirect call per syscall for common ipv6 transports
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 3 Jul 2019 14:06:54 +0000 (16:06 +0200)]
net: adjust socket level ICW to cope with ipv6 variant of {recv, send}msg
After the previous patch we have ipv{6,4} variants for {recv,send}msg,
we should use the generic _INET ICW variant to call into the proper
build-in.
This also allows dropping the now unused and rather ugly _INET4 ICW macro
v1 -> v2:
- use ICW macro to declare inet6_{recv,send}msg
- fix a couple of checkpatch offender in the code context
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 3 Jul 2019 14:06:53 +0000 (16:06 +0200)]
ipv6: provide and use ipv6 specific version for {recv, send}msg
This will simplify indirect call wrapper invocation in the following
patch.
No functional change intended, any - out-of-tree - IPv6 user of
inet_{recv,send}msg can keep using the existing functions.
SCTP code still uses the existing version even for ipv6: as this series
will not add ICW for SCTP, moving to the new helper would not give
any benefit.
The only other in-kernel user of inet_{recv,send}msg is
pvcalls_conn_back_read(), but psvcalls explicitly creates only IPv4 socket,
so no need to update that code path, too.
v1 -> v2: drop inet6_{recv,send}msg declaration from header file,
prefer ICW macro instead
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 3 Jul 2019 14:06:52 +0000 (16:06 +0200)]
inet: factor out inet_send_prepare()
The same code is replicated verbatim in multiple places, and the next
patches will introduce an additional user for it. Factor out a
helper and use it where appropriate. No functional change intended.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Tue, 2 Jul 2019 14:12:09 +0000 (17:12 +0300)]
net/mlx5: Properly name the generic WQE control field
A generic WQE control field is used for different purposes
in different cases.
Use union to allow using the proper name in each case.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eran Ben Elisha [Wed, 3 Apr 2019 10:05:50 +0000 (13:05 +0300)]
net/mlx5: Introduce TLS TX offload hardware bits and structures
Add TLS offload related IFC structs, layouts and enumerations.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Parav Pandit [Fri, 7 Jun 2019 12:16:58 +0000 (07:16 -0500)]
net/mlx5: Refactor mlx5_esw_query_functions for modularity
Functions change event output data size changes when functions other
than VFs will be enabled in HCA CAP.
With current API, multiple callers needs to align, calculate accurate
size of the output data depending on number on non VF functions enabled
in the device.
Instead of duplicating such math at multiple places, refactor
mlx5_esw_query_functions() to return raw output allocated by itself.
Caller must free the allocated memory using kvfree() as described in the
function comment section.
This hides calcuation within mlx5_esw_query_functions() and provides
simpler API.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Parav Pandit [Fri, 7 Jun 2019 11:44:17 +0000 (06:44 -0500)]
net/mlx5: E-Switch prepare functions change handler to be modular
Eswitch function change handler will service multiple type of events for
VFs and non VF functions update.
Hence, introduce and use the helper function
esw_vfs_changed_event_handler() for handling change in num VFs to improve
the code readability.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Parav Pandit [Wed, 15 May 2019 05:04:27 +0000 (00:04 -0500)]
net/mlx5: Introduce and use mlx5_eswitch_get_total_vports()
Instead MLX5_TOTAL_VPORTS, use mlx5_eswitch_get_total_vports().
mlx5_eswitch_get_total_vports() in subsequent patch accounts for SF
vports as well.
Expanding MLX5_TOTAL_VPORTS macro would require exposing SF internals to
more generic vport.h header file. Such exposure is not desired.
Hence a mlx5_eswitch_get_total_vports() is introduced.
Given that mlx5_eswitch_get_total_vports() API wants to work on const
mlx5_core_dev*, change its helper functions also to accept const *dev.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Colin Ian King [Wed, 3 Jul 2019 08:32:14 +0000 (09:32 +0100)]
qlcnic: remove redundant assignment to variable err
The variable err is being initialized with a value that is never
read and it is being updated later with a new value. The
initialization is redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Wed, 3 Jul 2019 07:53:58 +0000 (08:53 +0100)]
atl1c: remove redundant assignment to variable tpd_req
The variable tpd_req is being initialized with a value that is never
read and it is being updated later with a new value. The
initialization is redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sudarsana Reddy Kalluru [Wed, 3 Jul 2019 06:01:59 +0000 (23:01 -0700)]
qed: Add support for Timestamping the unicast PTP packets.
This patch adds driver changes to detect/timestamp the unicast PTP packets.
Changes from previous version:
-------------------------------
v2: Defined a macro for unicast ptp param mask.
Please consider applying this to "net-next".
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Catherine Sullivan [Tue, 2 Jul 2019 22:46:57 +0000 (15:46 -0700)]
gve: Fix u64_stats_sync to initialize start
u64_stats_fetch_begin needs to initialize start.
Signed-off-by: Catherine Sullivan <csully@google.com>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mahesh Bandewar [Wed, 3 Jul 2019 06:16:31 +0000 (23:16 -0700)]
loopback: fix lockdep splat
dev_init_scheduler() and dev_activate() expect the caller to
hold RTNL. Since we don't want blackhole device to be initialized
per ns, we are initializing at init.
[ 3.855027] Call Trace:
[ 3.855034] dump_stack+0x67/0x95
[ 3.855037] lockdep_rcu_suspicious+0xd5/0x110
[ 3.855044] dev_init_scheduler+0xe3/0x120
[ 3.855048] ? net_olddevs_init+0x60/0x60
[ 3.855050] blackhole_netdev_init+0x45/0x6e
[ 3.855052] do_one_initcall+0x6c/0x2fa
[ 3.855058] ? rcu_read_lock_sched_held+0x8c/0xa0
[ 3.855066] kernel_init_freeable+0x1e5/0x288
[ 3.855071] ? rest_init+0x260/0x260
[ 3.855074] kernel_init+0xf/0x180
[ 3.855076] ? rest_init+0x260/0x260
[ 3.855078] ret_from_fork+0x24/0x30
Fixes: 4de83b88c66 ("loopback: create blackhole net device similar to loopack.")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yishai Hadas [Sun, 30 Jun 2019 16:23:28 +0000 (19:23 +0300)]
net/mlx5: Expose device definitions for object events
Expose an extra device definitions for objects events.
It includes: object_type values for legacy objects and generic data
header for any other object.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Yishai Hadas [Sun, 30 Jun 2019 16:23:27 +0000 (19:23 +0300)]
net/mlx5: Report EQE data upon CQ completion
Report EQE data upon CQ completion to let upper layers use this data.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Yishai Hadas [Sun, 30 Jun 2019 16:23:26 +0000 (19:23 +0300)]
net/mlx5: Report a CQ error event only when a handler was set
Report a CQ error event only when a handler was set.
This enables mlx5_ib to not set a handler upon CQ creation and use some
other mechanism to get this event as of other events by the
mlx5_eq_notifier_register API.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Yishai Hadas [Sun, 30 Jun 2019 16:23:25 +0000 (19:23 +0300)]
net/mlx5: mlx5_core_create_cq() enhancements
Enhance mlx5_core_create_cq() to get the command out buffer from the
callers to let them use the output.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Yishai Hadas [Sun, 30 Jun 2019 16:23:24 +0000 (19:23 +0300)]
net/mlx5: Expose the API to register for ANY event
Expose the API to register for ANY event, mlx5_ib will be able to use
this functionality for its needs.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Yishai Hadas [Sun, 30 Jun 2019 16:23:23 +0000 (19:23 +0300)]
net/mlx5: Use event mask based on device capabilities
Use the reported device capabilities for the supported user events (i.e.
affiliated and un-affiliated) to set the EQ mask.
As the event mask can be up to 256 defined by 4 entries of u64 change
the applicable code to work accordingly.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Yishai Hadas [Sun, 30 Jun 2019 16:23:22 +0000 (19:23 +0300)]
net/mlx5: Fix mlx5_core_destroy_cq() error flow
The firmware command to destroy a CQ might fail when the object is
referenced by other object and the ref count is managed by the firmware.
To enable a second successful destruction post the first failure need to
change mlx5_eq_del_cq() to be a void function.
As an error in mlx5_eq_del_cq() is quite fatal from the option to
recover, a debug message inside it should be good enougth and it was
changed to be void.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Daniel Borkmann [Wed, 3 Jul 2019 14:52:03 +0000 (16:52 +0200)]
Merge branch 'bpf-tcp-rtt-hook'
Stanislav Fomichev says:
====================
Congestion control team would like to have a periodic callback to
track some TCP statistics. Let's add a sock_ops callback that can be
selectively enabled on a socket by socket basis and is executed for
every RTT. BPF program frequency can be further controlled by calling
bpf_ktime_get_ns and bailing out early.
I run neper tcp_stream and tcp_rr tests with the sample program
from the last patch and didn't observe any noticeable performance
difference.
v2:
* add a comment about second accept() in selftest (Yonghong Song)
* refer to tcp_bpf.readme in sample program (Yonghong Song)
====================
Suggested-by: Eric Dumazet <edumazet@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Stanislav Fomichev [Tue, 2 Jul 2019 16:14:03 +0000 (09:14 -0700)]
samples/bpf: fix tcp_bpf.readme detach command
Copy-paste, should be detach, not attach.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Stanislav Fomichev [Tue, 2 Jul 2019 16:14:02 +0000 (09:14 -0700)]
samples/bpf: add sample program that periodically dumps TCP stats
Uses new RTT callback to dump stats every second.
$ mkdir -p /tmp/cgroupv2
$ mount -t cgroup2 none /tmp/cgroupv2
$ mkdir -p /tmp/cgroupv2/foo
$ echo $$ >> /tmp/cgroupv2/foo/cgroup.procs
$ bpftool prog load ./tcp_dumpstats_kern.o /sys/fs/bpf/tcp_prog
$ bpftool cgroup attach /tmp/cgroupv2/foo sock_ops pinned /sys/fs/bpf/tcp_prog
$ bpftool prog tracelog
$ # run neper/netperf/etc
Used neper to compare performance with and without this program attached
and didn't see any noticeable performance impact.
Sample output:
<idle>-0 [015] ..s. 2074.128800: 0: dsack_dups=0 delivered=242526
<idle>-0 [015] ..s. 2074.128808: 0: delivered_ce=0 icsk_retransmits=0
<idle>-0 [015] ..s. 2075.130133: 0: dsack_dups=0 delivered=323599
<idle>-0 [015] ..s. 2075.130138: 0: delivered_ce=0 icsk_retransmits=0
<idle>-0 [005] .Ns. 2076.131440: 0: dsack_dups=0 delivered=404648
<idle>-0 [005] .Ns. 2076.131447: 0: delivered_ce=0 icsk_retransmits=0
Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Stanislav Fomichev [Tue, 2 Jul 2019 16:14:01 +0000 (09:14 -0700)]
selftests/bpf: test BPF_SOCK_OPS_RTT_CB
Make sure the callback is invoked for syn-ack and data packet.
Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Stanislav Fomichev [Tue, 2 Jul 2019 16:14:00 +0000 (09:14 -0700)]
bpf/tools: sync bpf.h
Sync new bpf_tcp_sock fields and new BPF_PROG_TYPE_SOCK_OPS RTT callback.
Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Stanislav Fomichev [Tue, 2 Jul 2019 16:13:59 +0000 (09:13 -0700)]
bpf: add icsk_retransmits to bpf_tcp_sock
Add some inet_connection_sock fields to bpf_tcp_sock that might be useful
for debugging congestion control issues.
Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Stanislav Fomichev [Tue, 2 Jul 2019 16:13:58 +0000 (09:13 -0700)]
bpf: add dsack_dups/delivered{, _ce} to bpf_tcp_sock
Add more fields to bpf_tcp_sock that might be useful for debugging
congestion control issues.
Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Stanislav Fomichev [Tue, 2 Jul 2019 16:13:57 +0000 (09:13 -0700)]
bpf: split shared bpf_tcp_sock and bpf_sock_ops implementation
We've added bpf_tcp_sock member to bpf_sock_ops and don't expect
any new tcp_sock fields in bpf_sock_ops. Let's remove
CONVERT_COMMON_TCP_SOCK_FIELDS so bpf_tcp_sock can be independently
extended.
Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Stanislav Fomichev [Tue, 2 Jul 2019 16:13:56 +0000 (09:13 -0700)]
bpf: add BPF_CGROUP_SOCK_OPS callback that is executed on every RTT
Performance impact should be minimal because it's under a new
BPF_SOCK_OPS_RTT_CB_FLAG flag that has to be explicitly enabled.
Suggested-by: Eric Dumazet <edumazet@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Priyaranjan Jha <priyarjha@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jiri Benc [Tue, 2 Jul 2019 18:26:51 +0000 (20:26 +0200)]
selftests: bpf: standardize to static __always_inline
The progs for bpf selftests use several different notations to force
function inlining. Standardize to what most of them use,
static __always_inline.
Suggested-by: Song Liu <liu.song.a23@gmail.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
brakmo [Tue, 2 Jul 2019 22:09:52 +0000 (15:09 -0700)]
bpf: Add support for fq's EDT to HBM
Adds support for fq's Earliest Departure Time to HBM (Host Bandwidth
Manager). Includes a new BPF program supporting EDT, and also updates
corresponding programs.
It will drop packets with an EDT of more than 500us in the future
unless the packet belongs to a flow with less than 2 packets in flight.
This is done so each flow has at least 2 packets in flight, so they
will not starve, and also to help prevent delayed ACK timeouts.
It will also work with ECN enabled traffic, where the packets will be
CE marked if their EDT is more than 50us in the future.
The table below shows some performance numbers. The flows are back to
back RPCS. One server sending to another, either 2 or 4 flows.
One flow is a 10KB RPC, the rest are 1MB RPCs. When there are more
than one flow of a given RPC size, the numbers represent averages.
The rate limit applies to all flows (they are in the same cgroup).
Tests ending with "-edt" ran with the new BPF program supporting EDT.
Tests ending with "-hbt" ran on top HBT qdisc with the specified rate
(i.e. no HBM). The other tests ran with the HBM BPF program included
in the HBM patch-set.
EDT has limited value when using DCTCP, but it helps in many cases when
using Cubic. It usually achieves larger link utilization and lower
99% latencies for the 1MB RPCs.
HBM ends up queueing a lot of packets with its default parameter values,
reducing the goodput of the 10KB RPCs and increasing their latency. Also,
the RTTs seen by the flows are quite large.
Aggr 10K 10K 10K 1MB 1MB 1MB
Limit rate drops RTT rate P90 P99 rate P90 P99
Test rate Flows Mbps % us Mbps us us Mbps ms ms
-------- ---- ----- ---- ----- --- ---- ---- ---- ---- ---- ----
cubic 1G 2 904 0.02 108 257 511 539 647 13.4 24.5
cubic-edt 1G 2 982 0.01 156 239 656 967 743 14.0 17.2
dctcp 1G 2 977 0.00 105 324 408 744 653 14.5 15.9
dctcp-edt 1G 2 981 0.01 142 321 417 811 660 15.7 17.0
cubic-htb 1G 2 919 0.00 1825 40 2822 4140 879 9.7 9.9
cubic 200M 2 155 0.30 220 81 532 655 74 283 450
cubic-edt 200M 2 188 0.02 222 87 1035 1095 101 84 85
dctcp 200M 2 188 0.03 111 77 912 939 111 76 325
dctcp-edt 200M 2 188 0.03 217 74 1416 1738 114 76 79
cubic-htb 200M 2 188 0.00 5015 8 14ms 15ms 180 48 50
cubic 1G 4 952 0.03 110 165 516 546 262 38 154
cubic-edt 1G 4 973 0.01 190 111 1034 1314 287 65 79
dctcp 1G 4 951 0.00 103 180 617 905 257 37 38
dctcp-edt 1G 4 967 0.00 163 151 732 1126 272 43 55
cubic-htb 1G 4 914 0.00 3249 13 7ms 8ms 300 29 34
cubic 5G 4 4236 0.00 134 305 490 624 1310 10 17
cubic-edt 5G 4 4865 0.00 156 306 425 759 1520 10 16
dctcp 5G 4 4936 0.00 128 485 221 409 1484 7 9
dctcp-edt 5G 4 4924 0.00 148 390 392 623 1508 11 26
v1 -> v2: Incorporated Andrii's suggestions
v2 -> v3: Incorporated Yonghong's suggestions
v3 -> v4: Removed credit update that is not needed
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Leo Yan [Tue, 2 Jul 2019 10:25:31 +0000 (18:25 +0800)]
bpf, libbpf, smatch: Fix potential NULL pointer dereference
Based on the following report from Smatch, fix the potential NULL
pointer dereference check:
tools/lib/bpf/libbpf.c:3493
bpf_prog_load_xattr() warn: variable dereferenced before check 'attr'
(see line 3483)
3479 int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
3480 struct bpf_object **pobj, int *prog_fd)
3481 {
3482 struct bpf_object_open_attr open_attr = {
3483 .file = attr->file,
3484 .prog_type = attr->prog_type,
^^^^^^
3485 };
At the head of function, it directly access 'attr' without checking
if it's NULL pointer. This patch moves the values assignment after
validating 'attr' and 'attr->file'.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Andrii Nakryiko [Tue, 2 Jul 2019 15:16:20 +0000 (08:16 -0700)]
libbpf: fix GCC8 warning for strncpy
GCC8 started emitting warning about using strncpy with number of bytes
exactly equal destination size, which is generally unsafe, as can lead
to non-zero terminated string being copied. Use IFNAMSIZ - 1 as number
of bytes to ensure name is always zero-terminated.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Alexei Starovoitov [Fri, 28 Jun 2019 16:24:09 +0000 (09:24 -0700)]
bpf: fix precision tracking
When equivalent state is found the current state needs to propagate precision marks.
Otherwise the verifier will prune the search incorrectly.
There is a price for correctness:
before before broken fixed
cnst spill precise precise
bpf_lb-DLB_L3.o 1923 8128 1863 1898
bpf_lb-DLB_L4.o 3077 6707 2468 2666
bpf_lb-DUNKNOWN.o 1062 1062 544 544
bpf_lxc-DDROP_ALL.o 166729 380712 22629 36823
bpf_lxc-DUNKNOWN.o 174607 440652 28805 45325
bpf_netdev.o 8407 31904 6801 7002
bpf_overlay.o 5420 23569 4754 4858
bpf_lxc_jit.o 39389 359445 50925 69631
Overall precision tracking is still very effective.
Fixes: b5dc0163d8fd ("bpf: precise scalar_value tracking")
Reported-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Tested-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Petr Machata [Tue, 2 Jul 2019 19:06:47 +0000 (19:06 +0000)]
mlxsw: spectrum_ptp: Fix validation in mlxsw_sp1_ptp_packet_finish()
Before mlxsw_sp1_ptp_packet_finish() sends the packet back, it validates
whether the corresponding port is still valid. However the condition is
incorrect: when mlxsw_sp_port == NULL, the code dereferences the port to
compare it to skb->dev.
The condition needs to check whether the port is present and skb->dev still
refers to that port (or else is NULL). If that does not hold, bail out.
Add a pair of parentheses to fix the condition.
Fixes: d92e4e6e33c8 ("mlxsw: spectrum: PTP: Support timestamping on Spectrum-1")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Tue, 2 Jul 2019 18:46:09 +0000 (20:46 +0200)]
r8169: add random MAC address fallback
It was reported that the GPD MicroPC is broken in a way that no valid
MAC address can be read from the network chip. The vendor driver deals
with this by assigning a random MAC address as fallback. So let's do
the same.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Tue, 2 Jul 2019 05:59:17 +0000 (07:59 +0200)]
Revert "r8169: improve handling VLAN tag"
This reverts commit
759d095741721888b6ee51afa74e0a66ce65e974.
The patch was based on a misunderstanding. As Al Viro pointed out [0]
it's simply wrong on big endian. So let's revert it.
[0] https://marc.info/?t=
156200975600004&r=1&w=2
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Blumenstingl [Mon, 1 Jul 2019 22:42:25 +0000 (00:42 +0200)]
net: stmmac: make "snps,reset-delays-us" optional again
Commit
760f1dc2958022 ("net: stmmac: add sanity check to
device_property_read_u32_array call") introduced error checking of the
device_property_read_u32_array() call in stmmac_mdio_reset().
This results in the following error when the "snps,reset-delays-us"
property is not defined in devicetree:
invalid property snps,reset-delays-us
This sanity check made sense until commit
84ce4d0f9f55b4 ("net: stmmac:
initialize the reset delay array") ensured that there are fallback
values for the reset delay if the "snps,reset-delays-us" property is
absent. That was at the cost of making that property mandatory though.
Drop the sanity check for device_property_read_u32_array() and thus make
the "snps,reset-delays-us" property optional again (avoiding the error
message while loading the stmmac driver with a .dtb where the property
is absent).
Fixes: 760f1dc2958022 ("net: stmmac: add sanity check to device_property_read_u32_array call")
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 1 Jul 2019 17:48:51 +0000 (10:48 -0700)]
bonding/main: fix NULL dereference in bond_select_active_slave()
A bonding master can be up while best_slave is NULL.
[12105.636318] BUG: unable to handle kernel NULL pointer dereference at
0000000000000000
[12105.638204] mlx4_en: eth1: Linkstate event 1 -> 1
[12105.648984] IP: bond_select_active_slave+0x125/0x250
[12105.653977] PGD 0 P4D 0
[12105.656572] Oops: 0000 [#1] SMP PTI
[12105.660487] gsmi: Log Shutdown Reason 0x03
[12105.664620] Modules linked in: kvm_intel loop act_mirred uhaul vfat fat stg_standard_ftl stg_megablocks stg_idt stg_hdi stg elephant_dev_num stg_idt_eeprom w1_therm wire i2c_mux_pca954x i2c_mux mlx4_i2c i2c_usb cdc_acm ehci_pci ehci_hcd i2c_iimc mlx4_en mlx4_ib ib_uverbs ib_core mlx4_core [last unloaded: kvm_intel]
[12105.685686] mlx4_core 0000:03:00.0: dispatching link up event for port 2
[12105.685700] mlx4_en: eth2: Linkstate event 2 -> 1
[12105.685700] mlx4_en: eth2: Link Up (linkstate)
[12105.724452] Workqueue: bond0 bond_mii_monitor
[12105.728854] RIP: 0010:bond_select_active_slave+0x125/0x250
[12105.734355] RSP: 0018:
ffffaf146a81fd88 EFLAGS:
00010246
[12105.739637] RAX:
0000000000000003 RBX:
ffff8c62b03c6900 RCX:
0000000000000000
[12105.746838] RDX:
0000000000000000 RSI:
ffffaf146a81fd08 RDI:
ffff8c62b03c6000
[12105.754054] RBP:
ffffaf146a81fdb8 R08:
0000000000000001 R09:
ffff8c517d387600
[12105.761299] R10:
00000000001075d9 R11:
ffffffffaceba92f R12:
0000000000000000
[12105.768553] R13:
ffff8c8240ae4800 R14:
0000000000000000 R15:
0000000000000000
[12105.775748] FS:
0000000000000000(0000) GS:
ffff8c62bfa40000(0000) knlGS:
0000000000000000
[12105.783892] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[12105.789716] CR2:
0000000000000000 CR3:
0000000d0520e001 CR4:
00000000001626f0
[12105.796976] Call Trace:
[12105.799446] [<
ffffffffac31d387>] bond_mii_monitor+0x497/0x6f0
[12105.805317] [<
ffffffffabd42643>] process_one_work+0x143/0x370
[12105.811225] [<
ffffffffabd42c7a>] worker_thread+0x4a/0x360
[12105.816761] [<
ffffffffabd48bc5>] kthread+0x105/0x140
[12105.821865] [<
ffffffffabd42c30>] ? rescuer_thread+0x380/0x380
[12105.827757] [<
ffffffffabd48ac0>] ? kthread_associate_blkcg+0xc0/0xc0
[12105.834266] [<
ffffffffac600241>] ret_from_fork+0x51/0x60
Fixes: e2a7420df2e0 ("bonding/main: convert to using slave printk macros")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: John Sperbeck <jsperbeck@google.com>
Cc: Jarod Wilson <jarod@redhat.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Mon, 1 Jul 2019 16:57:19 +0000 (00:57 +0800)]
tipc: remove ub->ubsock checks
Both tipc_udp_enable and tipc_udp_disable are called under rtnl_lock,
ub->ubsock could never be NULL in tipc_udp_disable and cleanup_bearer,
so remove the check.
Also remove the one in tipc_udp_enable by adding "free" label.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Brivio [Sat, 29 Jun 2019 17:55:08 +0000 (19:55 +0200)]
ipv4: Fix off-by-one in route dump counter without netlink strict checking
In commit
ee28906fd7a1 ("ipv4: Dump route exceptions if requested") I
added a counter of per-node dumped routes (including actual routes and
exceptions), analogous to the existing counter for dumped nodes. Dumping
exceptions means we need to also keep track of how many routes are dumped
for each node: this would be just one route per node, without exceptions.
When netlink strict checking is not enabled, we dump both routes and
exceptions at the same time: the RTM_F_CLONED flag is not used as a
filter. In this case, the per-node counter 'i_fa' is incremented by one
to track the single dumped route, then also incremented by one for each
exception dumped, and then stored as netlink callback argument as skip
counter, 's_fa', to be used when a partial dump operation restarts.
The per-node counter needs to be increased by one also when we skip a
route (exception) due to a previous non-zero skip counter, because it
needs to match the existing skip counter, if we are dumping both routes
and exceptions. I missed this, and only incremented the counter, for
regular routes, if the previous skip counter was zero. This means that,
in case of a mixed dump, partial dump operations after the first one
will start with a mismatching skip counter value, one less than expected.
This means in turn that the first exception for a given node is skipped
every time a partial dump operation restarts, if netlink strict checking
is not enabled (iproute < 5.0).
It turns out I didn't repeat the test in its final version, commit
de755a85130e ("selftests: pmtu: Introduce list_flush_ipv4_exception test
case"), which also counts the number of route exceptions returned, with
iproute2 versions < 5.0 -- I was instead using the equivalent of the IPv6
test as it was before commit
b964641e9925 ("selftests: pmtu: Make
list_flush_ipv6_exception test more demanding").
Always increment the per-node counter by one if we previously dumped
a regular route, so that it matches the current skip counter.
Fixes: ee28906fd7a1 ("ipv4: Dump route exceptions if requested")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
René van Dorst [Sat, 29 Jun 2019 12:24:51 +0000 (14:24 +0200)]
net: ethernet: mediatek: Allow non TRGMII mode with MT7621 DDR2 devices
No reason to error out on a MT7621 device with DDR2 memory when non
TRGMII mode is selected.
Only MT7621 DDR2 clock setup is not supported for TRGMII mode.
But non TRGMII mode doesn't need any special clock setup.
Signed-off-by: René van Dorst <opensource@vdorst.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells [Tue, 2 Jul 2019 14:55:28 +0000 (15:55 +0100)]
rxrpc: Fix uninitialized error code in rxrpc_send_data_packet()
With gcc 4.1:
net/rxrpc/output.c: In function ‘rxrpc_send_data_packet’:
net/rxrpc/output.c:338: warning: ‘ret’ may be used uninitialized in this function
Indeed, if the first jump to the send_fragmentable label is made, and
the address family is not handled in the switch() statement, ret will be
used uninitialized.
Fix this by BUG()'ing as is done in other places in rxrpc where internal
support for future address families will need adding. It should not be
possible to reach this normally as the address families are checked
up-front.
Fixes: 5a924b8951f835b5 ("rxrpc: Don't store the rxrpc header in the Tx queue sk_buffs")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Tue, 2 Jul 2019 13:16:42 +0000 (14:16 +0100)]
nfc: st-nci: remove redundant assignment to variable r
The variable r is being initialized with a value that is never
read and it is being updated later with a new value. The
initialization is redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xue Chaojing [Mon, 1 Jul 2019 23:40:00 +0000 (23:40 +0000)]
hinic: remove standard netdev stats
This patch removes standard netdev stats in ethtool -S.
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Xue Chaojing <xuechaojing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Tue, 2 Jul 2019 09:12:10 +0000 (11:12 +0200)]
net: stmmac: Re-word Kconfig entry
We support many speeds and it doesn't make much sense to list them all
in the Kconfig. Let's just call it Multi-Gigabit.
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 2 Jul 2019 02:36:35 +0000 (19:36 -0700)]
Merge branch 'Add-gve-driver'
Catherine Sullivan says:
====================
Add gve driver
This patch series adds the gve driver which will support the
Compute Engine Virtual NIC that will be available in the future.
v2:
- Patch 1:
- Remove gve_size_assert.h and use static_assert instead.
- Loop forever instead of bugging if the device won't reset
- Use module_pci_driver
- Patch 2:
- Use be16_to_cpu in the RX Seq No define
- Remove unneeded ndo_change_mtu
- Patch 3:
- No Changes
- Patch 4:
- Instead of checking netif_carrier_ok in ethtool stats, just make sure
v3:
- Patch 1:
- Remove X86 dep
- Patch 2:
- No changes
- Patch 3:
- No changes
- Patch 4:
- Remove unneeded memsets in ethtool stats
v4:
- Patch 1:
- Use io[read|write]32be instead of [read|write]l(cpu_to_be32())
- Explicitly add padding to gve_adminq_set_driver_parameter
- Use static where appropriate
- Patch 2:
- Use u64_stats_sync
- Explicity add padding to gve_adminq_create_rx_queue
- Fix some enianness typing issues found by kbuild
- Use static where appropriate
- Remove unused variables
- Patch 3:
- Use io[read|write]32be instead of [read|write]l(cpu_to_be32())
- Patch 4:
- Use u64_stats_sync
- Use static where appropriate
Warnings reported by:
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Catherine Sullivan [Mon, 1 Jul 2019 22:57:55 +0000 (15:57 -0700)]
gve: Add ethtool support
Add support for the following ethtool commands:
ethtool -s|--change devname [msglvl N] [msglevel type on|off]
ethtool -S|--statistics devname
ethtool -i|--driver devname
ethtool -l|--show-channels devname
ethtool -L|--set-channels devname
ethtool -g|--show-ring devname
ethtool --reset devname
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Sagi Shahar <sagis@google.com>
Signed-off-by: Jon Olson <jonolson@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Catherine Sullivan [Mon, 1 Jul 2019 22:57:54 +0000 (15:57 -0700)]
gve: Add workqueue and reset support
Add support for the workqueue to handle management interrupts and
support for resets.
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Sagi Shahar <sagis@google.com>
Signed-off-by: Jon Olson <jonolson@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Catherine Sullivan [Mon, 1 Jul 2019 22:57:53 +0000 (15:57 -0700)]
gve: Add transmit and receive support
Add support for passing traffic.
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Sagi Shahar <sagis@google.com>
Signed-off-by: Jon Olson <jonolson@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Catherine Sullivan [Mon, 1 Jul 2019 22:57:52 +0000 (15:57 -0700)]
gve: Add basic driver framework for Compute Engine Virtual NIC
Add a driver framework for the Compute Engine Virtual NIC that will be
available in the future.
At this point the only functionality is loading the driver.
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Sagi Shahar <sagis@google.com>
Signed-off-by: Jon Olson <jonolson@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 2 Jul 2019 02:34:46 +0000 (19:34 -0700)]
Merge branch 'blackhole-device-to-invalidate-dst'
Mahesh Bandewar says:
====================
blackhole device to invalidate dst
When we invalidate dst or mark it "dead", we assign 'lo' to
dst->dev. First of all this assignment is racy and more over,
it has MTU implications.
The standard dev MTU is 1500 while the Loopback MTU is 64k. TCP
code when dereferencing the dst don't check if the dst is valid
or not. TCP when dereferencing a dead-dst while negotiating a
new connection, may use dst device which is 'lo' instead of
using the correct device. Consider the following scenario:
A SYN arrives on an interface and tcp-layer while processing
SYNACK finds a dst and associates it with SYNACK skb. Now before
skb gets passed to L3 for processing, if that dst gets "dead"
(because of the virtual device getting disappeared & then reappeared),
the 'lo' gets assigned to that dst (lo MTU = 64k). Let's assume
the SYN has ADV_MSS set as 9k while the output device through
which this SYNACK is going to go out has standard MTU of 1500.
The MTU check during the route check passes since MIN(9K, 64K)
is 9k and TCP successfully negotiates 9k MSS. The subsequent
data packet; bigger in size gets passed to the device and it
won't be marked as GSO since the assumed MTU of the device is
9k.
This either crashes the NIC and we have seen fixes that went
into drivers to handle this scenario.
8914a595110a ('bnx2x:
disable GSO where gso_size is too big for hardware') and
2b16f048729b ('net: create skb_gso_validate_mac_len()') and
with those fixes TCP eventually recovers but not before
few dropped segments.
Well, I'm not a TCP expert and though we have experienced
these corner cases in our environment, I could not reproduce
this case reliably in my test setup to try this fix myself.
However, Michael Chan <michael.chan@broadcom.com> had a setup
where these fixes helped him mitigate the issue and not cause
the crash.
The idea here is to not alter the data-path with additional
locks or smb()/rmb() barriers to avoid racy assignments but
to create a new device that has really low MTU that has
.ndo_start_xmit essentially a kfree_skb(). Make use of this
device instead of 'lo' when marking the dst dead.
First patch implements the blackhole device and second
patch uses it in IPv4 and IPv6 stack while the third patch
is the self test that ensures the sanity of this device.
v1->v2
fixed the self-test patch to handle the conflict
v2 -> v3
fixed Kconfig text/string.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mahesh Bandewar [Mon, 1 Jul 2019 21:39:01 +0000 (14:39 -0700)]
blackhole_dev: add a selftest
Since this is not really a device with all capabilities, this test
ensures that it has *enough* to make it through the data path
without causing unwanted side-effects (read crash!).
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mahesh Bandewar [Mon, 1 Jul 2019 21:38:57 +0000 (14:38 -0700)]
blackhole_netdev: use blackhole_netdev to invalidate dst entries
Use blackhole_netdev instead of 'lo' device with lower MTU when marking
dst "dead".
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Tested-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mahesh Bandewar [Mon, 1 Jul 2019 21:38:49 +0000 (14:38 -0700)]
loopback: create blackhole net device similar to loopack.
Create a blackhole net device that can be used for "dead"
dst entries instead of loopback device. This blackhole device differs
from loopback in few aspects: (a) It's not per-ns. (b) MTU on this
device is ETH_MIN_MTU (c) The xmit function is essentially kfree_skb().
and (d) since it's not registered it won't have ifindex.
Lower MTU effectively make the device not pass the MTU check during
the route check when a dst associated with the skb is dead.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Kelam [Sun, 30 Jun 2019 14:29:49 +0000 (19:59 +0530)]
net: ethernet: broadcom: bcm63xx_enet: Remove unneeded memset
Remove unneeded memset as alloc_etherdev is using kvzalloc which uses
__GFP_ZERO flag
Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 2 Jul 2019 02:27:08 +0000 (19:27 -0700)]
Merge branch 'net-netsec-Add-XDP-Support'
Ilias Apalodimas says:
====================
net: netsec: Add XDP Support
This is a respin of https://www.spinics.net/lists/netdev/msg526066.html
Since page_pool API fixes are merged into net-next we can now safely use
it's DMA mapping capabilities.
First patch changes the buffer allocation from napi/netdev_alloc_frag()
to page_pool API. Although this will lead to slightly reduced performance
(on raw packet drops only) we can use the API for XDP buffer recycling.
Another side effect is a slight increase in memory usage, due to using a
single page per packet.
The second patch adds XDP support on the driver.
There's a bunch of interesting options that come up due to the single
Tx queue.
Locking is needed(to avoid messing up the Tx queues since ndo_xdp_xmit
and the normal stack can co-exist). We also need to track down the
'buffer type' for TX and properly free or recycle the packet depending
on it's nature.
Changes since RFC:
- Bug fixes from Jesper and Maciej
- Added page pool API to retrieve the DMA direction
Changes since v1:
- Use page_pool_free correctly if xdp_rxq_info_reg() failed
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilias Apalodimas [Sat, 29 Jun 2019 05:23:25 +0000 (08:23 +0300)]
net: netsec: add XDP support
The interface only supports 1 Tx queue so locking is introduced on
the Tx queue if XDP is enabled to make sure .ndo_start_xmit and
.ndo_xdp_xmit won't corrupt Tx ring
- Performance (SMMU off)
Benchmark XDP_SKB XDP_DRV
xdp1 291kpps 344kpps
rxdrop 282kpps 342kpps
- Performance (SMMU on)
Benchmark XDP_SKB XDP_DRV
xdp1 167kpps 324kpps
rxdrop 164kpps 323kpps
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>