openwrt/staging/blogic.git
6 years agonet: hns3: Fix ping exited problem when doing lp selftest
Yunsheng Lin [Mon, 3 Sep 2018 10:21:51 +0000 (11:21 +0100)]
net: hns3: Fix ping exited problem when doing lp selftest

When ping is runnig and user executes the loopback selftest, the
ping cmd will stop and exit.

This patch fixes it by using the hns3_nic_net_open/stop to offline
the netdev when doing loopback selftest.

Fixes: c39c4d98dc65 ("net: hns3: Add mac loopback selftest support in hns3 driver")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Fix for loopback selftest failed problem
Yunsheng Lin [Mon, 3 Sep 2018 10:21:50 +0000 (11:21 +0100)]
net: hns3: Fix for loopback selftest failed problem

Tqp and mac need to be enabled when doing loopback selftest,
ae_algo->ops->start/stop is used to do the job, there is a
time window between ae_algo->ops->start/stop and loopback setup,
which will cause selftest failed problem when there is frame
coming in during that time window.

This patch fixes it by enabling the tqp and mac during loopback
setup process.

Fixes: c39c4d98dc65 ("net: hns3: Add mac loopback selftest support in hns3 driver")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Implement shutdown ops in hns3 pci driver
Yunsheng Lin [Mon, 3 Sep 2018 10:21:49 +0000 (11:21 +0100)]
net: hns3: Implement shutdown ops in hns3 pci driver

This patch implements shutdown ops in hns3 pci driver, which
unloads the hns3 driver and set the power state to D3hot.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Fix error of checking used vlan id
Jian Shen [Mon, 3 Sep 2018 10:21:48 +0000 (11:21 +0100)]
net: hns3: Fix error of checking used vlan id

PF uses hdev->vlan_table to manage the port vlan table. In function
hclge_set_vlan_filter_hw(), it checks whether a vlan id has been used,
by foreach all the vport bits. It should use macro HCLGE_VPORT_NUM,
not VLAN_N_VID as the foreach condition.

Fixes: 6c251711b37f ("net: hns3: Disable vf vlan filter when vf vlan table is full")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Fix for multicast failure
Huazhong Tan [Mon, 3 Sep 2018 10:21:47 +0000 (11:21 +0100)]
net: hns3: Fix for multicast failure

When the lower 24 bits of the IPV6 link-local addresses at both
ends are the same, the multicast MAC address for Neigbour Discovery
is the same. The multicast for Neigbour Discovery will fail.

This patch fixes it by including the bonding uplink port in the
multicast group.

Fixes: 46a3df9f9718("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Fix for vf vlan delete failed problem
Yunsheng Lin [Mon, 3 Sep 2018 10:21:46 +0000 (11:21 +0100)]
net: hns3: Fix for vf vlan delete failed problem

There are only 128 entries in vf vlan table, if user has added
more than 128 vlan, fw will ignore it and disable the vf vlan
table. So when user deletes the vlan entry that has not been
set to vf vlan table, fw will return not found result and driver
treat that as error, which will cause vlan delete failed problem.

This patch fixes it by returning ok when fw returns not found
result.

Fixes: 6c251711b37f ("net: hns3: Disable vf vlan filter when vf vlan table is full")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: correct structure parameter comments for topsrv
Zhenbo Gao [Mon, 3 Sep 2018 06:08:40 +0000 (14:08 +0800)]
tipc: correct structure parameter comments for topsrv

Remove the following obsolete parameter comments of tipc_topsrv struct:
  @rcvbuf_cache
  @tipc_conn_new
  @tipc_conn_release
  @tipc_conn_recvmsg
  @imp
  @type

Add the comments for the missing parameters below of tipc_topsrv struct:
  @awork
  @listener

Remove the unused or duplicated parameter comments of tipc_conn struct:
  @outqueue_lock
  @rx_action

Signed-off-by: Zhenbo Gao <zhenbo.gao@windriver.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoliquidio: Added delayed work for periodically updating the link statistics.
Pradeep Nalla [Sat, 1 Sep 2018 00:44:07 +0000 (17:44 -0700)]
liquidio: Added delayed work for periodically updating the link statistics.

Signed-off-by: Pradeep Nalla <pradeep.nalla@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'mlx5e-IPoIB-stats'
David S. Miller [Sun, 2 Sep 2018 23:22:43 +0000 (16:22 -0700)]
Merge branch 'mlx5e-IPoIB-stats'

Tariq Toukan says:

====================
mlx5e IPoIB stats

This patchset by Feras contains statistics enhancements and NDO
implementation for the mlx5e IPoIB driver.

Series generated against net-next commit:
2d5c28859839 net: bgmac: remove set but not used variable 'err'
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/mlx5e: IPoIB, Use priv stats in completion rx flow
Feras Daoud [Sun, 2 Sep 2018 19:12:10 +0000 (22:12 +0300)]
net/mlx5e: IPoIB, Use priv stats in completion rx flow

Since the RQs are shared between all pkey interfaces, the stats
should be taken from where the per-ring stats are stored instead
of the parent RQ.

Fixes: 4c6c615e3f30 ("net/mlx5e: IPoIB, Add PKEY child interface nic profile")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/mlx5e: IPoIB, Add ndo stats support for IPoIB child devices
Feras Daoud [Sun, 2 Sep 2018 19:12:09 +0000 (22:12 +0300)]
net/mlx5e: IPoIB, Add ndo stats support for IPoIB child devices

Expose RX and TX counters by implementing ndo_get_stats64 operation
for child devices.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/mlx5e: IPoIB, Add ndo stats support for IPoIB netdevices
Feras Daoud [Sun, 2 Sep 2018 19:12:08 +0000 (22:12 +0300)]
net/mlx5e: IPoIB, Add ndo stats support for IPoIB netdevices

Expose RX and TX counters by implementing ndo_get_stats64 operation for
both parent devices.
After this change, all the relevant statistics can be retrieved using
ifconfig.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/mlx5e: IPoIB, Initialize max_opened_tc in mlx5i_init flow
Feras Daoud [Sun, 2 Sep 2018 19:12:07 +0000 (22:12 +0300)]
net/mlx5e: IPoIB, Initialize max_opened_tc in mlx5i_init flow

Enhanced ipoib does not initialize max_opened_tc causing wrong ethtool
statistics. As mlx5e_grp_sw_update_stats relies on this variable, without
this change, the TX statistics will not be updated.

Fixes: 05909babce53 ("net/mlx5e: Avoid reset netdev stats on configuration changes")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'Full-phylink-support-for-mv88e6352'
David S. Miller [Sun, 2 Sep 2018 23:16:24 +0000 (16:16 -0700)]
Merge branch 'Full-phylink-support-for-mv88e6352'

Andrew Lunn says:

====================
Full phylink support for mv88e6352

These two patches implement full phylink support for the mv88e6352
family, when using an SFP connected to its SERDES interface. This adds
interrupt support to the SERDES, so that we get interrupts on link
up/down, and then make calls phydev_link_change().

The first patch is a minor bug fix, which does not seem to affect any
current features, so i'm not submitting it for stable. It is however
required for configuring SERDES interrupts.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: Add SERDES phydev_link_change for 6352
Andrew Lunn [Sun, 2 Sep 2018 16:13:15 +0000 (18:13 +0200)]
net: dsa: mv88e6xxx: Add SERDES phydev_link_change for 6352

The 6352 family has one SERDES interface, which can be used by either
port 4 or port 5. Add interrupt support for the SERDES interface, and
report when the link status changes.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: Fix writing to a PHY page.
Andrew Lunn [Sun, 2 Sep 2018 16:13:14 +0000 (18:13 +0200)]
net: dsa: mv88e6xxx: Fix writing to a PHY page.

After changing to the needed page, actually write the value to the
register!

Fixes: 09cb7dfd3f14 ("net: dsa: mv88e6xxx: describe PHY page and SerDes")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotcp: remove useless add operation when init sysctl_max_tw_buckets
Yafang Shao [Sat, 1 Sep 2018 12:21:05 +0000 (20:21 +0800)]
tcp: remove useless add operation when init sysctl_max_tw_buckets

cp_hashinfo.ehash_mask is always an odd number, which is set in function
alloc_large_system_hash(). See bellow,
        if (_hash_mask)
                *_hash_mask = (1 << log2qty) - 1; <<< always odd number

Hence the local variable 'cnt' is a even number, as a result of that it is
no difference to do the incrementation here.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: nixge: Fix Kconfig warning with OF_MDIO
Moritz Fischer [Fri, 31 Aug 2018 20:30:54 +0000 (13:30 -0700)]
net: nixge: Fix Kconfig warning with OF_MDIO

Fix Kconfig warning with OF_MDIO where OF_MDIO was
selected unconditionally instead of only when
OF is actually enabled.

Fixes 7e8d5755be0e ("net: nixge: Add support for 64-bit platforms")
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'mvneta-some-small-improvements'
David S. Miller [Sun, 2 Sep 2018 21:13:31 +0000 (14:13 -0700)]
Merge branch 'mvneta-some-small-improvements'

Jisheng Zhang says:

====================
net: mvneta: some small improvements

patch1 removes the NETIF_F_GRO check ourself, because the net subsystem
will handle it for us.

patch2 enables NETIF_F_RXCSUM by default, since the driver and HW
supports the feature.

patch3 is a small optimization, to reduce smp_processor_id() calling
in mvneta_tx_done_gbe.

since v1:
 - based on net-next tree
 - remove the fix patches, since they should be based on net branch.
 - Add Gregory's Reviewed-by tag
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: mvneta: reduce smp_processor_id() calling in mvneta_tx_done_gbe
Jisheng Zhang [Fri, 31 Aug 2018 08:11:09 +0000 (16:11 +0800)]
net: mvneta: reduce smp_processor_id() calling in mvneta_tx_done_gbe

In the loop of mvneta_tx_done_gbe(), we call the smp_processor_id()
each time, move the call out of the loop to optimize the code a bit.

Before the patch, the loop looks like(under arm64):

        ldr     x1, [x29,#120]
        ...
        ldr     w24, [x1,#36]
        ...
        bl      0 <_raw_spin_lock>
        str     w24, [x27,#132]
        ...

After the patch, the loop looks like(under arm64):

        ...
        bl      0 <_raw_spin_lock>
        str     w23, [x28,#132]
        ...
where w23 is loaded so be ready before the loop.

>From another side, mvneta_tx_done_gbe() is called from mvneta_poll()
which is in non-preemptible context, so it's safe to call the
smp_processor_id() function once.

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Reviewed-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: mvneta: enable NETIF_F_RXCSUM by default
Jisheng Zhang [Fri, 31 Aug 2018 08:10:03 +0000 (16:10 +0800)]
net: mvneta: enable NETIF_F_RXCSUM by default

The code and HW supports NETIF_F_RXCSUM, so let's enable it by default.

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Reviewed-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: mvneta: Don't check NETIF_F_GRO ourself
Jisheng Zhang [Fri, 31 Aug 2018 08:09:13 +0000 (16:09 +0800)]
net: mvneta: Don't check NETIF_F_GRO ourself

napi_gro_receive() checks NETIF_F_GRO bit as well, if the bit is not
set, we will go through GRO_NORMAL in napi_skb_finish(), so fall back
to netif_receive_skb_internal(), so we don't need to check NETIF_F_GRO
ourself.

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Reviewed-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/sched: fix type of htb statistics
Florent Fourcot [Thu, 30 Aug 2018 14:39:23 +0000 (16:39 +0200)]
net/sched: fix type of htb statistics

tokens and ctokens are defined as s64 in htb_class structure,
and clamped to 32bits value during netlink dumps:

cl->xstats.tokens = clamp_t(s64, PSCHED_NS2TICKS(cl->tokens),
                            INT_MIN, INT_MAX);

Defining it as u32 is working since userspace (tc) is printing it as
signed int, but a correct definition from the beginning is probably
better.

In the same time, 'giants' structure member is unused since years, so
update the comment to mark it unused.

Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: bgmac: remove set but not used variable 'err'
YueHaibing [Sat, 1 Sep 2018 03:13:10 +0000 (03:13 +0000)]
net: bgmac: remove set but not used variable 'err'

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/broadcom/bgmac.c: In function 'bgmac_dma_alloc':
drivers/net/ethernet/broadcom/bgmac.c:619:6: warning:
 variable 'err' set but not used [-Wunused-but-set-variable]

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: b53: Provide sensible defaults
Florian Fainelli [Fri, 31 Aug 2018 19:29:49 +0000 (12:29 -0700)]
net: dsa: b53: Provide sensible defaults

The SRAB driver is the default way to communicate with the integrated
switch on iProc platforms and the MMAP driver is the way to communicate
with the integrated switch on DSL BCM63xx and CM BCM33xx.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoliquidio: remove set but not used variable 'irh'
YueHaibing [Fri, 31 Aug 2018 12:03:56 +0000 (12:03 +0000)]
liquidio: remove set but not used variable 'irh'

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/cavium/liquidio/request_manager.c: In function 'lio_process_iq_request_list':
drivers/net/ethernet/cavium/liquidio/request_manager.c:383:27: warning:
 variable 'irh' set but not used [-Wunused-but-set-variable]

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoqed: Lower the severity of a dcbx log message.
Sudarsana Reddy Kalluru [Fri, 31 Aug 2018 11:10:17 +0000 (04:10 -0700)]
qed: Lower the severity of a dcbx log message.

Driver displays an error message for each unrecognized dcbx TLV that's
received from the peer or configured on the device. It is observed that
syslog will be flooded with such messages in certain scenarios e.g.,
frequent link-flaps/lldp-transactions. Changing the severity of this
message to verbose level as it's not an error scenario/message.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agords: store socket timestamps as ktime_t
Arnd Bergmann [Wed, 29 Aug 2018 15:47:19 +0000 (17:47 +0200)]
rds: store socket timestamps as ktime_t

rds is the last in-kernel user of the old do_gettimeofday()
function. Convert it over to ktime_get_real() to make it
work more like the generic socket timestamps, and to let
us kill off do_gettimeofday().

A follow-up patch will have to change the user space interface
to deal better with 32-bit tasks, which may use an incompatible
layout for 'struct timespec'.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests/tls: Add test for recv(PEEK) spanning across multiple records
Vakul Garg [Wed, 29 Aug 2018 10:00:14 +0000 (15:30 +0530)]
selftests/tls: Add test for recv(PEEK) spanning across multiple records

Added test case to receive multiple records with a single recvmsg()
operation with a MSG_PEEK set.

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/tls: Add support for async decryption of tls records
Vakul Garg [Wed, 29 Aug 2018 09:56:55 +0000 (15:26 +0530)]
net/tls: Add support for async decryption of tls records

When tls records are decrypted using asynchronous acclerators such as
NXP CAAM engine, the crypto apis return -EINPROGRESS. Presently, on
getting -EINPROGRESS, the tls record processing stops till the time the
crypto accelerator finishes off and returns the result. This incurs a
context switch and is not an efficient way of accessing the crypto
accelerators. Crypto accelerators work efficient when they are queued
with multiple crypto jobs without having to wait for the previous ones
to complete.

The patch submits multiple crypto requests without having to wait for
for previous ones to complete. This has been implemented for records
which are decrypted in zero-copy mode. At the end of recvmsg(), we wait
for all the asynchronous decryption requests to complete.

The references to records which have been sent for async decryption are
dropped. For cases where record decryption is not possible in zero-copy
mode, asynchronous decryption is not used and we wait for decryption
crypto api to complete.

For crypto requests executing in async fashion, the memory for
aead_request, sglists and skb etc is freed from the decryption
completion handler. The decryption completion handler wakesup the
sleeping user context when recvmsg() flags that it has done sending
all the decryption requests and there are no more decryption requests
pending to be completed.

Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Reviewed-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: remove set but not used variable 'rx_stats'
YueHaibing [Fri, 31 Aug 2018 04:08:01 +0000 (04:08 +0000)]
bnxt_en: remove set but not used variable 'rx_stats'

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c: In function 'bnxt_vf_rep_rx':
drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c:212:28: warning:
 variable 'rx_stats' set but not used [-Wunused-but-set-variable]
  struct bnxt_vf_rep_stats *rx_stats;

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: remove duplicated include from net_failover.c
YueHaibing [Fri, 31 Aug 2018 03:44:27 +0000 (03:44 +0000)]
net: remove duplicated include from net_failover.c

Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: stmmac: Add CBS support in XGMAC2
Jose Abreu [Thu, 30 Aug 2018 14:09:48 +0000 (15:09 +0100)]
net: stmmac: Add CBS support in XGMAC2

XGMAC2 uses the same CBS mechanism as GMAC5, only registers offset
changes. Lets use the same TC callbacks and implement the .config_cbs
callback in XGMAC2 core.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'dpaa2-eth-Move-DPAA2-Ethernet-driver'
David S. Miller [Sun, 2 Sep 2018 00:16:59 +0000 (17:16 -0700)]
Merge branch 'dpaa2-eth-Move-DPAA2-Ethernet-driver'

Ioana Radulescu says:

====================
dpaa2-eth: Move DPAA2 Ethernet driver

The Freescale/NXP DPAA2 Ethernet driver was first included in
drivers/staging, due to its dependencies on two components located
there at the time of its initial submission:
* the fsl-mc bus driver, which was moved to drivers/bus in kernel 4.17
* the dpio driver, which was moved to drivers/soc/fsl in kernel 4.18

More information on the DPAA2 architecture and the interactions
between the fsl-mc bus and the objects present on it can be found in:
Documentation/networking/dpaa2/overview.rst

For easier review, the patch is generated without the -M option,
although the driver files are moved without any code changes.

changes since v1[1]:
* remove RFC label, since dependencies have been merged on net-next
* add patch fixing a possible race at probe (reported by Andrew Lunn)

[1] https://lore.kernel.org/patchwork/patch/971333/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agodpaa2-eth: Move DPAA2 Ethernet driver from staging to drivers/net
Ioana Radulescu [Wed, 29 Aug 2018 09:42:40 +0000 (04:42 -0500)]
dpaa2-eth: Move DPAA2 Ethernet driver from staging to drivers/net

The DPAA2 Ethernet driver supports Freescale/NXP SoCs with DPAA2
(DataPath Acceleration Architecture v2). The driver manages
network objects discovered on the fsl-mc bus.

Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agostaging: fsl-dpaa2/eth: Delay netdev_register() call
Ioana Radulescu [Wed, 29 Aug 2018 09:42:39 +0000 (04:42 -0500)]
staging: fsl-dpaa2/eth: Delay netdev_register() call

Only call netdev_register() at the end of the probe function,
once all other necessary bits and pieces are properly initialized.

We keep the rest of the netdevice initialization code in place,
at the earlier point of the probing sequence, including the
settings previously done in ndo_init.

Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobridge: Switch to bitmap_zalloc()
Andy Shevchenko [Thu, 30 Aug 2018 10:33:18 +0000 (13:33 +0300)]
bridge: Switch to bitmap_zalloc()

Switch to bitmap_zalloc() to show clearly what we are allocating.
Besides that it returns pointer of bitmap type instead of opaque void *.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: Share main switch IRQ
Marek Behún [Thu, 30 Aug 2018 00:13:50 +0000 (02:13 +0200)]
net: dsa: mv88e6xxx: Share main switch IRQ

On some boards the interrupt can be shared between multiple devices.
For example on Turris Mox the interrupt is shared between all switches.

Signed-off-by: Marek Behun <marek.behun@nic.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/ipv6: Do not reset nl_net in ip6_route_info_create
David Ahern [Wed, 29 Aug 2018 23:54:01 +0000 (16:54 -0700)]
net/ipv6: Do not reset nl_net in ip6_route_info_create

nl_net is set on entry to ip6_route_info_create. Only devices
within that namespace are considered so no need to reset it
before returning.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/ipv4: Add extack message that dev is required for ONLINK
David Ahern [Wed, 29 Aug 2018 23:53:27 +0000 (16:53 -0700)]
net/ipv4: Add extack message that dev is required for ONLINK

Make IPv4 consistent with IPv6 and return an extack message that the
ONLINK flag requires a nexthop device.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotcp: change IPv6 flow-label upon receiving spurious retransmission
Yuchung Cheng [Wed, 29 Aug 2018 21:53:56 +0000 (14:53 -0700)]
tcp: change IPv6 flow-label upon receiving spurious retransmission

Currently a Linux IPv6 TCP sender will change the flow label upon
timeouts to potentially steer away from a data path that has gone
bad. However this does not help if the problem is on the ACK path
and the data path is healthy. In this case the receiver is likely
to receive repeated spurious retransmission because the sender
couldn't get the ACKs in time and has recurring timeouts.

This patch adds another feature to mitigate this problem. It
leverages the DSACK states in the receiver to change the flow
label of the ACKs to speculatively re-route the ACK packets.
In order to allow triggering on the second consecutive spurious
RTO, the receiver changes the flow label upon sending a second
consecutive DSACK for a sequence number below RCV.NXT.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet_sched: add missing tcf_lock for act_connmark
Cong Wang [Wed, 29 Aug 2018 17:15:36 +0000 (10:15 -0700)]
net_sched: add missing tcf_lock for act_connmark

According to the new locking rule, we have to take tcf_lock
for both ->init() and ->dump(), as RTNL will be removed.
However, it is missing for act_connmark.

Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoveth: add software timestamping
Michael Walle [Wed, 29 Aug 2018 15:24:11 +0000 (17:24 +0200)]
veth: add software timestamping

Provide a software TX timestamp as well as the ethtool query interface
and report the software timestamp capabilities.

Tested with "ethtool -T" and two linuxptp instances each bound to a
tunnel endpoint.

Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoRevert "net: sched: act: add extack for lookup callback"
Cong Wang [Wed, 29 Aug 2018 17:15:35 +0000 (10:15 -0700)]
Revert "net: sched: act: add extack for lookup callback"

This reverts commit 331a9295de23 ("net: sched: act: add extack for lookup callback").

This extack is never used after 6 months... In fact, it can be just
set in the caller, right after ->lookup().

Cc: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
David S. Miller [Sat, 1 Sep 2018 00:41:08 +0000 (17:41 -0700)]
Merge git://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2018-09-01

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Add AF_XDP zero-copy support for i40e driver (!), from Björn and Magnus.

2) BPF verifier improvements by giving each register its own liveness
   chain which allows to simplify and getting rid of skip_callee() logic,
   from Edward.

3) Add bpf fs pretty print support for percpu arraymap, percpu hashmap
   and percpu lru hashmap. Also add generic percpu formatted print on
   bpftool so the same can be dumped there, from Yonghong.

4) Add bpf_{set,get}sockopt() helper support for TCP_SAVE_SYN and
   TCP_SAVED_SYN options to allow reflection of tos/tclass from received
   SYN packet, from Nikita.

5) Misc improvements to the BPF sockmap test cases in terms of cgroup v2
   interaction and removal of incorrect shutdown() calls, from John.

6) Few cleanups in xdp_umem_assign_dev() and xdpsock samples, from Prashant.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoxsk: i40e: get rid of useless struct xdp_umem_props
Magnus Karlsson [Fri, 31 Aug 2018 11:40:02 +0000 (13:40 +0200)]
xsk: i40e: get rid of useless struct xdp_umem_props

This commit gets rid of the structure xdp_umem_props. It was there to
be able to break a dependency at one point, but this is no longer
needed. The values in the struct are instead stored directly in the
xdp_umem structure. This simplifies the xsk code as well as af_xdp
zero-copy drivers and as a bonus gets rid of one internal header file.

The i40e driver is also adapted to the new interface in this commit.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agoi40e: fix possible compiler warning in xsk TX path
Magnus Karlsson [Fri, 31 Aug 2018 11:40:01 +0000 (13:40 +0200)]
i40e: fix possible compiler warning in xsk TX path

With certain gcc versions, it was possible to get the warning
"'tx_desc' may be used uninitialized in this function" for the
i40e_xmit_zc. This was not possible, however this commit simplifies
the code path so that this warning is no longer emitted.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agobpf: add selftest for bpf's (set|get)_sockopt for SAVE_SYN
Nikita V. Shirokov [Fri, 31 Aug 2018 16:43:47 +0000 (09:43 -0700)]
bpf: add selftest for bpf's (set|get)_sockopt for SAVE_SYN

adding selftest for feature, introduced in commit 9452048c79404 ("bpf:
add TCP_SAVE_SYN/TCP_SAVED_SYN options for bpf_(set|get)sockopt").

Signed-off-by: Nikita V. Shirokov <tehnerd@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agosamples/bpf: xdpsock, minor fixes
Prashant Bhole [Fri, 31 Aug 2018 01:00:49 +0000 (10:00 +0900)]
samples/bpf: xdpsock, minor fixes

- xsks_map size was fixed to 4, changed it MAX_SOCKS
- Remove redundant definition of MAX_SOCKS in xdpsock_user.c
- In dump_stats(), add NULL check for xsks[i]

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agoxsk: remove unnecessary assignment
Prashant Bhole [Fri, 31 Aug 2018 00:59:16 +0000 (09:59 +0900)]
xsk: remove unnecessary assignment

Since xdp_umem_query() was added one assignment of bpf.command was
missed from cleanup. Removing the assignment statement.

Fixes: 84c6b86875e01a0 ("xsk: don't allow umem replace at stack level")
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agobpf: add TCP_SAVE_SYN/TCP_SAVED_SYN sample program
Nikita V. Shirokov [Thu, 30 Aug 2018 14:51:54 +0000 (07:51 -0700)]
bpf: add TCP_SAVE_SYN/TCP_SAVED_SYN sample program

Sample program which shows TCP_SAVE_SYN/TCP_SAVED_SYN usage example:
bpf program which is doing TOS/TCLASS reflection (server would reply
with a same TOS/TCLASS as client).

Signed-off-by: Nikita V. Shirokov <tehnerd@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agobpf: add TCP_SAVE_SYN/TCP_SAVED_SYN options for bpf_(set|get)sockopt
Nikita V. Shirokov [Thu, 30 Aug 2018 14:51:53 +0000 (07:51 -0700)]
bpf: add TCP_SAVE_SYN/TCP_SAVED_SYN options for bpf_(set|get)sockopt

Adding support for two new bpf get/set sockopts: TCP_SAVE_SYN (set)
and TCP_SAVED_SYN (get). This would allow for bpf program to build
logic based on data from ingress SYN packet (e.g. doing tcp's tos/
tclass reflection (see sample prog)) and do it transparently from
userspace program point of view.

Signed-off-by: Nikita V. Shirokov <tehnerd@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agoxdp: remove redundant variable 'headroom'
Colin Ian King [Thu, 30 Aug 2018 14:27:18 +0000 (15:27 +0100)]
xdp: remove redundant variable 'headroom'

Variable 'headroom' is being assigned but is never used hence it is
redundant and can be removed.

Cleans up clang warning:
variable ‘headroom’ set but not used [-Wunused-but-set-variable]

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agoMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Fri, 31 Aug 2018 21:08:34 +0000 (14:08 -0700)]
Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2018-08-30

This series contains updates to i40e, i40evf and virtchnl.

Jake implements helper functions to use an array to handle the queue
stats which reduces the boiler plate code as well as keep the complexity
localized to a few functions.

Paweł adds the ability to change a VF's MAC address from the host side
without having to reload the VF driver on the guest side.

Paul adds a check to ensure that the number of queues that the PF sends
to the VF is equal to or less than the maximum number of queues the VF
can support.

Mitch fixes an issue caught by GCC 8, where we need to not include the
terminating null in the length of the string for strncpy().

Lihong fixes a VF issue to ensure that it does not enter into
promiscuous mode when macvlan is added to the VF.  Fixed a potential
crash after a VF is removed, since the workqueue sync for the adminq
task was not being cancelled.

Harshitha fixes the type for field_flags in the virtchnl_filter struct.

Martyna removes an unnecessary check in a conditional if statement.

Björn fixes an issue reported by Jesper Dangaard Brouer, where the
driver was reporting incorrect statistics when XDP was enabled.

Jan fixes the potential reporting of incorrect speed settings.

Patryk fixed an issue where the flag
I40EVF_FLAG_AQ_ENABLE_VLAN_STRIPPING was getting set when any offload is
set via ethtool.  Resolved by only setting this flag when VLAN offload
is enabled.  Also ensure we hold the rtnl lock when we are clearing the
interrupt scheme.  Added a check when deleting the MAC address from the
VF to ensure that the MAC address was not set by the PF and if it was,
do not delete it.

v2: updated patch 2 in the series based on community feedback from David
    Miller to inline a function
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoi40e: Prevent deleting MAC address from VF when set by PF
Patryk Małek [Tue, 28 Aug 2018 17:16:09 +0000 (10:16 -0700)]
i40e: Prevent deleting MAC address from VF when set by PF

To prevent VF from deleting MAC address that was assigned by the
PF we need to check for that scenario when we try to delete a MAC
address from a VF.

Signed-off-by: Patryk Małek <patryk.malek@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40evf: cancel workqueue sync for adminq when a VF is removed
Lihong Yang [Tue, 28 Aug 2018 17:16:08 +0000 (10:16 -0700)]
i40evf: cancel workqueue sync for adminq when a VF is removed

If a VF is being removed, there is no need to continue with the
workqueue sync for the adminq task, thus cancel it. Without this call,
when VFs are created and removed right away, there might be a chance for
the driver to crash with events stuck in the adminq.

Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40e: hold the rtnl lock on clearing interrupt scheme
Patryk Małek [Tue, 28 Aug 2018 17:16:03 +0000 (10:16 -0700)]
i40e: hold the rtnl lock on clearing interrupt scheme

Hold the rtnl lock when we're clearing interrupt scheme
in i40e_shutdown and in i40e_remove.

Signed-off-by: Patryk Małek <patryk.malek@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40evf: Don't enable vlan stripping when rx offload is turned on
Patryk Małek [Tue, 28 Aug 2018 17:16:02 +0000 (10:16 -0700)]
i40evf: Don't enable vlan stripping when rx offload is turned on

With current implementation of i40evf_set_features when user sets
any offload via ethtool we set I40EVF_FLAG_AQ_ENABLE_VLAN_STRIPPING
as a required aq which triggers driver to call
i40evf_enable_vlan_stripping. This shouldn't take place.
This patches fixes it by setting the flag only when VLAN offload
is turned on.

Signed-off-by: Patryk Małek <patryk.malek@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40e: Check and correct speed values for link on open
Jan Sokolowski [Tue, 28 Aug 2018 17:16:01 +0000 (10:16 -0700)]
i40e: Check and correct speed values for link on open

If our card has been put in an unstable state due to
other drivers interacting with it, speed settings
might be incorrect. If incorrect, forcefully reset them
on open to known default values.

Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40e: report correct statistics when XDP is enabled
Björn Töpel [Fri, 24 Aug 2018 11:21:59 +0000 (13:21 +0200)]
i40e: report correct statistics when XDP is enabled

When XDP is enabled, the driver will report incorrect
statistics. Received frames will reported as transmitted frames.

This commits fixes the i40e implementation of ndo_get_stats64 (struct
net_device_ops), so that iproute2 will report correct statistics
(e.g. when running "ip -stats link show dev eth0") even when XDP is
enabled.

Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Fixes: 74608d17fe29 ("i40e: add support for XDP_TX action")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40e: static analysis report from community
Martyna Szapar [Mon, 20 Aug 2018 15:12:33 +0000 (08:12 -0700)]
i40e: static analysis report from community

Static analysis tools report a problem from original driver submission.
Removing unnecessary check in condition.

Signed-off-by: Martyna Szapar <martyna.szapar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agovirtchnl: use u8 type for a field in the virtchnl_filter struct
Harshitha Ramamurthy [Mon, 20 Aug 2018 15:12:32 +0000 (08:12 -0700)]
virtchnl: use u8 type for a field in the virtchnl_filter struct

The virtchnl_filter struct has a field called field_flags. A previous
commit mistakenly had the type to be a __u8. What we want is for the
field to be an unsigned 8 bit value, so let's just use the existing
kernel type u8 for that.

Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40evf: set IFF_UNICAST_FLT flag for the VF
Lihong Yang [Mon, 20 Aug 2018 15:12:31 +0000 (08:12 -0700)]
i40evf: set IFF_UNICAST_FLT flag for the VF

Set IFF_UNICAST_FLT flag for the VF to prevent it from entering
promiscuous mode when macvlan is added to the VF.

Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40e: use correct length for strncpy
Mitch Williams [Mon, 20 Aug 2018 15:12:30 +0000 (08:12 -0700)]
i40e: use correct length for strncpy

Caught by GCC 8. When we provide a length for strncpy, we should not
include the terminating null. So we must tell it one less than the size
of the destination buffer.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40evf: Validate the number of queues a PF sends
Paul M Stillwell Jr [Mon, 20 Aug 2018 15:12:29 +0000 (08:12 -0700)]
i40evf: Validate the number of queues a PF sends

A PF can send any number of queues to the VF and the VF may not
be able to support that many. Check to see that the number of
queues is less than or equal to the max number of queues the
VF can have.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40evf: Change a VF mac without reloading the VF driver
Paweł Jabłoński [Mon, 20 Aug 2018 15:12:26 +0000 (08:12 -0700)]
i40evf: Change a VF mac without reloading the VF driver

Add possibility to change a VF mac address from host side
without reloading the VF driver on the guest side. Without
this patch it is not possible to change the VF mac because
executing i40evf_virtchnl_completion function with
VIRTCHNL_OP_GET_VF_RESOURCES opcode resets the VF mac
address to previous value.

Signed-off-by: Paweł Jabłoński <pawel.jablonski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40evf: update ethtool stats code and use helper functions
Jacob Keller [Mon, 20 Aug 2018 15:12:25 +0000 (08:12 -0700)]
i40evf: update ethtool stats code and use helper functions

Fix a bug in the way we handled VF queues, by always showing stats for
the maximum number of queues, even if they aren't allocated. It is not
safe to change the number of strings reported to ethtool, as grabbing
statistics occurs over multiple ethtool ops for which the rtnl_lock()
cannot be held the entire time.

Avoid this by always reporting queue stats for the maximum number of
queues in the netdevice. Share some of the helper functionality for
adding stats with the PF code in i40e_ethtool_stats.h

This should reduce the chance of potential future bugs, and make adding
new statistics easier.

Note for the queue stats, unlike the PF driver we do not keep an array
of queue pointers, but an array of queues, so care must be taken to
avoid accessing queue memory that hasn't yet been allocated.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40e: move ethtool stats boiler plate code to i40e_ethtool_stats.h
Jacob Keller [Mon, 20 Aug 2018 15:12:24 +0000 (08:12 -0700)]
i40e: move ethtool stats boiler plate code to i40e_ethtool_stats.h

Move the boiler plate structures and helper functions we recently
added into their own header file, so that the complete collection is
located together.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoi40e: convert queue stats to i40e_stats array
Jacob Keller [Mon, 20 Aug 2018 15:12:23 +0000 (08:12 -0700)]
i40e: convert queue stats to i40e_stats array

Use an i40e_stats array to handle the queue stats, instead of coding
similar functionality separately. Because of how the queue stats are
accessed on some kernels, we can't easily use i40e_add_ethtool_stats.

Instead, implement a separate helper, i40e_add_queue_stats, which we'll
use instead. This helper will correctly implement the
u64_stats_fetch_begin_irq logic and allow retries until successful. We
share the most complex code by re-using i40e_add_one_ethtool_stat.

This logic additionally easily supports skipping disabled rings by using
a ternary operator before calling the u64_stats_fetch_begin_irq()
function, so that we correctly zero-out the stats values without having
to perform two separate sections of code.

This significantly reduces the boiler plate code in
i40e_get_ethtool_stats, and helps keep the complex logic contained to as
few functions as possible.

With this patch, we've finally converted all the statistics to use the
helpers and the i40e_stats function.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
6 years agoxsk: include XDP meta data in AF_XDP frames
Björn Töpel [Thu, 30 Aug 2018 13:12:48 +0000 (15:12 +0200)]
xsk: include XDP meta data in AF_XDP frames

Previously, the AF_XDP (XDP_DRV/XDP_SKB copy-mode) ingress logic did
not include XDP meta data in the data buffers copied out to the user
application.

In this commit, we check if meta data is available, and if so, it is
prepended to the frame.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agoMerge branch 'bpf-bpffs-bpftool-dump-with-btf'
Daniel Borkmann [Thu, 30 Aug 2018 12:03:54 +0000 (14:03 +0200)]
Merge branch 'bpf-bpffs-bpftool-dump-with-btf'

Yonghong Song says:

====================
Commit a26ca7c982cb ("bpf: btf: Add pretty print support to the
basic arraymap") and Commit 699c86d6ec21 ("bpf: btf: add pretty print
for hash/lru_hash maps") added bpffs pretty print for array, hash and
lru hash maps. The pretty print gives users a structurally formatted
dump for keys/values which much easy to understand than raw bytes.

This patch set implemented bpffs pretty print support for
percpu arraymap, percpu hashmap and percpu lru hashmap.
For complex key/value types, the pretty print here is even more useful
due to:

  . large volumne of data making it even harder to correlate bytes
    to a particular field in a particular cpu.
  . kernel rounds the value size for each cpu to multiple of 8.
    User has to be aware of this otherwise wrong value may be
    derived from cpu 1/2/...

For example, we may have a bpffs pretty print like below:
   43602: {
        cpu0: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
        cpu1: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
        cpu2: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
        cpu3: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
   }
for a percpu map.

This patch also added percpu formatted print on bpftool. For example,
bpftool may print like below:
    {
        "key": 0,
        "values": [{
                "cpu": 0,
                "value": {
                    "ui32": 0,
                    "ui16": 0,
                }
            },{
                "cpu": 1,
                "value": {
                    "ui32": 1,
                    "ui16": 0,
                }
            },{
                "cpu": 2,
                "value": {
                    "ui32": 2,
                    "ui16": 0,
                }
            },{
                "cpu": 3,
                "value": {
                    "ui32": 3,
                    "ui16": 0,
                }
            }
        ]
    }

Patch #1 implemented bpffs pretty print for percpu arraymap/hash/lru_hash
in kernel. Patch #2 added the test case in tools bpf selftest test_btf.
Patch #3 added percpu map btf based dump.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agotools/bpf: bpftool: add btf percpu map formated dump
Yonghong Song [Wed, 29 Aug 2018 21:43:15 +0000 (14:43 -0700)]
tools/bpf: bpftool: add btf percpu map formated dump

The btf pretty print is added to percpu arraymap,
percpu hashmap and percpu lru hashmap.
For each <key, value> pair, the following will be
added to plain/json output:

   {
       "key": <pretty_print_key>,
       "values": [{
             "cpu": 0,
             "value": <pretty_print_value_on_cpu0>
          },{
             "cpu": 1,
             "value": <pretty_print_value_on_cpu1>
          },{
          ....
          },{
             "cpu": n,
             "value": <pretty_print_value_on_cpun>
          }
       ]
   }

For example, the following could be part of plain or json formatted
output:
    {
        "key": 0,
        "values": [{
                "cpu": 0,
                "value": {
                    "ui32": 0,
                    "ui16": 0,
                }
            },{
                "cpu": 1,
                "value": {
                    "ui32": 1,
                    "ui16": 0,
                }
            },{
                "cpu": 2,
                "value": {
                    "ui32": 2,
                    "ui16": 0,
                }
            },{
                "cpu": 3,
                "value": {
                    "ui32": 3,
                    "ui16": 0,
                }
            }
        ]
    }

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agotools/bpf: add bpffs percpu map pretty print tests in test_btf
Yonghong Song [Wed, 29 Aug 2018 21:43:14 +0000 (14:43 -0700)]
tools/bpf: add bpffs percpu map pretty print tests in test_btf

The bpf selftest test_btf is extended to test bpffs
percpu map pretty print for percpu array, percpu hash and
percpu lru hash.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agobpf: add bpffs pretty print for percpu arraymap/hash/lru_hash
Yonghong Song [Wed, 29 Aug 2018 21:43:13 +0000 (14:43 -0700)]
bpf: add bpffs pretty print for percpu arraymap/hash/lru_hash

Added bpffs pretty print for percpu arraymap, percpu hashmap
and percpu lru hashmap.

For each map <key, value> pair, the format is:
   <key_value>: {
cpu0: <value_on_cpu0>
cpu1: <value_on_cpu1>
...
cpun: <value_on_cpun>
   }

For example, on my VM, there are 4 cpus, and
for test_btf test in the next patch:
   cat /sys/fs/bpf/pprint_test_percpu_hash

You may get:
   ...
   43602: {
cpu0: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
cpu1: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
cpu2: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
cpu3: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
   }
   72847: {
cpu0: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE}
cpu1: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE}
cpu2: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE}
cpu3: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE}
   }
   ...

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
6 years agoMerge tag 'mac80211-next-for-davem-2018-08-29' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Thu, 30 Aug 2018 05:13:47 +0000 (22:13 -0700)]
Merge tag 'mac80211-next-for-davem-2018-08-29' of git://git./linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Only a few changes at this point:
 * new channels in 60 GHz
 * clarify (average) ACK signal reporting API
 * expose ieee80211_send_layer2_update() for all drivers
 * start/stop mac80211's TXQs properly when required
 * avoid regulatory restore with IE ignoring
 * spelling: contidion -> condition
 * fully implement WFA Multi-AP backhaul
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agovxlan: reduce dirty cache line in vxlan_find_mac
Li RongQing [Wed, 29 Aug 2018 03:52:10 +0000 (11:52 +0800)]
vxlan: reduce dirty cache line in vxlan_find_mac

vxlan_find_mac() unconditionally set f->used for every packet,
this causes a cache miss for every packet, since remote, hlist
and used of vxlan_fdb share the same cache line, which are
accessed when send every packets.

so f->used is set only if not equal to jiffies, to reduce dirty
cache line times, this gives 3% speed-up with small packets.

Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'liquidio-improve-soft-command-response-handling'
David S. Miller [Thu, 30 Aug 2018 03:07:42 +0000 (20:07 -0700)]
Merge branch 'liquidio-improve-soft-command-response-handling'

Weilin Chang says:

====================
liquidio: improve soft command/response handling

Change soft command handling to fix the possible race condition when the
process handles a response of a soft command that was already freed by an
application which got timeout for this request.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoliquidio: remove obsolete functions and data structures
Felix Manlunas [Wed, 29 Aug 2018 01:51:44 +0000 (18:51 -0700)]
liquidio: remove obsolete functions and data structures

1. Remove unused functions and data structures.
2. Change the sending of the remaining soft commands to synchronous.

Signed-off-by: Weilin Chang <weilin.chang@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoliquidio: change octnic_ctrl_pkt to do synchronous soft commands
Felix Manlunas [Wed, 29 Aug 2018 01:51:40 +0000 (18:51 -0700)]
liquidio: change octnic_ctrl_pkt to do synchronous soft commands

1. Change struct octnic_ctrl_pkt to support synchronous operation.
2. Change code which use structure octnic_ctrl_pkt to send sc's
   synchronously.

Signed-off-by: Weilin Chang <weilin.chang@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoliquidio: make soft command calls synchronous
Felix Manlunas [Wed, 29 Aug 2018 01:51:35 +0000 (18:51 -0700)]
liquidio: make soft command calls synchronous

1. Add wait_for_sc_completion_timeout() for waiting the response and
   handling common response errors
2. Send sc's synchronously: remove unused callback function,
   and context structure; use wait_for_sc_completion_timeout() to wait
   its response.

Signed-off-by: Weilin Chang <weilin.chang@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoliquidio: improve soft command handling
Felix Manlunas [Wed, 29 Aug 2018 01:51:30 +0000 (18:51 -0700)]
liquidio: improve soft command handling

1. Set LIO_SC_MAX_TMO_MS as the maximum timeout value for a soft command
   (sc).  All sc's use this value as a hard timeout value. Add expiry_time
   in struct octeon_soft_command to keep the hard timeout value. The field
   wait_time and timeout in struct octeon_soft_command will be obsoleted in
   the last patch of this patch series.
2. Add processing a synchronous sc in sc response thread
   lio_process_ordered_list. The memory allocated for a synchronous sc will
   be freed by lio_process_ordered_list() to the sc pool.
3. Add two response lists for lio_process_ordered_list to process the
   storage allocated for sc's:
   OCTEON_DONE_SC_LIST response list keeps all sc's which will be freed to
   the pool after their requestors have finished processing the responses.
   OCTEON_ZOMBIE_SC_LIST response list keeps all sc's which have got
   LIO_SC_MAX_TMO_MS timeout.
   When an sc gets a hard timeout, lio_process_order_list() will recheck
   its status 1 ms later. If the status has not updated by the firmware at
   that time, the sc will be removed from OCTEON_DONE_SC_LIST response list
   to OCTEON_ZOMBIE_SC_LIST response list. The sc's in the
   OCTEON_ZOMBIE_SC_LIST response list will be freed when the driver is
   unloaded.

Signed-off-by: Weilin Chang <weilin.chang@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/tls: Calculate nsg for zerocopy path without skb_cow_data.
Doron Roberts-Kedes [Tue, 28 Aug 2018 23:33:57 +0000 (16:33 -0700)]
net/tls: Calculate nsg for zerocopy path without skb_cow_data.

decrypt_skb fails if the number of sg elements required to map it
is greater than MAX_SKB_FRAGS. nsg must always be calculated, but
skb_cow_data adds unnecessary memcpy's for the zerocopy case.

The new function skb_nsg calculates the number of scatterlist elements
required to map the skb without the extra overhead of skb_cow_data.
This patch reduces memcpy by 50% on my encrypted NBD benchmarks.

Reported-by: Vakul Garg <Vakul.garg@nxp.com>
Reviewed-by: Vakul Garg <Vakul.garg@nxp.com>
Tested-by: Vakul Garg <Vakul.garg@nxp.com>
Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: nixge: Add support for 64-bit platforms
Moritz Fischer [Tue, 28 Aug 2018 22:16:31 +0000 (15:16 -0700)]
net: nixge: Add support for 64-bit platforms

Add support for 64-bit platforms to driver.

The hardware only supports 32-bit register accesses
so the accesses need to be split up into two writes
when setting the current and tail descriptor values.

Signed-off-by: Moritz Fischer <mdf@kernel.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoselftests/net: add ip_defrag selftest
Peter Oskolkov [Tue, 28 Aug 2018 18:36:20 +0000 (11:36 -0700)]
selftests/net: add ip_defrag selftest

This test creates a raw IPv4 socket, fragments a largish UDP
datagram and sends the fragments out of order.

Then repeats in a loop with different message and fragment lengths.

Then does the same with overlapping fragments (with overlapping
fragments the expectation is that the recv times out).

Tested:

root@<host># time ./ip_defrag.sh
ipv4 defrag
PASS
ipv4 defrag with overlaps
PASS

real    1m7.679s
user    0m0.628s
sys     0m2.242s

A similar test for IPv6 is to follow.

Signed-off-by: Peter Oskolkov <posk@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoip: fail fast on IP defrag errors
Peter Oskolkov [Tue, 28 Aug 2018 18:36:19 +0000 (11:36 -0700)]
ip: fail fast on IP defrag errors

The current behavior of IP defragmentation is inconsistent:
- some overlapping/wrong length fragments are dropped without
  affecting the queue;
- most overlapping fragments cause the whole frag queue to be dropped.

This patch brings consistency: if a bad fragment is detected,
the whole frag queue is dropped. Two major benefits:
- fail fast: corrupted frag queues are cleared immediately, instead of
  by timeout;
- testing of overlapping fragments is now much easier: any kind of
  random fragment length mutation now leads to the frag queue being
  discarded (IP packet dropped); before this patch, some overlaps were
  "corrected", with tests not seeing expected packet drops.

Note that in one case (see "if (end&7)" conditional) the current
behavior is preserved as there are concerns that this could be
legitimate padding.

Signed-off-by: Peter Oskolkov <posk@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoliquidio: fix race condition in instruction completion processing
Rick Farrington [Tue, 28 Aug 2018 18:32:55 +0000 (11:32 -0700)]
liquidio: fix race condition in instruction completion processing

In lio_enable_irq, the pkt_in_done count register was being cleared to
zero.  However, there could be some completed instructions which were not
yet processed due to budget and limit constraints.
So, only write this register with the number of actual completions
that were processed.

Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoliquidio: remove unnecessary delay when processing IQ responses
Rick Farrington [Tue, 28 Aug 2018 18:19:54 +0000 (11:19 -0700)]
liquidio: remove unnecessary delay when processing IQ responses

Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'ethtool-drop-get_settings-and-set_settings-ops'
David S. Miller [Thu, 30 Aug 2018 02:46:11 +0000 (19:46 -0700)]
Merge branch 'ethtool-drop-get_settings-and-set_settings-ops'

Michal Kubecek says:

====================
ethtool: drop get_settings and set_settings ops

As Andrew Lunn pointed out in recent discussion, there is only one in tree
driver left which still defines deprecated callbacks get_settings() and
set_settings() in ethtool_ops. First patch converts this driver to
get_link_ksettings() and set_link_ksettings(). Second patch then removes
the deprecated callbacks from struct ethtool_ops and ethtool code which
falls back to them.

This doesn't break old versions of ethtool or any other userspace code
using ETHTOOL_{G,S}SET. We still implement both (old) ETHTOOL_{G,S}SET and
(new) ETHTOOL_{G,S}LINKSETTINGS ioctl commands but after this series both
will be implemented only using {g,s}et_link_ksettings(). The only affected
code would be out of tree NIC drivers which have not been converted yet.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoethtool: drop get_settings and set_settings callbacks
Michal Kubecek [Tue, 28 Aug 2018 17:56:58 +0000 (19:56 +0200)]
ethtool: drop get_settings and set_settings callbacks

Since [gs]et_settings ethtool_ops callbacks have been deprecated in
February 2016, all in tree NIC drivers have been converted to provide
[gs]et_link_ksettings() and out of tree drivers have had enough time to do
the same.

Drop get_settings() and set_settings() and implement both ETHTOOL_[GS]SET
and ETHTOOL_[GS]LINKSETTINGS only using [gs]et_link_ksettings().

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago8390/etherh: convert to ethtool_{get, set}_link_ksettings
Michal Kubecek [Tue, 28 Aug 2018 17:56:53 +0000 (19:56 +0200)]
8390/etherh: convert to ethtool_{get, set}_link_ksettings

This is the last in-tree driver using the old {get,set}_settings API.

Note: this is only build tested. I don't have the hardware at hand; as it's
10Mb/s half duplex device and driver can be built only for one subplatform
of 32-bit ARM (Acorn RiscPC), it may be difficult to find someone who does.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: thunderbolt: Convert to use SPDX identifier
Mika Westerberg [Tue, 28 Aug 2018 16:58:43 +0000 (19:58 +0300)]
net: thunderbolt: Convert to use SPDX identifier

This gets rid of the licence boilerblate in favor of SPDX identifier
which only takes a single line comment.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agogenetlink: constify genl_err_attr() argument
Michal Kubecek [Tue, 28 Aug 2018 16:51:58 +0000 (18:51 +0200)]
genetlink: constify genl_err_attr() argument

genl_err_attr() sets netlink_ext_ack::bad_attr which is a pointer to const
struct nlattr so make the attr argument also const.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ethernet: Convert to using %pOFn instead of device_node.name
Rob Herring [Tue, 28 Aug 2018 15:44:30 +0000 (10:44 -0500)]
net: ethernet: Convert to using %pOFn instead of device_node.name

In preparation to remove the node name pointer from struct device_node,
convert printf users to use the %pOFn format specifier.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Cc: Felix Fietkau <nbd@openwrt.org>
Cc: John Crispin <john@phrozen.org>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: Nelson Chang <nelson.chang@mediatek.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Wingman Kwok <w-kwok2@ti.com>
Cc: Murali Karicheri <m-karicheri2@ti.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Acked-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/ncsi: remove duplicated include from ncsi-netlink.c
YueHaibing [Thu, 30 Aug 2018 01:29:39 +0000 (01:29 +0000)]
net/ncsi: remove duplicated include from ncsi-netlink.c

Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'verifier-liveness-simplification'
Alexei Starovoitov [Thu, 30 Aug 2018 01:52:13 +0000 (18:52 -0700)]
Merge branch 'verifier-liveness-simplification'

Edward Cree says:

====================
The first patch is a simplification of register liveness tracking by using
 a separate parentage chain for each register and stack slot, thus avoiding
 the need for logic to handle callee-saved registers when applying read
 marks.  In the future this idea may be extended to form use-def chains.
The second patch adds information about misc/zero data on the stack to the
 state dumps emitted to the log at various points; this information was
 found essential in debugging the first patch, and may be useful elsewhere.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 years agobpf/verifier: display non-spill stack slot types in print_verifier_state
Edward Cree [Wed, 22 Aug 2018 19:02:44 +0000 (20:02 +0100)]
bpf/verifier: display non-spill stack slot types in print_verifier_state

If a stack slot does not hold a spilled register (STACK_SPILL), then each
 of its eight bytes could potentially have a different slot_type.  This
 information can be important for debugging, and previously we either did
 not print anything for the stack slot, or just printed fp-X=0 in the case
 where its first byte was STACK_ZERO.
Instead, print eight characters with either 0 (STACK_ZERO), m (STACK_MISC)
 or ? (STACK_INVALID) for any stack slot which is neither STACK_SPILL nor
 entirely STACK_INVALID.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 years agobpf/verifier: per-register parent pointers
Edward Cree [Wed, 22 Aug 2018 19:02:19 +0000 (20:02 +0100)]
bpf/verifier: per-register parent pointers

By giving each register its own liveness chain, we elide the skip_callee()
 logic.  Instead, each register's parent is the state it inherits from;
 both check_func_call() and prepare_func_exit() automatically connect
 reg states to the correct chain since when they copy the reg state across
 (r1-r5 into the callee as args, and r0 out as the return value) they also
 copy the parent pointer.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 years agoMerge branch 'AF_XDP-zerocopy-for-i40e'
Alexei Starovoitov [Wed, 29 Aug 2018 19:26:39 +0000 (12:26 -0700)]
Merge branch 'AF_XDP-zerocopy-for-i40e'

Björn Töpel says:

====================
This patch set introduces zero-copy AF_XDP support for Intel's i40e
driver. In the first preparatory patch we also add support for
XDP_REDIRECT for zero-copy allocated frames so that XDP programs can
redirect them. This was a ToDo from the first AF_XDP zero-copy patch
set from early June. Special thanks to Alex Duyck and Jesper Dangaard
Brouer for reviewing earlier versions of this patch set.

The i40e zero-copy code is located in its own file i40e_xsk.[ch]. Note
that in the interest of time, to get an AF_XDP zero-copy implementation
out there for people to try, some code paths have been copied from the
XDP path to the zero-copy path. It is out goal to merge the two paths
in later patch sets.

In contrast to the implementation from beginning of June, this patch
set does not require any extra HW queues for AF_XDP zero-copy
TX. Instead, the XDP TX HW queue is used for both XDP_REDIRECT and
AF_XDP zero-copy TX.

Jeff, given that most of changes are in i40e, it is up to you how you
would like to route these patches. The set is tagged bpf-next, but
if taking it via the Intel driver tree is easier, let us know.

We have run some benchmarks on a dual socket system with two Broadwell
E5 2660 @ 2.0 GHz with hyperthreading turned off. Each socket has 14
cores which gives a total of 28, but only two cores are used in these
experiments. One for TR/RX and one for the user space application. The
memory is DDR4 @ 2133 MT/s (1067 MHz) and the size of each DIMM is
8192MB and with 8 of those DIMMs in the system we have 64 GB of total
memory. The compiler used is gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0. The
NIC is Intel I40E 40Gbit/s using the i40e driver.

Below are the results in Mpps of the I40E NIC benchmark runs for 64
and 1500 byte packets, generated by a commercial packet generator HW
outputing packets at full 40 Gbit/s line rate. The results are with
retpoline and all other spectre and meltdown fixes, so these results
are not comparable to the ones from the zero-copy patch set in June.

AF_XDP performance 64 byte packets.
Benchmark   XDP_SKB    XDP_DRV    XDP_DRV with zerocopy
rxdrop       2.6        8.2         15.0
txpush       2.2        -           21.9
l2fwd        1.7        2.3         11.3

AF_XDP performance 1500 byte packets:
Benchmark   XDP_SKB   XDP_DRV     XDP_DRV with zerocopy
rxdrop       2.0        3.3         3.3
l2fwd        1.3        1.7         3.1

XDP performance on our system as a base line:

64 byte packets:
XDP stats       CPU     pps         issue-pps
XDP-RX CPU      16      18.4M  0

1500 byte packets:
XDP stats       CPU     pps         issue-pps
XDP-RX CPU      16      3.3M    0

The structure of the patch set is as follows:

Patch 1: Add support for XDP_REDIRECT of zero-copy allocated frames
Patches 2-4: Preparatory patches to common xsk and net code
Patches 5-7: Preparatory patches to i40e driver code for RX
Patch 8: i40e zero-copy support for RX
Patch 9: Preparatory patch to i40e driver code for TX
Patch 10: i40e zero-copy support for TX
Patch 11: Add flags to sample application to force zero-copy/copy mode
====================

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 years agosamples/bpf: add -c/--copy -z/--zero-copy flags to xdpsock
Björn Töpel [Tue, 28 Aug 2018 12:44:35 +0000 (14:44 +0200)]
samples/bpf: add -c/--copy -z/--zero-copy flags to xdpsock

The -c/--copy -z/--zero-copy flags enforces either copy or zero-copy
mode.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 years agoi40e: add AF_XDP zero-copy Tx support
Magnus Karlsson [Tue, 28 Aug 2018 12:44:34 +0000 (14:44 +0200)]
i40e: add AF_XDP zero-copy Tx support

This patch adds zero-copy Tx support for AF_XDP sockets. It implements
the ndo_xsk_async_xmit netdev ndo and performs all the Tx logic from a
NAPI context. This means pulling egress packets from the Tx ring,
placing the frames on the NIC HW descriptor ring and completing sent
frames back to the application via the completion ring.

The regular XDP Tx ring is used for AF_XDP as well. This rationale for
this is as follows: XDP_REDIRECT guarantees mutual exclusion between
different NAPI contexts based on CPU id. In other words, a netdev can
XDP_REDIRECT to another netdev with a different NAPI context, since
the operation is bound to a specific core and each core has its own
hardware ring.

As the AF_XDP Tx action is running in the same NAPI context and using
the same ring, it will also be protected from XDP_REDIRECT actions
with the exact same mechanism.

As with AF_XDP Rx, all AF_XDP Tx specific functions are added to
i40e_xsk.c.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>