Gal Pressman [Tue, 13 Feb 2018 08:31:26 +0000 (10:31 +0200)]
net/mlx5e: Vxlan, cleanup an unused member in vxlan work
Clean up the sa_family member of the vxlan work; it is not used or needed
anywhere in the code.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Gal Pressman [Tue, 26 Dec 2017 16:27:08 +0000 (18:27 +0200)]
net/mlx5e: Vxlan, replace ports radix-tree with hash table
The VXLAN database is accessed in the data path for each VXLAN TX skb in
order to check whether the UDP port is being offloaded or not.
The number of elements in the database is relatively small, so we can
replace the radix-tree with a hash table and speed up the lookup process.
Measuring mlx5e_vxlan_lookup_port execution time:
                  Radix Tree    Hash Table
---------------   ------------  ------------
Single Stream     161 ns        79 ns  (51% improvement)
Multi Stream      259 ns        136 ns (47% improvement)
Measuring UDP stream packet rate, single fully utilized TX core:
Radix Tree: 498,300 PPS
Hash Table: 555,468 PPS (11% improvement)
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
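The lookup idea in the hash table commit above can be illustrated with a small userspace sketch. It is not the driver's code: the bucket count, the hash function, and the names port_entry, port_table, port_table_add and port_table_lookup are all assumptions made for illustration. It only shows why a lookup over a handful of ports reduces to walking one short chain.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PORT_HASH_BITS 4                       /* 16 buckets; assumed size */
#define PORT_HASH_SIZE (1u << PORT_HASH_BITS)

/* Hypothetical entry type, not the driver's structure. */
struct port_entry {
	uint16_t udp_port;
	struct port_entry *next;   /* chain of entries sharing a bucket */
};

struct port_table {
	struct port_entry *buckets[PORT_HASH_SIZE];
};

/* Trivial illustrative hash: fold the port into the bucket range. */
static unsigned int port_hash(uint16_t port)
{
	return (port ^ (port >> PORT_HASH_BITS)) & (PORT_HASH_SIZE - 1);
}

/* Control path: link a caller-owned entry into its bucket. */
static void port_table_add(struct port_table *t, struct port_entry *e)
{
	unsigned int b = port_hash(e->udp_port);

	e->next = t->buckets[b];
	t->buckets[b] = e;
}

/* Data path: hash once, then walk a single short chain. */
static bool port_table_lookup(const struct port_table *t, uint16_t port)
{
	const struct port_entry *e;

	for (e = t->buckets[port_hash(port)]; e; e = e->next)
		if (e->udp_port == port)
			return true;
	return false;
}
```

With only a few offloaded ports, most buckets hold at most one entry, so the lookup cost is dominated by the hash computation.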
Gal Pressman [Mon, 25 Dec 2017 16:40:52 +0000 (18:40 +0200)]
net/mlx5e: Vxlan, check maximum number of UDP ports
The NIC has a limited number of offloaded VXLAN UDP ports (usually 4).
Instead of letting the firmware fail when trying to add more ports than
it can handle, let the driver check it on its own.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
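A driver-side limit check of the kind described in the commit above might look roughly like the following sketch. The struct vxlan_db, the function vxlan_db_reserve_port and the choice of -ENOSPC as the error code are assumptions for illustration; the commit text only says that the driver checks the limit itself instead of letting the firmware fail.

```c
#include <errno.h>

/* Hypothetical software view of the offload state. */
struct vxlan_db {
	int num_ports;   /* ports currently offloaded */
	int max_ports;   /* device limit (the commit says "usually 4") */
};

/*
 * Reserve a slot before issuing any firmware command.
 * Returns 0 on success or -ENOSPC (assumed error code) when full.
 */
static int vxlan_db_reserve_port(struct vxlan_db *db)
{
	if (db->num_ports >= db->max_ports)
		return -ENOSPC;
	db->num_ports++;
	return 0;
}
```

Checking in software first means the failure is reported without a round trip to the device.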
Gal Pressman [Wed, 17 Jan 2018 09:02:31 +0000 (11:02 +0200)]
net/mlx5e: Vxlan, reflect 4789 UDP port default addition to software database
The hardware offloads 4789 UDP port (default VXLAN port) automatically.
Add it to the software database as well in order to reflect the hardware
state appropriately.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Jiri Pirko [Fri, 27 Jul 2018 07:45:05 +0000 (09:45 +0200)]
net: sched: don't dump chains only held by actions
In case a chain is empty and not explicitly created by a user,
such chain should not exist. The only exception is if there is
an action "goto chain" pointing to it. In that case, don't show the
chain in the dump. Track the chain references held by actions and
use them to find out if a chain should or should not be shown
in chain dump.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
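The reference tracking described in the commit above can be sketched as two counters: one for all references and one for the subset held by actions. This is a simplified userspace model; the struct layout and the names chain_hold, chain_hold_by_action and chain_visible_in_dump are hypothetical and not taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical, simplified chain object. */
struct chain {
	unsigned int refcnt;          /* all references to the chain */
	unsigned int action_refcnt;   /* subset held by "goto chain" actions */
};

/* Reference taken by a filter or by explicit creation. */
static void chain_hold(struct chain *c)
{
	c->refcnt++;
}

/* Reference taken by an action pointing at the chain. */
static void chain_hold_by_action(struct chain *c)
{
	c->refcnt++;
	c->action_refcnt++;
}

/* True when every outstanding reference comes from an action. */
static bool chain_held_by_acts_only(const struct chain *c)
{
	return c->refcnt == c->action_refcnt;
}

/* A chain kept alive only by actions is skipped in the dump. */
static bool chain_visible_in_dump(const struct chain *c)
{
	return !chain_held_by_acts_only(c);
}
```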
David S. Miller [Fri, 27 Jul 2018 16:33:37 +0000 (09:33 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2018-07-27
1) Extend the output_mark to also support the input direction
and masking the mark values before applying to the skb.
2) Add a new lookup key for the upcoming xfrm interfaces.
3) Extend the xfrm lookups to match xfrm interface IDs.
4) Add virtual xfrm interfaces. The purpose of these interfaces
is to overcome the design limitations that the existing
VTI devices have.
The main limitations that we see with the current VTI are the
following:
- VTI interfaces are L3 tunnels with configurable endpoints.
For xfrm, the tunnel endpoints are already determined by the SA.
So the VTI tunnel endpoints must be either the same as on the
SA or wildcards. In case VTI tunnel endpoints are the same as on
the SA, we get a one to one correlation between the SA and
the tunnel. So each SA needs its own tunnel interface.
On the other hand, we can have only one VTI tunnel with
wildcard src/dst tunnel endpoints in the system because the
lookup is based on the tunnel endpoints. The existing tunnel
lookup won't work with multiple tunnels with wildcard
tunnel endpoints. Some use cases require more than one
VTI tunnel of this type, for example if somebody has multiple
namespaces and every namespace requires such a VTI.
- VTI needs separate interfaces for IPv4 and IPv6 tunnels.
So when routing to a VTI, we have to know to which address
family this traffic class is going to be encapsulated.
This is a limitation because it makes routing more complex
and it is not always possible to know what happens behind the
VTI, e.g. when the VTI is moved to some namespace.
- VTI works just with tunnel mode SAs. We need generic interfaces
that ensure transformation, regardless of the xfrm mode and
the encapsulated address family.
- VTI is configured with a combination of GRE keys and xfrm marks.
With this we have to deal with some extra cases in the generic
tunnel lookup because the GRE keys on the VTI are actually
not GRE keys, the GRE keys were just reused for something else.
All extensions to the VTI interfaces would require adding
even more complexity to the generic tunnel lookup.
So to overcome this, we developed xfrm interfaces with the
following design goal:
- It should be possible to tunnel IPv4 and IPv6 through the same
interface.
- No limitation on xfrm mode (tunnel, transport and beet).
- Should be a generic virtual interface that ensures IPsec
transformation, no need to know what happens behind the
interface.
- Interfaces should be configured with a new key that must match a
new policy/SA lookup key.
- The lookup logic should stay in the xfrm codebase, no need to
change or extend generic routing and tunnel lookups.
- Should be possible to use IPsec hardware offloads of the underlying
interface.
5) Remove xfrm pcpu policy cache. This was added after the flowcache
removal, but it turned out to make things even worse.
From Florian Westphal.
6) Allow to update the set mark on SA updates.
From Nathan Harold.
7) Convert some timestamps to time64_t.
From Arnd Bergmann.
8) Don't check the offload_handle in xfrm code,
it is an opaque data cookie for the driver.
From Shannon Nelson.
9) Remove xfrmi interface ID from flowi. After this patch
no generic code is touched anymore to do xfrm interface
lookups. From Benedict Wong.
10) Allow to update the xfrm interface ID on SA updates.
From Nathan Harold.
11) Don't pass zero to ERR_PTR() in xfrm_resolve_and_create_bundle.
From YueHaibing.
12) Return more detailed errors on xfrm interface creation.
From Benedict Wong.
13) Use PTR_ERR_OR_ZERO instead of IS_ERR + PTR_ERR.
From the kbuild test robot.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
kbuild test robot [Thu, 26 Jul 2018 07:09:52 +0000 (15:09 +0800)]
xfrm: fix ptr_ret.cocci warnings
net/xfrm/xfrm_interface.c:692:1-3: WARNING: PTR_ERR_OR_ZERO can be used
Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
Generated by: scripts/coccinelle/api/ptr_ret.cocci
Fixes: 44e2b838c24d ("xfrm: Return detailed errors from xfrmi_newlink")
CC: Benedict Wong <benedictwong@google.com>
Signed-off-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
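The transformation suggested by ptr_ret.cocci can be shown with userspace stand-ins for the kernel's error-pointer helpers. The helper definitions below are simplified re-implementations written for this sketch (using intptr_t for portability), and setup_before/setup_after are hypothetical function names; only the before/after pattern itself reflects the commit above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace stand-ins for the kernel helpers. */
static inline void *ERR_PTR(intptr_t error)
{
	return (void *)error;
}

static inline intptr_t PTR_ERR(const void *ptr)
{
	return (intptr_t)ptr;
}

/* Error pointers occupy the top MAX_ERRNO values of the address space. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline int PTR_ERR_OR_ZERO(const void *ptr)
{
	return IS_ERR(ptr) ? (int)PTR_ERR(ptr) : 0;
}

/* Before: the open-coded pattern that the script flags. */
static int setup_before(void *p)
{
	if (IS_ERR(p))
		return (int)PTR_ERR(p);
	return 0;
}

/* After: the same behavior in a single call. */
static int setup_after(void *p)
{
	return PTR_ERR_OR_ZERO(p);
}
```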
David S. Miller [Fri, 27 Jul 2018 04:33:24 +0000 (21:33 -0700)]
Merge tag 'mlx5e-updates-2018-07-26' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5e-updates-2018-07-26 (XDP redirect)
This series from Tariq adds the support for device-out XDP redirect.
Start with a simple RX and XDP cleanups:
- Replace call to MPWQE free with dealloc in interface down flow
- Do not recycle RX pages in interface down flow
- Gather all XDP pre-requisite checks in a single function
- Restrict the combination of large MTU and XDP
Since the XDP logic is now going to be called from the TX side as well,
the generic XDP TX logic is no longer RX-only. For that, Tariq creates
a new xdp.c file, moves the XDP related code into it, and generalizes
the code (such as the XDP TX SQ structures and XDP counters) to support
XDP TX for XDP redirect.
XDP redirect support:
Add implementation for the ndo_xdp_xmit callback.
Dedicate a new set of XDP-SQ instances to satisfy the XDP_REDIRECT
requests. These instances are totally separated from the existing
XDP-SQ objects that satisfy local XDP_TX actions.
Performance tests:
xdp_redirect_map from ConnectX-5 to ConnectX-5.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Packet-rate of 64B packets.
Single queue: 7 Mpps.
Multi queue: 55 Mpps.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 26 Jul 2018 21:25:26 +0000 (14:25 -0700)]
netdevsim: make debug dirs' dentries static
The root directories of netdevsim should only be used by the core
to create per-device subdirectories, so limit their visibility to
the core file.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 27 Jul 2018 04:27:54 +0000 (21:27 -0700)]
Merge branch 'docs-net-Convert-netdev-FAQ-to-RST'
Tobin C. Harding says:
====================
docs: net: Convert netdev-FAQ to RST
Jon answered all the tree questions on v1 so if you will please take
this through your tree that would be awesome.
v2:
- Fix typo 'canonical_path_format' (thanks Edward)
- Add patch fixing references netdev-FAQ
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobin C. Harding [Thu, 26 Jul 2018 05:02:26 +0000 (15:02 +1000)]
docs: Update references to netdev-FAQ
File 'Documentation/networking/netdev-FAQ.txt' has been converted to RST
format. We should update all links/references to point to the new file.
Update references to netdev-FAQ
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobin C. Harding [Thu, 26 Jul 2018 05:02:25 +0000 (15:02 +1000)]
docs: net: Convert netdev-FAQ to restructured text
Preferred kernel docs format is now restructured text. Convert
netdev-FAQ.txt to restructured text.
- Add SPDX license identifier.
- Change file heading 'Information you need to know about netdev' to
'netdev FAQ' to better suit displayed index (in HTML).
- Change question/answer layout to suit rst. Copy format in
Documentation/bpf/bpf_devel_QA.rst
- Fix indentation of code snippets
- If multiple consecutive URLs appear put them in a list (to maintain
whitespace).
- Use uniform spelling of 'bug fix' throughout document (not bugfix or
bug-fix).
- Add double back ticks to 'net' and 'net-next' when referring to the
trees.
- Use rst references for Documentation/ links.
- Add rst label 'netdev-FAQ' for referencing by other docs files.
- Remove stale entry from Documentation/networking/00-INDEX
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobin C. Harding [Thu, 26 Jul 2018 05:02:24 +0000 (15:02 +1000)]
docs: Add rest label the_canonical_patch_format
In preparation for converting Documentation/network/netdev-FAQ.rst to
restructured text format, we would like to be able to reference 'the
canonical patch format' section.
Add rest label: 'the_canonical_patch_format'.
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jia-Ju Bai [Fri, 27 Jul 2018 03:51:06 +0000 (11:51 +0800)]
net: adaptec: Replace mdelay() with msleep() in starfire_init_one()
starfire_init_one() is never called in atomic context.
It calls mdelay() to busily wait, which is not necessary.
mdelay() can be replaced with msleep().
This is found by a static analysis tool named DCNS written by myself.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jia-Ju Bai [Fri, 27 Jul 2018 02:48:28 +0000 (10:48 +0800)]
isdn: hisax: config: Replace GFP_ATOMIC with GFP_KERNEL
hisax_cs_new() and hisax_cs_setup() are never called in atomic context.
They call kmalloc() and kzalloc() with GFP_ATOMIC, which is not necessary.
GFP_ATOMIC can be replaced with GFP_KERNEL.
This is found by a static analysis tool named DCNS written by myself.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jia-Ju Bai [Fri, 27 Jul 2018 02:45:30 +0000 (10:45 +0800)]
isdn: hisax: callc: Replace GFP_ATOMIC with GFP_KERNEL in init_PStack()
init_PStack() is never called in atomic context.
It calls kmalloc() with GFP_ATOMIC, which is not necessary.
GFP_ATOMIC can be replaced with GFP_KERNEL.
This is found by a static analysis tool named DCNS written by myself.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jia-Ju Bai [Fri, 27 Jul 2018 02:41:09 +0000 (10:41 +0800)]
isdn: mISDN: netjet: Replace GFP_ATOMIC with GFP_KERNEL in nj_probe()
nj_probe() is never called in atomic context.
It calls kzalloc() with GFP_ATOMIC, which is not necessary.
GFP_ATOMIC can be replaced with GFP_KERNEL.
This is found by a static analysis tool named DCNS written by myself.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jia-Ju Bai [Fri, 27 Jul 2018 02:39:06 +0000 (10:39 +0800)]
isdn: mISDN: hfcpci: Replace GFP_ATOMIC with GFP_KERNEL in hfc_probe()
hfc_probe() is never called in atomic context.
It calls kzalloc() with GFP_ATOMIC, which is not necessary.
GFP_ATOMIC can be replaced with GFP_KERNEL.
This is found by a static analysis tool named DCNS written by myself.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Fri, 27 Jul 2018 01:53:12 +0000 (09:53 +0800)]
net: hns: make hns_dsaf_roce_reset non static
hns_dsaf_roce_reset is exported and used in hns_roce_hw_v1.c
In commit 336a443bd9dd ("net: hns: Make many functions static") I
wrongly made it static.
drivers/infiniband/hw/hns/hns_roce_hw_v1.o: In function `hns_roce_v1_reset':
hns_roce_hw_v1.c:(.text+0x37ac): undefined reference to `hns_dsaf_roce_reset'
hns_roce_hw_v1.c:(.text+0x37cc): undefined reference to `hns_dsaf_roce_reset'
Fixes: 336a443bd9dd ("net: hns: Make many functions static")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Mon, 20 Nov 2017 11:34:15 +0000 (13:34 +0200)]
net/mlx5e: TX, Use function to access sq_dma object in fifo
Use designated function mlx5e_dma_get() to get
the mlx5e_sq_dma object to be pushed into fifo.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Tue, 22 May 2018 14:06:38 +0000 (17:06 +0300)]
net/mlx5e: TX, Move DB fields in TXQ-SQ struct
Pointers in DB are static, move them to read-only area so they
do not share a cacheline with fields modified in datapath.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Wed, 16 May 2018 07:46:57 +0000 (10:46 +0300)]
net/mlx5e: RX, Prefetch the xdp_frame data area
A loaded XDP program might write to the xdp_frame data area,
prefetchw() it to avoid a potential cache miss.
Performance tests:
ConnectX-5, XDP_TX packet rate, single ring.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Before: 13,172,976 pps
After: 13,456,248 pps
2% gain.
Fixes: 22f453988194 ("net/mlx5e: Support XDP over Striding RQ")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
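As a rough userspace analogue of the prefetchw() call mentioned in the commit above, the GCC/Clang builtin __builtin_prefetch() with a second argument of 1 requests a prefetch in anticipation of a write. The function names prefetch_for_write and touch_frame are hypothetical, and the loop merely stands in for an XDP program that modifies the frame data.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's prefetchw(); relies on a GCC/Clang builtin. */
static inline void prefetch_for_write(const void *addr)
{
	__builtin_prefetch(addr, 1);   /* 1 = prepare the line for a write */
}

/* Hypothetical "program" that rewrites the start of a frame. */
static void touch_frame(uint8_t *data, size_t len)
{
	prefetch_for_write(data);      /* a hint only; no functional effect */

	for (size_t i = 0; i < len; i++)
		data[i] ^= 0xff;
}
```

The prefetch is purely a performance hint, so the result of the function is the same with or without it; any gain depends on how much other work separates the hint from the first write.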
Tariq Toukan [Tue, 22 May 2018 13:48:48 +0000 (16:48 +0300)]
net/mlx5e: Add support for XDP_REDIRECT in device-out side
Add implementation for the ndo_xdp_xmit callback.
Dedicate a new set of XDP-SQ instances to satisfy the XDP_REDIRECT
requests. These instances are totally separated from the existing
XDP-SQ objects that satisfy local XDP_TX actions.
Performance tests:
xdp_redirect_map from ConnectX-5 to ConnectX-5.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Packet-rate of 64B packets.
Single queue: 7 Mpps.
Multi queue: 55 Mpps.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Tue, 22 May 2018 13:43:54 +0000 (16:43 +0300)]
net/mlx5e: Re-order fields of struct mlx5e_xdpsq
In the downstream patch that adds support to XDP_REDIRECT-out,
the XDP xmit frame function doesn't share the same run context as
the NAPI that polls the XDP-SQ completion queue.
Hence, we need to re-order the XDP-SQ fields to avoid cacheline
false-sharing.
Take redirect_flush and doorbell out of DB, into separate
cachelines.
Add a cacheline breaker within the stats struct.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
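The field re-ordering in the commit above follows a general pattern: group fields by which context writes them and start each group on its own cache line. The sketch below illustrates that pattern with C11 alignas. The struct xdp_queue, its members, and the 64-byte line size are assumptions; this is not the layout of struct mlx5e_xdpsq.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64   /* assumed cache line size */

/* Hypothetical queue layout grouped by writer. */
struct xdp_queue {
	/* written by the completion (polling) side */
	alignas(CACHELINE) uint16_t cc;

	/* written by the transmit side */
	alignas(CACHELINE) uint16_t pc;
	uint8_t tx_pending;

	/* read-mostly: set up once, then only read in the data path */
	alignas(CACHELINE) void *wqe_base;
	uint32_t sqn;
};

/* Index of the cache line that contains a given struct offset. */
static size_t cacheline_of(size_t offset)
{
	return offset / CACHELINE;
}
```

Because each group begins on a new line, a write to cc by one CPU does not invalidate the line holding pc or the read-mostly fields on another CPU.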
Tariq Toukan [Tue, 22 May 2018 13:29:31 +0000 (16:29 +0300)]
net/mlx5e: Refactor XDP counters
Separate the XDP counters into two sets:
(1) One set reside in the RQ stats, and they monitor XDP stats
in the RQ side.
(2) Another set is per XDP-SQ, and they monitor XDP stats that
are related to XDP transmit flow.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Sun, 15 Jul 2018 07:34:39 +0000 (10:34 +0300)]
net/mlx5e: Make XDP xmit functions more generic
Convert the XDP xmit functions to use the generic xdp_frame API
in XDP_TX flow.
Same functions will be used later in this series to transmit
the XDP redirect-out packets as well.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Wed, 16 May 2018 07:16:30 +0000 (10:16 +0300)]
net/mlx5e: Add counter for XDP redirect in RX
Add per-ring and total stats for received packets that
go into XDP redirection.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Sun, 15 Jul 2018 07:28:44 +0000 (10:28 +0300)]
net/mlx5e: Move XDP related code into new XDP files
Take XDP code out of the general EN header and RX file into
new XDP files.
Currently, XDP-SQ resides only within an RQ and is used from a
single flow (XDP_TX) triggered upon RX completions.
In a downstream patch, an additional type of XDP-SQ instance will be
introduced and used for the XDP_REDIRECT flow, totally unrelated to
the RX context.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Sun, 31 Dec 2017 13:50:13 +0000 (15:50 +0200)]
net/mlx5e: Restrict the combination of large MTU and XDP
Add checks in control path upon an MTU change or an XDP program set,
to prevent reaching cases where large MTU and XDP are set simultaneously.
This is to make sure we allow XDP only with the linear RX memory scheme,
i.e. a received packet is not scattered to different pages.
Change mlx5e_rx_get_linear_frag_sz() accordingly, so that we make sure
the XDP configuration can really be set, instead of assuming that it is.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
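The control-path check described in the commit above can be sketched as one predicate called from both entry points (MTU change and XDP program set). Every constant and name below is an illustrative assumption, including the headroom and tailroom values and the use of -EINVAL; the real calculation lives in mlx5e_rx_get_linear_frag_sz() and is not reproduced here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only, not the driver's constants. */
#define PAGE_SZ       4096u
#define XDP_HEADROOM  256u
#define ETH_OVERHEAD  (14u + 4u + 4u)   /* Ethernet header + VLAN tag + FCS */
#define TAILROOM      320u              /* placeholder for buffer metadata */

/* Bytes one received packet needs when stored linearly. */
static uint32_t linear_frag_size(uint32_t mtu)
{
	return XDP_HEADROOM + mtu + ETH_OVERHEAD + TAILROOM;
}

/* XDP is allowed only if a whole packet fits in a single page. */
static bool xdp_allowed_with_mtu(uint32_t mtu)
{
	return linear_frag_size(mtu) <= PAGE_SZ;
}

/* Shared check for "set MTU" and "attach XDP program" requests. */
static int check_config(uint32_t mtu, bool xdp_enabled)
{
	if (xdp_enabled && !xdp_allowed_with_mtu(mtu))
		return -EINVAL;   /* assumed error code */
	return 0;
}
```

Running the same predicate from both paths is what prevents the two settings from being combined in either order.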
Tariq Toukan [Mon, 12 Mar 2018 16:26:51 +0000 (18:26 +0200)]
net/mlx5e: Gather all XDP pre-requisite checks in a single function
Dedicate a function to all checks done when setting an XDP program.
Take indications from priv instead of netdev features.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Tue, 12 Jun 2018 07:09:24 +0000 (10:09 +0300)]
net/mlx5e: Do not recycle RX pages in interface down flow
Keep all page-pool recycle calls within NAPI context.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Tue, 12 Jun 2018 07:08:43 +0000 (10:08 +0300)]
net/mlx5e: Replace call to MPWQE free with dealloc in interface down flow
No need to expose the MPWQE free function to control path.
The dealloc function is already exposed, use it.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
David S. Miller [Thu, 26 Jul 2018 21:14:01 +0000 (14:14 -0700)]
Merge branch '10GbE' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
10GbE Intel Wired LAN Driver Updates 2018-07-26
This series contains updates to ixgbe and igb.
Tony fixes ixgbe to add checks to ensure jumbo frames or LRO do not get
enabled after an XDP program is loaded.
Shannon Nelson adds the missing security configuration registers to the
ixgbe register dump, which will help in debugging.
Christian Grönke fixes an issue in igb that occurs on SGMII based SFP
modules, by reverting changes from 2 previous patches. The issue was
that initialization would fail on the aforementioned modules because the
driver would try to reset the PHY before obtaining the PHY address of
the SGMII attached PHY.
Venkatesh Srinivas replaces wmb() with dma_wmb() for doorbell writes,
which avoids SFENCEs before the doorbell writes.
Alex cleans up and refactors ixgbe Tx/Rx shutdown to reduce time needed
to stop the device. The code refactor allows us to take the completion
time into account when disabling queues, so that platforms with higher
completion times no longer trigger receive queue disable messages.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 26 Jul 2018 16:27:58 +0000 (18:27 +0200)]
net: sched: unmark chain as explicitly created on delete
Once user manually deletes the chain using "chain del", the chain cannot
be marked as explicitly created anymore.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Fixes: 32a4f5ecd738 ("net: sched: introduce chain object to uapi")
Signed-off-by: David S. Miller <davem@davemloft.net>
Doron Roberts-Kedes [Wed, 25 Jul 2018 21:48:21 +0000 (14:48 -0700)]
tls: Skip zerocopy path for ITER_KVEC
The zerocopy path ultimately calls iov_iter_get_pages, which defines the
step function for ITER_KVECs as simply, return -EFAULT. Taking the
non-zerocopy path for ITER_KVECs avoids the unnecessary fallback.
See https://lore.kernel.org/lkml/20150401023311.GL29656@ZenIV.linux.org.uk/T/#u
for a discussion of why zerocopy for vmalloc data is not a good idea.
Discovered while testing NBD traffic encrypted with ktls.
Fixes: c46234ebb4d1 ("tls: RX path for ktls")
Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Wed, 25 Jul 2018 14:07:24 +0000 (09:07 -0500)]
net: sched: cls_api: fix dead code in switch
Code at line 1850 is unreachable. Fix this by removing the break
statement above it, so the code for case RTM_GETCHAIN can be
properly executed.
Addresses-Coverity-ID: 1472050 ("Structurally dead code")
Fixes: 32a4f5ecd738 ("net: sched: introduce chain object to uapi")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
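The class of bug fixed in the commit above, a stray break that makes the rest of a case body unreachable, can be reproduced in a few lines. The enum values and the handle_before/handle_after functions are hypothetical stand-ins and do not mirror the actual cls_api.c code.

```c
/* Hypothetical message types standing in for the netlink ones. */
enum msg_type { MSG_NEWCHAIN, MSG_DELCHAIN, MSG_GETCHAIN };

/* Before: the early break leaves the assignment unreachable. */
static int handle_before(enum msg_type type)
{
	int notified = 0;

	switch (type) {
	case MSG_GETCHAIN:
		break;
		notified = 1;   /* structurally dead code */
		break;
	default:
		break;
	}
	return notified;
}

/* After: with the stray break removed, the case body runs. */
static int handle_after(enum msg_type type)
{
	int notified = 0;

	switch (type) {
	case MSG_GETCHAIN:
		notified = 1;
		break;
	default:
		break;
	}
	return notified;
}
```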
Guillaume Nault [Wed, 25 Jul 2018 12:53:33 +0000 (14:53 +0200)]
l2tp: remove ->recv_payload_hook
The tunnel reception hook is only used by l2tp_ppp for skipping PPP
framing bytes. This is a session specific operation, but once a PPP
session sets ->recv_payload_hook on its tunnel, all frames received by
the tunnel will enter pppol2tp_recv_payload_hook(), including those
targeted at Ethernet sessions (an L2TPv3 tunnel can multiplex PPP and
Ethernet sessions).
So this mechanism is wrong, and uselessly complex. Let's just move this
functionality to the pppol2tp rx handler and drop ->recv_payload_hook.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Wed, 25 Jul 2018 10:00:49 +0000 (18:00 +0800)]
tipc: add missing dev_put() on error in tipc_enable_l2_media
When tipc_own_id() fails to obtain the node identity, dev_put() should
be called before returning -EINVAL.
Fixes: 682cd3cf946b ("tipc: confgiure and apply UDP bearer MTU on running links")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
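The fix in the commit above is an instance of a common rule: every error return taken after acquiring a reference must release it. The sketch below models that in userspace with a plain counter; struct fake_dev, fake_dev_get, fake_dev_put and enable_media are hypothetical names and are not the TIPC or netdev APIs.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical refcounted device. */
struct fake_dev {
	int refcnt;
	bool has_identity;
};

static struct fake_dev *fake_dev_get(struct fake_dev *d)
{
	d->refcnt++;
	return d;
}

static void fake_dev_put(struct fake_dev *d)
{
	d->refcnt--;
}

/* Takes a reference and keeps it only when setup succeeds. */
static int enable_media(struct fake_dev *d)
{
	struct fake_dev *dev = fake_dev_get(d);

	if (!dev->has_identity) {
		fake_dev_put(dev);   /* the release the error path was missing */
		return -EINVAL;
	}

	/* Success: the reference stays held until the media is disabled. */
	return 0;
}
```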
Vakul Garg [Tue, 24 Jul 2018 11:24:27 +0000 (16:54 +0530)]
net/tls: Removed redundant checks for non-NULL
Removed checks against non-NULL before calling kfree_skb() and
crypto_free_aead(). These functions are safe to be called with NULL
as an argument.
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Acked-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
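The same cleanup applies to any release function that is documented to accept NULL. As a userspace analogy to the commit above, the C standard defines free(NULL) as a no-op, so a guard around it is redundant. The struct ctx and ctx_release names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical context with two optional allocations. */
struct ctx {
	char *buf;
	char *key;
};

/* No "if (c->buf)" guards are needed: free(NULL) does nothing. */
static void ctx_release(struct ctx *c)
{
	free(c->buf);
	free(c->key);
	c->buf = NULL;
	c->key = NULL;
}
```

Resetting the pointers afterwards also makes a second call to ctx_release() harmless.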
Vinicius Costa Gomes [Tue, 24 Jul 2018 00:08:00 +0000 (17:08 -0700)]
cbs: Add support for the graft function
This allows installing a child qdisc under cbs. The main use case
is to install ETF (Earliest TxTime First) qdisc under cbs, so there's
another level of control for time-sensitive traffic.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Thu, 26 Jul 2018 13:19:58 +0000 (21:19 +0800)]
net: hns: Make many functions static
Fixes the following sparse warning:
drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c:73:20: warning: symbol 'hns_ae_get_handle' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c:332:6: warning: symbol 'hns_ae_stop' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c:360:6: warning: symbol 'hns_ae_toggle_ring_irq' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c:580:6: warning: symbol 'hns_ae_update_stats' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c:663:6: warning: symbol 'hns_ae_get_stats' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c:695:6: warning: symbol 'hns_ae_get_strings' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c:728:5: warning: symbol 'hns_ae_get_sset_count' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c:774:6: warning: symbol 'hns_ae_update_led_status' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c:786:5: warning: symbol 'hns_ae_cpld_set_led_id' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c:798:6: warning: symbol 'hns_ae_get_regs' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c:823:5: warning: symbol 'hns_ae_get_regs_len' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c:342:6: warning: symbol 'hns_gmac_update_stats' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c:934:12: warning: symbol 'hns_mac_get_vaddr' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c:953:5: warning: symbol 'hns_mac_get_cfg' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c:343:6: warning: symbol 'hns_dsaf_srst_chns' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c:366:1: warning: symbol 'hns_dsaf_srst_chns_acpi' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c:373:6: warning: symbol 'hns_dsaf_roce_srst' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c:387:6: warning: symbol 'hns_dsaf_roce_srst_acpi' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c:571:5: warning: symbol 'hns_mac_get_sfp_prsnt' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c:589:5: warning: symbol 'hns_mac_get_sfp_prsnt_acpi' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c:31:12: warning: symbol 'g_dsaf_mode_match' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c:45:5: warning: symbol 'hns_dsaf_get_cfg' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c:962:6: warning: symbol 'hns_dsaf_tcam_addr_get' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c:2087:6: warning: symbol 'hns_dsaf_port_work_rate_cfg' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c:2837:5: warning: symbol 'hns_dsaf_roce_reset' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c:76:5: warning: symbol 'hns_ppe_common_get_cfg' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c:107:6: warning: symbol 'hns_ppe_common_free_cfg' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c:340:6: warning: symbol 'hns_ppe_uninit_ex' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c:708:5: warning: symbol 'hns_rcb_get_ring_num' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c:744:14: warning: symbol 'hns_rcb_common_get_vaddr' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c:314:6: warning: symbol 'hns_xgmac_update_stats' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_enet.c:1303:6: warning: symbol 'hns_nic_update_stats' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_enet.c:1585:6: warning: symbol 'hns_nic_poll_controller' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_enet.c:1938:6: warning: symbol 'hns_set_multicast_list' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_enet.c:1960:6: warning: symbol 'hns_nic_set_rx_mode' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:661:6: warning: symbol 'hns_get_ringparam' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:811:6: warning: symbol 'hns_get_channels' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:828:6: warning: symbol 'hns_get_ethtool_stats' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:886:6: warning: symbol 'hns_get_strings' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:976:5: warning: symbol 'hns_get_sset_count' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:1010:5: warning: symbol 'hns_phy_led_set' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:1032:5: warning: symbol 'hns_set_phys_id' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns/hns_ethtool.c:1106:6: warning: symbol 'hns_get_regs' was not declared. Should it be static?
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Anders Roxell [Thu, 26 Jul 2018 09:53:58 +0000 (11:53 +0200)]
selftests/net: add tls to .gitignore
Add the tls binary to .gitignore
Fixes: 7f657d5bf507 ("selftests: tls: add selftests for TLS sockets")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 26 Jul 2018 09:38:34 +0000 (11:38 +0200)]
selftests: forwarding: add tests for TC chain get and dump operations
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Fri, 20 Jul 2018 22:29:34 +0000 (18:29 -0400)]
ixgbe: Refactor queue disable logic to take completion time into account
This change is meant to allow us to take completion time into account when
disabling queues. Previously we were just working with hard coded values
for how long we should wait. This worked fine for the standard case where
completion timeout was operating in the 50us to 50ms range, however on
platforms that have higher completion timeout times this was resulting in
Rx queues disable messages being displayed as we weren't waiting long
enough for outstanding Rx DMA completions.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Don Buchholz <donald.buchholz@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Fri, 20 Jul 2018 22:29:29 +0000 (18:29 -0400)]
ixgbe: Reorder Tx/Rx shutdown to reduce time needed to stop device
This change is meant to help reduce the time needed to shutdown the
transmit and receive paths for the device. Specifically what we now do
after this patch is disable the transmit path first at the netdev level,
and then work on disabling the Rx. This way while we are waiting on the Rx
queues to be disabled the Tx queues have an opportunity to drain out.
In addition I have dropped the 10ms timeout that was left in the ixgbe_down
function that seems to have been carried through from back in e1000 as far
as I can tell. We shouldn't need it since we don't actually disable the Tx
until much later and we have additional logic in place for verifying the Tx
queues have been disabled.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Don Buchholz <donald.buchholz@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Venkatesh Srinivas [Fri, 25 May 2018 04:13:21 +0000 (00:13 -0400)]
igb: Use dma_wmb() instead of wmb() before doorbell writes
igb writes to doorbells to post transmit and receive descriptors;
after writing descriptors to memory but before writing to doorbells,
use dma_wmb() rather than wmb(). wmb() is more heavyweight than
necessary before doorbell writes.
On x86, this avoids SFENCEs before doorbell writes in both the
tx and rx refill paths.
Signed-off-by: Venkatesh Srinivas <venkateshs@google.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Christian Grönke [Tue, 26 Jun 2018 10:12:18 +0000 (10:12 +0000)]
igb: Remove superfluous reset to PHY and page 0 selection
This patch reverts two previously applied patches to fix an issue
that appeared when using SGMII based SFP modules. In the current
state the driver will try to reset the PHY before obtaining the
phy_addr of the SGMII attached PHY. That leads to an error in
e1000_write_phy_reg_sgmii_82575, causing the initialization to
fail:
igb: Intel(R) Gigabit Ethernet Network Driver - version 5.4.0-k
igb: Copyright (c) 2007-2014 Intel Corporation.
igb: probe of ????:??:??.? failed with error -3
The patches being reverted are:
commit 182785335447957409282ca745aa5bc3968facee
Author: Aaron Sierra <asierra@xes-inc.com>
Date: Tue Nov 29 10:03:56 2016 -0600
igb: reset the PHY before reading the PHY ID
commit 440aeca4b9858248d8f16d724d9fa87a4f65fa33
Author: Matwey V Kornilov <matwey@sai.msu.ru>
Date: Thu Nov 24 13:32:48 2016 +0300
igb: Explicitly select page 0 at initialization
The first reverted patch directly causes the problem mentioned above.
In case of SGMII the phy_addr is not known at this point and will
only be obtained by 'igb_get_phy_id_82575' further down in the code.
The second removed patch forces selection of page 0 in the
PHY, something that the reset tries to address as well.
As pointed out by Alexander Duyck, the patch below fixes the same
issue but in the proper location:
commit 4e684f59d760a2c7c716bb60190783546e2d08a1
Author: Chris J Arges <christopherarges@gmail.com>
Date: Wed Nov 2 09:13:42 2016 -0500
igb: Workaround for igb i210 firmware issue
Reverts: 440aeca4b9858248d8f16d724d9fa87a4f65fa33.
Reverts: 182785335447957409282ca745aa5bc3968facee.
Signed-off-by: Christian Grönke <c.groenke@infodas.de>
Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Tue, 3 Jul 2018 00:09:30 +0000 (17:09 -0700)]
ixgbe: add ipsec security registers into ethtool register dump
Add the ixgbe's security configuration registers into
the register dump.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tony Nguyen [Wed, 30 May 2018 23:14:23 +0000 (16:14 -0700)]
ixgbe: Do not allow LRO or MTU change with XDP
XDP does not support jumbo frames or LRO. These checks are being made
outside the driver when an XDP program is loaded, however, there is
nothing preventing these from changing after an XDP program is loaded.
Add the checks so that while an XDP program is loaded, do not allow MTU
to be changed or LRO to be enabled.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Gustavo A. R. Silva [Wed, 25 Jul 2018 15:22:27 +0000 (10:22 -0500)]
rds: send: Fix dead code in rds_sendmsg
Currently, code at label *out* is unreachable. Fix this by updating
variable *ret* with -EINVAL, so the jump to *out* can be properly
executed instead of directly returning from function.
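The pattern described above can be sketched in userspace as follows. This is only an illustration under stated assumptions: send_msg_sketch() and its cleanup counter are hypothetical and are not taken from rds_sendmsg(); the point is that setting ret and jumping to the shared label keeps the cleanup code reachable.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a send routine with one shared exit label.
 * On invalid input it records the error in ret and jumps to "out"
 * instead of returning directly, so the cleanup at the label still runs.
 * *cleanups counts how often the cleanup path was executed.
 */
static int send_msg_sketch(const char *payload, size_t len, int *cleanups)
{
	char *buf;
	int ret;

	assert(cleanups != NULL);

	buf = malloc(len ? len : 1);
	if (!buf)
		return -ENOMEM;	/* nothing to clean up yet */

	if (!payload || len == 0) {
		ret = -EINVAL;	/* record the error ... */
		goto out;	/* ... and leave through the common exit */
	}

	/* Real work would happen here; report the number of bytes "sent". */
	ret = (int)len;
out:
	free(buf);
	(*cleanups)++;
	return ret;
}
```

With a direct "return -EINVAL" in the error branch, the buffer would leak and the code at the label would never run for that path.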
Addresses-Coverity-ID: 1472059 ("Structurally dead code")
Fixes: 1e2b44e78eea ("rds: Enable RDS IPv6 support")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Anders Roxell [Wed, 25 Jul 2018 22:20:08 +0000 (00:20 +0200)]
net/rds/Kconfig: RDS should depend on IPV6
Build error, implicit declaration of function __inet6_ehashfn, shows up
when RDS is enabled but IPV6 is not.
net/rds/connection.c: In function ‘rds_conn_bucket’:
net/rds/connection.c:67:9: error: implicit declaration of function ‘__inet6_ehashfn’; did you mean ‘__inet_ehashfn’? [-Werror=implicit-function-declaration]
hash = __inet6_ehashfn(lhash, 0, fhash, 0, rds_hash_secret);
^~~~~~~~~~~~~~~
__inet_ehashfn
This change adds IPV6 as a "depends on" in config RDS.
Fixes: eee2fa6ab322 ("rds: Changing IP address internal representation to struct in6_addr")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 26 Jul 2018 05:25:54 +0000 (22:25 -0700)]
Merge branch 'smc-next'
Ursula Braun says:
====================
net/smc: patches 2018-07-25
here are 4 more patches for SMC: The first one is just a small
code cleanup in preparation for patch 2. Patch 2 switches to the
use of the vlan-gid for VLAN traffic. Patch 3 improves diagnosis
when creating SMC connections. Patch 4 improves synchronization
between local and remote link groups.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Wed, 25 Jul 2018 14:35:33 +0000 (16:35 +0200)]
net/smc: improve delete link processing
Send an orderly DELETE LINK request before termination of a link group,
and add support for client triggered DELETE LINK processing. Also send a
disorderly DELETE LINK before the module is unloaded.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Graul [Wed, 25 Jul 2018 14:35:32 +0000 (16:35 +0200)]
net/smc: provide fallback reason code
Remember the fallback reason code and the peer diagnosis code for
smc sockets, and provide them in smc_diag.c to the netlink interface.
And add more detailed reason codes.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Wed, 25 Jul 2018 14:35:31 +0000 (16:35 +0200)]
net/smc: use correct vlan gid of RoCE device
SMC code uses the base gid for VLAN traffic. The gids exchanged in
the CLC handshake and the gid index used for the QP have to switch
from the base gid to the appropriate vlan gid.
When searching for a matching IB device port for a certain vlan
device, it does not make sense to return an IB device port, which
is not enabled for the used vlan_id. Add another check whether a
vlan gid exists for a certain IB device port.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Wed, 25 Jul 2018 14:35:30 +0000 (16:35 +0200)]
net/smc: fewer parameters for smc_llc_send_confirm_link()
Link confirmation will always be sent across the new link being
confirmed. This allows shrinking the parameter list.
No functional change.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 26 Jul 2018 05:17:45 +0000 (22:17 -0700)]
Merge branch 'nfp-protect-from-theoretical-size-overflows-and-SR-IOV-errors'
Jakub Kicinski says:
====================
nfp: protect from theoretical size overflows and SR-IOV errors
This small set changes the handling of pci_sriov_set_totalvfs() errors.
nfp is the only driver which fails probe on pci_sriov_set_totalvfs()
errors. It turns out some BIOS configurations may break SR-IOV and
users who don't use that feature should not suffer.
The remaining patches make sure we use overflow-safe functions for ring
allocation, even though ring sizes are limited. It won't hurt and
we can also enable fallback to vmalloc() if memory is tight while
at it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 26 Jul 2018 02:40:37 +0000 (19:40 -0700)]
nfp: protect from theoretical size overflows on HW descriptor ring
Use array_size() and store the size as full size_t to protect from
theoretical size overflow when handling HW descriptor rings.
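A rough userspace sketch of the idea follows. It assumes a saturating helper, array_size_sketch(), that mimics the documented behaviour of the kernel's array_size() (returning SIZE_MAX on overflow); the ring structure and allocation function are hypothetical and are not the nfp driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Userspace approximation of an overflow-safe size computation:
 * returns count * elem_size, or SIZE_MAX if the product does not fit
 * in size_t, so a following allocation fails instead of being too small.
 */
static size_t array_size_sketch(size_t count, size_t elem_size)
{
	if (elem_size != 0 && count > SIZE_MAX / elem_size)
		return SIZE_MAX;
	return count * elem_size;
}

/* Hypothetical descriptor ring; the byte size is kept as a full size_t. */
struct ring_sketch {
	void *descs;
	size_t size;
};

static int ring_alloc_sketch(struct ring_sketch *ring, size_t count,
			     size_t desc_size)
{
	assert(ring != NULL);

	ring->descs = NULL;
	ring->size = array_size_sketch(count, desc_size);
	if (ring->size == SIZE_MAX)
		return -ENOMEM;	/* overflow detected, refuse to allocate */

	ring->descs = malloc(ring->size);
	return ring->descs ? 0 : -ENOMEM;
}
```

Storing the size in a narrower type (for example an int) would reintroduce the truncation that the saturating multiply is meant to rule out.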
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 26 Jul 2018 02:40:36 +0000 (19:40 -0700)]
nfp: restore correct ordering of fields in rx ring structure
Commit 7f1c684a8966 ("nfp: setup xdp_rxq_info") mixed the cache
cold and cache hot data in the nfp_net_rx_ring structure (ignoring
the feedback), to try to fit the structure into 2 cache lines
after struct xdp_rxq_info was added. Now that we are about to add
a new field the structure will grow back to 3 cache lines, so
order the members correctly.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 26 Jul 2018 02:40:35 +0000 (19:40 -0700)]
nfp: use kvcalloc() to allocate SW buffer descriptor arrays
Use kvcalloc() instead of tmp variable + kzalloc() when allocating
SW buffer information to allow falling back to vmalloc and to protect
from theoretical integer overflow.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 26 Jul 2018 02:40:34 +0000 (19:40 -0700)]
nfp: don't fail probe on pci_sriov_set_totalvfs() errors
On machines with buggy ACPI tables or when SR-IOV is already enabled
we may not be able to set the SR-IOV VF limit in sysfs. This is not fatal
because the limit is imposed by the driver anyway. Only the sysfs
'sriov_totalvfs' attribute will be too high. Print an error to inform
user about the failure but allow probe to continue.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Benedict Wong [Wed, 25 Jul 2018 20:45:29 +0000 (13:45 -0700)]
xfrm: Return detailed errors from xfrmi_newlink
Currently all failure modes of xfrm interface creation return EEXIST.
This change improves the granularity of errnos provided by also
returning ENODEV or EINVAL if failures happen in looking up the
underlying interface, or a required parameter is not provided.
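As an illustration of the finer-grained error reporting, here is a minimal userspace sketch. The request structure, its field names and newlink_sketch() are hypothetical; only the mapping of failure modes to EINVAL, ENODEV and EEXIST comes from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical creation request; field names are illustrative only. */
struct newlink_req_sketch {
	bool has_required_param;	/* required parameter supplied? */
	bool lower_dev_found;		/* underlying interface lookup succeeded? */
	bool already_exists;		/* equivalent interface already present? */
};

/* Return a distinct errno per failure mode instead of always -EEXIST. */
static int newlink_sketch(const struct newlink_req_sketch *req)
{
	assert(req != NULL);

	if (!req->has_required_param)
		return -EINVAL;
	if (!req->lower_dev_found)
		return -ENODEV;
	if (req->already_exists)
		return -EEXIST;
	return 0;
}
```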
This change has been tested against the Android Kernel Networking Tests,
with additional xfrmi_newlink tests here:
https://android-review.googlesource.com/c/kernel/tests/+/715755
Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
YueHaibing [Wed, 25 Jul 2018 08:54:33 +0000 (16:54 +0800)]
xfrm: fix 'passing zero to ERR_PTR()' warning
Fix a static code checker warning:
net/xfrm/xfrm_policy.c:1836 xfrm_resolve_and_create_bundle() warn: passing zero to 'ERR_PTR'
xfrm_tmpl_resolve() returning 0 just means no xdst was found, so return NULL
instead of passing zero to ERR_PTR.
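The distinction can be sketched in userspace with simplified stand-ins for the kernel's error-pointer helpers. All names ending in _sketch are assumptions made for this example, and resolve_bundle_sketch() is not the xfrm code; it only shows why a zero result should become NULL rather than an encoded "error" of 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO_SKETCH 4095

/* Encode a negative errno in a pointer; zero is not a valid input. */
static void *err_ptr_sketch(long err)
{
	assert(err < 0 && err >= -MAX_ERRNO_SKETCH);
	return (void *)(intptr_t)err;
}

/* True only for pointers in the top MAX_ERRNO_SKETCH addresses. */
static int is_err_sketch(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

static long ptr_err_sketch(const void *p)
{
	return (long)(intptr_t)p;
}

static int bundle_sketch;	/* stand-in for a successfully built object */

/*
 * resolved < 0: an error occurred, hand it back as an error pointer.
 * resolved == 0: nothing was found, which is not an error, so NULL.
 * resolved > 0: return the (dummy) object.
 */
static void *resolve_bundle_sketch(int resolved)
{
	if (resolved < 0)
		return err_ptr_sketch(resolved);
	if (resolved == 0)
		return NULL;	/* not err_ptr_sketch(0) */
	return &bundle_sketch;
}
```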
Fixes: d809ec895505 ("xfrm: do not assume that template resolving always returns xfrms")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
YueHaibing [Thu, 26 Jul 2018 01:51:27 +0000 (09:51 +0800)]
amd-xgbe: use dma_mapping_error to check map errors
dma_mapping_error() returns true or false, but we want
to return -ENOMEM if there was an error.
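A small userspace sketch of the conversion follows. The address type, the error sentinel and both functions are hypothetical stand-ins, not the kernel DMA API; the sketch only shows a boolean check being translated into a negative errno instead of being returned as-is.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_sketch_t;

/* Hypothetical sentinel meaning "mapping failed". */
#define DMA_ERROR_SKETCH UINT64_MAX

/* Boolean check, analogous in spirit to a mapping-error predicate. */
static bool mapping_error_sketch(dma_addr_sketch_t addr)
{
	return addr == DMA_ERROR_SKETCH;
}

/*
 * Store the mapped address in *out, or return -ENOMEM on failure.
 * Returning the predicate's value directly would hand the caller 1
 * rather than a negative errno.
 */
static int map_buffer_sketch(dma_addr_sketch_t mapped, dma_addr_sketch_t *out)
{
	assert(out != NULL);

	if (mapping_error_sketch(mapped))
		return -ENOMEM;

	*out = mapped;
	return 0;
}
```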
Fixes: 174fd2597b0b ("amd-xgbe: Implement split header receive support")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 25 Jul 2018 23:46:02 +0000 (16:46 -0700)]
Merge branch 'mlxsw-Introduce-algorithmic-TCAM-support'
Ido Schimmel says:
====================
mlxsw: Introduce algorithmic TCAM support
The Spectrum-2 ASIC uses an algorithmic TCAM (A-TCAM) where multiple
exact match lookups are performed instead of a single lookup as with
standard circuit TCAM (C-TCAM) memory. This allows for higher scale and
reduced power consumption.
The lookups are performed by masking a packet using different masks
(e.g., {dst_ip/24, ethtype}) defined for the region and looking for an
exact match. Eventually, the rule with the highest priority will be
picked.
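The lookup described above can be modelled roughly as follows. This is a software sketch under several assumptions: keys are reduced to a single 32-bit value, a linear scan stands in for the hardware's exact-match lookup, a higher priority value is assumed to win, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rule_sketch {
	uint32_t mask_id;	/* index into the region's mask array */
	uint32_t masked_key;	/* key already ANDed with that mask */
	int priority;		/* assumption: higher value wins */
};

/*
 * For every mask of the region, mask the packet key and look for an
 * exact match among the rules using that mask. Returns the index of
 * the highest-priority matching rule, or -1 if nothing matches.
 */
static int atcam_lookup_sketch(uint32_t pkt_key,
			       const uint32_t *masks, size_t n_masks,
			       const struct rule_sketch *rules, size_t n_rules)
{
	int best = -1;

	assert(masks != NULL && rules != NULL);

	for (size_t m = 0; m < n_masks; m++) {
		uint32_t masked = pkt_key & masks[m];	/* one lookup per mask */

		for (size_t r = 0; r < n_rules; r++) {
			if (rules[r].mask_id != m ||
			    rules[r].masked_key != masked)
				continue;
			if (best < 0 || rules[r].priority > rules[best].priority)
				best = (int)r;
		}
	}
	return best;
}
```

The cost grows with the number of masks rather than the number of rules, which is why the number of masks per region matters.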
Since the number of masks per-region is limited, the ASIC includes a
C-TCAM that can be used as a spill area for rules that do not fit into
the A-TCAM.
The driver currently uses a C-TCAM only mode which is similar to
Spectrum-1. However, this mode severely limits both the number of
supported ACL rules and the performance of the ACL lookup.
This patch set introduces initial support for the A-TCAM mode where the
C-TCAM is only used for rule spillage.
The first five patches add the registers and ASIC resources needed in
order to make use of the A-TCAM.
The next three patches are the "meat" and add the eRP core which is used to
manage the masks used by each ACL region. The individual commit messages
are lengthy and aim to thoroughly explain the subject.
The next seven patches perform small adjustments in the code and the
related data structures and are meant to prepare the code base for the
introduction of the A-TCAM in the last two patches.
Various A-TCAM optimizations will be the focus of follow-up patch sets:
* Pruning - Used to reduce the number of lookups. Each rule will include
a prune vector that indicates which masks should not be considered for
further lookups as they cannot result in a higher priority match
* Bloom filter - Used to reduce the number of lookups. Before performing
a lookup with a given mask the ASIC will consult a bloom filter
(managed by the driver) that indicates whether a match might exist using
the considered mask
* Masks aggregation - Used to increase scale and reduce lookups. Masks
that only differ by up to eight consecutive bits (delta bits) can be
aggregated into a single mask. The delta bits then become a part of the
rule's key. For example, dst_ip/16 and dst_ip/17 can be represented as
dst_ip/16 with a delta bit of one. Rules using the aggregated mask then
specify whether the 17-th bit should be masked or not and its value
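To illustrate the Bloom filter item in the list above, here is a generic userspace sketch. It is not the ASIC's or the driver's filter: the filter size, the two hash variants and all names are assumptions. It only shows the property being relied on, namely that a negative answer rules out a match for a mask, while a positive answer still requires the real lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS_SKETCH 1024u	/* filter size is an arbitrary assumption */

struct bloom_sketch {
	uint8_t bits[BLOOM_BITS_SKETCH / 8];
};

/* Hypothetical mixing function; seed selects one of the hash variants. */
static uint32_t bloom_hash_sketch(uint32_t masked_key, uint32_t mask_id,
				  uint32_t seed)
{
	uint32_t h = masked_key ^ (mask_id * 0x9E3779B1u) ^ seed;

	h ^= h >> 16;
	h *= 0x85EBCA6Bu;
	h ^= h >> 13;
	h *= 0xC2B2AE35u;
	h ^= h >> 16;
	return h % BLOOM_BITS_SKETCH;
}

static void bloom_init_sketch(struct bloom_sketch *bf)
{
	assert(bf != NULL);
	memset(bf->bits, 0, sizeof(bf->bits));
}

/* Record that a rule exists for (masked_key, mask_id). */
static void bloom_add_sketch(struct bloom_sketch *bf, uint32_t masked_key,
			     uint32_t mask_id)
{
	for (uint32_t seed = 0; seed < 2; seed++) {
		uint32_t bit = bloom_hash_sketch(masked_key, mask_id, seed);

		bf->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
	}
}

/* false: definitely no rule; true: a lookup with this mask is still needed. */
static bool bloom_may_match_sketch(const struct bloom_sketch *bf,
				   uint32_t masked_key, uint32_t mask_id)
{
	for (uint32_t seed = 0; seed < 2; seed++) {
		uint32_t bit = bloom_hash_sketch(masked_key, mask_id, seed);

		if (!(bf->bits[bit / 8] & (1u << (bit % 8))))
			return false;
	}
	return true;
}
```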
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 25 Jul 2018 06:24:06 +0000 (09:24 +0300)]
mlxsw: spectrum_acl: Start using A-TCAM
Now that all the pieces are in place we can start using the A-TCAM
instead of only using the C-TCAM. This allows for much higher scale and
better performance (to be improved further by follow-up patch sets).
Perform the integration with the A-TCAM and the eRP core by reverting
the changes introduced by "mlxsw: spectrum_acl: Enable C-TCAM only mode
in eRP core" and add calls from the C-TCAM code into the eRP core.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 25 Jul 2018 06:24:05 +0000 (09:24 +0300)]
mlxsw: spectrum_acl: Add A-TCAM rule insertion and deletion
Implement rule insertion and deletion into the A-TCAM before we flip the
driver to start using the A-TCAM.
Rule insertion into the A-TCAM is very similar to C-TCAM, but there are
subtle differences between regions of different sizes (i.e., different
number of key blocks).
Specifically, as explained in "mlxsw: spectrum_acl: Allow encoding a
partial key", in 12 key blocks regions a rule is split into two and the
two halves of the rule are linked using a "large entry key ID".
Such differences are abstracted away by using different region
operations per region type.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 25 Jul 2018 06:24:04 +0000 (09:24 +0300)]
mlxsw: spectrum_acl: Pass C-TCAM region and entry to insert function
When the A-TCAM is used together with the C-TCAM, the C-TCAM code will need
to call into the eRP core in order to get an eRP for an inserted entry.
The eRP core takes an A-TCAM region as one of its arguments, so pass the
C-TCAM region to the insertion function which will later allow us to
derive the A-TCAM region, given it contains the C-TCAM one.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 25 Jul 2018 06:24:03 +0000 (09:24 +0300)]
mlxsw: spectrum_acl: Add A-TCAM region initialization
Before we start using the A-TCAM we need to make sure the region is
properly initialized.
This includes the setting of its type (which affects the size of its eRP
table, for example) and its registration with the eRP core.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 25 Jul 2018 06:24:02 +0000 (09:24 +0300)]
mlxsw: spectrum_acl: Make global TCAM resources available to regions
Each TCAM region currently uses its own resources and there is no
sharing between the different regions.
This is going to change with A-TCAM as each region will need to allocate
an eRP table from the global eRP tables array.
Make the global TCAM resources available to each region by passing the
TCAM private data to the region initialization routine.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 25 Jul 2018 06:24:01 +0000 (09:24 +0300)]
mlxsw: spectrum_acl: Encapsulate C-TCAM region in A-TCAM region
In Spectrum-2 the C-TCAM is only used for rules that can't fit in the
A-TCAM due to a limited number of masks per A-TCAM region.
In addition, rules inserted into the C-TCAM may affect rules residing in
the A-TCAM, by clearing their C-TCAM prune bit.
The two regions are thus closely related and can be thought of as if the
C-TCAM region is encapsulated in the A-TCAM one.
Change the data structures to reflect that before introducing A-TCAM
support and make C-TCAM region initialization part of the A-TCAM region
initialization sequence.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 25 Jul 2018 06:24:00 +0000 (09:24 +0300)]
mlxsw: spectrum_acl: Add A-TCAM initialization
Initialize the A-TCAM as part of the driver's initialization routine.
Specifically, initialize the eRP tables so that A-TCAM regions will be
able to perform allocations of eRP tables upon rule insertion in
subsequent patches.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 25 Jul 2018 06:23:59 +0000 (09:23 +0300)]
mlxsw: spectrum_acl: Allow encoding a partial key
When working with 12 key blocks in the A-TCAM, rules are split into two
records, which constitute two lookups. The two records are linked using
a "large entry key ID". The ID is assigned to key blocks 6 to 11 and
resolved during the first lookup. The second lookup is performed using
the ID and the remaining key blocks.
Allow encoding a partial key so that it can be later used to check if an
ID can be reused.
This is done by adding two arguments to the existing encode function
that specify the range of the block indexes we would like to encode. The
key and mask arguments become optional, as we will not need to encode
both of them all the time.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 25 Jul 2018 06:23:58 +0000 (09:23 +0300)]
mlxsw: spectrum_acl: Extend Spectrum-2 region struct
In a similar fashion to Spectrum-1's region struct, Spectrum-2's struct
needs to store a pointer to the common region struct.
The pointer will be used in follow-up patches that implement rules
insertion and deletion.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 25 Jul 2018 06:23:57 +0000 (09:23 +0300)]
mlxsw: spectrum_acl: Add support for C-TCAM eRPs
The number of eRPs that can be used by a single A-TCAM region is limited
to 16. When more eRPs are needed, an ordinary circuit TCAM (C-TCAM) can
be used to hold the extra eRPs.
Unlike the A-TCAM, only a single (last) lookup is performed in the
C-TCAM and not a lookup per-eRP. However, modeling the C-TCAM as extra
eRPs will allow us to easily introduce support for pruning in a
follow-up patch set and is also logically correct.
The following diagram depicts the relation between both TCAMs:
                                                                 C-TCAM
+-------------------+               +--------------------+    +-----------+
|                   |               |                    |    |           |
|  eRP #1 (A-TCAM)  +----> ... +----+  eRP #16 (A-TCAM)  +----+  eRP #17  |
|                   |               |                    |    |    ...    |
+-------------------+               +--------------------+    |  eRP #N   |
                                                              |           |
                                                              +-----------+
Lookup order is from left to right.
Extend the eRP core APIs with a C-TCAM parameter which indicates whether
the requested eRP is to be used with the C-TCAM or not.
Since the C-TCAM is only meant to absorb rules that can't fit in the
A-TCAM due to exceeded number of eRPs or key collision, an error is
returned when a C-TCAM eRP needs to be created when the eRP state
machine is in its initial state (i.e., 'no masks'). This should only
happen in the face of very unlikely errors when trying to push rules
into the A-TCAM.
In order not to perform unnecessary lookups, the eRP core will only
enable a C-TCAM lookup for a given region if it knows there are C-TCAM
eRPs present.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 25 Jul 2018 06:23:56 +0000 (09:23 +0300)]
mlxsw: spectrum_acl: Enable C-TCAM only mode in eRP core
Currently, no calls are performed into the eRP core, but in order to
make review easier we would like to gradually add these calls.
Have the eRP core initialize a region's master mask to all ones and
allow it to use an empty eRP table. This directs the lookup to the
C-TCAM and allows the C-TCAM only mode to continue working.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 25 Jul 2018 06:23:55 +0000 (09:23 +0300)]
mlxsw: spectrum_acl: Implement common eRP core
When rules are inserted into the A-TCAM they are associated with a mask,
which is part of the lookup key: { masked key, mask ID, region ID }.
These masks are called rule patterns (RP) and the aggregation of several
masks into one (to be introduced in follow-up patch sets) is called an
extended RP (eRP).
When a packet undergoes a lookup in an ACL region it is masked by the
current set of eRPs used by the region, looking for an exact match.
Eventually, the rule with the highest priority is picked.
These eRPs are stored in several global banks to allow for lookup to
occur using several eRPs simultaneously.
At first, an ACL region will only require a single mask - upon the
insertion of the first rule. In this case, the region can use the
"master RP" which is composed by OR-ing all the masks used by the
region. This mask is a property of the region and thus there is no need
to use the above mentioned banks.
At some point, a second mask will be needed. In this case, the region
will need to allocate an eRP table from the above mentioned banks and
insert its masks there.
From now on, upon lookup, the eRP table used by the region will be
fetched from the eRP banks - using {eRP bank, Index within the bank} -
and the eRPs present in the table will be used to mask the packet. Note
that masks with consecutive indexes are inserted into consecutive banks.
When rules are deleted and a region only needs a single mask once again
it can free its eRP table and use the master RP.
The above logic is implemented in the eRP core and represented using the
following state machine:
     +------------+   create mask - as master RP   +---------------+
     |            +-------------------------------->               |
     |  no masks  |                                |  single mask  |
     |            <--------------------------------+               |
     +------------+          delete mask           +-----+--^------+
                                                         |  |
                                                         |  |
                                          create mask -  |  |  delete mask -
     create mask                  transition to use eRP  |  |  transition to
      +--------+                  table                  |  |  use master RP
      |        |                                         |  |
      |        |                                         |  |
 +----v--------+----+          create mask          +----v--+-----+
 |                  <-------------------------------+             |
 |  multiple masks  |                               |  two masks  |
 |                  +------------------------------->             |
 +------------------+     delete mask - if two      +-------------+
                               remaining
The code that actually configures rules in the A-TCAM will interface
with the eRP core by getting or putting an eRP based on the required
mask used by the rule.
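The state machine can be modelled in a few lines of userspace C. This is a simplified sketch that derives the state from a mask count; the type and function names are hypothetical and the eRP core's real bookkeeping (banks, tables, master mask) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum erp_state_sketch {
	ERP_NO_MASKS,
	ERP_SINGLE_MASK,
	ERP_TWO_MASKS,
	ERP_MULTIPLE_MASKS,
};

struct erp_region_sketch {
	unsigned int num_masks;
	bool uses_erp_table;	/* false: the master RP is sufficient */
};

static enum erp_state_sketch
erp_region_state_sketch(const struct erp_region_sketch *r)
{
	assert(r != NULL);

	switch (r->num_masks) {
	case 0:
		return ERP_NO_MASKS;
	case 1:
		return ERP_SINGLE_MASK;
	case 2:
		return ERP_TWO_MASKS;
	default:
		return ERP_MULTIPLE_MASKS;
	}
}

/* "create mask": the second mask triggers the move to an eRP table. */
static void erp_mask_create_sketch(struct erp_region_sketch *r)
{
	assert(r != NULL);
	r->num_masks++;
	r->uses_erp_table = r->num_masks > 1;
}

/* "delete mask": dropping back to one mask returns to the master RP. */
static void erp_mask_delete_sketch(struct erp_region_sketch *r)
{
	assert(r != NULL && r->num_masks > 0);
	r->num_masks--;
	r->uses_erp_table = r->num_masks > 1;
}
```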
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 25 Jul 2018 06:23:54 +0000 (09:23 +0300)]
mlxsw: resources: Add Spectrum-2 eRP resources
Add the following resources to be used by A-TCAM code:
* Maximum number of eRP banks
* Maximum size of eRP bank
* Number of eRP entries required for a 2/4/8/12 key blocks mask
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 25 Jul 2018 06:23:53 +0000 (09:23 +0300)]
mlxsw: resources: Add Spectrum-2 maximum large key ID resource
Add a resource to make sure we do not exceed the maximum number of
supported large key IDs in a region.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 25 Jul 2018 06:23:52 +0000 (09:23 +0300)]
mlxsw: reg: Add Policy-Engine eRP Table Register
The register is used to add and delete eRPs from the eRP table.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 25 Jul 2018 06:23:51 +0000 (09:23 +0300)]
mlxsw: reg: Add Policy-Engine TCAM Entry Register Version 3
The register is used to configure rules in the A-TCAM.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 25 Jul 2018 06:23:50 +0000 (09:23 +0300)]
mlxsw: reg: Prepare PERERP register for A-TCAM usage
Before introducing A-TCAM support we need to make sure all the necessary
fields are configurable and not hard coded to values that worked for the
C-TCAM only use case.
This includes - for example - the ability to configure the eRP table
used by the TCAM region.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Wed, 25 Jul 2018 06:11:16 +0000 (06:11 +0000)]
lan743x: Make symbol lan743x_pm_ops static
Fixes the following sparse warning:
drivers/net/ethernet/microchip/lan743x_main.c:2944:25: warning:
symbol 'lan743x_pm_ops' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Wed, 25 Jul 2018 06:06:07 +0000 (06:06 +0000)]
tcp: make function tcp_retransmit_stamp() static
Fixes the following sparse warnings:
net/ipv4/tcp_timer.c:25:5: warning:
symbol 'tcp_retransmit_stamp' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jianbo Liu [Wed, 25 Jul 2018 02:31:25 +0000 (02:31 +0000)]
net/sched: cls_flower: Use correct inline function for assignment of vlan tpid
This fixes the following sparse warning:
net/sched/cls_flower.c:1356:36: warning: incorrect type in argument 3 (different base types)
net/sched/cls_flower.c:1356:36: expected unsigned short [unsigned] [usertype] value
net/sched/cls_flower.c:1356:36: got restricted __be16 [usertype] vlan_tpid
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Tue, 24 Jul 2018 11:31:45 +0000 (14:31 +0300)]
net/mlx4_core: Allow MTTs starting at any index
Allow obtaining MTTs starting at any index,
thus giving better cache utilization.
For this, allow setting log_mtts_per_seg to 0, and use
this by default.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Anaty Rahamim Bar Kat <anaty@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 25 Jul 2018 23:28:58 +0000 (16:28 -0700)]
Merge branch 'mlx5-Offload-setting-matching-on-tunnel-tos-ttl'
Or Gerlitz says:
====================
net/mlx5: Offload setting/matching on tunnel tos/ttl
This series enables mlx5 offloading of tc eswitch rules that set
tos/ttl (encap) or match on them (decap) for tunnels.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Tue, 24 Jul 2018 10:59:35 +0000 (13:59 +0300)]
net/mlx5e: Offload TC matching on tos/ttl for ip tunnels
Enable offloading of TC matching on tos/ttl for ipv4/6 tunnels.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Tue, 24 Jul 2018 10:59:34 +0000 (13:59 +0300)]
net/mlx5e: Support setup of tos and ttl for tunnel key TC action offload
Use the values provided by user-space for the encapsulation headers.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Tue, 24 Jul 2018 10:59:33 +0000 (13:59 +0300)]
net/mlx5e: Use ttl from route lookup on tc encap offload only if needed
Currently, the ttl for the encapsulation headers is taken from the
route lookup result. As a pre-step to allow for an offload case when
the user specifies the ttl, take it from the route lookup only if
not zero. While here, also move to use u8 instead of int for the ttl.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjorn Helgaas [Mon, 23 Jul 2018 20:59:46 +0000 (15:59 -0500)]
vxge: Remove unnecessary include of <linux/pci_hotplug.h>
The vxge driver doesn't need anything provided by pci_hotplug.h, so remove
the unnecessary include of it.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 23 Jul 2018 19:40:07 +0000 (21:40 +0200)]
net: phy: add helper phy_polling_mode
Add a helper for checking whether polling is used to detect PHY status
changes.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
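A helper like the one in "net: phy: add helper phy_polling_mode" might look roughly like the sketch below. The struct and the constant are stand-ins so the example compiles on its own; the names carrying a _sketch suffix are assumptions, as is the convention that a special negative irq value marks a polled PHY.

```c
#include <stdbool.h>

/* Stand-in for the sentinel irq value meaning "no interrupt, poll". */
#define PHY_POLL_SKETCH (-1)

/* Stand-in for the PHY device state; only the irq field matters here. */
struct phy_device_sketch {
	int irq;
};

/* Return true when link state changes are detected by polling. */
static bool phy_polling_mode_sketch(const struct phy_device_sketch *phydev)
{
	return phydev->irq == PHY_POLL_SKETCH;
}
```

Wrapping the comparison in a helper keeps callers from open-coding the sentinel check in several places.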
Krzysztof Kozlowski [Mon, 23 Jul 2018 16:20:20 +0000 (18:20 +0200)]
net: ethernet: fs-enet: Use generic CRC32 implementation
Use the generic kernel CRC32 implementation because it:
1. Should be faster (uses lookup tables),
2. Removes duplicated CRC generation code,
3. Uses a well-proven algorithm instead of coding it one more time.
Suggested-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Krzysztof Kozlowski [Mon, 23 Jul 2018 16:19:14 +0000 (18:19 +0200)]
net: ethernet: freescale: Use generic CRC32 implementation
Use the generic kernel CRC32 implementation because it:
1. Should be faster (uses lookup tables),
2. Removes duplicated CRC generation code,
3. Uses a well-proven algorithm instead of coding it one more time.
Suggested-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
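To illustrate why a lookup table helps in the two CRC32 commits above, here is a minimal user-space sketch of a table-driven, bit-reflected CRC32 (polynomial 0xEDB88320). It is not the kernel's implementation and not the code these drivers call; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc32_table_sketch[256];

/* Precompute the CRC of every possible byte value once. */
static void crc32_table_init_sketch(void)
{
	for (uint32_t i = 0; i < 256; i++) {
		uint32_t c = i;

		for (int k = 0; k < 8; k++)
			c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
		crc32_table_sketch[i] = c;
	}
}

/*
 * Update a running CRC one byte at a time using the table, instead of
 * looping over the eight bits of every input byte.
 */
static uint32_t crc32_le_sketch(uint32_t crc, const uint8_t *p, size_t len)
{
	while (len--)
		crc = (crc >> 8) ^ crc32_table_sketch[(crc ^ *p++) & 0xff];
	return crc;
}
```

Seeding with 0xFFFFFFFF and inverting the result gives the standard CRC-32 check value 0xCBF43926 for the ASCII string "123456789". Whether a driver needs the seed or the final inversion depends on what its hardware expects.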
Camelia Groza [Mon, 23 Jul 2018 15:06:15 +0000 (18:06 +0300)]
net: phy: prevent PHYs w/o Clause 22 regs from calling genphy_config_aneg
genphy_config_aneg() should be called only by PHYs that implement
the Clause 22 register set. Prevent Clause 45 PHYs that don't implement
the register set from calling the genphy function.
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
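The guard described in "net: phy: prevent PHYs w/o Clause 22 regs from calling genphy_config_aneg" can be sketched as below. The struct, the bit name and both functions are stand-ins invented for this example; in particular, treating bit 0 of a "devices in package" mask as "Clause 22 registers present" is an assumption of the sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag: Clause 22 register set is present in the package. */
#define C22_REGS_PRESENT_SKETCH (1u << 0)

/* Stand-in for the PHY state relevant to the check. */
struct phy_sketch {
	bool is_c45;
	uint32_t devices_in_package;
};

/* Stand-in for the generic Clause 22 autonegotiation setup. */
static int genphy_config_aneg_sketch(struct phy_sketch *phy)
{
	(void)phy;
	return 0;
}

/* Refuse the generic path for Clause 45 PHYs without Clause 22 regs. */
static int phy_config_aneg_sketch(struct phy_sketch *phy)
{
	if (phy->is_c45 &&
	    !(phy->devices_in_package & C22_REGS_PRESENT_SKETCH))
		return -EOPNOTSUPP;

	return genphy_config_aneg_sketch(phy);
}
```

Returning an error early keeps the generic function from touching registers the PHY does not implement.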
David S. Miller [Wed, 25 Jul 2018 19:53:38 +0000 (12:53 -0700)]
Merge branch 'virtio_net-Add-ethtool-stat-items'
Toshiaki Makita says:
====================
virtio_net: Add ethtool stat items
Add some ethtool stat items useful for performance analysis.
====================
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Toshiaki Makita [Mon, 23 Jul 2018 14:36:09 +0000 (23:36 +0900)]
virtio_net: Add kick stats
Add kick stats so we can infer the number of VM-Exits.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Toshiaki Makita [Mon, 23 Jul 2018 14:36:08 +0000 (23:36 +0900)]
virtio_net: Add XDP related stats
Add counters below:
* Tx
- xdp_tx: frames sent by ndo_xdp_xmit or XDP_TX.
- xdp_tx_drops: dropped frames out of xdp_tx ones.
* Rx
- xdp_packets: frames that went through the XDP program.
- xdp_tx: XDP_TX frames.
- xdp_redirects: XDP_REDIRECT frames.
- xdp_drops: any dropped frames out of xdp_packets ones.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
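The relationship between the Rx counters listed in "virtio_net: Add XDP related stats" can be sketched as follows. This is a simplified illustration only: the enum, struct and function names are hypothetical, and it ignores cases such as a failed XDP_TX or redirect, which a real driver would presumably also count as drops.

```c
#include <stdint.h>

/* Hypothetical verdicts of an XDP program for one received frame. */
enum xdp_verdict_sketch {
	VERDICT_PASS,
	VERDICT_TX,
	VERDICT_REDIRECT,
	VERDICT_DROP,
};

/* Hypothetical per-queue Rx counters, named after the commit's list. */
struct xdp_rx_stats_sketch {
	uint64_t xdp_packets;
	uint64_t xdp_tx;
	uint64_t xdp_redirects;
	uint64_t xdp_drops;
};

/* Account one frame that went through the XDP program. */
static void xdp_rx_account_sketch(struct xdp_rx_stats_sketch *s,
				  enum xdp_verdict_sketch v)
{
	s->xdp_packets++;
	switch (v) {
	case VERDICT_TX:
		s->xdp_tx++;
		break;
	case VERDICT_REDIRECT:
		s->xdp_redirects++;
		break;
	case VERDICT_DROP:
		s->xdp_drops++;
		break;
	case VERDICT_PASS:
		break;
	}
}
```

In this sketch xdp_tx, xdp_redirects and xdp_drops are always subsets of xdp_packets, which matches the "out of xdp_packets ones" wording above.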
Toshiaki Makita [Mon, 23 Jul 2018 14:36:07 +0000 (23:36 +0900)]
virtio_net: Factor out the logic to determine xdp sq
Make sure to use the same logic in all places to determine xdp sq. This
is useful for xdp counters which the following commit will introduce as
well.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Toshiaki Makita [Mon, 23 Jul 2018 14:36:06 +0000 (23:36 +0900)]
virtio_net: Make drop counter per-queue
Since XDP was introduced, the drop counter can be updated much more
frequently than before, as XDP_DROP increments the counter. A per-queue
drop counter is therefore useful for performance analysis.
It also avoids cache contention and races when updating the counter. The
counter is currently racy because napi handlers read-modify-write it
without any locks.
There are more counters in dev->stats that are racy, but I left them
per-device, because they are rarely updated and are not worth making
per-queue counters IMHO. To fix them we need atomic ops or some kind of
locks.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
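The per-queue idea in "virtio_net: Make drop counter per-queue" can be sketched as below: each receive queue owns its counter, so the handler for one queue never writes memory that another queue's handler writes, and the device-wide total is computed only when statistics are read. All names are hypothetical, and the sketch leaves out the synchronization a real driver needs for consistent 64-bit reads.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-queue statistics; one instance per receive queue. */
struct rq_stats_sketch {
	uint64_t drops;
};

/* Called only from the handler that owns this queue: no shared write. */
static void rq_count_drop_sketch(struct rq_stats_sketch *rq)
{
	rq->drops++;
}

/* Sum the per-queue counters when the device-wide value is requested. */
static uint64_t total_drops_sketch(const struct rq_stats_sketch *rqs,
				   size_t nr_queues)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < nr_queues; i++)
		sum += rqs[i].drops;
	return sum;
}
```

The trade-off is a slightly more expensive read path in exchange for an uncontended, race-free update path.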