Stefan Raspl [Thu, 1 Mar 2018 12:51:26 +0000 (13:51 +0100)]
net/smc: cleanup smc_llc.h and smc_clc.h headers
Remove structures that are only used internally from the headers.
Also remove an extra function parameter.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 1 Mar 2018 18:13:24 +0000 (13:13 -0500)]
Merge branch 'ipv4-ipv6-mcast-align'
Yuval Mintz says:
====================
ipmr, ip6mr: Align multicast routing for IPv4 & IPv6
Historically ip6mr was based [cut-n-paste] on ipmr and the two have not
diverged too much. Apparently, as ipv4 multicast routing is more common
than its ipv6 brethren, modifications since then are mostly one-way,
affecting ipmr while leaving ip6mr unchanged.
This series is meant to re-factor both ipmr and ip6mr into having common
structures [and some functionality], adding 2 new common files -
mroute_base.h and ipmr_base.c.
The series begins by bringing ip6mr up to speed to some of the changes
applied in the past to ipmr [#2, #3].
It is then possible to re-factor a lot of the common structures -
vif devices [#1], mr_table [#4], mfc_cache [#6], and use the common
structures in both ipmr and ip6mr.
The rest of the patches re-factor some choice flows used by both ipmr
and ip6mr and eliminate duplication.
This series would later allow for easy extension of ipmr offloading
to support ip6mr offloading as well, as almost all structures
related to the offloading would be shared between the two protocols.
Changes from previous versions
------------------------------
v2:
- #6 Corrected reporting logic when hitting an unresolved cache
- #7 Addressed kernel doc style [Thanks Nikolay]
RFC -> v1:
- Corrected support for CONFIG_IP{,V6}_MROUTE_MULTIPLE_TABLES
- Addressed a couple of kbuild test robot issues
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 28 Feb 2018 21:29:39 +0000 (23:29 +0200)]
ipmr, ip6mr: Unite dumproute flows
The various MFC entries are being held in the same kind of mr_tables
for both ipmr and ip6mr, and their traversal logic is identical.
Also, with the exception of the addresses [and other small tidbits]
the major bulk of the nla setting is identical.
Unite as much of the dumping as possible between the two.
Notice this requires creating an mr_table iterator for each, as the
for-each preprocessor macro can't be used by the common logic.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 28 Feb 2018 21:29:38 +0000 (23:29 +0200)]
ip6mr: Remove MFC_NOTIFY and refactor flags
MFC_NOTIFY exists in ip6mr, probably as some legacy code
[it was already removed for ipmr in commit
06bd6c0370bb ("net: ipmr: remove unused MFC_NOTIFY flag and make the flags enum")].
Remove it from ip6mr as well, and move the enum into a common file;
Notice MFC_OFFLOAD is currently only used by ipmr.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 28 Feb 2018 21:29:37 +0000 (23:29 +0200)]
ipmr, ip6mr: Unite vif seq functions
Same as previously done with the mfc seq, the logic for the vif seq is
refactored to be shared between ipmr and ip6mr.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 28 Feb 2018 21:29:36 +0000 (23:29 +0200)]
ipmr, ip6mr: Unite mfc seq logic
With the exception of the final dump, ipmr and ip6mr have the exact same
seq logic for traversing a given mr_table. Refactor that code and make
it common.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 28 Feb 2018 21:29:35 +0000 (23:29 +0200)]
ipmr, ip6mr: Unite logic for searching in MFC cache
ipmr and ip6mr utilize the exact same methods for searching the
hashed resolved connections, difference being only in the construction
of the hash comparison key.
In order to unite the flow, introduce an mr_table operation set that
would contain the protocol specific information required for common
flows, in this case - the hash parameters and a comparison key
representing a (*,*) route.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
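The operation set described in the commit above can be pictured with a small Python sketch: one lookup routine is shared, and only the key construction and the (*,*) key differ per protocol. This is an illustration under stated assumptions, not the kernel code: `MrTableOps`, `make_key`, `cmparg_any`, `mr_add`, `mr_find` and `mr_find_any` are hypothetical names, and a plain dict stands in for the rhashtable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, Optional


@dataclass
class MrTableOps:
    # Protocol-specific pieces (hypothetical names): how to build a
    # comparison key, and the key that represents the (*,*) route.
    make_key: Callable[[str, str], Hashable]
    cmparg_any: Hashable


@dataclass
class MrTable:
    ops: MrTableOps
    # A dict stands in for the hash table of resolved entries.
    mfc_hash: Dict[Hashable, str] = field(default_factory=dict)


def mr_add(table: MrTable, origin: str, group: str, route: str) -> None:
    table.mfc_hash[table.ops.make_key(origin, group)] = route


def mr_find(table: MrTable, origin: str, group: str) -> Optional[str]:
    # Common lookup: only the key construction is protocol specific.
    return table.mfc_hash.get(table.ops.make_key(origin, group))


def mr_find_any(table: MrTable) -> Optional[str]:
    # Common lookup of the (*,*) route through the ops-provided key.
    return table.mfc_hash.get(table.ops.cmparg_any)


ipv4_ops = MrTableOps(make_key=lambda o, g: ("v4", o, g),
                      cmparg_any=("v4", "0.0.0.0", "0.0.0.0"))
ipv6_ops = MrTableOps(make_key=lambda o, g: ("v6", o, g),
                      cmparg_any=("v6", "::", "::"))
```

Both tables go through the same `mr_find()` and `mr_find_any()`; adding another protocol would only require another `MrTableOps` instance.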
Yuval Mintz [Wed, 28 Feb 2018 21:29:34 +0000 (23:29 +0200)]
ipmr, ip6mr: Make mfc_cache a common structure
mfc_cache and mfc6_cache are almost identical - the main difference is
in the origin/group addresses and comparison-key. Make a common
structure encapsulating most of the multicast routing logic - mr_mfc
and convert both ipmr and ip6mr into using it.
For easy conversion [casting, in this case] mr_mfc has to be the first
field inside every multicast routing abstraction utilizing it.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
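The "first field" requirement in the commit above can be demonstrated with Python's `ctypes`, which lays structures out the way C does. This is only a sketch: the class names and the reduced field lists are illustrative assumptions, not the kernel definitions. Because the common part sits at offset 0, a pointer to the protocol-specific entry can be reinterpreted as a pointer to the common part.

```python
import ctypes


class MrMfc(ctypes.Structure):
    # Stand-in for the common part; the two fields are illustrative only.
    _fields_ = [("mfc_parent", ctypes.c_ushort),
                ("mfc_flags", ctypes.c_int)]


class MfcCache(ctypes.Structure):
    # IPv4 flavour: common part first, then protocol-specific addresses.
    _fields_ = [("common", MrMfc),
                ("mfc_mcastgrp", ctypes.c_uint32),
                ("mfc_origin", ctypes.c_uint32)]


class Mfc6Cache(ctypes.Structure):
    # IPv6 flavour: same leading member, 128-bit addresses afterwards.
    _fields_ = [("common", MrMfc),
                ("mf6c_mcastgrp", ctypes.c_ubyte * 16),
                ("mf6c_origin", ctypes.c_ubyte * 16)]


def as_common(entry):
    """Reinterpret a protocol-specific entry as its leading MrMfc."""
    return ctypes.cast(ctypes.pointer(entry), ctypes.POINTER(MrMfc)).contents
```

If `common` were not the first member, the cast in `as_common()` would point at the wrong bytes, which is why the commit message insists on the field ordering.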
Yuval Mintz [Wed, 28 Feb 2018 21:29:33 +0000 (23:29 +0200)]
ipmr, ip6mr: Unite creation of new mr_table
Now that both ipmr and ip6mr are using the same mr_table structure,
we can have a common function to allocate & initialize a new instance.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 28 Feb 2018 21:29:32 +0000 (23:29 +0200)]
mroute*: Make mr_table a common struct
Following previous changes to ip6mr, mr_table and mr6_table are
basically the same [up to mr6_table having additional '6' suffixes to
its variable names].
Move the common structure definition into a common header; This
requires renaming all references in ip6mr to variables that had the
distinct suffix.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 28 Feb 2018 21:29:31 +0000 (23:29 +0200)]
ip6mr: Align hash implementation to ipmr
Since commit
8fb472c09b9d ("ipmr: improve hash scalability") ipmr has
been using rhashtable as a basis for its mfc routes, but ip6mr is
currently still using the old private MFC hash implementation.
Align ip6mr to the current ipmr implementation.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 28 Feb 2018 21:29:30 +0000 (23:29 +0200)]
ip6mr: Make mroute_sk rcu-based
In ipmr the mr_table socket is handled under RCU. Introduce the same
for ip6mr.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 28 Feb 2018 21:29:29 +0000 (23:29 +0200)]
ipmr,ipmr6: Define a uniform vif_device
The two implementations have almost identical structures - vif_device and
mif_device. As a step toward unifying the mr_tables, eliminate the
mif_device and relocate the vif_device definition into a new common
header file.
Also, introduce a common initializing function for setting most of the
vif_device fields in a new common source file. This requires modifying
the ipv{4,6} Kconfig and ipv4 makefile as we're introducing a new common
config option - CONFIG_IP_MROUTE_COMMON.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 1 Mar 2018 03:45:05 +0000 (22:45 -0500)]
Merge branch 'fib_rules-support-sport-dport-and-proto-match'
Roopa Prabhu says:
====================
fib_rules: support sport, dport and proto match
This series extends fib rule match support to include sport, dport
and ip proto match (to complete the 5-tuple match support).
Common use-cases of Policy based routing in the data center require
5-tuple match. The last 2 patches in the series add a call to flow dissect
in the fwd path if required by the installed fib rules (controlled by a flag).
v1:
- Fix errors reported by kbuild and feedback on RFC
- extend port match uapi to accommodate port ranges
v2:
- address comments from Nikolay, David Ahern and Paolo (Thanks!)
Pending things I will submit separate patches for:
- extack for fib rules
- fib rules test (as requested by david ahern)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Thu, 1 Mar 2018 03:43:22 +0000 (22:43 -0500)]
ipv6: route: dissect flow in input path if fib rules need it
Dissect flow in fwd path if fib rules require it. Controlled by
a flag to avoid penalty for the common case. Flag is set when fib
rules with sport, dport and proto match that require flow dissect
are installed. Also passes the dissected hash keys to the multipath
hash function when applicable to avoid dissecting the flow again.
icmp packets will continue to use inner header for hash
calculations.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Thu, 1 Mar 2018 03:42:41 +0000 (22:42 -0500)]
ipv4: route: dissect flow in input path if fib rules need it
Dissect flow in fwd path if fib rules require it. Controlled by
a flag to avoid penalty for the common case. Flag is set when fib
rules with sport, dport and proto match that require flow dissect
are installed. Also passes the dissected hash keys to the multipath
hash function when applicable to avoid dissecting the flow again.
icmp packets will continue to use inner header for hash
calculations (Thanks to Nikolay Aleksandrov for some review here).
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Thu, 1 Mar 2018 03:41:37 +0000 (22:41 -0500)]
ipv6: fib6_rules: support for match on sport, dport and ip proto
Add support for matching on src port, dst port and ip protocol.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Thu, 1 Mar 2018 03:41:06 +0000 (22:41 -0500)]
ipv4: fib_rules: support match on sport, dport and ip proto
Add support for matching on src port, dst port and ip protocol.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Thu, 1 Mar 2018 03:40:16 +0000 (22:40 -0500)]
net: fib_rules: support for match on ip_proto, sport and dport
Add uapi for ip_proto, sport and dport range match
in fib rules.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
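To make the new match fields concrete, here is a minimal Python sketch of matching a flow against a rule that carries an IP protocol and inclusive source and destination port ranges. The `Flow` and `Rule` types and the `rule_matches()` helper are hypothetical and only model the three fields added by this series; address, mark and interface matching are left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Flow:
    ip_proto: int
    sport: int
    dport: int


@dataclass(frozen=True)
class Rule:
    # None means "not configured", i.e. the field matches anything.
    ip_proto: Optional[int] = None
    sport_range: Optional[Tuple[int, int]] = None  # inclusive (start, end)
    dport_range: Optional[Tuple[int, int]] = None  # inclusive (start, end)


def in_range(port: int, rng: Optional[Tuple[int, int]]) -> bool:
    return rng is None or rng[0] <= port <= rng[1]


def rule_matches(rule: Rule, flow: Flow) -> bool:
    if rule.ip_proto is not None and rule.ip_proto != flow.ip_proto:
        return False
    return (in_range(flow.sport, rule.sport_range)
            and in_range(flow.dport, rule.dport_range))
```

A single port is expressed as a range whose start and end are equal, so one representation covers both the exact-port and the port-range case.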
Heiner Kallweit [Wed, 28 Feb 2018 19:43:38 +0000 (20:43 +0100)]
r8169: fix interrupt number after adding support for MSI-X interrupts
In case of MSI-X the interrupt number may differ from pcidev->irq.
Fix this by using pci_irq_vector().
Fixes: 6c6aa15fdea5 ("r8169: improve interrupt handling")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 28 Feb 2018 17:34:20 +0000 (12:34 -0500)]
Merge branch '100GbE' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
100GbE Intel Wired LAN Driver Updates 2018-02-28
This series contains updates to fm10k only.
Jake provides all the changes in this series, starting with making the
function header comments consistent and aligned with what the kernel
documentation expects. He also cleans up code comments and bumps
the driver version.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 28 Feb 2018 17:25:49 +0000 (12:25 -0500)]
Merge branch 'selftests-forwarding-Add-VRF-based-tests'
Ido Schimmel says:
====================
selftests: forwarding: Add VRF-based tests
One of the nice things about network namespaces is that they allow one
to easily create and test complex environments.
Unfortunately, these namespaces can not be used with actual switching
ASICs, as their ports can not be migrated to other network namespaces
(NETIF_F_NETNS_LOCAL) and most of them probably do not support the
L1-separation provided by namespaces.
However, a similar kind of flexibility can be achieved by using VRFs and
by looping the switch ports together. For example:
                             br0
                              +
               vrf-h1         |           vrf-h2
                 +        +---+----+        +
                 |        |        |        |
    192.0.2.1/24 +        +        +        + 192.0.2.2/24
               swp1     swp2     swp3     swp4
                 +        +        +        +
                 |        |        |        |
                 +--------+        +--------+
The VRFs act as lightweight namespaces representing hosts connected to
the switch.
This approach for testing switch ASICs has several advantages over the
traditional method that requires multiple physical machines, to name a
few:
1. Only the device under test (DUT) is being tested without noise from
other systems.
2. Ability to easily provision complex topologies. Testing bridging
between 4-port LAGs or 8-way ECMP requires many physical links that are
not always available. With the VRF-based approach one merely needs to
loop back more ports.
These tests are written with switch ASICs in mind, but they can be run
on any Linux box using veth pairs to emulate physical loopbacks.
v2:
* Order local variables declaration according to function arguments
order (Petr)
v1:
* Change location to net/forwarding instead of forwarding/
* Add ability to pause on failure
* Add ability to pause on cleanup
* Make configuration file optional
* Make ping/ping6/mz configurable
* Add more tc tests
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 28 Feb 2018 10:25:19 +0000 (12:25 +0200)]
selftests: forwarding: Introduce basic shared blocks tests
Test shared block infrastructure. This is a basic test that shares TC
block in between 2 clsact qdiscs.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 28 Feb 2018 10:25:18 +0000 (12:25 +0200)]
selftests: forwarding: Introduce basic tc chains tests
Tests chains matching and goto chain action.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 28 Feb 2018 10:25:17 +0000 (12:25 +0200)]
selftests: forwarding: Introduce tc actions tests
Add first part of actions tests. This patch only contains tests of gact
ok/drop/trap and mirred redirect egress.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 28 Feb 2018 10:25:16 +0000 (12:25 +0200)]
selftests: forwarding: Introduce tc flower matching tests
Add first part of flower tests. This patch only contains dst/src ip/mac
matching.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 28 Feb 2018 10:25:15 +0000 (12:25 +0200)]
selftests: forwarding: Allow to get netdev interfaces names from commandline
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 28 Feb 2018 10:25:14 +0000 (12:25 +0200)]
selftests: forwarding: Add MAC get helper
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 28 Feb 2018 10:25:13 +0000 (12:25 +0200)]
selftests: forwarding: Add tc offload check helper
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 28 Feb 2018 10:25:12 +0000 (12:25 +0200)]
selftests: forwarding: Test IPv6 weighted nexthops
Have one host generate 16K IPv6 echo requests with a random flow label
and check that they are distributed between both multipath links
according to the provided weights.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
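The check performed by such a test can be sketched in Python: generate packets with random 20-bit flow labels, map each onto a nexthop in proportion to its weight, and compare the observed shares with the configured ones. The modulo-based `pick_nexthop()` is a simplification and not the kernel's multipath hash, and the 10% tolerance is an assumed value.

```python
import random


def pick_nexthop(flow_hash, weights):
    """Map a hash onto nexthops in proportion to their weights."""
    slot = flow_hash % sum(weights)
    for index, weight in enumerate(weights):
        if slot < weight:
            return index
        slot -= weight
    raise AssertionError("unreachable: slot is below the total weight")


def distribute(num_packets, weights, seed=0):
    """Count how many packets each nexthop receives."""
    rng = random.Random(seed)
    counts = [0] * len(weights)
    for _ in range(num_packets):
        flow_label = rng.getrandbits(20)  # IPv6 flow labels are 20 bits wide
        counts[pick_nexthop(flow_label, weights)] += 1
    return counts


def matches_weights(counts, weights, tolerance=0.10):
    """True if every observed share is within tolerance of its weight share."""
    total_packets, total_weight = sum(counts), sum(weights)
    return all(
        abs(c / total_packets - w / total_weight) <= tolerance * (w / total_weight)
        for c, w in zip(counts, weights)
    )
```

For example, `matches_weights(distribute(16 * 1024, [2, 1]), [2, 1])` checks a 2:1 split over 16K packets; a large packet count keeps the random deviation well inside the tolerance.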
Ido Schimmel [Wed, 28 Feb 2018 10:25:11 +0000 (12:25 +0200)]
selftests: forwarding: Test IPv4 weighted nexthops
Use different weights for the multipath route configured on the first
router and check that the different flows generated by the first host
are distributed according to the provided weights.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 28 Feb 2018 10:25:10 +0000 (12:25 +0200)]
selftests: forwarding: Create test topology for multipath routing
Create a topology with two hosts, each directly connected to a different
router. Both routers are connected using two links, enabling multipath
routing.
Test IPv4 and IPv6 ping using default MTU and large MTU.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 28 Feb 2018 10:25:09 +0000 (12:25 +0200)]
selftests: forwarding: Add a test for basic IPv4 and IPv6 routing
Configure two hosts which are directly connected to the same router and
test IPv4 and IPv6 ping. Use a large MTU and check that ping is
unaffected.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 28 Feb 2018 10:25:08 +0000 (12:25 +0200)]
selftests: forwarding: Add a test for flooded traffic
Add test cases for unknown unicast and unregistered multicast flooding.
For each traffic type, turn off flooding on one bridged port and inject
a packet of the specified type through the second bridged port. Make
sure the packet was not received by checking the ACL counters on the
other end. Later, turn on flooding and make sure the packet was
received.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
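The test procedure described above can be modelled in a few lines of Python. This is a toy model under stated assumptions: the attribute names `flood` and `mcast_flood` and the `rx_packets` counter (standing in for the ACL counter on the far end) are chosen for illustration, and every destination is treated as unknown to the bridge.

```python
def is_multicast(mac):
    # The least significant bit of the first octet marks a group address.
    return bool(int(mac.split(":")[0], 16) & 1)


class BridgePort:
    def __init__(self, name):
        self.name = name
        self.flood = True        # flood unknown unicast to this port
        self.mcast_flood = True  # flood unregistered multicast to this port
        self.rx_packets = 0      # stands in for the ACL counter


def inject(ports, ingress, dst_mac):
    """Flood a frame with an unknown destination to every eligible port."""
    for port in ports:
        if port is ingress:
            continue
        allowed = port.mcast_flood if is_multicast(dst_mac) else port.flood
        if allowed:
            port.rx_packets += 1


def flood_test(dst_mac, flag):
    """Turn the flag off, expect no packet; turn it on, expect one."""
    p1, p2 = BridgePort("swp1"), BridgePort("swp2")
    setattr(p2, flag, False)
    inject([p1, p2], p1, dst_mac)
    blocked = p2.rx_packets == 0
    setattr(p2, flag, True)
    inject([p1, p2], p1, dst_mac)
    delivered = p2.rx_packets == 1
    return blocked and delivered
```

Each traffic type is checked against its own flag, so disabling unknown-unicast flooding alone does not stop multicast in this model.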
Ido Schimmel [Wed, 28 Feb 2018 10:25:07 +0000 (12:25 +0200)]
selftests: forwarding: Add a test for FDB learning
Send a packet with a specific source MAC, make sure it was learned
on the ingress port and then aged-out.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
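What the test expects from the bridge can be sketched as a tiny forwarding database in Python: the source MAC of a received frame is recorded against the ingress port, and the entry disappears once it has not been refreshed for the ageing time. The `Fdb` class, its method names and the explicit `now` argument are assumptions made for the sake of a deterministic example.

```python
class Fdb:
    def __init__(self, ageing_time):
        self.ageing_time = ageing_time
        self.entries = {}  # MAC -> (port, time the MAC was last seen)

    def learn(self, src_mac, port, now):
        """Record (or refresh) the port on which src_mac was seen."""
        self.entries[src_mac] = (port, now)

    def lookup(self, mac, now):
        """Return the learned port, or None if unknown or aged out."""
        entry = self.entries.get(mac)
        if entry is None:
            return None
        port, last_seen = entry
        if now - last_seen >= self.ageing_time:
            del self.entries[mac]  # the entry has aged out
            return None
        return port
```

Passing the clock in as `now` avoids sleeping in the example; a real test has to wait for the bridge's configured ageing time to elapse.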
Ido Schimmel [Wed, 28 Feb 2018 10:25:06 +0000 (12:25 +0200)]
selftests: forwarding: Add initial testing framework
Add initial framework to test packet forwarding functionality. The tests
can run on actual devices using loop-backed cables or using veth pairs.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 28 Feb 2018 09:59:27 +0000 (10:59 +0100)]
ipvlan: use per device spinlock to protect addrs list updates
This changeset moves ipvlan address under RCU protection, using
a per ipvlan device spinlock to protect list mutation and RCU
read access to protect list traversal.
Also explicitly use RCU read lock to traverse the per port
ipvlans list, so that we can now perform a full address lookup
without asserting the RTNL lock.
Overall this allows the ipvlan driver to check fully for duplicate
addresses - before this commit ipv6 addresses assigned by autoconf
via prefix delegation were accepted without any check - and avoid
the following rtnl assertion failure still in the same code path:
RTNL: assertion failed at drivers/net/ipvlan/ipvlan_core.c (124)
WARNING: CPU: 15 PID: 0 at drivers/net/ipvlan/ipvlan_core.c:124 ipvlan_addr_busy+0x97/0xa0 [ipvlan]
Modules linked in: ipvlan(E) ixgbe
CPU: 15 PID: 0 Comm: swapper/15 Tainted: G E 4.16.0-rc2.ipvlan+ #1782
Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
RIP: 0010:ipvlan_addr_busy+0x97/0xa0 [ipvlan]
RSP: 0018:ffff881ff9e03768 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff881fdf2a9000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 00000000000000f6 RDI: 0000000000000300
RBP: ffff881fdf2a8000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: ffff881ff9e034c0 R12: ffff881fe07bcc00
R13: 0000000000000001 R14: ffffffffa02002b0 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff881ff9e00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc5c1a4f248 CR3: 000000207e012005 CR4: 00000000001606e0
Call Trace:
<IRQ>
ipvlan_addr6_event+0x6c/0xd0 [ipvlan]
notifier_call_chain+0x49/0x90
atomic_notifier_call_chain+0x6a/0x100
ipv6_add_addr+0x5f9/0x720
addrconf_prefix_rcv_add_addr+0x244/0x3c0
addrconf_prefix_rcv+0x2f3/0x790
ndisc_router_discovery+0x633/0xb70
ndisc_rcv+0x155/0x180
icmpv6_rcv+0x4ac/0x5f0
ip6_input_finish+0x138/0x6a0
ip6_input+0x41/0x1f0
ipv6_rcv+0x4db/0x8d0
__netif_receive_skb_core+0x3d5/0xe40
netif_receive_skb_internal+0x89/0x370
napi_gro_receive+0x14f/0x1e0
ixgbe_clean_rx_irq+0x4ce/0x1020 [ixgbe]
ixgbe_poll+0x31a/0x7a0 [ixgbe]
net_rx_action+0x296/0x4f0
__do_softirq+0xcf/0x4f5
irq_exit+0xf5/0x110
do_IRQ+0x62/0x110
common_interrupt+0x91/0x91
</IRQ>
v1 -> v2: drop unneeded in_softirq check in ipvlan_addr6_validator_event()
Fixes: e9997c2938b2 ("ipvlan: fix check for IP addresses in control path")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
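The locking split described in the commit above (a per-device lock for list mutation, lock-free traversal for readers) can be approximated in Python with a copy-on-write snapshot. This is only an analogy, not RCU: there are no grace periods, and `AddrList`, `snapshot()` and `addr_busy()` are hypothetical names. Writers serialize on a per-device lock and publish a new immutable tuple; readers just pick up the current tuple.

```python
import threading


class AddrList:
    def __init__(self):
        self._lock = threading.Lock()  # plays the role of the per-device spinlock
        self._addrs = ()               # immutable snapshot seen by readers

    def add(self, addr):
        """Add addr unless this device already has it; writers take the lock."""
        with self._lock:
            if addr in self._addrs:
                return False
            self._addrs = self._addrs + (addr,)  # publish a new snapshot
            return True

    def remove(self, addr):
        with self._lock:
            self._addrs = tuple(a for a in self._addrs if a != addr)

    def snapshot(self):
        # Readers take no lock: they see either the old or the new tuple.
        return self._addrs


def addr_busy(devices, addr):
    """Full duplicate check across all devices without a global lock."""
    return any(addr in dev.snapshot() for dev in devices)
```

Since readers never block, a check like `addr_busy()` can run from a context that cannot take a heavyweight lock, which is the property the commit relies on to drop the RTNL assertion.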
Paolo Abeni [Wed, 28 Feb 2018 10:43:27 +0000 (11:43 +0100)]
ipvlan: egress mcast packets are not exceptional
Currently, if IPv6 is enabled on top of an ipvlan device in l3
mode, the following warning message:
Dropped {multi|broad}cast of type= [86dd]
is emitted every time that an RS is generated and dmesg is soon
filled with irrelevant messages. Replace pr_warn with pr_debug,
to preserve debuggability, without scaring the sysadmin.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 28 Feb 2018 17:06:02 +0000 (12:06 -0500)]
Merge branch 'mlxsw-mq-red-offload'
Jiri Pirko says:
====================
mlxsw: Offload multi-queue RED support
Nogah says:
Support a two level hierarchy of offloaded qdiscs in mlxsw, with sch_prio
being the root qdisc and sch_red as the children.
                +----------+
                | sch_prio |
                +----+-----+
                     |
                     |
    +----------------------------------+
    |                |                 |
    |                |                 |
    |                |                 |
+---v---+       +----v---+       +-----v--+
|sch_red|       |sch_red |       |sch_red |
+-------+       +--------+       +--------+
When setting sch_prio as the root qdisc on a physical port, mlxsw will
offload it. When adding it with sch_red as a child qdisc, it will offload
it as well.
Relocating child qdiscs or connecting them to more than one child will
result in unoffloading them. Relocating a child qdisc more than once is
strongly discouraged and might cause a mismatch between the kernel
configuration and the offloaded one. The offloaded configuration will be
aligned with the one shown in the show command.
Changing the priomap parameter of sch_prio might cause a band whose
configuration was changed, and which has an offloaded sch_red set on it, to
lose some stats data, as if sch_red had been unoffloaded and offloaded again. However,
it won't affect the data on this band that will have sch_red continuously.
Patch 1 adds support for setting RED as the child of root qdisc.
Patches 2-4 add support for RED bstats for offloaded child qdiscs.
Patches 5-6 handle backlog related changes for offloaded child qdiscs.
Patches 7-8 update PRIO in mlxsw to be able to have RED as child on its
bands.
Patch 9 adds offload handles for PRIO graft operations. In mlxsw it will
cause the driver to stop offloading the child in question.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Wed, 28 Feb 2018 09:45:07 +0000 (10:45 +0100)]
mlxsw: spectrum: qdiscs: prio: Handle graft command
Handle graft command for an offloaded sch_prio.
Grafting a qdisc to any place other than under its original parent is not
supported by mlxsw and will cause the grafted qdisc to stop being
offloaded.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Wed, 28 Feb 2018 09:45:06 +0000 (10:45 +0100)]
net: sch: prio: Add offload ability for grafting a child
Offload sch_prio graft command for capable drivers.
Warn in case of a failure, unless the graft was done as part of a destroy
operation (the new qdisc is a noop) or if all the qdiscs (the parent, the
old child, and the new one) are not offloaded.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
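The warning condition in the commit above reduces to a small predicate, sketched here in Python. The function and parameter names are hypothetical; the logic follows the commit text: warn on a failed offload unless the new qdisc is a noop (graft done as part of a destroy) or none of the three qdiscs involved is offloaded.

```python
def should_warn_on_graft_failure(offload_failed, new_is_noop,
                                 parent_offloaded, old_offloaded,
                                 new_offloaded):
    """Decide whether a failed graft offload deserves a warning."""
    if not offload_failed:
        return False
    if new_is_noop:
        # Graft issued as part of a destroy operation: nothing to report.
        return False
    # Stay silent only when none of the involved qdiscs is offloaded.
    return parent_offloaded or old_offloaded or new_offloaded
```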
Nogah Frankel [Wed, 28 Feb 2018 09:45:05 +0000 (10:45 +0100)]
mlxsw: spectrum: qdiscs: prio: Delete child qdiscs when removing bands
When the number of bands of sch_prio is decreased, child qdiscs on the
deleted bands would get deleted as well.
This change and deletions are being done under sch_tree_lock of the
sch_prio qdisc. Part of the destruction of qdisc is unoffloading it, if
it is offloaded. Un-offloading can't be done inside this lock.
Move the offload command to be done before reducing the number of bands,
so unoffloading of the qdiscs that are about to be deleted could be done
outside of the lock.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Wed, 28 Feb 2018 09:45:04 +0000 (10:45 +0100)]
mlxsw: spectrum: Update sch_prio stats to include sch_red related drops
sch_prio as root qdisc should count all the drops its children have. Since
it is possible for it to have sch_red children, it needs to count RED early
drops.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Wed, 28 Feb 2018 09:45:03 +0000 (10:45 +0100)]
net: sch: Don't warn on missmatching qlen and backlog for offloaded qdiscs
Offloaded qdiscs are allowed to expose only parts of their statistics.
It means that if backlog is being exposed and qlen is not, it might trigger
a warning in qdisc_tree_reduce_backlog.
Do not warn in case the qdisc that was removed was an offloaded one.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Wed, 28 Feb 2018 09:45:02 +0000 (10:45 +0100)]
mlxsw: spectrum: qdiscs: Update backlog handling of a child qdiscs
When removing a child qdisc, its backlog is subtracted from the parent's
backlog. The driver backlog count should do the same.
When the parent changes its configuration, the child might need to clean
its stats. However, the backlog can't be cleaned with the rest of the
stats, because it reflects a momentary value that needs to be synced with
the core, not the history of the qdisc.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Wed, 28 Feb 2018 09:45:01 +0000 (10:45 +0100)]
mlxsw: spectrum: qdiscs: Collect stats for sch_red based on priomap
Priority counters count packets according to their packet priority.
Collect the stats for sch_red based on these counters, so the qdisc bstats
will be the sum of counters matching the priorities marked in the qdisc
priomap.
Changing the mapping of the priorities to bands while traffic is running
can result in losing the stats of the bands qdiscs from their last dump
call to this change, as if the qdisc was unoffloaded and re-offloaded. It
will not affect the traffic behaviour according to sch_red.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
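The aggregation described in the commit above can be illustrated with a short Python sketch: the qdisc's bstats are the sum of the per-priority counters whose priority bit is set in the qdisc's priority bitmap. The function name, the `(packets, bytes)` tuple layout and the bitmap representation are assumptions for this example.

```python
def qdisc_bstats(prio_counters, prio_bitmap):
    """Sum (packets, bytes) over the priorities selected by prio_bitmap.

    prio_counters[i] holds the (packets, bytes) pair for priority i;
    bit i of prio_bitmap is set when priority i maps to this qdisc.
    """
    packets = 0
    nbytes = 0
    for prio, (pkts, byts) in enumerate(prio_counters):
        if prio_bitmap & (1 << prio):
            packets += pkts
            nbytes += byts
    return packets, nbytes
```

With counters `[(10, 1000), (20, 2000), (30, 3000)]` and bitmap `0b101`, priorities 0 and 2 are summed and priority 1 is ignored. Changing the bitmap changes which counters contribute, which is why a priomap change can look like a loss of previously reported stats.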
Nogah Frankel [Wed, 28 Feb 2018 09:45:00 +0000 (10:45 +0100)]
mlxsw: spectrum: qdiscs: Add priority map per qdisc
Add priority map per qdisc, to indicate which priorities are being
directed through this qdisc.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Wed, 28 Feb 2018 09:44:59 +0000 (10:44 +0100)]
mlxsw: spectrum: Add priority counters
Add TX packets and bytes counters per switch priority per port.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Wed, 28 Feb 2018 09:44:58 +0000 (10:44 +0100)]
mlxsw: spectrum: qdiscs: Support qdisc per tclass
Add the option to set a qdisc per tclass. Match the qdisc to the tclass by
parent ID. Supported currently for sch_red only.
It allows offloading sch_prio as root qdisc and sch_red as its child.
(However, doing so might corrupt the stats for both parent and child.)
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maxime Chevallier [Wed, 28 Feb 2018 09:14:13 +0000 (10:14 +0100)]
net: mvpp2: Add hardware offloading for VLAN filtering
Marvell PPv2 controller allows for generic packet filtering. This commit
adds entries to implement VLAN filtering. The approach taken is:
- Filter entries that would match on the presence of the VLAN tag
(existing VLAN detection, DSA / EDSA detection) will set the next
lookup ID to be for the VID.
- For each VLAN existing on a given port, we add an entry that matches
this specific VID. If the incoming packet matches the VID entry, it is
set for the next lookup in the chain (LU_L2).
- A Guard entry is added for each port, that will match if the incoming
packet didn't match any of the above VID entries. This entry tags the
packet to be dropped.
Due to this design, and the fact that the total 256 filter entries are
also used for other purposes, we have a limit of 10 VLANs per port. To
accommodate the case where we would need more VLANs on one port, this
patch implements the ndo_set_features to allow for disabling of VLAN
filtering using ethtool.
The default config has VLAN filtering disabled.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
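The lookup chain described in the commit above can be modelled roughly in Python: a tagged packet is checked against the port's VID entries, a hit continues to the next lookup (LU_L2), and anything else falls through to the guard entry and is dropped. The `PortVlanFilter` class is hypothetical; the limit of 10 VLANs and the disabled-by-default state come from the commit text, while letting everything through when filtering is disabled is an assumption of this sketch.

```python
MAX_VLANS_PER_PORT = 10  # limit stated in the commit message


class PortVlanFilter:
    def __init__(self):
        self.enabled = False  # the default config has filtering disabled
        self.vids = set()

    def add_vid(self, vid):
        """Add a VID entry; fail once the per-port limit is reached."""
        if vid not in self.vids and len(self.vids) >= MAX_VLANS_PER_PORT:
            return False
        self.vids.add(vid)
        return True

    def classify(self, vid):
        """Return 'LU_L2' to continue the lookup chain, or 'DROP'.

        vid is None for untagged packets, which skip the VID lookup.
        """
        if not self.enabled or vid is None:
            return "LU_L2"
        if vid in self.vids:
            return "LU_L2"  # matched a per-VID entry
        return "DROP"       # fell through to the per-port guard entry
```

When more than ten VLANs are needed on a port, the commit's answer is to switch filtering off through ethtool, which corresponds to `enabled = False` here.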
Heiner Kallweit [Wed, 28 Feb 2018 06:55:20 +0000 (07:55 +0100)]
r8169: convert remaining feature flag and remove enum features
Now that only one feature flag is left we can convert it and remove
enum features.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 28 Feb 2018 17:00:28 +0000 (12:00 -0500)]
Merge branch 'macmace-cleanups'
Finn Thain says:
====================
Fixes, cleanup and modernization for macmace driver
Changes since v4 of combined patch series:
- Removed redundant and non-portable MACH_IS_MAC tests.
- Omitted patches unrelated to macmace driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Finn Thain [Wed, 28 Feb 2018 04:22:33 +0000 (23:22 -0500)]
net/macmace: Drop redundant MACH_IS_MAC test
The MACH_IS_MAC test is redundant here because the platform device
won't get registered unless MACH_IS_MAC.
Adopt module_platform_driver() convention.
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Finn Thain [Wed, 28 Feb 2018 04:22:33 +0000 (23:22 -0500)]
net/macmace: Fix and clean up log messages
Don't log the unexpanded "eth%d" format string.
Log the chip revision in the probe message (consistent with mace.c).
Drop redundant debug messages for FIFO events recorded in the
interface statistics (also consistent with mace.c).
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Tue, 27 Feb 2018 23:48:21 +0000 (15:48 -0800)]
inet: whitespace cleanup
Ran simple script to find/remove trailing whitespace and blank lines
at EOF because that kind of stuff git whines about and editors leave
behind.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hernán Gonzalez [Tue, 27 Feb 2018 22:29:23 +0000 (19:29 -0300)]
emulex/benet: Constify *be_misconfig_evt_port_state[]
Note: This is compile only tested as I have no access to the hw.
No benefit gained except for some self-documenting.
add/remove: 0/0 grow/shrink: 0/0 up/down: 0/0 (0)
Function old new delta
Total: Before=2757703, After=2757703, chg +0.00%
Signed-off-by: Hernán Gonzalez <hernan@vanguardiasur.com.ar>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hernán Gonzalez [Tue, 27 Feb 2018 22:31:34 +0000 (19:31 -0300)]
qlogic/qed: Constify *pkt_type_str[]
Note: This is compile only tested as I have no access to the hw.
Constifying and declaring as static saves 24 bytes.
add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-24 (-24)
Function old new delta
pkt_type_str 24 - -24
Total: Before=3599256, After=3599232, chg -0.00%
Signed-off-by: Hernán Gonzalez <hernan@vanguardiasur.com.ar>
Acked-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 28 Feb 2018 16:07:12 +0000 (11:07 -0500)]
Merge branch 'SFP-updates'
Russell King says:
====================
SFP updates
Included in this series are a further few updates for SFP support:
- Adding support for Fiberstore's non-standard BiDi modules operating
at 1310nm/1550nm wavelengths rather than the 1000BASE-BX standard of
1310nm/1490nm.
- Adding support for negotiating the PHY interface mode with the MAC,
so that modules supporting faster speeds and Gigabit ethernet work
with Gigabit-only MACs.
- Adding support for high power (>1W) SFP modules.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Nettleton [Tue, 27 Feb 2018 15:53:12 +0000 (15:53 +0000)]
sfp: add high power module support
This patch is the result of work by both Jon Nettleton and Russell King.
Jon wrote the original patch, adding support for SFP modules which
require a power level greater than '1'.
Russell's changes:
- Fix the power levels for big-endian, and make the code flow better.
- Convert to use device_property_read_u8()
- Warn for power levels exceeding host level
SFF-8431 says:
"To avoid exceeding system power supply limits and cooling capacity,
all modules at power up by default shall operate with up to 1.0 W.
Hosts supporting Power Level II or III operation may enable a Power
Level II or III module through the 2-wire interface. Power Level II
or III modules shall assert the power level declaration bit of
SFF-8472."
Print a warning for modules that exceed the host power level, and
leave them operating in power level 1.
- Fix i2c write
The first byte of any write after the bus address is always the
device address. In order to write a value to device D, address I,
value V, we need to generate on the bus:
S DDDDDDDD A IIIIIIII A VVVVVVVV A P
where S = start, R = restart, A = ack, P = stop. Splitting this
as two:
S DDDDDDDD A IIIIIIII A R
DDDDDDDD A VVVVVVVV A P
results in the device's address register being written first by I
and then by V - the addressed register within the device is not
written.
- Avoid power mode switching if 0xa2 is not implemented
Some modules indicate that they support power level II or power level
III, but do not implement address 0xa2, meaning that the bit to set
them to high power mode is not accessible.
These modules appear to have the sff8472_compliance field set to zero,
and also do not implement diagnostics. Detect this, but also ensure
that the module does not require the address switching mode, which we
do not implement.
- Use mW for power level rather than power level number.
- Fix high power mode transition
We must not switch to SFP_MOD_PRESENT state until we have finished
initialising, because the remaining state machines check for that
state. Add SFP_MOD_HPOWER as an intermediate state.
- Use definition for I2C register address rather than constant.
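The "Fix i2c write" item in the list above can be illustrated with a toy device model. This is a sketch under stated assumptions: ToyI2cDevice, write_single and write_split are hypothetical names, and the model only captures the rule from the commit message that the first byte after the bus address always lands in the device's address register.

```python
class ToyI2cDevice:
    """Toy register device: the first data byte of each message sets the
    internal address register, later bytes are stored at that address."""

    def __init__(self):
        self.addr_reg = 0
        self.regs = {}

    def write_message(self, data):
        # One message = START (or repeated START), bus address, data bytes.
        if not data:
            return
        self.addr_reg = data[0]
        for value in data[1:]:
            self.regs[self.addr_reg] = value
            self.addr_reg += 1


def write_single(dev, reg, value):
    # S D A I A V A P: one message carrying both register and value.
    dev.write_message([reg, value])


def write_split(dev, reg, value):
    # S D A I A R D A V A P: two messages, V overwrites the address register.
    dev.write_message([reg])
    dev.write_message([value])
```

In this model write_single() stores the value in the addressed register, while write_split() leaves the register map untouched and only changes the address register twice.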
Signed-off-by: Jon Nettleton <jon@solid-run.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 27 Feb 2018 15:53:07 +0000 (15:53 +0000)]
dt-bindings: add maximum power level to SFP binding
Add the new maximum power level property to the SFP binding.
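How a host-side maximum might gate the module power level, as described in "sfp: add high power module support", can be sketched as below. This is not the driver code: select_power_mw, LEVEL_MW and has_a2_page are hypothetical names, and the 1500 mW and 2000 mW caps for Power Levels II and III are assumptions (only the 1.0 W default is quoted in that commit message).

```python
# Assumed power caps in mW for Power Levels I, II and III.
LEVEL_MW = {1: 1000, 2: 1500, 3: 2000}


def select_power_mw(module_level, host_max_mw, has_a2_page=True):
    """Return (power_mw, warning) for a module declaring module_level."""
    wanted = LEVEL_MW[module_level]
    if wanted <= LEVEL_MW[1]:
        return wanted, None
    if wanted > host_max_mw:
        # Warn and leave the module operating in power level 1.
        return LEVEL_MW[1], "module exceeds host power level"
    if not has_a2_page:
        # The high power enable bit cannot be reached without 0xa2.
        return LEVEL_MW[1], None
    return wanted, None
```

Working in mW rather than level numbers keeps the comparison against the host limit a plain integer test.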
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 27 Feb 2018 15:53:02 +0000 (15:53 +0000)]
phylink,sfp: negotiate interface format with MAC
Negotiate the interface format with the MAC rather than requiring it to
be a fixed type specified solely by the SFP module. This allows modules
that can work with several different interface signalling formats to
select a format compatible with the MAC - for example, a Fiber module
supporting Gigabit ethernet and faster connected to a Gigabit only MAC
needs to select the 1000BASE-X mode.
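The negotiation idea can be sketched as a simple intersection. This is an illustration only and does not mirror the phylink API: negotiate_interface and the interface name strings are hypothetical, and the ordering of module_interfaces by preference is an assumption.

```python
def negotiate_interface(module_interfaces, mac_interfaces):
    """Pick the first module-preferred interface the MAC also supports.

    module_interfaces: list ordered from most to least preferred.
    mac_interfaces: set of interfaces the MAC can drive.
    Returns None when there is no common interface.
    """
    for iface in module_interfaces:
        if iface in mac_interfaces:
            return iface
    return None
```

A module offering a faster mode plus 1000BASE-X, connected to a Gigabit-only MAC, ends up on 1000BASE-X because that is the only entry both sides share.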
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 27 Feb 2018 15:52:57 +0000 (15:52 +0000)]
sfp: support 1G BiDi (eg, FiberStore SFP-GE-BX) modules
Some BiDi modules (eg, FiberStore SFP-GE-BX) are not compliant with
1000BASE-BX as they use different wavelengths from the 1000BASE-BX
standard (eg, 1310nm/1550nm rather than 1310nm/1490nm). These modules
support 1000BASE-X ethernet, so detect them by a failure to find any
other support, the 8B10B encoding and a bit rate that falls within the
1Gbps window.
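The detection rule reads as a three-part test, sketched below. The function name, the string used for the encoding and the 1200..1300 MBd bounds of the "1Gbps window" are assumptions made for illustration; the commit message does not give the exact limits.

```python
def looks_like_1000base_x(other_modes, encoding, br_min_mbd, br_max_mbd,
                          window=(1200, 1300)):
    """Fallback detection for non-compliant BiDi modules.

    other_modes: link modes already detected by the regular checks.
    br_min_mbd / br_max_mbd: bit rate range reported by the module.
    window: assumed bounds of the 1Gbps window, in MBd.
    """
    low, high = window
    return (not other_modes
            and encoding == "8B10B"
            and br_min_mbd <= high
            and br_max_mbd >= low)
```

The fallback only fires when nothing else matched, so modules that were already identified keep the modes found by the regular checks.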
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 27 Feb 2018 15:38:08 +0000 (17:38 +0200)]
team: Use extack to report enslavement failures
Use extack inside team's enslavement function and also propagate it to
the netdevice notifier to allow enslaved ports to report the failure
reason. Example:
$ teamd -t team0 -d -c '{"runner": {"name": "lacp"}}'
$ ip link set dev lo master team0
Error: Loopback device can't be added as a team port.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Thu, 18 Jan 2018 17:18:57 +0000 (09:18 -0800)]
fm10k: bump version number
We're aligned with latest version released on SourceForge, so update the
version number to match.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 16 Jan 2018 19:20:52 +0000 (11:20 -0800)]
fm10k: fix incorrect warning for function prototype
Recent kernels now complain about incorrect function prototype comments,
in order to ensure comments are accurate to the function. However, it
incorrectly associates the comment above the fm10k_pci_tbl[] as
a function header comment. Fix this by removing the extra "*" in the
comment opener, which normally marks the comment as a doxygen style
function header comment.
Once removed, the logic no longer kicks in and the following warning is
fixed:
warning: cannot understand function prototype: 'const struct pci_device_id fm10k_pci_tbl[] = '
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 16 Jan 2018 19:20:51 +0000 (11:20 -0800)]
fm10k: fix function doxygen comments
Several function header comments had incorrect function parameter
definitions. Recent versions of the upstream kernel have started to warn
about these issues. Fix up the comments which do not match in order to
resolve these new warnings.
While fixing these, update the copyright year also.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Wed, 28 Feb 2018 14:54:54 +0000 (09:54 -0500)]
Merge tag 'mlx5-updates-2018-02-23' of git://git./linux/kernel/git/mellanox/linux
Saeed Mahameed says:
mlx5-update-2018-02-23 (IB representors)
From: Mark Bloch <markb@mellanox.com>
=========
Add IB representor when in switchdev mode
The following series adds support for an IB (RAW Ethernet only) device
representor which is created when the user switches to switchdev mode.
Today when switching to switchdev mode the only representors which are
created are net devices. Each netdev is a representor of a virtual
function and any data sent via the representor is received on the virtual
function, and any data sent via the virtual function is received by the
representor.
For the mlx5 driver the main use of this functionality is to be able to
use Open vSwitch on the hypervisor in order to manage/control traffic
from/to the virtual functions. Open vSwitch can also work with DPDK
devices, not just net devices. This series exposes an IB device, which the
Mellanox PMD driver uses and which can then be used by Open vSwitch DPDK.
An IB device representor exposes only RAW Ethernet QP capabilities and
the ability to create flow rules to direct traffic to its RX queues. The
state of the IB device (ACTIVE/DOWN etc.) is based on the state of the
corresponding net device representor. No other RDMA/RoCE functionality is
currently supported and no GID table is exposed.
=========
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 27 Feb 2018 19:53:27 +0000 (14:53 -0500)]
Merge branch 'mlx4-misc'
Tariq Toukan says:
====================
mlx4_en misc for 4.17
This patchset contains misc enhancements from the team
to the mlx4 Eth driver.
Patch 1 by Eran adds physical layer counters.
Patch 2 by Eran cleans-up a redundant warn print.
Patch 3 combines the checks of two end cases into a single if statement.
Patch 4 takes common code structures out of the #ifdef, following your
comment on a previous patch.
Series generated against net-next commit:
f74290fdb363 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Tue, 27 Feb 2018 14:17:22 +0000 (16:17 +0200)]
net/mlx4_en: RX csum, pre-define enabled protocols for IP status masking
Pre-define a mask for IP status of a completion, that tests the
MLX4_CQE_STATUS_IPV6 only in case CONFIG_IPV6 is enabled.
Use it for IP status testing upon completion, instead of separating
the datapath into two flows.
This takes common code structures (such as closing parenthesis)
back to their original place, and makes code more readable.
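The pre-defined mask idea can be sketched as follows. The bit values and the names STATUS_IPV4, STATUS_IPV6, build_ip_status_mask and is_ip_packet are hypothetical; only the notion of a mask whose IPv6 bit depends on the configuration comes from the commit message.

```python
# Hypothetical bit values for the completion status word.
STATUS_IPV4 = 1 << 0
STATUS_IPV6 = 1 << 1


def build_ip_status_mask(ipv6_enabled):
    """Build the mask once, the way a compile-time define would."""
    mask = STATUS_IPV4
    if ipv6_enabled:
        mask |= STATUS_IPV6
    return mask


def is_ip_packet(cqe_status, mask):
    # Single test on the datapath, independent of the configuration.
    return bool(cqe_status & mask)
```

The datapath then carries one test instead of two configuration-dependent flows.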
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Tue, 27 Feb 2018 14:17:21 +0000 (16:17 +0200)]
net/mlx4_en: Combine checks of end-cases in RX completion function
Combine two end-cases in the same if statement with a single return value.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eran Ben Elisha [Tue, 27 Feb 2018 14:17:20 +0000 (16:17 +0200)]
net/mlx4_en: Remove unnecessary warn print in reset config
In mlx4_en_reset_config, there was a redundant warn print that was left
from previous versions of this function. No warn is needed anymore.
This warn can be confusing when RX-FCS is changed:
Turn OFF RX-FCS:
mlx4_en: eth1: Changing device configuration rx filter(0) rx vlan(1)
Turn ON RX-FCS:
mlx4_en: eth1: Changing device configuration rx filter(0) rx vlan(1)
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eran Ben Elisha [Tue, 27 Feb 2018 14:17:19 +0000 (16:17 +0200)]
net/mlx4_en: Add physical RX/TX bytes/packets counters
Add physical RX/TX packets/bytes counters into ethtool output to monitor
all traffic that was received and transmitted on the port. These
counters are available only for non-Virtual Function devices.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 27 Feb 2018 19:46:28 +0000 (14:46 -0500)]
Merge branch 'mlxsw-Offloading-encapsulated-SPAN'
Jiri Pirko says:
====================
mlxsw: Offloading encapsulated SPAN
Petr says:
This patch series introduces support for mirroring with GRE
encapsulation. It offloads tc action mirred mirror from a mlxsw port to
either a gretap or an ip6gretap netdevice.
Spectrum hardware needs to know all the details of the requested
encapsulation: source and destination MAC and IP addresses, details of
VLAN tagging, etc. The only variables are the encapsulated packet
itself, and TOS field, which may be inherited. To that end, mlxsw driver
resolves the route that encapsulated packets would take, queries the
corresponding neighbor, and with that configuration in hand, configures
the mirroring in the hardware.
The driver also hooks into event handlers for netdevice changes, FIB and
neighbor events, and reconsiders the configuration on each such change.
When the new configuration differs from the currently-offloaded one, the
existing offload is removed and replaced with a new one.
It is possible to mirror to {ip6,}gretap from a matchall rule as well as
from a flower match.
** Note that with this patch set, mlxsw build depends on NET_IPGRE and
IPV6_GRE.
Current limitations:
- There has to be a route that directs packets to an mlxsw port. We
intend to extend the logic to support other netdevice types in the
future, but the eventual egress netdevice will have to be an mlxsw
port in any case.
- Offload reconfiguration due to changes in netdevice configuration
creates a window of time where packets are not mirrored. Under some
circumstances this can be prevented by configuring an unused port
analyzer and migrating mirrors over to that. However that's currently
not implemented.
- Remote address of a tunnel device needs to be set, there may not be a
GRE key, checksumming or sequence numbers, and TTL needs to be fixed
(non-inherit). These are hard requirements imposed by the underlying
hardware.
- TOS of a tunnel device needs to be "inherit". The hardware supports a
fixed TOS, but that's currently not implemented.
The series starts with two patches, #1 and #2, that publish one function
and add support for querying IPv6 tunnel parameters.
In patches #3 and #4, we introduce helpers to GRE and tunneling code
that we will use later in the patchset from the SPAN code.
Patches #5 and #6 introduce support for encapsulated SPAN in reg.h.
The following seven patches, #7-#13, then prepare the SPAN codebase for
introduction of mirroring to netdevices that don't correspond to front
panel ports.
Then #14 and #15 pull all this together to implement mirroring to
{ip6,}gretap netdevices.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 27 Feb 2018 13:53:49 +0000 (14:53 +0100)]
mlxsw: spectrum_span: Support mirror to ip6gretap
Similarly to mirror-to-gretap, this enables mirroring to IPv6 gretap
netdevice.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 27 Feb 2018 13:53:48 +0000 (14:53 +0100)]
mlxsw: spectrum_span: Support mirror to gretap
When a user requests mirror from a mlxsw physical port (possibly based
on an ACL match) to a gretap netdevice, the driver needs to resolve the
request to a particular physical port that the mirrored packets will
egress through, and a suite of configuration keys (importantly, IP and
MAC addresses). That means calling into routing and neighbor kernel code
to simulate the decisions made by the system for packets passing through
a gretap netdevice.
Add a new instance of mlxsw_sp_span_entry_ops to support this.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 27 Feb 2018 13:53:47 +0000 (14:53 +0100)]
mlxsw: Move a mirroring check to mlxsw_sp_span_entry_create
The check for whether a mirror port (which is a mlxsw front panel port)
belongs to the same mlxsw instance as the mirrored port, is currently
only done in spectrum_acl, even though it's applicable for the matchall
case as well. Thus move it to mlxsw_sp_span_entry_create().
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 27 Feb 2018 13:53:46 +0000 (14:53 +0100)]
mlxsw: Handle config changes pertinent to SPAN
Some netdevices for which mlxsw offloads mirroring may have a
complex relationship between the declared intent and low-level
device configuration.
Trying to accurately track which changes might influence offloading
decisions is finicky and error-prone. Instead, this patch introduces a
function mlxsw_sp_span_entry_respin, which re-queries the configuration
anew and, if different, removes the existing offloads and installs new
ones.
Call this function strategically at event handlers that might influence
the mirroring configuration.
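The respin approach amounts to "re-query, compare, replace". A minimal sketch, assuming a hypothetical SpanEntry class and a query_parms callback standing in for the route and neighbor lookups; it is not the mlxsw implementation:

```python
class SpanEntry:
    """Toy SPAN entry holding the currently offloaded parameters."""

    def __init__(self):
        self.parms = None    # parameters currently programmed
        self.reconfigs = 0   # how many times the offload was replaced


def respin(entry, query_parms):
    """Re-query the configuration and replace the offload if it changed.

    Returns True when the offload was replaced, False when nothing changed.
    """
    new_parms = query_parms()
    if new_parms == entry.parms:
        return False
    # Remove the existing offload and install the new one.
    entry.parms = new_parms
    entry.reconfigs += 1
    return True
```

Because the comparison happens on the resolved parameters, event handlers can call respin() liberally without tracking which change actually mattered.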
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 27 Feb 2018 13:53:45 +0000 (14:53 +0100)]
mlxsw: spectrum_span: Generalize SPAN support
To support mirroring to different device types, the functions that
partake in configuring the port analyzer need to be extended to admit
non-trivial SPAN types.
Create a structure where all details of SPAN configuration are kept,
struct mlxsw_sp_span_parms. Also create struct mlxsw_sp_span_entry_ops
to keep per-SPAN-type operations.
Instantiate the latter once for MLXSW_REG_MPAT_SPAN_TYPE_LOCAL_ETH, and
once for a suite of NOP callbacks used for invalidated SPAN entry. Put
the former as a sole member of a new array mlxsw_sp_span_entry_types,
where all known SPAN types are kept. Introduce a new function,
mlxsw_sp_span_entry_ops(), to look up the right ops suite given a
netdevice.
Change mlxsw_sp_span_mirror_add() to use both parms and ops structures.
Change mlxsw_sp_span_entry_get() and mlxsw_sp_span_entry_create() to
take these as arguments. Modify mlxsw_sp_span_entry_configure() and
mlxsw_sp_span_entry_deconfigure() to dispatch to ops.
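The ops lookup described above can be pictured as a small table search. This is a sketch only: SpanEntryOps, its can_handle field, the dict-based netdevice and the "kind" key are hypothetical stand-ins chosen for illustration, not the structures from the patch.

```python
from collections import namedtuple

# Each SPAN type contributes one ops record with a predicate.
SpanEntryOps = namedtuple("SpanEntryOps", ["name", "can_handle"])

# Table of known SPAN types; later patches would append more entries.
SPAN_ENTRY_TYPES = [
    SpanEntryOps("local_eth", lambda dev: dev.get("kind") == "mlxsw_port"),
]


def span_entry_ops(dev):
    """Return the first ops record that can handle dev, or None."""
    for ops in SPAN_ENTRY_TYPES:
        if ops.can_handle(dev):
            return ops
    return None
```

Adding a new SPAN type then means appending one record to the table rather than touching every caller.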
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 27 Feb 2018 13:53:44 +0000 (14:53 +0100)]
mlxsw: spectrum: Keep mirror netdev in mlxsw_sp_span_entry
Currently the only mirror action supported by mlxsw is mirror to another
mlxsw physical port. Correspondingly, span_entry, which tracks each
mlxsw mirror in the system, currently holds a u8 number of the
destination port.
To extend this system to mirror to gretap and ip6gretap netdevices, have
struct mlxsw_sp_span_entry actually hold the destination netdevice
itself.
This change then trickles down in obvious manner to SPAN module API and
mirror-related interfaces in struct mlxsw_afa_ops.
To prevent use of invalid pointer, NETDEV_UNREGISTER needs to be hooked
and the corresponding SPAN entry invalidated.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 27 Feb 2018 13:53:43 +0000 (14:53 +0100)]
mlxsw: spectrum_span: Extract mlxsw_sp_span_entry_{de, }configure()
Configuring the hardware for encapsulated SPAN involves more code than
the simple mirroring case. Extract the related code to a separate
function to separate it from the rest of SPAN entry creation. Extract
deconfigure as well for symmetry, even though disablement is the same
regardless of SPAN type.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 27 Feb 2018 13:53:42 +0000 (14:53 +0100)]
mlxsw: spectrum_span: Initialize span_entry.id eagerly
It is known statically ahead of time which SPAN entry will have which
ID. Just initialize it eagerly in mlxsw_sp_span_init(), don't wait until
the entry is actually created. This simplifies some code in
mlxsw_sp_span_entry_create().
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 27 Feb 2018 13:53:41 +0000 (14:53 +0100)]
mlxsw: span: Remove span_entry by span_id
Instead of removing span_entry by the port number, allow removing by
SPAN id. That simplifies some code right here, and for mirroring to soft
netdevices, avoids problems with netdevice pointer invalidation and
reuse.
Rename mlxsw_sp_span_entry_find() to mlxsw_sp_span_entry_find_by_port()
and keep it--follow-up patches will make use of it.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 27 Feb 2018 13:53:40 +0000 (14:53 +0100)]
mlxsw: reg: Extend mlxsw_reg_mpat_pack()
To support encapsulated SPAN, extend mlxsw_reg_mpat_pack() with a field
to set the SPAN type.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 27 Feb 2018 13:53:39 +0000 (14:53 +0100)]
mlxsw: reg: Add SPAN encapsulation to MPAT register
MPAT Register is used to query and configure the Switch Port Analyzer
Table. To configure Port Analyzer to encapsulate mirrored packets,
additional fields need to be specified for the MPAT register.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 27 Feb 2018 13:53:38 +0000 (14:53 +0100)]
ip_tunnel: Rename & publish init_tunnel_flow
Initializing struct flowi4 is useful for drivers that need to emulate
routing decisions made by a tunnel interface. Publish the
function (appropriately renamed) so that the drivers in question don't
need to cut'n'paste it around.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 27 Feb 2018 13:53:37 +0000 (14:53 +0100)]
net: GRE: Add is_gretap_dev, is_ip6gretap_dev
Determining whether a device is a GRE device is easily done by
inspecting struct net_device.type. However, for the tap variants, the
type is just ARPHRD_ETHER.
Therefore introduce two predicate functions that use netdev_ops to identify
the tap devices.
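Why the type field is not enough can be shown with a toy model. NetDevice, GRE_TAP_OPS and ETH_OPS are hypothetical stand-ins; the point of the sketch is only that two devices with the same type can still be told apart by comparing the identity of their ops table.

```python
ARPHRD_ETHER = 1

# Stand-ins for two distinct netdev_ops tables.
GRE_TAP_OPS = object()
ETH_OPS = object()


class NetDevice:
    def __init__(self, dev_type, netdev_ops):
        self.type = dev_type
        self.netdev_ops = netdev_ops


def is_gretap_dev(dev):
    # Identity comparison against the gretap ops table.
    return dev.netdev_ops is GRE_TAP_OPS
```

A plain Ethernet device and a gretap device both report ARPHRD_ETHER here, yet only the latter satisfies the predicate.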
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 27 Feb 2018 13:53:36 +0000 (14:53 +0100)]
mlxsw: spectrum_ipip: Support decoding IPv6 tunnel addresses
To support mirroring to ip6gretap, the SPAN module needs to be able to
decode IPv6 addresses specified at that tunnel.
Extend mlxsw_sp_ipip_netdev_saddr() and mlxsw_sp_ipip_netdev_daddr() to
support IPv6 addresses. To that end, add and publish a support function
mlxsw_sp_ipip_netdev_parms6().
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 27 Feb 2018 13:53:35 +0000 (14:53 +0100)]
mlxsw: spectrum_ipip: Extract mlxsw_sp_l3addr_is_zero
Extract the logic for determining whether a given IPv4/IPv6 address is
all-zeroes from mlxsw_sp_ipip_tunnel_complete to a separate function.
Make that function public within the module.
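The extracted check is small enough to sketch directly. This version uses Python's ipaddress module and takes the address as a string, which is an assumption made for illustration; the driver works on its own address union instead.

```python
import ipaddress


def l3addr_is_zero(addr):
    """Return True when addr (IPv4 or IPv6 text form) is all-zeroes."""
    return int(ipaddress.ip_address(addr)) == 0
```

Both "0.0.0.0" and "::" count as zero, so one helper serves either protocol.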
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 27 Feb 2018 19:31:20 +0000 (14:31 -0500)]
Merge branch 'ibmvnic-Miscellaneous-driver-fixes-and-enhancements'
Thomas Falcon says:
====================
ibmvnic: Miscellaneous driver fixes and enhancements
There is not a general theme to this patch set other than that it
fixes a few issues with the ibmvnic driver. I will just give a quick
summary of what each patch does here.
"ibmvnic: Fix TX descriptor tracking again" resolves a race condition
introduced in an earlier fix to track outstanding transmit descriptors.
This condition can throw off the tracking counter to the point that
a transmit queue will halt forever.
"ibmvnic: Allocate statistics buffers during probe" allocates queue
statistics buffers on device probe to avoid a crash when accessing
statistics of an unopened interface.
"ibmvnic: Harden TX/RX pool cleaning" includes additional checks to
avoid a bad access when cleaning RX and TX buffer pools during a device
reset.
"ibmvnic: Report queue stops and restarts as debug output" changes TX
queue state notifications from informational to debug messages. This
information is not necessarily useful to a user and under load can result
in a lot of log output.
"ibmvnic: Do not attempt to login if RX or TX queues are not allocated"
checks that device queues have been allocated successfully before
attempting device login. This resolves a panic that could occur if a
user attempted to configure a device after a failed reset.
Thanks for your attention.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Tue, 27 Feb 2018 00:10:59 +0000 (18:10 -0600)]
ibmvnic: Do not attempt to login if RX or TX queues are not allocated
If a device reset fails for some reason, TX and RX queue resources
could be released. If a user attempts to open the device in this scenario,
it may result in a kernel panic as the driver tries to access this
memory. To fix this, include a check before device login that TX/RX
queues are still there before enabling the device. In addition, return a
value that can be checked in case of any errors to avoid waiting for a
completion that will never come.
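The guard can be sketched as below. The dict-based adapter, the key names and the choice of EINVAL as the error code are all assumptions; the sketch only shows the shape of the fix: check the queues, return an error, and let the caller bail out early.

```python
import errno


def login(adapter):
    """Refuse to log in when either queue set is missing."""
    if not adapter.get("tx_queues") or not adapter.get("rx_queues"):
        return -errno.EINVAL  # assumed error code
    adapter["logged_in"] = True
    return 0


def open_device(adapter):
    rc = login(adapter)
    if rc:
        # Bail out instead of waiting for a completion that never comes.
        return rc
    adapter["state"] = "open"
    return 0
```

Returning the error from login() is what lets open_device() skip the wait entirely.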
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Tue, 27 Feb 2018 00:10:58 +0000 (18:10 -0600)]
ibmvnic: Report queue stops and restarts as debug output
It's not necessary to report each time a queue is stopped and restarted
as an informational message. Change that to be a debug message so that
it can be observed if needed but not printed by default.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Tue, 27 Feb 2018 00:10:57 +0000 (18:10 -0600)]
ibmvnic: Harden TX/RX pool cleaning
If the driver releases resources after a failed reset or some other
error, the driver might attempt to clean up and free memory that
isn't there anymore. Include some additional checks that RX/TX queues
along with their associated structures are still there before cleaning.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Tue, 27 Feb 2018 00:10:56 +0000 (18:10 -0600)]
ibmvnic: Allocate statistics buffers during probe
Currently, buffers holding individual queue statistics are allocated
when the device is opened. If an ibmvnic interface is hotplugged or
initialized but never opened, an attempt to get statistics with
ethtool will result in a kernel panic.
Since the driver always allocates the same number of buffers (the maximum
number of supported queues), these can be allocated during device probe and
freed when the device is hot-unplugged or the module is removed.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Tue, 27 Feb 2018 00:10:55 +0000 (18:10 -0600)]
ibmvnic: Fix TX descriptor tracking again
Sorry, the previous change introduced a race condition between
transmit completion processing and tracking TX descriptors. If a
completion is received before the number of descriptors is logged,
the number of descriptors will be added but not removed. After enough
times, this could halt the transmit queue forever.
Log the number of descriptors used by a transmit before sending.
I stress tested the fix on two different systems running over the
weekend without any issues.
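The race can be reproduced in a single-threaded toy model by letting the completion fire from inside the send call. TxQueue and both xmit helpers are hypothetical names; the model illustrates the ordering problem and is not the driver's data structure.

```python
class TxQueue:
    def __init__(self):
        self.outstanding = 0  # descriptors believed to be in flight
        self.logged = {}      # per-buffer descriptor counts

    def complete(self, buf_id):
        # Completion handler: remove whatever was logged for this buffer.
        self.outstanding -= self.logged.pop(buf_id, 0)


def xmit_log_after_send(q, buf_id, num_descs, send):
    send(buf_id)                   # completion may fire in here
    q.logged[buf_id] = num_descs   # too late: completion saw nothing
    q.outstanding += num_descs


def xmit_log_before_send(q, buf_id, num_descs, send):
    q.logged[buf_id] = num_descs
    q.outstanding += num_descs
    send(buf_id)                   # completion now finds the count
```

Passing q.complete as the send callback stands in for a completion that arrives immediately: the first helper leaves descriptors counted forever, the second returns the counter to zero.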
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 27 Feb 2018 19:28:11 +0000 (14:28 -0500)]
Merge branch 'stmmac-barrier-fixes-and-cleanup'
Niklas Cassel says:
====================
stmmac barrier fixes and cleanup
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Niklas Cassel [Mon, 26 Feb 2018 21:47:09 +0000 (22:47 +0100)]
net: stmmac: make dwmac4_release_tx_desc() clear all descriptor fields
Make dwmac4_release_tx_desc() clear all descriptor fields, not just
TDES2 and TDES3.
I suspect that TDES0 and TDES1 weren't cleared because the DMA
engine uses them to store the tx hardware timestamp (if PTP is enabled).
However, stmmac_tx_clean() calls stmmac_get_tx_hwtstamp(), which reads
and saves the timestamp, before it calls release_tx_desc(), so this
is not an issue.
stmmac_xmit() and stmmac_tso_xmit() both always overwrite TDES0,
however, stmmac_tso_xmit() sometimes sets TDES1, and since neither
stmmac_xmit() nor stmmac_tso_xmit() explicitly clears TDES1, both
functions might reuse a DMA descriptor with old TDES1 data.
I haven't observed any misbehavior even though TDES1 sometimes
points to an old skb, however, explicitly clearing both TDES0 and TDES1
in dwmac4_release_tx_desc() minimizes the chances of undefined behavior.
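The ordering argument can be sketched with a toy descriptor. TxDesc, release_tx_desc and tx_clean are hypothetical Python stand-ins, and returning the raw TDES0/TDES1 words as the "timestamp" is a simplification; the sketch only shows that reading the timestamp before releasing makes it safe to clear all four words.

```python
class TxDesc:
    """Toy descriptor with the four words TDES0..TDES3."""

    def __init__(self, des0=0, des1=0, des2=0, des3=0):
        self.des0 = des0
        self.des1 = des1
        self.des2 = des2
        self.des3 = des3


def release_tx_desc(desc):
    # Clear every field, not just TDES2 and TDES3.
    desc.des0 = desc.des1 = desc.des2 = desc.des3 = 0


def tx_clean(desc):
    # Save the timestamp words first, then release the descriptor.
    timestamp_words = (desc.des0, desc.des1)
    release_tx_desc(desc)
    return timestamp_words
```

After tx_clean() the descriptor carries no stale TDES1 data into the next transmit.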
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Niklas Cassel [Mon, 26 Feb 2018 21:47:08 +0000 (22:47 +0100)]
net: stmmac: ensure that the device has released ownership before reading data
According to Documentation/memory-barriers.txt, we need to use a
dma_rmb() after reading the status/own bit, to ensure that all
descriptor fields are read after reading the own bit.
This way, we ensure that the DMA engine is done with the DMA
descriptor before we read the other descriptor fields, e.g. reading
the tx hardware timestamp (if PTP is enabled).
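The read ordering described above can be sketched in userspace C11. This
is only an analogy: atomic_thread_fence(memory_order_acquire) stands in
for dma_rmb(), and the own-bit position, the struct layout, and the
timestamp encoding (seconds in des1, nanoseconds in des0) are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_OWN (UINT32_C(1) << 31)	/* hypothetical own-bit position */

struct demo_desc {
	uint32_t des0;		/* assumed: timestamp nanoseconds */
	uint32_t des1;		/* assumed: timestamp seconds */
	uint32_t des2;
	_Atomic uint32_t des3;	/* status word holding the own bit */
};

/* Returns true and stores the timestamp once the device has let go. */
static bool demo_get_tx_timestamp(struct demo_desc *p, uint64_t *ns)
{
	assert(p != NULL && ns != NULL);

	/* Read the own bit first. */
	uint32_t status = atomic_load_explicit(&p->des3, memory_order_relaxed);

	if (status & DEMO_OWN)
		return false;	/* device still owns the descriptor */

	/* Stand-in for dma_rmb(): the reads below stay after the own-bit read. */
	atomic_thread_fence(memory_order_acquire);

	*ns = (uint64_t)p->des1 * UINT64_C(1000000000) + p->des0;
	return true;
}
```

While the own bit is set the helper refuses to touch the other fields;
once it is clear, the fence keeps the timestamp reads from being
reordered ahead of the status read.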
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Niklas Cassel [Mon, 26 Feb 2018 21:47:07 +0000 (22:47 +0100)]
net: stmmac: use correct barrier between coherent memory and MMIO
The last memory barrier in stmmac_xmit()/stmmac_tso_xmit() is placed
between a coherent memory write and an MMIO write:
The own bit is written in First Desc (TSO: MSS desc or First Desc).
<barrier>
The DMA engine is started by a write to the tx desc tail pointer/
enable dma transmission register, i.e. an MMIO write.
This barrier cannot be a simple dma_wmb(), since a dma_wmb() only
guarantees the ordering of writes to cache coherent DMA memory
with respect to each other.
To guarantee that the cache coherent memory writes have completed
before we attempt to write to the cache incoherent MMIO region,
we need to use the more heavyweight barrier wmb().
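A rough userspace sketch of the sequence above, under stated
assumptions: demo_wmb() is a hypothetical stand-in built on a C11
sequentially consistent fence, the volatile pointer merely plays the
role of the MMIO register, and the own-bit position is made up. It
cannot reproduce real MMIO ordering; it only shows where the heavier
barrier sits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_OWN (UINT32_C(1) << 31)	/* hypothetical own-bit position */

struct demo_desc {
	uint32_t des0, des1, des2, des3;
};

/* Userspace stand-in for wmb(); the kernel barrier also orders MMIO. */
static inline void demo_wmb(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/* Hand the first descriptor to the device, then ring the doorbell. */
static void demo_start_tx(struct demo_desc *first,
			  volatile uint32_t *tail_reg, uint32_t tail)
{
	assert(first != NULL && tail_reg != NULL);

	first->des3 |= DEMO_OWN;	/* coherent memory write */
	demo_wmb();			/* full write barrier, not dma_wmb() */
	*tail_reg = tail;		/* an MMIO write in the real driver */
}
```

The point of the sketch is the placement: the barrier between the own
bit and the doorbell is the full one, while barriers between descriptor
words can stay lightweight.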
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Niklas Cassel [Mon, 26 Feb 2018 21:47:06 +0000 (22:47 +0100)]
net: stmmac: ensure that the MSS desc is the last desc to set the own bit
A dma_wmb() is used to guarantee the ordering, with respect to
other writes, to cache coherent DMA memory.
There is a dma_wmb() in prepare_tx_desc()/prepare_tso_tx_desc() which
ensures that TDES0/1/2 are written before TDES3 (which contains the own
bit), for First Desc.
However, in the rare case that MSS changes, there will be an MSS
context descriptor in front of the regular DMA descriptors:
<MSS desc> <- DMA Next Descriptor
<First Desc>
<desc n>
<Last Desc>
Thus, for this special case, we need a dma_wmb()
after prepare_tso_tx_desc()/before writing the own bit to the MSS desc,
so that we flush the write to TDES3 for First Desc,
in order to ensure that the MSS descriptor is the last descriptor to
set the own bit.
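The ordering for the MSS case can be sketched as follows. This is a
userspace approximation, not the driver code: demo_dma_wmb() is a
hypothetical stand-in built on a C11 release fence, and the descriptor
layout, the own-bit position, and the use of des2 for the MSS value are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_OWN (UINT32_C(1) << 31)	/* hypothetical own-bit position */

struct demo_desc {
	uint32_t des0, des1, des2, des3;
};

/* Userspace approximation of dma_wmb(): orders earlier writes first. */
static inline void demo_dma_wmb(void)
{
	atomic_thread_fence(memory_order_release);
}

/* Fill First Desc; its own bit is written last, after a barrier. */
static void demo_prepare_first_desc(struct demo_desc *first,
				    uint32_t buf, uint32_t len)
{
	first->des0 = buf;
	first->des2 = len;
	demo_dma_wmb();
	first->des3 = DEMO_OWN;
}

/* mss_desc is NULL when the MSS did not change. */
static void demo_tso_queue(struct demo_desc *mss_desc,
			   struct demo_desc *first,
			   uint32_t mss, uint32_t buf, uint32_t len)
{
	assert(first != NULL);

	if (mss_desc != NULL)
		mss_desc->des2 = mss;	/* context descriptor, not yet owned */

	demo_prepare_first_desc(first, buf, len);

	if (mss_desc != NULL) {
		/* Order First Desc's own bit before the MSS desc's own bit. */
		demo_dma_wmb();
		mss_desc->des3 |= DEMO_OWN;
	}
}
```

Because the device starts at the MSS descriptor, setting its own bit
last means the device cannot walk into a First Desc whose own bit has
not been written yet.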
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 27 Feb 2018 19:19:11 +0000 (14:19 -0500)]
Merge branch 'RDS-optimized-notification-for-zerocopy-completion'
Sowmini Varadhan says:
====================
RDS: optimized notification for zerocopy completion
Resending with acked-by additions: previous attempt does not show
up in Patchwork. This time with a new mail Message-Id.
RDS applications predominantly use request-response, transaction
based IPC, so that ingress and egress traffic are well-balanced,
and it is possible/desirable to reduce system-call overhead by
piggybacking the notifications for zerocopy completion response
with data.
Moreover, it has been pointed out that socket functions block
if sk_err is non-zero, thus if the RDS code does not plan/need
to use sk_error_queue path for completion notification, it
is preferable to remove the sk_error_queue related paths in
RDS.
Both of these goals are implemented in this series.
v2: removed sk_error_queue support
v3: incorporated additional code review comments (details in each patch)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>