Russell King [Tue, 7 Feb 2017 23:03:02 +0000 (15:03 -0800)]
MIPS: Octeon: Remove unnecessary MODULE_*()
octeon-platform.c can not be built as a module for two reasons:
(a) the Makefile doesn't allow it:
obj-y := cpu.o setup.o octeon-platform.o octeon-irq.o csrc-octeon.o
(b) the multiple *_initcall() statements, each of which are translated
to a module_init() call when attempting a module build, become
aliases to init_module(). Having more than one alias will cause a
build error.
Hence, rather than adding a linux/module.h include, remove the redundant
MODULE_*() from this file.
Acked-by: David Daney <david.daney@cavium.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 7 Feb 2017 23:03:01 +0000 (15:03 -0800)]
iscsi: fix build errors when linux/phy*.h is removed from net/dsa.h
drivers/target/iscsi/iscsi_target_login.c:1135:7: error: implicit declaration of function 'try_module_get' [-Werror=implicit-function-declaration]
Add linux/module.h to iscsi_target_login.c.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 7 Feb 2017 23:03:00 +0000 (15:03 -0800)]
net: mvneta: fix build errors when linux/phy*.h is removed from net/dsa.h
drivers/net/ethernet/marvell/mvneta.c:2694:26: error: storage size of 'status' isn't known
drivers/net/ethernet/marvell/mvneta.c:2695:26: error: storage size of 'changed' isn't known
drivers/net/ethernet/marvell/mvneta.c:2695:9: error: variable 'changed' has initializer but incomplete type
drivers/net/ethernet/marvell/mvneta.c:2709:2: error: implicit declaration of function 'fixed_phy_update_state' [-Werror=implicit-function-declaration]
Add linux/phy_fixed.h to mvneta.c
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 7 Feb 2017 23:02:59 +0000 (15:02 -0800)]
net: fman: fix build errors when linux/phy*.h is removed from net/dsa.h
drivers/net/ethernet/freescale/fman/fman_memac.c:519:21: error: dereferencing pointer to incomplete type 'struct fixed_phy_status'
Add linux/phy_fixed.h to fman_memac.c
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 7 Feb 2017 23:02:58 +0000 (15:02 -0800)]
net: bgmac: fix build errors when linux/phy*.h is removed from net/dsa.h
drivers/net/ethernet/broadcom/bgmac.c:1015:17: error: dereferencing pointer to incomplete type 'struct mii_bus'
drivers/net/ethernet/broadcom/bgmac.c:1185:2: error: implicit declaration of function 'phy_start' [-Werror=implicit-function-declaration]
drivers/net/ethernet/broadcom/bgmac.c:1198:2: error: implicit declaration of function 'phy_stop' [-Werror=implicit-function-declaration]
drivers/net/ethernet/broadcom/bgmac.c:1239:9: error: implicit declaration of function 'phy_mii_ioctl' [-Werror=implicit-function-declaration]
drivers/net/ethernet/broadcom/bgmac.c:1389:28: error: 'phy_ethtool_get_link_ksettings' undeclared here (not in a function)
drivers/net/ethernet/broadcom/bgmac.c:1390:28: error: 'phy_ethtool_set_link_ksettings' undeclared here (not in a function)
drivers/net/ethernet/broadcom/bgmac.c:1403:13: error: dereferencing pointer to incomplete type 'struct phy_device'
drivers/net/ethernet/broadcom/bgmac.c:1417:3: error: implicit declaration of function 'phy_print_status' [-Werror=implicit-function-declaration]
drivers/net/ethernet/broadcom/bgmac.c:1424:26: error: storage size of 'fphy_status' isn't known
drivers/net/ethernet/broadcom/bgmac.c:1424:9: error: variable 'fphy_status' has initializer but incomplete type
drivers/net/ethernet/broadcom/bgmac.c:1425:11: warning: excess elements in struct initializer
drivers/net/ethernet/broadcom/bgmac.c:1425:3: error: unknown field 'link' specified in initializer
drivers/net/ethernet/broadcom/bgmac.c:1426:12: note: in expansion of macro 'SPEED_1000'
drivers/net/ethernet/broadcom/bgmac.c:1426:3: error: unknown field 'speed' specified in initializer
drivers/net/ethernet/broadcom/bgmac.c:1427:13: note: in expansion of macro 'DUPLEX_FULL'
drivers/net/ethernet/broadcom/bgmac.c:1427:3: error: unknown field 'duplex' specified in initializer
drivers/net/ethernet/broadcom/bgmac.c:1432:12: error: implicit declaration of function 'fixed_phy_register' [-Werror=implicit-function-declaration]
drivers/net/ethernet/broadcom/bgmac.c:1432:31: error: 'PHY_POLL' undeclared (first use in this function)
drivers/net/ethernet/broadcom/bgmac.c:1438:8: error: implicit declaration of function 'phy_connect_direct' [-Werror=implicit-function-declaration]
drivers/net/ethernet/broadcom/bgmac.c:1439:6: error: 'PHY_INTERFACE_MODE_MII' undeclared (first use in this function)
drivers/net/ethernet/broadcom/bgmac.c:1521:2: error: implicit declaration of function 'phy_disconnect' [-Werror=implicit-function-declaration]
drivers/net/ethernet/broadcom/bgmac.c:1541:15: error: expected declaration specifiers or '...' before string constant
Add linux/phy.h to bgmac.c
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 7 Feb 2017 23:02:57 +0000 (15:02 -0800)]
net: lan78xx: fix build errors when linux/phy*.h is removed from net/dsa.h
drivers/net/usb/lan78xx.c:394:33: sparse: expected ; at end of declaration
drivers/net/usb/lan78xx.c:394:33: sparse: Expected } at end of struct-union-enum-specifier
drivers/net/usb/lan78xx.c:394:33: sparse: got interface
drivers/net/usb/lan78xx.c:403:1: sparse: Expected ; at the end of type declaration
drivers/net/usb/lan78xx.c:403:1: sparse: got }
Add linux/phy.h to lan78xx.c
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 7 Feb 2017 23:02:56 +0000 (15:02 -0800)]
net: macb: fix build errors when linux/phy*.h is removed from net/dsa.h
drivers/net/ethernet/cadence/macb.h:862:33: sparse: expected ; at end of declaration
drivers/net/ethernet/cadence/macb.h:862:33: sparse: Expected } at end of struct-union-enum-specifier
drivers/net/ethernet/cadence/macb.h:862:33: sparse: got phy_interface
drivers/net/ethernet/cadence/macb.h:877:1: sparse: Expected ; at the end of type declaration
drivers/net/ethernet/cadence/macb.h:877:1: sparse: got }
In file included from drivers/net/ethernet/cadence/macb_pci.c:29:0:
drivers/net/ethernet/cadence/macb.h:862:2: error: unknown type name 'phy_interface_t'
phy_interface_t phy_interface;
^~~~~~~~~~~~~~~
Add linux/phy.h to macb.h
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 7 Feb 2017 23:02:55 +0000 (15:02 -0800)]
net: cgroups: fix build errors when linux/phy*.h is removed from net/dsa.h
net/core/netprio_cgroup.c:303:16: error: expected declaration specifiers or '...' before string constant
MODULE_LICENSE("GPL v2");
^~~~~~~~
Add linux/module.h to fix this.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 7 Feb 2017 23:02:54 +0000 (15:02 -0800)]
net: sunrpc: fix build errors when linux/phy*.h is removed from net/dsa.h
Removing linux/phy.h from net/dsa.h reveals a build error in the sunrpc
code:
net/sunrpc/xprtrdma/svc_rdma_backchannel.c: In function 'xprt_rdma_bc_put':
net/sunrpc/xprtrdma/svc_rdma_backchannel.c:277:2: error: implicit declaration of function 'module_put' [-Werror=implicit-function-declaration]
net/sunrpc/xprtrdma/svc_rdma_backchannel.c: In function 'xprt_setup_rdma_bc':
net/sunrpc/xprtrdma/svc_rdma_backchannel.c:348:7: error: implicit declaration of function 'try_module_get' [-Werror=implicit-function-declaration]
Fix this by adding linux/module.h to svc_rdma_backchannel.c
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 10 Feb 2017 18:47:52 +0000 (13:47 -0500)]
Merge tag 'wireless-drivers-next-for-davem-2017-02-09' of git://git./linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers-next patches for 4.11
Mostly smaller changeds and fixes all over, nothing really major
standing out.
Major changes:
iwlwifi
* work on support for new A000 devices continues
* fix 802.11w, which was failing to due an IGTK bug
ath10k
* add debugfs file peer_debug_trigger for debugging firmware
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 9 Feb 2017 13:42:03 +0000 (14:42 +0100)]
spectrum: flower: Treat ETH_P_ALL as a special case and translate for HW
HW does not understand ETH_P_ALL. So treat this special case differently
and translate to 0/0 key/mask. That will allow HW to match all ethertypes.
Fixes: 7aa0f5aa9030 ("mlxsw: spectrum: Implement TC flower offload")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 10 Feb 2017 18:37:49 +0000 (13:37 -0500)]
Merge branch 'net-checkpatch'
Tobin C. Harding says:
====================
Whitespace checkpatch fixes
This patch set fixes various whitespace checkpatch errors and warnings.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
tcharding [Thu, 9 Feb 2017 06:56:07 +0000 (17:56 +1100)]
net: Fix checkpatch, Missing a blank line after declarations
This patch fixes multiple occurrences of checkpatch WARNING: Missing
a blank line after declarations.
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcharding [Thu, 9 Feb 2017 06:56:06 +0000 (17:56 +1100)]
net: Fix checkpatch block comments warnings
Fix multiple occurrences of checkpatch warning. WARNING: Block
comments use * on subsequent lines. Also make comment blocks
more uniform.
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcharding [Thu, 9 Feb 2017 06:56:05 +0000 (17:56 +1100)]
net: Fix checkpatch whitespace errors
This patch fixes two trivial whitespace errors. Brace should be
on the previous line and trailing statements should be on next line.
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcharding [Thu, 9 Feb 2017 06:56:04 +0000 (17:56 +1100)]
net: Fix checkpatch WARNING: please, no space before tabs
This patch fixes multiple occurrences of space before tabs warnings.
More lines of code were moved than required to keep kernel-doc
comments uniform.
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 10 Feb 2017 18:18:34 +0000 (13:18 -0500)]
Merge branch 'act_pedit-relative-offset'
Amir Vadai says:
====================
net/sched: act_pedit: Use offset relative to conventional network headers
Some FW/HW parser APIs are such that they need to get the specific header type (e.g
IPV4 or IPV6, TCP or UDP) and not only the networking level (e.g network or transport).
Enhancing the UAPI to allow for specifying that, would allow the same flows to be
set into both SW and HW.
This patchset also makes pedit more robust. Currently fields offset is specified
by offset relative to the ip header, while using negative offsets for
MAC layer fields.
This series enables the user to set offset relative to the relevant header.
Usage example:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
flower \
ip_proto tcp \
dst_port 80 \
action \
pedit munge ip ttl add 0xff \
pedit munge tcp dport set 8080 \
pipe action mirred egress redirect dev veth0
Will forward traffic destined to tcp dport 80, while modifying the
destination port to 8080, and decreasing the ttl by one.
I've uploaded a draft for the userspace [2] to make it easier to review and
test the patchset.
[1] - http://patchwork.ozlabs.org/patch/700909/
[2] - git: https://bitbucket.org/av42/iproute2.git
branch: pedit
Patchset was tested and applied on top of upstream commit
bd092ad1463c ("Merge
branch 'remove-__napi_complete_done'")
Thanks,
Amir
Changes since V2:
- Instead of reusing unused bits in existing uapi fields, using new netlink
attributes for the new information. This way new/old user space and new/old
kernel can live together without having misunderstandings.
Changes since V1:
- No changes - V1 was sent and didn't make it for 4.10.
- You asked me [1] why did I use specific header names instead of layers (L2,
L3...), and I explained that it is on purpose, this extra information is
planned to be used by hardware drivers to offload the action.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Amir Vadai [Tue, 7 Feb 2017 07:56:08 +0000 (09:56 +0200)]
net/act_pedit: Introduce 'add' operation
This command could be useful to inc/dec fields.
For example, to forward any TCP packet and decrease its TTL:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
flower ip_proto tcp \
action pedit munge ip ttl add 0xff pipe \
action mirred egress redirect dev veth0
In the example above, adding 0xff to this u8 field is actually
decreasing it by one, since the operation is masked.
Signed-off-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amir Vadai [Tue, 7 Feb 2017 07:56:07 +0000 (09:56 +0200)]
net/act_pedit: Support using offset relative to the conventional network headers
Extend pedit to enable the user setting offset relative to network
headers. This change would enable to work with more complex header
schemes (vs the simple IPv4 case) where setting a fixed offset relative
to the network header is not enough.
After this patch, the action has information about the exact header type
and field inside this header. This information could be used later on
for hardware offloading of pedit.
Backward compatibility was being kept:
1. Old kernel <-> new userspace
2. New kernel <-> old userspace
3. add rule using new userspace <-> dump using old userspace
4. add rule using old userspace <-> dump using new userspace
When using the extended api, new netlink attributes are being used. This
way, operation will fail in (1) and (3) - and no malformed rule be added
or dumped. Of course, new user space that doesn't need the new
functionality can use the old netlink attributes and operation will
succeed.
Since action can support both api's, (2) should work, and it is easy to
write the new user space to have (4) work.
The action is having a strict check that only header types and commands
it can handle are accepted. This way future additions will be much
easier.
Usage example:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
flower \
ip_proto tcp \
dst_port 80 \
action pedit munge tcp dport set 8080 pipe \
action mirred egress redirect dev veth0
Will forward tcp port whose original dest port is 80, while modifying
the destination port to 8080.
Signed-off-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amir Vadai [Tue, 7 Feb 2017 07:56:06 +0000 (09:56 +0200)]
net/skbuff: Introduce skb_mac_offset()
Introduce skb_mac_offset() that could be used to get mac header offset.
Signed-off-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 10 Feb 2017 16:46:42 +0000 (11:46 -0500)]
Merge branch 'mlxsw-offload-mc-flood'
Jiri Pirko says:
====================
mlxsw: Offload MC flood for unregister MC
Nogah says:
When multicast is enabled, the Linux bridge floods unregistered multicast
packets only to ports connected to a multicast router. Devices capable of
offloading the Linux bridge need to be made aware of such ports, for
proper flooding behavior.
On the other hand, when multicast is disabled, such packets should be
flooded to all ports. This patchset aims to fix that, by offloading
the multicast state and the list of multicast router ports.
The first 3 patches adds switchdev attributes to offload this data.
The rest of the patchset add implementation for handling this data in the
mlxsw driver.
The effects this data has on the MDB (namely, when the multicast is
disabled the MDB should be considered as invalid, and when it is enabled, a
packet that is flooded by it should also be flooded to the multicast
routers ports) is subject of future work.
Testing of this patchset included:
Sending 3 mc packets streams, LL, register and unregistered, and checking
that they reached only to the ports that should have received them.
The configs were:
mc disabled, mc without mc router ports and mc with fixed router port.
It was checked for vlan aware bridge, vlan unaware bridge and vlan unaware
bridge with another vlan unaware bridge on the same machine
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Thu, 9 Feb 2017 13:54:49 +0000 (14:54 +0100)]
mlxsw: spectrum: Update mc_disabled flag by switchdev attr
Add a function to update mc_disabled from switchdev attr
SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Thu, 9 Feb 2017 13:54:48 +0000 (14:54 +0100)]
mlxsw: spectrum: Extend port_orig_get for bridge devices
The function mlxsw_sp_port_orig_get returns the vport from the physical
port if needed, based on the original device.
This patch addresses the case where the original device is a bridge.
If it is vlan unaware bridge, it returns the matching vport. If it is vlan
aware bridge, there is no matching vport, and it returns the original port.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Thu, 9 Feb 2017 13:54:47 +0000 (14:54 +0100)]
mlxsw: spectrum: Add an option to flood mc by mc_router_port
The decision whether to flood a multicast packet to a port dependent
on three flags: mc_disabled, mc_router_port, mc_flood.
If mc_disabled is on, the port will be flooded according to mc_flood,
otherwise, according to mc_router_port. To accomplish that, add those
flags into the mlxsw_sp_port struct and update the mc flood table
accordingly.
Update mc_router_port by switchdev attribute
SWITCHDEV_ATTR_ID_PORT_MC_ROUTER_PORT.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Thu, 9 Feb 2017 13:54:46 +0000 (14:54 +0100)]
mlxsw: spectrum: Separate bc and mc floods
Break the bm (broadcast-multicast) into two tables, one for broadcast
(and link local multicast that behaves like bc) and one for unknown
multicasts.
Add a bool into mlxsw_sp_port named mc_flood that reflect the value this
port should have in the mc flood table (currently, always 1);
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Thu, 9 Feb 2017 13:54:45 +0000 (14:54 +0100)]
mlxsw: spectrum: Change max vfid
A user that wants many bridges will use 1.Q bridge which are scalable.
One can have as many 1.Q bridges as vfids.
This patch sets their number to 1k, which is a reasonably large number.
This change is done here because the next patches will add a new flood
table, and without it, it will increase the overall size of the flood
tables dramatically.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Thu, 9 Feb 2017 13:54:44 +0000 (14:54 +0100)]
mlxsw: spectrum: Make port flood update more generic
Currently, there is a per port flood update function only for the UC
table. Make the function more generic by changing the table type to be
an input.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Thu, 9 Feb 2017 13:54:43 +0000 (14:54 +0100)]
mlxsw: spectrum: Break flood set func to be per table
Currently, the flood set function can't operate on only one table, but
sets both uc_flood and mb_flood together.
This patch creates a function that sets the flood state per table.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Thu, 9 Feb 2017 13:54:42 +0000 (14:54 +0100)]
switchdev: bridge: Offload mc router ports
Offload the mc router ports list, whenever it is being changed.
It is done because in some cases mc packets needs to be flooded to all
the ports in this list.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Thu, 9 Feb 2017 13:54:41 +0000 (14:54 +0100)]
bridge: mcast: Merge the mc router ports deletions to one function
There are three places where a port gets deleted from the mc router port
list. This patch join the actual deletion to one function.
It will be helpful for later patch that will offload changes in the mc
router ports list.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Thu, 9 Feb 2017 13:54:40 +0000 (14:54 +0100)]
switchdev: bridge: Offload multicast disabled
Offload multicast disabled flag, for more accurate mc flood behavior:
When it is on, the mdb should be ignored.
When it is off, unregistered mc packets should be flooded to mc router
ports.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 10 Feb 2017 16:38:10 +0000 (11:38 -0500)]
Merge branch 'sched-cls_api-small-cleanup'
Jiri Pirko says:
====================
sched: cls_api: small cleanup
This patchset makes couple of things in cls_api code a bit nicer and easier
for reader to digest.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 9 Feb 2017 13:39:00 +0000 (14:39 +0100)]
sched: check negative err value to safe one level of indent
As it is more common, check err for !0. That allows to safe one level of
indentation and makes the code easier to read. Also, make 'next' variable
global in function as it is used twice.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 9 Feb 2017 13:38:59 +0000 (14:38 +0100)]
sched: add missing curly braces in else branch in tc_ctl_tfilter
Curly braces need to be there, for stylistic reasons.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 9 Feb 2017 13:38:58 +0000 (14:38 +0100)]
sched: move err set right before goto errout in tc_ctl_tfilter
This makes the reader to know right away what is the error value.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 9 Feb 2017 13:38:57 +0000 (14:38 +0100)]
sched: push TC filter protocol creation into a separate function
Make the long function tc_ctl_tfilter a little bit shorter and easier to
read. Also make the creation of filter proto symmetric to destruction.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 9 Feb 2017 13:38:56 +0000 (14:38 +0100)]
sched: move tcf_proto_destroy and tcf_destroy_chain helpers into cls_api
Creation is done in this file, move destruction to be at the same place.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 9 Feb 2017 13:38:55 +0000 (14:38 +0100)]
sched: rename tcf_destroy to tcf_destroy_proto
This function destroys TC filter protocol, not TC filter. So name it
accordingly.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 10 Feb 2017 16:32:14 +0000 (11:32 -0500)]
Merge branch 'mlxsw-identical-routes-handling'
Jiri Pirko says:
====================
mlxsw: Identical routes handling
Ido says:
The kernel can store several FIB aliases that share the same prefix and
length. These aliases can differ in other parameters such as TOS and
metric, which are taken into account during lookup.
Offloading devices might not have the same flexibility, allowing only a
single route with the same prefix and length to be reflected. mlxsw is
one such device.
This patchset aims to correctly handle this situation in the mlxsw
driver. The first four patches introduce small changes in the IPv4 FIB
code, so that listeners of the FIB notification chain will be able to
correctly handle identical routes.
The last three patches build on top of previous work and introduce the
necessary changes in the mlxsw driver. The biggest change is the
introduction of a FIB node, where identical routes are chained, instead
of a primitive reference counting. This is explained in detail in the
fifth patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 9 Feb 2017 09:28:44 +0000 (10:28 +0100)]
mlxsw: spectrum_router: Add support for route replace
Upon the reception of an ENTRY_REPLACE notification, resolve the FIB
node corresponding to the prefix and length and insert the new route
before the first matching entry.
Since the notification also signals the deletion of the replaced route,
delete it from the driver's cache.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 9 Feb 2017 09:28:43 +0000 (10:28 +0100)]
mlxsw: spectrum_router: Add support for route append
When a new route is appended, it's placed after existing routes sharing
the same parameters (prefix, length, table ID, TOS and priority).
While the device supports only one route with the same prefix and length
in a single table, it's important to correctly place the appended route
in the driver's cache, as when a route is deleted the next one is
programmed into the device.
Following the reception of an ENTRY_APPEND notification, resolve the
FIB node corresponding to the prefix and length and correctly place the
new entry in its entry list.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 9 Feb 2017 09:28:42 +0000 (10:28 +0100)]
mlxsw: spectrum_router: Correctly handle identical routes
In the device, routes are indexed in a routing table based on the prefix
and its length. This is in contrast to the kernel's FIB where several
FIB aliases can exist with these parameters being identical. In such
cases, the routes will be sorted by table ID (LOCAL first, then MAIN),
TOS and finally priority (metric).
During lookup, these routes will be evaluated in order. In case the
packet's TOS field is non-zero and a FIB alias with a matching TOS is
found, then it's selected. Otherwise, the lookup defaults to the route
with TOS 0 (if it exists). However, if the requested scope is narrower
than the one found, then the lookup continues.
To best reflect the kernel's datapath we should take the above into
account. Given a prefix and its length, the reflected route will always
be the first one in the FIB alias list. However, if the route has a
non-zero TOS then its action will be converted to trap instead of
forward, since we currently don't support TOS-based routing. If this
turns out to be a real issue, we can add support for that using
policy-based switching.
The route's scope can be effectively ignored as any packet being routed
by the device would've been looked-up using the widest scope (UNIVERSE).
To achieve that we need to do two changes. Firstly, we need to create
another struct (FIB node) that will hold the list of FIB entries sharing
the same prefix and length. This struct will be hashed using these two
parameters.
Secondly, we need to change the route reflection to match the above
logic, so that the first FIB entry in the list will be programmed into
the device while the rest will remain in the driver's cache in case of
subsequent changes.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 9 Feb 2017 09:28:41 +0000 (10:28 +0100)]
ipv4: fib: Add events for FIB replace and append
The FIB notification chain currently uses the NLM_F_{REPLACE,APPEND}
flags to signal routes being replaced or appended.
Instead of using netlink flags for in-kernel notifications we can simply
introduce two new events in the FIB notification chain. This has the
added advantage of making the API cleaner, thereby making it clear that
these events should be supported by listeners of the notification chain.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 9 Feb 2017 09:28:40 +0000 (10:28 +0100)]
ipv4: fib: Send notification before deleting FIB alias
When a FIB alias is replaced following NLM_F_REPLACE, the ENTRY_ADD
notification is sent after the reference on the previous FIB info was
dropped. This is problematic as potential listeners might need to access
it in their notification blocks.
Solve this by sending the notification prior to the deletion of the
replaced FIB alias. This is consistent with ENTRY_DEL notifications.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 9 Feb 2017 09:28:39 +0000 (10:28 +0100)]
ipv4: fib: Send deletion notification with actual FIB alias type
When a FIB alias is removed, a notification is sent using the type
passed from user space - can be RTN_UNSPEC - instead of the actual type
of the removed alias. This is problematic for listeners of the FIB
notification chain, as several FIB aliases can exist with matching
parameters, but the type.
Solve this by passing the actual type of the removed FIB alias.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 9 Feb 2017 09:28:38 +0000 (10:28 +0100)]
ipv4: fib: Only flush FIB aliases belonging to currently flushed table
In case the MAIN table is flushed and its trie is shared with the LOCAL
table, then we might be flushing FIB aliases belonging to the latter.
This can lead to FIB_ENTRY_DEL notifications sent with the wrong table
ID.
The above doesn't affect current listeners, as the table ID is ignored
during entry deletion, but this will change later in the patchset.
When flushing a particular table, skip any aliases belonging to a
different one.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Patrick McHardy <kaber@trash.net>
Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 10 Feb 2017 03:59:36 +0000 (22:59 -0500)]
Merge branch 'openvswitch-Conntrack-integration-improvements'
Jarno Rajahalme [Thu, 9 Feb 2017 19:22:01 +0000 (11:22 -0800)]
openvswitch: Pack struct sw_flow_key.
struct sw_flow_key has two 16-bit holes. Move the most matched
conntrack match fields there. In some typical cases this reduces the
size of the key that needs to be hashed into half and into one cache
line.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarno Rajahalme [Thu, 9 Feb 2017 19:22:00 +0000 (11:22 -0800)]
openvswitch: Add force commit.
Stateful network admission policy may allow connections to one
direction and reject connections initiated in the other direction.
After policy change it is possible that for a new connection an
overlapping conntrack entry already exists, where the original
direction of the existing connection is opposed to the new
connection's initial packet.
Most importantly, conntrack state relating to the current packet gets
the "reply" designation based on whether the original direction tuple
or the reply direction tuple matched. If this "directionality" is
wrong w.r.t. to the stateful network admission policy it may happen
that packets in neither direction are correctly admitted.
This patch adds a new "force commit" option to the OVS conntrack
action that checks the original direction of an existing conntrack
entry. If that direction is opposed to the current packet, the
existing conntrack entry is deleted and a new one is subsequently
created in the correct direction.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarno Rajahalme [Thu, 9 Feb 2017 19:21:59 +0000 (11:21 -0800)]
openvswitch: Add original direction conntrack tuple to sw_flow_key.
Add the fields of the conntrack original direction 5-tuple to struct
sw_flow_key. The new fields are initially marked as non-existent, and
are populated whenever a conntrack action is executed and either finds
or generates a conntrack entry. This means that these fields exist
for all packets that were not rejected by conntrack as untrackable.
The original tuple fields in the sw_flow_key are filled from the
original direction tuple of the conntrack entry relating to the
current packet, or from the original direction tuple of the master
conntrack entry, if the current conntrack entry has a master.
Generally, expected connections of connections having an assigned
helper (e.g., FTP), have a master conntrack entry.
The main purpose of the new conntrack original tuple fields is to
allow matching on them for policy decision purposes, with the premise
that the admissibility of tracked connections reply packets (as well
as original direction packets), and both direction packets of any
related connections may be based on ACL rules applying to the master
connection's original direction 5-tuple. This also makes it easier to
make policy decisions when the actual packet headers might have been
transformed by NAT, as the original direction 5-tuple represents the
packet headers before any such transformation.
When using the original direction 5-tuple the admissibility of return
and/or related packets need not be based on the mere existence of a
conntrack entry, allowing separation of admission policy from the
established conntrack state. While existence of a conntrack entry is
required for admission of the return or related packets, policy
changes can render connections that were initially admitted to be
rejected or dropped afterwards. If the admission of the return and
related packets was based on mere conntrack state (e.g., connection
being in an established state), a policy change that would make the
connection rejected or dropped would need to find and delete all
conntrack entries affected by such a change. When using the original
direction 5-tuple matching the affected conntrack entries can be
allowed to time out instead, as the established state of the
connection would not need to be the basis for packet admission any
more.
It should be noted that the directionality of related connections may
be the same or different than that of the master connection, and
neither the original direction 5-tuple nor the conntrack state bits
carry this information. If needed, the directionality of the master
connection can be stored in master's conntrack mark or labels, which
are automatically inherited by the expected related connections.
The fact that neither ARP nor ND packets are trackable by conntrack
allows mutual exclusion between ARP/ND and the new conntrack original
tuple fields. Hence, the IP addresses are overlaid in union with ARP
and ND fields. This allows the sw_flow_key to not grow much due to
this patch, but it also means that we must be careful to never use the
new key fields with ARP or ND packets. ARP is easy to distinguish and
keep mutually exclusive based on the ethernet type, but ND being an
ICMPv6 protocol requires a bit more attention.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarno Rajahalme [Thu, 9 Feb 2017 19:21:58 +0000 (11:21 -0800)]
openvswitch: Inherit master's labels.
We avoid calling into nf_conntrack_in() for expected connections, as
that would remove the expectation that we want to stick around until
we are ready to commit the connection. Instead, we do a lookup in the
expectation table directly. However, after a successful expectation
lookup we have set the flow key label field from the master
connection, whereas nf_conntrack_in() does not do this. This leads to
master's labels being inherited after an expectation lookup, but those
labels not being inherited after the corresponding conntrack action
with a commit flag.
This patch resolves the problem by changing the commit code path to
also inherit the master's labels to the expected connection.
Resolving this conflict in favor of inheriting the labels allows more
information be passed from the master connection to related
connections, which would otherwise be much harder if the 32 bits in
the connmark are not enough. Labels can still be set explicitly, so
this change only affects the default values of the labels in presense
of a master connection.
Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarno Rajahalme [Thu, 9 Feb 2017 19:21:57 +0000 (11:21 -0800)]
openvswitch: Refactor labels initialization.
Refactoring conntrack labels initialization makes changes in later
patches easier to review.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarno Rajahalme [Thu, 9 Feb 2017 19:21:56 +0000 (11:21 -0800)]
openvswitch: Simplify labels length logic.
Since
23014011ba42 ("netfilter: conntrack: support a fixed size of 128
distinct labels"), the size of conntrack labels extension has fixed to
128 bits, so we do not need to check for labels sizes shorter than 128
at run-time. This patch simplifies labels length logic accordingly,
but allows the conntrack labels size to be increased in the future
without breaking the build. In the event of conntrack labels
increasing in size OVS would still be able to deal with the 128 first
label bits.
Suggested-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarno Rajahalme [Thu, 9 Feb 2017 19:21:55 +0000 (11:21 -0800)]
openvswitch: Unionize ovs_key_ct_label with a u32 array.
Make the array of labels in struct ovs_key_ct_label an union, adding a
u32 array of the same byte size as the existing u8 array. It is
faster to loop through the labels 32 bits at the time, which is also
the alignment of netlink attributes.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarno Rajahalme [Thu, 9 Feb 2017 19:21:54 +0000 (11:21 -0800)]
openvswitch: Do not trigger events for unconfirmed connections.
Receiving change events before the 'new' event for the connection has
been received can be confusing. Avoid triggering change events for
setting conntrack mark or labels before the conntrack entry has been
confirmed.
Fixes: 182e3042e15d ("openvswitch: Allow matching on conntrack mark")
Fixes: c2ac66735870 ("openvswitch: Allow matching on conntrack label")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarno Rajahalme [Thu, 9 Feb 2017 19:21:53 +0000 (11:21 -0800)]
openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted.
The conntrack lookup for existing connections fails to invert the
packet 5-tuple for NATted packets, and therefore fails to find the
existing conntrack entry. Conntrack only stores 5-tuples for incoming
packets, and there are various situations where a lookup on a packet
that has already been transformed by NAT needs to be made. Looking up
an existing conntrack entry upon executing packet received from the
userspace is one of them.
This patch fixes ovs_ct_find_existing() to invert the packet 5-tuple
for the conntrack lookup whenever the packet has already been
transformed by conntrack from its input form as evidenced by one of
the NAT flags being set in the conntrack state metadata.
Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarno Rajahalme [Thu, 9 Feb 2017 19:21:52 +0000 (11:21 -0800)]
openvswitch: Fix comments for skb->_nfct
Fix comments referring to skb 'nfct' and 'nfctinfo' fields now that
they are combined into '_nfct'.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 10 Feb 2017 03:27:08 +0000 (22:27 -0500)]
Merge branch 'ena-bug-fixes'
Netanel Belgazal says:
====================
Bug Fixes in ENA driver
Changes from V3:
* Rebase patchset to master and solve merge conflicts.
* Remove redundant bug fix (fix error handling when probe fails)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Netanel Belgazal [Thu, 9 Feb 2017 13:21:39 +0000 (15:21 +0200)]
net/ena: update driver version to 1.1.2
Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netanel Belgazal [Thu, 9 Feb 2017 13:21:38 +0000 (15:21 +0200)]
net/ena: change condition for host attribute configuration
Move the host info config to be the first admin command that is executed.
This change require the driver to remove the 'feature check'
from host info configuration flow.
The check is removed since the supported features bitmask field
is retrieved only after calling ENA_ADMIN_DEVICE_ATTRIBUTES admin command.
If set host info is not supported an error will be returned by the device.
Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netanel Belgazal [Thu, 9 Feb 2017 13:21:37 +0000 (15:21 +0200)]
net/ena: change driver's default timeouts
The timeouts were too agressive and sometimes cause false alarms.
Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netanel Belgazal [Thu, 9 Feb 2017 13:21:36 +0000 (15:21 +0200)]
net/ena: reduce the severity of ena printouts
Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netanel Belgazal [Thu, 9 Feb 2017 13:21:35 +0000 (15:21 +0200)]
net/ena: use READ_ONCE to access completion descriptors
Completion descriptors are accessed from the driver and from the device.
To avoid reading the old value, use READ_ONCE macro.
Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netanel Belgazal [Thu, 9 Feb 2017 13:21:34 +0000 (15:21 +0200)]
net/ena: use napi_complete_done() return value
Do not unamsk interrupts if we are in busy poll mode.
Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netanel Belgazal [Thu, 9 Feb 2017 13:21:33 +0000 (15:21 +0200)]
net/ena: fix potential access to freed memory during device reset
If the ena driver detects that the device is not behave as expected,
it tries to reset the device.
The reset flow calls ena_down, which will frees all the resources
the driver allocates and then it will reset the device.
This flow can cause memory corruption if the device is still writes
to the driver's memory space.
To overcome this potential race, move the reset before the device
resources are freed.
Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netanel Belgazal [Thu, 9 Feb 2017 13:21:32 +0000 (15:21 +0200)]
net/ena: refactor ena_get_stats64 to be atomic context safe
ndo_get_stat64() can be called from atomic context, but the current
implementation sends an admin command to retrieve the statistics from
the device. This admin command can sleep.
This patch re-factors the implementation of ena_get_stats64() to use
the {rx,tx}bytes/count from the driver's inner counters, and to obtain
the rx drop counter from the asynchronous keep alive (heart bit)
event.
Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netanel Belgazal [Thu, 9 Feb 2017 13:21:31 +0000 (15:21 +0200)]
net/ena: fix NULL dereference when removing the driver after device reset failed
If for some reason the device stops responding, and the device reset
failes to recover the device, the mmio register read data structure
will not be reinitialized.
On driver removal, the driver will also try to reset the device, but
this time the mmio data structure will be NULL.
To solve this issue, perform the device reset in the remove function
only if the device is runnig.
Crash log
54.240382] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 54.244186] IP: [<
ffffffffc067de5a>] ena_com_reg_bar_read32+0x8a/0x180 [ena_drv]
[ 54.244186] PGD 0
[ 54.244186] Oops: 0002 [#1] SMP
[ 54.244186] Modules linked in: ena_drv(OE-) snd_hda_codec_generic kvm_intel kvm crct10dif_pclmul ppdev crc32_pclmul ghash_clmulni_intel aesni_intel snd_hda_intel aes_x86_64 snd_hda_controller lrw gf128mul cirrus glue_helper ablk_helper ttm snd_hda_codec drm_kms_helper cryptd snd_hwdep drm snd_pcm pvpanic snd_timer syscopyarea sysfillrect snd parport_pc sysimgblt serio_raw soundcore i2c_piix4 mac_hid lp parport psmouse floppy
[ 54.244186] CPU: 5 PID: 1841 Comm: rmmod Tainted: G OE 3.16.0-031600-generic #
201408031935
[ 54.244186] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 54.244186] task:
ffff880135852880 ti:
ffff8800bb640000 task.ti:
ffff8800bb640000
[ 54.244186] RIP: 0010:[<
ffffffffc067de5a>] [<
ffffffffc067de5a>] ena_com_reg_bar_read32+0x8a/0x180 [ena_drv]
[ 54.244186] RSP: 0018:
ffff8800bb643d50 EFLAGS:
00010083
[ 54.244186] RAX:
000000000000deb0 RBX:
0000000000030d40 RCX:
0000000000000003
[ 54.244186] RDX:
0000000000000202 RSI:
0000000000000058 RDI:
ffffc90000775104
[ 54.244186] RBP:
ffff8800bb643d88 R08:
0000000000000000 R09:
cf00000000000000
[ 54.244186] R10:
0000000fffffffe0 R11:
0000000000000001 R12:
0000000000000000
[ 54.244186] R13:
ffffc90000765000 R14:
ffffc90000775104 R15:
00007fca1fa98090
[ 54.244186] FS:
00007fca1f1bd740(0000) GS:
ffff88013fd40000(0000) knlGS:
0000000000000000
[ 54.244186] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 54.244186] CR2:
0000000000000000 CR3:
00000000b9cf6000 CR4:
00000000001406e0
[ 54.244186] Stack:
[ 54.244186]
0000000000000202 0000005800000286 ffffc90000765000 ffffc90000765000
[ 54.244186]
ffff880135f6b000 ffff8800b9360000 00007fca1fa98090 ffff8800bb643db8
[ 54.244186]
ffffffffc0680b3d ffff8800b93608c0 ffffc90000765000 ffff880135f6b000
[ 54.244186] Call Trace:
[ 54.244186] [<
ffffffffc0680b3d>] ena_com_dev_reset+0x1d/0x1b0 [ena_drv]
[ 54.244186] [<
ffffffffc0678497>] ena_remove+0xa7/0x130 [ena_drv]
[ 54.244186] [<
ffffffff813d4df6>] pci_device_remove+0x46/0xc0
[ 54.244186] [<
ffffffff814c3b7f>] __device_release_driver+0x7f/0xf0
[ 54.244186] [<
ffffffff814c4738>] driver_detach+0xc8/0xd0
[ 54.244186] [<
ffffffff814c3969>] bus_remove_driver+0x59/0xd0
[ 54.244186] [<
ffffffff814c4fde>] driver_unregister+0x2e/0x60
[ 54.244186] [<
ffffffff810f0a80>] ? show_refcnt+0x40/0x40
[ 54.244186] [<
ffffffff813d4ec3>] pci_unregister_driver+0x23/0xa0
[ 54.244186] [<
ffffffffc068413f>] ena_cleanup+0x10/0xed1 [ena_drv]
[ 54.244186] [<
ffffffff810f3a47>] SyS_delete_module+0x157/0x1e0
[ 54.244186] [<
ffffffff81014fb7>] ? do_notify_resume+0xc7/0xd0
[ 54.244186] [<
ffffffff81793fad>] system_call_fastpath+0x1a/0x1f
[ 54.244186] Code: c3 4d 8d b5 04 01 01 00 4c 89 f7 e8 e1 5a 11 c1 48 89 45 c8 41 0f b7 85 00 01 01 00 8d 48 01 66 2d 52 21 66 41 89 8d 00 01 01 00 <66> 41 89 04 24 0f b7 45 d4 89 45 d0 89 c1 41 0f b7 85 00 01 01
[ 54.244186] RIP [<
ffffffffc067de5a>] ena_com_reg_bar_read32+0x8a/0x180 [ena_drv]
[ 54.244186] RSP <
ffff8800bb643d50>
[ 54.244186] CR2:
0000000000000000
[ 54.244186] ---[ end trace
18dd9889b6497810 ]---
Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netanel Belgazal [Thu, 9 Feb 2017 13:21:30 +0000 (15:21 +0200)]
net/ena: fix RSS default hash configuration
ENA default hash configures IPv4_frag hash twice instead of
configure non-IP packets.
The bug caused IPv4 fragmented packets to be calculated based on
L2 source and destination address instead of L3 source and destination.
IPv4 packets can reach to the wrong Rx queue.
Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netanel Belgazal [Thu, 9 Feb 2017 13:21:29 +0000 (15:21 +0200)]
net/ena: fix ethtool RSS flow configuration
ena_flow_data_to_flow_hash and ena_flow_hash_to_flow_type
treat the ena_flow_hash_to_flow_type enum as power of two values.
Change the values of ena_admin_flow_hash_fields to be power of two values.
This bug effect the ethtool set/get rxnfc.
ethtool will report wrong values hash fields for get and will
configure wrong hash fields in set.
Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netanel Belgazal [Thu, 9 Feb 2017 13:21:28 +0000 (15:21 +0200)]
net/ena: fix queues number calculation
The ENA driver tries to open a queue per vCPU.
To determine how many vCPUs the instance have it uses num_possible_cpus()
while it should have use num_online_cpus() instead.
Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netanel Belgazal [Thu, 9 Feb 2017 13:21:27 +0000 (15:21 +0200)]
net/ena: remove ntuple filter support from device feature list
Remove NETIF_F_NTUPLE from netdev->features.
The ENA device driver does not support ntuple filtering.
Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 9 Feb 2017 22:24:30 +0000 (17:24 -0500)]
Merge branch 'enic-vxlan-offload'
Govindarajulu Varadarajan says:
====================
enic: add vxlan offload support
This series adds vxlan offload support for enic driver. The first
patch adds vxlan devcmd for configuring vxland offload parameters.
Second patch adds ndo_udp_tunnel_add/del and offload on rx path.
There are to modes in which fw supports vxlan offload.
mode 0: fcoe bit is set for encapsulated packet. fcoe_fc_crc_ok is set
if checksum of csum is ok. This bit is or of ip_csum_ok and
tcp_udp_csum_ok
mode 2: BIT(0) in rss_hash is set if it is encapsulated packet.
BIT(1) is set if outer_ip_csum_ok/
BIT(2) is set if outer_tcp_csum_ok
Some hw supports only mode 0, some support mode 0 and 2. Driver gets
the supported modes bitmap using get_supported_feature_ver devcmd
and selects the highest mode both driver and fw supports.
Third patch adds offload support on tx path by adding
enic_features_check().
v2: Order local variable declarations from longest to shortest line,
on all three patches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Govindarajulu Varadarajan [Thu, 9 Feb 2017 00:43:09 +0000 (16:43 -0800)]
enic: add vxlan offload on tx path
Define ndo_features_check. Hw supports offload only for ipv4 inner and
ipv4 outer pkt.
Code refactor for setting inner tcp pseudo csum.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Govindarajulu Varadarajan [Thu, 9 Feb 2017 00:43:08 +0000 (16:43 -0800)]
enic: add udp_tunnel ndo for vxlan offload
Defines enic_udp_tunnel_add/del for configuring vxlan tunnel offload.
enic supports offload of only one ipv4/udp port.
There are two modes that fw supports for vxlan offload.
mode 0: fcoe bit is set for encapsulated packet. fcoe_fc_crc_ok is set
if checksum of csum is ok. This bit is or of ip_csum_ok and
tcp_udp_csum_ok
mode 2: BIT(0) in rss_hash is set if it is encapsulated packet.
BIT(1) is set if outer_ip_csum_ok/
BIT(2) is set if outer_tcp_csum_ok
tcp_udp_csum_ok/ipv4_csum_ok is set if inner csum is OK.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Govindarajulu Varadarajan [Thu, 9 Feb 2017 00:43:07 +0000 (16:43 -0800)]
enic: add devcmds for vxlan offload
This patch adds devcmds needed for vxlan offload. Implement 3 new devcmd
overlay_offload_ctrl: enable/disable offload
overlay_offload_cfg: update offload udp port number
get_supported_feature_ver: get hw supported offload version. Each
version has different bitmap for csum_ok/encap
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Wed, 8 Feb 2017 23:00:43 +0000 (00:00 +0100)]
net: dsa: mv88e6xxx: Move forward declaration to where it is needed
Move it out from the middle for the #defines to just before it is
needed.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Wed, 8 Feb 2017 22:40:04 +0000 (14:40 -0800)]
net: dsa: Fix duplicate object rule
While adding switch.o to the list of DSA object files, we essentially
duplicated the previous obj-y line and just added switch.o, remove the
duplicate.
Fixes: f515f192ab4f ("net: dsa: add switch notifier")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 9 Feb 2017 22:09:20 +0000 (17:09 -0500)]
Merge branch 'qcom-emac-more-ethtool'
Timur Tabi says:
====================
net: qcom/emac: add the last ethtool functions
These two patches implement the remaining two ethtool functions that
are of interest to the Qualcomm EMAC driver. These are the last
patches that will be submitted for the 4.11 merge window.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Timur Tabi [Wed, 8 Feb 2017 21:49:28 +0000 (15:49 -0600)]
net: qcom/emac: add ethtool support for setting ring parameters
Implement the set_ringparam method, which allows the user to specify
the size of the TX and RX descriptor rings. The values are constrained
to the limits of the hardware.
Since the driver does not use separate queues for mini or jumbo frames,
attempts to set those values are rejected.
If the interface is already running when the setting is changed, then
the interface is reset.
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Timur Tabi [Wed, 8 Feb 2017 21:49:27 +0000 (15:49 -0600)]
net: qcom/emac: add ethtool support for reading hardware registers
Implement the get_regs_len and get_regs ethtool methods. The driver
returns the values of selected hardware registers.
The make the register offsets known to emac_ethtool, the the register
offset macros are all combined into one header file. They were
inexplicably and arbitrarily split between two files.
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Wed, 8 Feb 2017 21:24:19 +0000 (22:24 +0100)]
ARM: orion: remove unused wnr854t_switch_plat_data
The other instances of this structure got removed along with the MDIO
device change, but this one was left behind and needs to be removed
as well:
arch/arm/mach-orion5x/wnr854t-setup.c:109:44: error: 'wnr854t_switch_plat_data' defined but not used [-Werror=unused-variable]
static struct dsa_platform_data __initdata wnr854t_switch_plat_data = {
Fixes: 575e93f7b5e6 ("ARM: orion: Register DSA switch as a MDIO device")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 9 Feb 2017 21:57:39 +0000 (16:57 -0500)]
Merge branch 'sctp-sender-stream-reconf-reset-add-streams'
Xin Long says:
====================
sctp: add sender-side procedures for stream reconf asoc reset and add streams
Patch 4/6 is to implement sender-side procedures for the SSN/TSN Reset
Request Parameter described in rfc6525 section 5.1.4, patch 3/6 is
ahead of it to define a function to make the request chunk for it.
Patch 6/6 is to implement sender-side procedures for the Add Incoming
and Outgoing Streams Request Parameter Request Parameter described in
rfc6525 section 5.1.5 and 5.1.6, patch 5/6 is ahead of it to define a
function to make the request chunk for it.
Patch 2/6 is a fix to recover streams states when it fails to send
request and Patch 1/6 is to drop some unncessary __packed from some
old structures.
v1->v2:
- put these into a smaller group.
- rename some temporary variables in the codes.
- rename the titles of the commits and improve some changelogs.
v2->v3:
- re-split the patchset and make sure it has no dead codes for review.
- move some codes into stream.c from socket.c.
v3->v4:
- add one more patch to fix a send reset stream request issue.
- doing actual work only when request is sent successfully.
- reduce some indents in sctp_send_add_streams.
v4->v5:
- close streams before sending request and recover them when sending
fails in patch 1/5 and patch 3/5
v5->v6:
- add patch 1/6 to drop some unncessary __packed from some old structures.
- remove __packed from some new structures in patch 3/6 and 5/6.
- define unsigned int outcnt and incnt to make codes smaller in patch 6/6.
- use krealloc instead of kcalloc and remove ksize check in patch 6/6, as
ksize check is acutally used in krealloc already.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Wed, 8 Feb 2017 17:18:20 +0000 (01:18 +0800)]
sctp: implement sender-side procedures for Add Incoming/Outgoing Streams Request Parameter
This patch is to implement Sender-Side Procedures for the Add
Outgoing and Incoming Streams Request Parameter described in
rfc6525 section 5.1.5-5.1.6.
It is also to add sockopt SCTP_ADD_STREAMS in rfc6525 section
6.3.4 for users.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Wed, 8 Feb 2017 17:18:19 +0000 (01:18 +0800)]
sctp: add support for generating stream reconf add incoming/outgoing streams request chunk
This patch is to define Add Incoming/Outgoing Streams Request
Parameter described in rfc6525 section 4.5 and 4.6. They can
be in one same chunk trunk as rfc6525 section 3.1-7 describes,
so make them in one function.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Wed, 8 Feb 2017 17:18:18 +0000 (01:18 +0800)]
sctp: implement sender-side procedures for SSN/TSN Reset Request Parameter
This patch is to implement Sender-Side Procedures for the SSN/TSN
Reset Request Parameter descibed in rfc6525 section 5.1.4.
It is also to add sockopt SCTP_RESET_ASSOC in rfc6525 section 6.3.3
for users.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Wed, 8 Feb 2017 17:18:17 +0000 (01:18 +0800)]
sctp: add support for generating stream reconf ssn/tsn reset request chunk
This patch is to define SSN/TSN Reset Request Parameter described
in rfc6525 section 4.3.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Wed, 8 Feb 2017 17:18:16 +0000 (01:18 +0800)]
sctp: streams should be recovered when it fails to send request.
Now when sending stream reset request, it closes the streams to
block further xmit of data until this request is completed, then
calls sctp_send_reconf to send the chunk.
But if sctp_send_reconf returns err, and it doesn't recover the
streams' states back, which means the request chunk would not be
queued and sent, so the asoc will get stuck, streams are closed
and no packet is even queued.
This patch is to fix it by recovering the streams' states when
it fails to send the request, it is also to fix a return value.
Fixes: 7f9d68ac944e ("sctp: implement sender-side procedures for SSN Reset Request Parameter")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Wed, 8 Feb 2017 17:18:15 +0000 (01:18 +0800)]
sctp: drop unnecessary __packed from some stream reconf structures
commit
85c727b59483 ("sctp: drop __packed from almost all SCTP structures")
has removed __packed from almost all SCTP structures. But there still are
three structures where it should be dropped.
This patch is to remove it from some stream reconf structures.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 9 Feb 2017 21:47:54 +0000 (16:47 -0500)]
Merge branch 'sfc-more-encap-offloads'
Edward Cree says:
====================
sfc: more encap offloads
This patch series adds support for RX checksum offload of encapsulated packets.
It also adds support for configuring the hardware's lists of UDP ports used for
VXLAN and GENEVE encapsulation offloads. Since changing these lists causes the
MC to reboot, the driver has been hardened against reboots, which used to be
considered an exceptional occurrence but are now normal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Cooper [Wed, 8 Feb 2017 16:52:10 +0000 (16:52 +0000)]
sfc: configure UDP tunnel offload ports
Implement ndo_udp_tunnel_{add,del} to update the NIC's list of VXLAN and
GENEVE UDP ports. Also reset the port list to empty on driver load and
on driver unload, with appropriate flag set on the unload case.
These port numbers are used for RX inner checksum offload, and in future
will also be used for TX inner checksum offload and encapsulated TSO.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthew Slattery [Wed, 8 Feb 2017 16:51:50 +0000 (16:51 +0000)]
sfc: update mcdi_pcol definitions for MC_CMD_SET_TUNNEL_ENCAP_UDP_PORTS
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Cooper [Wed, 8 Feb 2017 16:51:33 +0000 (16:51 +0000)]
sfc: call mcdi_reboot_detected() when MC reboots during an MCDI command
This function wasn't being called in this particular case when the MC
reboots. This caused resource reallocations to not be handled properly
and often ended up disabling the interface.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Cooper [Wed, 8 Feb 2017 16:51:18 +0000 (16:51 +0000)]
sfc: harden driver against MC resets during initial probe
This is mainly to prepare for a future overlay networking patch that
could cause an MC reset at probe time if the UDP tunnel port list is
set immediately upon driver load.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Cooper [Wed, 8 Feb 2017 16:51:02 +0000 (16:51 +0000)]
sfc: set csum_level for encapsulated packets
Set the csum_level for encapsulated packets where the encapsulation
type, l3 class and l4 class are sets that need it.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Cooper [Wed, 8 Feb 2017 16:50:40 +0000 (16:50 +0000)]
sfc: process RX event inner checksum flags
Add support for RX checksum offload of encapsulated packets. This
essentially just means paying attention to the inner checksum flags
in the RX event, and if *either* checksum flag indicates a fail then
don't tell the kernel that checksum offload was successful.
Also, count these checksum errors and export the counts to ethtool -S.
Test the most common "good" case of RX events with a single bitmask
instead of a series of ifs. Move the more specific error checking
in to a separate function for clarity, and don't use unlikely() there
since we know at least one of the bits is bad.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 8 Feb 2017 13:36:49 +0000 (14:36 +0100)]
mlxsw: spectrum_router: Don't reflect LINKDOWN nexthops
The kernel resolves the nexthops for a given route using
FIB_LOOKUP_IGNORE_LINKSTATE which means a notification can be sent for a
route with one of its nexthops being LINKDOWN.
In case IGNORE_ROUTES_WITH_LINKDOWN is set for the nexthop netdev, then
we shouldn't reflect the nexthop to the device's table.
Once the nexthop netdev's carrier goes up we'll be notified using NH_ADD
and reflect it to the device.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 8 Feb 2017 20:25:19 +0000 (15:25 -0500)]
Merge branch 'mlxsw-Reflect-nexthop-status-changes'
Jiri Pirko says:
====================
mlxsw: Reflect nexthop status changes
Ido says:
When the kernel forwards IPv4 packets via multipath routes it doesn't
consider nexthops that are dead or linkdown. For example, if the nexthop
netdev is administratively down or doesn't have a carrier.
Devices capable of offloading such multipath routes need to be made
aware of changes in the reflected nexthops' status. Otherwise, the
device might forward packets via non-functional nexthops, resulting in
packet loss. This patchset aims to fix that.
The first 11 patches deal with the necessary restructuring in the
mlxsw driver, so that it's able to correctly add and remove nexthops
from the device's adjacency table.
The 12th patch adds the NH_{ADD,DEL} events to the FIB notification
chain. These notifications are sent whenever the kernel decides to add
or remove a nexthop from the forwarding plane.
Finally, the last three patches add support for these events in the
mlxsw driver, which is currently the only driver capable of offloading
multipath routes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 8 Feb 2017 10:16:42 +0000 (11:16 +0100)]
mlxsw: spectrum_router: Flush resources when RIF is deleted
When the last IP address is removed from a netdev, its RIF is deleted.
However, if user didn't first remove neighbours and nexthops using this
interface, then they would still be present in the device's tables.
Therefore, whenever a RIF is deleted, make sure all the neighbours and
nexthops (adjacency entries) using it are removed from the relevant
tables as well.
The action associated with any route using this RIF would be refreshed,
most likely to trap. If the kernel decides to remove the route (f.e.,
because all the nexthops are now DEAD), then an event would be sent,
causing the route to be removed from the device.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 8 Feb 2017 10:16:40 +0000 (11:16 +0100)]
mlxsw: spectrum_router: Reflect nexthop status changes
When a packet hits a multipath route in the device's routing table, a
hash is computed over its headers, which is then used to select the
appropriate nexthop from the device's adjacency table.
There are situations in which the kernel removes a nexthop from a
multipath route (e.g., no carrier) and the device should do the same.
Upon the reception of NH_{ADD,DEL} events, add or remove a nexthop from
the device's adjacency table and refresh all the routes using the
nexthop group. If all the nexthops of a multipath route are invalid,
then any packet hitting the route would be trapped to the CPU for
forwarding.
If all the nexthops are DEAD, then the kernel would remove the route
entirely. On the other hand, if all the nexthops are merely LINKDOWN,
then the kernel would keep the route and forward any incoming packet
using a different route.
While the last case might sound like a problem, it's expected that a
routing daemon running in user space would remove such a route from the
FIB as it's dumped with the DEAD flag set.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 8 Feb 2017 10:16:39 +0000 (11:16 +0100)]
ipv4: fib: Notify about nexthop status changes
When a multipath route is hit the kernel doesn't consider nexthops that
are DEAD or LINKDOWN when IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN is set.
Devices that offload multipath routes need to be made aware of nexthop
status changes. Otherwise, the device will keep forwarding packets to
non-functional nexthops.
Add the FIB_EVENT_NH_{ADD,DEL} events to the fib notification chain,
which notify capable devices when they should add or delete a nexthop
from their tables.
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>