Amit Cohen [Sun, 19 Jan 2020 13:00:52 +0000 (15:00 +0200)]
mlxsw: Add ECN configurations with IPinIP tunnels
Initialize ECN mapping registers during router init according to
INET_ECN_encapsulate() and INET_ECN_decapsulate().
For IP-in-IP encapsulation, this is required to ensure that ECN bits in
the underlay are set in accordance with the kernel. For decapsulation,
this is required to ensure that packets with invalid ECN combination in
underlay and overlay are trapped to the kernel and not forwarded.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Sun, 19 Jan 2020 13:00:51 +0000 (15:00 +0200)]
mlxsw: reg: Add Tunneling IPinIP Decapsulation ECN Mapping Register
This register configures the actions that are done during IPinIP
decapsulation based on the ECN bits.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Sun, 19 Jan 2020 13:00:50 +0000 (15:00 +0200)]
mlxsw: reg: Add Tunneling IPinIP Encapsulation ECN Mapping Register
This register performs mapping from overlay ECN to underlay ECN during
IPinIP encapsulation.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Sun, 19 Jan 2020 13:00:49 +0000 (15:00 +0200)]
mlxsw: Add NON_ROUTABLE trap
Add a trap for packets that the device decided to drop because they are
not supposed to be routed. For example, IGMP queries can be flooded by
the device in layer 2 and reach the router. Such packets should not be
routed and instead dropped.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Sun, 19 Jan 2020 13:00:48 +0000 (15:00 +0200)]
devlink: Add non-routable packet trap
Add packet trap that can report packets that reached the router, but are
non-routable. For example, IGMP queries can be flooded by the device in
layer 2 and reach the router. Such packets should not be routed and
instead dropped.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Sun, 19 Jan 2020 13:00:47 +0000 (15:00 +0200)]
selftests: devlink_trap_l3_drops: Add test cases of irif and erif disabled
Add test cases to check that packets routed through disabled RIFs and
packets routed from disabled RIFs are dropped and devlink counters
increase when the action is trap.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Sun, 19 Jan 2020 13:00:46 +0000 (15:00 +0200)]
mlxsw: Add irif and erif disabled traps
IRIF_DISABLED and ERIF_DISABLED are driver specific traps. Packets are
dropped for these reasons when they need to be routed through/from
existing router interfaces (RIF) which are disabled.
Add devlink driver-specific traps and mlxsw trap IDs used to report
these traps.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 19 Jan 2020 15:17:07 +0000 (16:17 +0100)]
Merge branch 'for-net-next' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 E-Switch chains and prios
This series has two parts,
1) A merge commit with mlx5-next branch that include updates for mlx5
HW layouts needed for this and upcoming submissions.
2) From Paul, Increase the number of chains and prios
Currently the Mellanox driver supports offloading tc rules that
are defined on the first 4 chains and the first 16 priorities.
The restriction stems from the firmware flow level enforcement
requiring a flow table of a certain level to point to a flow
table of a higher level. This limitation may be ignored by setting
the ignore_flow_level bit when creating flow table entries.
Use unmanaged tables and ignore flow level to create more tables than
declared by fs_core steering. Manually manage the connections between the
tables themselves.
HW table is instantiated for every tc <chain,prio> tuple. The miss rule
of every table either jumps to the next <chain,prio> table, or continues
to slow_fdb. This logic is realized by following this sequence:
1. Create an auto-grouped flow table for the specified priority with
reserved entries
Reserved entries are allocated at the end of the flow table.
Flow groups are evaluated in sequence and therefore it is guaranteed
that the flow group defined on the last FTEs will be the last to evaluate.
Define a "match all" flow group on the reserved entries, providing
the platform to add table miss actions.
2. Set the miss rule action to jump to the next <chain,prio> table
or the slow_fdb.
3. Link the previous priority table to point to the new table by
updating its miss rule.
Please pull and let me know if there's any problem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dejin Zheng [Fri, 17 Jan 2020 15:10:42 +0000 (23:10 +0800)]
net: phy: adin: fix a warning about msleep
found a warning by the following command:
./scripts/checkpatch.pl -f drivers/net/phy/adin.c
WARNING: msleep < 20ms can sleep for up to 20ms; see Documentation/timers/timers-howto.rst
#628: FILE: drivers/net/phy/adin.c:628:
+ msleep(10);
Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 19 Jan 2020 15:00:17 +0000 (16:00 +0100)]
Merge branch 'Rate-adaptation-for-Felix-DSA-switch'
Vladimir Oltean says:
====================
Rate adaptation for Felix DSA switch
When operating the MAC at 2.5Gbps (2500Base-X and USXGMII/QSXGMII) and
in combination with certain PHYs, it is possible that the copper side
may operate at lower link speeds. In this case, it is the PHY who has a
MAC inside of it that emits pause frames towards the switch's MAC,
telling it to slow down so that the transmission is lossless.
These patches are the support needed for the switch side of things to
work.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Marginean [Thu, 16 Jan 2020 18:19:33 +0000 (20:19 +0200)]
net: dsa: felix: Allow PHY to AN 10/100/1000 with 2500 serdes link
If the serdes link is set to 2500 using interfce type 2500base-X, lower
link speeds over on the line side should still be supported.
Rate adaptation is done out of band, in our case using AQR PHYs this is
done using flow control.
Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Marginean [Thu, 16 Jan 2020 18:19:32 +0000 (20:19 +0200)]
net: dsa: felix: Handle PAUSE RX regardless of AN result
Flow control is used with 2500Base-X and AQR PHYs to do rate adaptation
between line side 100/1000 links and MAC running at 2.5G.
This is independent of the flow control configuration settled on line
side though AN.
In general, allowing the MAC to handle flow control even if not
negotiated with the link partner should not be a problem, so the patch
just enables it in all cases.
Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 19 Jan 2020 09:29:05 +0000 (10:29 +0100)]
Merge git://git./linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next, they are:
1) Incorrect uapi header comment in bitwise, from Jeremy Sowden.
2) Fetch flow statistics if flow is still active.
3) Restrict flow matching on hardware based on input device.
4) Add nf_flow_offload_work_alloc() helper function.
5) Remove the last client of the FLOW_OFFLOAD_DYING flag, use teardown
instead.
6) Use atomic bitwise operation to operate with flow flags.
7) Add nf_flowtable_hw_offload() helper function to check for the
NF_FLOWTABLE_HW_OFFLOAD flag.
8) Add NF_FLOW_HW_REFRESH to retry hardware offload from the flowtable
software datapath.
9) Remove indirect calls in xt_hashlimit, from Florian Westphal.
10) Add nf_flow_offload_tuple() helper to consolidate code.
11) Add nf_flow_table_offload_cmd() helper function.
12) A few whitespace cleanups in nf_tables in bitwise and the bitmap/hash
set types, from Jeremy Sowden.
13) Cleanup netlink attribute checks in bitwise, from Jeremy Sowden.
14) Replace goto by return in error path of nft_bitwise_dump(), from
Jeremy Sowden.
15) Add bitwise operation netlink attribute, also from Jeremy.
16) Add nft_bitwise_init_bool(), from Jeremy Sowden.
17) Add nft_bitwise_eval_bool(), also from Jeremy.
18) Add nft_bitwise_dump_bool(), from Jeremy Sowden.
19) Disallow hardware offload for other that NFT_BITWISE_BOOL,
from Jeremy Sowden.
20) Add NFTA_BITWISE_DATA netlink attribute, again from Jeremy.
21) Add support for bitwise shift operation, from Jeremy Sowden.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 18 Jan 2020 13:30:27 +0000 (14:30 +0100)]
Merge branch '1GbE' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2020-01-17
This series contains updates to igc, i40e, fm10k and ice drivers.
Sasha fixes a typo in a code comment that referred to silicon that is
not supported in the igc driver. Cleaned up a defined that was not
being used. Added support for another i225 SKU which does not have an
NVM. Added support for TCP segmentation offload (TSO) into igc. Added
support for PHY power management control to provide a reliable and
accurate indication of PHY reset completion.
Jake adds support for the new txqueue parameter to the transmit timeout
function in fm10k which reduces the code complexity when determining
which transmit queue is stuck.
Julio Faracco makes the similar changes that Jake did for fm10k, for
i40e and ice drivers. Added support for the new txqueue parameter in
the transmit timeout functions for i40e and ice.
Colin Ian King cleans up a redundant initialization of a local variable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Thu, 19 Dec 2019 23:45:35 +0000 (23:45 +0000)]
ice: remove redundant assignment to variable xmit_done
The variable xmit_done is being initialized with a value that is never
read and it is being updated later with a new value. The initialization
is redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Julio Faracco [Wed, 18 Dec 2019 18:38:45 +0000 (15:38 -0300)]
ice: Removing hung_queue variable to use txqueue function parameter
The scope of function .ndo_tx_timeout was changed to include the hang
queue when a TX timeout event occurs. See commit
0290bd291cc0
("netdev: pass the stuck queue to the timeout handler") for more
details. Now, drivers don't need to identify which queue is stopped.
Drivers can simply use the queue index provided by dev_watchdog and
execute all actions needed to restore network traffic. This commit do
some cleanups into Intel ice driver to remove a redundant loop to find
stopped queue.
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Julio Faracco [Wed, 18 Dec 2019 18:38:44 +0000 (15:38 -0300)]
i40e: Removing hung_queue variable to use txqueue function parameter
The scope of function .ndo_tx_timeout was changed to include the hang
queue when a TX timeout event occurs. See commit
0290bd291cc0
("netdev: pass the stuck queue to the timeout handler") for more
details. Now, drivers don't need to identify which queue is stopped.
Drivers can simply use the queue index provided by dev_watchdog and
execute all actions needed to restore network traffic. This commit do
some cleanups into Intel i40e driver to remove a redundant loop to find
stopped queue.
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Thu, 19 Dec 2019 20:10:00 +0000 (12:10 -0800)]
fm10k: use txqueue parameter in fm10k_tx_timeout
Make use of the new txqueue parameter to the .ndo_tx_timeout function.
In fm10k_tx_timeout, remove the now unnecessary loop to determine which
Tx queue is stuck. Instead, just double check the specified queue
This could be improved further to attempt resetting only the specific
queue that got stuck. However, that is a much larger refactor and has
been left as a future improvement.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Sasha Neftin [Wed, 8 Jan 2020 08:19:24 +0000 (10:19 +0200)]
igc: Add PHY power management control
PHY power management control should provide a reliable and accurate
indication of PHY reset completion and decrease the delay time
after a PHY reset
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Sasha Neftin [Sun, 5 Jan 2020 12:20:53 +0000 (14:20 +0200)]
igc: Add support for TSO
TCP segmentation offload allows a device to segment a single frame
into multiple frames with a data payload size specified in socket buffer.
As a result we can now send data approximately up to seven percents fast
than was previously possible on my system.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Sasha Neftin [Wed, 1 Jan 2020 07:14:17 +0000 (09:14 +0200)]
igc: Add SKU for i225 device
Add support for blank NVM SKU
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Sasha Neftin [Wed, 1 Jan 2020 06:49:13 +0000 (08:49 +0200)]
igc: Remove unused definition
Remove the unused IGC_FUNC_0 definition and make the code cleaner
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Sasha Neftin [Sun, 29 Dec 2019 14:09:09 +0000 (16:09 +0200)]
igc: Fix typo in a comment
Fix typo in a context descriptor comment
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Guillaume Nault [Thu, 16 Jan 2020 20:16:46 +0000 (21:16 +0100)]
netns: Constify exported functions
Mark function parameters as 'const' where possible.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 16 Jan 2020 18:41:53 +0000 (20:41 +0200)]
net: dsa: felix: Don't error out on disabled ports with no phy-mode
The felix_parse_ports_node function was tested only on device trees
where all ports were enabled. Fix this check so that the driver
continues to probe only with the ports where status is not "disabled",
as expected.
Fixes: bdeced75b13f ("net: dsa: felix: Add PCS operations for PHYLINK")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Marginean [Thu, 16 Jan 2020 18:09:59 +0000 (20:09 +0200)]
net: dsa: felix: Don't restart PCS SGMII AN if not needed
Some PHYs like VSC8234 don't like it when AN restarts on their system side
and they restart line side AN too, going into an endless link up/down loop.
Don't restart PCS AN if link is up already.
Although in theory this feedback loop should be possible with the other
in-band AN modes too, for some reason it was not seen with the VSC8514
QSGMII and AQR412 USXGMII PHYs. So keep this logic only for SGMII where
the problem was found.
Fixes: bdeced75b13f ("net: dsa: felix: Add PCS operations for PHYLINK")
Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Marginean [Thu, 16 Jan 2020 18:05:06 +0000 (20:05 +0200)]
net: dsa: felix: Set USXGMII link based on BMSR, not LPA
At least some PHYs (AQR412) don't advertise copper-side link status
during system side AN.
So remove this duplicate assignment to pcs->link and rely on the
previous one for link state: the local indication from the MAC PCS.
Fixes: bdeced75b13f ("net: dsa: felix: Add PCS operations for PHYLINK")
Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 16 Jan 2020 17:59:44 +0000 (19:59 +0200)]
Documentation: Fix typo in devlink documentation
The driver is named "mlxsw", not "mlx5".
Fixes: d4255d75856f ("devlink: document info versions for each driver")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 16 Jan 2020 17:18:28 +0000 (19:18 +0200)]
enetc: Don't print from enetc_sched_speed_set when link goes down
It is not an error to unplug a cable from the ENETC port even with TSN
offloads, so don't spam the log with link-related messages from the
tc-taprio offload subsystem, a single notification is sufficient:
[10972.351859] fsl_enetc 0000:00:00.0 eno0: Qbv PSPEED set speed link down.
[10972.360241] fsl_enetc 0000:00:00.0 eno0: Link is Down
Fixes: 2e47cb415f0a ("enetc: update TSN Qbv PSPEED set according to adjust link speed")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexandru Ardelean [Thu, 16 Jan 2020 14:15:35 +0000 (16:15 +0200)]
net: phy: adin: const-ify static data
Some bits of static data should have been made const from the start.
This change adds the const qualifier where appropriate.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hongbo Yao [Thu, 16 Jan 2020 13:14:04 +0000 (21:14 +0800)]
drivers/net: netdevsim depends on INET
If CONFIG_INET is not set and CONFIG_NETDEVSIM=y.
Building drivers/net/netdevsim/fib.o will get the following error:
drivers/net/netdevsim/fib.o: In function `nsim_fib4_rt_hw_flags_set':
fib.c:(.text+0x12b): undefined reference to `fib_alias_hw_flags_set'
drivers/net/netdevsim/fib.o: In function `nsim_fib4_rt_destroy':
fib.c:(.text+0xb11): undefined reference to `free_fib_info'
Correct the Kconfig for netdevsim.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 48bb9eb47b270 ("netdevsim: fib: Add dummy implementation for FIB offload")
Signed-off-by: Hongbo Yao <yaohongbo@huawei.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 16 Jan 2020 04:48:50 +0000 (20:48 -0800)]
net: phy: Maintain MDIO device and bus statistics
We maintain global statistics for an entire MDIO bus, as well as broken
down, per MDIO bus address statistics. Given that it is possible for
MDIO devices such as switches to access MDIO bus addresses for which
there is not a mdio_device instance created (therefore not a a
corresponding device directory in sysfs either), we also maintain
per-address statistics under the statistics folder. The layout looks
like this:
/sys/class/mdio_bus/../statistics/
transfers
errrors
writes
reads
transfers_<addr>
errors_<addr>
writes_<addr>
reads_<addr>
When a mdio_device instance is registered, a statistics/ folder is
created with the tranfers, errors, writes and reads attributes which
point to the appropriate MDIO bus statistics structure.
Statistics are 64-bit unsigned quantities and maintained through the
u64_stats_sync.h helper functions.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 15 Jan 2020 19:57:41 +0000 (11:57 -0800)]
netdevsim: fix nsim_fib6_rt_create() error path
It seems nsim_fib6_rt_create() intent was to return
either a valid pointer or an embedded error code.
BUG: unable to handle page fault for address:
fffffffffffffff4
PGD
9870067 P4D
9870067 PUD
9872067 PMD 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 22851 Comm: syz-executor.1 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:jhash2 include/linux/jhash.h:125 [inline]
RIP: 0010:rhashtable_jhash2+0x76/0x2c0 lib/rhashtable.c:963
Code: b9 00 00 00 00 00 fc ff df 48 c1 e8 03 0f b6 14 08 4c 89 f0 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 30 02 00 00 49 8d 7e 04 <41> 8b 06 48 be 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 0f b6
RSP: 0018:
ffffc90016127190 EFLAGS:
00010246
RAX:
0000000000000007 RBX:
00000000dfb3ab49 RCX:
dffffc0000000000
RDX:
0000000000000000 RSI:
ffffffff839ba7c8 RDI:
fffffffffffffff8
RBP:
ffffc900161271c0 R08:
ffff8880951f8640 R09:
ffffed1015d0703d
R10:
ffffed1015d0703c R11:
ffff8880ae8381e3 R12:
00000000dfb3ab49
R13:
00000000dfb3ab49 R14:
fffffffffffffff4 R15:
0000000000000007
FS:
00007f40bfbc6700(0000) GS:
ffff8880ae800000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
fffffffffffffff4 CR3:
0000000093660000 CR4:
00000000001406f0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
rht_key_get_hash include/linux/rhashtable.h:133 [inline]
rht_key_hashfn include/linux/rhashtable.h:159 [inline]
rht_head_hashfn include/linux/rhashtable.h:174 [inline]
__rhashtable_insert_fast.constprop.0+0xe15/0x1180 include/linux/rhashtable.h:723
rhashtable_insert_fast include/linux/rhashtable.h:832 [inline]
nsim_fib6_rt_add drivers/net/netdevsim/fib.c:603 [inline]
nsim_fib6_rt_insert drivers/net/netdevsim/fib.c:658 [inline]
nsim_fib6_event drivers/net/netdevsim/fib.c:719 [inline]
nsim_fib_event drivers/net/netdevsim/fib.c:744 [inline]
nsim_fib_event_nb+0x1b16/0x2600 drivers/net/netdevsim/fib.c:772
notifier_call_chain+0xc2/0x230 kernel/notifier.c:83
__atomic_notifier_call_chain+0xa6/0x1a0 kernel/notifier.c:173
atomic_notifier_call_chain+0x2e/0x40 kernel/notifier.c:183
call_fib_notifiers+0x173/0x2a0 net/core/fib_notifier.c:35
call_fib6_notifiers+0x4b/0x60 net/ipv6/fib6_notifier.c:22
call_fib6_entry_notifiers+0xfb/0x150 net/ipv6/ip6_fib.c:399
fib6_add_rt2node net/ipv6/ip6_fib.c:1216 [inline]
fib6_add+0x20cd/0x3ec0 net/ipv6/ip6_fib.c:1471
__ip6_ins_rt+0x54/0x80 net/ipv6/route.c:1315
ip6_ins_rt+0x96/0xd0 net/ipv6/route.c:1325
__ipv6_dev_ac_inc+0x76f/0xb20 net/ipv6/anycast.c:324
ipv6_sock_ac_join+0x4c1/0x790 net/ipv6/anycast.c:139
do_ipv6_setsockopt.isra.0+0x3908/0x4290 net/ipv6/ipv6_sockglue.c:670
ipv6_setsockopt+0xff/0x180 net/ipv6/ipv6_sockglue.c:944
udpv6_setsockopt+0x68/0xb0 net/ipv6/udp.c:1564
sock_common_setsockopt+0x94/0xd0 net/core/sock.c:3149
__sys_setsockopt+0x261/0x4c0 net/socket.c:2130
__do_sys_setsockopt net/socket.c:2146 [inline]
__se_sys_setsockopt net/socket.c:2143 [inline]
__x64_sys_setsockopt+0xbe/0x150 net/socket.c:2143
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x45aff9
Fixes: 48bb9eb47b27 ("netdevsim: fib: Add dummy implementation for FIB offload")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madhuparna Bhowmik [Wed, 15 Jan 2020 15:55:53 +0000 (21:25 +0530)]
net: xen-netback: hash.c: Use built-in RCU list checking
list_for_each_entry_rcu has built-in RCU and lock checking.
Pass cond argument to list_for_each_entry_rcu.
Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik04@gmail.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Blakey [Wed, 20 Nov 2019 13:06:19 +0000 (15:06 +0200)]
net/mlx5: E-Switch, Increase number of chains and priorities
Increase the number of chains and priorities to support
the whole range available in tc.
We use unmanaged tables and ignore flow level to create more
tables than what we declared to fs_core steering, and we manage
the connections between the tables themselves.
To support that we need FW with ignore_flow_level capability.
Otherwise the old behaviour will be used, where we are limited
by the number of levels we declared (4 chains, 16 prios).
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Paul Blakey [Wed, 8 Jan 2020 10:11:04 +0000 (12:11 +0200)]
net/mlx5: E-Switch, Refactor chains and priorities
To support the entire chain and prio range (32bit + 16bit),
instead of a using a static array of chains/prios of limited size, create
them dynamically, and use a rhashtable to search for existing chains/prio
combinations.
This will be used in next patch to actually increase the number using
unamanged tables support and ignore flow level capability.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Paul Blakey [Tue, 26 Nov 2019 12:15:00 +0000 (14:15 +0200)]
net/mlx5: ft: Check prio and chain sanity for ft offload
Before changing the chain from original chain to ft offload chain,
make sure user doesn't actually use chains.
While here, normalize the prio range to that which we support.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Paul Blakey [Tue, 26 Nov 2019 12:13:42 +0000 (14:13 +0200)]
net/mlx5: ft: Use getter function to get ft chain
FT chain is defined as the next chain after tc.
To prepare for next patches that will increase the number of tc
chains available at runtime, use a getter function to get this
value.
The define is still used in static fs_core allocation,
to calculate the number of chains. This static allocation
will be used if the relevant capabilities won't be available
to support dynamic chains.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Paul Blakey [Thu, 14 Nov 2019 15:02:59 +0000 (17:02 +0200)]
net/mlx5: Allow creating autogroups with reserved entries
Exclude the last n entries for an autogrouped flow table.
Reserving entries at the end of the FT will ensure that this FG will be
the last to be evaluated. This will be used in the next patch to create
a miss group enabling custom actions on FT miss.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Paul Blakey [Sun, 5 Jan 2020 13:15:54 +0000 (15:15 +0200)]
net/mlx5: Add ignore level support fwd to table rules
If user sets ignore flow level flag on a rule, that rule can point to
a flow table of any level, including those with levels equal or less
than the level of the flow table it is added on.
This with unamanged tables will be used to create a FDB chain/prio
hierarchy much larger than currently supported level range.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Paul Blakey [Tue, 23 Jul 2019 08:43:57 +0000 (11:43 +0300)]
net/mlx5: fs_core: Introduce unmanaged flow tables
Currently, Most of the steering tree is statically declared ahead of time,
with steering prios instances allocated for each fdb chain to assign max
number of levels for each of them. This allows fs_core to manage the
connections and levels of the flow tables hierarcy to prevent loops, but
restricts us with the number of supported chains and priorities.
Introduce unmananged flow tables, allowing the user to manage the flow
table connections. A unamanged table is detached from the fs_core flow
table hierarcy, and is only connected back to the hierarchy by explicit
FTEs forward actions.
This will be used together with firmware that supports ignoring the flow
table levels to increase the number of supported chains and prios.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Saeed Mahameed [Thu, 16 Jan 2020 23:46:42 +0000 (15:46 -0800)]
Merge branch 'mlx5-next' of git://git./linux/kernel/git/saeed/linux
This merge syncs with mlx5-next latest HW bits and layout updates for next
features, in addition one patch that improves
mlx5_create_auto_grouped_flow_table() API across all mlx5 users.
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: Refactor mlx5_create_auto_grouped_flow_table
net/mlx5e: Add discard counters per priority
net/mlx5e: Expose FEC feilds and related capability bit
net/mlx5: Add mlx5_ifc definitions for connection tracking support
net/mlx5: Add copy header action struct layout
net/mlx5: Expose resource dump register mapping
net/mlx5: Add structures and defines for MIRC register
net/mlx5: Read MCAM register groups 1 and 2
net/mlx5: Add structures layout for new MCAM access reg groups
net/mlx5: Expose vDPA emulation device capabilities
net/mlx5: Add Virtio Emulation related device capabilities
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Paul Blakey [Thu, 14 Nov 2019 14:59:58 +0000 (16:59 +0200)]
net/mlx5: Refactor mlx5_create_auto_grouped_flow_table
Refactor mlx5_create_auto_grouped_flow_table() to use ft_attr param
which already carries the max_fte, prio and flags memebers, and is
used the same in similar mlx5_create_flow_table() function.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Aharon Landau [Mon, 16 Dec 2019 10:50:13 +0000 (12:50 +0200)]
net/mlx5e: Add discard counters per priority
Add counters that count (per priority) the number of received
packets that dropped due to lack of buffers on a physical port. If
this counter is increasing, it implies that the adapter is
congested and cannot absorb the traffic coming from the network.
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Aya Levin [Mon, 30 Dec 2019 12:22:57 +0000 (14:22 +0200)]
net/mlx5e: Expose FEC feilds and related capability bit
Introduce 50G per lane FEC modes capability bit and newly supported
fields in PPLM register which allow this configuration.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Paul Blakey [Mon, 1 Apr 2019 10:31:32 +0000 (13:31 +0300)]
net/mlx5: Add mlx5_ifc definitions for connection tracking support
Add the required hardware definitions to mlx5_ifc:
ignore_flow_level, registers, copy_header, and fwd_and_modify cap.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Sholomo <ozsh@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Hamdan Igbaria [Thu, 9 Jan 2020 11:26:53 +0000 (13:26 +0200)]
net/mlx5: Add copy header action struct layout
Add definition for copy header action, copy action is used
to copy header fields from source to destination.
Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com>
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Aya Levin [Mon, 4 Nov 2019 12:51:55 +0000 (14:51 +0200)]
net/mlx5: Expose resource dump register mapping
Add new register enumeration for resource dump. Add layout mapping for
resource dump: access command and response.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eran Ben Elisha [Mon, 7 Oct 2019 07:30:32 +0000 (10:30 +0300)]
net/mlx5: Add structures and defines for MIRC register
Add needed structures, layouts and defines for MIRC (Management Image
Re-activation Control) register. This structure will be used for the FSM
reactivation flow in the downstream patches.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eran Ben Elisha [Mon, 7 Oct 2019 07:31:42 +0000 (10:31 +0300)]
net/mlx5: Read MCAM register groups 1 and 2
On load, Driver caches MCAM (Management Capabilities Mask Register)
registers. in addition to the only MCAM register group (0) the driver
already reads, here we add support for reading groups 1 and 2.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eran Ben Elisha [Mon, 7 Oct 2019 07:29:46 +0000 (10:29 +0300)]
net/mlx5: Add structures layout for new MCAM access reg groups
MCAM has 3 access_reg_groups (0-2). Defines data structures in order to
read and parse access_reg_groups #1 and #2.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Jeremy Sowden [Wed, 15 Jan 2020 20:05:57 +0000 (20:05 +0000)]
netfilter: bitwise: add support for shifts.
Hitherto nft_bitwise has only supported boolean operations: NOT, AND, OR
and XOR. Extend it to do shifts as well.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeremy Sowden [Wed, 15 Jan 2020 20:05:56 +0000 (20:05 +0000)]
netfilter: bitwise: add NFTA_BITWISE_DATA attribute.
Add a new bitwise netlink attribute that will be used by shift
operations to store the size of the shift. It is not used by boolean
operations.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeremy Sowden [Wed, 15 Jan 2020 20:05:55 +0000 (20:05 +0000)]
netfilter: bitwise: only offload boolean operations.
Only boolean operations supports offloading, so check the type of the
operation and return an error for other types.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeremy Sowden [Wed, 15 Jan 2020 20:05:54 +0000 (20:05 +0000)]
netfilter: bitwise: add helper for dumping boolean operations.
Split the code specific to dumping bitwise boolean operations out into a
separate function. A similar function will be added later for shift
operations.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeremy Sowden [Wed, 15 Jan 2020 20:05:53 +0000 (20:05 +0000)]
netfilter: bitwise: add helper for evaluating boolean operations.
Split the code specific to evaluating bitwise boolean operations out
into a separate function. Similar functions will be added later for
shift operations.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeremy Sowden [Wed, 15 Jan 2020 20:05:52 +0000 (20:05 +0000)]
netfilter: bitwise: add helper for initializing boolean operations.
Split the code specific to initializing bitwise boolean operations out
into a separate function. A similar function will be added later for
shift operations.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeremy Sowden [Wed, 15 Jan 2020 20:05:51 +0000 (20:05 +0000)]
netfilter: bitwise: add NFTA_BITWISE_OP netlink attribute.
Add a new bitwise netlink attribute, NFTA_BITWISE_OP, which is set to a
value of a new enum, nft_bitwise_ops. It describes the type of
operation an expression contains. Currently, it only has one value:
NFT_BITWISE_BOOL. More values will be added later to implement shifts.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeremy Sowden [Wed, 15 Jan 2020 20:05:50 +0000 (20:05 +0000)]
netfilter: bitwise: replace gotos with returns.
When dumping a bitwise expression, if any of the puts fails, we use goto
to jump to a label. However, no clean-up is required and the only
statement at the label is a return. Drop the goto's and return
immediately instead.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeremy Sowden [Wed, 15 Jan 2020 20:05:49 +0000 (20:05 +0000)]
netfilter: bitwise: remove NULL comparisons from attribute checks.
In later patches, we will be adding more checks. In order to be
consistent and prevent complaints from checkpatch.pl, replace the
existing comparisons with NULL with logical NOT operators.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jeremy Sowden [Wed, 15 Jan 2020 20:05:48 +0000 (20:05 +0000)]
netfilter: nf_tables: white-space fixes.
Indentation fixes for the parameters of a few nft functions.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Mon, 13 Jan 2020 18:02:22 +0000 (19:02 +0100)]
netfilter: flowtable: add nf_flow_table_offload_cmd()
Split nf_flow_table_offload_setup() in two functions to make it more
maintainable.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Mon, 13 Jan 2020 18:02:14 +0000 (19:02 +0100)]
netfilter: flowtable: add nf_flow_offload_tuple() helper
Consolidate code to configure the flow_cls_offload structure into one
helper function.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Tue, 7 Jan 2020 11:25:10 +0000 (12:25 +0100)]
netfilter: hashlimit: do not use indirect calls during gc
no need, just use a simple boolean to indicate we want to reap all
entries.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Mon, 6 Jan 2020 11:56:47 +0000 (12:56 +0100)]
netfilter: flowtable: refresh flow if hardware offload fails
If nf_flow_offload_add() fails to add the flow to hardware, then the
NF_FLOW_HW_REFRESH flag bit is set and the flow remains in the flowtable
software path.
If flowtable hardware offload is enabled, this patch enqueues a new
request to offload this flow to hardware.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Tue, 7 Jan 2020 08:56:27 +0000 (09:56 +0100)]
netfilter: flowtable: add nf_flowtable_hw_offload() helper function
This function checks for the NF_FLOWTABLE_HW_OFFLOAD flag, meaning that
the flowtable hardware offload is enabled.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Sun, 5 Jan 2020 19:41:15 +0000 (20:41 +0100)]
netfilter: flowtable: use atomic bitwise operations for flow flags
Originally, all flow flag bits were set on only from the workqueue. With
the introduction of the flow teardown state and hardware offload this is
no longer true. Let's be safe and use atomic bitwise operation to
operation with flow flags.
Fixes: 59c466dd68e7 ("netfilter: nf_flow_table: add a new flow state for tearing down offloading")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Sun, 5 Jan 2020 21:00:57 +0000 (22:00 +0100)]
netfilter: flowtable: remove dying bit, use teardown bit instead
The dying bit removes the conntrack entry if the netdev that owns this
flow is going down. Instead, use the teardown mechanism to push back the
flow to conntrack to let the classic software path decide what to do
with it.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Sun, 5 Jan 2020 19:48:51 +0000 (20:48 +0100)]
netfilter: flowtable: add nf_flow_offload_work_alloc()
Add helper function to allocate and initialize flow offload work and use
it to consolidate existing code.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Mon, 6 Jan 2020 11:42:55 +0000 (12:42 +0100)]
netfilter: flowtable: restrict flow dissector match on meta ingress device
Set on FLOW_DISSECTOR_KEY_META meta key using flow tuple ingress interface.
Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Sun, 5 Jan 2020 21:26:38 +0000 (22:26 +0100)]
netfilter: flowtable: fetch stats only if flow is still alive
Do not fetch statistics if flow has expired since it might not in
hardware anymore. After this update, remove the FLOW_OFFLOAD_HW_DYING
check from nf_flow_offload_stats() since this flag is never set on.
Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: wenxu <wenxu@ucloud.cn>
Jeremy Sowden [Wed, 1 Jan 2020 13:41:32 +0000 (13:41 +0000)]
netfilter: nft_bitwise: correct uapi header comment.
The comment documenting how bitwise expressions work includes a table
which summarizes the mask and xor arguments combined to express the
supported boolean operations. However, the row for OR:
mask xor
0 x
is incorrect.
dreg = (sreg & 0) ^ x
is not equivalent to:
dreg = sreg | x
What the code actually does is:
dreg = (sreg & ~x) ^ x
Update the documentation to match.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
YueHaibing [Thu, 16 Jan 2020 02:26:57 +0000 (02:26 +0000)]
sfc: remove duplicated include from efx.c
Remove duplicated include.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Tue, 14 Jan 2020 20:09:18 +0000 (12:09 -0800)]
devlink: fix typos in qed documentation
Review of the recently added documentation file for the qed driver
noticed a couple of typos. Fix them now.
Noticed-by: Michal Kalderon <mkalderon@marvell.com>
Fixes: 0f261c3ca09e ("devlink: add a driver-specific file for the qed driver")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ulrich Weber [Tue, 14 Jan 2020 14:19:43 +0000 (15:19 +0100)]
pptp: support sockets bound to an interface
use sk_bound_dev_if for route lookup as already done
in most of the other ip_route_output_ports() calls.
Since most PPPoA providers use 10.0.0.138 as default gateway IP
this will allow connections to multiple PPTP providers with the
same IP address over different interfaces.
Signed-off-by: Ulrich Weber <ulrich.weber@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 15 Jan 2020 22:04:04 +0000 (23:04 +0100)]
Merge tag 'batadv-next-for-davem-
20200114' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- fix typo and kerneldocs, by Sven Eckelmann
- use WiFi txbitrate for B.A.T.M.A.N. V as fallback, by René Treffer
- silence some endian sparse warnings by adding annotations,
by Sven Eckelmann
- Update copyright years to 2020, by Sven Eckelmann
- Disable deprecated sysfs configuration by default, by Sven Eckelmann
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 15 Jan 2020 12:48:18 +0000 (13:48 +0100)]
Merge branch 'bridge-add-vlan-notifications-and-rtm-support'
Nikolay Aleksandrov says:
====================
net: bridge: add vlan notifications and rtm support
This patch-set is a prerequisite for adding per-vlan options support
because we need to be able to send vlan-only notifications and do larger
vlan netlink dumps. Per-vlan options are needed as we move the control
more to vlans and would like to add per-vlan state (needed for per-vlan
STP and EVPN), per-vlan multicast options and control, and I'm sure
there would be many more per-vlan options coming.
Now we create/delete/dump vlans with the device AF_SPEC attribute which is
fine since we support vlan ranges or use a compact bridge_vlan_info
structure, but that cannot really be extended to support per-vlan options
well. The biggest issue is dumping them - we tried using the af_spec with
a new vlan option attribute but that led to insufficient message size
quickly, also another minor problem with that is we have to dump all vlans
always when notifying which, with options present, can be huge if they have
different options set, so we decided to add new rtm message types
specifically for vlans and register handlers for them and a new bridge vlan
notification nl group for vlan-only notifications.
The new RTM NEW/DEL/GETVLAN types introduced match the current af spec
bridge functionality and in fact use the same helpers.
The new nl format is:
[BRIDGE_VLANDB_ENTRY]
[BRIDGE_VLANDB_ENTRY_INFO] - bridge_vlan_info (either 1 vlan or
range start)
[BRIDGE_VLANDB_ENTRY_RANGE] - range end
This allows to encapsulate a range in a single attribute and also to
create vlans and immediately set options on all of them with a single
attribute. The GETVLAN dump can span multiple messages and dump all the
necessary information. The vlan-only notifications are sent on
NEW/DELVLAN events or when vlan options change (currently only flags),
we try hard to compress the vlans into ranges in the notifications as
well. When the per-vlan options are added we'll add helpers to check for
option equality between neighbor vlans and will keep compressing them
when possible.
Note patch 02 is not really required, it's just a nice addition to have
human-readable error messages from the different vlan checks.
iproute2 changes and selftests will be sent with the next set which adds
the first per-vlan option - per-vlan state similar to the port state.
v2: changed patch 03 and patch 04 to use nlmsg_parse() in order to
strictly validate the msg and make sure there are no remaining bytes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Tue, 14 Jan 2020 17:56:14 +0000 (19:56 +0200)]
net: bridge: vlan: notify on vlan add/delete/change flags
Now that we can notify, send a notification on add/del or change of flags.
Notifications are also compressed when possible to reduce their number
and relieve user-space of extra processing, due to that we have to
manually notify after each add/del in order to avoid double
notifications. We try hard to notify only about the vlans which actually
changed, thus a single command can result in multiple notifications
about disjoint ranges if there were vlans which didn't change inside.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Tue, 14 Jan 2020 17:56:13 +0000 (19:56 +0200)]
net: bridge: vlan: add rtnetlink group and notify support
Add a new rtnetlink group for bridge vlan notifications - RTNLGRP_BRVLAN
and add support for sending vlan notifications (both single and ranges).
No functional changes intended, the notification support will be used by
later patches.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Tue, 14 Jan 2020 17:56:12 +0000 (19:56 +0200)]
net: bridge: vlan: add rtm range support
Add a new vlandb nl attribute - BRIDGE_VLANDB_ENTRY_RANGE which causes
RTM_NEWVLAN/DELVAN to act on a range. Dumps now automatically compress
similar vlans into ranges. This will be also used when per-vlan options
are introduced and vlans' options match, they will be put into a single
range which is encapsulated in one netlink attribute. We need to run
similar checks as br_process_vlan_info() does because these ranges will
be used for options setting and they'll be able to skip
br_process_vlan_info().
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Tue, 14 Jan 2020 17:56:11 +0000 (19:56 +0200)]
net: bridge: vlan: add del rtm message support
Adding RTM_DELVLAN support similar to RTM_NEWVLAN is simple, just need to
map DELVLAN to DELLINK and register the handler.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Tue, 14 Jan 2020 17:56:10 +0000 (19:56 +0200)]
net: bridge: vlan: add new rtm message support
Add initial RTM_NEWVLAN support which can only create vlans, operating
similar to the current br_afspec(). We will use it later to also change
per-vlan options. Old-style (flag-based) vlan ranges are not allowed
when using RTM messages, we will introduce vlan ranges later via a new
nested attribute which would allow us to have all the information about a
range encapsulated into a single nl attribute.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Tue, 14 Jan 2020 17:56:09 +0000 (19:56 +0200)]
net: bridge: vlan: add rtm definitions and dump support
This patch adds vlan rtm definitions:
- NEWVLAN: to be used for creating vlans, setting options and
notifications
- DELVLAN: to be used for deleting vlans
- GETVLAN: used for dumping vlan information
Dumping vlans which can span multiple messages is added now with basic
information (vid and flags). We use nlmsg_parse() to validate the header
length in order to be able to extend the message with filtering
attributes later.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Tue, 14 Jan 2020 17:56:08 +0000 (19:56 +0200)]
net: bridge: netlink: add extack error messages when processing vlans
Add extack messages on vlan processing errors. We need to move the flags
missing check after the "last" check since we may have "last" set but
lack a range end flag in the next entry.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Tue, 14 Jan 2020 17:56:07 +0000 (19:56 +0200)]
net: bridge: vlan: add helpers to check for vlan id/range validity
Add helpers to check if a vlan id or range are valid. The range helper
must be called when range start or end are detected.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 15 Jan 2020 02:53:36 +0000 (18:53 -0800)]
Merge branch 'net-Add-route-offload-indication'
Ido Schimmel says:
====================
net: Add route offload indication
This patch set adds offload indication to IPv4 and IPv6 routes. So far
offload indication was only available for the nexthop via
'RTNH_F_OFFLOAD', which is problematic as a nexthop is usually shared
between multiple routes.
Based on feedback from Roopa and David on the RFC [1], the indication is
split to 'offload' and 'trap'. This is done because not all the routes
present in hardware actually offload traffic from the kernel. For
example, host routes merely trap packets to the kernel. The two flags
are dumped to user space via the 'rtm_flags' field in the ancillary
header of the rtnetlink message.
In addition, the patch set uses the new flags in order to test the FIB
offload API by adding a dummy FIB offload implementation to netdevsim.
The new tests are added to a shared library and can be therefore shared
between different drivers.
Patches #1-#3 add offload indication to IPv4 routes.
Patches #4 adds offload indication to IPv6 routes.
Patches #5-#6 add support for the offload indication in mlxsw.
Patch #7 adds dummy FIB offload implementation in netdevsim.
Patches #8-#10 add selftests.
v2 (feedback from David Ahern):
* Patch #2: Name last argument of fib_dump_info()
* Patch #2: Move 'struct fib_rt_info' to include/net/ip_fib.h so that it
could later be passed to fib_alias_hw_flags_set()
* Patch #3: Make use of 'struct fib_rt_info' in fib_alias_hw_flags_set()
* Patch #6: Convert to new fib_alias_hw_flags_set() interface
* Patch #7: Convert to new fib_alias_hw_flags_set() interface
[1] https://patchwork.ozlabs.org/cover/
1170530/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 14 Jan 2020 11:23:18 +0000 (13:23 +0200)]
selftests: mlxsw: Add test for FIB offload API
The test reuses the common FIB offload tests in order to make sure that
mlxsw correctly implements FIB offload.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 14 Jan 2020 11:23:17 +0000 (13:23 +0200)]
selftests: netdevsim: Add test for FIB offload API
Test various aspects of the FIB offload API on top of the netdevsim
implementation. Both good and bad flows are tested.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 14 Jan 2020 11:23:16 +0000 (13:23 +0200)]
selftests: forwarding: Add helpers and tests for FIB offload
Implement a set of common helpers and tests for FIB offload that can be
used by multiple drivers to check their FIB offload implementations.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 14 Jan 2020 11:23:15 +0000 (13:23 +0200)]
netdevsim: fib: Add dummy implementation for FIB offload
Implement dummy IPv4 and IPv6 FIB "offload" in the driver by storing
currently "programmed" routes in a hash table. Each route in the hash
table is marked with "trap" indication. The indication is cleared when
the route is replaced or when the netdevsim instance is deleted.
This will later allow us to test the route offload API on top of
netdevsim.
v2:
* Convert to new fib_alias_hw_flags_set() interface
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 14 Jan 2020 11:23:14 +0000 (13:23 +0200)]
mlxsw: spectrum_router: Set hardware flags for routes
Previous patches added support for two hardware flags for IPv4 and IPv6
routes: 'RTM_F_OFFLOAD' and 'RTM_F_TRAP'. Both indicate the presence of
the route in hardware. The first indicates that traffic is actually
offloaded from the kernel, whereas the second indicates that packets
hitting such routes are trapped to the kernel for processing (e.g., host
routes).
Use these two flags in mlxsw. The flags are modified in two places.
Firstly, whenever a route is updated in the device's table. This
includes the addition, deletion or update of a route. For example, when
a host route is promoted to perform NVE decapsulation, its action in the
device is updated, the 'RTM_F_OFFLOAD' flag set and the 'RTM_F_TRAP'
flag cleared.
Secondly, when a route is replaced and overwritten by another route, its
flags are cleared.
v2:
* Convert to new fib_alias_hw_flags_set() interface
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 14 Jan 2020 11:23:13 +0000 (13:23 +0200)]
mlxsw: spectrum_router: Separate nexthop offload indication from route
The driver currently uses the 'RTNH_F_OFFLOAD' flag for both routes and
nexthops, which is cumbersome and unnecessary now that we have separate
flag for the route itself.
Separate the offload indication for nexthops from routes and call it
whenever the offload state within the nexthop group changes.
Note that IPv6 (unlike IPv4) does not share the same nexthop group
between different routes, whereas mlxsw does. Therefore, whenever the
offload indication within an IPv6 nexthop group changes, all the linked
routes need to be updated.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 14 Jan 2020 11:23:12 +0000 (13:23 +0200)]
ipv6: Add "offload" and "trap" indications to routes
In a similar fashion to previous patch, add "offload" and "trap"
indication to IPv6 routes.
This is done by using two unused bits in 'struct fib6_info' to hold
these indications. Capable drivers are expected to set these when
processing the various in-kernel route notifications.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 14 Jan 2020 11:23:11 +0000 (13:23 +0200)]
ipv4: Add "offload" and "trap" indications to routes
When performing L3 offload, routes and nexthops are usually programmed
into two different tables in the underlying device. Therefore, the fact
that a nexthop resides in hardware does not necessarily mean that all
the associated routes also reside in hardware and vice-versa.
While the kernel can signal to user space the presence of a nexthop in
hardware (via 'RTNH_F_OFFLOAD'), it does not have a corresponding flag
for routes. In addition, the fact that a route resides in hardware does
not necessarily mean that the traffic is offloaded. For example,
unreachable routes (i.e., 'RTN_UNREACHABLE') are programmed to trap
packets to the CPU so that the kernel will be able to generate the
appropriate ICMP error packet.
This patch adds an "offload" and "trap" indications to IPv4 routes, so
that users will have better visibility into the offload process.
'struct fib_alias' is extended with two new fields that indicate if the
route resides in hardware or not and if it is offloading traffic from
the kernel or trapping packets to it. Note that the new fields are added
in the 6 bytes hole and therefore the struct still fits in a single
cache line [1].
Capable drivers are expected to invoke fib_alias_hw_flags_set() with the
route's key in order to set the flags.
The indications are dumped to user space via a new flags (i.e.,
'RTM_F_OFFLOAD' and 'RTM_F_TRAP') in the 'rtm_flags' field in the
ancillary header.
v2:
* Make use of 'struct fib_rt_info' in fib_alias_hw_flags_set()
[1]
struct fib_alias {
struct hlist_node fa_list; /* 0 16 */
struct fib_info * fa_info; /* 16 8 */
u8 fa_tos; /* 24 1 */
u8 fa_type; /* 25 1 */
u8 fa_state; /* 26 1 */
u8 fa_slen; /* 27 1 */
u32 tb_id; /* 28 4 */
s16 fa_default; /* 32 2 */
u8 offload:1; /* 34: 0 1 */
u8 trap:1; /* 34: 1 1 */
u8 unused:6; /* 34: 2 1 */
/* XXX 5 bytes hole, try to pack */
struct callback_head rcu __attribute__((__aligned__(8))); /* 40 16 */
/* size: 56, cachelines: 1, members: 12 */
/* sum members: 50, holes: 1, sum holes: 5 */
/* sum bitfield members: 8 bits (1 bytes) */
/* forced alignments: 1, forced holes: 1, sum forced holes: 5 */
/* last cacheline: 56 bytes */
} __attribute__((__aligned__(8)));
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 14 Jan 2020 11:23:10 +0000 (13:23 +0200)]
ipv4: Encapsulate function arguments in a struct
fib_dump_info() is used to prepare RTM_{NEW,DEL}ROUTE netlink messages
using the passed arguments. Currently, the function takes 11 arguments,
6 of which are attributes of the route being dumped (e.g., prefix, TOS).
The next patch will need the function to also dump to user space an
indication if the route is present in hardware or not. Instead of
passing yet another argument, change the function to take a struct
containing the different route attributes.
v2:
* Name last argument of fib_dump_info()
* Move 'struct fib_rt_info' to include/net/ip_fib.h so that it could
later be passed to fib_alias_hw_flags_set()
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 14 Jan 2020 11:23:09 +0000 (13:23 +0200)]
ipv4: Replace route in list before notifying
Subsequent patches will add an offload / trap indication to routes which
will signal if the route is present in hardware or not.
After programming the route to the hardware, drivers will have to ask
the IPv4 code to set the flags by passing the route's key.
In the case of route replace, the new route is notified before it is
actually inserted into the FIB alias list. This can prevent simple
drivers (e.g., netdevsim) that program the route to the hardware in the
same context it is notified in from being able to set the flag.
Solve this by first inserting the new route to the list and rollback the
operation in case the route was vetoed.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi [Tue, 14 Jan 2020 09:24:19 +0000 (10:24 +0100)]
net: socionext: get rid of huge dma sync in netsec_alloc_rx_data
Socionext driver can run on dma coherent and non-coherent devices.
Get rid of huge dma_sync_single_for_device in netsec_alloc_rx_data since
now the driver can let page_pool API to managed needed DMA sync
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 15 Jan 2020 02:36:59 +0000 (18:36 -0800)]
Merge branch 'QRTR-flow-control-improvements'
Bjorn Andersson says:
====================
QRTR flow control improvements
In order to prevent overconsumption of resources on the remote side QRTR
implements a flow control mechanism.
Move the handling of the incoming confirm_rx to the receiving process to
ensure incoming flow is controlled. Then implement outgoing flow
control, using the recommended algorithm of counting outstanding
non-confirmed messages and blocking when hitting a limit. The last three
patches refactors the node assignment and port lookup, in order to
remove the worker in the receive path.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjorn Andersson [Tue, 14 Jan 2020 07:57:03 +0000 (23:57 -0800)]
net: qrtr: Remove receive worker
Rather than enqueuing messages and scheduling a worker to deliver them
to the individual sockets we can now, thanks to the previous work, move
this directly into the endpoint callback.
This saves us a context switch per incoming message and removes the
possibility of an opportunistic suspend to happen between the message is
coming from the endpoint until it ends up in the socket's receive
buffer.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjorn Andersson [Tue, 14 Jan 2020 07:57:02 +0000 (23:57 -0800)]
net: qrtr: Make qrtr_port_lookup() use RCU
The important part of qrtr_port_lookup() wrt synchronization is that the
function returns a reference counted struct qrtr_sock, or fail.
As such we need only to ensure that an decrement of the object's
refcount happens inbetween the finding of the object in the idr and
qrtr_port_lookup()'s own increment of the object.
By using RCU and putting a synchronization point after we remove the
mapping from the idr, but before it can be released we achieve this -
with the benefit of not having to hold the mutex in qrtr_port_lookup().
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>