Merge branch 'xps-symmretric-queue-selection'
Amritha Nambiar says:
====================
Symmetric queue selection using XPS for Rx queues
This patch series implements support for Tx queue selection based on a
map of Rx queues. The Rx queue map is configured per Tx queue through a
sysfs attribute. If the user configuration for Rx queues does not apply,
Tx queue selection falls back to XPS using CPUs, and finally to hashing.
XPS is refactored to support Tx queue selection based on either the
CPUs map or the Rx-queues map. The config option CONFIG_XPS must be
enabled. By default, no receive queues are configured for any Tx queue.
- /sys/class/net/<dev>/queues/tx-*/xps_rxqs
A set of receive queues can be mapped to a set of transmit queues (many:many),
although the common use case is a 1:1 mapping. This enables sending packets
on the same Tx-Rx queue pair on which they were received, which is useful for
busy-polling multi-threaded workloads where the threads cannot be pinned to
a CPU. This is a rework of Sridhar's patch for symmetric queueing via a
socket option:
https://www.spinics.net/lists/netdev/msg453106.html
Testing Hints:
Kernel: Linux 4.17.0-rc7+
Interface:
driver: ixgbe
version: 5.1.0-k
firmware-version: 0x00015e0b
Configuration:
ethtool -L $iface combined 16
ethtool -C $iface rx-usecs 1000
sysctl net.core.busy_poll=1000
ATR disabled (enabling ntuple filters disables ATR on ixgbe):
ethtool -K $iface ntuple on
Workload:
A modified memcached that selects the worker thread for a connection based
on the connection's incoming Rx queue, obtained with the SO_INCOMING_NAPI_ID
socket option. The default policy is round-robin.
Default: no rxqs_map configured
Symmetric queues: rxqs_map enabled for all queues, each Rx queue mapped 1:1 to a Tx queue
System:
Architecture: x86_64
CPU(s): 72
Model name: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
16 threads 400K requests/sec
=============================
-------------------------------------------------------------------------------
                                 Default           Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max                  4/51/2215         2/30/5163
(usec)
intr/sec                         26655             18606
contextswitch/sec                5145              4044
insn per cycle                   0.43              0.72
cache-misses                     6.919             4.310
(% of all cache refs)
L1-dcache-load-misses            4.49              3.29
(% of all L1-dcache hits)
LLC-load-misses                  13.26             8.96
(% of all LL-cache hits)
-------------------------------------------------------------------------------
32 threads 400K requests/sec
=============================
-------------------------------------------------------------------------------
                                 Default           Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max                  10/112/5562       9/46/4637
(usec)
intr/sec                         30456             27666
contextswitch/sec                7552              5133
insn per cycle                   0.41              0.49
cache-misses                     9.357             2.769
(% of all cache refs)
L1-dcache-load-misses            4.09              3.98
(% of all L1-dcache hits)
LLC-load-misses                  12.96             3.96
(% of all LL-cache hits)
-------------------------------------------------------------------------------
16 threads 800K requests/sec
=============================
-------------------------------------------------------------------------------
                                 Default           Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max                  5/151/4989        9/69/2611
(usec)
intr/sec                         35686             22907
contextswitch/sec                25522             12281
insn per cycle                   0.67              0.74
cache-misses                     8.652             6.38
(% of all cache refs)
L1-dcache-load-misses            3.19              2.86
(% of all L1-dcache hits)
LLC-load-misses                  16.53             11.99
(% of all LL-cache hits)
-------------------------------------------------------------------------------
32 threads 800K requests/sec
=============================
-------------------------------------------------------------------------------
                                 Default           Symmetric queues
-------------------------------------------------------------------------------
RTT min/avg/max                  6/163/6152        8/88/4209
(usec)
intr/sec                         47079             26548
contextswitch/sec                42190             39168
insn per cycle                   0.45              0.54
cache-misses                     8.798             4.668
(% of all cache refs)
L1-dcache-load-misses            6.55              6.29
(% of all L1-dcache hits)
LLC-load-misses                  13.91             10.44
(% of all LL-cache hits)
-------------------------------------------------------------------------------
v6:
- Changed the names of some functions to begin with net_if.
- Cleaned up sk_tx_queue_set/sk_rx_queue_set functions.
- Added sk_rx_queue_clear to make it consistent with tx_queue_mapping
initialization.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>