A loaded XDP program might write to the xdp_frame data area,
prefetchw() it to avoid a potential cache miss.
Performance tests:
ConnectX-5, XDP_TX packet rate, single ring.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Before: 13,172,976 pps
After: 13,456,248 pps
2% gain.
Fixes: 22f453988194 ("net/mlx5e: Support XDP over Striding RQ")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
dma_sync_single_range_for_cpu(rq->pdev, di->addr, head_offset,
frag_size, DMA_FROM_DEVICE);
+ prefetchw(va); /* xdp_frame data area */
prefetch(data);
rcu_read_lock();