From: Alexander Lobakin <alobakin@dlink.ru>
Date: Fri, 15 Nov 2019 12:11:35 +0300
Subject: [PATCH] net: core: allow fast GRO for skbs with Ethernet header in

Commit 78d3fd0b7de8 ("gro: Only use skb_gro_header for completely
non-linear packets") back in May'09 (v2.6.31-rc1) changed the
original condition '!skb_headlen(skb)' to
'skb->mac_header == skb->tail' in gro_reset_offset() saying: "Since
the drivers that need this optimisation all provide completely
non-linear packets" (note that this condition later became the current
'skb_mac_header(skb) == skb_tail_pointer(skb)' with commit
ced14f6804a9 ("net: Correct comparisons and calculations using
skb->tail and skb-transport_header") without any functional changes).

For now, we have the following rough statistics for v5.4-rc7:
2) napi_gro_receive with skb->head containing (most of) payload: 83
3) napi_gro_receive with skb->head containing all the headers: 20
4) napi_gro_receive with skb->head containing only Ethernet header: 2

With the current condition, fast GRO with the usage of
NAPI_GRO_CB(skb)->frag0 is available only in the [1] case.
Packets pushed by [2] and [3] go through the 'slow' path, but
it's not a problem for them as they already contain all the needed
headers in skb->head, so pskb_may_pull() only moves skb->data.

The layout of skbs in case [4] at the moment of
dev_gro_receive() is identical to skbs that have come through [1],
as napi_frags_skb() pulls Ethernet header to skb->head. The only
difference is that the mentioned condition is always false for them,
because skb_put() and friends irreversibly alter the tail pointer.
They also go through the 'slow' path, but now every single
pskb_may_pull() in every single .gro_receive() will call the *really*
slow __pskb_pull_tail() to pull headers to head. This significantly
decreases the overall performance for no visible reason.
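
To illustrate why the two conditions disagree for [4], here is a rough
userspace sketch. It is not kernel code: struct toy_skb and all toy_*
helpers are hypothetical stand-ins that model only the offsets the
condition looks at, and the sketch ignores the nr_frags, PageHighMem()
and alignment checks that the real condition also performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few sk_buff fields involved.
 * Offsets are relative to the start of the linear buffer. */
struct toy_skb {
	size_t data;       /* first unconsumed byte of the linear area */
	size_t tail;       /* one past the last byte put into the linear area */
	size_t mac_header; /* where the Ethernet header starts */
	size_t len;        /* total bytes: linear + paged */
	size_t data_len;   /* bytes that live in page fragments only */
};

/* Modelled on skb_headlen(): bytes left in the linear area. */
static size_t toy_headlen(const struct toy_skb *skb)
{
	return skb->len - skb->data_len;
}

/* Modelled on skb_put(): tail only ever moves forward. */
static void toy_put(struct toy_skb *skb, size_t n)
{
	skb->tail += n;
	skb->len += n;
}

/* Modelled on skb_pull(): consumes from the front, tail is untouched. */
static void toy_pull(struct toy_skb *skb, size_t n)
{
	assert(n <= toy_headlen(skb));
	skb->data += n;
	skb->len -= n;
}

/* Condition before this patch. */
static bool old_condition(const struct toy_skb *skb)
{
	return skb->mac_header == skb->tail;
}

/* Linear-length part of the condition after this patch. */
static bool new_condition(const struct toy_skb *skb)
{
	return toy_headlen(skb) == 0;
}

/* [1]-like skb at check time: nothing was ever put into the head. */
static struct toy_skb toy_case1(size_t headroom, size_t payload)
{
	struct toy_skb skb = { .data = headroom, .tail = headroom,
			       .mac_header = headroom,
			       .len = payload, .data_len = payload };
	return skb;
}

/* [4]-like skb: only the 14-byte Ethernet header is put into the
 * head and then pulled; the rest stays in a fragment. */
static struct toy_skb toy_case4(size_t headroom, size_t payload)
{
	struct toy_skb skb = toy_case1(headroom, payload);

	toy_put(&skb, 14);
	toy_pull(&skb, 14);
	return skb;
}
```

In this model toy_case4() ends up with an empty linear area, yet its
tail sits 14 bytes past mac_header, so only the skb_headlen()-style
check recognises it as a frag0 candidate.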

The only two users of method [4] are:
* drivers/staging/qlge
* drivers/net/wireless/iwlwifi (all three variants: dvm, mvm, mvm-mq)

Note that in the case of wireless drivers we can't use [1]
(napi_gro_frags()) at least for now and mac80211 stack always
performs pushes and pulls anyway, so performance hit is unavoidable.

At the time of v2.6.31 the mentioned change was necessary (that's
why I don't add the "Fixes:" tag), but it became obsolete since
skb_gro_mac_header() was removed in commit a50e233c50db ("net-gro:
restore frag0 optimization"), so we can simply revert the condition
in gro_reset_offset() to allow skbs from [4] to go through the 'fast'
path just like in case [1].

This was tested on a 600 MHz MIPS CPU with a custom driver, and this
patch gave boosts of up to 40 Mbps to method [4] in both directions
compared to net-next, which made overall performance relatively
close to [1] (without it, [4] is the slowest).

- Add more references and explanations to commit message
- No functional changes

Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
@@ -5476,8 +5476,7 @@ static inline void skb_gro_reset_offset(
 	NAPI_GRO_CB(skb)->frag0 = NULL;
 	NAPI_GRO_CB(skb)->frag0_len = 0;
-	if (skb_mac_header(skb) == skb_tail_pointer(skb) &&
+	if (!skb_headlen(skb) && pinfo->nr_frags &&
 	    !PageHighMem(skb_frag_page(frag0)) &&
 	    (!NET_IP_ALIGN || !((skb_frag_off(frag0) + nhoff) & 3))) {
 		NAPI_GRO_CB(skb)->frag0 = skb_frag_address(frag0);