IPoIB: Call skb_dst_drop() once skb is enqueued for sending
author     Roland Dreier <roland@purestorage.com>
           Wed, 19 Dec 2012 17:16:43 +0000 (09:16 -0800)
committer  Roland Dreier <roland@purestorage.com>
           Wed, 19 Dec 2012 17:16:43 +0000 (09:16 -0800)
Currently, IPoIB delays collecting send completions for TX packets in
order to batch work more efficiently.  It calls skb_orphan() right
after queuing each packet so that the skb destructor runs early.  This
avoids problems like holding socket send buffer space for too long,
since we might not collect a send completion until long after the
packet is actually sent.
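To make the batching trade-off concrete, here is a minimal userspace model in C. It is a sketch under stated assumptions, not kernel code: toy_sock, toy_skb, toy_ring, toy_skb_orphan() and wmem_while_queued() are hypothetical names, and a single integer stands in for the socket's send-buffer accounting. It only illustrates the idea behind skb_orphan(): running the destructor at post time uncharges the socket immediately, even though the skb itself stays in the ring until completions are reaped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define RING_SIZE 8

struct toy_sock {
    int wmem_alloc;                 /* bytes charged to the socket send buffer */
};

struct toy_skb {
    struct toy_sock *sk;            /* owning socket, NULL once orphaned */
    int truesize;
    void (*destructor)(struct toy_skb *skb);
};

struct toy_ring {
    struct toy_skb *slots[RING_SIZE];
    unsigned int head;              /* next slot to post */
    unsigned int tail;              /* next completion to reap */
};

/* Destructor: give the skb's bytes back to the socket. */
static void toy_sock_wfree(struct toy_skb *skb)
{
    skb->sk->wmem_alloc -= skb->truesize;
    assert(skb->sk->wmem_alloc >= 0);
}

/* Modeled on skb_orphan(): run the destructor now and detach the owner. */
static void toy_skb_orphan(struct toy_skb *skb)
{
    if (skb->destructor)
        skb->destructor(skb);
    skb->destructor = NULL;
    skb->sk = NULL;
}

/* Freeing an skb runs whatever destructor is still attached. */
static void toy_skb_free(struct toy_skb *skb)
{
    toy_skb_orphan(skb);
}

static void toy_post_send(struct toy_ring *r, struct toy_skb *skb, int orphan_early)
{
    r->slots[r->head % RING_SIZE] = skb;
    r->head++;
    if (orphan_early)
        toy_skb_orphan(skb);
}

/* Batched completion handling: free everything posted so far. */
static void toy_reap_completions(struct toy_ring *r)
{
    while (r->tail != r->head) {
        toy_skb_free(r->slots[r->tail % RING_SIZE]);
        r->tail++;
    }
}

/* Returns the socket charge observed while completions are still pending. */
static int wmem_while_queued(int npkts, int truesize, int orphan_early)
{
    struct toy_sock sk = { 0 };
    struct toy_skb skbs[RING_SIZE];
    struct toy_ring ring = { .head = 0, .tail = 0 };
    int queued_charge;

    assert(npkts <= RING_SIZE);
    for (int i = 0; i < npkts; i++) {
        skbs[i] = (struct toy_skb){ .sk = &sk, .truesize = truesize,
                                    .destructor = toy_sock_wfree };
        sk.wmem_alloc += truesize;  /* charged when the packet is sent */
        toy_post_send(&ring, &skbs[i], orphan_early);
    }
    queued_charge = sk.wmem_alloc;  /* completions not yet collected */
    toy_reap_completions(&ring);
    assert(sk.wmem_alloc == 0);     /* everything is released eventually */
    return queued_charge;
}
```

For example, wmem_while_queued(4, 1500, 0) reports 6000 bytes still charged while the completions are pending, whereas wmem_while_queued(4, 1500, 1) reports 0.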

However, IPoIB looks at skb_dst() to update the PMTU when it gets a
too-long packet, so it clears IFF_XMIT_DST_RELEASE (which would
otherwise make the networking core drop the dst before handing the skb
to the driver).  This means that packets sitting in the TX ring with
uncollected send completions hold a reference on the dst.  We've seen
this lead to pathological behavior with respect to route and neighbour
GC.  The easy fix is to call skb_dst_drop() when we call skb_orphan().
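The dst problem can be modeled the same way. This is again a hypothetical userspace sketch, assuming a bare integer refcount in place of the kernel's dst entry and a garbage collector that may reclaim an entry only when its count is zero; toy_dst, toy_skb_dst_drop() and gc_can_free_while_queued() are illustrative names, not kernel APIs. Since the PMTU update described above happens when the driver rejects a too-long packet, before anything is posted, dropping the reference right after posting should not take anything away from that logic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define RING_SIZE 8

struct toy_dst {
    int refcnt;                     /* stands in for a route cache entry's refcount */
};

struct toy_skb {
    struct toy_dst *dst;            /* reference attached by routing */
};

static void toy_dst_release(struct toy_dst *dst)
{
    assert(dst->refcnt > 0);
    dst->refcnt--;
}

/* Modeled on skb_dst_drop(): release the skb's dst reference and clear it. */
static void toy_skb_dst_drop(struct toy_skb *skb)
{
    if (skb->dst) {
        toy_dst_release(skb->dst);
        skb->dst = NULL;
    }
}

/* Assumed GC rule: an entry can be reclaimed only if nothing references it. */
static int toy_dst_gc_can_free(const struct toy_dst *dst)
{
    return dst->refcnt == 0;
}

/*
 * Posts npkts skbs that each take a dst reference, then reports whether
 * GC could free the dst while their send completions are uncollected.
 */
static int gc_can_free_while_queued(int npkts, int drop_at_post)
{
    struct toy_dst dst = { .refcnt = 0 };
    struct toy_skb ring[RING_SIZE];
    int can_free;

    assert(npkts <= RING_SIZE);
    for (int i = 0; i < npkts; i++) {
        dst.refcnt++;               /* routing attaches a reference */
        ring[i].dst = &dst;
        if (drop_at_post)
            toy_skb_dst_drop(&ring[i]);
    }
    can_free = toy_dst_gc_can_free(&dst);

    /* Delayed completion handling: freeing each skb drops any dst it holds. */
    for (int i = 0; i < npkts; i++)
        toy_skb_dst_drop(&ring[i]);
    assert(dst.refcnt == 0);
    return can_free;
}
```

Here gc_can_free_while_queued(4, 0) returns 0 because the queued skbs pin the dst, while gc_can_free_while_queued(4, 1) returns 1.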

Also, give packets sent via connected mode (CM) the same skb_orphan()
and skb_dst_drop() treatment as packets sent via datagram mode.

Signed-off-by: Roland Dreier <roland@purestorage.com>
drivers/infiniband/ulp/ipoib/ipoib_cm.c
drivers/infiniband/ulp/ipoib/ipoib_ib.c

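diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 72ae63f0072d45fd931377e4a6f354b271c4b261..03103d2bd641e715ce8ab18c147173bdc5624f21 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c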
@@ -752,6 +752,9 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
                dev->trans_start = jiffies;
                ++tx->tx_head;
 
+               skb_orphan(skb);
+               skb_dst_drop(skb);
+
                if (++priv->tx_outstanding == ipoib_sendq_size) {
                        ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n",
                                  tx->qp->qp_num);
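diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index f10221f40803959198a3b85e21b8b57ce01ea60a..a1bca70e20aa5b0ea33d4d5410e35b79531f77b6 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c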
@@ -615,8 +615,9 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb,
 
                address->last_send = priv->tx_head;
                ++priv->tx_head;
-               skb_orphan(skb);
 
+               skb_orphan(skb);
+               skb_dst_drop(skb);
        }
 
        if (unlikely(priv->tx_outstanding > MAX_SEND_CQE))