There is a race condition that can occur when calling ena_down().
The ena_clean_tx_irq() - which is a part of the napi handler -
function might wake up the tx queue when the queue is supposed
to be down (during recovery or changing the size of the queues
for example) This causes the ena_start_xmit() function to trigger
and possibly try to access the destroyed queues.
The race is illustrated below:
Flow A: Flow B(napi handler)
ena_down()
netif_carrier_off()
netif_tx_disable()
ena_clean_tx_irq()
netif_tx_wake_queue()
ena_napi_disable_all()
ena_destroy_all_io_queues()
After these flows the tx queue is active and ena_start_xmit() accesses
the destroyed queue which leads to a kernel panic.
fixes:
1738cd3ed342 (net: ena: Add a driver for Amazon Elastic Network Adapters (ENA))
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
above_thresh =
ena_com_sq_have_enough_space(tx_ring->ena_com_io_sq,
ENA_TX_WAKEUP_THRESH);
- if (netif_tx_queue_stopped(txq) && above_thresh) {
+ if (netif_tx_queue_stopped(txq) && above_thresh &&
+ test_bit(ENA_FLAG_DEV_UP, &tx_ring->adapter->flags)) {
netif_tx_wake_queue(txq);
u64_stats_update_begin(&tx_ring->syncp);
tx_ring->tx_stats.queue_wakeup++;