IB/mlx5: Fix implicit ODP race
author    Artemy Kovalyov <artemyko@mellanox.com>
          Thu, 27 Feb 2020 11:39:18 +0000 (13:39 +0200)
committer Jason Gunthorpe <jgg@mellanox.com>
          Wed, 4 Mar 2020 17:25:00 +0000 (13:25 -0400)
commit    de5ed007a03d71daaa505f5daa4d3666530c7090
tree      919b5bb1d3cdf638fec20aa2da669949e19c1601
parent    817a68a6584aa08e323c64283fec5ded7be84759

The following race may occur because of the call_srcu() and the placement
of synchronize_srcu() relative to xa_erase():

CPU0                                   CPU1

mlx5_ib_free_implicit_mr:              destroy_unused_implicit_child_mr:
 xa_erase(odp_mkeys)
 synchronize_srcu()
                                        xa_lock(implicit_children)
                                        if (still in xarray)
                                           atomic_inc()
                                           call_srcu()
                                        xa_unlock(implicit_children)
 xa_erase(implicit_children):
   xa_lock(implicit_children)
   __xa_erase()
   xa_unlock(implicit_children)

 flush_workqueue()
                                       [..]
                                        free_implicit_child_mr_rcu:
                                         (via call_srcu)
                                          queue_work()

 WARN_ON(atomic_read())
                                       [..]
                                        free_implicit_child_mr_work:
                                         (via wq)
                                          free_implicit_child_mr()
 mlx5_mr_cache_invalidate()
                                         mlx5_ib_update_xlt() <-- UMR QP fail
                                         atomic_dec()

Waiting with wait_event() instead of only checking with
WARN_ON(atomic_read()) solves the race, because the destroy path then
blocks until free_implicit_child_mr_work() completes.
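A minimal userspace sketch of the wait-for-deferred-work pattern, assuming
POSIX threads: a mutex-protected counter and a condition variable stand in
for the kernel's atomic counter and wait queue, and the names struct
parent_mr, defer_child_free(), child_free_work(), destroy_parent() and
run_demo() are hypothetical, not the mlx5 driver code. It shows the destroy
side sleeping until the count of pending child frees reaches zero, rather
than asserting that it already is zero.

```c
#include <pthread.h>
#include <stdbool.h>

#define MAX_CHILDREN 16

/* Hypothetical model of the parent implicit MR. */
struct parent_mr {
	pthread_mutex_t lock;
	pthread_cond_t  q_deferred_work;   /* plays the role of a wait queue */
	int             num_deferred_work; /* pending child-free work items */
	int             children_freed;
};

static void parent_init(struct parent_mr *p)
{
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->q_deferred_work, NULL);
	p->num_deferred_work = 0;
	p->children_freed = 0;
}

/* Child side: count the deferred free before it is scheduled
 * (models atomic_inc() before call_srcu()). */
static void defer_child_free(struct parent_mr *p)
{
	pthread_mutex_lock(&p->lock);
	p->num_deferred_work++;
	pthread_mutex_unlock(&p->lock);
}

/* Deferred work: tear down the child, then drop the count and wake the
 * destroyer once it reaches zero (models atomic_dec() plus a wake-up). */
static void *child_free_work(void *arg)
{
	struct parent_mr *p = arg;

	pthread_mutex_lock(&p->lock);
	p->children_freed++;              /* child teardown would happen here */
	if (--p->num_deferred_work == 0)
		pthread_cond_broadcast(&p->q_deferred_work);
	pthread_mutex_unlock(&p->lock);
	return NULL;
}

/* Parent side: block until no deferred work is pending instead of only
 * warning about it. Returns how many children were freed at that point. */
static int destroy_parent(struct parent_mr *p)
{
	int freed;

	pthread_mutex_lock(&p->lock);
	while (p->num_deferred_work != 0)
		pthread_cond_wait(&p->q_deferred_work, &p->lock);
	freed = p->children_freed;
	pthread_mutex_unlock(&p->lock);
	return freed;
}

/* Defer n child frees, run them on worker threads while destroying the
 * parent concurrently. Returns -1 if n is out of range. */
int run_demo(int n)
{
	struct parent_mr p;
	pthread_t workers[MAX_CHILDREN];
	bool started[MAX_CHILDREN];
	int freed;

	if (n < 0 || n > MAX_CHILDREN)
		return -1;
	parent_init(&p);
	for (int i = 0; i < n; i++)
		defer_child_free(&p);
	for (int i = 0; i < n; i++) {
		started[i] = pthread_create(&workers[i], NULL,
					    child_free_work, &p) == 0;
		if (!started[i])
			child_free_work(&p); /* run inline if no thread */
	}
	freed = destroy_parent(&p);
	for (int i = 0; i < n; i++)
		if (started[i])
			pthread_join(workers[i], NULL);
	pthread_cond_destroy(&p.q_deferred_work);
	pthread_mutex_destroy(&p.lock);
	return freed;
}
```

For example, run_demo(4) returns 4: destroy_parent() can only proceed after
every deferred free has finished, which is the ordering the WARN_ON() alone
could merely report as violated.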

Fixes: 5256edcb98a1 ("RDMA/mlx5: Rework implicit ODP destroy")
Link: https://lore.kernel.org/r/20200227113918.94432-1-leon@kernel.org
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
drivers/infiniband/hw/mlx5/mlx5_ib.h
drivers/infiniband/hw/mlx5/odp.c