net/mlx5: E-switch, Protect eswitch mode changes
authorParav Pandit <parav@mellanox.com>
Wed, 18 Dec 2019 08:51:19 +0000 (02:51 -0600)
committerSaeed Mahameed <saeedm@mellanox.com>
Thu, 26 Mar 2020 06:19:25 +0000 (23:19 -0700)
commit8e0aa4bc959c98c14ed0aaee522d77ca52690189
tree0ba3b01fa8d4dd6fa042c410854edccdc63cc371
parentebf77bb83f635377ad7946b73490b18ecf50dc29
net/mlx5: E-switch, Protect eswitch mode changes

Currently eswitch mode change is occurring from 2 different execution
contexts as below.
1. sriov sysfs enable/disable
2. devlink eswitch set commands

Both of them need to access eswitch related data structures in
synchronized manner.
Without any synchronization below race condition exist.

SR-IOV enable/disable with devlink eswitch mode change:

          cpu-0                         cpu-1
          -----                         -----
mlx5_device_disable_sriov()        mlx5_devlink_eswitch_mode_set()
  mlx5_eswitch_disable()             esw_offloads_stop()
    esw_offloads_disable()             mlx5_eswitch_disable()
                                         esw_offloads_disable()

Hence, they are synchronized using a new mode_lock.
eswitch's state_lock is not used as it can lead to a deadlock scenario
below and state_lock is only for vport and fdb exclusive access.

ip link set vf <param>
  netlink rcv_msg() - Lock A
    rtnl_lock
    vfinfo()
      esw->state_lock() - Lock B
devlink eswitch_set
   devlink_mutex
     esw->state_lock() - Lock B
       attach_netdev()
         register_netdev()
           rtnl_lock - Lock A

Alternatives considered:
1. Acquiring rtnl lock before taking esw->state_lock to follow similar
locking sequence as ip link flow during eswitch mode set.
rtnl lock is not good idea for two reasons.
(a) Holding rtnl lock for several hundred device commands is not good
    idea.
(b) It leads to below and more similar deadlocks.

devlink eswitch_set
   devlink_mutex
     rtnl_lock - Lock A
       esw->state_lock() - Lock B
         eswitch_disable()
           reload()
             ib_register_device()
               ib_cache_setup_one()
                 rtnl_lock()

2. Exporting devlink lock may lead to undesired use of it in vendor
driver(s) in future.

3. Unloading representors outside of the mode_lock requires
serialization with other process trying to enable the eswitch.

4. Differing the representors life cycle to a different workqueue
requires synchronization with func_change_handler workqueue.

Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
drivers/net/ethernet/mellanox/mlx5/core/sriov.c