drm/amdgpu: Disable ras features on all IPs before gpu reset
author    xinhui pan <xinhui.pan@amd.com>
Thu, 4 Jul 2019 02:54:58 +0000 (10:54 +0800)
committer Alex Deucher <alexander.deucher@amd.com>
Fri, 5 Jul 2019 20:59:20 +0000 (15:59 -0500)
When a GPU reset requires a full reset, call amdgpu_ras_suspend() to
disable RAS on all IPs first, to work around ROCm stability issues.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
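The change is small, but the ordering matters: RAS is switched off per device, and only for devices that need a full reset, before the recovery loop goes on to block the schedulers. The following is a minimal user-space sketch of that control flow, not the driver's code. `struct fake_adev`, `fake_ras_suspend()` and `prepare_recovery()` are hypothetical stand-ins for `struct amdgpu_device`, `amdgpu_ras_suspend()` and the `list_for_each_entry()` loop in `amdgpu_device_gpu_recover()`, and a plain array replaces the XGMI device list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct amdgpu_device, reduced to the two
 * pieces of state this sketch models. */
struct fake_adev {
	const char *name;
	bool need_full_reset; /* plays the role of amdgpu_device_ip_need_full_reset() */
	bool ras_enabled;     /* true while RAS features are active on the IPs */
};

/* Hypothetical stand-in for amdgpu_ras_suspend(): turn RAS off. */
static void fake_ras_suspend(struct fake_adev *adev)
{
	assert(adev != NULL);
	adev->ras_enabled = false;
}

/* Models the loop the patch extends: walk every device taking part in
 * the recovery and, only when a full reset is needed, disable RAS
 * before the per-ring scheduler work runs. */
static void prepare_recovery(struct fake_adev *devs, size_t n)
{
	for (size_t i = 0; i < n; ++i) {
		if (devs[i].need_full_reset)
			fake_ras_suspend(&devs[i]);
		/* the real loop goes on to stop each ring's scheduler here */
	}
}
```

With two devices where only the first needs a full reset, only the first ends up with `ras_enabled == false`. This mirrors the `if` guarding `amdgpu_ras_suspend()` in the hunk.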
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index 802809aa801d2c73d05930dbaa2fe3189382f378..b4616853f46102b339ea1ef32a513383732edd83 100644
@@ -3719,6 +3719,10 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 
        /* block all schedulers and reset given job's ring */
        list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
+               /* disable ras on ALL IPs */
+               if (amdgpu_device_ip_need_full_reset(tmp_adev))
+                       amdgpu_ras_suspend(tmp_adev);
+
                for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
                        struct amdgpu_ring *ring = tmp_adev->rings[i];