drm/amdgpu: fix race condition in amd_sched_entity_push_job
authorNicolai Hähnle <Nicolai.Haehnle@amd.com>
Wed, 2 Dec 2015 16:35:12 +0000 (17:35 +0100)
committerAlex Deucher <alexander.deucher@amd.com>
Fri, 4 Dec 2015 16:26:52 +0000 (11:26 -0500)
As soon as we leave the spinlock after the job has been added to the job
queue, we can no longer rely on the job's data to be available.

I have seen a null-pointer dereference due to sched == NULL in
amd_sched_wakeup via amd_sched_entity_push_job and
amd_sched_ib_submit_kernel_helper. Since the latter initializes
sched_job->sched with the address of the ring scheduler, which is
guaranteed to be non-NULL, this race appears to be a likely culprit.

Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Bugzilla: https://bugs.freedesktop.org/attachment.cgi?bugid=93079
Reviewed-by: Christian König <christian.koenig@amd.com>
drivers/gpu/drm/amd/scheduler/gpu_scheduler.c

index e13b7a013a400670143bd3158d09143896d46ef3..5ace1a74071f2c23a98cc17b141c31bab2347def 100644 (file)
@@ -288,6 +288,7 @@ amd_sched_entity_pop_job(struct amd_sched_entity *entity)
  */
 static bool amd_sched_entity_in(struct amd_sched_job *sched_job)
 {
+       struct amd_gpu_scheduler *sched = sched_job->sched;
        struct amd_sched_entity *entity = sched_job->s_entity;
        bool added, first = false;
 
@@ -302,7 +303,7 @@ static bool amd_sched_entity_in(struct amd_sched_job *sched_job)
 
        /* first job wakes up scheduler */
        if (first)
-               amd_sched_wakeup(sched_job->sched);
+               amd_sched_wakeup(sched);
 
        return added;
 }
@@ -318,9 +319,9 @@ void amd_sched_entity_push_job(struct amd_sched_job *sched_job)
 {
        struct amd_sched_entity *entity = sched_job->s_entity;
 
+       trace_amd_sched_job(sched_job);
        wait_event(entity->sched->job_scheduled,
                   amd_sched_entity_in(sched_job));
-       trace_amd_sched_job(sched_job);
 }
 
 /**