Commit 3a0dc1f

Liu01 Tong authored and gregkh committed
drm/amdgpu: fix task hang from failed job submission during process kill
commit aa5fc43 upstream.

During process kill, drm_sched_entity_flush() will kill the VM entities. Subsequent job submissions from this process will fail, and the resources of these jobs are never released nor are their fences signalled, causing tasks to hang and time out. Fix this by checking the entity status in amdgpu_vm_ready() and avoiding submitting jobs to a stopped entity.

v2: add amdgpu_vm_ready() check before amdgpu_vm_clear_freed() in function amdgpu_cs_vm_handling().

Fixes: 1f02f20 ("drm/amdgpu: Avoid extra evict-restore process.")
Signed-off-by: Liu01 Tong <Tong.Liu01@amd.com>
Signed-off-by: Lin.Cao <lincao12@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f101c13)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
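The failure mode in the message above can be modeled in user space: once an entity is marked stopped, a job pushed to it is dropped, nothing ever runs it, and its fence is never signalled. This is a toy sketch only. `toy_entity`, `toy_fence`, `toy_entity_kill()` and `toy_entity_push_job()` are hypothetical stand-ins, not DRM scheduler API, a pthread mutex replaces the kernel spinlock, and the real scheduler's handling of a killed entity may differ in detail.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; not the real drm_sched types. */
struct toy_fence {
	bool signalled;
};

struct toy_entity {
	pthread_mutex_t lock;   /* the kernel uses a spinlock here */
	bool stopped;
	int queued;             /* jobs accepted for execution */
};

/* Kill path: mark the entity stopped so it accepts no more work. */
static void toy_entity_kill(struct toy_entity *e)
{
	pthread_mutex_lock(&e->lock);
	e->stopped = true;
	pthread_mutex_unlock(&e->lock);
}

/*
 * Push path: a job pushed to a stopped entity is dropped. Nothing will
 * ever run it, so its fence stays unsignalled and any waiter hangs.
 */
static void toy_entity_push_job(struct toy_entity *e, struct toy_fence *f)
{
	pthread_mutex_lock(&e->lock);
	if (e->stopped) {
		pthread_mutex_unlock(&e->lock);
		return;         /* dropped: f->signalled is never set */
	}
	e->queued++;
	pthread_mutex_unlock(&e->lock);
	f->signalled = true;    /* toy: pretend the job completes at once */
}
```

In this model a caller that waits on `after.signalled` following a kill would wait forever, which is why the patch rejects the submission before it reaches the entity.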
1 parent ac58c28 commit 3a0dc1f


2 files changed: +14, -4 lines changed

drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c

Lines changed: 3 additions & 0 deletions
@@ -1138,6 +1138,9 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p)
 		}
 	}
 
+	if (!amdgpu_vm_ready(vm))
+		return -EINVAL;
+
 	r = amdgpu_vm_clear_freed(adev, vm, NULL);
 	if (r)
 		return r;
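The point of the v2 change is ordering: the readiness check runs before amdgpu_vm_clear_freed(), which may submit page-table update jobs, so a dying process gets -EINVAL instead of queuing work on a stopped entity. The sketch below models only that control flow. `toy_vm`, `toy_vm_ready()`, `toy_vm_clear_freed()` and `toy_cs_vm_handling()` are hypothetical, and readiness is reduced to a single flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct amdgpu_vm. */
struct toy_vm {
	bool entity_stopped;    /* set once the process is being killed */
	int clear_freed_jobs;   /* page-table update jobs submitted */
};

static bool toy_vm_ready(const struct toy_vm *vm)
{
	return !vm->entity_stopped;
}

/* Stand-in for amdgpu_vm_clear_freed(): submits a PT-update job. */
static int toy_vm_clear_freed(struct toy_vm *vm)
{
	vm->clear_freed_jobs++;
	return 0;
}

/* Mirrors the ordering in the patched amdgpu_cs_vm_handling(). */
static int toy_cs_vm_handling(struct toy_vm *vm)
{
	int r;

	/* Reject before any update job is queued on a dead entity. */
	if (!toy_vm_ready(vm))
		return -EINVAL;

	r = toy_vm_clear_freed(vm);
	if (r)
		return r;

	return 0;
}
```

Returning an error here lets the ioctl path unwind and release the job's resources, rather than leaving an unsignalled fence behind.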

drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

Lines changed: 11 additions & 4 deletions
@@ -654,22 +654,29 @@ int amdgpu_vm_validate(struct amdgpu_device *adev, struct amdgpu_vm *vm,
  * Check if all VM PDs/PTs are ready for updates
  *
  * Returns:
- * True if VM is not evicting.
+ * True if VM is not evicting and all VM entities are not stopped
  */
 bool amdgpu_vm_ready(struct amdgpu_vm *vm)
 {
-	bool empty;
 	bool ret;
 
 	amdgpu_vm_eviction_lock(vm);
 	ret = !vm->evicting;
 	amdgpu_vm_eviction_unlock(vm);
 
 	spin_lock(&vm->status_lock);
-	empty = list_empty(&vm->evicted);
+	ret &= list_empty(&vm->evicted);
 	spin_unlock(&vm->status_lock);
 
-	return ret && empty;
+	spin_lock(&vm->immediate.lock);
+	ret &= !vm->immediate.stopped;
+	spin_unlock(&vm->immediate.lock);
+
+	spin_lock(&vm->delayed.lock);
+	ret &= !vm->delayed.stopped;
+	spin_unlock(&vm->delayed.lock);
+
+	return ret;
 }
 
 /**
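The patched amdgpu_vm_ready() samples each condition under the lock that protects it and ANDs the results into a single `ret`. A user-space sketch of the same pattern follows. All names are hypothetical, pthread mutexes stand in for the kernel's eviction lock and spinlocks, and `evicted_count == 0` stands in for `list_empty(&vm->evicted)`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical mirror of the state amdgpu_vm_ready() inspects. */
struct toy_entity {
	pthread_mutex_t lock;   /* kernel: spinlock_t */
	bool stopped;
};

struct toy_vm {
	pthread_mutex_t eviction_lock;
	bool evicting;
	pthread_mutex_t status_lock;
	int evicted_count;      /* stand-in for list_empty(&vm->evicted) */
	struct toy_entity immediate;
	struct toy_entity delayed;
};

/* Read the stopped flag under the entity's own lock. */
static bool toy_entity_stopped(struct toy_entity *e)
{
	bool stopped;

	pthread_mutex_lock(&e->lock);
	stopped = e->stopped;
	pthread_mutex_unlock(&e->lock);
	return stopped;
}

/* Each condition is sampled under its own lock and ANDed into ret. */
static bool toy_vm_ready(struct toy_vm *vm)
{
	bool ret;

	pthread_mutex_lock(&vm->eviction_lock);
	ret = !vm->evicting;
	pthread_mutex_unlock(&vm->eviction_lock);

	pthread_mutex_lock(&vm->status_lock);
	ret &= vm->evicted_count == 0;
	pthread_mutex_unlock(&vm->status_lock);

	ret &= !toy_entity_stopped(&vm->immediate);
	ret &= !toy_entity_stopped(&vm->delayed);

	return ret;
}
```

Because each flag is read under a different lock and released before returning, the result in this toy is only a snapshot: nothing prevents an entity from being stopped right after the check.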
