
Commit b0a7ce5

Authored by rajbhar, committed by ChristianKoenigAMD
drm/ttm: Schedule delayed_delete worker closer
Try to allocate system memory on the NUMA node closest to the device, and try to run the delayed_delete workers on a CPU of that node as well.

When a TTM BO gets freed, the delayed_delete worker may have to clear the backing system memory. Scheduling the worker close to the NUMA node where that memory was initially allocated avoids cases where it gets randomly scheduled on CPU cores that sit across interconnect boundaries such as xGMI or PCIe. This change helps USWC GTT allocations on NUMA systems (dGPU) and on AMD APU platforms such as GFXIP9.4.3.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231111130856.1168304-1-rajneesh.bhardwaj@amd.com
Signed-off-by: Christian König <christian.koenig@amd.com>
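
For context, here is a minimal sketch of the pattern this patch applies. It is not TTM code: the numa_demo_* names and the empty cleanup body are hypothetical, while alloc_workqueue(), dev_to_node() and queue_work_node() are the kernel APIs the patch actually uses.

#include <linux/device.h>
#include <linux/workqueue.h>

/* Hypothetical deferred-cleanup handler, standing in for ttm_bo_delayed_delete(). */
static void numa_demo_cleanup(struct work_struct *work)
{
	/* Free/clear system memory here; ideally on a CPU near where it was allocated. */
}

static DECLARE_WORK(numa_demo_work, numa_demo_cleanup);

static int numa_demo_setup(struct device *dev)
{
	struct workqueue_struct *wq;

	/* WQ_UNBOUND lets the workqueue place workers on any CPU,
	 * which queue_work_node() below relies on.
	 */
	wq = alloc_workqueue("numa_demo", WQ_MEM_RECLAIM | WQ_UNBOUND, 16);
	if (!wq)
		return -ENOMEM;

	/* dev_to_node() reports the NUMA node the device is attached to;
	 * queue_work_node() prefers a CPU of that node for the work item.
	 */
	queue_work_node(dev_to_node(dev), wq, &numa_demo_work);
	return 0;
}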
1 parent: 38f922a

2 files changed, 11 insertions(+), 3 deletions(-)

drivers/gpu/drm/ttm/ttm_bo.c (+7 -1)

@@ -370,7 +370,13 @@ static void ttm_bo_release(struct kref *kref)
 		spin_unlock(&bo->bdev->lru_lock);
 
 		INIT_WORK(&bo->delayed_delete, ttm_bo_delayed_delete);
-		queue_work(bdev->wq, &bo->delayed_delete);
+
+		/* Schedule the worker on the closest NUMA node. This
+		 * improves performance since system memory might be
+		 * cleared on free and that is best done on a CPU core
+		 * close to it.
+		 */
+		queue_work_node(bdev->pool.nid, bdev->wq, &bo->delayed_delete);
 		return;
 	}
 
drivers/gpu/drm/ttm/ttm_device.c (+4 -2)

@@ -204,7 +204,8 @@ int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *funcs,
 	if (ret)
 		return ret;
 
-	bdev->wq = alloc_workqueue("ttm", WQ_MEM_RECLAIM | WQ_HIGHPRI, 16);
+	bdev->wq = alloc_workqueue("ttm",
+				   WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, 16);
 	if (!bdev->wq) {
 		ttm_global_release();
 		return -ENOMEM;
@@ -213,7 +214,8 @@ int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *funcs,
 	bdev->funcs = funcs;
 
 	ttm_sys_man_init(bdev);
-	ttm_pool_init(&bdev->pool, dev, NUMA_NO_NODE, use_dma_alloc, use_dma32);
+
+	ttm_pool_init(&bdev->pool, dev, dev_to_node(dev), use_dma_alloc, use_dma32);
 
 	bdev->vma_manager = vma_manager;
 	spin_lock_init(&bdev->lru_lock);