Why is my InMemoryDataset iteration slower when my DataList is on CUDA? #10016
Unanswered · eugeborzone asked this question in Q&A
I'm implementing a `Dataset` in PyTorch Geometric that inherits from `InMemoryDataset`, where I store my data in a `DataList`. I noticed that when I move all the data to CUDA before iterating with a `DataLoader`, the process becomes significantly slower than keeping it on the CPU and moving each batch to the GPU.

Observations:
**Test: DataList on CUDA**

Profiling shows a very large number of calls to the `.to()` method:

```
Ordered by: internal time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   271360    3.831    0.000    3.831    0.000 {method 'to' of 'torch._C.TensorBase' objects}
      530    3.302    0.006    7.134    0.013 collate.py:171()
   271360    2.223    0.000    2.223    0.000 {built-in method torch.full}
  2171410    1.429    0.000    2.117    0.000 {built-in method builtins.issubclass}
   542720    1.273    0.000    4.236    0.000 storage.py:418(num_nodes)
```
**Test: DataList on CPU**

```
Elapsed time: 13.416890859603882 seconds
Number of batches: 530
Average time per batch: 0.025314888414346946 seconds
```
Does anyone know why this happens and what would be the best way to optimize data loading in this case?
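For reference, the faster CPU-side pattern described above can be sketched with plain PyTorch. This is a minimal illustration, not the actual setup from the post: plain tensors stand in for the graph `DataList`, and the use of `pin_memory` with `non_blocking=True` is an assumed optimization, since pinned host memory allows the per-batch host-to-device copy to be asynchronous.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the graph DataList: plain CPU tensors.
x = torch.randn(1024, 16)
y = torch.randint(0, 2, (1024,))
dataset = TensorDataset(x, y)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Keep the dataset on CPU; pinning memory (only meaningful when CUDA
# is available) lets the per-batch copy to the GPU be asynchronous.
loader = DataLoader(dataset, batch_size=64,
                    pin_memory=torch.cuda.is_available())

for xb, yb in loader:
    # non_blocking only has an effect for pinned CPU -> CUDA copies;
    # on a CPU-only machine this is a no-op and the loop still runs.
    xb = xb.to(device, non_blocking=True)
    yb = yb.to(device, non_blocking=True)
    # ... forward/backward pass would go here ...
```

The key point is that collation happens on cheap CPU tensors and only the already-batched tensors cross the PCIe bus, once per batch, instead of the collate step touching many small CUDA tensors per item.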