Why is my InMemoryDataset iteration slower when my DataList is on CUDA? #10016
Unanswered · eugeborzone asked this question in Q&A
I'm implementing a `Dataset` in PyTorch Geometric that inherits from `InMemoryDataset`, where I store my data in a `DataList`. I noticed that when I move all the data to CUDA before iterating with a `DataLoader`, the process becomes significantly slower than keeping it on the CPU and moving each batch to the GPU.

Observations:
**Test: DataList on CUDA**

Profiling shows a very large number of calls to the `.to()` method:

```
Ordered by: internal time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   271360    3.831    0.000    3.831    0.000 {method 'to' of 'torch._C.TensorBase' objects}
      530    3.302    0.006    7.134    0.013 collate.py:171()
   271360    2.223    0.000    2.223    0.000 {built-in method torch.full}
  2171410    1.429    0.000    2.117    0.000 {built-in method builtins.issubclass}
   542720    1.273    0.000    4.236    0.000 storage.py:418(num_nodes)
```
**Test: DataList on CPU**

```
Elapsed time: 13.416890859603882 seconds
Number of batches: 530
Average time per batch: 0.025314888414346946 seconds
```
Does anyone know why this happens and what would be the best way to optimize data loading in this case?
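For reference, the faster CPU-side pattern described above can be sketched with plain PyTorch. This is a minimal illustration, not the actual setup from the post: plain tensors stand in for the graph `DataList`, and the use of `pin_memory` with `non_blocking=True` is an assumed optimization, since pinned host memory allows the per-batch host-to-device copy to be asynchronous.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the graph DataList: plain CPU tensors.
x = torch.randn(1024, 16)
y = torch.randint(0, 2, (1024,))
dataset = TensorDataset(x, y)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Keep the dataset on CPU; pinning memory (only meaningful when CUDA
# is available) lets the per-batch copy to the GPU be asynchronous.
loader = DataLoader(dataset, batch_size=64,
                    pin_memory=torch.cuda.is_available())

for xb, yb in loader:
    # non_blocking only has an effect for pinned CPU -> CUDA copies;
    # on a CPU-only machine this is a no-op and the loop still runs.
    xb = xb.to(device, non_blocking=True)
    yb = yb.to(device, non_blocking=True)
    # ... forward/backward pass would go here ...
```

The key point is that collation happens on cheap CPU tensors and only the already-batched tensors cross the PCIe bus, once per batch, instead of the collate step touching many small CUDA tensors per item.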