Added Dataset.to_datapipe for converting PyG datasets into a torchdata DataPipe #6141
Codecov Report
@@            Coverage Diff            @@
##           master    #6141     +/-   ##
=======================================
  Coverage   84.32%   84.33%
=======================================
  Files         387      387
  Lines       21364    21385     +21
=======================================
+ Hits        18016    18035     +19
- Misses       3348     3350      +2
Sorry for taking so long to merge this. I changed the to_datapipe method to an instance method; I think this is a bit more intuitive.
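To make the difference concrete, here is a toy, stdlib-only sketch of the two call styles. ToyDataset, IterAdapter, and datapipe_from_args are hypothetical stand-ins, not PyG or torchdata classes: the instance method wraps a dataset object that already exists, while the classmethod style constructs the dataset from arguments inside the call.

```python
from typing import Any, Iterator, List


class IterAdapter:
    """Hypothetical stand-in for a DataPipe that iterates over a map-style dataset."""

    def __init__(self, dataset: Any) -> None:
        self.dataset = dataset

    def __iter__(self) -> Iterator[Any]:
        for i in range(len(self.dataset)):
            yield self.dataset[i]

    def __len__(self) -> int:
        return len(self.dataset)


class ToyDataset:
    """Hypothetical map-style dataset standing in for a PyG Dataset."""

    def __init__(self, n: int) -> None:
        self.items: List[int] = [i * i for i in range(n)]

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, idx: int) -> int:
        return self.items[idx]

    def to_datapipe(self) -> IterAdapter:
        # Instance-method style: wrap a dataset object that already exists.
        return IterAdapter(self)

    @classmethod
    def datapipe_from_args(cls, *args: Any, **kwargs: Any) -> IterAdapter:
        # Classmethod style (hypothetical name, for comparison only): the
        # dataset is constructed from the given arguments inside the call.
        return IterAdapter(cls(*args, **kwargs))
```

ToyDataset(4).to_datapipe() and ToyDataset.datapipe_from_args(4) both iterate over [0, 1, 4, 9]; the two styles differ only in who constructs the dataset and when.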
Thank you for merging this!
I agree it is a bit more intuitive as a method, but I will need to double-check the behaviour with multiprocessing. I think I wanted to avoid holding the dataset instance as a property of the pipe, since I recall this being more efficient with multiprocessing: the dataset can then be lazily initialised within each worker (at least in theory). I will check the behaviour with the latest pyg-nightly.
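The lazy-initialisation idea can be sketched as a pipe that stores a zero-argument factory instead of a dataset instance and only calls it on first iteration, so each worker that receives a copy of the pipe builds its own dataset. This is a hypothetical, stdlib-only illustration (LazyDatasetPipe is not a PyG or torchdata class), and the factory is assumed to return something indexable with a length.

```python
from typing import Any, Callable, Iterator, Optional


class LazyDatasetPipe:
    """Hypothetical pipe that stores a dataset factory instead of an instance.

    The dataset is built on first iteration and cached. Any cached copy is
    dropped when the pipe is copied or pickled (e.g. sent to a spawned worker),
    so each worker builds its own dataset from the factory.
    """

    def __init__(self, factory: Callable[[], Any]) -> None:
        self.factory = factory
        self._dataset: Optional[Any] = None

    def _get_dataset(self) -> Any:
        if self._dataset is None:  # build only on first use
            self._dataset = self.factory()
        return self._dataset

    def __iter__(self) -> Iterator[Any]:
        dataset = self._get_dataset()
        for i in range(len(dataset)):
            yield dataset[i]

    def __getstate__(self) -> dict:
        # Ship only the factory, never an already-built dataset.
        state = self.__dict__.copy()
        state["_dataset"] = None
        return state
```

If workers are started with the spawn method, the factory itself must be picklable, so a module-level function or functools.partial is safer than a lambda.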
What's the benefit of initializing the dataset in each worker rather than making use of shared memory? Sorry I went ahead with this; if it does not fit your needs, please feel free to adjust it :(
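One way to see the trade-off being asked about: when a pipe is sent to a worker started with the spawn method, it is pickled. A pipe holding a fully built dataset therefore ships all of its data, while a pipe holding only a build recipe ships a few bytes, and each worker then pays the build cost itself. With fork, workers inherit the parent's memory instead, and PyTorch can also place tensors in shared memory; neither case is modelled here. EagerPipe, LazyPipe, and build_data are hypothetical names in this stdlib-only sketch.

```python
import functools
import pickle
from typing import Any, Callable, List


def build_data(n: int) -> List[float]:
    """Hypothetical stand-in for loading and processing a dataset."""
    return [float(i) for i in range(n)]


class EagerPipe:
    """Holds a fully built dataset; pickling it copies every element."""

    def __init__(self, data: List[float]) -> None:
        self.data = data


class LazyPipe:
    """Holds only a picklable recipe; each worker would rebuild the data."""

    def __init__(self, factory: Callable[[], List[float]]) -> None:
        self.factory = factory


def pickled_size(obj: Any) -> int:
    """Bytes that would be sent to a spawned worker if `obj` were pickled."""
    return len(pickle.dumps(obj))


n = 100_000
eager_bytes = pickled_size(EagerPipe(build_data(n)))
lazy_bytes = pickled_size(LazyPipe(functools.partial(build_data, n)))
print(f"eager: {eager_bytes} bytes, lazy: {lazy_bytes} bytes")
```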
There is some overlap in functionality between the torch DataPipe API and features that are already available in PyG's dataset implementations:
This patch adds a @classmethod to the PyG Dataset interface to support converting any PyG dataset into a DataPipe.

Todo:
- DatasetAdapter: should it be a MapDataPipe or an IterDataPipe?
- len method
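The DatasetAdapter question comes down to which access pattern the pipe should expose. In this stdlib-only sketch (MapStyleAdapter and IterStyleAdapter are hypothetical, not the torchdata classes), a map-style adapter offers random access plus len(), while an iterable-style adapter only yields items; len() is still possible on the iterable side when the dataset size is known. The iterable version also takes hypothetical num_shards/shard_id arguments to show why iterable pipes need sharding when several workers read the same source: without it, every worker would yield every sample.

```python
from typing import Any, Iterator, Sequence


class MapStyleAdapter:
    """Hypothetical map-style adapter (analogous to a MapDataPipe)."""

    def __init__(self, dataset: Sequence[Any]) -> None:
        self.dataset = dataset

    def __getitem__(self, idx: int) -> Any:
        # Random access: a sampler can request any index in any order.
        return self.dataset[idx]

    def __len__(self) -> int:
        return len(self.dataset)


class IterStyleAdapter:
    """Hypothetical iterable-style adapter (analogous to an IterDataPipe)."""

    def __init__(self, dataset: Sequence[Any], num_shards: int = 1,
                 shard_id: int = 0) -> None:
        if not 0 <= shard_id < num_shards:
            raise ValueError("shard_id must be in [0, num_shards)")
        self.dataset = dataset
        self.num_shards = num_shards
        self.shard_id = shard_id

    def __iter__(self) -> Iterator[Any]:
        # Each shard (e.g. one per worker) yields every num_shards-th element,
        # so the shards together cover the dataset without duplicates.
        for i in range(self.shard_id, len(self.dataset), self.num_shards):
            yield self.dataset[i]

    def __len__(self) -> int:
        # Number of elements this shard will yield (ceiling division).
        remaining = len(self.dataset) - self.shard_id
        return max(0, (remaining + self.num_shards - 1) // self.num_shards)
```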