what dataloader to use for torchdata.nodes nodes? #1442
Comments
Hi @keunwoochoi, for your failing examples, can you share a minimal working example so we can look into it?
i see. yes but, to clarify @ramanishsingh - you meant,
from #1389, it seems like
@keunwoochoi thanks for trying this out! We should clarify this in the documentation, but right now the idea is that torchdata.nodes is a superset of StatefulDataLoader, i.e. nodes should be able to do everything torch.utils.data.DataLoader and StatefulDataLoader do, but nodes are not designed to be plugged into StatefulDataLoader. cc @scotts on confusion around torchdata vs dataloader v1.
@andrewkho i see, thanks. so.. guess i'm still not sure after instantiating a
@keunwoochoi Maybe
based on this official example, my guess is that we're supposed to compose nodes like this then it'll work like a
but i still wonder how we can implement early-stage sharding. well ok, technically it's possible: instantiate the same node (one that goes from a sharded file listing through loading and processing) several times (e.g., 4 of them), multiplex them, add prefetch, and it would work.
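To make the "shard early, run the same pipeline several times, multiplex, prefetch" idea concrete, here is a minimal sketch in plain Python. It deliberately does not use the torchdata.nodes API: `shard`, `pipeline`, `round_robin`, and `prefetch` are hypothetical helper names invented for this illustration, and the "processing" is a stand-in that just upper-cases file names. How this maps onto actual nodes is an assumption left open here.

```python
import queue
import threading
from typing import Iterable, Iterator, List, Sequence

_DONE = object()  # sentinel marking the end of a prefetched stream


def shard(items: Sequence[str], num_shards: int, index: int) -> List[str]:
    """Strided split of the file listing: shard `index` gets every num_shards-th item."""
    return list(items[index::num_shards])


def pipeline(files: Iterable[str]) -> Iterator[str]:
    """Hypothetical stand-in for 'load + process'; here it only upper-cases the name."""
    for name in files:
        yield name.upper()


def prefetch(source: Iterable[str], buffer_size: int = 2) -> Iterator[str]:
    """Pull from `source` in a background thread, buffering up to `buffer_size` items.

    Simplification: exceptions raised inside the worker are not propagated.
    """
    buf: queue.Queue = queue.Queue(maxsize=buffer_size)

    def worker() -> None:
        for item in source:
            buf.put(item)
        buf.put(_DONE)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buf.get()
        if item is _DONE:
            return
        yield item


def round_robin(branches: Iterable[Iterator[str]]) -> Iterator[str]:
    """Multiplex several iterators, taking one item from each in turn until all are exhausted."""
    active = list(branches)
    while active:
        still_active = []
        for it in active:
            try:
                item = next(it)
            except StopIteration:
                continue  # this branch is finished; drop it
            still_active.append(it)
            yield item
        active = still_active


files = [f"file_{i}.txt" for i in range(10)]
num_shards = 4
# One copy of the same pipeline per shard, each with its own prefetch thread.
branches = [prefetch(pipeline(shard(files, num_shards, i))) for i in range(num_shards)]
merged = list(round_robin(branches))
```

Each branch gets its own background thread, which is roughly the "each initial node has its own worker" setup. Threads only help when loading is I/O-bound; CPU-heavy processing would need processes instead, which this sketch does not cover.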
Actually, I'm also trying to figure out how to make multiple parallel initial nodes, each with its own worker. I haven't found a straightforward solution though. I asked about it here #1334 (comment), so hopefully we will get a reply soon.
@prompteus i think one way is to just i) instantiate multiple nodes that share the same, common processing (perhaps with some sharding in the early stage if needed). maybe that's it??
@keunwoochoi I guess one way to implement it is to generally follow what you suggested, but make a custom subclass of However, I'm still not sure if I'm not just overlooking a feature that solves this problem more systematically.
hi, thanks for reviving torchdata. i was able to move on to 0.10.1 for lots of my existing datapipes. it seems to work pretty nicely.
question - am i supposed to use torchdata.nodes.Loader or torchdata.stateful_dataloader.StatefulDataLoader for my data nodes? or just torch.utils.data.DataLoader? i'm getting confused a bit after reading the docs and code. currently Loader works for my iterable data nodes, but with some caveats (no multi processing).