Custom DataLoader with two levels of subprocess workers #343

Merged
merged 9 commits into from
Jul 23, 2021

pzelasko
Collaborator

... for lack of a better name, LhotseDataLoader. The main difference between this and torch.utils.data.DataLoader is that LhotseDataLoader's workers can launch subprocesses of their own. This is useful for dataset classes that perform dynamic batching and need concurrent I/O to read all the necessary data from disk or network.
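To illustrate the two-level idea, here is a minimal standard-library sketch. It is not the LhotseDataLoader code from this PR: `read_item`, `load_batch`, and `iterate_batches` are hypothetical names, and `read_item` stands in for a real blocking read. An outer pool assigns one batch per worker, and each outer worker opens its own inner pool to fetch the items of that batch concurrently.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from functools import partial
from typing import Iterator, List, Sequence


def read_item(path: str) -> str:
    # Hypothetical stand-in for a blocking read from disk or network.
    return path.upper()


def load_batch(paths: Sequence[str], inner_workers: int, inner_pool=ProcessPoolExecutor) -> List[str]:
    # Runs inside an outer worker: open a second-level pool and read
    # all items of this batch concurrently, preserving their order.
    with inner_pool(max_workers=inner_workers) as pool:
        return list(pool.map(read_item, paths))


def iterate_batches(
    batches: Sequence[Sequence[str]],
    outer_workers: int = 2,
    inner_workers: int = 2,
    outer_pool=ProcessPoolExecutor,
    inner_pool=ProcessPoolExecutor,
) -> Iterator[List[str]]:
    # First level: one batch per outer worker. The pool classes are
    # parameters so the same logic can be tried with threads instead.
    worker = partial(load_batch, inner_workers=inner_workers, inner_pool=inner_pool)
    with outer_pool(max_workers=outer_workers) as pool:
        # map() yields results in the order the batches were submitted.
        yield from pool.map(worker, batches)
```

With the default process pools, call `iterate_batches` from under an `if __name__ == "__main__":` guard on platforms that start processes with the spawn method. Passing `ThreadPoolExecutor` for either level is a quick way to exercise the same logic without extra processes.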

This is tested locally and shows good speed-ups from the inner worker pool; I'll have to test it on the real thing next.

@pzelasko
Collaborator Author

As this is unlikely to break anything and the tests are passing, I'll just merge. The LhotseDataLoader works with GigaSpeech and seems to alleviate some of the I/O issues, but I still don't know whether it's going to be quirky. Users do not have to switch to it; our stuff still works perfectly fine with the standard DataLoader, as before.

@pzelasko pzelasko merged commit e95d134 into master Jul 23, 2021
@pzelasko pzelasko added this to the v0.8 milestone Jul 28, 2021