
Implement num_workers on trainer to boost performance #309

Closed
Lordmau5 opened this issue Apr 12, 2023 · 2 comments · Fixed by #313
Labels
enhancement New feature or request

Comments

@Lordmau5
Contributor

Lordmau5 commented Apr 12, 2023

Is your feature request related to a problem? Please describe.
We currently don't set num_workers on the data loaders, so, per the torch documentation, no subprocesses are used for data loading, and the console warns that neither the training nor the validation loader has any workers available.

Adding that argument to both loaders and setting it to 4 results in a jump from about 1.40 it/s to 2.50 it/s, at least in my testing.

https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader

num_workers (int, optional) – how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)
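For reference, this is roughly what the difference looks like at the DataLoader level (a minimal, self-contained sketch with a dummy dataset; not code from this repository):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset standing in for the real training data.
dataset = TensorDataset(torch.randn(256, 80), torch.randint(0, 10, (256,)))

# Default behaviour: num_workers=0, every batch is prepared in the main process.
loader_default = DataLoader(dataset, batch_size=16, shuffle=True)

# With num_workers=4, four subprocesses prepare batches in the background
# while the training step runs, which is where the it/s gain comes from.
loader_workers = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=4)
```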

Describe the solution you'd like
Implement support for num_workers, perhaps as extra fields for both the validator and the trainer in the config.
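A rough sketch of what I mean, wiring the new fields from the config into the two loaders. The names here (hps, train.num_workers, val.num_workers) are only placeholders to illustrate the idea, not the actual config schema:

```python
from types import SimpleNamespace

import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the parsed config; the num_workers fields under train/val
# are the hypothetical new options this issue proposes.
hps = SimpleNamespace(
    train=SimpleNamespace(batch_size=16, num_workers=4),
    val=SimpleNamespace(batch_size=1, num_workers=4),
)

# Dummy datasets standing in for the real training/validation datasets.
train_dataset = TensorDataset(torch.randn(256, 80))
val_dataset = TensorDataset(torch.randn(32, 80))

train_loader = DataLoader(
    train_dataset,
    batch_size=hps.train.batch_size,
    shuffle=True,
    num_workers=hps.train.num_workers,
)
val_loader = DataLoader(
    val_dataset,
    batch_size=hps.val.batch_size,
    shuffle=False,
    num_workers=hps.val.num_workers,
)
```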

Additional context
There is currently one downside:
Doing so will, at least with the current PyTorch version, emit multiple warnings in the console (and logs) about TypedStorage usage being deprecated.
[Screenshot: TypedStorage deprecation warnings in the console]


I have also noticed that with this change VRAM usage stays steady (with batch size 16 it stays at 14.8 GB, with 20 it stays at around 16.8 GB).

It drops to lower usage whenever checkpoints are being saved, but that's fine.

I assume this is also part of where the improvement comes from, since data no longer has to be constantly reloaded in the main thread / process and the next batches can be loaded in the background.

@34j
Collaborator

34j commented Apr 20, 2023

@allcontributors add Lordmau5 ideas, maintenance, question, userTesting

@allcontributors
Contributor

@34j

I've put up a pull request to add @Lordmau5! 🎉
