Enable AutoAugment and modernize DALI pipeline for ConvNets #1343
Update the DALI implementation to use the modern "fn" API instead of the old class-based approach.
Add a code path that uses AutoAugment in the DALI training pipeline. It can be easily extended to use other automatic augmentations.
You can read more about DALI's support of Automatic Augmentations here: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/auto_aug/auto_aug.html
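For reviewers unfamiliar with the "fn" API, here is a minimal sketch of what an AutoAugment-enabled training pipeline might look like. It is not the code from this PR: `make_training_pipeline`, its parameters, and the normalization constants are illustrative assumptions, and the DALI imports are deferred so the module still loads on machines without DALI installed.

```python
# Standard ImageNet channel statistics scaled to the 0-255 range (assumed here,
# not taken from this PR).
IMAGENET_MEAN = [0.485 * 255, 0.456 * 255, 0.406 * 255]
IMAGENET_STD = [0.229 * 255, 0.224 * 255, 0.225 * 255]


def make_training_pipeline(data_dir, image_size=224, output_layout="CHW",
                           use_autoaugment=True, **pipeline_kwargs):
    """Hypothetical helper: build a DALI training pipeline with optional AutoAugment.

    `pipeline_kwargs` are forwarded to the DALI pipeline (for example
    batch_size, num_threads, device_id).
    """
    # Imported lazily so this file can be imported without DALI present.
    from nvidia.dali import fn, pipeline_def, types
    from nvidia.dali.auto_aug import auto_augment

    # Automatic augmentations rely on DALI conditionals, hence the flag.
    @pipeline_def(enable_conditionals=True)
    def training_pipe(data_dir, image_size, output_layout, use_autoaugment):
        jpegs, labels = fn.readers.file(
            name="Reader", file_root=data_dir, random_shuffle=True
        )
        # "mixed" decodes on the GPU from encoded data held in host memory.
        images = fn.decoders.image_random_crop(
            jpegs,
            device="mixed",
            output_type=types.RGB,
            random_aspect_ratio=[0.75, 4.0 / 3.0],
            random_area=[0.08, 1.0],
        )
        images = fn.resize(images, size=[image_size, image_size])
        images = fn.flip(images, horizontal=fn.random.coin_flip(probability=0.5))
        if use_autoaugment:
            # Apply the ImageNet AutoAugment policy to the resized images.
            images = auto_augment.auto_augment_image_net(
                images, shape=[image_size, image_size]
            )
        # output_layout="HWC" would expose NHWC batches instead of NCHW.
        images = fn.crop_mirror_normalize(
            images,
            dtype=types.FLOAT,
            output_layout=output_layout,
            mean=IMAGENET_MEAN,
            std=IMAGENET_STD,
        )
        return images, labels

    return training_pipe(
        data_dir=data_dir,
        image_size=image_size,
        output_layout=output_layout,
        use_autoaugment=use_autoaugment,
        **pipeline_kwargs,
    )
```

Under these assumptions, a call such as `make_training_pipeline("/path/to/train", batch_size=256, num_threads=4, device_id=0)` would return a pipeline object; running it requires a DALI installation with GPU support.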
The integration of the DALI pipeline with PyTorch additionally skips the transposition when exposing NHWC data.
Extract the DALI implementation into a separate file.
Update the readme and some configuration files for EfficientNet:
Fix a typo in the readme files:
--data-backends -> --data-backend
This PR backports the changes made to this example when it was introduced into the DALI codebase:
https://github.com/NVIDIA/DALI/tree/main/docs/examples/use_cases/pytorch/efficientnet
The changes were tested with the smallest EfficientNet only.
Using the DALI GPU pipeline in training can remove the CPU bottleneck and improve GPU utilization on both DGX-1V and DGX-A100 when running with AMP, as covered in this blog post:
https://developer.nvidia.com/blog/why-automatic-augmentation-matters/
Please note that in the DALI example we reduced the number of worker threads to half of what is currently set for PyTorch. This change is not reflected in this PR: the optimal default number of worker threads differs between data backends, so it could be set conditionally, but I don't know what the recommended way to do that would be.
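One possible way to set the default conditionally is sketched below, offered as a starting point for discussion rather than a recommendation. The `--workers` flag, the backend names, and the baseline value of 8 are assumptions for illustration; only the "half of the PyTorch setting" rule comes from the note above.

```python
import argparse

# Assumed baseline for the PyTorch data backend; the real default may differ.
PYTORCH_DEFAULT_WORKERS = 8


def default_workers(data_backend, pytorch_default=PYTORCH_DEFAULT_WORKERS):
    """Return a backend-specific default worker count.

    DALI backends get half of the PyTorch default (at least 1).
    """
    if data_backend.startswith("dali"):
        return max(1, pytorch_default // 2)
    return pytorch_default


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Backend names here are illustrative, not the script's actual choices.
    parser.add_argument("--data-backend", default="dali",
                        choices=["pytorch", "dali"])
    # None means "not given", so the default can depend on the backend.
    parser.add_argument("--workers", type=int, default=None)
    args = parser.parse_args(argv)
    if args.workers is None:
        args.workers = default_workers(args.data_backend)
    return args
```

Leaving the flag's default as `None` keeps an explicit `--workers` value authoritative while letting the fallback depend on the chosen backend.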