Enable AutoAugment and modernize DALI pipeline for ConvNets #1343

Open: klecki wants to merge 1 commit into base: master
@klecki klecki commented Aug 28, 2023

Update the DALI implementation to use the modern "fn" API instead of the old class-based approach.

Add a codepath using AutoAugment in DALI training pipeline. It can be easily extended to use other Automatic Augmentations.

You can read more about DALI's support of Automatic Augmentations here: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/auto_aug/auto_aug.html
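For readers unfamiliar with the "fn" API, here is a minimal sketch of what a training pipeline with AutoAugment can look like. It is modeled on the public DALI documentation, not copied from this PR: the builder name `build_training_pipeline`, its parameters, and the crop and normalization values are assumptions chosen for illustration.

```python
# Illustrative sketch only: function name, parameters and constants are
# assumptions, not the code added by this PR.
IMAGENET_MEAN = [0.485 * 255, 0.456 * 255, 0.406 * 255]
IMAGENET_STD = [0.229 * 255, 0.224 * 255, 0.225 * 255]


def build_training_pipeline(data_dir, batch_size, image_size=224,
                            output_layout="CHW", use_auto_augment=True,
                            num_threads=4, device_id=0):
    # Imported lazily so the module can be imported without DALI installed.
    from nvidia.dali import fn, pipeline_def, types
    from nvidia.dali.auto_aug import auto_augment

    # DALI's automatic augmentations require enable_conditionals=True.
    @pipeline_def(enable_conditionals=True)
    def training_pipe(data_dir, image_size, output_layout, use_auto_augment):
        jpegs, labels = fn.readers.file(
            name="Reader", file_root=data_dir, random_shuffle=True)
        # device="mixed": encoded bytes come from the CPU, decoding runs on the GPU.
        images = fn.decoders.image_random_crop(
            jpegs, device="mixed", output_type=types.RGB,
            random_aspect_ratio=[0.75, 4.0 / 3.0], random_area=[0.08, 1.0])
        images = fn.resize(images, size=[image_size, image_size])
        images = fn.flip(images, horizontal=fn.random.coin_flip(probability=0.5))
        if use_auto_augment:
            # Applies the ImageNet AutoAugment policy shipped with DALI.
            images = auto_augment.auto_augment_image_net(images)
        images = fn.crop_mirror_normalize(
            images, dtype=types.FLOAT, output_layout=output_layout,
            crop=(image_size, image_size),
            mean=IMAGENET_MEAN, std=IMAGENET_STD)
        return images, labels

    return training_pipe(data_dir, image_size, output_layout, use_auto_augment,
                         batch_size=batch_size, num_threads=num_threads,
                         device_id=device_id)
```

Swapping in another automatic augmentation should only require replacing the call inside the `if use_auto_augment:` branch, which is the sense in which the codepath is easy to extend.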

The integration of DALI Pipeline with PyTorch additionally skips the transposition when exposing NHWC data.
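To illustrate why the transposition can be skipped, the sketch below uses plain Python stride arithmetic. It assumes that DALI hands over a contiguous NHWC buffer and that the network expects an NCHW-shaped tensor with channels-last memory; under that assumption, permuting the shape and the strides together yields a view over the same memory, so no data has to move. The helper names are hypothetical, and in PyTorch the equivalent view could be created with `torch.as_strided` or `Tensor.permute`.

```python
from itertools import product


def contiguous_strides(shape):
    """Element strides of a densely packed, row-major buffer."""
    strides, step = [], 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def nhwc_to_nchw(t):
    """Reorder a 4-tuple (shape or strides) from NHWC order to NCHW order."""
    return (t[0], t[3], t[1], t[2])


def offset(index, strides):
    """Flat buffer offset of a multi-dimensional index."""
    return sum(i * s for i, s in zip(index, strides))


nhwc_shape = (2, 4, 5, 3)                      # N, H, W, C
nhwc_strides = contiguous_strides(nhwc_shape)  # (60, 15, 3, 1)

# The NCHW "view": same buffer, permuted shape and strides.
nchw_shape = nhwc_to_nchw(nhwc_shape)          # (2, 3, 4, 5)
nchw_strides = nhwc_to_nchw(nhwc_strides)      # (60, 1, 15, 3)

# Every logical element lands at the same offset in both descriptions.
same_memory = all(
    offset((n, c, h, w), nchw_strides) == offset((n, h, w, c), nhwc_strides)
    for n, c, h, w in product(*map(range, nchw_shape))
)
```

The channel stride of 1 in `nchw_strides` is exactly what a channels-last tensor looks like, which is why no copy is needed in this case.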

Extract DALI implementation to a separate file.
Update the readme and some configuration files for EfficientNet:

  • dali-gpu is the default data-backend, instead of PyTorch
  • DALI supports AutoAugment (+ a mention of other Automatic Augmentations)

Fix a typo in the readme files:
--data-backends -> --data-backend

This PR backports the changes made to this example when it was introduced into the DALI codebase:
https://github.com/NVIDIA/DALI/tree/main/docs/examples/use_cases/pytorch/efficientnet

The changes were tested with the smallest EfficientNet only.

Using the DALI GPU pipeline in training can remove the CPU bottleneck and improve GPU utilization on both DGX-1V and DGX-A100 when running with AMP, as covered in this blog post:
https://developer.nvidia.com/blog/why-automatic-augmentation-matters/

Please note that in the DALI example we reduced the number of worker threads to half of what is currently set up for PyTorch. That change is not reflected in this PR. The optimal default number of worker threads differs between data backends, so it could be set conditionally, but I don't know what the recommended way to do that would be.
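One possible way to make the default conditional, shown only as a starting point for discussion: leave the worker option unset by default and resolve it from the selected backend after parsing. The `--workers` flag name, the list of backends, and the numbers 16 and 8 are assumptions for illustration and are not taken from the repository.

```python
import argparse

# Hypothetical per-backend defaults; DALI gets half of the PyTorch value.
DEFAULT_WORKERS = {"pytorch": 16, "dali-gpu": 8, "dali-cpu": 8}


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--data-backend", choices=sorted(DEFAULT_WORKERS),
                        default="dali-gpu")
    # None means "not given by the user": pick the backend-specific default.
    parser.add_argument("--workers", type=int, default=None,
                        help="data loading workers (default depends on --data-backend)")
    args = parser.parse_args(argv)
    if args.workers is None:
        args.workers = DEFAULT_WORKERS[args.data_backend]
    return args


default_args = parse_args([])
pytorch_args = parse_args(["--data-backend", "pytorch"])
override_args = parse_args(["--data-backend", "dali-gpu", "--workers", "3"])
```

An explicit `--workers` value always wins, so existing launch scripts that pass the flag would behave as before.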

Update DALI implementation to use modern "fn" API
instead of old class approach.

Add a codepath using AutoAugment in DALI training pipeline.
It can be easily extended to use other Automatic Augmentations.

The integration of DALI Pipeline with PyTorch additionally skips
the transposition when exposing NHWC data.

Extract the DALI implementation to a separate file.
Update the readme and some configuration files for EfficientNet:
* dali-gpu is the default one, instead of PyTorch
* DALI supports AutoAugment (+ a mention of other Automatic Augmentations)

Fix a typo in the readme files:
--data-backends -> --data-backend

This PR backports the changes made to this example when it was
introduced into the DALI codebase:
https://github.com/NVIDIA/DALI/tree/main/docs/examples/use_cases/pytorch/efficientnet

The changes were tested with the smallest EfficientNet only.

Using the DALI GPU pipeline in training can remove the CPU bottleneck
on both DGX-1V and DGX-A100 when running with AMP, as covered
in the blog post:
https://developer.nvidia.com/blog/why-automatic-augmentation-matters/

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
@klecki klecki force-pushed the dali-efficientnet-aa branch from ddbf5ed to d6c8f05 Compare August 29, 2023 13:48