Add EfficientNet example using automatic augmentations with DALI #4678
Conversation
CodeQL found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.
Can you make this PR have a first commit with a copy of what's in DeepLearningExamples, so that I can see what you actually changed?
It is already done this way; look at the PR description for details about the individual commits.
```python
else:
    output = images

output = fn.crop_mirror_normalize(output, dtype=types.FLOAT, output_layout=types.NCHW,
```
Suggested change:
```diff
-output = fn.crop_mirror_normalize(output, dtype=types.FLOAT, output_layout=types.NCHW,
+output = fn.crop_mirror_normalize(output, dtype=types.FLOAT, output_layout="CHW",
```
`types.NCHW` was deprecated years ago.
I just realized that we should have NHWC as a parameter here, and I wonder how it even got a good training result.
So we are doing an unnecessary double transposition; I wonder if, and by how much, we can get faster without it.
We can get around the transposition. I am not sure it gives any benefit: the benchmarks give me slightly more samples/s, but that might just be noise.
Either way, I now use "CHW" and "HWC" and produce the memory in the target layout for the NHWC case.
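The layout point above can be illustrated without DALI. The sketch below (plain Python, illustrative helpers only, not the PR's code) shows why emitting the tensor directly in the target layout avoids work: asking for NCHW and then converting back to NHWC costs two transpositions instead of zero.

```python
# Minimal sketch of HWC<->CHW layout conversion on a nested-list "tensor".
def transpose_hwc_to_chw(img):
    # img is indexed [h][w][c]; the result is indexed [c][h][w].
    h, w, c = len(img), len(img[0]), len(img[0][0])
    return [[[img[y][x][ch] for x in range(w)] for y in range(h)] for ch in range(c)]

def transpose_chw_to_hwc(img):
    # img is indexed [c][h][w]; the result is indexed [h][w][c].
    c, h, w = len(img), len(img[0]), len(img[0][0])
    return [[[img[ch][y][x] for ch in range(c)] for x in range(w)] for y in range(h)]

# A 2x2 RGB image in HWC (interleaved) layout.
hwc = [[[1, 2, 3], [4, 5, 6]],
       [[7, 8, 9], [10, 11, 12]]]

chw = transpose_hwc_to_chw(hwc)          # one copy: fine if CHW is the target
assert transpose_chw_to_hwc(chw) == hwc  # this round trip is the wasted work
```

Producing the data in the target layout in the first place (here, via `output_layout` in the normalize step) skips both copies.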
```python
images = fn.decoders.image(jpegs, device="mixed", output_type=types.RGB)

images = fn.resize(images, resize_shorter=image_size, interp_type=interpolation,
                   antialias=False)
```
Do you really want to disable antialiasing? Just asking.
I took it from the original pipeline.
LGTM, minor comments only
Put some comments in the readme. I still have the `sh` file left to review.
docs/examples/use_cases/pytorch/efficientnet/image_classification/models/__init__.py
This example shows how DALI's implementation of automatic augmentations - most notably `AutoAugment <https://arxiv.org/abs/1805.09501>`_ and `TrivialAugment <https://arxiv.org/abs/2103.10158>`_ - can be used in training. It shows the training of EfficientNet, an image classification model first described in `EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks <https://arxiv.org/abs/1905.11946>`_.

The code is based on `NVIDIA Deep Learning Examples <https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Classification/ConvNets/efficientnet>`_ - it has been extended with a DALI pipeline supporting automatic augmentations, which can be found :fileref:`here <docs/examples/use_cases/pytorch/efficientnet/image_classification/dali.py>`.
The `fileref` thingy does not render properly.
It does in Sphinx; I took it from the SSD example. I will try to please both Sphinx and GH, but I don't know if it is possible.
* ``--data-backend`` parameter was changed to accept ``dali``, ``pytorch``, or ``synthetic``. It is set to ``dali`` by default.
* ``--dali-device`` was added to control placement of some of the DALI operators.
* ``--augmentation`` was replaced with ``--automatic-augmentation``, now supporting ``disabled``, ``autoaugment``, and ``trivialaugment`` values.
* ``--workers`` defaults were halved to accommodate DALI. The value is automatically doubled when the ``pytorch`` data loader is used.
It would be nice to explain why ``--workers`` needed to be halved to accommodate DALI.
I added a bit about fitting both loaders with a good default, but I am not sure I really want to dive deep into how it works.
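The ``--workers`` behavior described above can be sketched as follows. This is a hedged approximation (the function name and the halved default value are assumptions, not the PR's exact code): one default that suits DALI, doubled back automatically when the ``pytorch`` data backend is selected.

```python
DEFAULT_WORKERS = 5  # assumed halved default, suitable for DALI

def effective_workers(data_backend: str, workers: int = DEFAULT_WORKERS) -> int:
    # The pytorch loader does decoding and augmentation on CPU worker
    # processes, so it gets the pre-DALI count back; the dali backend runs
    # its own threads and keeps the smaller number.
    return workers * 2 if data_backend == "pytorch" else workers

assert effective_workers("dali") == 5
assert effective_workers("pytorch") == 10
```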
* For inference:

  * Scale to target image size + 32
I believe it is not really clear what ``+32`` means in this context. Could you explain it?
It's just 224 + 32 - the definition was taken from the original model, as was (partially) this description. I will reword it a bit.
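The arithmetic above can be made concrete with a small sketch. The helper name and the center-crop step are assumptions drawn from the discussion, not the example's actual code:

```python
def inference_sizes(image_size: int = 224) -> tuple[int, int]:
    # "Scale to target image size + 32": the shorter edge is resized to
    # image_size + 32 (e.g. 224 + 32 = 256), then the image is
    # center-cropped back to image_size for the model input.
    resize_shorter = image_size + 32
    crop = image_size
    return resize_shorter, crop

assert inference_sizes(224) == (256, 224)
```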
```python
# TODO(klecki): Move it back again
import torchvision.datasets as datasets
import torchvision.transforms as transforms
```
I don't know if this TODO is a leftover or something that should stay. If it should stay, I believe it should describe better what needs to be done ;)
It doesn't matter much, but it breaks in my local setup: torchvision clashes with my local build of DALI, so I need to import DALI first.
docs/examples/use_cases/pytorch/efficientnet/image_classification/dali.py
```python
# of Pipeline definition, this `if` statement relies on static scalar parameter, so it is
# evaluated exactly once during build - we either include automatic augmentations or not.
if automatic_augmentation == "autoaugment":
    shapes = fn.peek_image_shape(jpegs)
```
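The build-time branching noted in the comment above can be sketched in plain Python (operator names here are illustrative strings, not DALI calls): the pipeline definition function runs once at build, so a branch on a static argument bakes exactly one variant into the graph.

```python
def build_ops(automatic_augmentation: str) -> list[str]:
    # Runs once at graph-build time; the branch selects which operators
    # end up in the pipeline, rather than being evaluated per sample.
    ops = ["decode", "resize"]
    if automatic_augmentation == "autoaugment":  # evaluated exactly once
        ops.append("auto_augment")
    ops.append("crop_mirror_normalize")
    return ops

assert "auto_augment" in build_ops("autoaugment")
assert "auto_augment" not in build_ops("disabled")
```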
If the image sizes are uniform (and they are, thanks to the `resize`), we can skip the shapes and just use the absolute version with `max_translation_abs=250` or `224`.
I think it is better to show the more flexible version in this example if it doesn't cause perf issues. Let me check.
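The reviewer's point above reduces to a small equivalence, sketched here with a hypothetical helper (not DALI API): a translation magnitude given relative to the image extent equals a fixed absolute value once all images share the same size after resize.

```python
def max_translation_px(image_extent: int, max_translation_rel: float) -> int:
    # Relative translation scales with the image extent.
    return int(image_extent * max_translation_rel)

# With uniform 224-pixel images, the per-sample shape lookup
# (peek_image_shape) could be replaced by one constant absolute value.
assert max_translation_px(224, 1.0) == 224
assert all(max_translation_px(e, 0.5) == 112 for e in (224, 224, 224))
```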
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Adjust some configuration options to accommodate it. Remove the obsolete pipeline.
The test runs one epoch fewer than the readme to make it a bit shorter.
CI MESSAGE: [7580814]: BUILD STARTED
CI MESSAGE: [7582436]: BUILD STARTED
CI MESSAGE: [7580814]: BUILD FAILED
CI MESSAGE: [7582436]: BUILD PASSED
Category: New feature, Other
Description:
This example ports the EfficientNet sample from the DeepLearningExamples repository.
The example is limited to the efficientnet-b0 variant for simplicity. The DALI pipeline is updated to use the `fn` API and the new automatic augmentations, adding options to select both AutoAugment and TrivialAugment. The main.py is adjusted so that the defaults are suitable for EfficientNet training out of the box (previously they were the defaults for RN50 training), and launch.py is no longer needed - the original example was started via launch.py, which looked up default values for a specific network in a .yml config and passed them to main.py. This way we can use main.py directly, without the layers of intermediate scripts.
The benchmarks from the readme are used to implement the L3 test.
The automatic augmentations come from: #4648. They can already be reviewed, as the API is basically a one-line invocation within the pipeline definition.
Additional information:
Affected modules and functionalities:
Docs/examples PR with L3 test.
Key points relevant for the review:
❗ Please check if the defaults in main.py match the ones in https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/Classification/ConvNets/configs.yml for DGX-1V, efficientnet-b0.
How to review this PR
I suggest checking out the code and running a diff tool like `meld` to compare the directories:
* `docs/examples/use_cases/pytorch/efficientnet/` from this PR
* `PyTorch/Classification/ConvNets/` from DeepLearningExamples
That way you can see that most of this PR is just files copied over.
Individual commits description:
I tried my best to split the PR into commits that are easier to review. Those are the steps (and specific commits):
PyTorch/Classification/ConvNets/
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: 3194