Optimizing distributed Adam when running with one work queue #5560

Merged · 14 commits · Jan 30, 2023
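
This PR tunes how NeMo drives the Apex distributed Adam optimizer, which shards optimizer state across data-parallel ranks and synchronizes gradients with reduce-scatters rather than all-reduces; how parameters are grouped into buckets determines how that communication overlaps with the backward pass. A minimal sketch of the bucketed reduce-scatter idea, in plain PyTorch rather than the Apex implementation (the helper name and padding scheme are illustrative):

```python
import torch
import torch.distributed as dist

def reduce_scatter_bucket(grads, group=None):
    """Flatten one bucket of gradients and reduce-scatter it.

    Each rank gets back only its shard of the summed gradients,
    which is all a state-sharding optimizer needs for its step.
    """
    world_size = dist.get_world_size(group)
    flat = torch.cat([g.reshape(-1) for g in grads])
    pad = (-flat.numel()) % world_size  # pad so the bucket splits evenly
    if pad:
        flat = torch.cat([flat, flat.new_zeros(pad)])
    shard = flat.new_empty(flat.numel() // world_size)
    dist.reduce_scatter_tensor(shard, flat, group=group)
    return shard
```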

Commits on Dec 2, 2022

  1. Dist Adam constructs a single param bucket for each GPT layer

    Signed-off-by: Tim Moon <tmoon@nvidia.com>
    timmoon10 committed Dec 2, 2022 · 67eb769
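
Bucket granularity is the knob here: one bucket per transformer layer lets that layer's gradients start reduce-scattering as soon as its backward pass finishes, instead of waiting on one monolithic bucket. A hedged sketch of the grouping (the `model.layers` attribute is an assumption; the actual NeMo code walks its own module hierarchy):

```python
from torch import nn

def per_layer_param_buckets(model: nn.Module) -> list[list[nn.Parameter]]:
    """Group trainable parameters into one bucket per transformer layer.

    Assumes `model.layers` is an nn.ModuleList of transformer blocks;
    the real NeMo/Apex code discovers parameter groups differently.
    """
    buckets = []
    for layer in model.layers:
        params = [p for p in layer.parameters() if p.requires_grad]
        if params:
            buckets.append(params)
    return buckets
```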

Commits on Dec 5, 2022

  1. de2d5c0

Commits on Dec 6, 2022

  1. Synchronize dist Adam reduce-scatters before launching model-parallel all-reduces

    Signed-off-by: Tim Moon <tmoon@nvidia.com>
    timmoon10 committed Dec 6, 2022 · df5b609
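
This commit addresses an ordering hazard: with a single communication work queue, the data-parallel reduce-scatters and the model-parallel all-reduces must be enqueued in a consistent order on every rank, so the reduce-scatters are synchronized first. A hedged sketch of the pattern; `finish_grad_sync` is a hypothetical stand-in for whatever optimizer hook drains the outstanding reduce-scatters:

```python
import torch
import torch.distributed as dist

def ordered_grad_sync(optimizer, tied_grad, model_parallel_group):
    # Hypothetical hook: wait for the optimizer's pending data-parallel
    # reduce-scatters before enqueueing any other collectives.
    optimizer.finish_grad_sync()
    torch.cuda.synchronize()  # make the completed collectives visible
    # Only now launch model-parallel all-reduces, e.g. for gradients of
    # embeddings tied across pipeline stages.
    dist.all_reduce(tied_grad, group=model_parallel_group)
```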

Commits on Dec 7, 2022

  1. Configure per-layer dist Adam buckets for BERT and T5

    Signed-off-by: Tim Moon <tmoon@nvidia.com>
    timmoon10 committed Dec 7, 2022 · b776896
  2. f032168
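
The first Dec 7 commit extends the same per-layer bucketing to BERT and T5. Tying the earlier sketches together, each per-layer bucket can then be synchronized independently with the reduce-scatter helper above (again illustrative, not the Apex code path):

```python
def sync_grads_per_layer(buckets, group=None):
    """Reduce-scatter each per-layer bucket in a fixed order.

    `buckets` comes from per_layer_param_buckets above; iterating in a
    fixed order keeps the collectives consistent across ranks.
    """
    shards = []
    for params in buckets:
        grads = [p.grad for p in params if p.grad is not None]
        if grads:
            shards.append(reduce_scatter_bucket(grads, group=group))
    return shards
```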

Commits on Dec 12, 2022

  1. Remove unused variables

    Signed-off-by: Tim Moon <tmoon@nvidia.com>
    timmoon10 committed Dec 12, 2022 · ce9602f

Commits on Dec 20, 2022

  1. Configure GPT with one dist Adam bucket per virtual pipeline stage

    Signed-off-by: Tim Moon <tmoon@nvidia.com>
    timmoon10 committed Dec 20, 2022 · 3bb4b8e
  2. a087029
  3. Configure BERT with one dist Adam bucket per virtual pipeline stage

    Signed-off-by: Tim Moon <tmoon@nvidia.com>
    timmoon10 committed Dec 20, 2022 · 6301e53
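
With interleaved (virtual) pipeline parallelism, each rank owns several non-contiguous model chunks, so these commits coarsen GPT's and BERT's bucketing from one bucket per layer to one per virtual pipeline stage, presumably to amortize the launch overhead of many small collectives while still overlapping per-chunk. A hedged sketch, where `model_chunks` stands in for the list of modules a rank owns (one per virtual stage):

```python
def per_virtual_stage_buckets(model_chunks):
    """One gradient bucket per virtual pipeline stage.

    `model_chunks` is assumed to be a list of nn.Modules, one per virtual
    stage owned by this rank; each chunk's gradients can then be
    reduce-scattered as soon as that chunk finishes backward.
    """
    return [
        [p for p in chunk.parameters() if p.requires_grad]
        for chunk in model_chunks
    ]
```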

Commits on Dec 23, 2022

  1. 491d673

Commits on Jan 5, 2023

  1. cbcc82b

Commits on Jan 20, 2023

  1. 1cb776a
  2. Update Apex commit in Dockerfile

    Need recent updates to Apex distributed Adam optimizer.

    Signed-off-by: Tim Moon <tmoon@nvidia.com>
    timmoon10 committed Jan 20, 2023 · 75a1a66
  3. Remove logic for per-virtual-pipeline distopt buckets from T5

    Signed-off-by: Tim Moon <tmoon@nvidia.com>
    timmoon10 committed Jan 20, 2023 · 4b639a5