Optimizing distributed Adam when running with one work queue #5560

Merged · 14 commits · Jan 30, 2023
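
This PR tunes how NeMo drives the Apex distributed Adam optimizer, which shards optimizer state across data-parallel ranks and synchronizes gradients with reduce-scatters rather than all-reduces; how parameters are grouped into buckets determines how that communication overlaps with the backward pass. A minimal sketch of the bucketed reduce-scatter idea, in plain PyTorch rather than the Apex implementation (the helper name and padding scheme are illustrative):

```python
import torch
import torch.distributed as dist

def reduce_scatter_bucket(grads, group=None):
    """Flatten one bucket of gradients and reduce-scatter it.

    Each rank gets back only its shard of the summed gradients,
    which is all a state-sharding optimizer needs for its step.
    """
    world_size = dist.get_world_size(group)
    flat = torch.cat([g.reshape(-1) for g in grads])
    pad = (-flat.numel()) % world_size  # pad so the bucket splits evenly
    if pad:
        flat = torch.cat([flat, flat.new_zeros(pad)])
    shard = flat.new_empty(flat.numel() // world_size)
    dist.reduce_scatter_tensor(shard, flat, group=group)
    return shard
```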

Commits on Dec 2, 2022

  1. Dist Adam constructs a single param bucket for each GPT layer

    Signed-off-by: Tim Moon <tmoon@nvidia.com>
    timmoon10 committed Dec 2, 2022 · 67eb769
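
Bucket granularity is the knob here: one bucket per transformer layer lets that layer's gradients start reduce-scattering as soon as its backward pass finishes, instead of waiting on one monolithic bucket. A hedged sketch of the grouping (the `model.layers` attribute is an assumption; the actual NeMo code walks its own module hierarchy):

```python
from torch import nn

def per_layer_param_buckets(model: nn.Module) -> list[list[nn.Parameter]]:
    """Group trainable parameters into one bucket per transformer layer.

    Assumes `model.layers` is an nn.ModuleList of transformer blocks;
    the real NeMo/Apex code discovers parameter groups differently.
    """
    buckets = []
    for layer in model.layers:
        params = [p for p in layer.parameters() if p.requires_grad]
        if params:
            buckets.append(params)
    return buckets
```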

Commits on Dec 5, 2022

  1. de2d5c0

Commits on Dec 6, 2022

  1. Synchronize dist Adam reduce-scatters before launching model-parallel all-reduces

    Signed-off-by: Tim Moon <tmoon@nvidia.com>
    timmoon10 committed Dec 6, 2022 · df5b609
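
This commit addresses an ordering hazard: with a single communication work queue, the data-parallel reduce-scatters and the model-parallel all-reduces must be enqueued in a consistent order on every rank, so the reduce-scatters are synchronized first. A hedged sketch of the pattern; `finish_grad_sync` is a hypothetical stand-in for whatever optimizer hook drains the outstanding reduce-scatters:

```python
import torch
import torch.distributed as dist

def ordered_grad_sync(optimizer, tied_grad, model_parallel_group):
    # Hypothetical hook: wait for the optimizer's pending data-parallel
    # reduce-scatters before enqueueing any other collectives.
    optimizer.finish_grad_sync()
    torch.cuda.synchronize()  # make the completed collectives visible
    # Only now launch model-parallel all-reduces, e.g. for gradients of
    # embeddings tied across pipeline stages.
    dist.all_reduce(tied_grad, group=model_parallel_group)
```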

Commits on Dec 7, 2022

  1. Configure per-layer dist Adam buckets for BERT and T5

    Signed-off-by: Tim Moon <tmoon@nvidia.com>
    timmoon10 committed Dec 7, 2022 · b776896
  2. f032168
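
The first Dec 7 commit extends the same per-layer bucketing to BERT and T5. Tying the earlier sketches together, each per-layer bucket can then be synchronized independently with the reduce-scatter helper above (again illustrative, not the Apex code path):

```python
def sync_grads_per_layer(buckets, group=None):
    """Reduce-scatter each per-layer bucket in a fixed order.

    `buckets` comes from per_layer_param_buckets above; iterating in a
    fixed order keeps the collectives consistent across ranks.
    """
    shards = []
    for params in buckets:
        grads = [p.grad for p in params if p.grad is not None]
        if grads:
            shards.append(reduce_scatter_bucket(grads, group=group))
    return shards
```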

Commits on Dec 12, 2022

  1. Remove unused variables

    Signed-off-by: Tim Moon <tmoon@nvidia.com>
    timmoon10 committed Dec 12, 2022 · ce9602f

Commits on Dec 20, 2022

  1. Configure GPT with one dist Adam bucket per virtual pipeline stage

    Signed-off-by: Tim Moon <tmoon@nvidia.com>
    timmoon10 committed Dec 20, 2022 · 3bb4b8e
  2. a087029
  3. Configure BERT with one dist Adam bucket per virtual pipeline stage

    Signed-off-by: Tim Moon <tmoon@nvidia.com>
    timmoon10 committed Dec 20, 2022 · 6301e53
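
With interleaved (virtual) pipeline parallelism, each rank owns several non-contiguous model chunks, so these commits coarsen GPT's and BERT's bucketing from one bucket per layer to one per virtual pipeline stage, presumably to amortize the launch overhead of many small collectives while still overlapping per-chunk. A hedged sketch, where `model_chunks` stands in for the list of modules a rank owns (one per virtual stage):

```python
def per_virtual_stage_buckets(model_chunks):
    """One gradient bucket per virtual pipeline stage.

    `model_chunks` is assumed to be a list of nn.Modules, one per virtual
    stage owned by this rank; each chunk's gradients can then be
    reduce-scattered as soon as that chunk finishes backward.
    """
    return [
        [p for p in chunk.parameters() if p.requires_grad]
        for chunk in model_chunks
    ]
```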

Commits on Dec 23, 2022

  1. 491d673

Commits on Jan 5, 2023

  1. cbcc82b

Commits on Jan 20, 2023

  1. 1cb776a
  2. Update Apex commit in Dockerfile

    Need recent updates to Apex distributed Adam optimizer.

    Signed-off-by: Tim Moon <tmoon@nvidia.com>
    timmoon10 committed Jan 20, 2023 · 75a1a66
  3. Remove logic for per-virtual-pipeline distopt buckets from T5

    Signed-off-by: Tim Moon <tmoon@nvidia.com>
    timmoon10 committed Jan 20, 2023 · 4b639a5