ZeRO 2+3 memory estimators #965
Merged
jeffra merged 12 commits into deepspeedai:master on Jun 23, 2021
samyam
reviewed
Apr 16, 2021
Contributor
|
@stas00 The numbers look slightly smaller than what I would expect for a 3B-parameter model, but the code looks fine to me. I am guessing the model is not exactly 3B here but slightly smaller?
Collaborator
Author
It's 2851M - I will make the script dump the exact number of params.
Collaborator
Author
|
I'm also thinking that I should probably match the launcher API and ask for |
Collaborator
Author
|
Added zero 2 estimators and full in-depth docs |
samyam
approved these changes
Jun 4, 2021
With @samyam and @tjruwase's help I have been working on having utils to estimate how much cpu+gpu ram is needed for a given model on a given setup.
This PR adds memory estimators for ZeRO 2+3 params, optim states and gradients for a given model and hardware setup:
- `estimate_zero3_model_states_mem_needs_all_live` - requires an actual model object
- `estimate_zero3_model_states_mem_needs_all_cold` - requires total_params and largest_layer_params
- `estimate_zero2_model_states_mem_needs_all_live` - requires an actual model object
- `estimate_zero2_model_states_mem_needs_all_cold` - requires total_params and largest_layer_params

ZeRO-3
Let's try a 3B model with just 1 node with 8 gpus, using live model:
Now, without the actual model. This requires us to know `total_params` and `largest_layer_params`, but we got those from the run above, so future estimates are much faster as we don't need to load the model.
There is a slight difference due to rounding: the actual live model has a few more params.
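For reference, the two inputs the cold estimators take, `total_params` and `largest_layer_params`, can be illustrated with a small pure-Python sketch. This is not the code from this PR: `summarize_params` and the `toy_layers` map are hypothetical names, and the layer sizes are made up purely for illustration. It only assumes that "largest layer" means the single layer holding the most parameters.

```python
from typing import Dict, Tuple


def summarize_params(layer_params: Dict[str, int]) -> Tuple[int, int]:
    """Return (total_params, largest_layer_params) for a {layer_name: param_count} map.

    Hypothetical helper for illustration only, not part of DeepSpeed.
    """
    total_params = sum(layer_params.values())
    largest_layer_params = max(layer_params.values(), default=0)
    return total_params, largest_layer_params


# Hypothetical toy model: names and sizes are invented for this example.
toy_layers = {
    "embed": 32_000 * 1_024,
    "block0.attn": 4 * 1_024 * 1_024,
    "block0.mlp": 8 * 1_024 * 1_024,
    "lm_head": 1_024 * 32_000,
}

total_params, largest_layer_params = summarize_params(toy_layers)
print(f"total_params={total_params:,} largest_layer_params={largest_layer_params:,}")
```

Once these two numbers are recorded from one live run, they can be reused for any number of cold estimates on different node and GPU counts.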
Let's try a 3B model on 8 nodes with 8 gpus each (cold)
Let's try a different setup with just 1 node with 1 gpu:
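To get a feel for how the three ZeRO-3 setups above differ, here is a rough back-of-the-envelope sketch. It is not the estimator from this PR: it assumes the idealized accounting from the ZeRO paper for mixed-precision Adam (2 bytes of fp16 params, 2 bytes of fp16 grads and 12 bytes of fp32 optimizer states per parameter, all partitioned evenly across GPUs), with no offload, no extra buffers and no activations. The function name is hypothetical, and `2_851_000_000` is the rounded "2851M" figure from the discussion, not an exact count.

```python
BYTES_PER_PARAM = 2 + 2 + 12  # fp16 params + fp16 grads + fp32 Adam states (assumed)


def zero3_model_states_per_gpu_gib(total_params: int, num_gpus_per_node: int, num_nodes: int) -> float:
    """Idealized ZeRO-3 model-state memory per GPU in GiB (hypothetical helper)."""
    total_gpus = num_gpus_per_node * num_nodes
    return total_params * BYTES_PER_PARAM / total_gpus / 2**30


total_params = 2_851_000_000  # rounded "2851M"; assumption, exact count not given

# (num_gpus_per_node, num_nodes) for the three setups discussed above
setups = [(8, 1), (8, 8), (1, 1)]
estimates = {s: zero3_model_states_per_gpu_gib(total_params, *s) for s in setups}

for (gpus, nodes), gib in estimates.items():
    print(f"{nodes} node(s) x {gpus} GPU(s): {gib:.2f} GiB per GPU")
```

The numbers printed by the real estimators will differ, since they also account for offload options, the largest layer and an additional buffer factor.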
ZeRO-2
Live:
Cold:
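For comparison with ZeRO-3, a similarly idealized sketch of ZeRO-2 model-state memory is shown below. Again, this is not the estimator added by this PR. It assumes the ZeRO paper's accounting for mixed-precision Adam, where ZeRO-2 keeps the fp16 params (2 bytes per parameter) replicated on every GPU and partitions only the fp16 grads and fp32 optimizer states (2 + 12 bytes per parameter). The function name and the rounded parameter count are assumptions.

```python
def zero2_model_states_per_gpu_bytes(total_params: int, total_gpus: int) -> float:
    """Idealized ZeRO-2 model-state memory per GPU in bytes (hypothetical helper)."""
    replicated = 2 * total_params                        # fp16 params, whole copy on every GPU
    partitioned = (2 + 12) * total_params / total_gpus   # fp16 grads + fp32 Adam states, sharded
    return replicated + partitioned


total_params = 2_851_000_000  # rounded "2851M"; assumption, exact count not given

one_gpu = zero2_model_states_per_gpu_bytes(total_params, total_gpus=1)
eight_gpus = zero2_model_states_per_gpu_bytes(total_params, total_gpus=8)

print(f"1 GPU:  {one_gpu / 2**30:.2f} GiB per GPU")
print(f"8 GPUs: {eight_gpus / 2**30:.2f} GiB per GPU")
```

Because the fp16 params are not partitioned, the per-GPU footprint under this accounting never drops below 2 bytes per parameter, no matter how many GPUs are added.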
@samyam