Merged
tjruwase
approved these changes
Jun 8, 2021
samyam
approved these changes
Jun 23, 2021
While working on the integration of wav2vec2 (huggingface/transformers#11638), which is dramatically different from other transformers models, I ran into multiple problems in the model and in DeepSpeed, and the current debug tools were very difficult to use. So I developed a whole set of tools, which I placed into a new file: `deepspeed/utils/debug.py`.

The key change is that you no longer need to compare param ids, which is hard, especially when params aren't in sync across different GPUs. Instead, you now get fully qualified param names in the logs, plus ids, classes, etc.

I have already integrated many of these into the zero3, engine and partitioning modules, as this is where I was debugging, but it should be easy to start using them in other modules as the need arises.
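
To illustrate the name-based lookup described above, here is a minimal sketch. It is not the actual contents of `deepspeed/utils/debug.py`: the helper names `build_param_names` and `describe_param` are hypothetical, and `FakeParam`/`FakeModel` are stand-ins so the example runs without PyTorch. The only assumption about the model is that it exposes `named_parameters()`, as `torch.nn.Module` does.

```python
from typing import Any, Dict


def build_param_names(model: Any) -> Dict[int, str]:
    """Map id(param) to its fully qualified name (hypothetical helper).

    `model` only needs a `named_parameters()` method that yields
    (name, param) pairs, which is what torch.nn.Module provides.
    """
    return {id(param): name for name, param in model.named_parameters()}


def describe_param(param: Any, names: Dict[int, str]) -> str:
    """Format a param for a log line: name first, then id and class."""
    name = names.get(id(param), "<unknown>")
    return f"name={name} id={id(param)} class={type(param).__name__}"


# Stand-ins for illustration only, so the sketch has no dependencies.
class FakeParam:
    def __init__(self, shape):
        self.shape = shape


class FakeModel:
    def __init__(self):
        self._params = [
            ("encoder.layers.0.weight", FakeParam((4, 4))),
            ("encoder.layers.0.bias", FakeParam((4,))),
        ]

    def named_parameters(self):
        return iter(self._params)


if __name__ == "__main__":
    model = FakeModel()
    names = build_param_names(model)
    for _, param in model.named_parameters():
        print(describe_param(param, names))
```

The point of the design is that the name is the same on every rank, whereas an object id is only meaningful inside one process, so log lines from different GPUs can be matched by name.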
One other addition is instrumenting `print_rank_0` to be able to [...] e.g. wav2vec2 skips layers w/o a sync and that was a hard one to track. It should probably also be moved to `debug.py`, as there are now multiple copies of its variations.

Hopefully the new debug utils are self-explanatory, but please feel free to rename them further to fit your standards/needs.