introduce debug utils#1136

Merged
jeffra merged 5 commits into deepspeedai:master from stas00:deepspeed-partition-debug
Jun 23, 2021
@stas00 (Collaborator) commented Jun 5, 2021

While working on the integration of wav2vec2 (huggingface/transformers#11638), which is dramatically different from other transformers models, I ran into multiple problems in the model and in deepspeed, and the current debug tools were very difficult to use. So I developed a whole set of tools, which I placed into a new file: deepspeed/utils/debug.py. The key change is that you no longer need to compare param ids, which is hard, especially when params aren't in sync across different gpus. Instead, you now get fully qualified param names in the logs, plus ids, classes, etc.

I already integrated a lot of those in the zero3, engine and partitioning modules, as this is where I was debugging, but it should be easy to start using them in other modules as the need arises.
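To illustrate the idea of logging by name rather than by id, here is a minimal sketch. It is not the code from deepspeed/utils/debug.py: the helper names `build_param_name_map` and `describe_param` are hypothetical, and it keys on Python's `id()` as an assumption, whereas deepspeed tracks its own param ids. With a PyTorch model you would pass `model.named_parameters()`, which yields `(name, param)` pairs; a stand-in class is used here so the snippet runs without torch.

```python
def build_param_name_map(named_params):
    """Map id(param) -> fully qualified name from (name, param) pairs.

    Hypothetical helper: with PyTorch, pass model.named_parameters().
    """
    return {id(param): name for name, param in named_params}


def describe_param(param, name_map):
    """Format a log-friendly description: name, id and class of a param."""
    name = name_map.get(id(param), "<unknown>")
    return f"name={name} id={id(param)} class={type(param).__name__}"


class FakeParam:
    """Stand-in for a real parameter object, for illustration only."""


w, b = FakeParam(), FakeParam()
name_map = build_param_name_map(
    [("encoder.layer.0.weight", w), ("encoder.layer.0.bias", b)]
)
print(describe_param(w, name_map))
```

The names stay the same on every rank even when the ids differ, which is what makes logs from different gpus comparable.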

One other addition is instrumenting print_rank_0 to be able to

  1. print for all ranks w/o interleaving prints and
  2. log into a file per rank, to enable debugging synchronization bugs, which requires comparing the logs from each gpu.

E.g. wav2vec2 skips layers w/o a sync, and that was a hard one to track down. print_rank_0 should probably also be moved to debug.py, as there are now multiple copies of its variations.
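The two print_rank_0 behaviours described above could be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `print_with_lock` and `log_to_rank_file` are hypothetical names, the lock file location and log file naming are made up, and `fcntl` is only available on Unix-like systems. The rank is passed in explicitly; in a real run it would come from the distributed launcher.

```python
import fcntl
import os
import tempfile


def print_with_lock(*msgs, lock_path=None):
    """Print while holding an exclusive file lock, so that lines from
    several processes on the same host do not interleave."""
    if lock_path is None:
        # Assumed location; any file all local ranks can open works.
        lock_path = os.path.join(tempfile.gettempdir(), "debug_print.lock")
    with open(lock_path, "a") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            print(*msgs, flush=True)
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def log_to_rank_file(rank, *msgs, log_dir="."):
    """Append a message to a per-rank log file and return its path."""
    path = os.path.join(log_dir, f"log_rank_{rank}.txt")
    with open(path, "a") as fh:
        fh.write(" ".join(str(m) for m in msgs) + "\n")
    return path


print_with_lock("rank", 0, "reached layer", "encoder.layer.0")
```

With one file per rank, a skipped layer shows up as a missing line when the files are diffed against each other.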

Hopefully the new debug utils are self-explanatory, but please feel free to further rename them to fit your standards/needs.

@jeffra jeffra merged commit c0c4ebf into deepspeedai:master Jun 23, 2021
@stas00 stas00 deleted the deepspeed-partition-debug branch June 23, 2021 20:31
This was referenced Jun 26, 2021
4 participants