adding consistency calculations/checks at init time #124
Hello. I'd like to take this issue, please.
Are we specifically looking for an implementation of consistency checks to be applied directly to Megatron-DeepSpeed? Or is it just a study on how to better implement checks?
Yes. I think this is mainly the code starting from: Megatron-LM, and thus Meg-DS, already has all kinds of validations, so I suppose the above ones would be additional checks. Additionally, I have started compiling various constraints here:
Hi @stas00, can I work on this issue?
Yes, of course, as it doesn't look like @jtboing is working on it, though I could be wrong. As I mentioned earlier, this was Stella's recommendation, so I don't really know what can be done here. Please have a look, and if things look interesting to work on, then by all means. It's hard to work without a concrete spec. The only spec I have is the doc I created for doing manual checks: https://github.com/bigscience-workshop/bigscience/blob/master/train/sanity-checks.md. Perhaps it could be a good starting point, so that we won't need to do those checks manually.
Sure, I'll look into the doc and add relevant checks for those parameters. Will update here!
I was thinking we could add a function to validate args before this line: Megatron-DeepSpeed/megatron/arguments.py, line 319 in 04c461e
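A minimal sketch of such a function, assuming the parsed arguments arrive as an argparse.Namespace with Megatron-style field names (hidden_size, num_attention_heads, num_layers, tensor_model_parallel_size, pipeline_model_parallel_size, world_size, micro_batch_size, global_batch_size). The name validate_args_consistency and the particular checks are hypothetical illustrations of divisibility constraints, not existing Megatron-DeepSpeed code. It collects every violation instead of stopping at the first one.

```python
import argparse
from typing import List


def validate_args_consistency(args: argparse.Namespace) -> List[str]:
    """Hypothetical extra validation pass over already-parsed args.

    Returns every violated constraint instead of failing on the first one,
    so a misconfigured launch reports all of its problems at once.
    """
    errors: List[str] = []
    tp = args.tensor_model_parallel_size
    pp = args.pipeline_model_parallel_size

    # Every attention head needs an equal slice of the hidden dimension.
    if args.hidden_size % args.num_attention_heads != 0:
        errors.append(
            f"hidden_size ({args.hidden_size}) must be divisible by "
            f"num_attention_heads ({args.num_attention_heads})")

    # Tensor parallelism shards attention heads across TP ranks.
    if args.num_attention_heads % tp != 0:
        errors.append(
            f"num_attention_heads ({args.num_attention_heads}) must be divisible by "
            f"tensor_model_parallel_size ({tp})")

    # Assumption of this sketch: layers are split evenly across pipeline stages.
    if args.num_layers % pp != 0:
        errors.append(
            f"num_layers ({args.num_layers}) must be divisible by "
            f"pipeline_model_parallel_size ({pp})")

    # GPUs must split evenly into model-parallel groups; the number of
    # such groups is the data-parallel size.
    model_parallel_size = tp * pp
    if args.world_size % model_parallel_size != 0:
        errors.append(
            f"world_size ({args.world_size}) must be divisible by "
            f"tp * pp ({model_parallel_size})")
    else:
        dp = args.world_size // model_parallel_size
        # Each accumulation step consumes micro_batch_size samples per DP rank,
        # so the global batch must split evenly into such steps.
        step = args.micro_batch_size * dp
        if args.global_batch_size % step != 0:
            errors.append(
                f"global_batch_size ({args.global_batch_size}) must be divisible by "
                f"micro_batch_size * dp ({step})")
    return errors


if __name__ == "__main__":
    # Example config with a deliberate mistake: 25 layers over 4 pipeline stages.
    example = argparse.Namespace(
        hidden_size=1024, num_attention_heads=16, num_layers=25,
        tensor_model_parallel_size=2, pipeline_model_parallel_size=4,
        world_size=16, micro_batch_size=4, global_batch_size=512,
    )
    for problem in validate_args_consistency(example):
        print("config error:", problem)
```

Collecting all errors first would let the launcher print one report and exit before any model is built; whether to raise or only warn on a non-empty list is a design choice for the PR.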
Sorry for the lack of response. Please take over this issue from me. Thanks.
Whatever you propose, we don't want to remove anything, only extend, improve, and refactor if need be.
Why? Those are available in args, e.g.
Exactly, my plan is to extend and provide most
Can you give me write access to this repo (to raise the PR), @stas00? I made some basic changes and would like your feedback on them.
Stella pointed out how they do consistency calculations/checks in NeoX:
https://github.com/EleutherAI/gpt-neox/blob/main/megatron/neox_arguments/arguments.py
It'd be good for someone to study what they added on top of the base Megatron-LM and replicate anything that could help our work, since good checks can save days of training a model under the wrong setup while thinking it's doing something else.
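One kind of "consistency calculation" worth replicating can be illustrated with batch sizes. A minimal sketch, assuming the DeepSpeed-style identity train_batch = micro_batch * grad_acc * dp_world_size; the helper name resolve_batch_params and its signature are hypothetical and not copied from the NeoX code. Given any two of the three values, it derives the third and then verifies the identity, so a mismatched config fails at init rather than hours into a run.

```python
from typing import Optional, Tuple


def resolve_batch_params(
    dp_world_size: int,
    train_batch: Optional[int] = None,
    micro_batch: Optional[int] = None,
    grad_acc: Optional[int] = None,
) -> Tuple[int, int, int]:
    """Hypothetical helper: derive the missing batch value from the other two,
    then check train_batch == micro_batch * grad_acc * dp_world_size."""
    if dp_world_size < 1:
        raise ValueError(f"dp_world_size must be >= 1, got {dp_world_size}")
    values = {"train_batch": train_batch, "micro_batch": micro_batch, "grad_acc": grad_acc}
    for name, value in values.items():
        if value is not None and value < 1:
            raise ValueError(f"{name} must be >= 1, got {value}")
    if sum(v is not None for v in values.values()) < 2:
        raise ValueError("need at least two of train_batch, micro_batch, grad_acc")

    # Derive whichever value is missing; the division must be exact.
    if train_batch is None:
        train_batch = micro_batch * grad_acc * dp_world_size
    elif micro_batch is None:
        divisor = grad_acc * dp_world_size
        if train_batch % divisor != 0:
            raise ValueError(
                f"train_batch ({train_batch}) not divisible by grad_acc * dp ({divisor})")
        micro_batch = train_batch // divisor
    elif grad_acc is None:
        divisor = micro_batch * dp_world_size
        if train_batch % divisor != 0:
            raise ValueError(
                f"train_batch ({train_batch}) not divisible by micro_batch * dp ({divisor})")
        grad_acc = train_batch // divisor

    # All three are set now (given or derived): verify the identity.
    expected = micro_batch * grad_acc * dp_world_size
    if train_batch != expected:
        raise ValueError(
            f"train_batch ({train_batch}) != micro_batch * grad_acc * dp ({expected})")
    return train_batch, micro_batch, grad_acc
```

For example, resolve_batch_params(8, train_batch=512, micro_batch=4) returns (512, 4, 16), while passing all three values as 512, 4, 15 raises a ValueError instead of silently training with a different effective batch size.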
I haven't studied what they did, so I don't have any specific suggestions here.
Thank you.