Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding consistency calculations/checks at init time #124

Open
stas00 opened this issue Oct 4, 2021 · 11 comments · May be fixed by #233
Open

adding consistency calculations/checks at init time #124

stas00 opened this issue Oct 4, 2021 · 11 comments · May be fixed by #233
Labels
Good First Issue Good for newcomers Good Second Issue For harder tasks

Comments

@stas00
Copy link
Contributor

stas00 commented Oct 4, 2021

Stella pointed out to how they do consistency calculations/checks with NeoX:
https://github.com/EleutherAI/gpt-neox/blob/main/megatron/neox_arguments/arguments.py

It'd be good for someone to study what they did over the base Megatron-LM and replicate anything that can help our work, since some good checks can save days of running a model under a wrong setup thinking it's doing something else.

I haven't studied what they did, so I don't have any specific suggestions here.

Thank you.

@stas00 stas00 added Good First Issue Good for newcomers Good Second Issue For harder tasks labels Oct 4, 2021
@jtboing
Copy link

jtboing commented Nov 26, 2021

Hello. I'd like to take this issue, please.

@jtboing
Copy link

jtboing commented Nov 26, 2021

Are we specifically looking for an implementation of consistency checks to be applied directly to Megatron-DeepSpeed? Or is it just a study to better implement checks?

@stas00
Copy link
Contributor Author

stas00 commented Nov 26, 2021

Are we specifically looking for an implementation of consistency checks to be applied directly to Megatron-DeepSpeed?

Yes.

I think this is mainly the code starting from:

https://github.com/EleutherAI/gpt-neox/blob/49e60fe7ad14f6991a7fa678d3a0c330d09b9ff4/megatron/neox_arguments/arguments.py#L641

Megatron-LM and thus Meg-DS has already all kinds of validations as well, so I suppose the above ones are additional checks.

Additionally, I have started compiling various constraints here:
https://github.com/bigscience-workshop/bigscience/blob/master/train/sanity-checks.md
so it'd be good to make sure to encompass this list as well - it's possible that it's already covered by the code in neox.

@bhavitvyamalik
Copy link
Collaborator

Hi @stas00, can I work on this issue?

@stas00
Copy link
Contributor Author

stas00 commented Jan 13, 2022

Yes, of course, as I it doesn't look that @jtboing is working on it. I could be wrong of course.

As I mentioned earlier it was Stella's recommendation so I don't really know what can be done here.

So please have a look and if things looks interesting to work on then by all means.

It's hard to work without a concrete spec.

The only spec I have is the doc I created here: https://github.com/bigscience-workshop/bigscience/blob/master/train/sanity-checks.md to do manual checks so perhaps this could be a good starting point - so that we won't need to do that manually.

@bhavitvyamalik
Copy link
Collaborator

Sure, I'll look into the doc and add relevant checks for those parameters. Will update here!

@bhavitvyamalik
Copy link
Collaborator

bhavitvyamalik commented Jan 15, 2022

I was thinking if we can add a function to validate args before this line:

_print_args(args)

  1. There’s a function by name of _check_arg_is_not_none which checks if required_args are not None. Can I make this check in validator only and remove _check_arg_is_not_none?
  2. I looked into the sanity check list and found that PP_SIZE, DP_SIZE were not available with args since they're dependent on get_3d_dimensions. So should I use it get_3d_dimensionshere to get these vars?
  3. How can I fetch path of ds_config_cl.json for (5) point in sanity check file?

@jtboing
Copy link

jtboing commented Jan 15, 2022

Sorry for the lack of response. Please take over this issue for me. Thanks.

@stas00
Copy link
Contributor Author

stas00 commented Jan 16, 2022

I was thinking if we can add a function to validate args before this line:

_print_args(args)

  1. There’s a function by name of _check_arg_is_not_none which checks if required_args are not None. Can I make this check in validator only and remove _check_arg_is_not_none?

Whatever you propose we don't want to remove anything but extend improve and refactor if need be.

  1. I looked into the sanity check list and found that PP_SIZE, DP_SIZE were not available with args since they're dependent on get_3d_dimensions. So should I use it get_3d_dimensionshere to get these vars?

why? those are available in args. e.g. args.tensor_model_parallel_size

  1. How can I fetch path of ds_config_cl.json for (5) point in sanity check file?

args.deepspeed_config? I think it gets parsed on the deepspeed side so perhaps it's not in args? the deepspeed engine object should definitely have it after deepspeed.initialize - not sure if it's too late for validation.

@bhavitvyamalik
Copy link
Collaborator

Exactly, my plan is to extend and provide most asserts into validate_args function instead of keeping them in parse_args

@bhavitvyamalik
Copy link
Collaborator

Can you give me write access for this repo (to raise the PR), @stas00? I made some basic changes and wanted your feedback for the same

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Good First Issue Good for newcomers Good Second Issue For harder tasks
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants