[BUG] MP-sharded checkpoint loading does not work for models except BLOOM #2442
Comments
Hi @pai4451, thanks for pointing this out. I am going to work on this and send a PR for you to try with GPT-J. I also want to better understand the usage of this feature on your side: if all you need is to run GPT-J with MP, DeepSpeed-Inference already supports that. You only need to initialize the model on CPU and call `deepspeed.init_inference` with `mp_size` set (see the sketch below).
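A minimal sketch of that flow, assuming GPT-J from the Hugging Face hub and 2 GPUs; the model name, `mp_size`, and dtype are illustrative, and the script would be launched with the `deepspeed` launcher (e.g. `deepspeed --num_gpus 2 script.py`):

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# Load the full model on CPU first; DeepSpeed shards it across GPUs below.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B",
                                             torch_dtype=torch.float16)

# Tensor-parallel inference: mp_size must match the number of GPUs
# given to the deepspeed launcher.
model = deepspeed.init_inference(model,
                                 mp_size=2,
                                 dtype=torch.float16,
                                 replace_with_kernel_inject=True)
```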
Hi @pai4451, sorry for my delay. DeepSpeed-Inference is going through some reorganization, and we are working on a solution to make this feature supported. Best,
Thanks @RezaYazdaniAminabadi, I am glad that the DeepSpeed team is working on this feature :D I saw a similar issue #2466, but it seems that issue is closed. I will try the latest DeepSpeed version to check whether I can initialize GPT-J on CPU first and then let `deepspeed.init_inference` handle the sharding.
Describe the bug
We currently want to run inference on the EleutherAI/gpt-j-6B model with tensor parallelism on multiple GPUs, similarly to what is done for BLOOM. But the way DeepSpeed inference saves and loads pre-sharded checkpoints does not seem consistent and general enough to cover other models.
To Reproduce
I tried using the DeepSpeed inference script for BLOOM, modifying lines 140-141 and line 100 so that it loads GPT-J and saves the MP-sharded checkpoints (a sketch of the change follows).
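A sketch of the kind of change I mean, assuming the model is switched to GPT-J and `save_mp_checkpoint_path` is passed to `deepspeed.init_inference`; names and paths are illustrative:

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# Around line 100: switch the model to GPT-J (illustrative).
model_name = "EleutherAI/gpt-j-6B"
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             torch_dtype=torch.float16)

# Around lines 140-141: this kwarg is what makes DeepSpeed write the
# TP-sharded checkpoints plus ds_inference_config.json to the given path.
model = deepspeed.init_inference(
    model,
    mp_size=2,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
    save_mp_checkpoint_path="<some path to save mp checkpoint>",
)
```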
After the first run, on my 2x A6000 server, I was able to get the tensor-parallelism-sharded checkpoints under the path `<some path to save mp checkpoint>` and a configuration file `ds_inference_config.json`, shown below.
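A minimal sketch of the general shape of that file, with hypothetical paths, shard names, and field values; in my run the `type` field is `ds_model`, which is the value the assertion below rejects:

```json
{
  "type": "ds_model",
  "base_dir": "<some path to save mp checkpoint>",
  "checkpoints": ["tp_00_00.pt", "tp_01_00.pt"],
  "version": 1.0,
  "parallelization": "tp",
  "tp_size": 2,
  "dtype": "torch.float16"
}
```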
For the second round, I undo the changes to lines 140-141, drop `save_mp_checkpoint_path`, and pass `checkpoint=<some path to save mp checkpoint>/ds_inference_config.json` to `deepspeed.init_inference` (see the sketch below).
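A sketch of the second-round call, assuming the meta-device construction used in the BLOOM script (so no full weights are read on CPU) and the same illustrative paths as above:

```python
import torch
import deepspeed
from transformers import AutoConfig, AutoModelForCausalLM

# Build the model structure on the meta device: no weights are
# materialized here; they should come from the pre-sharded checkpoints.
config = AutoConfig.from_pretrained("EleutherAI/gpt-j-6B")
with deepspeed.OnDevice(dtype=torch.float16, device="meta"):
    model = AutoModelForCausalLM.from_config(config, torch_dtype=torch.float16)

# Load the TP-sharded weights via the config written in the first run.
model = deepspeed.init_inference(
    model,
    mp_size=2,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
    checkpoint="<some path to save mp checkpoint>/ds_inference_config.json",
)
```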
This is the standard way of loading the pre-sharded model for BLOOM, and it speeds up the loading process considerably. However, the above code raises `AssertionError: ds_model checkpoint type is not supported`, which comes from the following code, where DeepSpeed inference loads the JSON-described state_dict:
DeepSpeed/deepspeed/runtime/state_dict_factory.py, line 34 (commit a524864)
DeepSpeed/deepspeed/runtime/state_dict_factory.py, line 42 (commit a524864)
I also tried changing the `type` in `ds_inference_config.json` to BLOOM, since the only types supported for JSON checkpoints are BLOOM and Megatron, but this time the following line causes `AttributeError: 'NoneType' object has no attribute 'is_meta'`:

DeepSpeed/deepspeed/module_inject/load_checkpoint.py, line 199 (commit a524864)
Is the pre-sharded checkpoint loading feature limited to the BLOOM model only? How can I use tensor parallelism to split a single model across multiple GPUs?
Similar threads:
#2379
#2132