
RuntimeError: Error(s) in loading state_dict for InternViTModel/GPTVLModel #8

Open
Vintage-Echo opened this issue Mar 12, 2025 · 5 comments


@Vintage-Echo

Great job!

When I try to perform ViT weight conversion with the GPU-Megatron framework, I cannot find the corresponding script.

> bash scripts/megatron/convert_model_intern_vit.sh

There seem to be some problems when I use other scripts. Could you provide a verified script for the InternViT conversion? Thanks!

> RuntimeError: Error(s) in loading state_dict for InternViTModel

@shenyunhang
Collaborator

Hi @Vintage-Echo,
Please use this script to convert the vision model.

bash scripts/modellink/convert_model_intern_vit.sh

The vision model parts are the same in the Megatron and Modellink training framework.

@Vintage-Echo
Author

Vintage-Echo commented Mar 12, 2025

Yes, I had already used that script, but the problem still occurs. Could the environment configuration be incorrect? Even when I use the docker image shenyunhang/pytorch:24.11-py3_2024-1224, a similar error still occurs:

RuntimeError: Error(s) in loading state_dict for GPTVLModel

Where should I start to locate the problem? Looking forward to your reply, thanks a lot.

@shenyunhang
Collaborator

I have verified this script and it works well.
Nothing in convert_model_intern_vit.sh should be related to GPTVLModel.
It would help if you could provide more logs so we can locate the problem.

@Vintage-Echo Vintage-Echo changed the title script for internViT-convert RuntimeError: Error(s) in loading state_dict for InternViTModel/GPTVLModel Mar 13, 2025
@Vintage-Echo
Author

Vintage-Echo commented Mar 13, 2025

Yes, the problem might be caused by loading Qwen2.5-14B-Instruct_tp8pp1_te. The log file is as follows:

> [rank6]: Traceback (most recent call last):
> [rank6]:   File "/docker-longvita/data_local//Long-VITA//long_vita_megatron/pretrain_long_vita.py", line 1054, in <module>
> [rank6]:     pretrain(train_valid_test_datasets_provider,
> [rank6]:   File "/docker-longvita/data_local/Long-VITA/third_party/Megatron-LM_core_r0.7.0/megatron/training/training.py", line 227, in pretrain
> [rank6]:     model, optimizer, opt_param_scheduler = setup_model_and_optimizer(
> [rank6]:                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^
> [rank6]:   File "docker-longvita/data_local/Long-VITA/third_party/Megatron-LM_core_r0.7.0/megatron/training/training.py", line 517, in setup_model_and_optimizer
> [rank6]:     args.iteration, args.num_floating_point_operations_so_far = load_checkpoint(
> [rank6]:                                                                 ^^^^^^^^^^^^^^^^
> [rank6]:   File "docker-longvita/data_local/Long-VITA/third_party/Megatron-LM_core_r0.7.0/megatron/training/checkpointing.py", line 767, in load_checkpoint
> [rank6]:     model[0].load_state_dict(state_dict['model'], strict=strict)
> [rank6]:   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 2584, in load_state_dict
> [rank6]:     raise RuntimeError(
> [rank6]: RuntimeError: Error(s) in loading state_dict for GPTVLModel:
> [rank6]: 	Unexpected key(s) in state_dict: "decoder.layers.1.self_attention.linear_proj.weight", "decoder.layers.1.self_attention.linear_qkv.layer_norm_weight", "decoder.layers.1.self_attention.linear_qkv.weight", "decoder.layers.1.self_attention.linear_qkv.bias", "decoder.layers.1.mlp.linear_fc1.layer_norm_weight", "decoder.layers.1.mlp.linear_fc1.weight", "decoder.layers.1.mlp.linear_fc2.weight", "decoder.layers.2.self_attention.linear_proj.weight", "decoder.layers.2.self_attention.linear_qkv.layer_norm_weight", "decoder.layers.2.self_attention.linear_qkv.weight", "decoder.layers.2.self_attention.linear_qkv.bias", "decoder.layers.2.mlp.linear_fc1.layer_norm_weight", "decoder.layers.2.mlp.linear_fc1.weight", "decoder.layers.2.mlp.linear_fc2.weight", "decoder.layers.3.self_attention.linear_proj.weight", "decoder.layers.3.self_attention.linear_qkv.layer_norm_weight", "decoder.layers.3.self_attention.linear_qkv.weight", "decoder.layers.3.self_attention.linear_qkv.bias", "decoder.layers.3.mlp.linear_fc1.layer_norm_weight", "decoder.layers.3.mlp.linear_fc1.weight", "decoder.layers.3.mlp.linear_fc2.weight",

I have temporarily bypassed this problem by setting it to null, but I still do not know why the error is reported.
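To pinpoint exactly which parameters disagree, one option is to diff the checkpoint's keys against the model's expected keys before calling load_state_dict. A minimal sketch (the helper name diff_state_dict_keys is hypothetical, not part of Megatron):

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, mirroring the strict check
    that load_state_dict performs before raising RuntimeError."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)      # in model, not in checkpoint
    unexpected = sorted(checkpoint_keys - model_keys)   # in checkpoint, not in model
    return missing, unexpected
```

With PyTorch this could be called as `diff_state_dict_keys(model.state_dict().keys(), torch.load(path, map_location="cpu")["model"].keys())` — the `["model"]` nesting matches the `state_dict['model']` access in the traceback above.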

@shenyunhang
Collaborator

shenyunhang commented Mar 15, 2025

It seems the model was not built with Transformer Engine, but the weights were converted for Transformer Engine.
You can find the model structure in the log file and check for mismatches.
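One way to spot that mismatch from the keys alone: fused layers store the preceding LayerNorm inside the linear module, which is why the unexpected keys above look like `linear_qkv.layer_norm_weight`. A rough heuristic sketch (the marker suffixes are an assumption based on the key names in the error log, not an official API):

```python
# Keys ending in these suffixes appear when LayerNorm is fused into the
# linear layer, as in the "linear_qkv.layer_norm_weight" keys from the log.
TE_MARKERS = ("layer_norm_weight", "layer_norm_bias")

def looks_te_converted(checkpoint_keys):
    """Guess whether checkpoint keys follow the fused (Transformer Engine) naming."""
    return any(key.endswith(TE_MARKERS) for key in checkpoint_keys)
```

If this returns True but the model was built without Transformer Engine (or vice versa), the checkpoint and the model spec need to be brought in line before loading.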
