
RuntimeError: Error(s) in loading state_dict for InternViTModel/GPTVLModel #8

Open
Vintage-Echo opened this issue Mar 12, 2025 · 5 comments


@Vintage-Echo

Great job!

When I try to perform ViT weight conversion with the GPU-Megatron framework, I cannot find the corresponding script.

> bash scripts/megatron/convert_model_intern_vit.sh

There seem to be some problems when I use other scripts. Could you provide a verified script for the InternViT conversion? Thanks!

> RuntimeError: Error(s) in loading state_dict for InternViTModel

@shenyunhang
Collaborator

Hi @Vintage-Echo,
Please use this script to convert the vision model.

bash scripts/modellink/convert_model_intern_vit.sh

The vision model parts are the same in the Megatron and Modellink training framework.

@Vintage-Echo
Author

Vintage-Echo commented Mar 12, 2025

Yes, I had already used that script, but the problem still occurs. Could the environment configuration be incorrect? Even when I use the docker image shenyunhang/pytorch:24.11-py3_2024-1224, a similar error still occurs:

RuntimeError: Error(s) in loading state_dict for GPTVLModel

Where should I start to locate the problem? Looking forward to your reply, thanks a lot.

@shenyunhang
Collaborator

I have verified this script and it works well.
Nothing in convert_model_intern_vit.sh should be related to GPTVLModel.
It would help if you could provide more logs so we can locate the problem.

@Vintage-Echo Vintage-Echo changed the title script for internViT-convert RuntimeError: Error(s) in loading state_dict for InternViTModel/GPTVLModel Mar 13, 2025
@Vintage-Echo
Author

Vintage-Echo commented Mar 13, 2025

Yes, the problem might be caused by loading Qwen2.5-14B-Instruct_tp8pp1_te. The log file is as follows:

> [rank6]: Traceback (most recent call last):
> [rank6]:   File "/docker-longvita/data_local//Long-VITA//long_vita_megatron/pretrain_long_vita.py", line 1054, in <module>
> [rank6]:     pretrain(train_valid_test_datasets_provider,
> [rank6]:   File "/docker-longvita/data_local/Long-VITA/third_party/Megatron-LM_core_r0.7.0/megatron/training/training.py", line 227, in pretrain
> [rank6]:     model, optimizer, opt_param_scheduler = setup_model_and_optimizer(
> [rank6]:                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^
> [rank6]:   File "docker-longvita/data_local/Long-VITA/third_party/Megatron-LM_core_r0.7.0/megatron/training/training.py", line 517, in setup_model_and_optimizer
> [rank6]:     args.iteration, args.num_floating_point_operations_so_far = load_checkpoint(
> [rank6]:                                                                 ^^^^^^^^^^^^^^^^
> [rank6]:   File "docker-longvita/data_local/Long-VITA/third_party/Megatron-LM_core_r0.7.0/megatron/training/checkpointing.py", line 767, in load_checkpoint
> [rank6]:     model[0].load_state_dict(state_dict['model'], strict=strict)
> [rank6]:   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 2584, in load_state_dict
> [rank6]:     raise RuntimeError(
> [rank6]: RuntimeError: Error(s) in loading state_dict for GPTVLModel:
> [rank6]: 	Unexpected key(s) in state_dict: "decoder.layers.1.self_attention.linear_proj.weight", "decoder.layers.1.self_attention.linear_qkv.layer_norm_weight", "decoder.layers.1.self_attention.linear_qkv.weight", "decoder.layers.1.self_attention.linear_qkv.bias", "decoder.layers.1.mlp.linear_fc1.layer_norm_weight", "decoder.layers.1.mlp.linear_fc1.weight", "decoder.layers.1.mlp.linear_fc2.weight", "decoder.layers.2.self_attention.linear_proj.weight", "decoder.layers.2.self_attention.linear_qkv.layer_norm_weight", "decoder.layers.2.self_attention.linear_qkv.weight", "decoder.layers.2.self_attention.linear_qkv.bias", "decoder.layers.2.mlp.linear_fc1.layer_norm_weight", "decoder.layers.2.mlp.linear_fc1.weight", "decoder.layers.2.mlp.linear_fc2.weight", "decoder.layers.3.self_attention.linear_proj.weight", "decoder.layers.3.self_attention.linear_qkv.layer_norm_weight", "decoder.layers.3.self_attention.linear_qkv.weight", "decoder.layers.3.self_attention.linear_qkv.bias", "decoder.layers.3.mlp.linear_fc1.layer_norm_weight", "decoder.layers.3.mlp.linear_fc1.weight", "decoder.layers.3.mlp.linear_fc2.weight",

I have temporarily bypassed this problem by setting it to null, but I still do not know why the error is reported.
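To pinpoint exactly which parameters disagree, one option is to diff the checkpoint's keys against the model's expected keys before calling load_state_dict. A minimal sketch (the helper name diff_state_dict_keys is hypothetical, not part of Megatron):

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, mirroring the strict check
    that load_state_dict performs before raising RuntimeError."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)      # in model, not in checkpoint
    unexpected = sorted(checkpoint_keys - model_keys)   # in checkpoint, not in model
    return missing, unexpected
```

With PyTorch this could be called as `diff_state_dict_keys(model.state_dict().keys(), torch.load(path, map_location="cpu")["model"].keys())` — the `["model"]` nesting matches the `state_dict['model']` access in the traceback above.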

@shenyunhang
Collaborator

shenyunhang commented Mar 15, 2025

It seems the model was not built with Transformer Engine, but the weights were converted for Transformer Engine.
You can find the model structure in the log file and check for mismatches.
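One way to spot that mismatch from the keys alone: fused layers store the preceding LayerNorm inside the linear module, which is why the unexpected keys above look like `linear_qkv.layer_norm_weight`. A rough heuristic sketch (the marker suffixes are an assumption based on the key names in the error log, not an official API):

```python
# Keys ending in these suffixes appear when LayerNorm is fused into the
# linear layer, as in the "linear_qkv.layer_norm_weight" keys from the log.
TE_MARKERS = ("layer_norm_weight", "layer_norm_bias")

def looks_te_converted(checkpoint_keys):
    """Guess whether checkpoint keys follow the fused (Transformer Engine) naming."""
    return any(key.endswith(TE_MARKERS) for key in checkpoint_keys)
```

If this returns True but the model was built without Transformer Engine (or vice versa), the checkpoint and the model spec need to be brought in line before loading.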
