Add support for vlm checkpoints conversion #1475

cklxx · 2026-01-21T04:03:45Z

Motivation

Enable conversion of vision-language model (VLM) FSDP checkpoints to Hugging Face format by selecting the correct HF model class based on the model config.

Description

Updated tools/convert_fsdp_to_hf.py to import AutoModelForImageTextToText and added a helper, _build_hf_model(config), which prints the detected config.model_type and returns a model built with either AutoModelForCausalLM or AutoModelForImageTextToText, passing trust_remote_code=True.
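The description does not say how _build_hf_model decides between the two classes. The following is a minimal sketch under an assumed heuristic: a config is treated as a VLM when it has a non-None `vision_config` attribute, which many (but not necessarily all) Hugging Face VLM configs carry. The names `is_vlm_config`, `select_auto_class_name` and `build_hf_model_sketch` are hypothetical and not taken from the patch, and the builder needs a transformers release that ships `AutoModelForImageTextToText`.

```python
from types import SimpleNamespace


def is_vlm_config(config) -> bool:
    """Assumed heuristic: treat a config with a nested vision config as a VLM."""
    return getattr(config, "vision_config", None) is not None


def select_auto_class_name(config) -> str:
    """Name of the transformers Auto class to use for this config."""
    if is_vlm_config(config):
        return "AutoModelForImageTextToText"
    return "AutoModelForCausalLM"


def build_hf_model_sketch(config):
    """Hypothetical stand-in for the PR's _build_hf_model(config)."""
    # Imported lazily so the selection helpers above work without transformers.
    import transformers

    print(f"Detected model_type: {getattr(config, 'model_type', None)}")
    auto_cls = getattr(transformers, select_auto_class_name(config))
    # from_config builds the architecture with freshly initialized weights;
    # it does not load a pretrained checkpoint.
    return auto_cls.from_config(config, trust_remote_code=True)


# Selection demo with stand-in config objects (transformers not required here).
example_vlm = SimpleNamespace(model_type="qwen2_vl", vision_config=SimpleNamespace())
example_llm = SimpleNamespace(model_type="llama")
print(select_auto_class_name(example_vlm))  # AutoModelForImageTextToText
print(select_auto_class_name(example_llm))  # AutoModelForCausalLM
```

Keeping the class selection separate from model construction lets the heuristic be checked against plain config objects; copying the converted FSDP weights into the built model is outside the scope of this sketch.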

Testing

Ran linting and formatting checks: ruff check ., black --check ., and isort --check ., all passed.
Ran pytest, which failed during collection with ModuleNotFoundError: No module named 'slime' (test environment import issue).

Add VLM support to FSDP conversion (7627680)

cklxx mentioned this pull request Jan 21, 2026 in: tools/convert_fsdp_to_hf.py Don't support VLM model #1416 (Open)

cklxx force-pushed the add-support-for-vlm-checkpoints-conversion branch from 499c814 to 7627680 on January 21, 2026 04:06

Merge branch 'main' into add-support-for-vlm-checkpoints-conversion (ace08d1)
