

@cklxx (Contributor) commented Jan 21, 2026

Motivation

Enable conversion of vision-language model (VLM) FSDP checkpoints to Hugging Face format by selecting the appropriate HF model class from the model config.

Description

Updated tools/convert_fsdp_to_hf.py to import AutoModelForImageTextToText and added _build_hf_model(config), which prints the detected config.model_type and builds the model with either AutoModelForCausalLM or AutoModelForImageTextToText, passing trust_remote_code=True.
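A minimal sketch of this kind of dispatch, for illustration only. The PR text does not say which config field decides between the two classes, so the vision_config check below is an assumption (many HF VLM configs, such as Qwen2-VL and LLaVA, expose a vision sub-config). The names select_hf_auto_class_name and build_hf_model are hypothetical and are not the code in tools/convert_fsdp_to_hf.py.

```python
from types import SimpleNamespace
from typing import Any


def select_hf_auto_class_name(config: Any) -> str:
    """Pick an HF auto class name for a config (hypothetical heuristic).

    Assumption for illustration: a config carrying a non-None vision_config
    is treated as a vision-language model; everything else as a causal LM.
    """
    if getattr(config, "vision_config", None) is not None:
        return "AutoModelForImageTextToText"
    return "AutoModelForCausalLM"


def build_hf_model(config: Any):
    """Build the model architecture from a transformers config object.

    from_config creates the architecture only; no pretrained weights are
    loaded, so the converted FSDP state dict must be loaded afterwards.
    Requires a transformers release that provides AutoModelForImageTextToText.
    """
    # Imported lazily so the selection logic above can run without transformers.
    import transformers

    auto_cls = getattr(transformers, select_hf_auto_class_name(config))
    print(f"Detected model_type: {getattr(config, 'model_type', 'unknown')}")
    return auto_cls.from_config(config, trust_remote_code=True)


if __name__ == "__main__":
    # Stand-in configs; real usage would pass an AutoConfig instance.
    text_cfg = SimpleNamespace(model_type="llama")
    vlm_cfg = SimpleNamespace(model_type="qwen2_vl", vision_config=SimpleNamespace())
    print(select_hf_auto_class_name(text_cfg))  # AutoModelForCausalLM
    print(select_hf_auto_class_name(vlm_cfg))  # AutoModelForImageTextToText
```

Keeping the class choice in a small pure function makes the branching easy to unit-test without downloading or instantiating a model.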

Testing

Ran linting and formatting checks (ruff check ., black --check ., isort --check .); all passed.
Ran pytest, which failed during collection with ModuleNotFoundError: No module named 'slime' (test environment import issue).

@cklxx force-pushed the add-support-for-vlm-checkpoints-conversion branch from 499c814 to 7627680 on January 21, 2026 04:06
