VideoStudio: a framework that leverages an LLM for consistent, multi-scene video generation.
- Release the training code of VideoStudio-Img and VideoStudio-Vid.
- Release the complete inference pipeline for long video generation.
- 2024/07/05: Released the inference code of the independent components in VideoStudio: the LLM instructions for video script generation, and the code and weights of VideoStudio-Img and VideoStudio-Vid.
Please install the Python packages listed in requirements.txt.
Please download the Hugging Face models listed in weights/put-the-huggingface-models-in-this-folder and put them into the weights folder.
The expected folder organization is:

└── weights
    ├── FireCRT/VideoStudio
    │   ├── videostudio-img-encoder
    │   ├── videostudio-vid
    │   ├── videostudio-img-combine.bin
    │   └── ...
    ├── SG161222/Realistic_Vision_V4.0_noVAE
    │   └── ...
    ├── stabilityai/sd-vae-ft-mse
    │   └── ...
    ├── laion/CLIP-ViT-H-14-laion2B-s32B-b79K
    │   └── ...
    └── THUDM/chatglm3-6b
        └── ...
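Before running inference, it can help to verify that every checkpoint is in place. A minimal sketch (the list of expected paths simply mirrors the folder tree above; adjust it if your layout differs):

```python
# Sanity-check that the expected checkpoints exist under weights/.
# The EXPECTED paths mirror the folder tree above.
from pathlib import Path

EXPECTED = [
    "FireCRT/VideoStudio/videostudio-img-encoder",
    "FireCRT/VideoStudio/videostudio-vid",
    "FireCRT/VideoStudio/videostudio-img-combine.bin",
    "SG161222/Realistic_Vision_V4.0_noVAE",
    "stabilityai/sd-vae-ft-mse",
    "laion/CLIP-ViT-H-14-laion2B-s32B-b79K",
    "THUDM/chatglm3-6b",
]

def missing_weights(root="weights"):
    """Return the expected entries that are not present under `root`."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]

if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing weights:", *missing, sep="\n  ")
    else:
        print("All weights found.")
```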
cd videostudio_img
bash infer_videostudio_img.sh
Results:
| Foreground Image | Background Image | Combined Image |
cd videostudio_vid
bash infer_videostudio_vid.sh
Results:
| Input Image | Output Video |
cd script_generation
bash script_generation.sh
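The script-generation step prompts the downloaded ChatGLM3 checkpoint to expand a one-line idea into a multi-scene script. A hedged sketch is below; the prompt template here is purely illustrative (the released LLM instructions live in script_generation/), and the `model.chat` call is ChatGLM3's documented chat interface, which requires the chatglm3-6b weights from the weights folder and a CUDA GPU:

```python
# Illustrative only: the actual prompt template used by script_generation.sh
# is defined in the released LLM instructions, not here.
def build_script_prompt(idea: str, num_scenes: int = 4) -> str:
    """Compose an instruction asking the LLM to expand `idea` into scenes."""
    return (
        f"Expand the following idea into a {num_scenes}-scene video script. "
        "For each scene, give a one-sentence visual description that keeps "
        "the main subject consistent across scenes.\n"
        f"Idea: {idea}"
    )

if __name__ == "__main__":
    # Requires weights/THUDM/chatglm3-6b (see the weights folder above)
    # and a CUDA GPU; model.chat is ChatGLM3's chat interface.
    from transformers import AutoModel, AutoTokenizer

    path = "weights/THUDM/chatglm3-6b"
    tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
    model = AutoModel.from_pretrained(path, trust_remote_code=True).half().cuda().eval()
    response, _ = model.chat(tokenizer, build_script_prompt("a dog surfing at sunset"))
    print(response)
```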
Please check the Apache-2.0 license for details.
The code is built upon IPAdapter, U-2-Net, and the video generation pipeline in Diffusers.
If you use these models in your research, please cite:
@inproceedings{Long:ECCV24,
title={VideoStudio: Generating Consistent-Content and Multi-Scene Videos},
author={Long, Fuchen and Qiu, Zhaofan and Yao, Ting and Mei, Tao},
booktitle={ECCV},
year={2024}
}