Blending Custom Photos with Video Diffusion Transformers

feizc/Ingredients

arXiv Model Dataset

This repository is the official implementation of Ingredients, a method for customizing generated videos by incorporating multiple specific identity (ID) photos using video diffusion Transformers. This is a research project; for production use, the authors recommend trying more advanced products.

📷 1. Gallery

case.mp4

⚙️ 2. Environments

We recommend setting up the environment as follows.

conda create -n ingredients python=3.11.0
conda activate ingredients
pip install -r requirements.txt
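
After installation, a quick sanity check can confirm that the interpreter and key packages are in place. The following is a minimal sketch, not part of the repository: the package names in `ASSUMED_PACKAGES` are assumptions (the authoritative list is requirements.txt), and `check_environment` is a hypothetical helper.

```python
import importlib.util
import sys

# Assumption: these names are illustrative only; consult requirements.txt
# for the packages the project actually depends on.
ASSUMED_PACKAGES = ["torch", "diffusers", "transformers"]


def check_environment(packages=ASSUMED_PACKAGES, expected_python=(3, 11)):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    # Compare only major.minor, matching the python=3.11.0 conda environment.
    if tuple(sys.version_info[:2]) != tuple(expected_python):
        found = f"{sys.version_info[0]}.{sys.version_info[1]}"
        wanted = f"{expected_python[0]}.{expected_python[1]}"
        problems.append(f"python {found} found, {wanted} expected")
    for name in packages:
        # find_spec returns None when a top-level package cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"package not importable: {name}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Running the script inside the activated `ingredients` environment prints one warning per detected problem and nothing otherwise.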

The model weights are available at 🤗HuggingFace.

🗝️ 3. Inference

We provide the inference script infer.py for simple testing. Run the following command as an example:

python infer.py \
    --prompt "Two men in half bodies, are seated in a dimly lit room, possibly an office or meeting room, with a formal atmosphere." \
    --model_path "/path/to/model" \
    --seed 2025 \
    --img_file_path 'asserts/0.jpg' 'asserts/1.jpg'
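
To run the same command over several prompts or seeds, it can help to build the argument list programmatically. The sketch below uses only the flags shown above (`--prompt`, `--model_path`, `--seed`, `--img_file_path`); `build_infer_command` is a hypothetical helper, not part of the repository, and it assumes infer.py is in the current working directory.

```python
import shlex
import sys


def build_infer_command(prompt, model_path, image_paths, seed=2025, script="infer.py"):
    """Build the argument list for one inference run (hypothetical helper)."""
    if not image_paths:
        raise ValueError("at least one ID photo is required")
    return [
        sys.executable, script,  # reuse the interpreter of the active environment
        "--prompt", prompt,
        "--model_path", model_path,
        "--seed", str(seed),
        "--img_file_path", *image_paths,  # one path per identity photo
    ]


if __name__ == "__main__":
    cmd = build_infer_command(
        prompt="Two men in half bodies, are seated in a dimly lit room.",
        model_path="/path/to/model",
        image_paths=["asserts/0.jpg", "asserts/1.jpg"],
    )
    # Print the shell-quoted command; pass cmd to subprocess.run(cmd, check=True)
    # to actually launch inference.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell-quoting problems with long prompts that contain commas or quotes.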

We also include the evaluation metric code in the metric folder and the evaluation data at Dataset, for comparing results on multi-ID customization tasks.
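
For intuition only, one common way to score identity preservation with multiple IDs is the cosine similarity between face embeddings of the reference photos and of the faces found in a generated frame. The sketch below illustrates that general idea under stated assumptions. It is not the code in the metric folder; the embeddings are assumed to come from some external face encoder, and both function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def mean_identity_similarity(reference_embeddings, frame_embeddings):
    """Average, over reference IDs, of the best match among faces in a frame.

    Assumption: each reference ID is matched to its most similar detected
    face; other matching strategies are possible.
    """
    if not reference_embeddings or not frame_embeddings:
        raise ValueError("need at least one reference and one detected face")
    best_scores = [
        max(cosine_similarity(ref, face) for face in frame_embeddings)
        for ref in reference_embeddings
    ]
    return sum(best_scores) / len(best_scores)
```

A score near 1 means every reference identity has a closely matching face in the frame; a missing or swapped identity pulls the average down.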

Like ConsisID, Ingredients is sensitive to prompt quality. We suggest following the prompt format described in the link.

Gradio Web UI

We highly recommend trying our web demo, which incorporates all features currently supported by Ingredients, with the following command:

python app.py

⏰ 4. Training

Coming soon, including multi-stage training scripts and multi-ID text-video datasets.

You can prepare the video-text pair data in the required format, and our experiments can be reproduced by simply running the training scripts:

# For stage 1
bash train_face.sh
# For stage 2
bash train_router.sh
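
Because stage 2 depends on stage 1 having finished, the two scripts can be chained so that a failure in the first stage stops the run. This is a minimal sketch, assuming both scripts are present in the working directory; `run_stages` is a hypothetical helper and the `runner` parameter exists only to make the function easy to test.

```python
import subprocess

# Stage order as given above: face stage first, router stage second.
STAGE_SCRIPTS = ["train_face.sh", "train_router.sh"]


def run_stages(scripts=STAGE_SCRIPTS, runner=subprocess.run):
    """Run each training script in order and return the ones that completed.

    With check=True, subprocess.run raises CalledProcessError on a non-zero
    exit code, so later stages never start after a failed one.
    """
    completed = []
    for script in scripts:
        runner(["bash", script], check=True)
        completed.append(script)
    return completed

# Example (launches real training, so it is left commented out):
# run_stages()
```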

🚀 5. Cite

If you find this work useful for your research and applications, please cite us using this BibTeX:

@article{fei2025ingredients,
    title={Ingredients: Blending Custom Photos with Video Diffusion Transformers},
    author={Fei, Zhengcong and Li, Debang and Qiu, Di and Yu, Changqian and Fan, Mingyuan},
    journal={arXiv preprint arXiv:2501.01790},
    year={2025}
}

For any questions, please feel free to open an issue.

Acknowledgement

This project wouldn't be possible without the following open-source repositories: CogVideoX, ConsisID, Uniportrait, and Hunyuan Video.
