
An innovative method designed to augment the capabilities of existing video diffusion models


liujianzhi/EchoReel


EchoReel: Enhancing Action Generation of Existing Video Diffusion Models

University of Electronic Science and Technology of China

EchoReel augments existing video diffusion models. It can:
1️⃣ use multiple reference videos to imitate a broader range of actions and to generate novel actions, without fine-tuning;
2️⃣ distill relevant visual motion features from the references instead of replicating their content.

"Imitation is the sincerest form of flattery that mediocrity can pay to greatness." — Oscar Wilde

✌️ Results

[Video comparison: original VideoCrafter2 vs. VideoCrafter2 + EchoReel for each input text below]
  • "A man is studying in the library"
  • "A man is skiing"
  • "A man is running"
  • "Couple walking on the beach"
  • "A man is carving a stone statue"

📝 Changelog

  • [2024.4.21] Released pretrained weights
  • [2024.3.18] Released training and inference code

⏳ TODO

  • Release LVDM text-to-video code with EchoReel
  • Release training code (done, 2024.3.18)
  • Release pretrained weights (done, 2024.4.21)
  • Release VideoCrafter image-to-video code with EchoReel

⚙️ Setup

Prepare the training data as a .json file in the following format, with one object per training sample:

[
    {
        "input_text": ...,
        "gt_video_path": ...,
        "reference_text": ...,
        "reference_video_path": ...
    },
    ...
]
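A minimal sketch for building and checking such a file in Python. It reads the fields by their names (a text prompt with its ground-truth video, plus a reference caption and reference video) and assumes each holds a non-empty string. The helper names (`make_entry`, `validate_entries`, `write_dataset`), the example paths, and the output name `train_data.json` are hypothetical; the training script may expect a different file name.

```python
import json
from pathlib import Path

# The four keys shown in the format above.
REQUIRED_KEYS = ("input_text", "gt_video_path", "reference_text", "reference_video_path")


def make_entry(input_text, gt_video_path, reference_text, reference_video_path):
    """Build one entry: a prompt and its ground-truth video, plus a reference caption and video."""
    return {
        "input_text": input_text,
        "gt_video_path": gt_video_path,
        "reference_text": reference_text,
        "reference_video_path": reference_video_path,
    }


def validate_entries(entries):
    """Return a list of problems; an empty list means every entry has all four non-empty string fields."""
    if not isinstance(entries, list):
        return ["top-level value must be a list"]
    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: not an object")
            continue
        for key in REQUIRED_KEYS:
            value = entry.get(key)
            if not isinstance(value, str) or not value:
                problems.append(f"entry {i}: missing or empty '{key}'")
    return problems


def write_dataset(entries, path):
    """Validate entries, then write them as an indented JSON list (hypothetical helper)."""
    problems = validate_entries(entries)
    if problems:
        raise ValueError("; ".join(problems))
    Path(path).write_text(json.dumps(entries, indent=4), encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical captions and paths; replace them with your own data.
    data = [
        make_entry(
            "A man is skiing",
            "videos/gt/skiing_0001.mp4",
            "A person skis down a snowy slope",
            "videos/ref/skiing_ref_0001.mp4",
        ),
    ]
    write_dataset(data, "train_data.json")
```

Validating before writing turns a missing or misspelled key into a clear error message instead of a failure partway through a training run.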

Install the environment via Anaconda:

conda create -n EchoReel python=3.10.13
conda activate EchoReel
pip install -r requirements.txt

💫 Try It

Download the pretrained weights from our Hugging Face repository into the 'checkpoint' folder. We also strongly recommend downloading the WebVid .csv file into the 'dataset' folder, which enables automatic reference video selection. The commands below do both and then launch gr.py:

mkdir checkpoint
cd checkpoint
wget https://huggingface.co/cscrisp/EchoReel/resolve/main/checkpoint/checkpoint.pt
cd ..
mkdir dataset
cd dataset
wget http://www.robots.ox.ac.uk/~maxbain/webvid/results_10M_train.csv
cd ..
python gr.py
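To illustrate how captions in the WebVid .csv can be matched against an input text, here is a minimal sketch. It is not EchoReel's selection logic, which lives in the repository code and may work quite differently; it swaps in a plain word-overlap (Jaccard) ranking. The assumption that captions sit in a column called `name`, and all function names, are hypothetical.

```python
import csv
import heapq
import io
import re


def tokenize(text):
    """Lowercase alphanumeric word set: a deliberately simple text representation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b| between two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_references(input_text, csv_file, caption_column="name", top_k=3):
    """Return the top_k (score, row) pairs whose caption best overlaps input_text.

    Assumes a WebVid-style CSV with captions in `caption_column` (hypothetical).
    Rows are streamed and heapq keeps only the best top_k, so the large
    CSV never has to be loaded into memory at once.
    """
    query = tokenize(input_text)
    scored = (
        (jaccard(query, tokenize(row.get(caption_column) or "")), row)
        for row in csv.DictReader(csv_file)
    )
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    # Tiny made-up stand-in for dataset/results_10M_train.csv.
    sample = io.StringIO(
        "videoid,name\n"
        "1,A man skiing down a mountain\n"
        "2,Couple walking on the beach at sunset\n"
        "3,Close up of a cat sleeping\n"
    )
    for score, row in select_references("A man is skiing", sample, top_k=2):
        print(f"{score:.2f}", row["videoid"], row["name"])
```

On the sample rows this prints a score of 0.50 for row 1 and 0.11 for row 3. For the real file, pass `open("dataset/results_10M_train.csv", newline="", encoding="utf-8")` as `csv_file`; a learned text embedding would rank far better than word overlap, which is used here only because it is short and dependency-free.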

💫 Training

# Initialize the model with the original LVDM pretrained weights
mkdir -p models/t2v
wget -O models/t2v/model.ckpt https://huggingface.co/Yingqing/LVDM/resolve/main/lvdm_short/t2v.ckpt
bash train_EchoReel.sh

💫 Sampling

bash sample_EchoReel.sh

🔮 Pipeline

[Figure: overview of the EchoReel pipeline]

😉 Citation

@article{Liu2024EchoReel,
      title={EchoReel: Enhancing Action Generation of Existing Video Diffusion Models}, 
      author={Jianzhi Liu and Junchen Zhu and Lianli Gao and Jingkuan Song},
      year={2024},
      eprint={2403.11535},
      archivePrefix={arXiv},
}

🤗 Acknowledgements

Our code is partially based on Latent Video Diffusion Models (LVDM). Thanks to the authors for their wonderful work!
