Environment Setup
conda env create -f environment.yaml
conda activate story
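After activating the environment, you can optionally run a quick sanity check that PyTorch sees your GPU. This is only a sketch; it assumes environment.yaml installs PyTorch, which is not spelled out here.

```python
# Optional sanity check: confirm PyTorch is installed and a GPU is visible.
# Assumes environment.yaml provides torch; adjust if your setup differs.
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```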
External Package
# Use LAVIS BLIP2 for text-image alignment evaluation
git clone https://github.com/salesforce/LAVIS.git
cd LAVIS
pip install -e .
cp -r lavis eval/lavis
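For reference, below is a minimal sketch of how LAVIS exposes BLIP2 for image-text matching, which is the alignment metric this package is used for. The model name/type and the example caption are assumptions; the actual wiring lives in the repo's eval code.

```python
# Minimal sketch of BLIP2 image-text matching via LAVIS (identifiers are
# assumptions; the repo's eval/ code may load a different model/config).
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"
model, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip2_image_text_matching", model_type="pretrain", is_eval=True, device=device
)

image = vis_processors["eval"](Image.open("sample.png").convert("RGB")).unsqueeze(0).to(device)
text = txt_processors["eval"]("Fred is talking to Wilma in the living room.")

# "itm" head returns match logits; "itc" returns a cosine-similarity score.
itm_logits = model({"image": image, "text_input": text}, match_head="itm")
itm_score = torch.nn.functional.softmax(itm_logits, dim=1)[:, 1].item()
itc_score = model({"image": image, "text_input": text}, match_head="itc").item()
print(f"ITM: {itm_score:.3f}  ITC: {itc_score:.3f}")
```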
Download the datasets and place them under data/flintstones and data/pororo
First Stage: Char-LDM
bash scripts/train_ldm.sh DATASET
Prepare CLIP embeddings after the first stage
bash scripts/clip.sh DATASET CKPT_PATH
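The script presumably caches CLIP text embeddings of the story captions for the second stage. A rough sketch of what that extraction looks like with Hugging Face transformers is shown below; the CLIP checkpoint, caption source, and output file are assumptions, and the actual scripts/clip.sh may differ.

```python
# Rough sketch of caching CLIP text embeddings for captions (assumed workflow;
# scripts/clip.sh may use a different CLIP variant and file layout).
import torch
from transformers import CLIPTokenizer, CLIPTextModel

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to(device).eval()

captions = ["Fred is talking to Wilma in the living room."]  # placeholder captions
with torch.no_grad():
    tokens = tokenizer(captions, padding="max_length", truncation=True,
                       max_length=77, return_tensors="pt").to(device)
    embeddings = text_encoder(**tokens).last_hidden_state  # (B, 77, hidden_dim)

torch.save(embeddings.cpu(), "clip_text_embeddings.pt")  # placeholder output path
```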
Second Stage: Align the LLM with Char-LDM; you can choose OPT or Llama2
bash scripts/train_llm_v2.sh DATASET LLM_CKPT 1st_CKPT_PATH
First, prepare the finetuned BLIP2 weights for FlintStonesSV and PororoSV. Finetune BLIP2 yourself, or use our provided finetuned checkpoint captioner.pth
under each dataset folder: [BLIP2 FlintStonesSV], [BLIP2 PororoSV].
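If you use the provided captioner.pth, loading it typically amounts to restoring its state dict into a LAVIS BLIP2 captioning model. The sketch below is only illustrative: the model name/type, checkpoint path, and checkpoint structure are assumptions, so check the evaluation code for the exact loading logic.

```python
# Hedged sketch of loading the finetuned BLIP2 captioner (captioner.pth);
# model name/type, path, and checkpoint keys are assumptions.
import torch
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_opt", model_type="caption_coco_opt2.7b", is_eval=True, device=device
)

ckpt = torch.load("data/flintstones/captioner.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # some checkpoints wrap weights under "model"
model.load_state_dict(state_dict, strict=False)
```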
We also provide pretrained character and background classifiers for evaluation: [FlintStonesSV character] [FlintStonesSV background] [PororoSV character].
Reproduce results using our model checkpoints:
FlintStonesSV: [First Stage] [Second Stage (OPT)] [Second Stage (Llama2)]
PororoSV: [First Stage] [Second Stage (OPT)]
To use Llama2, first download the Llama2 checkpoints from Llama2. Then, in the second-stage checkpoint folder we provide, update the "llm_model" field in both args.json
and model_args.json
to the path of your local Llama2 folder.
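A small helper for the JSON edit described above; only the "llm_model" field and the two file names come from this README, and the folder paths are placeholders you should replace.

```python
# Point the provided second-stage checkpoints at your local Llama2 weights by
# rewriting the "llm_model" field in args.json and model_args.json.
import json
from pathlib import Path

ckpt_dir = Path("path/to/2nd_stage_checkpoint")   # placeholder
llama2_dir = "path/to/your/local/llama2"          # placeholder

for name in ["args.json", "model_args.json"]:
    cfg_path = ckpt_dir / name
    cfg = json.loads(cfg_path.read_text())
    cfg["llm_model"] = llama2_dir
    cfg_path.write_text(json.dumps(cfg, indent=2))
```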
# First Stage Evaluation
bash scripts/eval.sh DATASET 1st_CKPT_PATH
# Second Stage Evaluation
bash scripts/eval_llm.sh DATASET 1st_CKPT_PATH 2nd_CKPT_PATH
- Training code
- Evaluation code
- Finetuned BLIP2 checkpoints for Evaluation
- Model checkpoints
Related repos: BLIP2, FastComposer, GILL, SAM, DAAM
Baseline code is from LDM, Story-LDM, and StoryDALL-E