
Commit 60ab7f0

Webdataset improvements; CogView4 example with The Simpsons webdataset (#305)
* enable online data processing without forcing precomputation
* add tests
* make style
* update
* update
* update
* update
* update
* update
* update
* update
* Update README.md
* duplicate changes in video for wds improvements
* make style
1 parent 41c3c40 commit 60ab7f0

File tree

10 files changed (+485 -46 lines changed)

README.md (+22 -17)
@@ -2,31 +2,23 @@
 
 Finetrainers is a work-in-progress library to support (accessible) training of diffusion models. Our first priority is to support LoRA training for all popular video models in [Diffusers](https://github.com/huggingface/diffusers), and eventually other methods like controlnets, control-loras, distillation, etc.
 
-`cogvideox-factory` was renamed to `finetrainers`. If you're looking to train CogVideoX or Mochi with the legacy training scripts, please refer to [this](./training/README.md) README instead. Everything in the `training/` directory will be eventually moved and supported under `finetrainers`.
-
 <table align="center">
 <tr>
 <td align="center"><video src="https://github.com/user-attachments/assets/aad07161-87cb-4784-9e6b-16d06581e3e5">Your browser does not support the video tag.</video></td>
+<td align="center"><video src="https://github.com/user-attachments/assets/c23d53e2-b422-4084-9156-3fce9fd01dad">Your browser does not support the video tag.</video></td>
+</tr>
+<tr>
+<th align="center">CogVideoX LoRA training as the first iteration of this project</th>
+<th align="center">Replication of PikaEffects</th>
 </tr>
 </table>
 
-## News
-
-- 🔥 **2025-03-03**: Wan T2V support added!
-- 🔥 **2025-03-03**: We have shipped a complete refactor to support multi-backend distributed training, better precomputation handling for big datasets, model specification format (externally usable for training custom models), FSDP & more.
-- 🔥 **2025-02-12**: We have shipped a set of tooling to curate small and high-quality video datasets for fine-tuning. See [video-dataset-scripts](https://github.com/huggingface/video-dataset-scripts) documentation page for details!
-- 🔥 **2025-02-12**: Check out [eisneim/ltx_lora_training_i2v_t2v](https://github.com/eisneim/ltx_lora_training_i2v_t2v/)! It builds off of `finetrainers` to support image to video training for LTX-Video and STG guidance for inference.
-- 🔥 **2025-01-15**: Support for naive FP8 weight-casting training added! This allows training HunyuanVideo in under 24 GB upto specific resolutions.
-- 🔥 **2025-01-13**: Support for T2V full-finetuning added! Thanks to [@ArEnSc](https://github.com/ArEnSc) for taking up the initiative!
-- 🔥 **2025-01-03**: Support for T2V LoRA finetuning of [CogVideoX](https://huggingface.co/docs/diffusers/main/api/pipelines/cogvideox) added!
-- 🔥 **2024-12-20**: Support for T2V LoRA finetuning of [Hunyuan Video](https://huggingface.co/docs/diffusers/main/api/pipelines/hunyuan_video) added! We would like to thank @SHYuanBest for his work on a training script [here](https://github.com/huggingface/diffusers/pull/10254).
-- 🔥 **2024-12-18**: Support for T2V LoRA finetuning of [LTX Video](https://huggingface.co/docs/diffusers/main/api/pipelines/ltx_video) added!
-
 ## Table of Contents
 
 - [Quickstart](#quickstart)
+- [News](#news)
 - [Support Matrix](#support-matrix)
-- [Featured Projects](#featured-projects)
+- [Featured Projects](#featured-projects-)
 - [Acknowledgements](#acknowledgements)
 
 ## Quickstart
@@ -40,7 +32,7 @@ git fetch --all --tags
 git checkout tags/v0.0.1
 ```
 
-Follow the instructions mentioned in the [README](https://github.com/a-r-r-o-w/finetrainers/tree/v0.0.1) for the release tag.
+Follow the instructions mentioned in the [README](https://github.com/a-r-r-o-w/finetrainers/tree/v0.0.1) for the latest stable release.
 
 #### Using the main branch
 
@@ -59,6 +51,19 @@ Please checkout [`docs/models`](./docs/models/) and [`examples/training`](./exam
 > [!IMPORTANT]
 > It is recommended to use Pytorch 2.5.1 or above for training. Previous versions can lead to completely black videos, OOM errors, or other issues and are not tested. For fully reproducible training, please use the same environment as mentioned in [environment.md](./docs/environment.md).
 
+## News
+
+- 🔥 **2025-03-07**: CogView4 support added!
+- 🔥 **2025-03-03**: Wan T2V support added!
+- 🔥 **2025-03-03**: We have shipped a complete refactor to support multi-backend distributed training, better precomputation handling for big datasets, model specification format (externally usable for training custom models), FSDP & more.
+- 🔥 **2025-02-12**: We have shipped a set of tooling to curate small and high-quality video datasets for fine-tuning. See the [video-dataset-scripts](https://github.com/huggingface/video-dataset-scripts) documentation page for details!
+- 🔥 **2025-02-12**: Check out [eisneim/ltx_lora_training_i2v_t2v](https://github.com/eisneim/ltx_lora_training_i2v_t2v/)! It builds off of `finetrainers` to support image-to-video training for LTX-Video and STG guidance for inference.
+- 🔥 **2025-01-15**: Support for naive FP8 weight-casting training added! This allows training HunyuanVideo in under 24 GB up to specific resolutions.
+- 🔥 **2025-01-13**: Support for T2V full-finetuning added! Thanks to [@ArEnSc](https://github.com/ArEnSc) for taking up the initiative!
+- 🔥 **2025-01-03**: Support for T2V LoRA finetuning of [CogVideoX](https://huggingface.co/docs/diffusers/main/api/pipelines/cogvideox) added!
+- 🔥 **2024-12-20**: Support for T2V LoRA finetuning of [Hunyuan Video](https://huggingface.co/docs/diffusers/main/api/pipelines/hunyuan_video) added! We would like to thank @SHYuanBest for his work on a training script [here](https://github.com/huggingface/diffusers/pull/10254).
+- 🔥 **2024-12-18**: Support for T2V LoRA finetuning of [LTX Video](https://huggingface.co/docs/diffusers/main/api/pipelines/ltx_video) added!
+
 ## Support Matrix
 
 > [!NOTE]
@@ -72,7 +77,7 @@ Please checkout [`docs/models`](./docs/models/) and [`examples/training`](./exam
 | [HunyuanVideo](./docs/models/hunyuan_video.md) | Text-to-Video | 32 GB | OOM |
 | [CogVideoX-5b](./docs/models/cogvideox.md) | Text-to-Video | 18 GB | 53 GB |
 | [Wan](./docs/models/wan.md) | Text-to-Video | TODO | TODO |
-| [CogView4](./docs/models/cogview4.md) | Text-to-Video | TODO | TODO |
+| [CogView4](./docs/models/cogview4.md) | Text-to-Image | TODO | TODO |
 
 </div>
 
examples/training/sft/cogview4/the_simpsons/README.md (+5 -0)

@@ -0,0 +1,5 @@
+# CogView4-6B The Simpsons dataset
+
+This example is only an experiment to verify whether webdataset loading and streaming from the HF Hub work as expected. Do not expect meaningful results.
+
+The dataset used for testing is available at [`bigdata-pw/TheSimpsons`](https://huggingface.co/datasets/bigdata-pw/TheSimpsons).
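
Before running the example end to end, it can be worth confirming that the shards actually stream from the Hub. Below is a minimal standalone sketch (not finetrainers code; the `split="train"` name and the exact per-sample columns are assumptions about the repo layout):

```python
# Sanity-check that the webdataset-formatted repo streams from the HF Hub.
# With streaming=True nothing is downloaded up front; samples arrive lazily.
from datasets import load_dataset

# Assumption: the repo exposes a "train" split readable by `datasets`.
ds = load_dataset("bigdata-pw/TheSimpsons", split="train", streaming=True)

for i, sample in enumerate(ds):
    print(sorted(sample.keys()))  # inspect the image/caption columns
    if i == 2:  # a few samples are enough to verify streaming works
        break
```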
examples/training/sft/cogview4/the_simpsons/train.sh (+161 -0)

@@ -0,0 +1,161 @@
+#!/bin/bash
+
+set -e -x
+
+# export TORCH_LOGS="+dynamo,recompiles,graph_breaks"
+# export TORCHDYNAMO_VERBOSE=1
+export WANDB_MODE="offline"
+export NCCL_P2P_DISABLE=1
+export TORCH_NCCL_ENABLE_MONITORING=0
+export FINETRAINERS_LOG_LEVEL="INFO"
+
+# Finetrainers supports multiple backends for distributed training. Select your favourite and benchmark the differences!
+# BACKEND="accelerate"
+BACKEND="ptd"
+
+# In this setting, I'm using all 8 GPUs on an 8-GPU node for training
+NUM_GPUS=8
+CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
+
+# Check the JSON files for the expected JSON format
+TRAINING_DATASET_CONFIG="examples/training/sft/cogview4/the_simpsons/training.json"
+VALIDATION_DATASET_FILE="examples/training/sft/cogview4/the_simpsons/validation.json"
+
+# Depending on how many GPUs you have available, choose your degree of parallelism and technique!
+DDP_1="--parallel_backend $BACKEND --pp_degree 1 --dp_degree 1 --dp_shards 1 --cp_degree 1 --tp_degree 1"
+DDP_2="--parallel_backend $BACKEND --pp_degree 1 --dp_degree 2 --dp_shards 1 --cp_degree 1 --tp_degree 1"
+DDP_4="--parallel_backend $BACKEND --pp_degree 1 --dp_degree 4 --dp_shards 1 --cp_degree 1 --tp_degree 1"
+FSDP_2="--parallel_backend $BACKEND --pp_degree 1 --dp_degree 1 --dp_shards 2 --cp_degree 1 --tp_degree 1"
+FSDP_4="--parallel_backend $BACKEND --pp_degree 1 --dp_degree 1 --dp_shards 4 --cp_degree 1 --tp_degree 1"
+HSDP_2_2="--parallel_backend $BACKEND --pp_degree 1 --dp_degree 2 --dp_shards 2 --cp_degree 1 --tp_degree 1"
+HSDP_4_2="--parallel_backend $BACKEND --pp_degree 1 --dp_degree 4 --dp_shards 2 --cp_degree 1 --tp_degree 1"
+
+# Parallel arguments
+parallel_cmd=(
+  $HSDP_4_2
+)
+
+# Model arguments
+model_cmd=(
+  --model_name "cogview4"
+  --pretrained_model_name_or_path "THUDM/CogView4-6B"
+)
+
+# Dataset arguments
+# Here, the dataset size is about 80 images. In `training.json`, we duplicate the same
+# dataset 3 times for multi-resolution training. This gives us a total of about 240 images,
+# split across the data-parallel workers, so we can precompute all embeddings at once
+# instead of doing it on-the-fly, which would be slower (the ideal use case for not using
+# `--precomputation_once` is when you're training on large datasets)
+dataset_cmd=(
+  --dataset_config $TRAINING_DATASET_CONFIG
+  --dataset_shuffle_buffer_size 32
+)
+
+# Dataloader arguments
+dataloader_cmd=(
+  --dataloader_num_workers 0
+)
+
+# Diffusion arguments
+diffusion_cmd=(
+  --flow_weighting_scheme "logit_normal"
+)
+
+# Training arguments
+# We target just the attention projection layers for LoRA training here.
+# You can modify this as you please and target any layer (regexes are supported)
+training_cmd=(
+  --training_type "lora"
+  --seed 42
+  --batch_size 1
+  --train_steps 5000
+  --rank 128
+  --lora_alpha 128
+  --target_modules "transformer_blocks.*(to_q|to_k|to_v|to_out.0)"
+  --gradient_accumulation_steps 1
+  --gradient_checkpointing
+  --checkpointing_steps 1000
+  --checkpointing_limit 2
+  # --resume_from_checkpoint 3000
+  --enable_slicing
+  --enable_tiling
+)
+
+# Optimizer arguments
+optimizer_cmd=(
+  --optimizer "adamw"
+  --lr 1e-5
+  --lr_scheduler "constant_with_warmup"
+  --lr_warmup_steps 2000
+  --lr_num_cycles 1
+  --beta1 0.9
+  --beta2 0.99
+  --weight_decay 1e-4
+  --epsilon 1e-8
+  --max_grad_norm 1.0
+)
+
+# Validation arguments
+validation_cmd=(
+  --validation_dataset_file "$VALIDATION_DATASET_FILE"
+  --validation_steps 500
+)
+
+# Miscellaneous arguments
+miscellaneous_cmd=(
+  --tracker_name "finetrainers-cogview4"
+  --output_dir "/fsx/aryan/cogview4"
+  --init_timeout 600
+  --nccl_timeout 600
+  --report_to "wandb"
+)
+
+# Execute the training script
+if [ "$BACKEND" == "accelerate" ]; then
+
+  ACCELERATE_CONFIG_FILE=""
+  if [ "$NUM_GPUS" == 1 ]; then
+    ACCELERATE_CONFIG_FILE="accelerate_configs/uncompiled_1.yaml"
+  elif [ "$NUM_GPUS" == 2 ]; then
+    ACCELERATE_CONFIG_FILE="accelerate_configs/uncompiled_2.yaml"
+  elif [ "$NUM_GPUS" == 4 ]; then
+    ACCELERATE_CONFIG_FILE="accelerate_configs/uncompiled_4.yaml"
+  elif [ "$NUM_GPUS" == 8 ]; then
+    ACCELERATE_CONFIG_FILE="accelerate_configs/uncompiled_8.yaml"
+  fi
+
+  accelerate launch --config_file "$ACCELERATE_CONFIG_FILE" --gpu_ids $CUDA_VISIBLE_DEVICES train.py \
+    "${parallel_cmd[@]}" \
+    "${model_cmd[@]}" \
+    "${dataset_cmd[@]}" \
+    "${dataloader_cmd[@]}" \
+    "${diffusion_cmd[@]}" \
+    "${training_cmd[@]}" \
+    "${optimizer_cmd[@]}" \
+    "${validation_cmd[@]}" \
+    "${miscellaneous_cmd[@]}"
+
+elif [ "$BACKEND" == "ptd" ]; then
+
+  export CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES
+
+  torchrun \
+    --standalone \
+    --nnodes=1 \
+    --nproc_per_node=$NUM_GPUS \
+    --rdzv_backend c10d \
+    --rdzv_endpoint="localhost:0" \
+    train.py \
+      "${parallel_cmd[@]}" \
+      "${model_cmd[@]}" \
+      "${dataset_cmd[@]}" \
+      "${dataloader_cmd[@]}" \
+      "${diffusion_cmd[@]}" \
+      "${training_cmd[@]}" \
+      "${optimizer_cmd[@]}" \
+      "${validation_cmd[@]}" \
+      "${miscellaneous_cmd[@]}"
+fi
+
+echo -ne "-------------------- Finished executing script --------------------\n\n"
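
Note that `--target_modules` in the script above is interpreted as a regex over module names. As a quick standalone illustration of how such a pattern selects layers (the module names below are hypothetical, modelled on Diffusers-style transformer blocks; this is not finetrainers' matching code):

```python
import re

# Hypothetical module names shaped like Diffusers transformer blocks.
module_names = [
    "transformer_blocks.0.attn1.to_q",
    "transformer_blocks.0.attn1.to_out.0",
    "transformer_blocks.0.ff.net.0.proj",
    "patch_embed.proj",
]

# Same pattern as --target_modules in the script above.
pattern = re.compile(r"transformer_blocks.*(to_q|to_k|to_v|to_out.0)")

targeted = [name for name in module_names if pattern.search(name)]
print(targeted)
# ['transformer_blocks.0.attn1.to_q', 'transformer_blocks.0.attn1.to_out.0']
# The feed-forward and patch-embedding layers are left untouched.
```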
examples/training/sft/cogview4/the_simpsons/training.json (+24 -0)

@@ -0,0 +1,24 @@
+{
+  "datasets": [
+    {
+      "data_root": "bigdata-pw/TheSimpsons",
+      "dataset_type": "image",
+      "id_token": "SMPSN",
+      "image_resolution_buckets": [
+        [960, 528],
+        [720, 528],
+        [720, 480]
+      ],
+      "reshape_mode": "bicubic",
+      "remove_common_llm_caption_prefixes": true,
+      "caption_options": {
+        "column_names": ["caption.txt", "detailed_caption.txt", "more_detailed_caption.txt"],
+        "weights": {
+          "caption.txt": 0.2,
+          "detailed_caption.txt": 0.6,
+          "more_detailed_caption.txt": 0.2
+        }
+      }
+    }
+  ]
+}
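
The `caption_options` block means that each sample draws its caption from one of the listed columns with the given probabilities, so training sees a mix of short and detailed captions. A minimal sketch of how such weights are typically applied (an illustration of the config's semantics, not the library's exact implementation):

```python
import random

# Mirror of the "caption_options" block above: pick a caption column
# per sample, weighted 20/60/20 toward the detailed caption.
columns = ["caption.txt", "detailed_caption.txt", "more_detailed_caption.txt"]
weights = [0.2, 0.6, 0.2]

rng = random.Random(42)  # seeded for reproducibility

def pick_caption_column() -> str:
    # random.choices draws one item with probability proportional to its weight
    return rng.choices(columns, weights=weights, k=1)[0]

print([pick_caption_column() for _ in range(5)])
```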
