
Guiding a Diffusion Transformer with the Internal Dynamics of Itself (IG)
Official PyTorch Implementation

Guiding a Diffusion Transformer with the Internal Dynamics of Itself
Xingyu Zhou¹, Qifan Li¹, Xiaobin Hu², Hai Chen³,⁴, Shuhang Gu¹*
¹University of Electronic Science and Technology of China  ²National University of Singapore
³Sun Yat-sen University  ⁴North China Institute of Computer Systems Engineering
*Corresponding Author

LightningDiT+IG samples

💥 News

  • [2025.12.31] We have released the paper and code of IG.

🌟 Highlight

  • 🔥New SOTA on 256 × 256 ImageNet generation: LightningDiT-XL/1 + IG sets a new state of the art with FID = 1.07 (random sampling FID = 1.19) on ImageNet, while achieving FID = 1.24 (random sampling FID = 1.34) without classifier-free guidance.

  • Simple enough, powerful enough: We present Internal Guidance (IG), a simple yet powerful guidance mechanism for Diffusion Transformers. All it requires is an additional intermediate supervision during training.

  • Intermediate supervision: A simple intermediate supervision alone achieves an effect similar to specially designed self-supervised learning regularizations.

  • Improved Performance: IG accelerates training and improves generation performance for DiTs, SiTs and LightningDiT.

📝 Results

  • State-of-the-art performance on ImageNet 256×256 with FID = 1.19 (random sampling).
  • State-of-the-art performance on ImageNet 256×256 with FID = 1.07 (uniform balanced sampling).

🏡 Environment Setup

conda create -n IG python=3.12 -y
conda activate IG
pip install -r requirements.txt
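
If you want to confirm that the environment ends up with a CUDA-enabled PyTorch build before launching any jobs, a quick check (just a convenience, not part of the repo) is:

import torch
print(torch.__version__)          # version installed from requirements.txt
print(torch.cuda.is_available())  # should print True on a GPU machine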

📜 Dataset Preparation

Currently, we provide experiments for ImageNet. Place the dataset wherever you like and point the training scripts to it via the --data-dir argument.
Note that we preprocess the data for faster training. Please refer to the preprocessing guide for SiTs and the README.md for LightningDiTs for detailed guidance.

🔥 Training

Here we provide the training code for SiTs and LightningDiTs.

Training with SiT + IG
cd SiT
accelerate launch --config_file configs/default.yaml train.py \
  --mixed-precision="fp16" \
  --seed=0 \
  --path-type="linear" \
  --prediction="v" \
  --resolution=256 \
  --batch-size=32 \
  --weighting="uniform" \
  --model="SiT-XL/2" \
  --encoder-depth=8 \
  --output-dir="exps" \
  --exp-name="sitxl-ab820-t0.2-res256" \
  --data-dir=[YOUR_DATA_PATH]

This script will automatically create a folder under exps to save logs, samples, and checkpoints. You can adjust the following options:

  • --model: Choose from [SiT-B/2, SiT-L/2, SiT-XL/2]
  • --encoder-depth: The intermediate transformer block whose output receives the auxiliary supervision (see the sketch after this list)
  • --output-dir: Any directory that you want to save checkpoints, samples, and logs
  • --exp-name: Any string name (the folder will be created under output-dir)
  • --batch-size: The per-GPU batch size (by default we use 1 node with 8 GPUs). Adjust this value to your GPU count so that the global batch size is 256 (for example, 8 GPUs × 32 = 256, or 4 GPUs × 64 = 256).
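
For intuition only, below is a minimal, hypothetical sketch of how an auxiliary intermediate-supervision loss at the block selected by --encoder-depth could be wired into a diffusion training step. The interface names (return_hidden, aux_head, lambda_aux) are illustrative, not this repository's API; please refer to SiT/train.py and the paper for the exact formulation.

import torch.nn.functional as F

def training_step(model, x_t, t, y, target, encoder_depth=8, lambda_aux=1.0):
    # Hypothetical interface: run the transformer once and also return the
    # hidden states of every block (not the repo's actual API).
    pred, hidden_states = model(x_t, t, y, return_hidden=True)

    # Standard diffusion objective (e.g., v-prediction for SiT).
    loss_main = F.mse_loss(pred, target)

    # Auxiliary supervision on the intermediate block selected by --encoder-depth.
    # Here we simply reuse the main target through a hypothetical projection head;
    # the actual target and head follow the paper.
    h_mid = hidden_states[encoder_depth - 1]
    loss_aux = F.mse_loss(model.aux_head(h_mid), target)

    return loss_main + lambda_aux * loss_aux
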
Training with LightningDiT + IG
cd LightningDiT
bash run_train.sh configs/lightningdit_xl_vavae_f16d32.yaml

This script will automatically create a folder under output to save logs and checkpoints. Options can be adjusted in the config file, following the original LightningDiT.

🌠 Evaluation

Here we provide the sampling code (random sampling) for SiTs and LightningDiTs to generate samples for evaluation; the resulting .npz file can be used with the ADM evaluation suite.

You can download our pretrained models here:

Model                   | Image Resolution | Epochs | FID-50K | Inception Score
SiT-XL/2 + IG           | 256×256          | 800    | 1.46    | 265.7
LightningDiT-XL/1 + IG  | 256×256          | 680    | 1.19    | 269.0
Sampling with SiT + IG
cd SiT
bash gen.sh

Note that there are several options in the gen.sh file that you need to fill in:

  • SAMPLE_DIR: Base directory to save the generated images and .npz file
  • CKPT: Checkpoint path (this can also point to a local copy of the pretrained checkpoint provided above)
Sampling with LightningDiT + IG
cd LightningDiT
bash run_inference.sh configs/lightningdit_xl_vavae_f16d32.yaml
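
Before running the ADM evaluation suite on a generated .npz, it can help to sanity-check the sample batch. The snippet below assumes the usual ADM-suite layout (an arr_0 array of uint8 images in NHWC order); the file name is a placeholder for your own output.

import numpy as np

samples = np.load("samples.npz")["arr_0"]   # placeholder path to the generated batch
print(samples.shape, samples.dtype)          # expect (50000, 256, 256, 3) uint8
assert samples.dtype == np.uint8
assert samples.shape[1:] == (256, 256, 3)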

📣 Note

This code may not exactly reproduce the results reported in the paper, due to potential human error during the preparation and cleaning of the code for release, as well as differences in hardware. If you encounter any difficulties reproducing our findings, please don't hesitate to let us know.

🤝🏻 Acknowledgement

This code is mainly built upon the SRA, LightningDiT, and RAE repositories. Thanks for their solid work!

🌺 Citation

If you find IG useful, please kindly cite our paper:

@article{zhou2025guiding,
  title={Guiding a Diffusion Transformer with the Internal Dynamics of Itself},
  author={Zhou, Xingyu and Li, Qifan and Hu, Xiaobin and Chen, Hai and Gu, Shuhang},
  journal={arXiv preprint arXiv:2512.24176},
  year={2025}
}
