Guiding a Diffusion Transformer with the Internal Dynamics of Itself (IG)
Official PyTorch Implementation
Xingyu Zhou¹, Qifan Li¹, Xiaobin Hu², Hai Chen³,⁴, Shuhang Gu¹*
¹University of Electronic Science and Technology of China ²National University of Singapore
³Sun Yat-sen University ⁴North China Institute of Computer Systems Engineering
*Corresponding Author
- [2025.12.31] We have released the paper and code of IG.
- 🔥 New SOTA on 256 × 256 ImageNet generation: LightningDiT-XL/1 + IG sets a new state of the art with FID = 1.07 (random sampling FID = 1.19) on ImageNet, and achieves FID = 1.24 (random sampling FID = 1.34) without classifier-free guidance.
- Simple enough, powerful enough: We present Internal Guidance (IG), a simple yet powerful guidance mechanism for Diffusion Transformers. All it requires is one additional intermediate supervision.
- Intermediate supervision: A simple intermediate supervision alone achieves an effect similar to that of specially designed self-supervised learning regularization.
- Improved performance: IG accelerates training and improves generation performance for DiTs, SiTs, and LightningDiT.
- State-of-the-art Performance on ImageNet 256x256 with FID=1.19 (random sampling).

- State-of-the-art Performance on ImageNet 256x256 with FID=1.07 (uniform balanced sampling).
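The highlights above describe IG as needing only "an additional intermediate supervision". This README does not spell out the training loss, so the following is only one plausible reading, stated as an assumption: the usual velocity-prediction objective is applied twice, once to the final output and once to a prediction read out from an intermediate block $k$ (the block selected by `--encoder-depth` in the SiT training command). The symbols $v_\theta^{(k)}$ and $\lambda$ are hypothetical notation introduced here, not taken from the paper.

$$
\mathcal{L}(\theta) \;=\; \underbrace{\mathbb{E}\,\big\| v_\theta(x_t, t) - v \big\|^2}_{\text{final output}} \;+\; \lambda \, \underbrace{\mathbb{E}\,\big\| v_\theta^{(k)}(x_t, t) - v \big\|^2}_{\text{intermediate block } k}
$$

Here $v$ is the regression target of the chosen `--prediction` type and $\lambda \ge 0$ is an assumed weighting coefficient. Please consult the paper for the exact formulation.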

conda create -n IG python=3.12 -y
conda activate IG
pip install -r requirements.txt
Currently, we provide experiments for ImageNet. You can place the data wherever you like and specify its location via the --data-dir argument in the training scripts.
Note that we preprocess the data for faster training. Please refer to the preprocessing guide for SiTs and the README.md for LightningDiTs for detailed guidance.
Here we provide the training code for SiTs and LightningDiTs.
cd SiT
accelerate launch --config_file configs/default.yaml train.py \
--mixed-precision="fp16" \
--seed=0 \
--path-type="linear" \
--prediction="v" \
--resolution=256 \
--batch-size=32 \
--weighting="uniform" \
--model="SiT-XL/2" \
--encoder-depth=8 \
--output-dir="exps" \
--exp-name="sitxl-ab820-t0.2-res256" \
--data-dir=[YOUR_DATA_PATH]
This script automatically creates a folder under exps to save logs, samples, and checkpoints. You can adjust the following options:
- --model: One of [SiT-B/2, SiT-L/2, SiT-XL/2]
- --encoder-depth: The intermediate block whose output receives the auxiliary supervision
- --output-dir: Any directory in which to save checkpoints, samples, and logs
- --exp-name: Any string name (the folder will be created under output-dir)
- --batch-size: The local batch size. By default we use 1 node of 8 GPUs; adjust this value according to your number of GPUs so that the total batch size is 256
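The --batch-size rule above is plain arithmetic. A minimal sketch, assuming one process per GPU and no gradient accumulation (the helper name `local_batch_size` is hypothetical and not part of this repository):

```python
def local_batch_size(total_batch_size: int, num_gpus: int) -> int:
    """Per-GPU batch size that keeps the total batch size fixed.

    Assumes one process per GPU and no gradient accumulation.
    """
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if total_batch_size % num_gpus != 0:
        raise ValueError("total batch size must be divisible by the number of GPUs")
    return total_batch_size // num_gpus


if __name__ == "__main__":
    # Default setup from the training command: 8 GPUs, total batch size 256.
    for gpus in (1, 2, 4, 8):
        print(f"{gpus} GPU(s): --batch-size={local_batch_size(256, gpus)}")
```

With the default 8 GPUs this gives the `--batch-size=32` used in the training command; on 4 GPUs the same total would need `--batch-size=64`.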
cd LightningDiT
bash run_train.sh configs/lightningdit_xl_vavae_f16d32.yaml
This script automatically creates a folder under output to save logs and checkpoints. The available options can be adjusted as described in the original LightningDiT repository.
Here we provide the generation code (random sampling) for SiTs and LightningDiTs to produce the samples for evaluation. The resulting .npz file can be used with the ADM evaluation suite.
You can download our pretrained models here:
| Model | Image Resolution | Epochs | FID-50K | Inception Score |
|---|---|---|---|---|
| SiT-XL/2 + IG | 256x256 | 800 | 1.46 | 265.7 |
| LightningDiT-XL/1 + IG | 256x256 | 680 | 1.19 | 269.0 |
cd SiT
bash gen.sh
Note that there are several options in the gen.sh file that you need to complete:
- SAMPLE_DIR: Base directory in which to save the generated images and the .npz file
- CKPT: Checkpoint path (this can also be a local copy of the checkpoint file we provide above)
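Before handing the generated .npz file to the ADM evaluation suite, it can help to sanity-check it. The sketch below reads only the array header, so the pixel data is never loaded. It assumes the file was written with `numpy.savez` so that the samples sit under the key `arr_0` as a uint8 array of shape (N, H, W, 3); this layout, the function names, and the default of 50,000 samples (taken from the FID-50K column above) are assumptions, not something gen.sh guarantees.

```python
import ast
import struct
import zipfile


def npz_array_info(npz_path, key="arr_0"):
    """Return (shape, dtype_descr) of one array stored in an .npz file.

    An .npz file is a zip archive holding one `<key>.npy` member per array,
    so the small .npy header can be parsed without reading the array data.
    """
    with zipfile.ZipFile(npz_path) as archive:
        with archive.open(f"{key}.npy") as member:
            if member.read(6) != b"\x93NUMPY":
                raise ValueError(f"{key}.npy is not a NumPy array file")
            major, _minor = member.read(2)
            # Format 1.0 stores the header length in 2 bytes, later versions in 4.
            if major == 1:
                (header_len,) = struct.unpack("<H", member.read(2))
            else:
                (header_len,) = struct.unpack("<I", member.read(4))
            header = ast.literal_eval(member.read(header_len).decode("latin1"))
    return tuple(header["shape"]), header["descr"]


def sample_batch_problems(npz_path, expected_count=50_000):
    """Return a list of human-readable problems (empty if none were found)."""
    shape, descr = npz_array_info(npz_path)
    problems = []
    if len(shape) != 4 or shape[-1] != 3:
        problems.append(f"expected shape (N, H, W, 3), got {shape}")
    elif shape[0] != expected_count:
        problems.append(f"expected {expected_count} samples, got {shape[0]}")
    if descr != "|u1":  # '|u1' is how NumPy records uint8 in the header
        problems.append(f"expected uint8 pixels ('|u1'), got {descr!r}")
    return problems
```

Calling `sample_batch_problems("path/to/samples.npz")` on your own output returns an empty list when the file matches the assumed layout, and otherwise a short description of each mismatch.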
cd LightningDiT
bash run_inference.sh configs/lightningdit_xl_vavae_f16d32.yaml
This code may not exactly reproduce the results reported in the paper, owing to possible human error while preparing and cleaning the code for release and to differences in hardware. If you have any difficulty reproducing our findings, please let us know.
This code is mainly built upon the SRA, LightningDiT, and RAE repositories. Thanks for their solid work!
If you find IG useful, please kindly cite our paper:
@article{zhou2025guiding,
title={Guiding a Diffusion Transformer with the Internal Dynamics of Itself},
author={Zhou, Xingyu and Li, Qifan and Hu, Xiaobin and Chen, Hai and Gu, Shuhang},
journal={arXiv preprint arXiv:2512.24176},
year={2025}
}