ImageNet Generation (w/o CFG or any other guidance techniques):

- $256\times 256$: ~$20\text{h}$ total training time ($160$ epochs) → $100$ NFE → FID $1.53$
- $512\times 512$: ~$50\text{h}$ total training time ($400$ epochs) → $100$ NFE → FID $1.89$

All training-time measurements are obtained on an 8×H800 GPU cluster.
Recent studies indicate that the denoising process in deep generative diffusion models implicitly learns and memorizes semantic information from the data distribution.
These findings suggest that capturing more complex data distributions requires larger neural networks, leading to a substantial increase in computational demands, which in turn become the primary bottleneck in both training and inference of diffusion models.
To this end, we introduce GMem: A Modular Approach for Ultra-Efficient Generative Models.
Our approach, GMem, decouples memory capacity from the model and implements it as a separate, immutable memory bank that preserves the essential semantic information of the data.
The results are significant: GMem improves training efficiency, sampling efficiency, and generation diversity.
On the one hand, this design reduces the network's reliance on memorizing the complex data distribution, thereby accelerating both training and sampling; on the other hand, the external memory can be manipulated directly, for example to incorporate knowledge from unseen images or to interpolate between memory snippets (see the sections below).
On ImageNet at $256\times 256$, GMem reaches an FID of $1.53$ after roughly $20$ hours of training ($160$ epochs) without any guidance technique.
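Below is a conceptual sketch of this decoupling only, not the repository's actual API: all names, tensor shapes, the snippet dimensionality, and the `model(x, t, snippets)` signature are illustrative assumptions. The point is simply that every sampling step is conditioned on snippets drawn from an external, fixed memory bank rather than on information memorized inside the network.

```python
# Conceptual sketch only -- names, shapes, and the model signature are
# illustrative assumptions, not the repository's actual interface.
import torch

bank = torch.randn(1000, 768)                      # toy stand-in for the memory bank
snippets = bank[torch.randint(len(bank), (4,))]    # draw a few memory snippets

def euler_sample(model, snippets, steps=100):
    """Plain Euler integration of a learned velocity field; `steps` equals the NFE."""
    x = torch.randn(snippets.size(0), 32, 16, 16)  # latent noise (f16d32-style latents)
    ts = torch.linspace(0.0, 1.0, steps + 1)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        v = model(x, t0.expand(x.size(0)), snippets)  # network conditioned on memory
        x = x + (t1 - t0) * v                         # one Euler step
    return x                                          # decode with the VA-VAE tokenizer

toy_model = lambda x, t, c: -x                     # trivial stand-in network
latents = euler_sample(toy_model, snippets)
```

The released bank for ImageNet $256\times 256$ contains 1.28M entries (see the table below), and the reported results use $100$ NFE, which corresponds to `steps=100` in the sketch.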
Python and PyTorch:

- 64-bit Python 3.10 or later.
- PyTorch 2.4.0 or later (earlier versions might work but are not guaranteed).

Additional Python libraries:

- A complete list of required libraries is provided in the `requirements.txt` file.
- To install them, execute the following command:

```bash
pip install -r requirements.txt
```
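Optionally, a minimal check that your environment matches the requirements above (nothing project-specific is assumed):

```python
# Minimal environment check against the requirements listed above.
import sys
import torch

assert sys.version_info >= (3, 10), "64-bit Python 3.10+ is required"
print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```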
To set up evaluation and sampling of images from the pretrained GMem-XL model, follow these steps:

- Pretrained model: Download the pretrained weights for the network and the corresponding memory bank from the provided link on Huggingface (a programmatic download sketch follows this list):

| Backbone | Training Epochs | Dataset | Bank Size | FID | Download |
|---|---|---|---|---|---|
| LightningDiT-XL | 160 | ImageNet | 1.28M | 1.53 | Huggingface |

- VA-VAE Tokenizer: You also need the VA-VAE tokenizer. Download it from the official VA-VAE repository on GitHub.
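If you prefer fetching the files programmatically, `huggingface_hub` can be used as sketched below; the `repo_id` and filenames are placeholders, so substitute the actual values from the Huggingface page linked in the table.

```python
# repo_id and filenames below are placeholders -- replace them with the actual
# entries from the Huggingface link in the table above.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(repo_id="<org>/<gmem-xl-repo>", filename="<gmem_xl_checkpoint>")
bank_path = hf_hub_download(repo_id="<org>/<gmem-xl-repo>", filename="<memory_bank_file>")
print(ckpt_path, bank_path)
```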
Once you've downloaded the necessary pretrained models and tokenizers, update the following configuration files with the correct paths (a quick sanity check is sketched after this list):

- For the GMem model (`configs/gmem_sde_xl.yaml`):
  - Set `ckpt_path` to the location where you saved the pretrained weights.
  - Set `GMem:bank_path` to the location of the downloaded memory bank.
  - Set `data:fid_reference_file` to the path of the reference file (`VIRTUAL_imagenet256_labeled.npz`, see ADM for details) used for FID calculation.
- For the VA-VAE Tokenizer (`tokenizer/configs/vavae_f16d32.yaml`):
  - Set `ckpt_path` to the location of the downloaded tokenizer checkpoint.
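As a sanity check after editing, you can load the YAML and verify that the referenced files exist. The nesting below is an assumption inferred from the key names above (`ckpt_path`, `GMem:bank_path`, `data:fid_reference_file`); adjust the lookups if the actual file differs.

```python
# Assumed nesting inferred from the key names above; adjust if the YAML differs.
import os
import yaml

with open("configs/gmem_sde_xl.yaml") as f:
    cfg = yaml.safe_load(f)

paths = [
    cfg["ckpt_path"],                   # pretrained GMem-XL weights
    cfg["GMem"]["bank_path"],           # downloaded memory bank
    cfg["data"]["fid_reference_file"],  # VIRTUAL_imagenet256_labeled.npz
]
for p in paths:
    print("OK     " if os.path.exists(p) else "MISSING", p)
```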
- Use the provided script to sample images and automatically calculate the FID score:

```bash
bash scripts/evaluation_gmem_xl.sh
```
To incorporate external knowledge using previously unseen images, follow the steps below:

- Store the new images in the `assets/novel_images` directory.
- Execute the script to generate new images:

```bash
bash scripts/external_knowledge_generation.sh
```
To generate new memory snippets by interpolating between two existing images, follow these steps:

- Place the source images in the `assets/interpolation/lerp/a` and `assets/interpolation/lerp/b` directories, ensuring both images have identical filenames.
- Run the script to create interpolated images (a conceptual sketch of the interpolation follows below):

```bash
bash scripts/internal_knowledge_generation.sh
```
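Conceptually, the interpolation is a simple lerp between the memory representations of the paired images; the sketch below illustrates that idea with hypothetical snippet vectors (the dimensionality and the way snippets are derived from images are assumptions, not the repository's exact implementation).

```python
# Minimal sketch of linear interpolation (lerp) between two memory snippets.
# snippet_a / snippet_b stand in for representations derived from the paired
# images in assets/interpolation/lerp/a and assets/interpolation/lerp/b.
import torch

snippet_a = torch.randn(768)   # illustrative dimensionality
snippet_b = torch.randn(768)

alphas = torch.linspace(0.0, 1.0, 8)  # 8 interpolation steps from a to b
snippets = torch.stack([(1 - a) * snippet_a + a * snippet_b for a in alphas])
# Each row of `snippets` can then condition generation like an ordinary bank entry.
```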
- Set up VA-VAE: Follow the instructions in the Evaluation and VA-VAE tutorial to properly set up and configure the VA-VAE model.
- Extract latents: Once VA-VAE is set up, run the following script to extract the latents for all ImageNet images:

```bash
bash scripts/preprocessing.sh
```

This script will process all ImageNet images and store their corresponding latents (a sketch of such a loop follows this list).
- Modify the configuration: After extracting the latents, update `data:data_path` in the `configs/gmem_sde_xl.yaml` file. Set this path to the location where the extracted latents are stored, so that GMem-XL can access the processed latents during training.
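For orientation only, a latent-extraction pass typically looks like the sketch below; `scripts/preprocessing.sh` remains the supported path, and `encode_fn` is a stand-in for the VA-VAE tokenizer's encoding call, whose exact interface this sketch does not assume.

```python
# Rough sketch of a latent-extraction loop; scripts/preprocessing.sh is the
# supported path. `encode_fn` is a stand-in for the VA-VAE tokenizer's encoder.
import os
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def extract_latents(encode_fn, imagenet_dir, out_dir, batch_size=64):
    tfm = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(256),
        transforms.ToTensor(),
        transforms.Normalize([0.5] * 3, [0.5] * 3),  # scale images to [-1, 1]
    ])
    loader = DataLoader(datasets.ImageFolder(imagenet_dir, tfm),
                        batch_size=batch_size, num_workers=8)
    os.makedirs(out_dir, exist_ok=True)
    with torch.no_grad():
        for i, (imgs, labels) in enumerate(loader):
            latents = encode_fn(imgs)  # move imgs to GPU here if the encoder needs it
            torch.save({"latents": latents.cpu(), "labels": labels},
                       os.path.join(out_dir, f"batch_{i:06d}.pt"))
```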
With the data prepared and the latents extracted, you can train the GMem-XL model by simply running the following script:

```bash
bash scripts/train_gmem_xl.sh
```
If you find this repository helpful for your project, please consider citing our work:
```bibtex
@article{tang2024generative,
  title={Generative Modeling with Explicit Memory},
  author={Tang, Yi and Sun, Peng and Cheng, Zhenglin and Lin, Tao},
  journal={arXiv preprint arXiv:2412.08781},
  year={2024}
}
```
This code is mainly built upon the VA-VAE, SiT, edm2, and REPA repositories.