Inspired by cortical modularization and hippocampal pattern completion, we propose a self-supervised controllable generation (SCG) framework that achieves pattern completion and generates images.
Learning from Pattern Completion: Self-supervised Controllable Generation (NeurIPS 2024) [Arxiv] [OpenReview]
Zhiqiang Chen*, Guofan Fan*, Jinying Gao*, Lei Ma, Bo Lei, Tiejun Huang, Shan Yu
- 🍾 Sep, 2024: SCG is accepted by NeurIPS 2024, congratulations! We will release the official version soon; please check back on the homepage.
- 🎉 Apr, 2024: For those interested, we have released pre-release code on Gitee.
Our original motivation is to propose and validate a novel self-supervised pipeline that achieves broad generalization.
This framework comprises two components: a modular autoencoder and a conditional generator. Given the extensive research on conditional generation, we leverage the existing, mature ControlNet for this aspect. Our core contribution lies in designing a modular autoencoder based on a proposed equivariance constraint, successfully enabling the network to spontaneously develop relatively independent and highly complementary modular features. These features are crucial for subsequent conditional generation.
Since our work is based on ControlNet, please refer to the ControlNet docs to set up the environment.
```shell
python tutorial_train.py
```
You can modify the model and dataset in `tutorial_train.py`:

```python
modelarch_path = './models/cldm_v15.yaml'
resume_path = './image_log/checkpoint_deconv_down2_3/last.ckpt'
logger_path = 'shuimo_deconv2_3_test'
dataset_name = 'MyDatasetShuimo'
```
To select a different hypercolumn, refer to `./models/cldm_v15.yaml`:

```yaml
hyperconfig:
  target: cldm.cldm.HyperColumnLGN
  params:
    hypercond: [0]
    size: 512
```
During the inference phase, we use a single A100 GPU; generating one image requires about 11 GB of GPU memory.
We propose an equivariance constraint for our modular autoencoder. The equivariance loss L_equ is the core of the equivariance constraint, primarily serving to increase independence between modules and correlation (or symmetry) within modules. The symmetry loss L_sym further enhances the correlation (or symmetry) of features within modules and suppresses the emergence of multiple unrelated features within the same module. The learned features are visualized below:
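The two objectives above can be illustrated with a minimal correlation-based sketch: penalize correlation between different modules (independence, L_equ) and reward correlation among channels of the same module (symmetry, L_sym). This is only an illustrative toy version, not the paper's actual formulation; the function name and channel-wise grouping are assumptions:

```python
import torch
import torch.nn.functional as F


def modular_losses(z: torch.Tensor, num_modules: int):
    """Toy sketch of inter-module independence and intra-module symmetry terms.

    z: feature map of shape (B, C, H, W); channels are split into
    `num_modules` equal groups. Assumes C is divisible by num_modules
    and each module has more than one channel.
    """
    b, c, h, w = z.shape
    k = c // num_modules
    zf = z.reshape(b, num_modules, k, h * w)
    # Zero-mean, unit-norm per channel so dot products are correlations.
    zf = F.normalize(zf - zf.mean(dim=-1, keepdim=True), dim=-1)

    # Within-module channel correlations: pushed toward 1 (symmetry term).
    intra = torch.einsum('bmif,bmjf->bmij', zf, zf)  # (B, M, k, k)
    off_k = ~torch.eye(k, dtype=torch.bool)
    l_sym = (1.0 - intra)[..., off_k].mean()

    # Between-module correlations of module-mean features: pushed toward 0.
    mod = F.normalize(zf.mean(dim=2), dim=-1)        # (B, M, H*W)
    inter = torch.einsum('bif,bjf->bij', mod, mod)   # (B, M, M)
    off_m = ~torch.eye(num_modules, dtype=torch.bool)
    l_equ = inter[:, off_m].pow(2).mean()
    return l_equ, l_sym
```

Minimizing `l_equ + l_sym` on this toy objective decorrelates modules while aligning features inside each module, which is the qualitative behavior described above.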
We compress the raw images to improve the web experience. For raw images and more results, please refer to our paper.
The models are trained only on the COCO dataset and tested on multiple tasks.
The first row shows the original image and the prompt; the second row shows the conditions from multiple hypercolumns and Canny; the last row shows the generated images.
We propose SCG and experimentally demonstrate that various abilities emerge spontaneously (i.e., zero-shot generalization), including super-resolution, dehazing, saturation and contrast manipulation, as well as conditional generation based on diverse styles such as oil paintings, ink paintings, ancient graffiti, sketches, and LineArt. Furthermore, SCG has two significant potentials: (1) thanks to its self-supervision, SCG can further scale up its data and models to benefit from the scaling law, enhancing its basic capabilities; (2) SCG can subsequently be fine-tuned for specific tasks, improving performance on them. These properties suggest that SCG could become a foundation model for controllable generation.
If you find our work helpful for your research, please consider citing our paper:
```bibtex
@article{scg,
  title={Learning from Pattern Completion: Self-supervised Controllable Generation},
  author={Chen, Zhiqiang and Fan, Guofan and Gao, Jinying and Ma, Lei and Lei, Bo and Huang, Tiejun and Yu, Shan},
  journal={arXiv preprint arXiv:2409.18694},
  year={2024}
}
```
Our code is based on ControlNet. Thanks for their wonderful work!
SCG is licensed under the Apache License. See the LICENSE file for more details.