Jaihoon Kim*, Juil Koo*, Kyeongmin Yeo*, Minhyuk Sung (* Denotes equal contribution)
This repository contains the official implementation of SyncTweedies. SyncTweedies can be applied to various downstream applications, including ambiguous image generation, arbitrary-sized image generation, 360° panorama generation, and texturing of 3D meshes and 3D Gaussians. More results can be found at our project webpage.
We introduce a general diffusion synchronization framework for generating diverse visual content, including ambiguous images, panorama images, 3D mesh textures, and 3D Gaussian splats textures, using a pretrained image diffusion model. We first present an analysis of various scenarios for synchronizing multiple diffusion processes through a canonical space. Based on the analysis, we introduce a novel synchronized diffusion method, SyncTweedies, which averages the outputs of Tweedie’s formula while conducting denoising in multiple instance spaces. Compared to previous work that achieves synchronization through finetuning, SyncTweedies is a zero-shot method that does not require any finetuning, preserving the rich prior of diffusion models trained on Internet-scale image datasets without overfitting to specific domains. We verify that SyncTweedies offers the broadest applicability to diverse applications and superior performance compared to the previous state-of-the-art for each application.
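The core idea in the abstract (averaging the outputs of Tweedie's formula across instance spaces through a canonical space) can be illustrated with a toy NumPy sketch. This is not the repository's implementation: the function names, the two toy views (identity and horizontal flip), the noise-schedule values, and the random stand-in for the noise-prediction network are all assumptions made for illustration.

```python
import numpy as np

def tweedie_x0(x_t, eps_pred, alpha_bar_t):
    """Tweedie's formula: estimate of the clean sample x_0 from a noisy x_t."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)

def synchronize_x0(x0_preds, unprojects, projects):
    """Average the per-view x_0 estimates in the canonical space, then
    project the average back into every instance space."""
    canonical = np.mean([g(x0) for g, x0 in zip(unprojects, x0_preds)], axis=0)
    return [f(canonical) for f in projects]

def ddim_step(x_t, x0, alpha_bar_t, alpha_bar_prev):
    """Deterministic DDIM update (eta = 0) using the synchronized x_0."""
    eps = (x_t - np.sqrt(alpha_bar_t) * x0) / np.sqrt(1.0 - alpha_bar_t)
    return np.sqrt(alpha_bar_prev) * x0 + np.sqrt(1.0 - alpha_bar_prev) * eps

# Toy setup (assumption): two views of a 4x4 canonical image.
projects = [lambda z: z, lambda z: z[:, ::-1]]    # canonical -> instance
unprojects = [lambda x: x, lambda x: x[:, ::-1]]  # instance -> canonical

rng = np.random.default_rng(0)
alpha_bar_t, alpha_bar_prev = 0.5, 0.8            # hypothetical schedule values
x_ts = [rng.standard_normal((4, 4)) for _ in projects]
# Stand-in for the pretrained noise predictor (assumption: random output).
eps_preds = [rng.standard_normal((4, 4)) for _ in projects]

x0_preds = [tweedie_x0(x, e, alpha_bar_t) for x, e in zip(x_ts, eps_preds)]
x0_synced = synchronize_x0(x0_preds, unprojects, projects)
x_prevs = [ddim_step(x, x0, alpha_bar_t, alpha_bar_prev)
           for x, x0 in zip(x_ts, x0_synced)]
```

After synchronization, the per-view `x_0` estimates agree once mapped back to the canonical space, while each noisy sample is still denoised in its own instance space.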
- Python 3.8
- CUDA 11.7
- PyTorch 2.0.0
git clone https://github.com/KAIST-Visual-AI-Group/SyncTweedies
conda env create -f environment.yml
pip install git+https://github.com/openai/CLIP.git
pip install -e .
3D Mesh Texturing (PyTorch3D)
pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py38_cu117_pyt200/download.html
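The wheel directory in the URL above (`py38_cu117_pyt200`) encodes the Python, CUDA, and PyTorch versions listed in the requirements. If your environment differs, a small helper such as the one below may help derive the matching tag. The helper names are hypothetical, and a prebuilt wheel is not guaranteed to exist for every version combination.

```python
def pytorch3d_wheel_tag(python_version: str, cuda_version: str, torch_version: str) -> str:
    """Build a tag such as 'py38_cu117_pyt200' from version strings."""
    py = "".join(python_version.split(".")[:2])         # "3.8"    -> "38"
    cu = cuda_version.replace(".", "")                  # "11.7"   -> "117"
    pt = torch_version.split("+")[0].replace(".", "")   # "2.0.0"  -> "200"
    return f"py{py}_cu{cu}_pyt{pt}"

def pytorch3d_wheel_url(tag: str) -> str:
    """Wheel index URL following the pattern used in the install command."""
    return f"https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/{tag}/download.html"

tag = pytorch3d_wheel_tag("3.8", "11.7", "2.0.0")
print(tag)                       # py38_cu117_pyt200
print(pytorch3d_wheel_url(tag))
```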
3D Gaussians Texturing (Differentiable 3D Gaussian Rasterizer - gsplat)
cd synctweedies/renderer/gaussian/gsplat
python setup.py install
pip install .
Use 3D mesh and prompt pairs from Text2Tex and TEXTure. Text2Tex uses a subset of the Objaverse dataset.
- 3D mesh texturing - data/mesh/turtle.obj (TEXTure), data/mesh/clutch_bag.obj (Text2Tex)
For 3D mesh texture editing, use the generated 3D mesh from Luma AI.
- 3D mesh texture editing (SDEdit) - data/mesh/sdedit/mesh.obj (Luma AI)
Use depth maps from 360MonoDepth to generate 360° panorama images.
- 360° panorama generation - data/panorama
Download the Synthetic NeRF dataset and reconstruct 3D scenes using either the 3D Gaussian Splatting framework or gsplat.
Use the reconstructed 3D scene for texturing 3D Gaussians.
- 3D Gaussians texturing - data/gaussians/chair and data/gaussians/chair.ply.
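Before running the applications, it can be useful to confirm that the data is laid out as described above. The following is a minimal sketch that only checks for the paths named in this section; the function name and the list constant are hypothetical and not part of the repository.

```python
from pathlib import Path

# Paths taken from the data section above, relative to the repository root.
EXPECTED_PATHS = [
    "data/mesh/turtle.obj",
    "data/mesh/sdedit/mesh.obj",
    "data/panorama",
    "data/gaussians/chair",
    "data/gaussians/chair.ply",
]

def missing_data(root="."):
    """Return the expected files or directories that do not exist under root."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]

if __name__ == "__main__":
    for path in missing_data():
        print(f"missing: {path}")
```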
Please run the commands below to run each application.
Ambiguous Image
1-to-1 Projection
python main.py --app ambiguous_image --case_num 2 --tag ambiguous_image --save_dir_now
1-to-n Projection
python main.py --app ambiguous_image --case_num 2 --tag ambiguous_image --save_dir_now --views_names identity inner_rotate
n-to-1 Projection
python main.py --app ambiguous_image --case_num 2 --tag ambiguous_image --save_dir_now --optimize_inverse_mapping
--prompts
Text prompts to guide the generation process. (Provide a prompt per view)
--save_top_dir
Directory to save intermediate/final outputs.
--tag
Tag output directory.
--save_dir_now
Save output directory with current time.
--case_num
Denoising case num. Refer to the main paper for other cases. (Case 2 - SyncTweedies)
--seed
Random seed.
--views_names
View transformation applied to each denoising process.
--rotate_angle
Rotation angle for rotation transformations.
--initialize_xt_from_zt
Initialize the initial random noise by projecting from the canonical space.
--optimize_inverse_mapping
Use optimization for projection operation. (n-to-1 projection)
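To make the idea of a view transformation concrete, here is a toy sketch of an inner-circle rotation together with its inverse. It is an illustration only and not the repository's `inner_rotate`: the function names, the circle radius, and the restriction to square images and 90° multiples (so that the mapping stays exactly 1-to-1 on the pixel grid) are assumptions.

```python
import numpy as np

def inner_circle_mask(size, radius_frac=0.35):
    """Boolean mask of a centered circle inside a size x size image."""
    c = (size - 1) / 2.0
    yy, xx = np.mgrid[0:size, 0:size]
    return (yy - c) ** 2 + (xx - c) ** 2 <= (radius_frac * size) ** 2

def inner_rotate(image, k=1, radius_frac=0.35):
    """Rotate only the inner circle by k * 90 degrees; keep the rest as is."""
    mask = inner_circle_mask(image.shape[0], radius_frac)
    return np.where(mask, np.rot90(image, k), image)

def inner_rotate_inverse(view, k=1, radius_frac=0.35):
    """Undo inner_rotate by rotating the inner circle back."""
    return inner_rotate(view, -k, radius_frac)

canonical = np.arange(64.0).reshape(8, 8)
view = inner_rotate(canonical)          # canonical -> instance space
restored = inner_rotate_inverse(view)   # instance -> canonical space
```

Because the circular mask is unchanged by 90° rotations, projecting and unprojecting recovers the canonical image exactly, which is the property a 1-to-1 projection relies on.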
Arbitrary-sized Image
python main.py --app wide_image --prompt "A photo of a mountain range at twilight" --save_top_dir ./output --save_dir_now --tag wide_image --case_num 2 --seed 0 --sampling_method ddim --num_inference_steps 50 --panorama_height 512 --panorama_width 3072 --mvd_end 1.0 --initialize_xt_from_zt
--prompt
Text prompts to guide the generation process.
--save_top_dir
Directory to save intermediate/final outputs.
--tag
Tag output directory.
--save_dir_now
Save output directory with current time.
--case_num
Denoising case num. Refer to the main paper for other cases. (Case 2 - SyncTweedies)
--seed
Random seed.
--sampling_method
Denoising sampling method.
--num_inference_steps
Number of sampling steps.
--panorama_height
The height of the image to generate.
--panorama_width
The width of the image to generate.
--mvd_end
Step to stop the synchronization. (1.0 - Synchronize all timesteps, 0.0 - No synchronization)
--initialize_xt_from_zt
Initialize the initial random noise by projecting from the canonical space.
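For a wide canvas such as the 512 x 3072 example above, the instance spaces can be thought of as overlapping windows cropped from the canonical canvas, with overlapping predictions averaged back together. The sketch below illustrates that bookkeeping in NumPy. It is an assumption-laden toy: the function names and the stride are hypothetical, it works at pixel rather than latent resolution, and it assumes the stride does not exceed the window width so every column is covered.

```python
import numpy as np

def window_starts(canvas_width, window_width, stride):
    """Left edges of sliding windows that together cover the full canvas."""
    starts = list(range(0, canvas_width - window_width + 1, stride))
    if starts[-1] != canvas_width - window_width:
        starts.append(canvas_width - window_width)  # make the last window flush
    return starts

def average_windows(windows, starts, canvas_shape):
    """Paste windows onto the canvas and average wherever they overlap."""
    total = np.zeros(canvas_shape)
    count = np.zeros(canvas_shape)
    for win, s in zip(windows, starts):
        w = win.shape[1]
        total[:, s:s + w] += win
        count[:, s:s + w] += 1
    return total / count

canvas_h, canvas_w, window_w, stride = 512, 3072, 512, 256  # stride is hypothetical
starts = window_starts(canvas_w, window_w, stride)

rng = np.random.default_rng(0)
canvas = rng.standard_normal((canvas_h, canvas_w))
windows = [canvas[:, s:s + window_w] for s in starts]        # canonical -> instances
merged = average_windows(windows, starts, canvas.shape)      # instances -> canonical
```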
3D Mesh Texturing
python main.py --app mesh --prompt "A hand carved wood turtle" --save_top_dir ./output --tag mesh --save_dir_now --case_num 2 --mesh ./data/mesh/turtle.obj --seed 0 --sampling_method ddim --initialize_xt_from_zt
--prompt
Text prompts to guide the generation process.
--save_top_dir
Directory to save intermediate/final outputs.
--tag
Tag output directory.
--save_dir_now
Save output directory with current time.
--case_num
Denoising case num. Refer to the main paper for other cases. (Case 2 - SyncTweedies)
--mesh
Path to input 3D mesh.
--seed
Random seed.
--sampling_method
Denoising sampling method.
--initialize_xt_from_zt
Initialize the initial random noise by projecting from the canonical space.
--steps
Number of sampling steps.
python main.py --app mesh --prompt "lantern" --save_top_dir ./output --tag mesh --save_dir_now --case_num 2 --mesh ./data/mesh/sdedit/mesh.obj --seed 0 --sampling_method ddim --initialize_xt_from_zt --sdedit --sdedit_prompt "A Chinese style lantern" --sdedit_timestep 0.2
--sdedit
Enable 3D mesh texture editing (SDEdit).
--sdedit_prompt
Target editing prompt. This overrides the original prompt.
--sdedit_timestep
Timestep to add noise. (1.0 - x_0, 0.0 - x_T)
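SDEdit-style editing starts from an existing texture, adds noise up to an intermediate timestep, and then denoises with the target prompt. The sketch below shows only the noising step. The linear beta schedule, the 1000 training steps, the function names, and in particular the mapping from `--sdedit_timestep` to a schedule index are assumptions based on the "(1.0 - x_0, 0.0 - x_T)" note; the repository may use a different convention.

```python
import numpy as np

def alpha_bars(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)

def sdedit_start_index(sdedit_timestep, num_train_steps=1000):
    """Assumed mapping: 1.0 -> index 0 (near x_0), 0.0 -> last index (x_T)."""
    return int(round((1.0 - sdedit_timestep) * (num_train_steps - 1)))

def add_noise(x0, t_index, abar, rng):
    """Forward diffusion: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar[t_index]) * x0 + np.sqrt(1.0 - abar[t_index]) * eps

abar = alpha_bars()
rng = np.random.default_rng(0)
texture = rng.uniform(-1.0, 1.0, size=(3, 64, 64))  # placeholder for a rendered view
t_start = sdedit_start_index(0.2)                    # value used in the command above
noisy = add_noise(texture, t_start, abar, rng)       # denoising would begin from here
```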
360° Panorama
python main.py --app panorama --tag panorama --save_top_dir ./output --save_dir_now --prompt "An old looking library" --depth_data_path ./data/panorama/cf726b6c0144425282245b34fc4efdca_depth.dpt --case_num 2 --average_rgb --initialize_xt_from_zt --model controlnet
--prompt
Text prompts to guide the generation process.
--save_top_dir
Directory to save intermediate/final outputs.
--tag
Tag output directory.
--save_dir_now
Save output directory with current time.
--depth_data_path
Path to depth map image.
--case_num
Denoising case num. Refer to the main paper for other cases. (Case 2 - SyncTweedies)
--mesh
Path to input 3D mesh.
--seed
Random seed.
--sampling_method
Denoising sampling method.
--initialize_xt_from_zt
Initialize the initial random noise by projecting from the canonical space.
--steps
Number of sampling steps.
--canonical_rgb_h
Resolution (height) of the RGB canonical space.
--canonical_rgb_w
Resolution (width) of the RGB canonical space.
--canonical_latent_h
Resolution (height) of the latent canonical space.
--canonical_latent_w
Resolution (width) of the latent canonical space.
--instance_latent_size
Resolution of the latent instance space.
--instance_rgb_size
Resolution of the RGB instance space.
--theta_range
Azimuthal range (0-360)
--theta_interval
Interval of the azimuth.
--FOV
Field of view of each instance-space view.
--average_rgb
Perform averaging in the RGB domain (Only valid for Case 2 and Case 5).
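The relationship between the azimuth options and the canonical panorama can be sketched as follows: views are placed at regular azimuths, and each viewing direction maps to a pixel of an equirectangular canvas. The coordinate convention (+z forward, +y up), the function names, the 45° interval, and the 1024 x 2048 canvas size are assumptions for illustration and may not match the repository.

```python
import numpy as np

def view_azimuths(theta_range=(0, 360), theta_interval=45):
    """Azimuth (in degrees) of each perspective view."""
    return list(range(theta_range[0], theta_range[1], theta_interval))

def azimuth_to_direction(azimuth_deg):
    """Unit viewing direction on the horizon for a given azimuth."""
    a = np.deg2rad(azimuth_deg)
    return np.array([np.sin(a), 0.0, np.cos(a)])

def direction_to_equirect(direction, height, width):
    """Map a 3D direction to (u, v) pixel coordinates on the panorama."""
    x, y, z = direction / np.linalg.norm(direction)
    theta = np.arctan2(x, z)   # azimuth in (-pi, pi], 0 = forward
    phi = np.arcsin(y)         # elevation in [-pi/2, pi/2]
    u = (theta / (2.0 * np.pi) + 0.5) * width
    v = (0.5 - phi / np.pi) * height
    return u, v

canon_h, canon_w = 1024, 2048  # hypothetical canonical RGB resolution
for azimuth in view_azimuths():
    u, v = direction_to_equirect(azimuth_to_direction(azimuth), canon_h, canon_w)
    print(f"azimuth {azimuth:3d} deg -> view center at u={u:.1f}, v={v:.1f}")
```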
3D Gaussians Texturing
python main.py --app gs --tag gs --save_dir_now --save_top_dir ./output --prompt "A photo of majestic red throne, adorned with gold accents" --source_path ./data/gaussians/chair --plyfile ./data/gaussians/chair.ply --dataset_type blender --case_num 2 --zt_init --force_clean_composition
--prompt
Text prompts to guide the generation process.
--save_top_dir
Directory to save intermediate/final outputs.
--tag
Tag output directory.
--save_dir_now
Save output directory with current time.
--case_num
Denoising case num. Refer to the main paper for other cases. (Case 2 - SyncTweedies)
--source_path
Path to input dataset (Refer to 3D Gaussian Splatting repo for data format).
--plyfile
Path to 3D Gaussians model plyfile.
--dataset_type
Input dataset type {colmap, blender}.
--zt_init
Initialize the initial random noise by projecting from the canonical space.
--no-antialiased
Used for 3D scenes trained with 3D Gaussian Splatting framework. Do not provide this option when using 3D scenes reconstructed with gsplat.
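As a quick sanity check on the file passed to `--plyfile`, the number of Gaussians can be read from the PLY header, which declares it as the `element vertex` count. This helper is a sketch that relies only on the standard PLY header layout; the function name is hypothetical and the helper is not part of the repository.

```python
def ply_vertex_count(path):
    """Return the 'element vertex' count declared in a PLY file header."""
    with open(path, "rb") as f:
        if f.readline().strip() != b"ply":
            raise ValueError(f"{path} does not start with a PLY magic line")
        count = None
        for raw in f:
            line = raw.strip()
            if line.startswith(b"element vertex"):
                count = int(line.split()[-1])
            elif line == b"end_header":
                break  # stop before the (possibly binary) body
    if count is None:
        raise ValueError(f"{path} declares no vertex element")
    return count
```

For example, `ply_vertex_count("./data/gaussians/chair.ply")` would report how many Gaussians the reconstructed scene contains, assuming the file exists at that path.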
@article{kim2024synctweedies,
title={SyncTweedies: A General Generative Framework Based on Synchronized Diffusions},
author={Kim, Jaihoon and Koo, Juil and Yeo, Kyeongmin and Sung, Minhyuk},
journal={arXiv preprint arXiv:2403.14370},
year={2024}
}
This repository is based on Visual Anagrams, SyncMVD, and gsplat. We thank the authors for publicly releasing their code.