Hong-Xing "Koven" Yu*, Haoyi Duan*, Charles Herrmann, William T. Freeman, Jiajun Wu ("*" denotes equal contribution)
Proceed with the installation only on a machine with a CUDA-compatible GPU; running WonderWorld requires 48 GB of GPU memory.
Clone the repo and create the environment:
git clone https://github.com/KovenYu/WonderWorld.git && cd WonderWorld
mamba create --name wonderworld python=3.10
mamba activate wonderworld
We use PyTorch3D for rendering.
Run the following commands to install it, or follow the PyTorch3D installation guide (this may take some time). We tested with CUDA 12.4; other CUDA versions should also work.
# switch to cuda 12.4, other versions should also work
mamba install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia
mamba install -c fvcore -c iopath -c conda-forge fvcore iopath
pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"
pip install submodules/depth-diff-gaussian-rasterization-min/
pip install submodules/simple-knn/
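Once these packages are installed, you can quickly confirm that PyTorch sees a CUDA GPU with enough memory. The sketch below is a hypothetical helper, not part of the repository: the names `check_gpu` and `meets_memory_requirement` are our own, and reading "48GB" as 48 × 10^9 bytes (decimal) is an assumption, chosen so that 48 GB cards such as the A6000 are not rejected by rounding.

```python
"""Hypothetical sanity check (not part of WonderWorld): does PyTorch see a
CUDA GPU with at least 48 GB of memory?"""

REQUIRED_BYTES = 48 * 10**9  # 48 GB in decimal units (assumption)


def meets_memory_requirement(total_bytes, required_bytes=REQUIRED_BYTES):
    """Return True if a device with `total_bytes` of memory is large enough."""
    return total_bytes >= required_bytes


def check_gpu(device_index=0):
    """Return an (ok, message) pair describing the local CUDA setup."""
    try:
        import torch
    except ImportError:
        return False, "PyTorch is not installed"
    if not torch.cuda.is_available():
        return False, "PyTorch cannot see a CUDA GPU"
    props = torch.cuda.get_device_properties(device_index)
    message = (
        f"{props.name}: {props.total_memory / 10**9:.1f} GB, "
        f"PyTorch built with CUDA {torch.version.cuda}"
    )
    return meets_memory_requirement(props.total_memory), message


if __name__ == "__main__":
    ok, message = check_gpu()
    print(("OK: " if ok else "WARNING: ") + message)
```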
Install the rest of the requirements:
pip install -r requirements.txt
cd ./RepViT/sam && pip install -e . && cd ../..
python -m spacy download en_core_web_sm
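To confirm that the packages installed so far are importable, a stdlib-only check based on `importlib.util.find_spec` can be run from the repository root. This is a sketch, not a script shipped with WonderWorld; the import names `diff_gaussian_rasterization` and `simple_knn` for the two submodules are our assumption (the usual names for these Gaussian-splatting extensions), so adjust them if your imports differ.

```python
import importlib.util

# Top-level import names to check. "diff_gaussian_rasterization" and
# "simple_knn" are ASSUMED names for the two submodules installed above.
EXPECTED_MODULES = [
    "torch",
    "pytorch3d",
    "diff_gaussian_rasterization",
    "simple_knn",
    "spacy",
    "en_core_web_sm",  # installed by `python -m spacy download en_core_web_sm`
]


def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected modules found")
```

`find_spec` only locates each module without importing it, so the check is fast and does not initialize CUDA.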
If you want to use GPT-4 to generate scene descriptions, export your OpenAI API key:
export OPENAI_API_KEY='your_api_key_here'
Download the RepViT model and put it in the repository root directory:
wget https://github.com/THU-MIG/RepViT/releases/download/v1.0/repvit_sam.pt
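The two manual steps above (the API key and the RepViT checkpoint) are easy to forget. A hypothetical preflight helper, `preflight`, can check both; it is our own sketch and only assumes what is stated here: the key lives in `OPENAI_API_KEY` and is needed only when `use_gpt` is enabled, and `repvit_sam.pt` sits in the repository root.

```python
import os
from pathlib import Path


def preflight(repo_root=".", use_gpt=False, env=None):
    """Return a list of setup problems (hypothetical helper, not in the repo)."""
    env = os.environ if env is None else env
    problems = []
    # GPT-4 scene descriptions need the OpenAI key in the environment.
    if use_gpt and not env.get("OPENAI_API_KEY"):
        problems.append("use_gpt is True but OPENAI_API_KEY is not set")
    # The RepViT checkpoint is expected in the repository root.
    if not (Path(repo_root) / "repvit_sam.pt").is_file():
        problems.append("repvit_sam.pt not found in the repository root")
    return problems


if __name__ == "__main__":
    for problem in preflight(use_gpt=True):
        print("WARNING:", problem)
```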
Example config file
To run an example, you first need to write a config. An example config, ./config/example.yaml, is shown below (more examples are located in config/more_examples; feel free to try them):
runs_dir: output/real_campus_2
example_name: real_campus_2
seed: 1
# enable guided depth diffusion
depth_conditioning: True
# use gpt to generate scene description
use_gpt: False
debug: True
# depth model and camera/depth parameters
depth_model: marigold
camera_speed: 0.001
fg_depth_range: 0.015
depth_shift: 0.001
sky_hard_depth: 0.02
init_focal_length: 960
# re-generate sky panorama images
gen_sky_image: False
# generate sky point cloud
gen_sky: False
# enable layer-wise generation
gen_layer: True
# load previously generated gaussians
load_gen: False
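The example config is a flat mapping from keys to scalar values. A hypothetical checker, `config_errors`, can compare a loaded config (for instance the dict returned by PyYAML's `yaml.safe_load`) against the keys shown in the example config. The expected types are inferred from the example values, not from the WonderWorld code, which may accept additional keys or looser types.

```python
# Expected types inferred from the example config (assumption: the real code
# may accept more keys or looser types).
EXPECTED_TYPES = {
    "runs_dir": str,
    "example_name": str,
    "seed": int,
    "depth_conditioning": bool,
    "use_gpt": bool,
    "debug": bool,
    "depth_model": str,
    "camera_speed": float,
    "fg_depth_range": float,
    "depth_shift": float,
    "sky_hard_depth": float,
    "init_focal_length": (int, float),
    "gen_sky_image": bool,
    "gen_sky": bool,
    "gen_layer": bool,
    "load_gen": bool,
}


def config_errors(config):
    """Return human-readable problems with a config dict (empty if none)."""
    errors = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in config:
            errors.append(f"missing key: {key}")
            continue
        value = config[key]
        if isinstance(value, bool) and expected is not bool:
            ok = False  # bool subclasses int, so reject it for non-bool keys
        else:
            ok = isinstance(value, expected)
        if not ok:
            errors.append(f"{key}: unexpected value {value!r}")
    return errors
```

Keys not listed in `EXPECTED_TYPES` are ignored, so a config that matches the example above and adds extra keys still passes.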
Run
On your local laptop, clone the viewer repository, which contains index_stream.html:
git clone https://github.com/haoyi-duan/splat.git
To enable interactive visualization of your results in this local web browser, follow these steps:
- Ensure you have ssh installed on your local machine.
- The main program will run on the server user_id@server_name. From your local machine, forward port 7777 to the server:
# On your local machine
ssh -L 7777:localhost:7777 user_id@server_name
On the server, run the main program:
# On user_id@server_name
python run.py --example_config config/example.yaml --port 7777
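If the browser shows nothing, a quick reachability check can help locate the problem. The stdlib sketch below uses a hypothetical helper, `port_is_open`. Assuming run.py listens on the port passed via --port, running it on the server checks that the program is up. Running it on the laptop only shows that something (normally the ssh -L tunnel) is listening on local port 7777, because ssh accepts local connections even when nothing answers on the server side.

```python
import socket


def port_is_open(host="localhost", port=7777, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, unreachable, ...
        return False


if __name__ == "__main__":
    if port_is_open():
        print("something is listening on localhost:7777")
    else:
        print("nothing listening on localhost:7777")
```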
More examples are located in config/more_examples; feel free to try them! Open index_stream.html on your local machine, and you should see the scene. You can navigate with the W, S, A, D and arrow keys.
1. If you set use_gpt: True in your example config file, the scene description for the new scene is generated automatically by the LLM; if you set use_gpt: False, you can type the scene description you want into the text box of the local browser page. Remember to click 'Next scene is ...' when you are done.
2. Next, set a proper camera view for the program to generate the new scene: wander through the scene in the browser to a novel view, then press R to let the program interactively generate a new scene in this view.
3. If you are not satisfied with the current generation, press Z to delete the most recent generation, then follow steps 1 and 2 to generate again.
4. Repeat steps 1-3 to interactively generate a large-scale connected scene; you can wander through the scene freely during the whole process.
5. After some generations, press X to save the current scene. Next time, you can load the saved scene by setting load_gen: True in your config file.
We highly encourage you to add new images and try new things! You need to do the image-caption pairing separately (e.g., use DALL-E to generate an image and GPT-4V to generate its description).
- Add a new image to ./examples/images/.
- Add the content of this new image to ./examples/examples.yaml. Here is an example:
- name: new_example
  image_filepath: examples/images/new_example.png
  style_prompt: DSLR 35mm landscape
  content_prompt: scene name, object 1, object 2, object 3
  negative_prompt: ''
  background: ''
  - content_prompt: "scene name", "object 1", "object 2", "object 3", i.e., the scene name followed by the main objects, separated by commas
  - negative_prompt and background are optional
- Write a config, config/new_example.yaml, for the new example, modeled on ./config/example.yaml.
- Run the program as described in the previous section. (On first use, the program automatically generates the panorama sky images for the example, which takes about 20 minutes on an A6000 GPU. Once these sky images are stored, later runs of this example skip this step automatically.)
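The file edits in the steps above can be sketched with two small text helpers. Both are hypothetical (not part of WonderWorld). `example_entry` assumes the image is named `<name>.png` and that the prompts contain no YAML special characters such as ": " or quotes. `derive_config` only handles flat `key: value` files like ./config/example.yaml.

```python
"""Hypothetical helpers for adding a new example (not part of WonderWorld)."""


def example_entry(name, style_prompt, scene, objects, negative_prompt="", background=""):
    """Format one list item for ./examples/examples.yaml.

    content_prompt is the scene name followed by its objects, comma-separated.
    Assumes the image is examples/images/<name>.png.
    """
    content = ", ".join([scene, *objects])
    return (
        f"- name: {name}\n"
        f"  image_filepath: examples/images/{name}.png\n"
        f"  style_prompt: {style_prompt}\n"
        f"  content_prompt: {content}\n"
        f"  negative_prompt: '{negative_prompt}'\n"
        f"  background: '{background}'\n"
    )


def derive_config(example_text, overrides):
    """Copy a flat YAML config, replacing the values of some top-level keys.

    Works line by line, so comments and key order are preserved.
    """
    out = []
    for line in example_text.splitlines():
        key = line.split(":", 1)[0].strip()
        if ":" in line and not line.lstrip().startswith("#") and key in overrides:
            line = f"{key}: {overrides[key]}"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(example_entry("new_example", "DSLR 35mm landscape", "scene name",
                        ["object 1", "object 2", "object 3"]))
    template = "runs_dir: output/real_campus_2\nexample_name: real_campus_2\n"
    print(derive_config(template, {"runs_dir": "output/new_example",
                                   "example_name": "new_example"}))
```

The same `derive_config` call can set `load_gen: True` when you want to reload a scene saved with X.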
@article{yu2024wonderworld,
title={WonderWorld: Interactive 3D Scene Generation from a Single Image},
author={Hong-Xing Yu and Haoyi Duan and Charles Herrmann and William T. Freeman and Jiajun Wu},
journal={arXiv:2406.09394},
year={2024}
}
We thank the authors of Marigold, SyncDiffusion, RepViT, Stable Diffusion, and OneFormer for sharing their code.