
WonderWorld: Interactive 3D Scene Generation from a Single Image


Getting Started

Installation

Installation requires a CUDA-compatible GPU, and running WonderWorld requires 48GB of GPU memory.
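
If you want to confirm that a suitable GPU is visible before installing, a quick optional check (assuming the NVIDIA driver and nvidia-smi are available on the machine):

# Print GPU name and total memory; 48GB is roughly 49152 MiB
nvidia-smi --query-gpu=name,memory.total --format=csv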

Clone the repo and create the environment:

git clone https://github.com/KovenYu/WonderWorld.git && cd WonderWorld
mamba create --name wonderworld python=3.10
mamba activate wonderworld

We use PyTorch3D for rendering. Run the following commands to install it, or follow the official installation guide (this may take some time). We tested with CUDA 12.4; other CUDA versions should also work.

# switch to cuda 12.4, other versions should also work
mamba install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia
mamba install -c fvcore -c iopath -c conda-forge fvcore iopath
pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"
pip install submodules/depth-diff-gaussian-rasterization-min/
pip install submodules/simple-knn/
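
To verify that PyTorch, CUDA, and PyTorch3D are all working before moving on, an optional one-line check (not part of the official instructions):

python -c "import torch, pytorch3d; print(torch.__version__, torch.version.cuda, torch.cuda.is_available(), pytorch3d.__version__)"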

Install the rest of the requirements:

pip install -r requirements.txt
cd ./RepViT/sam && pip install -e . && cd ../..
python -m spacy download en_core_web_sm
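
If you want to confirm that the spaCy model downloaded correctly, an optional check:

python -c "import spacy; spacy.load('en_core_web_sm'); print('en_core_web_sm OK')"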

Export your OpenAI API key (if you want to use GPT-4 to generate scene descriptions):

export OPENAI_API_KEY='your_api_key_here'
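
If you use GPT-generated descriptions regularly, you can persist the key instead of exporting it in every session. A sketch, assuming a bash shell (the placeholder key is the same as above):

# Append to your shell profile so new sessions pick up the key
echo "export OPENAI_API_KEY='your_api_key_here'" >> ~/.bashrc
# Confirm the key is visible to Python before launching the program
python -c "import os; print('OPENAI_API_KEY set:', bool(os.environ.get('OPENAI_API_KEY')))"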

Download the RepViT model and put it in the root directory:

wget https://github.com/THU-MIG/RepViT/releases/download/v1.0/repvit_sam.pt
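
A quick check that the checkpoint landed where the step above expects it (the repository root):

ls -lh ./repvit_sam.pt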

Run examples

  • Example config file

    To run an example, you first need to write a config file. The example config ./config/example.yaml is shown below (more examples are located in config/more_examples; feel free to try them):

    runs_dir: output/real_campus_2
    example_name: real_campus_2
    
    seed: 1
    # enable guided depth diffusion
    depth_conditioning: True
    
    # use gpt to generate scene description
    use_gpt: False
    debug: True
    
    # depth model and camera/depth parameters
    depth_model: marigold
    camera_speed: 0.001
    fg_depth_range: 0.015
    depth_shift: 0.001
    sky_hard_depth: 0.02
    init_focal_length: 960
    
    # re-generate sky panorama images
    gen_sky_image: False
    # generate sky point cloud
    gen_sky: False
    
    # enable layer-wise generation
    gen_layer: True
    # load previously generated gaussians
    load_gen: False
  • Run

    Local Visualization Setup:

    On your local laptop, git clone https://github.com/haoyi-duan/splat.git and open index_stream.html.

    To enable interactive visualization of your results through this local web browser, follow these steps:

    • Ensure you have 'ssh' installed on your local machine.
    • The main program will run on the server user_id@server_name; forward port 7777 from the server to your local machine:
    # On your local machine
    ssh -L 7777:localhost:7777 server_name
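
    If server_name is not an alias defined in your ~/.ssh/config, include the user explicitly, e.g.:

    # On your local machine
    ssh -L 7777:localhost:7777 user_id@server_name
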
    Main Program Running:

    On the server, run the main program:

    # On user_id@server_name
    python run.py --example_config config/example.yaml --port 7777

    More example configs are located in config/more_examples; feel free to try them!

    Interactive Generation Step:

    Open index_stream.html on your local machine, and you should see the scene. You can navigate with the WASD and arrow keys.

    1. If you specify use_gpt=True in your example configuration file, the scene description for each new scene is generated automatically by the LLM; if you specify use_gpt=False, you can manually type the scene description you want in the text box of the local browser. Remember to click 'Next scene is ...' when you are done.
    2. Next, set a proper camera view for the program to generate the new scene: wander through the scene in the browser to a novel view, then press the R key to let the program interactively generate a new scene at that view.
    3. If you are not satisfied with the current generation, press the Z key to delete the previous generation, then follow steps 1 and 2 to generate again.
    4. By repeating steps 1-3, you will interactively generate a large-scale connected scene, and you can wander through the scene freely during the whole process.
    5. After some generations, press the X key to save the current scene. Next time, you can load the saved scene by specifying load_gen=True in your configuration file, as in the snippet below.
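
    To reload a saved scene in a later run, set load_gen in your config; an illustrative snippet based on the example config above (other parameters unchanged):

    # load previously generated gaussians
    load_gen: True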

How to add more examples?

We highly encourage you to add new images and try new things! You need to do the image-caption pairing separately (e.g., using DALL-E to generate an image and GPT-4V to generate a description).

  • Add a new image in ./examples/images/.

  • Add content of this new image in ./examples/examples.yaml.

    Here is an example:

    - name: new_example
      image_filepath: examples/images/new_example.png
      style_prompt: DSLR 35mm landscape
      content_prompt: scene name, object 1, object 2, object 3
      negative_prompt: ''
      background: ''
    • content_prompt follows the format "scene name, object 1, object 2, object 3".

    • negative_prompt and background are optional.

  • Write a config file config/new_example.yaml (similar to ./config/example.yaml) for the new example; a minimal sketch is shown after this list.

  • Run the program following the previous section. (On first use, the model will automatically generate the panorama sky images for the example, which takes about 20 minutes on an A6000 GPU. Once the sky images for the example are stored, later runs of this example will skip this step automatically.)
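
    As referenced above, a minimal sketch of config/new_example.yaml (copy ./config/example.yaml as a starting point; its parameter values are not tuned for your new image):

    runs_dir: output/new_example
    # presumably should match the `name` field you added in examples/examples.yaml
    example_name: new_example
    # ... keep the remaining parameters from ./config/example.yaml as a starting point

    You can then run it the same way as the example:

    python run.py --example_config config/new_example.yaml --port 7777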

Citation

@article{yu2024wonderworld,
    title={WonderWorld: Interactive 3D Scene Generation from a Single Image},
    author={Hong-Xing Yu and Haoyi Duan and Charles Herrmann and William T. Freeman and Jiajun Wu},
    journal={arXiv:2406.09394},
    year={2024}
}

Related Project

Acknowledgement

We appreciate the authors of Marigold, SyncDiffusion, RepViT, Stable Diffusion, and OneFormer for sharing their code.