Yunzhi Zhang, Zizhang Li, Matt Zhou, Shangzhe Wu, Jiajun Wu. CVPR 2025.
conda create --name sclg python=3.12
conda activate sclg
pip install mitsuba # tested for mitsuba==3.6.4
pip install unidecode Pillow anthropic transforms3d astor ipdb scipy jaxtyping imageio tqdm
# required for minecraft renderer
pip install spacy
python -m spacy download en_core_web_md
git clone https://github.com/zzyunzhi/scene-language.git
cd scene-language
pip install -e .
Run `python scripts/installation/test_install.py` to check whether the installation succeeded.
Get your Anthropic API key following the official documentation and add it to `engine/key.py`:
ANTHROPIC_API_KEY = 'YOUR_ANTHROPIC_API_KEY'
OPENAI_API_KEY = 'YOUR_OPENAI_API_KEY' # optional, required for `LLM_PROVIDER='gpt'`
By default, we use Claude 3.7 Sonnet. You may switch to other language models by setting `LLM_PROVIDER` in `engine/constants.py`.
python scripts/run.py --tasks "a chessboard with a full set of chess pieces"
# Experimental
python scripts/run_self_reflect_with_moe.py --tasks "Sponge Bob and friends"
Renderings will be saved to `${PROJ_ROOT}/scripts/outputs/run_${timestep}_${uuid}/${scene_name}_${uuid}/${sample_index}/renderings/*.gif`.
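To collect the renderings after a run, a small helper along the following lines may be convenient. It is a minimal sketch that only assumes the directory layout described above; the function name `find_renderings` and its `suffix` parameter are hypothetical and not part of this codebase.

```python
from pathlib import Path


def find_renderings(proj_root, suffix=".gif"):
    """Return rendering files under scripts/outputs, sorted by path.

    Hypothetical helper: it mirrors the layout
    run_<...>/<scene_name>_<uuid>/<sample_index>/renderings/*<suffix>.
    """
    outputs = Path(proj_root) / "scripts" / "outputs"
    pattern = f"run_*/*/*/renderings/*{suffix}"
    return sorted(outputs.glob(pattern))


if __name__ == "__main__":
    # Run from the repository root to list all GIF renderings.
    for path in find_renderings("."):
        print(path)
```

Passing `suffix=".json"` would list other files saved in the same `renderings` folders.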
Example results with Claude 3.5 Sonnet (raw outputs here):
"a chessboard with a full set of chess pieces" | "A 9x9 Sudoku board partially filled with numbers" | "a scene inspired by Egon Schiele" | "a Roman Colosseum" | "a spider puppet" |
---|---|---|---|---|
![]() | ![]() | ![]() | ![]() | ![]() |
ENGINE_MODE=minecraft python scripts/run.py --tasks "a detailed cylindrical medieval tower"
Generated scenes are saved as JSON files in `${PROJ_ROOT}/scripts/outputs/run_${timestep}_${uuid}/${scene_name}_${uuid}/${sample_index}/renderings/*.json`.
For visualization, run the following command:
python viewers/minecraft/run.py
Then open http://127.0.0.1:5001 in your browser and drag the generated JSON files onto the web page.
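Before dragging a file into the viewer, it can help to confirm that it parses as valid JSON. The sketch below makes no assumption about the scene schema, which is not documented here; it only reports the top-level type and, for objects, the top-level keys. The function name `summarize_scene_json` is hypothetical.

```python
import json
from pathlib import Path


def summarize_scene_json(path):
    """Load a generated JSON file and describe its top-level structure only."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)  # raises json.JSONDecodeError on malformed files
    if isinstance(data, dict):
        detail = sorted(data.keys())  # top-level keys
    elif isinstance(data, list):
        detail = len(data)  # number of top-level entries
    else:
        detail = None
    return {"file": Path(path).name, "type": type(data).__name__, "detail": detail}
```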
Example results (raw outputs here):
"a witch's house in Halloween" | "a detailed cylindrical medieval tower" | "a detailed model of Picachu" | "Stonehenge" | "a Greek temple" |
---|---|---|---|---|
![]() | ![]() | ![]() | ![]() | ![]() |
python scripts/run.py --tasks ./resources/examples/* --cond image --temperature 0.8
Macro definitions
The following table lists helper functions defined in this file in accordance with expressions defined in the domain-specific language (DSL) (Tables 2 and 5 of the paper):
Implementation | DSL |
---|---|
`register` | `bind` |
`library_call` | `call` |
`primitive_call` | `call` |
`loop` | `union-loop` |
`concat_shapes` | `union` |
`transform_shape` | `transform` |
`rotation_matrix` | `rotation` |
`translation_matrix` | `translate` |
`scale_matrix` | `scale` |
`reflection_matrix` | `reflect` |
`compute_shape_center` | `compute-shape-center` |
`compute_shape_min` | `compute-shape-min` |
`compute_shape_max` | `compute-shape-max` |
`compute_shape_sizes` | `compute-shape-sizes` |
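To give a feel for what the transform and bounding-box helpers in the table conventionally compute, here is a self-contained sketch in pure Python. It is not the code from this repository: every function name carries a `_sketch` suffix, the signatures are assumptions, shapes are reduced to lists of 3D points, and the center is assumed to be the midpoint of the axis-aligned bounding box.

```python
import math


def translation_matrix_sketch(offset):
    """4x4 homogeneous matrix that moves points by `offset`."""
    tx, ty, tz = offset
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def scale_matrix_sketch(factor, origin):
    """Uniform scale about `origin`: p -> origin + factor * (p - origin)."""
    k = 1.0 - factor
    ox, oy, oz = origin
    return [[factor, 0.0, 0.0, k * ox],
            [0.0, factor, 0.0, k * oy],
            [0.0, 0.0, factor, k * oz],
            [0.0, 0.0, 0.0, 1.0]]


def rotation_matrix_sketch(angle, direction, point):
    """Rotate by `angle` radians about the axis `direction` through `point`."""
    ux, uy, uz = direction
    norm = math.sqrt(ux * ux + uy * uy + uz * uz)
    ux, uy, uz = ux / norm, uy / norm, uz / norm
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    # Rodrigues' rotation formula for the 3x3 part.
    rot = [[c + t * ux * ux, t * ux * uy - s * uz, t * ux * uz + s * uy],
           [t * uy * ux + s * uz, c + t * uy * uy, t * uy * uz - s * ux],
           [t * uz * ux - s * uy, t * uz * uy + s * ux, c + t * uz * uz]]
    # Translation part keeps `point` fixed: point - R @ point.
    trans = [point[r] - sum(rot[r][j] * point[j] for j in range(3)) for r in range(3)]
    return [rot[r] + [trans[r]] for r in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def apply_sketch(matrix, point):
    """Apply a 4x4 homogeneous matrix to a 3D point."""
    v = (point[0], point[1], point[2], 1.0)
    return tuple(sum(matrix[r][j] * v[j] for j in range(4)) for r in range(3))


def shape_min_sketch(points):
    return tuple(min(p[i] for p in points) for i in range(3))


def shape_max_sketch(points):
    return tuple(max(p[i] for p in points) for i in range(3))


def shape_sizes_sketch(points):
    lo, hi = shape_min_sketch(points), shape_max_sketch(points)
    return tuple(hi[i] - lo[i] for i in range(3))


def shape_center_sketch(points):
    lo, hi = shape_min_sketch(points), shape_max_sketch(points)
    return tuple((lo[i] + hi[i]) / 2.0 for i in range(3))
```

For example, `apply_sketch(translation_matrix_sketch((1, 2, 3)), (0, 0, 0))` moves the origin to `(1.0, 2.0, 3.0)`. Refer to the helper file itself for the actual signatures and conventions.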
The pipeline is sensitive to small changes in the prompts as shown here. It is recommended to run prompts with some variations for better results.
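One way to try several prompt variations is to build one `scripts/run.py` command per variant, as in the sketch below. It relies only on the `--tasks` flag shown earlier; the helper name `build_run_commands` and the example variation strings are hypothetical, and the launch line is left commented out so that nothing runs by accident.

```python
import shlex
import sys


def build_run_commands(base_prompt, variations, script="scripts/run.py"):
    """Return one command (as an argument list) per prompt variant.

    The first command uses the unmodified prompt; each variation is
    appended to the base prompt after a comma.
    """
    prompts = [base_prompt] + [f"{base_prompt}, {v}" for v in variations]
    return [[sys.executable, script, "--tasks", p] for p in prompts]


if __name__ == "__main__":
    commands = build_run_commands(
        "a Roman Colosseum",
        ["highly detailed", "viewed from above"],  # hypothetical variations
    )
    for cmd in commands:
        print(shlex.join(cmd))
        # To launch from the repository root:
        # import subprocess; subprocess.run(cmd, check=True)
```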
The current codebase allows you to generate 3D scenes with text or image prompts. Other tasks and renderers reported in the paper will be supported in future updates.
Please open a GitHub issue or email us if you encounter any problems.
If you find this work useful, please consider citing our paper:
@article{zhang2024scenelanguage,
title={The Scene Language: Representing Scenes with Programs, Words, and Embeddings},
author={Yunzhi Zhang and Zizhang Li and Matt Zhou and Shangzhe Wu and Jiajun Wu},
year={2024},
journal={arXiv preprint arXiv:2410.16770},
}