
# SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs

Yang Miao¹, Francis Engelmann¹,², Olga Vysotska¹, Federico Tombari²,³, Marc Pollefeys¹,⁴, Dániel Béla Baráth¹

¹ETH Zurich  ²Google  ³TU Munich  ⁴Microsoft

SceneGraphLoc solves the novel problem of cross-modal localization of a query image within 3D scene graphs that incorporate a mixture of modalities.

*Teaser figure.*

[Video]

## News 📰

- 16 Sep 2024: Instructions updated and pretrained model uploaded.
- 8 July 2024: Code was made public.
- 1 July 2024: Accepted to ECCV 2024!
- 26 Mar 2024: Code was uploaded.

## Code Structure 🎬

```
├── VLSG
│   ├── preprocessing         <- data preprocessing
│   ├── configs               <- configuration definitions
│   ├── src
│   │   ├── datasets          <- dataloaders for 3RScan and ScanNet data
│   │   ├── engine            <- trainer classes
│   │   ├── models            <- definitions of models and losses
│   │   ├── room_retrieval    <- inference for scene graph retrieval
│   │   └── trainers          <- train + validation loop
│   ├── scripts               <- implementation scripts
│   ├── utils                 <- utility functions
│   └── README.md
```

## Dependencies 📝

The project has been tested on Ubuntu 20.04. The main dependencies are:

```
python: 3.8.15
cuda: 11.6
```

You can set up an environment as follows:

```shell
git clone https://github.com/y9miao/VLSG.git
cd VLSG

conda create -n "VLSG" python=3.8.15
conda activate VLSG
pip install -r requirement.txt
```

Other dependencies:

```shell
conda activate VLSG
pip install -r other_deps.txt

cd thrid_party/Point-NN
pip install pointnet2_ops_lib/.
```

## Dataset Generation 🔨

### Download Dataset - 3RScan + 3DSSG

Download 3RScan and 3DSSG. Move all files of 3DSSG into a new `3RScan/files/` directory within the 3RScan dataset folder. Additional meta files are available here; download them and move them to `3RScan/files/` as well. The structure should be:

```
├── 3RScan
│   ├── files                 <- all 3RScan and 3DSSG meta files and annotations
│   │   ├── Features2D        <- pre-computed patch features of query images
│   │   ├── Features3D        <- visual features of 3D objects
│   │   ├── orig              <- scene graph data
│   │   ├── patch_anno        <- ground-truth patch-object annotations of query images
│   │   └── meta files
│   └── scenes                <- scans
```

To generate `labels.instances.align.annotated.v2.ply` for each 3RScan scan, please refer to the repo here.
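The download-and-move steps above can be sketched as a short shell snippet; the paths below (`$HOME/data/3RScan`, `$HOME/Downloads/3DSSG`) are placeholders for wherever you store the dataset and extracted the archives:

```shell
# Placeholder paths -- adjust to your own setup.
DATA_ROOT="$HOME/data/3RScan"

# Create the expected directory layout.
mkdir -p "$DATA_ROOT/files" "$DATA_ROOT/scenes"

# Move all 3DSSG files (and, likewise, the additional meta files) into 3RScan/files/.
if [ -d "$HOME/Downloads/3DSSG" ]; then
    mv "$HOME/Downloads/3DSSG"/* "$DATA_ROOT/files/"
fi
```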

## Dataset Pre-process 🔨

After installing the dependencies, we download and pre-process the datasets.

First, we pre-process the scene graph information provided in the 3RScan annotations. The relevant code can be found in the data-preprocessing/ directory.
Don't forget to set the environment variables in the bash script: `VLSG_SPACE` to the repository path, `Data_ROOT_DIR` to the path of the 3RScan dataset, and `CONDA_BIN` to your conda binary directory.
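For example, the variables might be set like this (the paths are placeholders; substitute your own locations):

```shell
# Hypothetical paths -- replace with your own.
export VLSG_SPACE="$HOME/VLSG"            # path to this repository
export Data_ROOT_DIR="$HOME/data/3RScan"  # path to the 3RScan dataset
export CONDA_BIN="$HOME/miniconda3/bin"   # directory containing the conda binaries
```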

```shell
bash scripts/preprocess/scan3r_data_preprocess.sh
```

The processed data will be saved to `{Data_ROOT_DIR}/files/orig`.

### Generating Ground-Truth Patch-Object Annotations

To generate the ground-truth annotations, use:

```shell
bash scripts/gt_annotations/scan3r_gt_annotations.sh
```

This will create pixel-wise and patch-level ground-truth annotations for each query image.

### Pre-computing Patch-Level Features

To speed up training, we pre-compute the patch-level features of the query images with DINOv2. To generate the features, use:

```shell
bash scripts/features2D/scan3r_dinov2.sh
```

This will create patch-level features for the query images and save them to `{Data_ROOT_DIR}/Features2D/DinoV2_16_9_scan`.

### Image Crops of 3D Objects

To speed up training, we also pre-compute the features of multi-view, multi-level image crops of 3D objects, which are used for the image modality of the 3D scene graph embeddings. To generate the features, use:

```shell
bash scripts/features3D/scan3r_sg_image.sh
```

This will create 10-view, 3-level features of image crops of 3D objects and save them to `{Data_ROOT_DIR}/Features3D/obj_dinov2_top10_l3`.

## Training 🚄

To train SceneGraphLoc on the 3RScan dataset generated as described above, use:

```shell
bash scripts/train_val/train.sh
```

## Evaluation 🚦

To evaluate SceneGraphLoc on the 3RScan dataset in the task of scene graph retrieval, use:

```shell
bash scripts/train_val/inference.sh
```

The pretrained model is available here.
Don't forget to set the environment variable `ROOM_RETRIEVAL_OUT_DIR` to the parent directory of the pretrained model.
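For example (the path below is a placeholder; point it at wherever you stored the downloaded checkpoint):

```shell
# Hypothetical location -- replace with the parent directory of the pretrained model.
export ROOM_RETRIEVAL_OUT_DIR="$HOME/checkpoints/scenegraphloc"
```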

## Benchmark 📈

### Coarse Visual Localization on 3D Scene Graphs

*Benchmark results figure.*

## BibTeX 🙏

```bibtex
@misc{miao2024scenegraphloc,
      title={SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs},
      author={Yang Miao and Francis Engelmann and Olga Vysotska and Federico Tombari and Marc Pollefeys and Dániel Béla Baráth},
      year={2024},
      eprint={2404.00469},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

## Acknowledgments ♻️

In this project we use (parts of) the official implementations of the following works, and we thank the respective authors for sharing their code: