Leveraging Semantic Cues from Foundation Vision Models for Enhanced Local Feature Correspondence
⭐ACCV 2024⭐
Felipe Cadar · Guilherme Potje · Renato Martins · Cédric Demonceaux · Erickson R. Nascimento
Leveraging semantic information for improving visual correspondence.
To set up the environment for training, run the following command to create a new conda environment. We recommend using Python 3.9:

```bash
conda create -n reason python=3.9
```

Activate the environment before proceeding:

```bash
conda activate reason
```

Install the package:

```bash
pip install -e .
```
After installation, matching a pair of images takes only a few lines:

```python
from reasoning.features.desc_reasoning import load_reasoning_from_checkpoint, Reasoning

# load the model with pre-trained weights
semantic_reasoning = load_reasoning_from_checkpoint('models/xfeat/')
# wrap it in the auxiliary matching class
reasoning_model = Reasoning(semantic_reasoning['model'])

# match two images
match_response = reasoning_model.match({
    'image0': image0,  # BxCxHxW tensor, normalized to [0, 1]
    'image1': image1,  # BxCxHxW tensor, normalized to [0, 1]
})

# get the matches
mkpts0 = match_response['matches0']  # BxNx2
mkpts1 = match_response['matches1']  # BxNx2
```
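The matches can then be fed to standard geometry estimation. Below is a minimal sketch, assuming the outputs of the snippet above are `BxNx2` torch tensors as annotated, that estimates a fundamental matrix with OpenCV and counts RANSAC inliers:

```python
import cv2

# take the first batch element and move it to CPU numpy arrays
pts0 = mkpts0[0].cpu().numpy()  # Nx2 pixel coordinates in image0
pts1 = mkpts1[0].cpu().numpy()  # Nx2 pixel coordinates in image1

# robustly estimate the fundamental matrix; mask flags the inlier matches
F, mask = cv2.findFundamentalMat(pts0, pts1, cv2.FM_RANSAC, 1.0, 0.999)
num_inliers = int(mask.sum()) if mask is not None else 0
print(f'{num_inliers}/{len(pts0)} matches survive RANSAC')
```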
The `example.py` script shows how to automatically download and run a specific model.
The following table contains links to all the models and weights we used in our experiments.
Descriptor | Pre-trained weights | Size |
---|---|---|
xfeat | Download | 91.6 MB |
superpoint | Download | 91.0 MB |
alike | Download | 92.1 MB |
aliked | Download | 91.9 MB |
dedode_B | Download | 92.2 MB |
dedode_G | Download | 94.1 MB |
xfeat-12_layers-dino_G | Download | 221.0 MB |
xfeat-12_layers | Download | 219.0 MB |
xfeat-3_layers | Download | 57.1 MB |
xfeat-7_layers | Download | 132 MB |
xfeat-9_layers | Download | 167 MB |
xfeat-dino-G | Download | 94.3 MB |
xfeat-dino_B | Download | 92.3 MB |
xfeat-dino_L | Download | 92.6 MB |
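All of these checkpoints load through the same API; only the folder you point `load_reasoning_from_checkpoint` at changes. For example (assuming the downloaded weights are extracted to a `models/superpoint/` folder, mirroring the xfeat example above):

```python
from reasoning.features.desc_reasoning import load_reasoning_from_checkpoint, Reasoning

# same loading code as before, just a different checkpoint directory
semantic_reasoning = load_reasoning_from_checkpoint('models/superpoint/')
reasoning_model = Reasoning(semantic_reasoning['model'])
```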
You might want to train your own model to reason on top of your own descriptors. A few preparation steps are required first.
The processed dataset is available for download here: h5_scannet.zip

If you want to follow the same steps we took to create it, read on.
To prepare the Scannet dataset for training, follow these steps:
- Download Scannet: First, download the Scannet dataset. Make sure to read and accept its terms of use.

  ```bash
  python reasoning/scripts/scannet/01_download_scannet.py --out_dir datasets/scannet
  ```

- Extract Frames: Extract frames from the downloaded dataset, sampling one frame out of every 15.

  ```bash
  python reasoning/scripts/scannet/02_extract_scannet.py --data_path datasets/scannet
  ```

- Calculate Covisibility: Compute the covisibility between frames to identify good pairs for training.

  ```bash
  python reasoning/scripts/scannet/03_calculate_scannet_covisibility.py --data_path datasets/scannet
  ```

- Convert to H5 Files: Convert the prepared data into H5 files for easier handling during training; this also keeps the number of files small in cluster environments (see the sanity-check sketch after this list).

  ```bash
  python reasoning/scripts/scannet/04_build_h5.py --data_path datasets/scannet --output datasets/h5_scannet/
  ```
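If you want to sanity-check the conversion before moving on, here is a small sketch with `h5py`. The file locations and internal group names are assumptions; adapt them to whatever `04_build_h5.py` actually writes:

```python
import h5py
from pathlib import Path

# list the generated H5 files (assuming they land under train/ with a .h5 suffix)
h5_files = sorted(Path('datasets/h5_scannet/train').glob('**/*.h5'))
print(f'found {len(h5_files)} h5 files')

# peek at the layout of the first file by printing every group/dataset name
if h5_files:
    with h5py.File(h5_files[0], 'r') as f:
        f.visit(print)
```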
To speed up the training process, pre-extract some features from the dataset. Our scripts read the H5 dataset and write the extracted features to a features/ folder next to it (see the expected layout below).
Extract DINOv2-S features from the H5 dataset. You can adjust the batch size according to your system's capabilities.

```bash
python reasoning/scripts/export_dino.py --data ./datasets/h5_scannet --batch_size 4 --dino_model dinov2_vits14
```
For larger models, simply change the `--dino_model` argument to one of the following: `dinov2_vitb14`, `dinov2_vitl14`, or `dinov2_vitg14`.
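These names are the official DINOv2 entry points on torch.hub (presumably what `export_dino.py` loads under the hood), so you can verify that a given backbone is available on your machine before committing to a full export run:

```python
import torch

# instantiate a DINOv2 backbone by its hub name; weights download on first use
model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
model.eval()
print(sum(p.numel() for p in model.parameters()), 'parameters')
```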
Extract XFeat features from the dataset. Adjust the batch size as needed.

```bash
python reasoning/scripts/export_xfeat.py --data ./datasets/h5_scannet --batch_size 4 --num_keypoints 2048
```
Your dataset folder should look like this:

```
datasets/
├── h5_scannet/
│   ├── train/
│   ├── features/
│   │   ├── dino-scannet-dinov2_vits14/
│   │   └── xfeat-scannet-n2048/
└── scannet/
    └── scans/
```
For other descriptors, please check the `reasoning/scripts/export_*.py` scripts.
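Before moving on to training, it is worth confirming that both feature caches landed where the training script expects them. A minimal check, following the folder tree shown above:

```python
from pathlib import Path

# the cache names below match the folder tree shown above
features = Path('datasets/h5_scannet/features')
for cache in ('dino-scannet-dinov2_vits14', 'xfeat-scannet-n2048'):
    cache_dir = features / cache
    if cache_dir.is_dir():
        print(f'{cache}: {len(list(cache_dir.iterdir()))} entries')
    else:
        print(f'{cache}: missing -- run the corresponding export script')
```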
All training and experiments were conducted on a SLURM cluster with 4xV100 32GB GPUs. Adjust the batch size to match your system's capabilities.
To start training, run the command below. Here, `--data` points at the dataset folder with images and features, `--plot_every` sets how often matching plots are written to TensorBoard, `--extractor_cache` selects the pre-extracted local features, `--dino_cache` selects the pre-extracted semantic features, and `-C` attaches a comment for tracking your experiments:

```bash
python reasoning/train_multigpu_reasoning.py \
    --batch_size 16 \
    --data ./datasets/h5_scannet \
    --plot_every 200 \
    --extractor_cache 'xfeat-scannet-n2048' \
    --dino_cache 'dino-scannet-dinov2_vits14' \
    -C xfeat-dinov2
```
If you want to skip all the multi-GPU shenanigans, you can simply add the `--local` flag.
This work was partially supported by grants from CAPES, CNPq, FAPEMIG, Google, ANER MOVIS from Conseil Régional BFC and ANR (ANR-23-CE23-0003-01), to whom we are grateful. This project was also provided with AI computing and storage resources by GENCI at IDRIS thanks to the grant 2024-AD011015289 on the supercomputer Jean Zay’s V100 partitions.
Shout out to the authors of DeDoDe for this README header. It's quite nice.