This is the official PyTorch implementation of our work: "Learning Semantics for Visual Place Recognition through Multi-Scale Attention" accepted at ICIAP 2021.
In this paper we address the task of visual place recognition (VPR), where the goal is to estimate the GPS coordinates of a query image by retrieving the most similar images from a huge geotagged gallery. While recent works have shown that building descriptors that incorporate both semantic and appearance information is beneficial, current state-of-the-art methods opt for a top-down definition of the significant semantic content. Here we present the first VPR algorithm that learns robust global embeddings from both the visual appearance and the semantic content of the data, with the segmentation process being dynamically guided by the recognition of places through a multi-scale attention module. Experiments on a variety of scenarios validate this new approach and demonstrate its performance against state-of-the-art methods. Finally, we propose the first synthetic-world dataset suited for both place recognition and segmentation tasks.
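For orientation, here is a minimal, hypothetical sketch of the kind of multi-scale attention pooling the paper describes: an attention map computed on the 4th-block features reweights the features of both scales before pooling. Module names, channel sizes, and the sigmoid gating are our assumptions for illustration, not the repository's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleAttentionPooling(nn.Module):
    """Sketch: fuse features from two conv blocks into one global
    descriptor, reweighted by an attention map computed on the
    shallower (4th-block) features. Channel sizes follow ResNet50."""

    def __init__(self, c4=1024):
        super().__init__()
        # 1x1 conv producing a single-channel spatial attention map
        self.attention = nn.Conv2d(c4, 1, kernel_size=1)

    def forward(self, f4, f5):
        # f4: (B, 1024, H, W) from conv block 4; f5: (B, 2048, H/2, W/2)
        att = torch.sigmoid(self.attention(f4))           # (B, 1, H, W)
        f4 = f4 * att                                     # reweight block-4 features
        f5 = f5 * F.interpolate(att, size=f5.shape[-2:])  # reuse map at the coarser scale
        # global average pool each scale, then concatenate and L2-normalize
        d4 = f4.mean(dim=(2, 3))
        d5 = f5.mean(dim=(2, 3))
        return F.normalize(torch.cat([d4, d5], dim=1), p=2, dim=1)
```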
Read the paper here: [arXiv]
Overview | Architecture
---|---
Setup:
- Install Python 3.6+
- Install pip3
- Install the dependencies: pip3 install -r requirements.txt
Datasets (please refer to the paper for details):
- The IDDAv2 dataset is NOW available at the Official IDDA Web Site;
Town 3 | Town 10
---|---
UTMx 277349.751, UTMy 110471.756 | UTMx 277414.576, UTMy 110665.787
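Since the datasets are geotagged with UTM coordinates (as in the table above), a query/gallery pair can be checked for a correct match with a plain Euclidean distance in the UTM plane. A minimal sketch; the 25 m threshold is the common choice in the VPR literature, so check the paper for the exact value used here.

```python
import math

def is_correct_match(query_utm, gallery_utm, threshold_m=25.0):
    """Return True if two (UTMx, UTMy) points are within threshold_m meters."""
    dx = query_utm[0] - gallery_utm[0]
    dy = query_utm[1] - gallery_utm[1]
    return math.hypot(dx, dy) <= threshold_m

# Example with the Town 3 / Town 10 coordinates listed above
print(is_correct_match((277349.751, 110471.756), (277414.576, 110665.787)))  # False: ~205 m apart
```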
Usage:
- Train: with the default parameters, the script uses the final architecture configuration: ResNet50 encoder, DeepLab
semantic segmentation module, multi-scale pooling layer on the 4th and 5th conv blocks, and finally the domain
adaptation module (see the sketch after the option notes below).
It follows the exact training protocol and implementation details described in the main paper and supplementary
material. It trains all layers of the encoder and uses the multi-scale attention computed on the features
extracted from the 4th conv block.
python3 main.py --exp_name=<name of the output log folder> --dataset_root=<root path of IDDAv2 train dataset> --dataset_root_val=<root path of IDDAv2 val dataset> --dataset_root_test=<root path of RobotCar dataset> --DA_datasets=<path to the RobotCar folder where all scenarios are merged>
To resume training, specify --resume=<path of checkpoint .pth>
To change the encoder, specify --arch=resnet101
To change the semantic segmentation module, specify --semnet=pspnet
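Domain adaptation modules in comparable pipelines are typically trained adversarially through a gradient reversal layer (Ganin & Lempitsky). Below is a minimal sketch of that mechanism as an assumption about how the module works, not necessarily the exact implementation used in this repository.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales gradients by -lambda in the
    backward pass, pushing the encoder toward domain-invariant features."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient w.r.t. x; no gradient for lambd
        return grad_output.neg() * ctx.lambd, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)
```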
- Evaluate:
python3 eval.py --resume=<path of checkpoint .pth> --dataset_root_val=<root path of IDDAv2 val dataset> --dataset_root_test=<root path of RobotCar dataset>
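The evaluation script reports Recall@N. As a reference, this is how the standard VPR metric is usually computed; a sketch with assumed inputs (L2-normalized descriptors, UTM positions, and the customary 25 m correctness threshold), not the exact code of eval.py.

```python
import numpy as np

def recall_at_n(q_desc, g_desc, q_utm, g_utm, ns=(1, 5, 10), threshold_m=25.0):
    """q_desc: (Q, D), g_desc: (G, D) L2-normalized descriptors;
    q_utm: (Q, 2), g_utm: (G, 2) UTM coordinates."""
    sims = q_desc @ g_desc.T                    # cosine similarity
    ranked = np.argsort(-sims, axis=1)          # gallery indices, best first
    dists = np.linalg.norm(q_utm[:, None] - g_utm[None], axis=2)
    hits = dists <= threshold_m                 # (Q, G) correctness mask
    # a query counts as solved at N if any of its top-N results is a hit
    return {n: np.mean([hits[i, ranked[i, :n]].any() for i in range(len(q_desc))])
            for n in ns}
```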
Pretrained models:
- ResNet50 + DeepLab
- ResNet50 + PSPNet
- ResNet101 + DeepLab
- ResNet101 + PSPNet
Please note: the main paper reports recalls averaged over three runs with different random seeds for each configuration. Here, instead, we provide only one model per configuration.
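Loading a downloaded checkpoint follows the usual PyTorch pattern. A sketch: the 'state_dict' key and the build_model helper are assumptions for illustration; see main.py/eval.py for how the model is actually constructed and resumed.

```python
import torch

# Hypothetical model builder; see main.py/eval.py for the real construction
model = build_model(arch="resnet50", semnet="deeplab")

ckpt = torch.load("resnet50_deeplab.pth", map_location="cpu")
# Checkpoints saved for --resume often wrap the weights in a dict;
# fall back to a bare state dict otherwise (the key name is an assumption)
state_dict = ckpt.get("state_dict", ckpt)
model.load_state_dict(state_dict)
model.eval()
```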
If you use this repository, please consider citing us:
@InProceedings{Paolicelli_2022_ICIAP,
    author = {Paolicelli, Valerio and Tavera, Antonio and Masone, Carlo and Berton, Gabriele Moreno and Caputo, Barbara},
    title = {Learning Semantics for Visual Place Recognition through Multi-Scale Attention},
    booktitle = {Image Analysis and Processing – ICIAP 2022},
    year = {2022}
}