This is a collection of our visual localization frameworks.
- OFVL-MS (ICCV'23): OFVL-MS: Once for Visual Localization across Multiple Indoor Scenes
- UFVL-Net (TIM'23): UFVL-Net: A Unified Framework for Visual Localization across Multiple Indoor Scenes
- Once-for-multiple-scenes. Both OFVL-MS and UFVL-Net optimize the visual localization tasks of multiple scenes collectively in a multi-task learning manner, which challenges the conventional wisdom that scene coordinate regression (SCoRe) typically trains a separate model for each scene. OFVL-MS realizes layer-wise parameter sharing, while UFVL-Net realizes channel-wise and kernel-wise sharing policies.
- Competitive performance. Both OFVL-MS and UFVL-Net deliver extraordinary performance on two benchmarks and complex real scenes. We demonstrate that once training is done, our methods can generalize to new scenes with far fewer parameters by freezing the task-shared parameters.
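The sharing-and-freezing idea above can be sketched in plain Python. This is a conceptual illustration only: the class and attribute names are hypothetical, not the repository's API, and real models would use learned network weights rather than scalars.

```python
# Conceptual sketch of task-shared vs. task-specific parameters.
# All names here are illustrative, not the repository's actual API.

class MultiSceneRegressor:
    def __init__(self, scenes):
        # Task-shared parameters: one backbone reused by every scene.
        self.shared_backbone = {"w": 0.5}
        self.shared_frozen = False
        # Task-specific parameters: one lightweight head per scene.
        self.heads = {s: {"w": 1.0} for s in scenes}

    def forward(self, scene, x):
        feat = self.shared_backbone["w"] * x   # shared feature extraction
        return self.heads[scene]["w"] * feat   # scene-specific regression head

    def add_scene(self, scene):
        # Generalizing to a new scene: freeze the task-shared parameters
        # and train only a new scene-specific head.
        self.shared_frozen = True
        self.heads[scene] = {"w": 1.0}

model = MultiSceneRegressor(["chess", "fire"])
model.add_scene("office")
print(model.forward("office", 2.0))  # 1.0 (0.5 * 2.0 * 1.0)
```

The point of the sketch is the parameter budget: adding a scene only grows the per-scene head, while the shared backbone stays fixed, which is why new scenes cost far fewer parameters than training a separate model.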
To set up the environment, run the following commands:
- Create environment
conda create -n ufvlnet python=3.7
conda activate ufvlnet
- Install PyTorch. We verified UFVL-Net with PyTorch 1.10.1 and CUDA 11.3.
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
- Build the Cython module to install the PnP solver:
cd ./pnpransac
rm -rf build
python setup.py build_ext --inplace
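The solver built above combines PnP with RANSAC: pose hypotheses are fitted from minimal samples of 2D-3D correspondences, and the hypothesis with the most inliers wins. As a sketch of that loop, here is a generic, pure-Python RANSAC on a toy 1D model (not an actual PnP solve; function names are illustrative):

```python
import random

def ransac(data, fit, residual, n_min, threshold, iters=100, seed=0):
    """Generic RANSAC loop: fit models on minimal random samples and
    keep the model with the most inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        sample = rng.sample(data, n_min)              # minimal sample
        model = fit(sample)                           # hypothesis from sample
        inliers = [d for d in data if residual(model, d) < threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

# Toy problem: estimate a scale y = a*x from pairs containing outliers.
pairs = [(x, 2.0 * x) for x in range(1, 8)] + [(3, 50.0), (5, -9.0)]
fit = lambda s: s[0][1] / s[0][0]                     # one pair suffices
residual = lambda a, d: abs(d[1] - a * d[0])
a, inliers = ransac(pairs, fit, residual, n_min=1, threshold=0.5)
print(a, len(inliers))  # recovers a = 2.0 with 7 inliers
```

In the real solver the "model" is a 6-DoF camera pose fitted from a minimal set of 2D-3D correspondences, and the residual is a reprojection error, but the hypothesize-and-verify structure is the same.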
- Install OpenMMLab packages.
pip install mmcv-full==1.5.0
- Install ufvl-net as a package.
cd ufvl_net
pip install -e .
cd ..
export PYTHONPATH=./ufvl_net/
We utilize two standard datasets (i.e., 7-Scenes and 12-Scenes) and the real-world LIVL dataset to evaluate our methods.
- 7-Scenes: The 7-Scenes dataset can be downloaded from 7-Scenes.
- 12-Scenes: The 12-Scenes dataset can be downloaded from 12-Scenes.
- LIVL: The real-world LIVL dataset can be downloaded from RealWorld-Scenes.
The LIVL dataset was collected with a rig consisting of a mobile chassis, a RealSense D435 camera, and a VLP-16 laser radar. It records RGB-D images and corresponding camera poses of four different indoor environments. Specifically, we utilize the ROS system to record RGB images and aligned depth images with corresponding timestamps. For each scene, four sequences are recorded, of which three are used for training and one for testing.
- K544: a room spanning about $12 \times 9\,m^{2}$, with 3109 images for training and 1112 images for testing.
- Floor5: a hall spanning about $12 \times 5\,m^{2}$, with 2694 images for training and 869 images for testing.
- Parking lot1: a parking lot spanning about $8 \times 6\,m^{2}$, with 2294 images for training and 661 images for testing.
- Parking lot2: a parking lot spanning about $8 \times 8\,m^{2}$, with 2415 images for training and 875 images for testing.
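The per-scene split described above (four recorded sequences, three for training and one for testing) can be sketched as a small helper. The sequence names below are hypothetical placeholders, not the dataset's actual directory names:

```python
def split_sequences(sequences, test_seq):
    """Hold one recorded sequence out for testing; use the rest for training."""
    if test_seq not in sequences:
        raise ValueError(f"unknown sequence: {test_seq}")
    train = [s for s in sequences if s != test_seq]
    return train, [test_seq]

# Each LIVL scene has four sequences; hold the last one out for testing.
train, test = split_sequences(["seq-01", "seq-02", "seq-03", "seq-04"], "seq-04")
print(train, test)  # ['seq-01', 'seq-02', 'seq-03'] ['seq-04']
```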