Official PyTorch implementation of a multi-scene camera pose regression paradigm with Transformers. For details, see our paper: Learning Multi-Scene Absolute Pose Regression with Transformers.
The figure below illustrates our approach: two transformers separately attend to position- and orientation-informative features from a convolutional backbone. Scene-specific queries (0-3) are further encoded with aggregated activation maps into latent representations, from which a single output is selected. The strongest response, shown as an overlaid color-coded heatmap of attention weights, is obtained with the output associated with the input image's scene. The selected outputs are then used to regress the position x and the orientation q.
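Schematically, the selection step works as in the following minimal sketch. The module and tensor names here are ours for illustration and do not match the repository's code, which uses full transformer decoders rather than bare linear heads:

```python
import torch
import torch.nn as nn

class MultiSceneHeadSketch(nn.Module):
    """Illustrative only: selects the per-scene latent with the strongest
    response and regresses the position x and orientation q from it."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.scene_cls = nn.Linear(latent_dim, 1)  # per-scene score
        self.x_head = nn.Linear(latent_dim, 3)     # position regressor
        self.q_head = nn.Linear(latent_dim, 4)     # orientation (quaternion) regressor

    def forward(self, latents_x, latents_q):
        # latents_*: (batch, num_scenes, latent_dim), one latent per scene query
        scene_logits = self.scene_cls(latents_x).squeeze(-1)  # (batch, num_scenes)
        idx = scene_logits.argmax(dim=1)                      # strongest response
        b = torch.arange(latents_x.size(0))
        x = self.x_head(latents_x[b, idx])                    # (batch, 3)
        q = self.q_head(latents_q[b, idx])                    # (batch, 4)
        return x, q, scene_logits

# Toy usage: a batch of 2 images with 4 scene queries and 256-dim latents.
head = MultiSceneHeadSketch()
x, q, logits = head(torch.randn(2, 4, 256), torch.randn(2, 4, 256))
```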
This code implements:
- Training of a Transformer-based architecture for multi-scene absolute pose regression
- Training of a PoseNet-like (CNN-based) architecture for single-scene pose regression
- Testing of the two models above
To run this repository you will need:
- Python 3 (tested with Python 3.7.7 and 3.8.5) and PyTorch
- The dependencies listed in requirements.txt, installed with:
pip install -r requirements.txt
- The Cambridge Landmarks dataset and the 7Scenes dataset (downloaded separately)
You can download our pretrained models for the 7Scenes and Cambridge Landmarks datasets (before and after fine-tuning) here: pretrained models
The entry point for training and testing is the main.py script in the root directory.
For a detailed explanation of the available options, run:
python main.py -h
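As the examples below suggest, the script takes the model name, the run mode (train/test), the backbone weights path, the dataset root, the labels file, and the configuration file as positional arguments:
python main.py <model_name> <train/test> <backbone_path> <dataset_path> <labels_file> <config_file>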
For example, to train our model on the 7Scenes dataset, run:
python main.py ems-transposenet train ./models/backbones/efficient-net-b0.pth /path/to/7scenes-datasets ./datasets/7Scenes/7scenes_all_scenes.csv 7Scenes_config.json
Your checkpoints (.pth files, saved at the interval you specify in the configuration file) and log file will be written to an 'out' folder.
To run on Cambridge Landmarks, change the configuration file to CambridgeLandmarks_config.json for initial training and to CambridgeLandmarks_finetune_config.json for fine-tuning (see details in our paper).
To test your model, for example on the fire scene from the 7Scenes dataset, run:
python main.py ems-transposenet test ./models/backbones/efficient-net-b0.pth /path/to/7scenes-datasets ./datasets/7Scenes/abs_7scenes_pose.csv_fire_test.csv 7Scenes_config.json --checkpoint_path <path to your checkpoint .pth>
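If you want to compute the standard evaluation metrics yourself, the sketch below shows one common way to do so (this is our own minimal example, not code from the repository): position error is the Euclidean distance in meters, and orientation error is the angle in degrees between the estimated and ground-truth unit quaternions.

```python
import torch
import torch.nn.functional as F

def pose_errors(x_est, x_gt, q_est, q_gt):
    """Position error (meters) and orientation error (degrees) per sample."""
    pos_err = torch.norm(x_est - x_gt, dim=-1)
    q_est = F.normalize(q_est, dim=-1)  # ensure unit quaternions
    q_gt = F.normalize(q_gt, dim=-1)
    # Angle between rotations: theta = 2 * acos(|<q_est, q_gt>|)
    inner = torch.sum(q_est * q_gt, dim=-1).abs().clamp(max=1.0)
    orient_err = 2 * torch.rad2deg(torch.acos(inner))
    return pos_err, orient_err

# Toy usage with random poses (batch of 8)
pos_err, orient_err = pose_errors(torch.randn(8, 3), torch.randn(8, 3),
                                  torch.randn(8, 4), torch.randn(8, 4))
print(pos_err.median().item(), orient_err.median().item())
```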
If you find this repository useful, please consider giving it a star and citing our paper:
@article{Shavit21,
  title={Learning Multi-Scene Absolute Pose Regression with Transformers},
  author={Shavit, Yoli and Ferens, Ron and Keller, Yosi},
  journal={arXiv preprint arXiv:2103.11468},
  year={2021}
}