XPoint
A Self-Supervised Visual-State-Space based Architecture for Multispectral Image Registration

İsmail Can Yağmur, Hasan F. Ateş, Bahadır K. Güntürk

Accurate multimodal image matching presents significant challenges due to non-linear intensity variations across spectral modalities, extreme viewpoint changes, and the scarcity of labeled datasets. Current state-of-the-art methods are typically specialized for a single spectral difference, such as visible-infrared, and struggle to adapt to other modalities due to their reliance on expensive supervision, such as depth maps or camera poses. To address the need for rapid adaptation across modalities, we introduce XPoint, a self-supervised, modular image-matching framework designed for adaptive training and fine-tuning on aligned multimodal datasets, allowing users to customize key components based on their specific tasks. XPoint leverages modularity and self-supervision to allow for the adjustment of elements such as the base detector, which generates pseudo-ground truth keypoints invariant to viewpoint and spectrum variations. The framework integrates a VMamba encoder, pre-trained on segmentation tasks, for robust feature extraction, and includes three joint decoder heads: two dedicated to interest point and descriptor extraction, and a task-specific homography regression head that imposes geometric constraints for superior performance in tasks like image registration. This flexible architecture enables quick adaptation to a wide range of modalities, demonstrated by training on Optical-Thermal data and fine-tuning on settings such as visual-near infrared (0.75–1.4 $\mu$m), visual-infrared (3–8 $\mu$m), visual-longwave infrared (0.8–15 $\mu$m), and visual-synthetic aperture radar. Experimental results show that XPoint consistently outperforms or matches state-of-the-art methods in feature matching and image registration tasks across five distinct multispectral datasets.

XPoint

This is a PyTorch implementation of "XPoint: A Self-Supervised Visual-State-Space based Architecture for Multispectral Image Registration"

Installation

This software requires Python 3.8 or higher (Tested on 3.11.0).

Requirements can be installed with:

pip install -r requirements.txt
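
If you prefer an isolated environment, a typical setup (standard Python tooling, not a requirement of this repository) is:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt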

The repository includes pre-trained models for XPoint. However, to train the models, you need to download the dataset separately (see Dataset).

Dataset

Multispectral Image Pair Dataset

The dataset is hosted on the Autonomous Systems Lab dataset website, which also offers basic information about the data.

The dataset can be downloaded by running (from the xpoint directory):

python download_multipoint_data.py

A different target directory can be specified with the -d flag, and existing files can be overwritten by setting the -f flag. Please note that the dataset files are quite large (over 36 GB in total), so the download may take some time.
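
For example, to download into a data/MULTIPOINT directory (a hypothetical target path, chosen to match the layout shown under Dataset Structure) and overwrite any existing files:

python download_multipoint_data.py -d data/MULTIPOINT -f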

VEDAI Dataset

The VEDAI dataset can be downloaded from the official website.

VIS-NIR, VIS-IR and VIS-SAR Datasets

These datasets were proposed by RedFeaT. We have uploaded them to the drive link; download the "nir_ir_sar_datasets.zip" file from there.

Dataset Structure

The dataset is expected to be structured as one of the following examples:

1- HDF5 Files (set "filename" parameter in config files):

data
├── MULTIPOINT
│   ├── training.hdf5
│   └── test.hdf5
└── VEDAI
    ├── training.hdf5
    └── test.hdf5

2- Image Files (set "foldername" parameter in config files):

data
├── MULTIPOINT
│   ├── training
│   │   ├── optical
│   │   │   ├── 0001.png
│   │   │   ├── 0002.png
│   │   │   └── ...
│   │   └── thermal
│   │       ├── 0001.png
│   │       ├── 0002.png
│   │       └── ...
│   └── test
│       ├── optical
│       │   ├── 0001.png
│       │   ├── 0002.png
│       │   └── ...
│       └── thermal
│           ├── 0001.png
│           ├── 0002.png
│           └── ...

As shown above, the training and test data are separated into different directories, and the optical and thermal images are stored in separate subdirectories. Corresponding image pairs must have the same filename in the optical and thermal directories. The same structure can be followed for the other datasets, such as VEDAI, VIS-NIR, VIS-IR, and VIS-SAR. A sketch of how a config file might reference either layout is given below.
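
As a rough illustration, the relevant part of a config file might point to either layout as follows. This is only a sketch: apart from the "filename" and "foldername" parameters mentioned above, the key names and nesting are assumptions, so check the provided files in configs/ for the actual structure.

# Option 1: HDF5 file (sketch; surrounding keys are assumed)
dataset:
  filename: data/MULTIPOINT/training.hdf5

# Option 2: image folders (sketch; use instead of "filename")
dataset:
  foldername: data/MULTIPOINT/training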

Pre-trained Models

Pre-trained models for XPoint can be downloaded from the drive link. The pre-trained models are in the "ALL_BESTS" folder and include "MP", "VEDAI", "NIR", "IR", and "SAR" models for the respective datasets. Download these models and store them in the model_weights directory.

The base model is "MP", which is trained on the multispectral image pair dataset at a resolution of 256x256. The other models are fine-tuned from this base model on their respective datasets.
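
After downloading, the model_weights directory might be organized roughly as below; the exact contents of ALL_BESTS are an assumption based on the model names listed above and the -m/-v usage in the benchmark example.

model_weights
└── ALL_BESTS
    ├── MP
    ├── VEDAI
    ├── NIR
    ├── IR
    └── SAR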

Usage

The following sections explain the scripts used to train XPoint and visualize its results. For each script, additional help on the input parameters and flags can be obtained with the -h flag (e.g. python show_keypoints.py -h).

Benchmark on Predicting Keypoints and Homography

The performance of the trained XPoint can be evaluated by executing the benchmark.py script.

Example benchmark on the MultiPoint dataset:

python benchmark.py -y configs/cipdp.yaml -m model_weights/ALL_BESTS -v MP -e -p

Here the -y flag specifies the yaml file, the -m flag the model weights, the -v flag the model version, the -e flag computes the metrics for the whole dataset, and the -p flag plots the results of some samples. The yaml file specifies the dataset and model parameters.

Individually Predicting the Repeatability Score

Predicting only keypoints can be done by executing the predict_keypoints.py script. The results are plotted by adding the -p flag, and the metrics for the whole dataset are computed by adding the -e flag.
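
For example, assuming predict_keypoints.py accepts the same -y, -m, and -v flags as benchmark.py (an assumption; run python predict_keypoints.py -h to confirm):

python predict_keypoints.py -y configs/cipdp.yaml -m model_weights/ALL_BESTS -v MP -e -p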

Predicting the Matching and Homography Estimation Score

Predicting the alignment of an image pair can be done using the predict_align_image_pair.py script. The resulting keypoints and matches can be visualized by adding the -p flag. The metrics over the full dataset are computed by adding the -e flag.
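
For example, assuming predict_align_image_pair.py takes the same -y, -m, and -v flags as benchmark.py (an assumption; run python predict_align_image_pair.py -h to confirm):

python predict_align_image_pair.py -y configs/cipdp.yaml -m model_weights/ALL_BESTS -v MP -e -p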

Generating Keypoint Labels

Keypoint labels for a given set of image pairs can be generated using:

python export_keypoints.py -o tmp/labels.hdf5 -m model_weights/RIFT2 -v none

where the -o flag defines the output filename. The base detector and the export settings can be modified by making a copy of the configs/config_export_keypoints.yaml config file, editing the desired parameters, and specifying your new config file with the -y flag. The -m flag specifies the model weights, and the -v flag specifies the model version.

python export_keypoints.py -y configs/custom_export_keypoints.yaml -o tmp/labels.hdf5

Visualizing Keypoint Labels

The generated keypoint labels can be inspected by executing the show_keypoints.py script:

python show_keypoints.py -d data/MULTIPOINT/training.hdf5 -k tmp/labels.hdf5 -n 100

The -d flag specifies the dataset file, the -k flag the labels file, and the -n flag the index of the sample to be shown.

Visualizing Samples from Datasets

By executing the following command:

python show_image_pair_sample.py -i tmp/test.hdf5 -n 100

the 100th image pair of the tmp/test.hdf5 dataset is shown.

Training XPoint

XPoint can be trained by executing the train.py script. The script only requires the path to a yaml file with the training parameters:

python train.py -y configs/cmt.yaml

The training hyperparameters (e.g., learning rate) and model parameters can be modified in the yaml file.
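
As an illustration only, a training-related section of such a yaml file might look like the sketch below; the key names and values are assumptions and do not reproduce configs/cmt.yaml, so refer to that file for the actual parameters.

# sketch only -- key names and values are assumptions
training:
  learning_rate: 1.0e-4
  batch_size: 8
  n_epochs: 100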

Citing

If you use this code in your research, please consider citing the following paper:

TODO: Add citation

Credits

TODO Add credits
