VisIRNet is a deep learning model designed for aligning visible and infrared image pairs captured by UAVs. This repository contains the code and resources used in the paper "VisIRNet: Deep Image Alignment for UAV-Taken Visible and Infrared Image Pairs" published in IEEE Transactions on Geoscience and Remote Sensing.
- Clone the repository:

      git clone https://github.com/ozerlabs-proxy/VisIrNet.git
      cd VisIrNet
- Create a virtual environment and activate it:

      conda create -n VisIrNet python==3.10
      conda activate VisIrNet
- Install the required packages:

      pip install -r requirements.txt
- Create the data directory under VisIrNet/:

      mkdir data
      cd data
- Link the datasets into data/ (a sketch of the linking script follows this list):

      python ./scripts/link_datasets_to_data.py

  or manually:

      cd data
      ln -s ~/Datasets/GoogleEarth .
      ln -s ~/Datasets/MSCOCO .
      ln -s ~/Datasets/SkyData .
      ln -s ~/Datasets/VEDAI .
      ln -s ~/Datasets/GoogleMap .
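If the provided script does not fit your setup, the following is a minimal sketch of such a linking script, assuming the datasets live under ~/Datasets. The dataset names are taken from the commands above; the actual scripts/link_datasets_to_data.py may differ.

```python
# Sketch only: the actual scripts/link_datasets_to_data.py may differ.
from pathlib import Path

DATASETS = ["GoogleEarth", "MSCOCO", "SkyData", "VEDAI", "GoogleMap"]

def link_datasets(source_root: Path, data_dir: Path) -> None:
    """Symlink each dataset directory into the repository's data/ folder."""
    data_dir.mkdir(exist_ok=True)
    for name in DATASETS:
        target = source_root / name   # e.g. ~/Datasets/SkyData
        link = data_dir / name        # e.g. VisIrNet/data/SkyData
        if not link.exists():
            link.symlink_to(target, target_is_directory=True)

if __name__ == "__main__":
    link_datasets(Path.home() / "Datasets", Path("data"))
```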
We ran our experiments on a GPU cluster managed by Slurm; scripts for both training and inference are provided. The configs folder contains configuration files for models, datasets, loss functions, etc. Pass the chosen configuration file to the training script (feel free to adjust the settings).
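For orientation, here is a hypothetical sketch of how a training script might load the --config-file argument. The flag name comes from the commands below; the parsing logic and config keys are assumptions, not the repository's actual code.

```python
# Hypothetical sketch: the actual parsing and config keys in Train.py may differ.
import argparse
import json
from pathlib import Path

def load_config() -> dict:
    parser = argparse.ArgumentParser(description="Train VisIrNet")
    parser.add_argument("--config-file", required=True,
                        help="JSON config name, e.g. skydata_default_config.json")
    args = parser.parse_args()
    config_path = Path("configs") / args.config_file
    with open(config_path) as f:
        return json.load(f)  # e.g. dataset name, batch size, loss choices

if __name__ == "__main__":
    config = load_config()
    print(config)
```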
Train locally or with Slurm:

    # train locally
    conda activate VisIrNet
    python Train.py --config-file skydata_default_config.json

OR

    # train with slurm
    sbatch slurm-training.sh
Run inference locally or with Slurm:

    # 1. inference locally
    conda activate VisIrNet
    python Test.py --config-file skydata_default_config.json --r_loss_function l2_corners_loss --b_loss_function ssim_pixel

OR

    # 2. inference with slurm
    sbatch slurm-inference.sh
Visualize logs with TensorBoard:

    # make sure the conda env is activated
    conda activate VisIrNet
    tensorboard --logdir logs/tensorboard
The VisIRNet architecture is designed to handle the challenges of aligning visible and infrared images. Refer to the paper for detailed information about the model architecture and design choices.
The model was trained and tested on the SkyData, VEDAI, Google Earth, Google Maps, and MSCOCO datasets; the quantitative results are presented in the paper. The tables below summarize the backbone and registration loss configurations evaluated on SkyData and VEDAI.
Backbone loss choices (see the sketch after this list):
- ✓ mean_squared_error (mse_pixel) "l2"
- ✓ mean_absolute_error (mae_pixel) "l1"
- ✓ sum_squared_error (sse_pixel)
- ✓ structural_similarity (ssim_pixel)
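For intuition, the pixel-level backbone losses roughly correspond to the following. This is a minimal sketch assuming a TensorFlow implementation (the repository logs to TensorBoard); the actual reductions and scaling may differ.

```python
# Sketch of the pixel-level backbone losses; actual implementations may differ.
import tensorflow as tf

def mse_pixel(y_true, y_pred):   # "l2"
    return tf.reduce_mean(tf.square(y_true - y_pred))

def mae_pixel(y_true, y_pred):   # "l1"
    return tf.reduce_mean(tf.abs(y_true - y_pred))

def sse_pixel(y_true, y_pred):
    return tf.reduce_sum(tf.square(y_true - y_pred))

def ssim_pixel(y_true, y_pred):
    # 1 - SSIM so that lower is better, assuming images scaled to [0, 1]
    return 1.0 - tf.reduce_mean(tf.image.ssim(y_true, y_pred, max_val=1.0))
```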
Registration loss choices (see the sketch after this list):
- ✓ l1_homography_loss
- ✓ l2_homography_loss
- ✓ l1_corners_loss
- ✓ l2_corners_loss
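The registration losses penalize either the predicted homography parameters or the projected corner locations. Below is a minimal sketch under the same TensorFlow assumption; tensor shapes and reductions are illustrative, not the repository's exact implementation.

```python
# Sketch of the registration losses; actual implementations may differ.
import tensorflow as tf

def l1_homography_loss(H_true, H_pred):
    return tf.reduce_mean(tf.abs(H_true - H_pred))

def l2_homography_loss(H_true, H_pred):
    # L2 distance between ground-truth and predicted homography parameters
    return tf.reduce_mean(tf.square(H_true - H_pred))

def l1_corners_loss(corners_true, corners_pred):
    return tf.reduce_mean(tf.abs(corners_true - corners_pred))

def l2_corners_loss(corners_true, corners_pred):
    # Mean Euclidean distance between the four projected corner points,
    # with corners shaped (batch, 4, 2)
    return tf.reduce_mean(tf.norm(corners_true - corners_pred, axis=-1))
```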
Backbone loss | SkyData | VEDAI
---|---|---
mse_pixel | ✓ | ✓
mae_pixel | ✓ | ✓
sse_pixel | ✓ | ✓
ssim_pixel | ✓ | ✓
Backbone | R_loss | SkyData | VEDAI
---|---|---|---
mse_pixel | l2_corners_loss | ✓ | ✓
mae_pixel | l2_corners_loss | ✓ | ✓
sse_pixel | l2_corners_loss | ✓ | ✓
ssim_pixel | l2_corners_loss | ✓ | ✓
If you use this code in your research, please cite our paper:
@article{ozer2024visirnet,
title={VisIRNet: Deep Image Alignment for UAV-Taken Visible and Infrared Image Pairs},
author={{\"O}zer, Sedat and Ndigande, Alain P},
journal={IEEE Transactions on Geoscience and Remote Sensing},
volume={62},
pages={1--11},
year={2024},
publisher={IEEE}
}
This project is licensed under the MIT License - see the LICENSE file for details.