Tavis Shore, Simon Hadfield, Oscar Mendez
Centre for Vision, Speech, and Signal Processing (CVSSP)
University of Surrey, Guildford, GU2 7XH, United Kingdom
Cross-view image matching for geo-localisation is a challenging problem due to the significant visual difference between aerial and ground-level viewpoints. The method provides localisation capabilities from geo-referenced images, eliminating the need for external devices or costly equipment. This enhances the capacity of agents to autonomously determine their position, navigate, and operate effectively in GNSS-denied environments. Current research employs a variety of techniques to reduce the domain gap, such as applying polar transforms to aerial images or synthesising between perspectives. However, these approaches generally rely on having a 360° field of view, limiting real-world feasibility. We propose BEV-CV, an approach introducing two key novelties with a focus on improving the real-world viability of cross-view geo-localisation. Firstly, we bring ground-level images into a semantic Birds-Eye-View before matching embeddings, allowing for direct comparison with aerial image representations. Secondly, we adapt datasets into an application-realistic format: limited Field-of-View images aligned to vehicle direction. BEV-CV achieves state-of-the-art recall accuracies, improving Top-1 rates on 70° crops of CVUSA and CVACT by 23% and 24% respectively. It also decreases computational requirements by reducing floating point operations to below previous works, and decreases embedding dimensionality by 33%, together allowing for faster localisation.
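The limited Field-of-View setting described above can be illustrated with a short sketch. This is not the repository's data pipeline; it only shows one way to cut a heading-aligned crop out of a 360° ground panorama. The function name `crop_fov` and its parameters are hypothetical, and the sketch assumes an equirectangular panorama whose column 0 corresponds to azimuth 0°, with azimuth increasing along the column index. Real datasets may place north elsewhere in the image, so the offset would need adjusting.

```python
import numpy as np


def crop_fov(panorama: np.ndarray, heading_deg: float, fov_deg: float = 90.0) -> np.ndarray:
    """Return a horizontal crop of `fov_deg` degrees centred on `heading_deg`.

    Assumption (not from the source): `panorama` has shape (H, W, C), spans
    360 degrees horizontally, and column 0 sits at azimuth 0.
    """
    if not 0.0 < fov_deg <= 360.0:
        raise ValueError("fov_deg must be in (0, 360]")
    width = panorama.shape[1]
    crop_width = int(round(width * fov_deg / 360.0))
    centre = int(round((heading_deg % 360.0) / 360.0 * width))
    start = centre - crop_width // 2
    # Wrap column indices so crops that cross the panorama seam stay contiguous.
    columns = np.arange(start, start + crop_width) % width
    return panorama[:, columns]
```

For example, `crop_fov(panorama, heading_deg=180.0, fov_deg=70.0)` would keep roughly 70/360 of the panorama width, centred on the column at 180°.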
🚧 Under Construction
conda env create -f requirements.yaml
Model | Orientation Aware | R@1 | R@5 | R@10 | R@1% | R@1 | R@5 | R@10 | R@1% |
---|---|---|---|---|---|---|---|---|---|
CVUSA | | 90° | | | | 70° | | | |
CVM | ❌ | 2.76 | 10.11 | 16.74 | 55.49 | 2.62 | 9.30 | 15.06 | 21.77 |
CVFT | ❌ | 4.80 | 14.84 | 23.18 | 61.23 | 3.79 | 12.44 | 19.33 | 55.56 |
DSM | ❌ | 16.19 | 31.44 | 39.85 | 71.13 | 8.78 | 19.90 | 27.30 | 61.20 |
L2LTR | ❌ | 26.92 | 50.49 | 60.41 | 86.88 | 13.95 | 33.07 | 43.86 | 77.65 |
TransGeo | ❌ | 30.12 | 54.18 | 63.96 | 89.18 | 16.43 | 37.28 | 48.02 | 80.75 |
GeoDTR | ❌ | 18.81 | 43.36 | 57.94 | 88.14 | 14.84 | 38.03 | 51.27 | 88.17 |
BEV-CV | ❌ | 15.17 | 33.91 | 45.33 | 82.53 | 14.03 | 32.32 | 43.25 | 81.48 |
GAL | ≈ | 22.54 | 44.36 | 54.17 | 84.59 | 15.20 | 32.86 | 42.06 | 75.21 |
DSM | ✅ | 33.66 | 51.70 | 59.68 | 82.46 | 20.88 | 36.99 | 44.70 | 71.10 |
L2LTR | ✅ | 25.21 | 51.90 | 63.54 | 91.16 | 22.20 | 46.71 | 58.99 | 89.37 |
TransGeo | ✅ | 21.96 | 45.35 | 56.49 | 86.80 | 17.27 | 38.95 | 49.44 | 81.34 |
GeoDTR | ✅ | 15.21 | 39.32 | 52.27 | 88.72 | 14.00 | 35.28 | 47.77 | 86.39 |
BEV-CV | ✅ | 32.11 | 58.36 | 69.06 | 92.99 | 27.40 | 52.94 | 64.47 | 90.94 |
CVACT | | 90° | | | | 70° | | | |
CVM | ❌ | 1.47 | 5.70 | 9.64 | 38.05 | 1.24 | 4.98 | 8.42 | 34.74 |
CVFT | ❌ | 1.85 | 6.28 | 10.54 | 39.25 | 1.49 | 5.13 | 8.19 | 34.59 |
DSM | ❌ | 18.11 | 33.34 | 40.94 | 68.65 | 8.29 | 20.72 | 27.13 | 57.08 |
L2LTR | ❌ | 13.07 | 30.38 | 41.00 | 76.07 | 6.67 | 15.94 | 23.45 | 49.37 |
TransGeo | ❌ | 10.75 | 28.22 | 37.51 | 70.15 | 7.01 | 19.44 | 27.50 | 62.19 |
GeoDTR | ❌ | 26.53 | 53.26 | 64.59 | 91.13 | 16.87 | 40.22 | 53.13 | 87.92 |
BEV-CV | ❌ | 4.14 | 14.46 | 22.64 | 61.18 | 3.92 | 13.50 | 20.53 | 59.34 |
GAL | ≈ | 26.05 | 49.23 | 59.26 | 85.60 | 14.17 | 32.96 | 43.24 | 77.49 |
DSM | ✅ | 31.17 | 51.44 | 60.05 | 82.90 | 18.44 | 35.87 | 44.39 | 71.97 |
L2LTR | ✅ | 33.62 | 46.28 | 58.21 | 78.62 | 28.65 | 53.59 | 65.02 | 90.48 |
TransGeo | ✅ | 28.16 | 34.44 | 41.54 | 67.15 | 24.05 | 42.68 | 55.47 | 80.72 |
GeoDTR | ✅ | 26.76 | 53.65 | 65.35 | 92.12 | 15.38 | 37.09 | 49.40 | 86.38 |
BEV-CV | ✅ | 45.79 | 75.85 | 83.97 | 96.76 | 37.85 | 69.00 | 78.52 | 95.03 |
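The R@K columns report the percentage of ground-level queries whose true aerial match appears among the K highest-ranked references, and R@1% uses the top 1% of the reference set as the cut-off. The sketch below shows how such figures are commonly computed from paired embeddings. It is not the repository's evaluation code: the function name `recall_at_k`, the use of cosine similarity, and the `N // 100` convention for R@1% are all assumptions made for illustration.

```python
import numpy as np


def recall_at_k(ground_emb: np.ndarray, aerial_emb: np.ndarray, ks=(1, 5, 10)) -> dict:
    """Recall@K in percent for paired embeddings.

    Assumption: row i of `ground_emb` and row i of `aerial_emb` describe the
    same location, so the correct match for query i is reference i.
    """
    # L2-normalise so the dot product equals cosine similarity.
    g = ground_emb / np.linalg.norm(ground_emb, axis=1, keepdims=True)
    a = aerial_emb / np.linalg.norm(aerial_emb, axis=1, keepdims=True)
    sim = g @ a.T  # shape: (num_queries, num_references)
    true_sim = np.diag(sim)[:, None]
    # Rank of the true match = number of references scoring strictly higher.
    rank = (sim > true_sim).sum(axis=1)
    results = {f"R@{k}": float(100.0 * np.mean(rank < k)) for k in ks}
    # Assumed convention: top 1% means the first N // 100 references (at least 1).
    top_1_percent = max(1, sim.shape[0] // 100)
    results["R@1%"] = float(100.0 * np.mean(rank < top_1_percent))
    return results
```

Calling `recall_at_k(ground_emb, aerial_emb)` on two `(N, D)` arrays returns a dictionary with the keys `R@1`, `R@5`, `R@10`, and `R@1%`, matching the column names in the table.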
If you find BEV-CV useful for your work, please cite:
@INPROCEEDINGS{bevcv,
author={Shore, Tavis and Hadfield, Simon and Mendez, Oscar},
booktitle={2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
title={BEV-CV: Birds-Eye-View Transform for Cross-View Geo-Localisation},
year={2024},
pages={11047-11054},
}