A collection of papers, datasets, benchmarks, code, and model weights for Remote Sensing Cross-Modal Image-Text Retrieval (RSCMIT).

BaolanChen/Awesome-Remote-Sensing-Cross-Modal-Image-Text-Retrieval

Awesome-Remote-Sensing-Cross-Modal-Image-Text-Retrieval

📢 Latest Updates

🔥🔥🔥 Last Updated on 2024.12.24 🔥🔥🔥

  • 2024.12.24: Update CFITR.
  • 2024.12.09: Update CDMAN, MSA, KTIR, CMPAGL, CCLS2T, SARCI, FSISR, and SCAT.
  • 2024.12.05: Update SIRS and HVSA.

Remote Sensing Cross-Modal Image-Text Survey

| Paper | Title | Publication | Affiliation | Note |
| --- | --- | --- | --- | --- |
| Paper | Vision-Language Models in Remote Sensing: Current progress and future trends | GRSM 2024 | King Abdullah University of Science and Technology | |
| Paper | Language Integration in Remote Sensing: Tasks, datasets, and future directions | GRSM 2023 | King Saud University | |
| Paper | Self-Supervised Remote Sensing Feature Learning: Learning Paradigms, Challenges, and Future Works | TGRS 2023 | Central South University | |
| Paper | The Potential of Visual ChatGPT For Remote Sensing | Arxiv 2023 | University of Western São Paulo | |
| Paper | 遥感大模型:进展与前瞻 (Large Models in Remote Sensing: Progress and Prospects) | Geomatics and Information Science of Wuhan University 2023 | Wuhan University | |

Remote Sensing Image-Text Datasets

| Dataset Name | Number of Images | Image Resolution | VLMs |
| --- | --- | --- | --- |
| UCM-Captions | 2,100 | 256 × 256 | - |
| Sydney-Captions | 613 | 500 × 500 | - |
| RSICD | 10,921 | 224 × 224 | - |
| RSITMD | 4,743 | 256 × 256 | - |
| NWPU-Captions | 31,500 | 256 × 256 | - |
| RS5M | 5 million+ | All Resolutions | GeoRSCLIP |
| SkyScript | 5.2 million+ | All Resolutions | SkyCLIP |

Remote Sensing Cross-Modal Image-Text Retrieval Models

| Paper | Title | Publication | Affiliation | Code | Note |
| --- | --- | --- | --- | --- | --- |
| CFITR | Toward Efficient and Accurate Remote Sensing Image–Text Retrieval With a Coarse-to-Fine Approach | GRSL 2024 | Beijing Foreign Studies University | Github | |
| CDMAN | Thread the Needle: Cues-Driven Multi-Association for Remote Sensing Cross-Modal Retrieval | TGRS 2024 | Wuhan University of Technology | - | |
| MSA | Transcending Fusion: A Multiscale Alignment Method for Remote Sensing Image–Text Retrieval | TGRS 2024 | Xidian University | Github | |
| KTIR | Knowledge-aware Text-Image Retrieval for Remote Sensing Images | TGRS 2024 | EPFL | - | |
| CMPAGL | Cross-Modal Prealigned Method With Global and Local Information for Remote Sensing Image and Text Retrieval | TGRS 2024 | Shanghai Maritime University | Github | |
| FGIS | Fine-Grained Information Supplementation and Value-Guided Learning for Remote Sensing Image-Text Retrieval | JSTARS 2024 | Chongqing University | - | |
| EBAKER | Eliminate Before Align: A Remote Sensing Image-Text Retrieval Framework with Keyword Explicit Reasoning | ACMMM 2024 | Tianjin University | - | |
| CUP | Cross-Modal Remote Sensing Image–Text Retrieval via Context and Uncertainty-Aware Prompt | TNNLS 2024 | Xidian University | Github | |
| CCLS2T | Cross-Modal Contrastive Learning With Spatiotemporal Context for Correlation-Aware Multiscale Remote Sensing Image Retrieval | TGRS 2024 | Xidian University | - | |
| MIIA | Global–Local Information Soft-Alignment for Cross-Modal Remote-Sensing Image–Text Retrieval | TGRS 2024 | Northwestern Polytechnical University | - | |
| SARCI | Scale-Aware Adaptive Refinement and Cross-Interaction for Remote Sensing Audio-Visual Cross-Modal Retrieval | TGRS 2024 | Wuhan University of Technology | Github | |
| GLISA | Masking-Based Cross-Modal Remote Sensing Image–Text Retrieval via Dynamic Contrastive Learning | TGRS 2024 | China University of Mining and Technology | - | |
| SCAT | Spatial–Channel Attention Transformer With Pseudo Regions for Remote Sensing Image-Text Retrieval | TGRS 2024 | Northwestern Polytechnical University | - | |
| FSISR | Cross-Modal Hashing With Feature Semi-Interaction and Semantic Ranking for Remote Sensing Ship Image Retrieval | TGRS 2024 | Harbin Institute of Technology | - | |
| SkyEyeGPT | Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model | Arxiv 2024 | Northwestern Polytechnical University | Github | |
| MFF-SFE | Cross-modal retrieval method based on MFF-SFE for remote sensing image-text | Journal of University of Chinese Academy of Sciences 2024 | Aerospace Information Research Institute, Chinese Academy of Sciences | - | |
| RemoteCLIP | RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | TGRS 2024 | Hohai University | Github | |
| C2F-ITR | From Coarse To Fine: An Offline-Online Approach for Remote Sensing Cross-Modal Retrieval | IGARSS 2024 | Beijing Foreign Studies University | - | |
| MGRM-EL | Exploring Uni-Modal Feature Learning on Entities and Relations for Remote Sensing Cross-Modal Text-Image Retrieval | TGRS 2024 | Northwestern Polytechnical University | - | |
| SIRS | Multitask Joint Learning for Remote Sensing Foreground-Entity Image–Text Retrieval | TGRS 2024 | Soochow University | Github | |
| PIR | A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval | ACMMM 2023 oral | Zhejiang University of Technology | Github | |
| PE-RSITR | Parameter-Efficient Transfer Learning for Remote Sensing Image–Text Retrieval | TGRS 2023 | Northwestern Polytechnical University | Github | |
| HVSA | Hypersphere-Based Remote Sensing Cross-Modal Text–Image Retrieval via Curriculum Learning | TGRS 2023 | Aerospace Information Research Institute, Chinese Academy of Sciences | Github | |
| SWAN | Reducing Semantic Confusion Scene-aware Aggregation Network for Remote Sensing Cross-modal Retrieval | ICMR 2023 oral | Zhejiang University of Technology | Github | |
| KAMCL | Knowledge-Aided Momentum Contrastive Learning for Remote-Sensing Image Text Retrieval | TGRS 2023 | Tianjin University | Github | |
| IEFT | Interacting-Enhancing Feature Transformer for Cross-Modal Remote-Sensing Image and Text Retrieval | TGRS 2023 | Xidian University | Github | |
| - | A Texture and Saliency Enhanced Image Learning Method For Cross-Modal Remote Sensing Image-Text Retrieval | IGARSS 2023 | Xidian University | - | |
| Multilanguage Transformer | Multilanguage Transformer for Improved Text to Remote Sensing Image Retrieval | JSTARS 2022 | King Saud University | - | |
| GaLR | Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and Local Information | TGRS 2022 | Aerospace Information Research Institute, Chinese Academy of Sciences | Github | |
| - | Cross-modal retrieval of remote sensing images and text based on self-attention unsupervised deep common feature space | IJRS 2022 | National University of Defense Technology | - | |
| AMFMN | Exploring a Fine-Grained Multiscale Method for Cross-Modal Remote Sensing Image Retrieval | TGRS 2021 | Aerospace Information Research Institute, Chinese Academy of Sciences | Github | |
| LW-MCR | A Lightweight Multi-Scale Crossmodal Text-Image Retrieval Method in Remote Sensing | TGRS 2021 | Aerospace Information Research Institute, Chinese Academy of Sciences | Github | |
| VSE++ | VSE++: Improving Visual-Semantic Embeddings with Hard Negatives | BMVC 2018 spotlight | University of Toronto | Github | |

Remote Sensing Vision-Language Foundation Models

| Abbreviation | Title | Publication | Paper | Code & Weights |
| --- | --- | --- | --- | --- |
| RSGPT | RSGPT: A Remote Sensing Vision Language Model and Benchmark | Arxiv 2023 | RSGPT | link |
| RemoteCLIP | RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | Arxiv 2023 | RemoteCLIP | link |
| GeoRSCLIP | RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model | Arxiv 2023 | GeoRSCLIP | link |
| GRAFT | Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment | ICLR 2024 | GRAFT | - |
| CSP | CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations | ICML 2023 | CSP | link |
| GeoCLIP | GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization | NeurIPS 2023 | GeoCLIP | link |
| SatCLIP | SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery | Arxiv 2023 | SatCLIP | link |

 

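Questions, Feedback, and Contributions to This Repository

I welcome feedback of all kinds, preferably shared through GitHub Issues. Likewise, if you have any questions or just want to exchange ideas with others, feel free to post them there.

Acknowledgements

Thanks to the authors of the related papers and projects.

Citation

If you find this project useful for your research, please consider citing it.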
