This repository provides training and inference scripts for LIGHT, an approach for linking text instances on historical maps.
LIGHT is a multi-modal learning approach that jointly models text content, spatial layouts, polygonal shapes, and visual features to resolve complex linking challenges in scanned historical maps.
📢 The paper has been accepted for oral presentation at ICDAR 2025.
📄 View on arXiv
CUDA_VISIBLE_DEVICES="0" torchrun \
--nproc_per_node=1 \
--nnodes=1 \
--node_rank=0 \
--master_addr=127.0.0.1 \
--master_port=14476 \
pretrain_poly.py --config configs/pretrain_poly.yaml
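The command above runs the polygon pretraining stage on a single GPU. The same torchrun launch generalizes to several GPUs on one node; a minimal sketch, assuming pretrain_poly.py reads the standard torchrun environment variables for distributed training (adjust the GPU list and --nproc_per_node to your hardware):

```bash
# Hypothetical 4-GPU, single-node launch of the polygon pretraining stage.
CUDA_VISIBLE_DEVICES="0,1,2,3" torchrun \
--nproc_per_node=4 \
--nnodes=1 \
--node_rank=0 \
--master_addr=127.0.0.1 \
--master_port=14476 \
pretrain_poly.py --config configs/pretrain_poly.yaml
```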
CUDA_VISIBLE_DEVICES="0" torchrun \
--nproc_per_node=1 \
--nnodes=1 \
--node_rank=0 \
--master_addr=127.0.0.1 \
--master_port=14476 \
pretrain.py --config configs/pretrain_light.yaml
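The command above runs the LIGHT pretraining stage, again on a single GPU. Before launching either pretraining stage, it can be worth confirming that PyTorch actually sees the GPUs exposed through CUDA_VISIBLE_DEVICES; a quick check that is independent of this repository:

```bash
# Optional sanity check: list the GPUs visible to PyTorch before launching torchrun.
python -c "import torch; print('CUDA available:', torch.cuda.is_available()); print('GPU count:', torch.cuda.device_count())"
```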
python train.py --config configs/light.yaml
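The command above fine-tunes LIGHT (we use the ICDAR24 MapText Rumsey data for this stage; see the notes below). If you want to change hyperparameters without touching the shipped config, one option is to train from an edited copy; configs/light_custom.yaml is a hypothetical name for your own copy:

```bash
# Keep the original config intact and fine-tune from an edited copy.
cp configs/light.yaml configs/light_custom.yaml
# ... edit hyperparameters or dataset settings in configs/light_custom.yaml ...
python train.py --config configs/light_custom.yaml
```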
python inference.py \
--test_dataset MapText_test \
--out_file predict.json \
--model_dir ./_weights/finetune_light \
--anno_path icdar24-test-png-annotations.json \
--img_dir icdar24-test-png/test_images/
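Inference writes the predicted links to the file given by --out_file. A quick way to confirm the run produced well-formed output, without assuming anything about its exact schema:

```bash
# Check that predict.json parses as valid JSON, then peek at its start.
python -m json.tool predict.json > /dev/null && echo "predict.json parses as valid JSON"
head -c 300 predict.json; echo
```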
- Create a Conda environment from `env.yaml` (see the sketch after this list).
- All dataset configurations are in `dataset/buildin.py`. You need to update the paths to your datasets and annotations. Contact Yijun Lin if you want to use the pretraining datasets. We use the ICDAR24 MapText competition Rumsey dataset for fine-tuning and testing.
- All config files are in the `configs` directory. You can modify hyperparameters or dataset settings.
- You can download model weights from Google Drive: Polygon Pretrain Weights, LIGHT Pretrain Weights, LIGHT Finetuned Weights.
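A typical way to create and activate the environment from `env.yaml`; "light" is a hypothetical environment name here, so use whatever the `name:` field in `env.yaml` declares:

```bash
# Create the Conda environment described in env.yaml and activate it.
conda env create -f env.yaml
# Replace "light" with the environment name declared in env.yaml.
conda activate light
```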
If you find this repository useful in your own work, we would appreciate a citation to the accompanying paper:
@inproceedings{lin2025light,
author = {Lin, Yijun and Olson, Rhett and Wu, Junhan and Chiang, Yao-Yi and Weinman, Jerod},
title = {LIGHT: Multi-Modal Text Linking on Historical Maps},
booktitle = {19th International Conference on Document Analysis and Recognition ({ICDAR} 2025)},
series = {Lecture Notes in Computer Science},
publisher = {Springer},
location = {Wuhan, China},
year = {2025}
}