
CaLLiPer - Contrastive Language-Location Pre-training

Welcome! This is the repository for the CaLLiPer model presented in our paper Multimodal Contrastive Learning of Urban Space Representations from POI Data (preprint: https://arxiv.org/abs/2411.06229).

⭐ Highlights

  • Simple and effective representation learning for urban spaces using POI data.
  • The first multimodal contrastive learning model to align spatial and semantic information.
  • Improved conceptualisation of urban space representations through location encoding.
  • Enhanced modelling of POI semantics by pre-trained text encoders.
  • State-of-the-art performance and interpretability.
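To give a concrete sense of what "location encoding" means here, below is a minimal multi-scale sinusoidal encoder. This is a simplified sketch in the spirit of Space2Vec-style grid encoders, not the implementation used in this repo; the function name and scale parameters are illustrative.

```python
import numpy as np

def sinusoidal_location_encoding(coords, num_scales=4,
                                 min_lambda=1.0, max_lambda=1000.0):
    """Multi-scale sinusoidal encoding of 2-D coordinates:
    sin/cos features at geometrically spaced wavelengths."""
    coords = np.asarray(coords, dtype=float)            # shape (n, 2)
    scales = min_lambda * (max_lambda / min_lambda) ** (
        np.arange(num_scales) / max(num_scales - 1, 1))
    feats = []
    for lam in scales:
        angle = 2 * np.pi * coords / lam                # (n, 2)
        feats.append(np.sin(angle))
        feats.append(np.cos(angle))
    return np.concatenate(feats, axis=1)                # (n, 4 * num_scales)

# Encode one British National Grid coordinate (easting, northing).
enc = sinusoidal_location_encoding([[530123.0, 180456.0]])
```

The resulting feature vector can then be projected by a small neural network into the shared embedding space.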

Requirements

  • pytorch >= 2.2.0
  • transformers >= 4.43.4
  • pytorch-lightning == 2.3.3
  • tensorboard >= 1.14.0
  • scikit-learn ...

Data

POI data

The use of OS POI data requires an Educational Licence. For demonstration purposes, we provide a sample of the POI data used in our study in /data/london_poi_202203/sample_poi.csv. You can construct your own coordinate-text pairs as training data in a similar format.
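The pairing step can be sketched as follows. The column names (easting, northing, groupname, categoryname) and the way the description text is composed are assumptions for illustration; adjust them to match the columns actually present in sample_poi.csv.

```python
import csv
import io

# A few inline rows standing in for sample_poi.csv; the column names
# here are assumptions, not necessarily those of the real file.
sample = io.StringIO(
    "name,easting,northing,groupname,categoryname\n"
    "Corner Cafe,530123,180456,Eating and Drinking,Cafes\n"
    "City Library,531000,181200,Education,Libraries\n"
)

pairs = []
for row in csv.DictReader(sample):
    # Coordinate side: projected (easting, northing) location.
    coord = (float(row["easting"]), float(row["northing"]))
    # Text side: a short semantic description of the POI.
    text = f"{row['categoryname']}, {row['groupname']}"
    pairs.append((coord, text))
```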

Land use data

The land use data used in our experiment was derived from Verisk data, obtained through Digimap. Please find it in /data/landuse_classification/sampled_points_landuse.geojson.

Socioeconomic status data

The NS-SeC dataset used in our experiment was obtained from ONS. The original data is stored in data/socioeconomic/lon_lsoa_ns-sec.csv. The preprocessing code for this dataset can be found in the evaluation notebook, sdm.ipynb.

Training

Specify the hyperparameters in configs/default.yaml and run the following command to train the model.

python main.py --config configs/default.yaml

The output will be saved in logs/{exp_name}/, including Tensorboard events and model checkpoints.
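At its core, pre-training aligns matched location and text embeddings with a CLIP-style symmetric contrastive objective. A minimal NumPy sketch of that objective is shown below; it is an illustration of the loss, not the repo's actual PyTorch/Lightning implementation.

```python
import numpy as np

def contrastive_loss(loc_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched
    location/text embedding pairs (row i matches row i)."""
    # L2-normalise both embedding sets.
    loc = loc_emb / np.linalg.norm(loc_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = loc @ txt.T / temperature           # pairwise similarities
    labels = np.arange(len(logits))              # matching pairs on diagonal

    def xent(l):
        # Cross-entropy with the diagonal entries as targets.
        l = l - l.max(axis=1, keepdims=True)     # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the location-to-text and text-to-location directions.
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
a = rng.normal(size=(8, 16))
loss_aligned = contrastive_loss(a, a)                        # perfectly matched
loss_random = contrastive_loss(a, rng.normal(size=(8, 16)))  # unmatched
```

Matched pairs should yield a much lower loss than random pairings, which is what drives the two encoders into a shared embedding space.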

Testing

After finishing the pre-training, one can evaluate the resulting CaLLiPer model checkpoint on two downstream tasks - LUC (luc.ipynb) and SDM (sdm.ipynb). For ease of replicating the results presented in our paper, we provide the pre-trained checkpoint of CaLLiPer-SenTrans - shared through Google Drive. Please download it and put it in checkpoints/.

See the two Jupyter notebooks, luc.ipynb and sdm.ipynb, for the complete downstream model training process. The resulting downstream models will be saved in downstream_res/.... If you prefer not to train the downstream models yourself, we have also provided them in downstream_res/...
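The general recipe in both notebooks is to fit a lightweight model on top of frozen CaLLiPer embeddings. As an illustration of that recipe (not the notebooks' actual classifier), here is a nearest-centroid probe on synthetic embeddings; the real embeddings would come from the pre-trained checkpoint.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-ins for frozen location embeddings with two
# land-use classes; real embeddings come from the CaLLiPer encoder.
n, dim = 200, 32
labels = rng.integers(0, 2, size=n)
centres = rng.normal(size=(2, dim))
emb = centres[labels] + 0.3 * rng.normal(size=(n, dim))

# Simple train/test split.
train, test = np.arange(0, 150), np.arange(150, n)

# Nearest-centroid probe: classify each point by the closest class mean.
means = np.stack(
    [emb[train][labels[train] == c].mean(axis=0) for c in (0, 1)])
dists = np.linalg.norm(emb[test][:, None, :] - means[None], axis=2)
pred = dists.argmin(axis=1)
accuracy = (pred == labels[test]).mean()
```

Any standard probe (e.g. logistic regression from scikit-learn, which is already in the requirements) can be substituted for the centroid rule.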

Note that due to the size of CaLLiPer-Llama, we have not provided its checkpoint in this repo at the moment but will consider sharing it if requested.

TODOs

We plan to add the following in the future:

  • ✅ Pre-trained results for Space2Vec and HGI. See baselines/pretrained.
  • ✅ Code for clustering visualisation (reproducing Figure 3 in the paper). See vis_clustering.ipynb.
  • ⬜ Code for TF-IDF, LDA, Place2Vec. Coming soon.

Acknowledgements

The implementation of various location encoding methods is based on Space2Vec and Spherical Harmonics and Sinusoidal Representation Networks. This work has also been inspired by SatCLIP.

We appreciate their inspiring work.

Citation

If you find this repo useful for your research, please consider citing the following paper:

@article{wang2024multimodal,
  title={Multimodal Contrastive Learning of Urban Space Representations from POI Data},
  author={Wang, Xinglei and Cheng, Tao and Law, Stephen and Zeng, Zichao and Yin, Lu and Liu, Junyuan},
  journal={arXiv preprint arXiv:2411.06229},
  year={2024}
}
