
Cross-gazetteer record linking of natural features in Switzerland using machine learning (random forests) and handcrafted rules.


eacheson/machine-learning-gazetteer-matching


machine-learning-gazetteer-matching

This repo features code, annotated data, and results for the IJGIS paper Machine learning for cross-gazetteer matching of natural features.

Notebooks

Jupyter notebooks are at the top level of this repo. They are numbered in the order in which they should be run and organized into 3 numbered subsets:

  • 0_ : (00, 01, 02): preparation, preprocessing
  • 1_ : (10, 11, 12, 13, 14): rule-based matching
  • 2_ : (20, 21): machine learning based matching using random forests

Note that these notebooks rely heavily on code in the gazmatch folder.
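As a rough illustration of the numbering scheme described above, the following sketch lists notebooks in run order and labels each with its subset. It assumes that every notebook file name starts with its two-digit number (for example a hypothetical `00_something.ipynb`); the helper names are illustrative and not part of the gazmatch code.

```python
import re
from pathlib import Path

# Assumption: notebook file names begin with a two-digit number.
PREFIX = re.compile(r"^(\d{2})")

# Subset labels copied from the list above, keyed by the first digit.
STAGES = {
    "0": "preparation, preprocessing",
    "1": "rule-based matching",
    "2": "machine learning based matching using random forests",
}


def notebooks_in_run_order(names):
    """Return (number, stage, name) tuples sorted by numeric prefix."""
    ordered = []
    for name in names:
        match = PREFIX.match(name)
        if match is None or not name.endswith(".ipynb"):
            continue  # skip files that do not follow the numbering scheme
        number = match.group(1)
        stage = STAGES.get(number[0], "unknown subset")
        ordered.append((int(number), stage, name))
    return sorted(ordered)


def list_repo_notebooks(repo_root="."):
    """Collect top-level notebooks from a local clone of the repo."""
    names = [path.name for path in Path(repo_root).glob("*.ipynb")]
    return notebooks_in_run_order(names)
```

Calling `list_repo_notebooks()` from the root of a local clone should give the notebooks in the order they are meant to be run, provided the file names follow the assumed pattern.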

Data

In /data/, we share our annotated data, annotated_sample.csv, as well as some serialized files, including test_set_ids.pkl for the feature-type-balanced test set used in a subset of experiments. The latest GeoNames and SwissNames3D data can be obtained online from their respective providers.

Note these datasets will not be identical to the ones used in this paper, which were downloaded in 2017. In particular, SwissNames3D may change UUIDs for certain records in newer versions. Data preparation involving the raw datasets is described and performed in the preparation notebooks. Contact the first author of the associated paper with any data requests.
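A minimal sketch of how the shared files might be combined, using only the Python standard library. It rests on several assumptions that are not confirmed by this README: that test_set_ids.pkl holds a plain iterable of record IDs (if it holds a pandas object, pandas must be installed to unpickle it), that annotated_sample.csv has a column containing those IDs, and that the ID column name is passed in by the caller (`id_column` is a hypothetical parameter). The preparation notebooks remain the reference for how the data are actually loaded.

```python
import csv
import pickle


def load_annotated_rows(csv_path):
    """Read the annotated sample into a list of dicts, one per row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def load_test_ids(pkl_path):
    """Load the serialized test-set IDs as a set for fast membership checks."""
    # Only unpickle files from a source you trust.
    with open(pkl_path, "rb") as handle:
        return set(pickle.load(handle))


def split_rows(rows, test_ids, id_column):
    """Split rows into (train, test) by whether the row ID is in test_ids.

    Note: csv.DictReader yields every value as a string, so the IDs in
    test_ids must also be strings for the membership check to work.
    """
    train, test = [], []
    for row in rows:
        (test if row[id_column] in test_ids else train).append(row)
    return train, test
```

With a local clone, the calls would look like `load_annotated_rows("data/annotated_sample.csv")` and `load_test_ids("data/test_set_ids.pkl")`, followed by `split_rows(...)` with whichever column actually carries the record IDs.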

Results

The /results/ folder contains TSV files used to plot the results in the paper. The /html_exports/ folder contains HTML exports of all the notebooks for easy viewing in a browser.
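To inspect a results file outside the notebooks, something like the following sketch could be used. It assumes each TSV file has a header row; the column names are not documented here, so the function simply returns whatever headers it finds. `read_tsv_columns` and `to_number` are illustrative names, not functions from this repo.

```python
import csv


def to_number(text):
    """Convert a cell to float when possible, else return it unchanged."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return text


def read_tsv_columns(tsv_path):
    """Read a tab-separated file into a dict mapping header -> list of values."""
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        columns = {name: [] for name in (reader.fieldnames or [])}
        for row in reader:
            for name in columns:
                columns[name].append(to_number(row[name]))
    return columns
```

The returned column lists can be passed directly to a plotting library of your choice.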
