Deduplicate Geonames 'City of' prefixes #1609

Closed

Commits on Mar 2, 2022

  1. feat(dedupe): Handle Geonames 'City of' prefixes

    A common cause of deduplication errors is Geonames locality/localadmin
    records that start with 'City of'.
    
    Our name comparison logic is fairly conservative: it only normalizes
    things like punctuation and diacritics. Beyond that, we have to
    assume that differing names mean the underlying records represent
    genuinely different places.
    
    Getting too far away from this general stance could be dangerous, but we
    can handle specific outliers just fine.
    
    Geonames records that start with 'City of' are one of these cases.
    Often, there is a Geonames `locality` record with just the name (like
    'New York'), and then a Geonames `localadmin` record with the 'City of'
    prefix. Usually only one of those records will have a WOF concordance,
    so this is still helpful even combined with
    #1606
    orangejulius committed Mar 2, 2022
    3f72bb7
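
The idea in the commit message can be sketched roughly as follows. This is a minimal illustration in Python, not the code from this commit: the record fields (`name`, `source`, `layer`) and the helpers `comparable_name` and `names_match` are hypothetical, and the real comparison also handles punctuation and diacritics, which are omitted here.

```python
import re

# Assumption: records are plain dicts with "name", "source" and "layer" keys.
# These field names are illustrative, not taken from the actual codebase.
CITY_OF_PREFIX = re.compile(r"^city of\s+", re.IGNORECASE)
PREFIX_LAYERS = {"locality", "localadmin"}


def comparable_name(record: dict) -> str:
    """Return the name used for comparison, dropping a leading 'City of'
    only for Geonames locality/localadmin records."""
    name = record["name"].strip()
    if record.get("source") == "geonames" and record.get("layer") in PREFIX_LAYERS:
        name = CITY_OF_PREFIX.sub("", name, count=1)
    return name.casefold()


def names_match(a: dict, b: dict) -> bool:
    """Conservative check: names match only if they are equal after the
    narrow 'City of' normalization above."""
    return comparable_name(a) == comparable_name(b)


locality = {"name": "New York", "source": "geonames", "layer": "locality"}
localadmin = {"name": "City of New York", "source": "geonames", "layer": "localadmin"}
print(names_match(locality, localadmin))
```

Restricting the prefix rule to one source and two layers keeps the general comparison conservative: a record from any other source that starts with 'City of' is still compared by its full name.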