feat(dedupe): Handle Geonames 'City of' prefixes
A common cause of deduplication errors is Geonames locality/localadmin records whose names start with 'City of'.

Our name comparison logic is fairly conservative: it only accounts for differences such as punctuation and diacritics. Beyond that, we have to treat names that differ as meaning the underlying records represent genuinely different places. Straying too far from this general stance could be dangerous, but we can handle specific outliers just fine.

Geonames records that start with 'City of' are one of these cases. Often there is a Geonames `locality` record with just the name (like 'New York') and a Geonames `localadmin` record with the 'City of' prefix. Usually only one of those records will have a WOF concordance, so this is still helpful even combined with #1606.
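The idea described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code from this commit: the record shape (`name`, `source`, `layer` keys) and the function names `normalize`, `comparable_name`, and `names_match` are all assumptions made for the example. It keeps the comparison conservative (case, punctuation, diacritics) and strips a leading 'City of' only for Geonames `locality`/`localadmin` records.

```python
import re
import unicodedata

# Assumption: only these Geonames layers get the special-case handling.
PREFIX_LAYERS = {"locality", "localadmin"}

# Matches a leading "City of " in any letter case.
CITY_OF = re.compile(r"^city of\s+", re.IGNORECASE)


def normalize(name: str) -> str:
    """Conservative normalization: drop diacritics and punctuation, fold case."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = re.sub(r"[^\w\s]", "", no_marks)
    return " ".join(no_punct.lower().split())


def comparable_name(record: dict) -> str:
    """Return the name used for comparison (hypothetical record shape)."""
    name = record["name"].strip()
    is_geonames = record.get("source") == "geonames"
    if is_geonames and record.get("layer") in PREFIX_LAYERS:
        # Special case: treat "City of X" as "X" for these records only.
        name = CITY_OF.sub("", name)
    return normalize(name)


def names_match(a: dict, b: dict) -> bool:
    """Two records are name-duplicates only if their comparable names are equal."""
    return comparable_name(a) == comparable_name(b)


locality = {"name": "New York", "source": "geonames", "layer": "locality"}
localadmin = {"name": "City of New York", "source": "geonames", "layer": "localadmin"}
print(names_match(locality, localadmin))
```

Restricting the prefix rule to one source and two layers keeps the general stance intact: a 'City of' name from any other source is still compared as written.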
1 parent 6aa997d, commit 9eb9c98
Showing 2 changed files with 44 additions and 2 deletions.