This repository has been archived by the owner on Mar 15, 2024. It is now read-only.

Comparator to flag invalid primary tag combinations #102

Merged
merged 4 commits into from
Mar 16, 2017

Conversation

bkowshik
Contributor

Objective

Per #94 (comment)

Amazing stats @bkowshik, this directly gives the probability of any tag combination on OSM. With these 286 invalid and 330 possibly erroneous tag combinations, can we already start flagging them?

  • Happy about how we parse a csv file instead of a json, making it easy on the 👀

Next actions

  • Would love to get some 👀 @amishas157
  • Merge to master

cc: @planemad @geohacker

@geohacker
Contributor

@bkowshik this uses a csv file for the tag combination scores. How do you plan to make this maintainable? Should we use the taginfo API directly?

@bkowshik
Contributor Author

At the moment, we have a script to generate the csv file here:

Options

@geohacker, I could think of the following 3 options:

  1. csv file in the repository
    • TagInfo is not queried for each feature
    • Simple to build and easy to understand
    • Needs to be updated manually using the script ^
  2. csv file created during deploy
    • TagInfo is not queried for each feature
    • Latest data is queried from TagInfo during deploy
    • Data is only as fresh as the most recent deploy
    • Makes the deploy a little slower
  3. Query TagInfo
    • TagInfo is queried for each feature which is a lot of features!!!
    • We have the latest data and no manual maintenance
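To make option 1 concrete, here is a minimal sketch of how a comparator could load the csv once and then check each feature against it without touching TagInfo. The column names (`tag_1,tag_2,count`), the sample rows, the threshold of 100, and the helper names `loadCombinations`, `pairKey` and `findRareCombinations` are all assumptions for illustration; they are not taken from the script or the comparator in this PR.

```javascript
// Hypothetical csv layout and made-up counts; the real file may differ.
const SAMPLE_CSV = [
  'tag_1,tag_2,count',
  'highway,building,12',
  'amenity,shop,5400'
].join('\n');

// Build an order-independent key so (a, b) and (b, a) map to the same entry.
function pairKey(a, b) {
  return [a, b].sort().join('|');
}

// Parse the csv text once into a Map of pair key -> occurrence count.
function loadCombinations(csvText) {
  const combinations = new Map();
  const rows = csvText.trim().split('\n').slice(1); // skip the header row
  for (const row of rows) {
    const [tagA, tagB, count] = row.split(',');
    combinations.set(pairKey(tagA, tagB), Number(count));
  }
  return combinations;
}

// Return every pair of keys on the feature whose known count is below the threshold.
// Pairs that are missing from the table are ignored rather than flagged.
function findRareCombinations(properties, combinations, threshold) {
  const keys = Object.keys(properties);
  const rare = [];
  for (let i = 0; i < keys.length; i++) {
    for (let j = i + 1; j < keys.length; j++) {
      const count = combinations.get(pairKey(keys[i], keys[j]));
      if (count !== undefined && count < threshold) {
        rare.push([keys[i], keys[j]]);
      }
    }
  }
  return rare;
}

// Usage: load once at startup, then reuse the Map for every feature.
const table = loadCombinations(SAMPLE_CSV);
const flagged = findRareCombinations(
  { highway: 'residential', building: 'yes' },
  table,
  100 // assumed cutoff, not a value from this PR
);
console.log(flagged);
```

The lookup cost per feature stays in memory, which is the main thing option 1 and option 2 share; they would differ only in where the csv text comes from.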

Unless you see a problem here, I am happy with the current setup (option 1). We will get a better sense of what is needed once we deploy and 👀 the results for a few days.

@planemad
Contributor

👍 the current CSV is good until we see some results from this. We can think about scaling with taginfo if this works as expected.

@amishas157
Contributor

amishas157 commented Mar 14, 2017

@bkowshik This looks good to go. As for keeping the taginfo data updated, I think running the script manually is fine: in percentage terms, taginfo data should not change much from day to day. But as we scale our comparators based on taginfo, we can set up cron jobs to run these scripts monthly or so.

@bkowshik bkowshik merged commit 63e93aa into master Mar 16, 2017
@bkowshik bkowshik deleted the invalid-tag-combinations branch March 16, 2017 04:20
@bkowshik
Contributor Author

Published to npm as version: 4.14.0

@geohacker
Contributor

🎉

5 participants