Spellchecking warnings for certain tag names (to catch typos by users) #7754
Comments
@peternewman This sounds cool in theory. I'm not sure if it'd be useful in practice, though it might be! I'm a bit worried about the size of shipping a spelling dictionary in iD or the complexity of calling out to an API. Generally I imagine that mappers use either iD's presets or TagInfo suggestions for selecting tags, so in either case the spelling is handled for them. Do you have any sense of how widespread spelling problems are? A few hundred misspelled tags per year would be regrettable, but probably not a major issue. #4579 is about flagging tags that don't appear in the OSM database yet… perhaps that would be sufficient?
As mentioned, if you only pick "relevant" words and filter the dictionary the resulting file shouldn't end up too large. I'd agree calling out to an API is probably more hassle than it's worth.
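To make the size argument concrete, here is a minimal sketch of that filtering step. It assumes the dictionary uses codespell-style `typo->correction` lines (with comma-separated alternatives), and `RELEVANT_WORDS` is a hypothetical stand-in for whatever word list is judged relevant, for example the keys and values found in the presets. Apart from `pavillion->pavilion`, the sample entries are made up for illustration.

```python
def parse_typo_lines(lines):
    """Parse 'typo->correction[, correction2,]' lines into {typo: [corrections]}."""
    typos = {}
    for line in lines:
        line = line.strip()
        if not line or "->" not in line:
            continue  # skip blanks and anything not in the assumed format
        typo, _, fixes = line.partition("->")
        corrections = [f.strip() for f in fixes.split(",") if f.strip()]
        if corrections:
            typos[typo.strip().lower()] = corrections
    return typos


def filter_to_vocabulary(typos, vocabulary):
    """Keep only typos whose correction is a relevant word.

    A typo that is itself a relevant word is dropped, so a valid tag
    is never flagged.
    """
    vocab = {w.lower() for w in vocabulary}
    filtered = {}
    for typo, corrections in typos.items():
        if typo in vocab:
            continue
        relevant = [c for c in corrections if c.lower() in vocab]
        if relevant:
            filtered[typo] = relevant
    return filtered


# Illustrative sample data only; not taken from any real dictionary dump.
SAMPLE_LINES = [
    "pavillion->pavilion",
    "seperate->separate",
    "recieve->receive",
]
RELEVANT_WORDS = {"building", "pavilion", "ramp", "separate"}  # hypothetical

relevant_typos = filter_to_vocabulary(parse_typo_lines(SAMPLE_LINES), RELEVANT_WORDS)
print(relevant_typos)
```

Only the entries whose correction is a relevant word survive, so the shipped file scales with the tag vocabulary rather than with the full dictionary.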
I'd agree iD presets should be fine (or can be fixed and typos handled at source). You have to use TagInfo for stuff like my ramp=separate as there aren't presets for it, and I don't think presets would really make sense. The problem is that TagInfo is the source of the problem: I'm not sure it's possible to pick a level where typos like sepErate are ignored without filtering out some possible but infrequent suggestions too. E.g. the value you've picked in #7203 won't work for this case. There's a chance it might have worked if it had been in place from the beginning, but that doesn't cover all the other editors.
I'm not sure off-hand. If there's an easy way to dump the key-value pairs from OSM it would be pretty easy to generate some general stats on it.
That looks like it's primarily keys, not values?
I proposed starting out with keys, but in principle the same mechanism could be extended to values if we can reliably distinguish enumerated keys from freeform keys: #4579 (comment).
So I, and previously 90 other people, misspelt pavilion with two L's while tagging in OSM (on my part because it didn't show in the drop-down of the building key; yes, I probably should have used a preset or noticed it didn't flag one).
So I've created https://github.com/openstreetmap/iD/pull/7749/files which I think will fix the outstanding ones.
I've also done a codespell (https://github.com/codespell-project/codespell/) run across the repo and fixed the obvious issues there in #7752.
However, it seems to me there are options to improve the user tagging experience (and stop typo tags gaining traction) by effectively spellchecking a subset of the tags, probably simply based upon an approved list of possible typos.
For example take the codespell dictionaries:
https://github.com/codespell-project/codespell/blob/master/codespell_lib/data/
And do a reverse lookup against the presets for any possible typos in keys or values and just use them (which avoids a new tag which can appear as a typo being incorrectly flagged). Obviously don't check Name/Brand/Operator etc (or any of the address bits). Perhaps as a safer option just check tags which exist in the presets or something. So for example because there is a preset of building=pavilion, you'd search codespell for typos for building and find:
And pavilion and find:
pavillion->pavilion
Therefore if I type any of the words on the left, it flags an issue that I probably mean the RHS.
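As a rough sketch of that check (not how iD validates tags today), the following assumes a typo map has already been built by the reverse lookup described above. `TYPO_MAP`, `SKIPPED_KEYS`, `SKIPPED_PREFIXES` and `check_tag` are hypothetical names; the only entry in the map is the `pavillion->pavilion` example from this issue.

```python
# Hypothetical typo map: left-hand side (typo) -> right-hand side (intended word).
TYPO_MAP = {"pavillion": "pavilion"}

# Freeform keys that should never be spellchecked (assumed list).
SKIPPED_KEYS = {"name", "brand", "operator"}
SKIPPED_PREFIXES = ("addr:",)


def check_tag(key, value, typo_map=TYPO_MAP):
    """Return a list of (field, found, suggestion) warnings for one tag."""
    if key in SKIPPED_KEYS or key.startswith(SKIPPED_PREFIXES):
        return []  # names, brands, operators and addresses are left alone
    warnings = []
    for field, text in (("key", key), ("value", value)):
        suggestion = typo_map.get(text.lower())
        if suggestion:
            warnings.append((field, text, suggestion))
    return warnings


for warning in check_tag("building", "pavillion"):
    field, found, suggestion = warning
    print(f"Possible typo in {field}: '{found}', did you mean '{suggestion}'?")
```

Because the lookup only matches words that are in the approved typo list, a brand-new tag that merely looks unusual is not flagged.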