Spellchecking warnings for certain tag names (to catch typos by users) #7754

peternewman · 2020-06-28T23:17:07Z

So I, and previously 90 other people, misspelt pavilion with two L's while tagging in OSM (in my case because it didn't show in the drop-down for the building key; yes, I probably should have used a preset or noticed that it didn't flag one).

So I've created https://github.com/openstreetmap/iD/pull/7749/files which I think will fix the outstanding ones.

I've also done a codespell (https://github.com/codespell-project/codespell/) run across the repo and fixed the obvious issues there in #7752.

However, it seems to me there are options to improve the user tagging experience (and stop typo tags gaining traction) by effectively spellchecking a subset of the tags, probably simply based on an approved list of possible typos.

For example take the codespell dictionaries:
https://github.com/codespell-project/codespell/blob/master/codespell_lib/data/

Then do a reverse lookup against the presets for any possible typos of their keys or values, and use just those (which avoids a new tag that happens to look like a typo being incorrectly flagged). Obviously don't check Name/Brand/Operator etc. (or any of the address bits). Perhaps, as a safer option, only check tags that exist in the presets. So, for example, because there is a preset for building=pavilion, you'd search codespell for typos of building and find:

buiding->building
buidling->building
bulding->building
buliding->building

Then search for pavilion and find:
pavillion->pavilion

Therefore, if I type any of the words on the left, iD flags an issue suggesting that I probably meant the word on the right-hand side.
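The reverse lookup described above might be sketched roughly as follows. This is only an illustration, not iD's or codespell's actual code: it assumes the `typo->correction` line format shown in the examples above, and the function names, the `PRESET_WORDS` set, the `FREEFORM_KEYS` list, and the extra `teh->the` entry are all hypothetical stand-ins.

```python
from typing import Dict, List, Set, Tuple


def parse_typo_lines(lines: List[str]) -> Dict[str, List[str]]:
    """Parse 'typo->correction' lines into {typo: [corrections]}."""
    table: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line or "->" not in line:
            continue
        typo, _, fixes = line.partition("->")
        # Tolerate a comma-separated list of corrections (an assumption).
        corrections = [f.strip() for f in fixes.split(",") if f.strip()]
        if corrections:
            table[typo.strip().lower()] = corrections
    return table


def filter_to_known_words(
    table: Dict[str, List[str]], known_words: Set[str]
) -> Dict[str, List[str]]:
    """Keep only typos whose correction is a word used by some preset."""
    filtered: Dict[str, List[str]] = {}
    for typo, corrections in table.items():
        hits = [c for c in corrections if c in known_words]
        # Never flag a word that is itself a valid preset key or value.
        if hits and typo not in known_words:
            filtered[typo] = hits
    return filtered


# Free-text keys that should not have their values checked (assumed list).
FREEFORM_KEYS = {"name", "brand", "operator"}


def check_tag(
    key: str, value: str, typo_table: Dict[str, List[str]]
) -> List[Tuple[str, str]]:
    """Return (suspect word, suggested word) pairs for one tag."""
    issues: List[Tuple[str, str]] = []
    if key in typo_table:
        issues.append((key, typo_table[key][0]))
    is_freeform = key in FREEFORM_KEYS or key.startswith("addr:")
    if not is_freeform and value in typo_table:
        issues.append((value, typo_table[value][0]))
    return issues


# Illustrative input: the entries quoted above plus one unrelated entry.
SAMPLE_DICTIONARY = [
    "buiding->building",
    "buidling->building",
    "bulding->building",
    "buliding->building",
    "pavillion->pavilion",
    "teh->the",
]
# Hypothetical set of words harvested from preset keys and values.
PRESET_WORDS = {"building", "pavilion"}

typo_table = filter_to_known_words(
    parse_typo_lines(SAMPLE_DICTIONARY), PRESET_WORDS
)
print(check_tag("building", "pavillion", typo_table))
```

Because the table is filtered down to corrections that actually occur in the presets, the unrelated `teh->the` entry is dropped, which is also what would keep the shipped file small.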

quincylvania · 2020-10-26T14:23:51Z

@peternewman This sounds cool in theory. I'm not sure if it'd be useful in practice, though it might be! I'm a bit worried about the size of shipping a spelling dictionary in iD or the complexity of calling out to an API.

Generally I imagine that mappers use either iD's presets or TagInfo suggestions for selecting tags, so in either case the spelling is handled for them. Do you have any sense of how widespread spelling problems are? A few hundred misspelled tags per year would be regrettable, but probably not a major issue.

#4579 is about flagging tags that don't appear in the OSM database yet… perhaps that would be sufficient?

peternewman · 2020-10-26T15:27:46Z

> @peternewman This sounds cool in theory. I'm not sure if it'd be useful in practice, though it might be! I'm a bit worried about the size of shipping a spelling dictionary in iD or the complexity of calling out to an API.

As mentioned, if you only pick "relevant" words and filter the dictionary down to them, the resulting file shouldn't end up too large. I'd agree that calling out to an API is probably more hassle than it's worth.

> Generally I imagine that mappers use either iD's presets or TagInfo suggestions for selecting tags, so in either case the spelling is handled for them.

I'd agree iD presets should be fine (or can be fixed, with typos handled at source). You have to use TagInfo for stuff like my ramp=separate, as there aren't presets for it, and I don't think presets would really make sense. The catch is that TagInfo is itself the source of the problem:
https://taginfo.openstreetmap.org/keys/?key=ramp#values

I'm not sure it's possible to pick a threshold at which typos like sepErate are ignored without also filtering out some valid but infrequent suggestions. E.g. the value you've picked in #7203 won't work for this case. There's a chance it might have if it had been in place from the beginning, but that doesn't cover all the other editors.

> Do you have any sense of how widespread spelling problems are? A few hundred misspelled tags per year would be regrettable, but probably not a major issue.

I'm not sure off-hand. If there's an easy way to dump the key-value pairs from OSM it would be pretty easy to generate some general stats on it.
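As a rough sketch of how such stats might be tallied, assuming a dump is available as `key,value,count` rows: the CSV layout, the `TYPO_TABLE` contents, and the function name are hypothetical, and apart from the 90 uses of pavillion mentioned at the top of this issue the counts are made-up placeholders, not real OSM figures.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

# Hypothetical typo table: {misspelling: intended word}.
TYPO_TABLE: Dict[str, str] = {
    "pavillion": "pavilion",
    "seperate": "separate",
}


def count_typo_values(
    rows: Iterable[Sequence[str]], typo_table: Dict[str, str]
) -> Counter:
    """Sum usage counts for every tag whose value is a known typo."""
    totals: Counter = Counter()
    for key, value, count in rows:
        if value in typo_table:
            totals[(key, value, typo_table[value])] += int(count)
    return totals


# Placeholder data standing in for a real key-value dump.
# Only the 90 for building=pavillion comes from this issue;
# every other number is invented for illustration.
SAMPLE_DUMP = """key,value,count
building,pavillion,90
building,pavilion,1000
ramp,seperate,5
ramp,separate,50
"""

reader = csv.reader(io.StringIO(SAMPLE_DUMP))
next(reader)  # skip the header row
totals = count_typo_values(reader, TYPO_TABLE)

# List the misspelled tags, most used first.
for (key, bad, good), n in totals.most_common():
    print(f"{key}={bad} -> {good}: {n}")
```

Run against a real dump, the same tally would give the per-tag and overall numbers asked about above.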

> #4579 is about flagging tags that don't appear in the OSM database yet… perhaps that would be sufficient?

That looks like it's primarily about keys, not values?

1ec5 · 2020-10-29T02:04:58Z

> > #4579 is about flagging tags that don't appear in the OSM database yet… perhaps that would be sufficient?
>
> That looks like it's primarily about keys, not values?

I proposed starting out with keys, but in principle the same mechanism could be extended to values if we can reliably distinguish enumerated keys from freeform keys: #4579 (comment).

quincylvania mentioned this issue Oct 26, 2020

iD forces users to add misspelled tag 'clothers' #7924

Closed

quincylvania added the "considering" label (Not Actionable - still considering if this is something we want) Oct 26, 2020

tordans mentioned this issue Sep 28, 2021

Filter misspelled twins #8725

Closed
