Improve data for AtmOperators (and probably other equivalent ones) #2194

Closed
peternewman opened this issue Oct 24, 2020 · 10 comments

@peternewman
Collaborator

peternewman commented Oct 24, 2020

Use case
There is currently a lot of duplication within AtmOperators, some of which rudimentary de-duplication would fix:

  - Note Machine # 67
  - NoteMachine # 26
  - Notemachine # 7

Also:

  - Tesco # 158
  - tesco # 6

See also Coop in here (but more complicated):
https://github.com/westnordost/StreetComplete/blob/0a1a722fece9b8706915806ebfebd5cd901e8b5d/res/country_metadata/atmOperators.yml#L1753-L1844

I've not looked or answered those quests yet, but I suspect charging stations and clothes containers would benefit from the same stuff too.

Proposed Solution
Simple short-term fix, take each returned name, lower case it and remove any punctuation ([\s,.:;-_]), then group by that de-duplicated key. Return the most popular entry within each de-duplicated key, so in my two examples above, Note Machine and Tesco. Optionally bump the count (if used) to be the merged total.
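
The short-term fix described above could be sketched roughly as follows. This is only an illustration in Python, not StreetComplete's actual code: the function names `dedup_key` and `merge_variants` are hypothetical, and the counts are the ones quoted in this issue. One caveat: in most regex engines the `;-_` inside the character class as written would be read as a range, so the sketch moves the hyphen to the end, where it is a literal.

```python
import re
from collections import defaultdict

# Whitespace and punctuation to strip. The hyphen is placed last so it is
# a literal character rather than the start of a range (assumption: this
# is what the proposal intended).
_PUNCTUATION = re.compile(r"[\s,.:;_-]")


def dedup_key(name: str) -> str:
    """Lower-case the name and strip whitespace/punctuation."""
    return _PUNCTUATION.sub("", name.lower())


def merge_variants(counts: dict[str, int]) -> dict[str, int]:
    """Group spellings by de-duplicated key and keep the most popular
    spelling of each group, with the group's merged total as its count."""
    groups: dict[str, dict[str, int]] = defaultdict(dict)
    for name, count in counts.items():
        groups[dedup_key(name)][name] = count

    merged: dict[str, int] = {}
    for variants in groups.values():
        winner = max(variants, key=variants.get)
        merged[winner] = sum(variants.values())
    return merged


counts = {
    "Note Machine": 67,
    "NoteMachine": 26,
    "Notemachine": 7,
    "Tesco": 158,
    "tesco": 6,
}
print(merge_variants(counts))
# {'Note Machine': 100, 'Tesco': 164}
```

Note that this only merges spellings that differ in case, spacing or punctuation; genuinely different names for the same operator end up under different keys and stay separate.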

Longer term, switch to https://github.com/osmlab/name-suggestion-index/ when available, e.g. osmlab/name-suggestion-index#2883 for ATMs.

There are already charging stations:
https://nsi.guide/index.html?t=brands&k=amenity&v=charging_station

I'm sure they'd be happy to add recycling container operators too.

@westnordost
Member

westnordost commented Oct 24, 2020

StreetComplete just counts the data as it is currently used and sorts it by usage. If you want certain spelling variants to become established, then you have to do this yourself. The long-term solution might be NSI, yes.

> Simple short-term fix, take each returned name, lower case it and remove any punctuation ([\s,.:;-_],

Eh, no, StreetComplete should not just modify the operator names at its own whim. Who is to say whether it should be "Post", "Deutsche Post" or "Deutsche Post AG"? Not StreetComplete.
Edit: Your concrete suggestion to take the most popular key has the issue that if two are relatively popular (there is no clear winner), it would just take the one that is a little more popular and drop the slightly less popular one.

@westnordost
Member

So, StreetComplete only expedites harmonization of these tags insofar as often-used values are suggested, but nothing more.

@matkoniecz
Member

matkoniecz commented Oct 24, 2020

Ones that are obvious and appear just a few times are easy to fix. For tesco, see https://overpass-turbo.eu/s/Zmo on a computer (overpass turbo has a poor mobile interface).

Select run button to run query, select export to export objects to JOSM/level0. Or even open them one by one in iD.

Note https://wiki.openstreetmap.org/wiki/Automated_Edits_code_of_conduct - manually fixing the 6 tesco entries should be OK, but for NoteMachine I would consult the local community.

@peternewman
Collaborator Author

But given Tesco and tesco surely you can just suggest the most popular one @westnordost even if we ignore any other processing like white space that I initially suggested?

Or apply some ratio, such as when they are gathered discard any that have less than say 10% of the total grouping (at least then you'd only offer two of the Note Machine variants and one of the Tesco ones).
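
As a rough illustration of that ratio idea, here is a minimal Python sketch. It assumes the spellings have already been gathered into one group per operator; the function name `filter_minor_variants` and the `min_share` parameter are hypothetical, and the counts are the ones from this issue.

```python
def filter_minor_variants(
    variants: dict[str, int], min_share: float = 0.10
) -> dict[str, int]:
    """Keep only the spellings that account for at least `min_share`
    of the group's total usage count."""
    total = sum(variants.values())
    return {
        name: count
        for name, count in variants.items()
        if count >= min_share * total
    }


# Counts taken from the issue description, already grouped per operator.
note_machine = {"Note Machine": 67, "NoteMachine": 26, "Notemachine": 7}
tesco = {"Tesco": 158, "tesco": 6}

print(filter_minor_variants(note_machine))
# {'Note Machine': 67, 'NoteMachine': 26}
print(filter_minor_variants(tesco))
# {'Tesco': 158}
```

With a 10% threshold, Notemachine (7 of 100) and tesco (6 of 164) fall below the cut, which leaves two Note Machine spellings and one Tesco spelling.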

I accept your post example is hard and can probably only be solved by NSI.

Given charging stations already exist in NSI, can you switch to their feed of those now to avoid people continuing to populate the map with data that is less accurate?

Charity shop looks like a pretty good equivalent for clothes recycling too (as I suspect that's where they go):
https://nsi.guide/index.html?t=brands&k=shop&v=charity&tt=charity

@matkoniecz
Member

> But given Tesco and tesco surely you can just suggest the most popular one @westnordost even if we ignore any other processing like white space that I initially suggested?

Fixing this manually is simpler, doable by more people and actually fixes the problem rather than just hiding it in StreetComplete. It can be done by any mapper with access to a PC.

Larger-scale edits benefit from scripting, but there is a decent chance that posting about it on your local mailing list will reveal a person happy to run such an edit (assuming that there is consensus on which form is preferable).

@peternewman
Collaborator Author

> Note https://wiki.openstreetmap.org/wiki/Automated_Edits_code_of_conduct - manually fixing the 6 tesco entries should be OK, but for NoteMachine I would consult the local community.

Agreed, also some research needs to be done on which Note Machine is correct, whereas I know Tesco, Barclays and Sainsbury's don't have a lower case first letter.

@matkoniecz
Member

See also https://wiki.openstreetmap.org/wiki/Mechanical_Edits/Mateusz_Konieczny_-_bot_account/fix_overly_verbose_Euronet_Sp._z_o.o. (fixes one problem in area local to me, may be useful for copy-pasting template of documentation page and includes source code of a bot script)

@peternewman
Collaborator Author

> Fixing this manually is simpler, doable by more people and actually fixes the problem rather than just hiding it in StreetComplete. It can be done by any mapper with access to a PC.

I'd disagree, unless I co-ordinate with SC and all past versions are blocked, the problem will just keep popping up regularly as people using the old versions will keep tagging tesco. You need to at least get the current data clear and the source to stop new stuff popping up. At least with this being in beta there is a chance to cleanse the data before it spreads too wide.

Although NSI probably needs sorting to fix any new ones which do appear.

Also from a usability point of view it's a bit bewildering when I type in Note and it offers three practically identical options.

@matkoniecz
Member

matkoniecz commented Oct 24, 2020

> the problem will just keep popping up regularly as people using the old versions will keep tagging tesco

That would require combination of

  • people not updating SC
  • encountering new locations where Tesco is an answer
  • selecting incorrect version while tagging

If that actually happens, some filtering (maybe NSI-based) may be worth doing. But for now just cleaning OSM data should be good enough (and should be done anyway).

@westnordost
Member

> But given Tesco and tesco surely you can just suggest the most popular one @westnordost even if we ignore any other processing like white space that I initially suggested?

For Tesco maybe, but this is exactly what I am not going to concern myself with - manually searching through this list of popular operators and deciding for each whether one spelling is acceptable or even synonymous to one thing or another. The list is created automatically from current data, I am not going to meddle manually with it.

Switching to NSI becomes an option once most well-used operators are contained in the NSI.
