Presets: Separate terms into synonyms and keywords #6139

Open
westnordost opened this issue Apr 3, 2019 · 17 comments
Labels
preset An issue with an OpenStreetMap preset or tag

Comments

@westnordost
Contributor

westnordost commented Apr 3, 2019

I recently had a closer look at the data in /data/presets/ and I must say, it's pretty awesome. Such a wealth of information and so well structured, great job guys! I hope this will be used more in other OSM projects. (As a matter of fact, I am currently working on a (Java-)library that should serve as a tags<->names dictionary which will scrape its data exclusively from here.)

So, to the point: I have a suggestion that could improve the searchability of the presets even more. Currently, there is the name and there are the terms. I suggest splitting terms into "further names" (synonyms) and keywords.

Advantages I see

  1. search matches with synonyms can get a higher priority than for keywords - possibly just below name and even before suggestions (brand names)
  2. for items where the search matches a synonym, that synonym could be displayed in the result list instead of the primary translation (which, depending on the area, might not actually be the best-known one). Currently, only the primary translation is shown (Spielbank is a synonym for Kasino in German): [screenshot: spielbank]
    The current behavior is understandable because iD can't know whether a term in terms is actually a synonym or just a keyword (and a keyword shouldn't be displayed in the result list).
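A minimal sketch of how the two advantages above could fit together, assuming a hypothetical preset shape in which terms has already been split into synonyms and keywords. The names PresetEntry, RankedMatch and rankPresets are made up for illustration and are not iD's actual search code; brand-name suggestions are left out, and prefix matching is an assumption.

```typescript
// Hypothetical preset shape: `synonyms` and `keywords` are the proposed split of `terms`.
interface PresetEntry {
  name: string;
  synonyms: string[];
  keywords: string[];
}

interface RankedMatch {
  label: string; // text to show in the result list
  rank: number;  // 0 = name, 1 = synonym, 2 = keyword (lower sorts first)
}

// Returns the first candidate that starts with the lower-cased query, if any.
function firstPrefixMatch(candidates: string[], q: string): string | undefined {
  for (const candidate of candidates) {
    if (candidate.toLowerCase().indexOf(q) === 0) return candidate;
  }
  return undefined;
}

function rankPresets(presets: PresetEntry[], query: string): RankedMatch[] {
  const q = query.trim().toLowerCase();
  const results: RankedMatch[] = [];
  if (q === "") return results;
  for (const preset of presets) {
    if (firstPrefixMatch([preset.name], q) !== undefined) {
      results.push({ label: preset.name, rank: 0 });
      continue;
    }
    const synonym = firstPrefixMatch(preset.synonyms, q);
    if (synonym !== undefined) {
      // A synonym is display-ready, so it can stand in for the primary name.
      results.push({ label: synonym, rank: 1 });
      continue;
    }
    if (firstPrefixMatch(preset.keywords, q) !== undefined) {
      // A keyword is never displayed; fall back to the primary name.
      results.push({ label: preset.name, rank: 2 });
    }
  }
  return results.sort((a, b) => a.rank - b.rank);
}
```

With a Kasino preset that lists Spielbank as a synonym, searching for "spiel" would then surface the entry labelled Spielbank rather than Kasino.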

Dealing with current translations

A migration that leaves the current translations untouched could work like this: a new translatable field named synonyms is introduced (or the name field is turned into names), and all translations in terms (perhaps renamed to keywords for clarity) just stay where they are. The behavior does not change. Translators can then gradually move terms that are actually synonyms into the other field to profit from the features the separation brings.

What do you think?

@BjornRasmussen
Contributor

That would be a great step towards improving the searchability of presets. +1

@quincylvania
Collaborator

I recently had a closer look at the data in /data/presets/ and I must say, it's pretty awesome. Such a wealth of information and so well structured, great job guys!

@westnordost On behalf of the project, thank you! Our presets are especially great because so many individuals have contributed to them. A new preset makes for an excellent first PR.

I hope this will be used more in other OSM projects. (As a matter of fact, I am currently working on a (Java-)library that should serve as a tags<->names dictionary which will scrape its data exclusively from here.)

Sounds terrific! Be aware that the preset schema can change fairly often to suit the needs of iD.

I suggest splitting terms into "further names" (synonyms) and keywords.

This is an interesting idea. We'll need to think about this more but here are my initial thoughts:

  • "Synonyms" as you outline them remind me of Wikidata aliases.
  • I think if a preset showed up with a different label I would assume it was a different preset altogether. There are already confusingly similar presets in iD (e.g. Casino vs. Adult Gaming Center) because OSM tagging is so complex. I'd like for preset names to remain stable in the UI.
  • We're planning on adding an optional subtitle property to show below the preset name (see Add preset subtitle property #6137). It will also be used for search results and ranking. Perhaps this alone is an okay alternative?
  • Can you think of any other advantages? I personally find that the presets are already pretty searchable, do you disagree? I want to make sure we have a really compelling reason to do this since it could make a lot of work for translators.

@quincylvania added the considering (Not Actionable - still considering if this is something we want) and preset (An issue with an OpenStreetMap preset or tag) labels Apr 4, 2019
@westnordost
Contributor Author

westnordost commented Apr 4, 2019

Can you think of any other advantages? I personally find that the presets are already pretty searchable, do you disagree?

I don't disagree, it is pretty well searchable and I am impressed, I just think that it may be improved if this is done.

Perhaps both of the advantages I mentioned are more apparent in (at least) the German locale. The majority of terms in this locale are actually synonyms, with very few keywords. At least for me, it felt odd that for the thing I am searching for, the best match is not the text I entered and is also not highlighted in any way.

Of course, the fact that the search word matches a preset exactly because it is a synonym rather than a keyword could be shown in a different way, such as with subtitles as you suggest. (Title: [Primary translation], Subtitle: a.k.a. [matched synonym], or the other way around.)

Another cumulative idea is to highlight the section of the matched word that matches the input text (in bold or underlined), so when searching for "Spiel", the results that actually contain the word are highlighted, i.e.

  • Spielbank
  • Spielwarengeschäft
    etc., like in Google: [screenshot]

This cumulative idea (shall I create a separate ticket?) would work better together with this idea because it would be confusing if the best match is the only match that is not highlighted.
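The highlighting idea could be sketched roughly as follows. This is only an illustration: splitAtMatch and HighlightParts are hypothetical names, the match is a case-insensitive substring search, and the renderer is assumed to wrap the middle part in bold or underline.

```typescript
interface HighlightParts {
  before: string; // text preceding the matched section
  match: string;  // the section to render in bold or underlined
  after: string;  // text following the matched section
}

// Splits a result label around the first case-insensitive occurrence of the query.
// Assumes lower-casing does not change the string length, which holds for most
// but not all Unicode characters.
function splitAtMatch(label: string, query: string): HighlightParts | null {
  const q = query.trim().toLowerCase();
  if (q === "") return null;
  const start = label.toLowerCase().indexOf(q);
  if (start === -1) return null; // nothing to highlight, e.g. a keyword-only match
  const end = start + q.length;
  return {
    before: label.slice(0, start),
    match: label.slice(start, end),
    after: label.slice(end),
  };
}
```

For the query "Spiel", the label Spielbank splits into an empty prefix, the match "Spiel" and the remainder "bank", while Kasino yields null, which is exactly the case where the best match would be the only result left without a highlight.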

@quincylvania
Collaborator

quincylvania commented Apr 9, 2019

I've been thinking more about this and I think we should do it. So far I've been using the subtitle property as a mixture of synonyms and short descriptions, but it'd be better to keep them separate.

I suggest we limit subtitle to descriptions, use terms for arbitrary internal search phrases, and add an aliases property as an array of display-ready strings in order of priority, like so:

"name": "Events Venue",
"subtitle": "Rentable facility for events like weddings and banquets",
"aliases": [
    "Event Space",
    "Wedding Venue",
    "Banquet Hall"
],
"terms": [
    "celebration",
    "party"
]

When a search only matches an alias, we could show the alias alongside the name and subtitle rather than as a replacement. (We may need an alternative design for long text.)

[screenshot]

The main benefit I see to this addition is that users can feel more confident with their choice of preset rather than wondering "Is this the same as the thing I'm trying to add?"
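As a rough sketch of the "show the alias alongside the name" behavior, assuming the property layout from the example above: the function name aliasToShow, the substring matching rule, and the decision to ignore subtitle and terms here are all assumptions made for illustration, not how iD implements it.

```typescript
// Mirrors the proposed properties: name, subtitle, aliases, terms.
interface AliasedPreset {
  name: string;
  subtitle: string;
  aliases: string[]; // display-ready, in order of priority
  terms: string[];   // internal search phrases, never displayed
}

function containsQuery(text: string, q: string): boolean {
  return text.toLowerCase().indexOf(q) !== -1;
}

// Returns the alias to display next to the name, or undefined when the name
// itself matches or when no alias matches (e.g. a terms-only hit).
function aliasToShow(preset: AliasedPreset, query: string): string | undefined {
  const q = query.trim().toLowerCase();
  if (q === "") return undefined;
  if (containsQuery(preset.name, q)) return undefined;
  for (const alias of preset.aliases) {
    if (containsQuery(alias, q)) return alias; // the first hit has the highest priority
  }
  return undefined;
}
```

With the Events Venue example, a search for "wedding" would return "Wedding Venue" to show beside the name, while "event" and "party" would return nothing extra because the name or only a term matched.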

Another cumulative idea is to highlight the section of the matched word that matches the input text

@westnordost This could be good, we'd have to see how it looks in practice. Feel free to create a separate issue!

@quincylvania removed the considering (Not Actionable - still considering if this is something we want) label Apr 9, 2019
@westnordost
Contributor Author

Cool, that sounds great! I have a load of German aliases to dump into Transifex from an earlier attempt to localize primary features :-)

Where does the subtitle come from? taginfo displays a similar description, I think it comes from the wiki. An option also for iD?

@quincylvania
Collaborator

Where does the subtitle come from? taginfo displays a similar description, I think it comes from the wiki. An option also for iD?

@westnordost We'll be adding the subtitle as a preset property native to iD in #6137. The wiki descriptions are unsuitable for iD since:

  • They require API calls to get them so we can't really display them instantly for every feature in the search results.
  • They don't all map directly to iD presets (which can be defined by multiple tags).
  • Many are longer and more detailed than we need here.
  • They are of varying quality.
  • They may not reflect how iD interprets a feature.

@westnordost
Contributor Author

westnordost commented Apr 10, 2019

By the way, terms is currently a comma-separated string; I think this is for Transifex reasons. So aliases should probably follow the same scheme, unless of course it is an option to convert the string into an array when generating the final presets.json.
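Converting such a comma-separated string at build time might look like the sketch below. The function name splitTerms is hypothetical, and trimming whitespace and dropping empty entries are assumptions about what a build step would want rather than a description of iD's build script.

```typescript
// Turns a comma-separated translation string into a clean array of entries.
function splitTerms(raw: string): string[] {
  return raw
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0); // ignore empty slots such as "a,,b"
}
```

A translated string like "celebration, party" would then end up in the generated file as a two-element array.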

@westnordost
Contributor Author

Also, maybe it would be better not to add an aliases translation key, but instead to rename name to names. The first item in that list would then be the one that is shown as the name; the others are the aliases.

Why?

Transifex, like most other translation portals, encourages users to add something to every translation key, because an empty key is flagged as a missing translation.
So even if there is no real alias for a map feature, translators are pushed towards adding aliases that do not fit 100%. This is already a problem with the terms key. In the German translation, I often see the workaround that the name is copied into the terms field just to avoid an empty translation.
This would be avoided by using a names field.

Additionally, what should be the primary name (instead of just an alias) could become the topic of smaller edit wars after the introduction of aliases. When the name+aliases are kept in one translation key, this could be defused a bit because then it just becomes a matter of rearranging the names (by importance/common-ness). Otherwise, it would involve deleting and replacing the name and hopefully adding that replaced name to the alias key in the same breath.

@quincylvania
Collaborator

quincylvania commented Apr 18, 2019

Also, maybe it would be better not to add an aliases translation key, but instead to rename name to names. The first item in that list would then be the one that is shown as the name; the others are the aliases.

An interesting idea, but I prefer keeping the name and the aliases separate. Using names would complicate a lot of the code in iD, plus it would entail much more work for translators compared to a purely additive change. The displayed completion percentage of a translation is ultimately arbitrary, but if it's really an issue we can consider a smarter solution like allowing an explicit "null" translation that iD will ignore.

@1ec5
Collaborator

1ec5 commented Apr 19, 2019

Using names would complicate a lot of the code in iD, plus it would entail much more work for translators compared to a purely additive change.

iD’s build process already has to transform the strings it exports from Transifex into a suitable format at runtime, so could that transformation also partition “name” into “name” and “aliases”? (Leaving it as “name” despite the possibility of multiple lines of translations would avoid makework for translators.) The main challenge is communicating to translators that multiple names are allowed, but Transifex does show any instructions in comments in a prominent position.

The displayed completion percentage of a translation is ultimately arbitrary, but if it's really an issue we can consider a smarter solution like allowing an explicit "null" translation that iD will ignore.

The percentage may seem like a trifle from the perspective of the project’s developers, but it’s pretty much the only thing that motivates many translators. Seeing the percentage get stuck due to untranslatable strings can be discouraging.

@quincylvania
Collaborator

The percentage may seem like a trifle from the perspective of the project’s developers, but it’s pretty much the only thing that motivates many translators. Seeing the percentage get stuck due to untranslatable strings can be discouraging.

@1ec5 Thanks for noting this, it's useful to hear. We'll come up with some way to allow 100% translation without encouraging bad translations.

@westnordost
Contributor Author

I am now working on implementing this

@quincylvania added the waitfor (Waiting for something) label Jun 11, 2019
@quincylvania removed the waitfor (Waiting for something) label Dec 9, 2020
@quincylvania
Collaborator

iD's presets have been spun out to id-tagging-schema, so it seems time to revisit this feature.

I like the idea of allowing multiple values in the name field, and then putting everything after the first value into a separate aliases property, since it makes aliases optional for translators. Example: "name": "Indoor Corridor;Hallway;Passageway"
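A small sketch of that partitioning step, assuming the semicolon delimiter from the example; the function name partitionName and the handling of stray whitespace and empty segments are assumptions, not the id-tagging-schema implementation.

```typescript
interface NameAndAliases {
  name: string;      // first value: the label shown in the UI
  aliases: string[]; // everything after the first value
}

// Splits a multi-value name string into the primary name and its aliases.
function partitionName(raw: string): NameAndAliases {
  const parts = raw
    .split(";")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
  const [first, ...rest] = parts;
  return { name: first !== undefined ? first : "", aliases: rest };
}
```

A translation that contains a single name and no semicolon simply yields an empty aliases array, which is what makes aliases optional for translators.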

@mbrzakovic
Collaborator

To summarize (@quincylvania to confirm): the idea is to have aliases and search terms for presets, where aliases would be the 'stronger' suggestion.

This feature is very good and a lot has already been done. I am dropping the 2.20 milestone in order to speed up the next release; however, this change should go into the upcoming cycle.

@mbrzakovic removed this from the 2.20.0 milestone Jul 13, 2021
@westnordost
Contributor Author

Thank you for the update, Milos! I wasn't monitoring the state of the iD presets (because obviously the project was on hiatus), but when a new version of the iD tagging presets is released that includes these and there is reasonable adoption by translators, I should revisit my https://github.com/westnordost/osmfeatures library to include searching by aliases as well.

It seems, however, that the differentiation between aliases and terms has not yet been pushed to Transifex, which means that translators cannot yet start adding terms/aliases to the different presets. Before the version of iD that introduces this is released, it would be good to give translators a generous head start to supply the translations for it.

@mbrzakovic
Collaborator

I see, thanks for this info. The plan is to get some work done on id-tagging-schema soon, and I think alias translation should be included.

@westnordost
Contributor Author

Part of this ticket would be to use the alias also in the UI, see e.g. #6139 (comment) and the following post by Quincy.


6 participants