Skip to content

Can't extend {way_name} with an option on Transifex #101

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
yuryleb opened this issue Apr 6, 2017 · 10 comments
Closed

Can't extend {way_name} with an option on Transifex #101

yuryleb opened this issue Apr 6, 2017 · 10 comments

Comments

@yuryleb
Copy link
Contributor

yuryleb commented Apr 6, 2017

I prepared new feature for OSRM Text Instructions - grammatical cases support for way names. This is quite important for many languages (Russian, Ukrainian, Polish, Bulgarian, Finnish, Estonian and many others) - otherwise translated strings with inserted way names (in nominative case as in OSM) look grammatically incorrect in most cases. Moreover, I don't know any other navigation system that could do this properly (except my humble attempt to improve Garmin TTS voices 😉). I would be proud to make OSRM Text Instructions the first system with grammatical case support but I can't 😞

The main idea of this future feature is in extending {way_name} and {rotary_name} variables in target language's strings with explicit grammatical case label - {way_name:accusative} for example - and processing way names with block of regular expressions grouped by this grammatical case.

But Trasifex rejects my attempts to change "Turn left onto {way_name}", in Russian "Поверните налево на {way_name}" to "Поверните налево на {way_name:accusative}" with "Missing {way_name} variable error" 😥 Other variants ({way_name}{accusative} for example) also don't work due to difference with variables from original English string.

My experience with TTS voices reveals that determining required grammatical case for part of whole text is not the simple task and could be easily done only for known and expected words combinations. Actually this could be implemented too but extends already big regular expressions file (already ~900 expressions for Russian street names only) some more.

I don't like the extending all translation strings with proposed grammatical case option - this has no sense for them. I don't want to extend English strings only with this option - this invalidates all existing translations. I would like to update Russian (currently) translation only. Maybe some Transifex checks could be suppressed/updated to allow non-exact match of variables?

@1ec5, @freenerd, @TheMarex, need your advice please.

@yuryleb
Copy link
Contributor Author

yuryleb commented Apr 6, 2017

Whole feature description for reference (from uncommitted changes):

Grammar support

Many languages - all Slavic (Russian, Ukrainian, Polish, Bulgarian, etc), Finnic (Finnish, Estonian) and others - have grammatical case feature that could be supported in OSRM Text Instructions too.
Originally street names are being inserted into instructions as they're in OSM map - in nominative case.
To be grammatically correct, street names should be changed according to target language rules and instruction's context before insertion.

Actually grammatical case applying is not the simple and obvious task due to real-life languages complexity.
It even looks so hard so, for example, all known native Russian navigation systems don't speak street names in their pronounceable route instructions at all.

But fortunately street names have restricted lexicon and naming rules and so this task could be relatively easily solved for this particular case.

Implementation details

The quite universal and simplier solution is the changing street names with the prepared set of regular expressions grouped by required grammatical case.
The required grammatical case should be specified right in instruction's substitution variables:

  • {way_name} and {rotary_name} variables in translated instructions should be appended with required grammar case name after colon: {way_name:accusative} for example
  • languages/grammar folder should contain language-specific JSON file with regular expressions for specified grammar case:
{
    "v5": {
        "accusative": [
            ["^ (\\S+)ая-(\\S+)ая [Уу]лица ", " $1ую-$2ую улицу "],
            ["^ (\\S+)ая [Уу]лица ", " $1ую улицу "],
            ...
  • All such JSON files should be registered in common languages.js
  • Instructions text formatter (index.js in this module) should:
    • check {way_name} and {rotary_name} variables for optional grammar case after colon: {way_name:accusative}
    • find appropriate regular expressions block for target language and specified grammar case
    • call standard string replace with regular expression for each expression in block passing result from previous call to the next; the first call should enclose original street name with whitespaces to make parsing words in names a bit simplier.
  • Strings replacement with regular expression is available in almost all other programming language and so this should not be the problem for other code used OSRM Text Instructions' data only.
  • If there is no regular expression matched source name (that's for names from foreign country for example), original name is returned without changes. This is also expected behavior of standard string replace with regular expression. And the same behavior is expected in case of missing grammar JSON file or grammar case inside it.

Example

Russian "Большая Монетная улица" street from St Petersburg (Big Monetary Street in rough translation) after processing with Russian grammar rules will look in following instructions as:

  • "Turn left onto {way_name}" => ru:"Поверните налево на {way_name:accusative}" => "Поверните налево на Большую Монетную улицу"
  • "Continue onto {way_name}" => ru:"Продолжите движение по {way_name:dative}" => "Продолжите движение по Большой Монетной улице"
  • "Make a U-turn onto {way_name} at the end of the road" => ru:"Развернитесь в конце {way_name:genitive}" => "Развернитесь в конце Большой Монетной улицы"
  • "Make a U-turn onto {way_name}" => ru:"Развернитесь на {way_name:prepositional}" => "Развернитесь на Большой Монетной улице"

Design goals

  • Cross platform - uses the same data-driven approach as OSRM Text Instructions
  • Test suite - has prepared test to check available expressions automatically and has easily extendable language-specific names testing pattern
  • Customization - could be easily extended for other languages with adding new regular expressions blocks into grammar support folder and modifying {way_name} and other variables in translated instructions only with necessary grammatical case labels

Notes

@oxidase
Copy link

oxidase commented Apr 6, 2017

👍 would be nice to have correct grammatical cases!

The note about OSRM is doable, but would require a change in datafacde interface to virtual StringView GetStringForID(const NameID id, StringType, ISOLocale) const = 0;, that will involve changes in lua profiles, extraction, path annotation and name table parts /cc @TheMarex @danpat

@yuryleb
Copy link
Contributor Author

yuryleb commented Apr 6, 2017

@oxidase, BTW actually it's a bit harder to apply grammatical cases to Ukrainian streets - they use "sorting" words order (that's street status name is always last) instead of "natural" words order in Russian OSM. But there is known solution for this - Garmin maps use "sorting" words order everywhere so I should be aware about this too 😉

@freenerd
Copy link
Member

freenerd commented Apr 6, 2017

Thanks @yuryleb, this is a great issue providing a lot of awesome context. Awesome 🎉

Overall this sounds like an interesting problem we want to solve in osrm-text-instructions and the plan you outlined sounds reasonable.

We are looking into if there are other language-specific cases we would want to handle with a similar pattern and only want to integrate something that can hit more of these cases at once. That will take a bit of time. If you want to go ahead and implement your approach now we are happy top consider it, but I can not guarantee you we will merge it soon.

I'm not sure yet how we'd solve the problem with Transifex variables. As an ugly hack we have overrides right now, but these so far were not used for adding new content, which would remove some of the translation from Transifex and therefore make translating harder.

The ruleset you build for the Garmin TTS is very impressive. Including these in the language JSON will likely make sense then. Likely these could be added via overrides, since they don't have to be handled in Transifex.

@yuryleb
Copy link
Contributor Author

yuryleb commented Apr 6, 2017

Actually the JSON with regular expressions to apply grammatical cases only looks not so huge 😉 I also worried about that. I tested my feature on local frontend instance and saw no any responses delays or memory overdraft (as it was possible without special profiling tools actually).

I feel it would be better to publish my changes as PR for public review. Of course, I don't expect it will be merged soon or merged at all.

@1ec5
Copy link
Member

1ec5 commented Apr 6, 2017

But Trasifex rejects my attempts to change "Turn left onto {way_name}", in Russian "Поверните налево на {way_name}" to "Поверните налево на {way_name:accusative}" with "Missing {way_name} variable error" 😥

Is it a warning, or does Transifex actually disallow the translation? If it’s just a warning, I’ve just made you a reviewer for Russian, so you can “review” the string to suppress the warning.

@yuryleb
Copy link
Contributor Author

yuryleb commented Apr 6, 2017

@1ec5, it was error for me early and I couldn't save my changes. I'll try now.

@yuryleb
Copy link
Contributor Author

yuryleb commented Apr 6, 2017

Thanks @1ec5 but I still can't save changes even with Reviewer rights with the same 'Missing {way_name} variable error':
transifexerror
And I don't think this will be a good solution - any new contributor with regular rights can't save his changes here without restoring original {way_name} instead of my {way_name:dative} here 😞

There should be some option in Transifex to control this...

@1ec5
Copy link
Member

1ec5 commented Apr 7, 2017

How tied are we to JSON, at least as the format we use to communicate with Transifex? It isn’t particularly well-suited for localization, as #19 points out. The native .stringsdict format on iOS and the Android strings XML format both have built-in support for grammatical gender and number. It’s unfortunate that we have to roll our own ad-hoc translation format atop JSON; we’d probably end up having to reverse engineer that into .stringsdict and Android strings XML for the native platforms.

@yuryleb
Copy link
Contributor Author

yuryleb commented Oct 4, 2017

Added avoidance in redesigned #102 to insert grammatical case labels in override post-Transifex script (for Russian language only)

@yuryleb yuryleb closed this as completed Oct 4, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants