Start message format test suite #113
Perhaps @nbouvrette, @stasm or @zbraniecki would like to start looking at this?
Thanks for starting this, @grhoten! I have a few questions inline to help me understand where this sits on the ladder of abstraction.
<param name="relationship" type="String"><value>brother</value></param>
</params>
<source>I didn't find <var name="relationship" inflect="indefinite"/> in your contacts. What is your <var name="relationship" inflect="genitive"/> name?</source>
<print>I didn't find a brother in your contacts. What is your brotherʼs name?</print>
Where should the variants like "a brother" and "brother's" be defined? Is that something the implementation should provide? Or would there be another resource editable by the translators where these terms would be defined?
We have a default set of data. It's a lexical dictionary. We used to have a set based on Wiktionary, but we're switching away from it because of licensing restrictions on modification and because of the completeness and cleanliness of the data. The lexical dictionary helps with all of the common edge cases. When a word is missing from the lexical dictionary, we use heuristics or machine learning to fill in the gaps. For English, the default rules are pretty simple.
This also reduces the amount of effort to translate the common stuff. Think of it like translation memory, but it's available at runtime instead of at build time.
If the word is obscure, like a product name, an invented word or a company name, you can create something like a semantic concept where you define all of the semantic features to get the words into grammatical agreement. We call this dialog metadata, and it's stored in our semantic feature model. Other parts of this working group have used the more generic term "data model".
I can put in a Russian example to highlight how to do it by hand and how complicated it is to get a number and noun into grammatical agreement. It's hard to find a language tougher than Russian. Arabic, Finnish and Turkish have their own difficult issues, but those complexities fit into an architecture that properly supports Russian.
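To make the lookup order described above concrete, here is a minimal sketch of "lexical dictionary first, heuristics second" for English. Everything in it is an assumption for illustration: the `LEXICON` entries, the `inflect` function name and the fallback rules are hypothetical and are not taken from the implementation under discussion.

```python
# Hypothetical lexical dictionary: it only needs to hold the edge cases
# that the simple default rules below would get wrong.
LEXICON = {
    "hour": {"indefinite": "an hour"},  # silent "h"
    "user": {"indefinite": "a user"},   # "u" pronounced like "yu"
}

VOWELS = {"a", "e", "i", "o", "u"}


def inflect(word, feature):
    """Return `word` inflected for `feature`, preferring declared data."""
    declared = LEXICON.get(word, {})
    if feature in declared:
        return declared[feature]
    # Heuristic fallbacks (assumed; English defaults are simple).
    if feature == "indefinite":
        article = "an" if word[:1].lower() in VOWELS else "a"
        return f"{article} {word}"
    if feature == "genitive":
        return f"{word}ʼ" if word.endswith("s") else f"{word}ʼs"
    raise ValueError(f"unsupported feature: {feature}")
```

With these assumptions, `inflect("brother", "indefinite")` falls through to the heuristic, while `inflect("hour", "indefinite")` is answered from the dictionary.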
<params>
<param name="object" type="String"><value>lights</value></param>
</params>
<source>The <var name="object"/> <switch value="object" feature="number">
Is the idea here that the implementation is capable of inspecting the object ("lights") and extracting some grammatical information from it (here, the grammatical number)?
Yes
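As a rough illustration of what "inspecting the object" could look like, the sketch below derives the grammatical number of a single word and uses it to pick a branch, in the spirit of the `switch` on `feature="number"`. The case bodies are not visible in the excerpt, so the "is"/"are" branches, the trailing "on.", the `NUMBER_LEXICON` entries and both function names are assumptions.

```python
# Hypothetical overrides for words where the trailing-"s" guess fails.
NUMBER_LEXICON = {"children": "plural", "news": "singular"}


def grammatical_number(word):
    """Guess the number of a single English noun (assumed heuristic)."""
    lowered = word.lower()
    if lowered in NUMBER_LEXICON:
        return NUMBER_LEXICON[lowered]
    if lowered.endswith("s") and not lowered.endswith("ss"):
        return "plural"
    return "singular"


def render(obj):
    """Select a verb form by switching on the object's number feature."""
    cases = {"singular": "is", "plural": "are"}  # assumed case bodies
    return f"The {obj} {cases[grammatical_number(obj)]} on."
```

A multi-word name such as "light on the front porch" would need the head noun identified first, which this sketch does not attempt.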
</test>
<test>
<params>
<param name="object" type="String"><value>light</value></param>
How does the translation of the word "light" get into this message?
This is user vocabulary. I could have easily used the phrase "light on the front porch" or "Stanisław's party light". If you're dealing with something like HomeKit, the naming of various objects is up to the user.
It may not be translated. If it's translated, it would require additional infrastructure. It's stored in a SemanticFeatureModel, but I didn't define how it goes into a SemanticFeatureModel in these test cases. Fluent seems to have a similar concept for translated phrases, but I think they're stored in the same file as the messages.
I added Russian tests to highlight how a SemanticConcept would work as a quantity. I used Latin text to highlight that it's using the declared data instead of a lexical dictionary, how the number changes pronunciation and how the grammatical case changes as the value changes.
If I wanted to be complete with the testing in Russian, I'd vary the grammatical gender and test more of the speak lines.
I picked Russian because it's much more complicated when compared to many other languages. See ru_RU.xml for details.
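For readers unfamiliar with the agreement rule, here is a small sketch of how the noun form depends on the number when the whole phrase is in the nominative. The plural categories follow the CLDR rules for Russian integers. The `DECLARED` forms of "лампа" (lamp) are my own example of declared data and are not copied from ru_RU.xml, and the function names are hypothetical.

```python
def russian_plural_category(n):
    """CLDR plural category for a non-negative integer in Russian."""
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    return "many"


# Case and number the noun takes after the numeral (nominative phrase).
AGREEMENT = {
    "one": ("nominative", "singular"),
    "few": ("genitive", "singular"),
    "many": ("genitive", "plural"),
}

# Declared forms for one noun, keyed by (case, number). Illustrative data.
DECLARED = {
    ("nominative", "singular"): "лампа",
    ("genitive", "singular"): "лампы",
    ("genitive", "plural"): "ламп",
}


def quantity(n):
    """Render a count and noun in grammatical agreement."""
    return f"{n} {DECLARED[AGREEMENT[russian_plural_category(n)]]}"
```

Note that 11 through 14 take the genitive plural even though they end in 1 through 4, which is why the rule checks `n % 100` as well as `n % 10`.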
…straints Add Russian tests
These changes are a part of issue #109. This is just a start. I wanted to get feedback before I add more. I can confirm that my implementation passes these tests.
It's worth noting that this XML syntax could potentially be extended to include the XHTML namespace or the SSML namespace. For example, the speak namespace could be omitted from the print line, and the print namespace could be omitted from the speak line. Using XML should also lend itself well to XLIFF for translation purposes.
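A minimal sketch of how a test runner might consume one of these cases, using Python's standard `xml.etree.ElementTree`. The XML is assembled from the fragments quoted in the review above. The `toy_inflect` stand-in and the `run_test` name are assumptions for illustration and do not reflect the implementation that passes the suite.

```python
import xml.etree.ElementTree as ET

TEST_XML = """<test>
  <params>
    <param name="relationship" type="String"><value>brother</value></param>
  </params>
  <source>I didn't find <var name="relationship" inflect="indefinite"/> in your contacts. What is your <var name="relationship" inflect="genitive"/> name?</source>
  <print>I didn't find a brother in your contacts. What is your brotherʼs name?</print>
</test>"""


def toy_inflect(word, feature):
    """Hypothetical stand-in for a real inflection engine (English only)."""
    if feature == "indefinite":
        article = "an" if word[:1].lower() in {"a", "e", "i", "o", "u"} else "a"
        return f"{article} {word}"
    if feature == "genitive":
        return f"{word}ʼs"
    return word


def run_test(xml_text):
    """Return (actual, expected) print strings for a single <test>."""
    test = ET.fromstring(xml_text)
    params = {p.get("name"): p.findtext("value") for p in test.iter("param")}
    source = test.find("source")
    parts = [source.text or ""]
    for child in source:
        if child.tag == "var":
            value = params[child.get("name")]
            parts.append(toy_inflect(value, child.get("inflect")))
        parts.append(child.tail or "")  # text between this and the next tag
    return "".join(parts), test.findtext("print")
```

Calling `actual, expected = run_test(TEST_XML)` gives two strings that can be compared directly. A `speak` line could be checked the same way by reading a different expected element.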
Some options that remain to be implemented for testing include: