Start message format test suite #113

Merged
merged 2 commits into from
Apr 19, 2021
89 changes: 89 additions & 0 deletions test/format/en_US.xml
@@ -0,0 +1,89 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE formattest SYSTEM "../formattest.xsd">
<formattest locale="en-US">
<test>
<params>
<param name="relationship" type="String"><value>brother</value></param>
</params>
<source>I didn't find <var name="relationship" inflect="indefinite"/> in your contacts. What is your <var name="relationship" inflect="genitive"/> name?</source>
<print>I didn't find a brother in your contacts. What is your brotherʼs name?</print>
Collaborator

Where should variants like "a brother" and "brother's" be defined? Is that something the implementation should provide? Or would there be another resource, editable by the translators, where these terms would be defined?

Member Author

We have a default set of data: a lexical dictionary. We used to have a set based on Wiktionary, but we're switching away from it because of licensing issues around modification, and because of the completeness and cleanliness of the data. The lexical dictionary helps with all of the common edge cases. When a word is missing from the lexical dictionary, we use heuristics or machine learning to fill in the gaps. For English, the default rules are pretty simple.

This also reduces the amount of effort to translate the common stuff. Think of it like translation memory, but it's available at runtime instead of at build time.

If the word isn't covered by the defaults because it's obscure, like a product name, an invented word or a company name, you can create something like a semantic concept, where you define all of the semantic features needed to get the words into grammatical agreement. We call this dialog metadata, and it's stored in our semantic feature model. Other parts of this working group have used the more generic term "data model".

I can put in a Russian example to highlight how to do it by hand and how complicated it is to get a number and noun into grammatical agreement. It's hard to find a language tougher than Russian. Arabic, Finnish and Turkish have their own difficult issues, but those complexities fit into an architecture that properly supports Russian.
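
A minimal sketch of the lookup order described above, assuming a hypothetical in-memory `LEXICON` and a hypothetical helper `inflect_en`. Neither name comes from this PR, and the fallback rules are deliberately naive stand-ins for the real heuristics. An explicit dictionary entry wins; otherwise a default English rule applies.

```python
# Hypothetical lexical dictionary: only words the default rules get wrong need entries.
LEXICON = {
    "hour": {"indefinite": "an hour"},  # silent "h"
    "user": {"indefinite": "a user"},   # vowel letter, consonant sound
}


def inflect_en(word: str, feature: str) -> str:
    """Return `word` inflected for `feature`, preferring dictionary data."""
    entry = LEXICON.get(word.lower(), {})
    if feature in entry:
        return entry[feature]
    if feature == "indefinite":
        # Naive default: choose the article from the first letter.
        article = "an" if word[:1].lower() in ("a", "e", "i", "o", "u") else "a"
        return f"{article} {word}"
    if feature == "genitive":
        # U+02BC is the apostrophe used in the <print> lines of this test file.
        return word + ("\u02bc" if word.endswith("s") else "\u02bcs")
    raise ValueError(f"unsupported feature: {feature}")


print(inflect_en("brother", "indefinite"))
print(inflect_en("aunt", "genitive"))
```

The dictionary is consulted first so that curated data can override the heuristics, which matches the "translation memory at runtime" idea in the comment.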

</test>
<test>
<params>
<param name="relationship" type="String"><value>aunt</value></param>
</params>
<source>I didn't find <var name="relationship" inflect="indefinite"/> in your contacts. What is your <var name="relationship" inflect="genitive"/> name?</source>
<print>I didn't find an aunt in your contacts. What is your auntʼs name?</print>
</test>
<test>
<params>
<param name="object" type="String"><value>light</value></param>
Collaborator

How does the translation of the word "light" get into this message?

Member Author

This is user vocabulary. I could have easily used the phrase "light on the front porch" or "Stanisław's party light". If you're dealing with something like HomeKit, the naming of various objects is up to the user.

It may not be translated. If it's translated, it would require additional infrastructure. It's stored in a SemanticFeatureModel, but I didn't define how it goes into a SemanticFeatureModel in these test cases. Fluent seems to have a concept similar to the translated phrases, but I think it keeps them in the same file as the messages.

Member Author

@grhoten Sep 21, 2020

I added Russian tests to highlight how a SemanticConcept would work as a quantity. I used Latin text to highlight that it's using the declared data instead of a lexical dictionary, how the number changes pronunciation and how the grammatical case changes as the value changes.

If I wanted to be complete with the testing in Russian, I'd vary the grammatical gender and test more of the speak lines.

I picked Russian because it's much more complicated than many other languages. See ru_RU.xml for details.
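
A rough sketch of the number agreement these Russian tests exercise, assuming an inanimate noun and two hypothetical helpers (`noun_form_for`, `quantity`) that are not part of this PR. The rule table is inferred from the expected `километр` output in ru_RU.xml, not taken from any implementation.

```python
CASES = ["nominative", "genitive", "accusative", "dative", "instrumental", "prepositional"]

# Declared forms for one noun, keyed by (case, number), much like a SemanticConcept's constraints.
KILOMETR = {
    ("nominative", "singular"): "километр",
    ("genitive", "singular"): "километра",
    ("accusative", "singular"): "километр",
    ("dative", "singular"): "километру",
    ("instrumental", "singular"): "километром",
    ("prepositional", "singular"): "километре",
    ("nominative", "plural"): "километры",
    ("genitive", "plural"): "километров",
    ("accusative", "plural"): "километры",
    ("dative", "plural"): "километрам",
    ("instrumental", "plural"): "километрами",
    ("prepositional", "plural"): "километрах",
}


def noun_form_for(n: int, case: str):
    """Pick the (case, number) form of an inanimate noun that agrees with n."""
    last, last_two = n % 10, n % 100
    if last == 1 and last_two != 11:
        return (case, "singular")
    # For inanimate nouns the accusative behaves like the nominative here.
    direct = case in ("nominative", "accusative")
    if 2 <= last <= 4 and not 12 <= last_two <= 14:
        return ("genitive", "singular") if direct else (case, "plural")
    return ("genitive", "plural") if direct else (case, "plural")


def quantity(n: int, forms: dict, case: str = "nominative") -> str:
    """Format a number with the agreeing noun form."""
    return f"{n} {forms[noun_form_for(n, case)]}"


for n in (1, 2, 5):
    print(" ".join(quantity(n, KILOMETR, case) for case in CASES))
```

The requested case and the noun form actually selected can differ (for example, nominative 2 takes the genitive singular), which is why the declared data has to carry every case and number combination.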

</params>
<source>The <var name="object"/> <switch value="object" feature="number">
<case is="plural">are</case>
<default>is</default>
</switch> on</source>
<print>The light is on</print>
</test>
<test>
<params>
<param name="object" type="String"><value>lights</value></param>
</params>
<source>The <var name="object"/> <switch value="object" feature="number">
Collaborator

Is the idea here that the implementation is capable of inspecting the object ("lights") and extracting some grammatical information from it (here, the grammatical number)?

Member Author

Yes
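
A naive sketch of that inspection for English, assuming two hypothetical helpers (`grammatical_number`, `render`) and a crude head-noun rule: take the word before the first preposition, and treat a trailing "s" as plural. A real implementation would presumably consult the lexical dictionary first.

```python
# Small, illustrative preposition list; not exhaustive.
PREPOSITIONS = {"at", "by", "in", "near", "of", "on", "under"}


def grammatical_number(phrase: str) -> str:
    """Guess "singular" or "plural" from the head noun of an English noun phrase."""
    words = phrase.split()
    head = words[-1] if words else ""
    for i, word in enumerate(words):
        if i > 0 and word.lower() in PREPOSITIONS:
            head = words[i - 1]  # the noun being modified precedes the preposition
            break
    head = head.lower()
    # Crude plural test: trailing "s", but not "ss" (e.g. "glass").
    return "plural" if head.endswith("s") and not head.endswith("ss") else "singular"


def render(obj: str) -> str:
    """Mirror the <switch value="object" feature="number"> message."""
    verb = "are" if grammatical_number(obj) == "plural" else "is"
    return f"The {obj} {verb} on"


for obj in ("light", "lights", "lights on the front porch", "light on the front porch"):
    print(render(obj))
```

The message itself only asks for the `number` feature; how the feature is derived (dictionary, heuristic, or declared metadata) stays behind the `switch`.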

<case is="plural">are</case>
<default>is</default>
</switch> on</source>
<print>The lights are on</print>
</test>
<test>
<params>
<param name="object" type="String"><value>lights on the front porch</value></param>
</params>
<source>The <var name="object"/> <switch value="object" feature="number">
<case is="plural">are</case>
<default>is</default>
</switch> on</source>
<print>The lights on the front porch are on</print>
</test>
<test>
<params>
<param name="object" type="String"><value>light on the front porch</value></param>
</params>
<source>The <var name="object"/> <switch value="object" feature="number">
<case is="plural">are</case>
<default>is</default>
</switch> on</source>
<print>The light on the front porch is on</print>
</test>
<test>
<params>
<param name="number" type="Number"><value>1</value></param>
<param name="unit" type="String"><value>day</value></param>
</params>
<source>Your meeting is in <quantity value="number" unit="unit"/> from now</source>
<print>Your meeting is in 1 day from now</print>
</test>
<test>
<params>
<param name="number" type="Number"><value>2</value></param>
<param name="unit" type="String"><value>day</value></param>
</params>
<source>Your meeting is in <quantity value="number" unit="unit"/> from now</source>
<print>Your meeting is in 2 days from now</print>
</test>
<test>
<params>
<param name="number" type="Number"><value>3</value></param>
<param name="unit" type="String"><value>church</value></param>
</params>
<source><quantity value="number" unit="unit"/> were found nearby</source>
<print>3 churches were found nearby</print>
</test>
<test>
<params>
<param name="ordinalNum" type="Number"><value>4</value></param>
</params>
<source>Your <number name="ordinalNum" style="asWords" variant="ordinal"/> meeting has been canceled</source>
<print>Your fourth meeting has been canceled</print>
</test>
</formattest>
66 changes: 66 additions & 0 deletions test/format/es_MX.xml
@@ -0,0 +1,66 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE formattest SYSTEM "../formattest.xsd">
<formattest locale="es-MX">
<test>
<params>
<param name="words" type="String[]"><value>gato</value><value>gata</value><value>gatos</value><value>gatas</value></param>
</params>
<source>¿Te gustan los videos sobre <list name="words" type="conjunction" inflect="definite"/>?</source>
<print>¿Te gustan los videos sobre el gato, la gata, los gatos y las gatas?</print>
</test>
<test>
<params>
<param name="words" type="String[]"><value>gatos</value></param>
</params>
<source>¿Te gustan los videos sobre <list name="words" type="conjunction" inflect="definite"/>?</source>
<print>¿Te gustan los videos sobre los gatos?</print>
</test>
<test>
<params>
<param name="words" type="String[]"><value>gatos</value><value>idiomas</value></param>
</params>
<source>¿Te gustan los videos sobre <list name="words" type="conjunction"/>?</source>
<print>¿Te gustan los videos sobre gatos e idiomas?</print>
</test>
<test>
<params>
<param name="words" type="String[]"><value>idiomas</value><value>gatos</value></param>
</params>
<source>¿Te gustan los videos sobre <list name="words" type="conjunction"/>?</source>
<print>¿Te gustan los videos sobre idiomas y gatos?</print>
</test>
<test>
<params>
<param name="number" type="Number"><value>1</value></param>
<param name="unit" type="String"><value>niño</value></param>
</params>
<source>Hay <quantity value="number" unit="unit"/> en el video</source>
<print>Hay 1 niño en el video</print>
<speak>Hay un niño en el video</speak>
</test>
<test>
<params>
<param name="number" type="Number"><value>1</value></param>
<param name="unit" type="String"><value>niña</value></param>
</params>
<source>Hay <quantity value="number" unit="unit"/> en el video</source>
<print>Hay 1 niña en el video</print>
<speak>Hay una niña en el video</speak>
</test>
<test>
<params>
<param name="number" type="Number"><value>3</value></param>
<param name="unit" type="String"><value>niño</value></param>
</params>
<source>Hay <quantity value="number" unit="unit"/> en el video</source>
<print>Hay 3 niños en el video</print>
<speak>Hay tres niños en el video</speak>
</test>
<test>
<params>
<param name="word" type="String"><value>gato</value></param>
</params>
<source>Me gustan <var name="word" inflect="definite plural"/></source>
<print>Me gustan los gatos</print>
</test>
</formattest>
94 changes: 94 additions & 0 deletions test/format/ru_RU.xml
@@ -0,0 +1,94 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE formattest SYSTEM "../formattest.xsd">
<formattest locale="ru-RU">
<test>
<params>
<param name="number" type="Number"><value>1</value></param>
<param name="unit" type="String"><value>километр</value></param>
</params>
<source><quantity value="number" unit="unit"/> <quantity value="number" unit="unit" inflect="genitive"/> <quantity value="number" unit="unit" inflect="accusative"/> <quantity value="number" unit="unit" inflect="dative"/> <quantity value="number" unit="unit" inflect="instrumental"/> <quantity value="number" unit="unit" inflect="prepositional"/></source>
<print>1 километр 1 километра 1 километр 1 километру 1 километром 1 километре</print>
</test>
<test>
<params>
<param name="number" type="Number"><value>2</value></param>
<param name="unit" type="String"><value>километр</value></param>
</params>
<source><quantity value="number" unit="unit"/> <quantity value="number" unit="unit" inflect="genitive"/> <quantity value="number" unit="unit" inflect="accusative"/> <quantity value="number" unit="unit" inflect="dative"/> <quantity value="number" unit="unit" inflect="instrumental"/> <quantity value="number" unit="unit" inflect="prepositional"/></source>
<print>2 километра 2 километров 2 километра 2 километрам 2 километрами 2 километрах</print>
</test>
<test>
<params>
<param name="number" type="Number"><value>5</value></param>
<param name="unit" type="String"><value>километр</value></param>
</params>
<source><quantity value="number" unit="unit"/> <quantity value="number" unit="unit" inflect="genitive"/> <quantity value="number" unit="unit" inflect="accusative"/> <quantity value="number" unit="unit" inflect="dative"/> <quantity value="number" unit="unit" inflect="instrumental"/> <quantity value="number" unit="unit" inflect="prepositional"/></source>
<print>5 километров 5 километров 5 километров 5 километрам 5 километрами 5 километрах</print>
</test>
<test>
<params>
<param name="number" type="Number"><value>1</value></param>
<param name="unit" type="SemanticConcept">
<value constraint="neuter nominative singular">nominative,singular</value>
<value constraint="neuter instrumental singular">instrumental,singular</value>
<value constraint="neuter accusative singular">accusative,singular</value>
<value constraint="neuter dative singular">dative,singular</value>
<value constraint="neuter genitive singular">genitive,singular</value>
<value constraint="neuter prepositional singular">prepositional,singular</value>
<value constraint="neuter nominative plural">nominative,plural</value>
<value constraint="neuter instrumental plural">instrumental,plural</value>
<value constraint="neuter accusative plural">accusative,plural</value>
<value constraint="neuter dative plural">dative,plural</value>
<value constraint="neuter genitive plural">genitive,plural</value>
<value constraint="neuter prepositional plural">prepositional,plural</value>
</param>
</params>
<source><quantity value="number" unit="unit"/> <quantity value="number" unit="unit" inflect="genitive"/> <quantity value="number" unit="unit" inflect="accusative"/> <quantity value="number" unit="unit" inflect="dative"/> <quantity value="number" unit="unit" inflect="instrumental"/> <quantity value="number" unit="unit" inflect="prepositional"/></source>
<print>1 nominative,singular 1 genitive,singular 1 accusative,singular 1 dative,singular 1 instrumental,singular 1 prepositional,singular</print>
<speak>одно nominative,singular одного genitive,singular одно accusative,singular одному dative,singular одним instrumental,singular одном prepositional,singular</speak>
</test>
<test>
<params>
<param name="number" type="Number"><value>2</value></param>
<param name="unit" type="SemanticConcept">
<value constraint="neuter nominative singular">nominative,singular</value>
<value constraint="neuter instrumental singular">instrumental,singular</value>
<value constraint="neuter accusative singular">accusative,singular</value>
<value constraint="neuter dative singular">dative,singular</value>
<value constraint="neuter genitive singular">genitive,singular</value>
<value constraint="neuter prepositional singular">prepositional,singular</value>
<value constraint="neuter nominative plural">nominative,plural</value>
<value constraint="neuter instrumental plural">instrumental,plural</value>
<value constraint="neuter accusative plural">accusative,plural</value>
<value constraint="neuter dative plural">dative,plural</value>
<value constraint="neuter genitive plural">genitive,plural</value>
<value constraint="neuter prepositional plural">prepositional,plural</value>
</param>
</params>
<source><quantity value="number" unit="unit"/> <quantity value="number" unit="unit" inflect="genitive"/> <quantity value="number" unit="unit" inflect="accusative"/> <quantity value="number" unit="unit" inflect="dative"/> <quantity value="number" unit="unit" inflect="instrumental"/> <quantity value="number" unit="unit" inflect="prepositional"/></source>
<print>2 genitive,singular 2 genitive,plural 2 genitive,singular 2 dative,plural 2 instrumental,plural 2 prepositional,plural</print>
<speak>два genitive,singular двух genitive,plural два genitive,singular двум dative,plural двумя instrumental,plural двух prepositional,plural</speak>
</test>
<test>
<params>
<param name="number" type="Number"><value>5</value></param>
<param name="unit" type="SemanticConcept">
<value constraint="neuter nominative singular">nominative,singular</value>
<value constraint="neuter instrumental singular">instrumental,singular</value>
<value constraint="neuter accusative singular">accusative,singular</value>
<value constraint="neuter dative singular">dative,singular</value>
<value constraint="neuter genitive singular">genitive,singular</value>
<value constraint="neuter prepositional singular">prepositional,singular</value>
<value constraint="neuter nominative plural">nominative,plural</value>
<value constraint="neuter instrumental plural">instrumental,plural</value>
<value constraint="neuter accusative plural">accusative,plural</value>
<value constraint="neuter dative plural">dative,plural</value>
<value constraint="neuter genitive plural">genitive,plural</value>
<value constraint="neuter prepositional plural">prepositional,plural</value>
</param>
</params>
<source><quantity value="number" unit="unit"/> <quantity value="number" unit="unit" inflect="genitive"/> <quantity value="number" unit="unit" inflect="accusative"/> <quantity value="number" unit="unit" inflect="dative"/> <quantity value="number" unit="unit" inflect="instrumental"/> <quantity value="number" unit="unit" inflect="prepositional"/></source>
<print>5 genitive,plural 5 genitive,plural 5 genitive,plural 5 dative,plural 5 instrumental,plural 5 prepositional,plural</print>
<speak>пять genitive,plural пяти genitive,plural пять genitive,plural пяти dative,plural пятью instrumental,plural пяти prepositional,plural</speak>
</test>
</formattest>
101 changes: 101 additions & 0 deletions test/formattest.xsd
@@ -0,0 +1,101 @@
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
<xs:element name="formattest">
<xs:complexType>
<xs:sequence>
<xs:element maxOccurs="unbounded" ref="test"/>
</xs:sequence>
<xs:attribute name="locale" use="required" type="xs:NCName"/>
</xs:complexType>
</xs:element>
<xs:element name="test">
<xs:complexType>
<xs:sequence>
<xs:element ref="params"/>
<xs:element ref="source"/>
<xs:element ref="print"/>
<xs:element minOccurs="0" ref="speak"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="params">
<xs:complexType>
<xs:sequence>
<xs:element maxOccurs="unbounded" ref="param"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="param">
<xs:complexType>
<xs:sequence>
<xs:element maxOccurs="unbounded" ref="value"/>
</xs:sequence>
<xs:attribute name="name" use="required" type="xs:NCName"/>
<xs:attribute name="type" use="required" type="xs:string"/>
</xs:complexType>
</xs:element>
<xs:element name="value">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="constraint" type="xs:NMTOKENS"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
<xs:complexType name="message" mixed="true">
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element ref="number"/>
<xs:element ref="list"/>
<xs:element ref="quantity"/>
<xs:element ref="switch"/>
<xs:element ref="var"/>
</xs:choice>
</xs:complexType>
<xs:element name="source" type="message"/>
<xs:element name="list">
<xs:complexType>
<xs:attribute name="inflect" type="xs:NCName"/>
<xs:attribute name="name" use="required" type="xs:NCName"/>
<xs:attribute name="type" use="required" type="xs:NCName"/>
</xs:complexType>
</xs:element>
<xs:element name="number">
<xs:complexType>
<xs:attribute name="name" use="required" type="xs:NCName"/>
<xs:attribute name="style" use="required" type="xs:NCName"/>
<xs:attribute name="variant" use="required" type="xs:NCName"/>
</xs:complexType>
</xs:element>
<xs:element name="quantity">
<xs:complexType>
<xs:attribute name="inflect" type="xs:NCName"/>
<xs:attribute name="unit" use="required" type="xs:NCName"/>
<xs:attribute name="value" use="required" type="xs:NCName"/>
</xs:complexType>
</xs:element>
<xs:element name="switch">
<xs:complexType>
<xs:sequence>
<xs:element ref="case"/>
<xs:element ref="default"/>
</xs:sequence>
<xs:attribute name="feature" use="required" type="xs:NCName"/>
<xs:attribute name="value" use="required" type="xs:NCName"/>
</xs:complexType>
</xs:element>
<xs:element name="case">
<xs:complexType mixed="true">
<xs:complexContent>
<xs:extension base="message">
<xs:attribute name="is" use="required" type="xs:NCName"/>
</xs:extension>
</xs:complexContent>
</xs:complexType>
</xs:element>
<xs:element name="default" type="message"/>
<xs:element name="var">
<xs:complexType>
<xs:attribute name="inflect" type="xs:NMTOKENS"/>
<xs:attribute name="name" use="required" type="xs:NCName"/>
</xs:complexType>
</xs:element>
<xs:element name="print" type="xs:string"/>
<xs:element name="speak" type="xs:string"/>
</xs:schema>