Define technical terms #30

Closed
echeran opened this issue Feb 4, 2020 · 18 comments
Labels
documentation Improvements or additions to documentation

Comments

@echeran
Collaborator

echeran commented Feb 4, 2020

This isn't a feature so much as an attempt to see if we can all define some of the technical terms that we've been using, based on what they mean to us. From the rich conversations we've had in meetings and in Github issue threads, I suspect that we may be using different terms to describe the same concept, or even the same term to refer to different concepts.

The thought is that if we each individually fill out our definitions for terms in a separate comment, we can compare notes at the end. And maybe some good consequences can pop out from that (reduced vocabulary -> clearer convos? realization of more common ground?)

I've gone through a few of the Github threads with the largest use of technical-sounding terms (skipping over things like linguistic terms), and listed them in order of first observed occurrence. To participate, just copy-paste the terms that mean something to you into a new comment below, and define them in your own words.

  • DOM Overlay 1 2
  • interpolate 1 2 3
  • translation merging
  • syntax - 1 2 3
  • authoring - 1 2
  • selector - 1 2 3
  • file format - 1 2 3
  • markup - 1
  • placeholder type
  • ITS data category 1 2
  • UI language/locale - 1
  • placeholder/variable locale / formatting locale - 1 2 3
  • resource locale - 1
  • compound message - 1
  • placeable - 1
  • locale chain - 1
  • language fallback - 1 2 3
  • language negotiation - 1 2
  • full message - 1
  • fragment message / sentence fragment - 1 2
  • localizable resource - 1
  • API argument syntax - 1
  • storage format / representation - 1 2
  • binding syntax - 1
  • API - 1 2 3 4
  • implementation - 1 2 3
  • positional variable - 1
  • standard message format - 1
  • spec - 1
  • interchange format / representation - 1 2
  • source code representation - 1
  • data model - 1 2
  • serialization - 1
  • runtime format - 1 2
  • build/parse-time format - 1
  • translation/localization format - 1 2
  • nested markup - 1
  • multi-variant message - 1
  • intermediate format / representation - 1
  • consumed format - 1
  • AST (abstract syntax tree) - 1 2 3 4
  • import/export filter for a format - 1
  • developer format - 1
  • multi-level filter for a format - 1
  • message syntax - 1
@mimckenna

UI language/locale

The language and jurisdiction used to determine content in the User Interface. E.g. en_US/CA means use content for Canada in the US English language.

@mimckenna

placeholder/variable locale / formatting locale

The locale (regional language) used for placeholder content or to format dynamic variables. E.g. a placeholder variable may hold a currency value of 1234567.89 INR. In a US locale (en_US/US) it would appear as ₹1,234,567.89 INR, in Ukraine (en_US/UA) as 1.234.567,89 ₹ INR, and in India (en_GB/IN) as ₹ 12,34,567.89 INR.
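
To make the grouping differences concrete, here is a minimal hand-rolled sketch of the three digit-grouping patterns quoted above. It is only an illustration: `group_digits` and `format_amount` are hypothetical helpers, the separators are passed in by hand rather than looked up per locale, and currency symbol placement is left out. Real code would normally take these rules from CLDR data through an i18n library instead.

```python
def group_digits(integer_part: str, primary: int = 3, secondary: int = 3, sep: str = ",") -> str:
    """Group digits: the last `primary` digits form one group, the rest use `secondary`."""
    if len(integer_part) <= primary:
        return integer_part
    head, tail = integer_part[:-primary], integer_part[-primary:]
    groups = []
    while head:
        # Peel groups off the right-hand end of the remaining digits.
        groups.append(head[-secondary:])
        head = head[:-secondary]
    return sep.join(reversed(groups)) + sep + tail


def format_amount(value: str, primary: int = 3, secondary: int = 3,
                  group_sep: str = ",", decimal_sep: str = ".") -> str:
    """Format a plain decimal string such as '1234567.89' with the given separators."""
    integer_part, _, fraction = value.partition(".")
    grouped = group_digits(integer_part, primary, secondary, group_sep)
    return grouped + (decimal_sep + fraction if fraction else "")


print(format_amount("1234567.89"))                                # 3-3 grouping
print(format_amount("1234567.89", group_sep=".", decimal_sep=","))  # swapped separators
print(format_amount("1234567.89", secondary=2))                   # 3-2 (Indian) grouping
```

The point of the sketch is that the same numeric value needs locale-specific parameters at format time, which is why the formatting locale is tracked separately from the message text.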

@mimckenna

resource locale

The locale used for content in the user interface. This is not always the formatting locale. For example, in iOS and Android, a user may choose a language for content, and "regional settings" for formats - I can choose to have my UI in American English but my regional settings for dates, time, numbers, calendar to follow European or even Chinese conventions.

@mimckenna

locale chain

This refers to a list of approved locales to choose content from if content for the requested locale does not exist. Internally, we use this to keep the user flow contained in legally approved content, since we have legal obligations to present certain terms and face liabilities if we do not.

For a user, this could refer to their list of languages in order of preference, similar to the HTTP Accept-Language list.

@mimckenna

language fallback

If content is not available in the requested language, this is the process to "fall back" to the next available language in the locale chain.
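
As a rough illustration of this definition, the sketch below walks a locale chain and returns the first locale that actually has the requested content. The `CONTENT` store, its keys, and the `resolve` function are all made up for the example; they do not correspond to any particular product or library.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical content store keyed by locale; locales and strings are invented.
CONTENT: Dict[str, Dict[str, str]] = {
    "en-US": {"greeting": "Hello", "terms": "Terms of service"},
    "fr-CA": {"greeting": "Bonjour"},
}


def resolve(key: str, locale_chain: List[str]) -> Optional[Tuple[str, str]]:
    """Return (locale, text) from the first locale in the chain that has `key`."""
    for locale in locale_chain:
        text = CONTENT.get(locale, {}).get(key)
        if text is not None:
            return locale, text
    return None  # nothing in the chain has this content


chain = ["fr-CA", "en-US"]
print(resolve("greeting", chain))  # found in the first locale of the chain
print(resolve("terms", chain))     # falls back to the next locale
```

Returning the locale alongside the text lets the caller know which language was actually used, which matters when only some locales in the chain are approved for a given flow.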

@zbraniecki
Member

@mimckenna is there a value in diverging at all from https://unicode.org/reports/tr35/#Identifiers ?

@mimckenna

@zbraniecki - good point - I was rattling off how I would describe these terms to members of my technical team. Now that you bring it up, I agree that we should use pre-existing definitions in Unicode or other accepted standards, with links back to those definitions, where they exist. I'll be happy to revise my first attempts above following that model.

I think the purpose of this doc is to provide very concise descriptions of each term, as opposed to the multi-paragraph descriptions in tr35. We can wordsmith these down to single-sentence descriptions with pointers back to reference standards.

Some of these terms/concepts may not be found in tr35, such as AST (abstract syntax tree), and some others have fairly complete but lengthy descriptions that would be difficult, as is, to fit in a single sentence when pulled from TR35. An example would be the developer viewpoint (developer format) of message syntax vs the translator view of messages to translate (translation/localization format). This is implied in TR35 but it is far from concise.

@mimckenna

Question - shall I direct-edit my initial responses, or revise as additional github comments in the thread?

@dchiba

dchiba commented Feb 13, 2020

locale chain is called language priority list in BCP 47. Similarly, language fallback is known as lookup matching. Maybe there should be a mention at the end, e.g. "... This is the so-called language priority list in formal BCP 47 terminology." or something similar.

@zbraniecki
Member

I don't think we should follow BCP47 btw. Unicode UTS #35 is more up to date.

Similarly, language fallback is known as lookup matching

I don't think that's accurate. Lookup matching is a particular strategy of language negotiation.
There are others possible. For example, in Gecko we use three - https://firefox-source-docs.mozilla.org/intl/locale.html#filtering-matching-lookup

@echeran
Collaborator Author

echeran commented Feb 13, 2020

Question - shall I direct-edit my initial responses, or revise as additional github comments in the thread?

Revising in the form of additional comments (maybe all batched together as one comment?) sounds good.

At this point, I think it's good to get as many responses recorded as possible first. Afterwards, we can see what we observe & discuss. I'm interested in waiting for everyone's responses because I think the responses in aggregate can give us better clarity than just a few.

@echeran
Collaborator Author

echeran commented Feb 14, 2020

Here are my responses for the terms I've used:

interpolate - formatting & inserting values inside of a string
translation merging - combining the translated version of content back into the source document
syntax - the arrangement of tokens in a file (or equivalent) according to a set of rules. syntax is a prerequisite for semantics (meaning).
file format - similar to syntax. the syntax (& semantics) of a file.
markup - a syntax that allows the more essential data to be annotated by secondary data, not necessarily specific to HTML/XML
placeholder type - whether a placeholder represents number/plurals, gender, etc.
language fallback - for locales (lang+region), not just language alone -- a mechanism for determining an acceptable locale (based on lang and/or region) when info for the exact locale is not available
API - the exact function names with argument lists and types and expected behaviors / output for a particular software
implementation - how a particular specification of inputs, behaviors, and outputs is achieved for a particular programming language or platform
spec - high-level description of expected inputs, behaviors, and outputs
positional variable - a placeholder within a message whose value is injected according to a specific index in a list of provided values
data model - the structure of data that describes how a message should be formatted. is independent of implementation. a part of the specification
serialization - how to turn data structures to/from text/bytes
AST (abstract syntax tree) - a tree showing the structure of tokens in a file, according to the syntax. is the output of a parser, usually according to a grammar, most often used in the context of a compiler
import/export filter for a format - "filter" is an Okapi term for a file format reader/writer that converts a file into the Okapi data model for an l10n document
multi-level filter for a format - an Okapi file format reader/writer that supports the nesting of two different file formats in one doc (ex: JSON file whose strings contain HTML)
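
To illustrate two of the definitions above (interpolate and positional variable), here is a small sketch using Python's built-in `str.format`. The `interpolate` wrapper and the example messages are hypothetical; the sketch only contrasts index-based and name-based placeholders and says nothing about what a message format should look like.

```python
def interpolate(template: str, *args, **kwargs) -> str:
    """Format and insert values into a string (thin wrapper over str.format)."""
    return template.format(*args, **kwargs)


# Positional variables: each placeholder refers to an index in the value list.
source = "{0} sent {1} files to {2}"
# A translation may reorder the placeholders without changing the caller's arguments.
reordered = "{2} received {1} files from {0}"
# Named variables: each placeholder refers to a key instead of an index.
named = "{sender} sent {count} files to {recipient}"

values = ("Ana", 3, "Ben")
print(interpolate(source, *values))
print(interpolate(reordered, *values))
print(interpolate(named, sender="Ana", count=3, recipient="Ben"))
```

With positional variables the meaning of each slot is only known from the argument order, whereas named variables carry a hint for whoever reads or translates the message.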

@dchiba

dchiba commented Feb 14, 2020

@zbraniecki This is for defining terminology, so I didn't mean to suggest following BCP47. I meant to use standard terms such as language priority list and lookup from BCP47, which defines elements and schemes for dealing with locales.

BCP47 says filtering and lookup are the 2 basic types of matching schemes. The former can return multiple locales, while the latter always returns one. Mozilla's "Matching" appears to be lookup performed on each requested locale. Isn't it an enhanced form of lookup?
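
The difference between the two schemes can be sketched as follows. This is a simplified reading of basic filtering and lookup as described in BCP 47 (RFC 4647), written for illustration only: the function names are made up, the `*` wildcard range is not handled, and it makes no claim about how Mozilla's or anyone else's implementation works.

```python
from typing import List, Optional


def basic_filter(priority_list: List[str], available: List[str]) -> List[str]:
    """Filtering: return every available tag that a range matches as a prefix."""
    matches: List[str] = []
    for language_range in priority_list:
        prefix = language_range.lower()
        for tag in available:
            lowered = tag.lower()
            if (lowered == prefix or lowered.startswith(prefix + "-")) and tag not in matches:
                matches.append(tag)
    return matches


def lookup(priority_list: List[str], available: List[str],
           default: Optional[str] = None) -> Optional[str]:
    """Lookup: return exactly one tag by progressively truncating each range."""
    by_lower = {tag.lower(): tag for tag in available}
    for language_range in priority_list:
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()
            # A trailing single-character subtag (e.g. "x") is dropped as well.
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


available = ["de", "de-CH", "en", "fr"]
print(basic_filter(["de"], available))           # may return several tags
print(lookup(["de-CH-1996", "en"], available))   # returns a single best tag
```

Filtering answers "which of my resources are acceptable?", while lookup answers "which one resource should I use?", which is why the latter always needs a default.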

UTS 35 is a more practical specification adopted by CLDR and others. I think it is a way to practice BCP47 and I would agree with you that we may find it reasonable to have some deviations in the corners.

@zbraniecki
Member

Isn't it an enhanced form of lookup?

It is. But BCP47 specifies how the algorithms should work, and I don't think it's the only available way, hence I'd prefer not to overspecify that yet :)

Also, I noticed "language fallback" and "locale chain" used separately.

I'd like to suggest we use "locale fallback chain" - a list of locales created as a result of locale negotiation between available and requested locales.
In general, I'd suggest we never talk about a single locale, since all our operations are intended to fall back in case of errors and missing data.
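
One possible shape for such a negotiation is sketched below. The strategy (exact match, then bare language, then other regions of the same language, then a default) is an assumption chosen for illustration; it is not the algorithm from UTS #35, BCP 47, or Gecko, and the `negotiate` function is hypothetical. Tags are assumed to already be in a consistent case.

```python
from typing import List


def negotiate(requested: List[str], available: List[str], default: str) -> List[str]:
    """Build an ordered locale fallback chain from requested and available locales."""
    chain: List[str] = []

    def add(tag: str) -> None:
        if tag in available and tag not in chain:
            chain.append(tag)

    for tag in requested:
        add(tag)                        # 1. exact match
        language = tag.split("-")[0]
        add(language)                   # 2. the bare language
        for candidate in available:     # 3. other regions of the same language
            if candidate.split("-")[0] == language:
                add(candidate)
    if default not in chain:
        chain.append(default)           # 4. the last-resort locale always closes the chain
    return chain


print(negotiate(["fr-CA", "en-GB"], ["en-US", "fr", "fr-FR", "de"], "en-US"))
```

Because the result is always a list ending in the default, callers never deal with a single locale and always have somewhere to fall back to.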

@mihnita
Collaborator

mihnita commented Feb 15, 2020

Looks like this area is fuzzy: matching / enhanced form of lookup / fallback / negotiation
I can explain how Android works.

But it is negotiation (once, at application launch time) followed by fallback (for every resource load)

The UI / formatting locales are not 100% separate, you will not see French UI with German dates (unless you did something wrong in the localization :-)


I can add a couple of joke definitions:

  • localization: using local variables in your code
  • globalization: using global variables in your code

More serious definitions:

  • localization (l10n) : what translators do
  • internationalization (i18n) : what programmers do
  • globalization (g11n) : what companies do

They are over-simplifications, but are memorable and clarify what the "buckets" are.

Sure, "translators" really mean localization companies + localization engineers + linguistic / technical QA + PMs + language managers + terminology managers + language specialists + reviewers + DTP, etc, a full machinery.
Same as "programmers" means also UX, PMs, QA, tech writers, etc. It also means architecture, data (and database) structures, design documents, etc, not just "sit down and write code"

L10N takes "assets" from human language A and gives back the same assets in languages X, Y, Z, ...

I18N makes sure applications work without any code changes in any human language.
If I see an English string in a French application it might be a localization bug (a translator missed it) or an internationalization one (hard-coded)

G11N means market research, financial, tech infrastructure, legal, competition, decisions to localize or not (you can go to a new country without translating), etc.

See how much I had to write, just to (partially) explain 3 short bullets?
:-)

@echeran
Collaborator Author

echeran commented Feb 18, 2020

Thanks for the replies so far.

See how much I had to write, just to (partially) explain 3 short bullets?

That was indeed thorough (thanks), but 1-2 sentences per term would have sufficed. The reason -- more to the point -- here are sets of terms where I want to figure out what they mean. Are they the same? related? different? and how so? My hunch is that we can dedupe some of these terms to have clearer conversations (or be less likely to talk past each other).

selector
placeholder type
ITS data category

placeholder
placeable
fragment message / sentence fragment
variable (positional / named)

API argument syntax
storage format / representation
binding syntax
standard message format
interchange format / representation
source code representation
runtime format
build/parse-time format
translation/localization format
intermediate format / representation
consumed format
AST (abstract syntax tree)
developer format
message syntax

@romulocintra romulocintra added the documentation Improvements or additions to documentation label Feb 19, 2020
@romulocintra
Collaborator

@echeran this can be closed, no? I think all terms are already merged into the glossary, is that correct?

@echeran
Collaborator Author

echeran commented Jul 20, 2020

Yes, I agree, all these terms are contained in the glossary, so we should be fine to close this issue. I'll do that now.

@echeran echeran closed this as completed Jul 20, 2020
@echeran echeran mentioned this issue May 19, 2022

6 participants