
Parameterized Terms #176

Closed
stasm opened this issue Oct 1, 2018 · 7 comments

@stasm
Contributor

stasm commented Oct 1, 2018

a.k.a. The Return of Macros.

In #102 we added VariantLists. They are a solution to the practice of abusing SelectExpressions by omitting the selector. Variants of a VariantList can be retrieved from outside of the term by using a VariantExpression at the reference site.

The design of VariantLists has its own shortcomings which I outlined in #102 (comment) and which required the addition of specialized code in abstract.mjs to tightly control where exactly VariantLists are allowed.

At the time, we accepted these shortcomings as trade-offs, partly because we didn't find any better alternatives. I think I may have stumbled upon an idea which is worth bringing up here.

Proposal

Rather than treating terms with VariantLists for values as dictionaries whose members must be accessed from the outside, what if we encapsulated the member access in the form of a function call?

In other words, rather than defining the following term as a VariantList:

# Before, with VariantLists
-fxaccount-brand-name = {
    [lowercase] account Firefox
   *[uppercase] Account Firefox
}

sync-signedout-account-title = Connetti il tuo { -fxaccount-brand-name[lowercase] }

We could parameterize it with the $case local variable and use a regular Pattern as the value:

# After, with parameterized terms
-fxaccount-brand-name($case) = { $case ->
    [lowercase] account Firefox
   *[uppercase] Account Firefox
}

# Call it like a function
sync-signedout-account-title = Connetti il tuo { -fxaccount-brand-name("lowercase") }

Using the default variant, or referencing a term which doesn't define any parameters, would use the current TermReference syntax, without any parentheses:

# English FTL
sync-signedout-account-title = Connect to your { -fxaccount-brand-name }

Advantages

  • Values of terms can always be Patterns again.
  • Accessing a specific variant uses the CallExpression syntax. No need for the specialized VariantExpression.
  • Accessing variants would now use the same semantics as SelectExpressions in Patterns do right now. New features of SelectExpressions, like Allow selectors to be lists #4, would automatically be available in terms.
  • Local variables can be used inside of the term's value, too. The above example doesn't showcase this, but you could imagine a different term using a $number variable (see the sketch after this list). In the current design, the name of this variable is effectively hardcoded in the term, and all messages referencing it must have a variable of this exact name passed into them by the app. This would limit the utility of private terms, which we might introduce in the future.
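
A minimal sketch of that last point, using the proposed syntax; the term name and wording here are invented purely for illustration:

# Hypothetical term parameterized with $number instead of $case
-sync-tab-count($number) = { $number ->
    [one] { $number } tab
   *[other] { $number } tabs
}

sync-status = Syncing { -sync-tab-count(2) }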

Open questions

  1. How should the scoping rules work exactly? Should terms be allowed to access variables passed to messages which reference them, or should all variables be passed to them explicitly?
  2. Should attributes use the parameters defined on the term id, or should there be a way to define (possibly different) parameters on the attribute id?
  3. Relatedly, when referencing attributes, should the parameters be attached to the term ({-brand-name("lowercase").gender}) or to the attribute ({-brand-name.gender("lowercase")})?
  4. Could we use named arguments to define default values of parameters? (One hypothetical reading is sketched after this list.)
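
A purely hypothetical sketch of question 4, not something proposed in this issue: a named argument in the term definition could double as the parameter's default value.

# Hypothetical syntax: $case defaults to "uppercase" when not passed
-fxaccount-brand-name($case: "uppercase") = { $case ->
    [lowercase] account Firefox
   *[uppercase] Account Firefox
}

# Called without arguments, the default would apply.
sync-signedout-account-title = Connetti il tuo { -fxaccount-brand-name }
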
stasm added a commit to projectfluent/fluent.js that referenced this issue Oct 12, 2018
This is a re-write of the runtime parser. It supports Fluent Syntax 0.7, runs
against the reference fixtures, has half the lines of code, and is as fast in
SpiderMonkey as the old one (and slightly faster in V8).


        Goals

  1. Support 100% of Fluent Syntax 0.7. This includes the indentation
     relaxation, dropping tabs and CR as syntax whitespace, normalizing new
     lines to LF, and only allowing numbers and identifiers as variant keys.

  2. Maintain good performance. The parser is used in performance-critical code
     paths. Back in the days of Firefox OS it had to be both fast _and_ produce
     tightly packed results so that translations didn't take up too much space
     on the device. I think the storage requirements can be relaxed these days.

  3. Write code which will be easy to maintain in the future. The parser was
     first written even before Fluent branched off from L20n. It has seen many
     changes and additions over the last two years. As new features accrued, it
     became hard to maintain and to keep track of all known bugs. My goal for
     the re-write was not only to clean it up, but also to define the
     conformance story for the future and to improve the testing
     infrastructure.


        Design

The parser focuses on minimizing the number of false negatives at the expense
of increasing the risk of false positives. In other words, it aims at parsing
_valid_ Fluent messages with a success rate of 100%, but it may also parse some
invalid messages which the reference parser would reject. The parser doesn't
perform any validation and may produce entries which wouldn't make sense in
the real world. For best results, users are advised to validate translations
with the fluent-syntax parser ahead of runtime.

The main parser loop iterates over the beginnings of messages and terms. This
makes it possible to efficiently skip over comments (which have no use at
runtime) and to recover from errors. When a fatal error is encountered, the
parser instantly bails out of the currently-parsed message and moves on to the
next one. Errors are discarded and are not visible to the users of
`FluentResource`. They do carry a minimal description of what went wrong,
though, which may be useful when reading the code and for debugging.
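
As an illustration (this snippet is not taken from the fixtures), a broken entry is discarded and parsing resumes at the start of the next message:

# This entry has neither a value nor attributes, which is a fatal error;
# the runtime parser skips it and recovers at the next message start.
broken-entry =

# Parsing resumes here and this message is returned as usual.
next-entry = Still parsed correctly.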

The parser makes extensive use of sticky regexes which can be anchored to
any offset of the source string without slicing it. In some places it's easier
to just check the character currently at the cursor, so it does a fair share of
that, too.


        Conformance

My original plan was to base the parser on the EBNF and only parse well-formed
syntax. In this PR, I went for something a bit wider than that: a superset of
well-formed syntax. The main deviation from the EBNF is related to parsing
`VariantExpressions` and `CallExpressions`. The EBNF verifies that they are
called on `Terms` and `Functions` respectively. The optimistic parser doesn't
differentiate between `Messages`, `Terms` and `Functions`. I decided to
implement it this way because this code might soon change anyway (see
projectfluent/fluent#176).
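
For instance (an illustrative snippet, not from the fixtures), the reference parser rejects a call on a lowercase message identifier, while the optimistic runtime parser accepts it because it doesn't check what the callee is:

# Ill-formed per the EBNF (callees must be Functions), but accepted
# by the optimistic runtime parser.
key = { other-message("argument") }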

Another deviation is that the parser treats commas in argument lists as
whitespace, similar to how Clojure treats them in collection literals. I might
suggest we upstream this to the spec, too, because it makes the implementation
of argument lists _much_ simpler.
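
For example (a hypothetical message using the standard NUMBER function), both of these parse to the same result in the runtime parser:

# With commas, as the spec requires.
emails = You have { NUMBER($count, useGrouping: 0) } new emails.

# Without commas; the runtime parser treats the comma as whitespace.
emails-relaxed = You have { NUMBER($count useGrouping: 0) } new emails.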

I based this PR on top of the `zeroseven` branch. The `fluent-syntax` parser
already supports Syntax 0.7 and passes the [reference
fixtures](https://github.com/projectfluent/fluent/tree/master/test/fixtures).
This made it possible to turn on the reference testing in the runtime parser,
too. `make fixtures` creates the parsed results for all reference fixtures;
for now they must be verified manually before they're committed. `make test`
can be used in development to assert that the output of the runtime parser
still matches the committed one.
@stasm stasm added the syntax label Oct 16, 2018
@stasm
Contributor Author

stasm commented Oct 19, 2018

More open questions from the last time we talked about this in person:

  1. Should both positional and named arguments be allowed?

  2. If positional arguments are allowed, how can the caller invoke the default value of an argument which precedes another one? I.e. for -term($one, $two), should it be possible to somehow just pass the value of $two?

  3. If named arguments are allowed, should we name them using the $ sigil? -term(arg: "value") vs. -term($arg: "value").

  4. Or maybe we should drop the $ entirely and keep it reserved only for data coming directly from the app? In this case, we'd write:

    -fxaccount-brand-name(case) = { case ->
        [lowercase] account Firefox
       *[uppercase] Account Firefox
    }

@stasm
Contributor Author

stasm commented Oct 19, 2018

I have a proposal which solves questions 1, 4, 5, 6, and maybe 7 and 8 (numbering the four questions in the issue as 1–4 and those in the previous comment as 5–8).

# @param $case - Letter case to use.
-fxaccount-brand-name = { $case ->
    [lowercase] account Firefox
   *[uppercase] Account Firefox
}

sync-signedout-account-title =
    Connetti il tuo { -fxaccount-brand-name($case: "lowercase") }

Let's treat all variables referenced in Terms as references to named arguments passed to the term via a CallExpression. This means that terms are not allowed to access any variables passed to the message they are referenced from, unless these variables are passed to the term explicitly via the CallExpression. This answers question 1.
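
A sketch of what explicit passing could look like under this rule; whether variables can appear as argument values is itself part of the open design space, so the syntax here is illustrative only:

# Hypothetical: the message forwards its own $case variable to the term.
# Without this, the term would not see the message's $case at all.
sync-signedout-account-title =
    Connetti il tuo { -fxaccount-brand-name($case: $case) }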

Terms don't need to define their formal parameters in this proposal at all (-term(arg) = ...). The parameters are accessed lazily during the resolution of the term, from the list of named arguments passed to it. This answers question 4.

When calling a term, both positional and named arguments can be used, but positional ones are ignored because, by their anonymous nature, they can't be looked up via variable names. This answers questions 5 and 6.

I'm still not entirely sure about questions 7 and 8. I can see arguments for both -fxaccount-brand-name($case: "lowercase") and -fxaccount-brand-name(case: "lowercase") (without the $). I'm erring on the side of consistency from the localizer's POV, which means using $ everywhere.

@stasm
Contributor Author

stasm commented Oct 19, 2018

As far as questions 2 and 3 are concerned, and assuming term attributes are only used to describe grammatical or stylistic properties of the term itself... do we actually need parameterization there?

-brand-name =
    { $case ->
       *[nominative] Firefox
        [accusative] Firefoxa
    }
    .gender = masculine

has-updated =
    { -brand-name.gender ->
        [masculine] { -brand-name($case: "nominative") } został zaktualizowany.
        [feminine] { -brand-name($case: "nominative") } została zaktualizowana.
       *[other] Program { -brand-name($case: "nominative") } został zaktualizowany.
    }

@Pike
Contributor

Pike commented Oct 19, 2018

Vowel harmony could be a use-case (I didn't read the article in full) where you'd like to indicate to the calling site that the term's dominating vowels depend on case or plurality?

@stasm
Contributor Author

stasm commented Oct 19, 2018

Interesting, thanks for the link.

I'd also like to point out that it's not possible to achieve this with the current design of VariantLists, either: -term.attr[name] is not valid syntax. I filed #189 about it.

@stasm
Contributor Author

stasm commented Oct 19, 2018

As I was researching this issue, I realized the current design of VariantLists has more limitations than I initially considered. I filed #187 and #188. Macros could be a way to fix them, if we removed VariantLists along the way. If not, we should look into fixing them in some other way.

@stasm
Contributor Author

stasm commented Nov 6, 2018

I'd like to move forward on this proposal. I opened #205 with a WIP implementation allowing the -term(arg: "value") syntax. I'm leaving accessing variants of attributes out of scope for now; we'll add them in #189. I also filed #204 to remove VariantLists.
