Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restrict variant keys to Numbers and Identifier #127

Closed
stasm opened this issue May 23, 2018 · 20 comments
Closed

Restrict variant keys to Numbers and Identifier #127

stasm opened this issue May 23, 2018 · 20 comments
Labels
backwards incompatible Old files won't parse in new parsers. forwards incompatible Old parsers won't parse newer files. syntax

Comments

@stasm
Copy link
Contributor

stasm commented May 23, 2018

This is related to #90, in which I'd like to relax the grammar of variant keys. I'd like to start by restricting it to just Numbers and Identifiers. We can relax this later and the benefit of restricting it now is to clarify the exact rules of parsing the surrounding whitespace, as in [____key____].

@stasm
Copy link
Contributor Author

stasm commented May 23, 2018

I checked all translations in l10n-central and this wouldn't introduce any breakages. The only variant keys we currently use are numbers, one through other and windows, macos etc.

@stasm stasm added syntax spec: ast backwards incompatible Old files won't parse in new parsers. forwards incompatible Old parsers won't parse newer files. labels May 23, 2018
@stasm
Copy link
Contributor Author

stasm commented Jul 26, 2018

Absent any comments, I went ahead and opened #157 with the implementation.

@Pike
Copy link
Contributor

Pike commented Jul 26, 2018

Well, all the comments are in #90

@stasm
Copy link
Contributor Author

stasm commented Jul 27, 2018

#90 is a discussion about relaxing the grammar of variant keys. Regardless of its outcome, there's a value in starting by restricting the grammar. It will allow us to relax it to our liking later on. I propose that we move #90 to after 1.0 and start with a minimal viable and maximally-extensible design right now.

@Pike
Copy link
Contributor

Pike commented Jul 27, 2018

My concern from #90 is still valid, and unaddressed: Introducing the concept of Identifier in a semantically unrelated context.

Also reducing the character set we have for variants, which would lead to grammatical concepts that had to be latinized to be expressed in Fluent.

@stasm
Copy link
Contributor Author

stasm commented Jul 27, 2018

Identifier is a grammar concept, not a semantic one. Semantics come into play in form of MessageReference, VariableReference etc.

As to your second point: we already limit the ability to express these grammatical concepts by not allowing non-Latin characters in names of attributes. Both the value (männlich) and the name of the attribute (Genus?) might require non-Latin characters.

I've said it before: my goal is to definitely allow both. That's #90 or #117. In the meantime, I'd like to restrict the grammar so that we don't have our hands tied when we get to relaxing it.

@flodolo
Copy link
Contributor

flodolo commented Aug 9, 2018

I don't believe it's an unreasonable ask to limit variant names to ASCII characters, even more so if:

  • It's a temporary solution.
  • We can generate a meaningful error message when a variant name uses invalid characters.

As a side note, partially related: we currently only have 2 cases in l10n-central where localizers added their own variants (it, fr for Firefox Account brand), both are using English for these variant names ("uppercase", "lowercase"). As someone who has to review these changes, I confess it's a big advantage to have them in English, since it lets me understand what's going on.

@zbraniecki
Copy link
Collaborator

Yeah, I'm with Flod here. I'd like to see us handle the New York case, but I'm totally ok with not rushing it.

@mathjazz
Copy link
Contributor

I'd like to see us handle the New York case, but I'm totally ok with not rushing it.

The New York case being this?

greeting =
    { $state ->
        [New York] Hi!
       *[other] Hello!
    }

@stasm
Copy link
Contributor Author

stasm commented Aug 10, 2018

Yes, that's correct.

@stasm
Copy link
Contributor Author

stasm commented Aug 10, 2018

Unless I see strong arguments against it, I'm going to move forward with this proposal on Thursday, August 16, when I'm back from vacation. The proposed restriction is a safety measure which will allow us to design a proper solution in #90 and #117. Thank you, everyone, for voicing your opinions!

@Pike
Copy link
Contributor

Pike commented Aug 15, 2018

I think that this change will make fixing #90 harder. If nothing else, it'll require changes to fixtures that will make it very hard to review for correctness. It'll also require untangling the identifier, and it'll require more invasive changes in parser implementations.

It will also put baggage on #117 that that ticket shouldn't lift. Identifiers are to exchange information between the program and the localization. We're already conflating that with XML attribute names, which works both because of luck and because we borrowed some XML_Namechar productions. Adding another (VariantName) is just going to make that problem harder to solve. VariantNames do have their own problem space, as they can be external variable values, return values from globals, as well as Term attribute values.

I propose to keep a dedicated AST entry for VariantName, even if it's lexically similar to what Identifier might be at some point. I'm happy to remove the multi-word part of, as I agree that's pretty ill-defined right now.

Another benefit is that checks in compare-locales and friends could be written in a forwards-compatible manner.

@stasm
Copy link
Contributor Author

stasm commented Aug 16, 2018

Thanks, @Pike. I'm not sure I understood all of your comment. Please find my answers and a few clarification questions below.

I think that this change will make fixing #90 harder. If nothing else, it'll require changes to fixtures that will make it very hard to review for correctness.

The changes to tests should not influence the design of the spec. Regardless, can you explain how the review of #90 would be made harder by this change?

It'll also require untangling the identifier, and it'll require more invasive changes in parser implementations.

Can you explain what you mean by untangling here?

It will also put baggage on #117 that that ticket shouldn't lift. Identifiers are to exchange information between the program and the localization. We're already conflating that with XML attribute names, which works both because of luck and because we borrowed some XML_Namechar productions.

We're far from being lucky. The current Fluent spec doesn't allow localizing many possible XML attribute names. See https://www.w3.org/TR/xml/#attdecls and https://www.w3.org/TR/xml/#sec-starttags

AttDef ::= S Name S AttType	S DefaultDecl
Attribute ::= Name Eq AttValue

This is what the spec has to say about the Name production:

The first character of a Name MUST be a NameStartChar, and any other characters MUST be NameChars; this mechanism is used to prevent names from beginning with European (ASCII) digits or with basic combining characters. Almost all characters are permitted in names, except those which either are or reasonably could be used as delimiters. The intention is to be inclusive rather than exclusive, so that writing systems not yet encoded in Unicode can be used in XML names.

This means that #117 should be considered with higher priority in order to satisfy the fourth design principle of Fluent (with XML and HTML being one of the most widely-adopted UI languages in existence):

Translations are objects with values and additional data which can be mapped to attributes of UI widgets and components.

VariantNames do have their own problem space, as they can be external variable values, return values from globals, as well as Term attribute values.

External variable values are strings and #90 has a proposal about how to deal it with them in variant keys without making any concessions. Return values from globals are controlled by the implementation or the developer; it's reasonable to expect them to follow the grammar restrictions defined by the grammar. Term attribute values are the only tricky one on the list, but there's little we can do about it other than accept it and use #90 and this issue as remedies. A valid attribute value of closing] will never be allowed verbatim as a variant key.

I propose to keep a dedicated AST entry for VariantName, even if it's lexically similar to what Identifier might be at some point.

What do you mean by similar here?

@Pike
Copy link
Contributor

Pike commented Aug 16, 2018

I don't think that we should take attribute names as a driver for identifiers. foo.bar is a valid attribute name. Luckily, the full stop is explicitly excluded from NameStartChar. If we want to cover xml attribute names as fluent attribute names, we'll need to stop using the identifier production there.

I'm using untangling in this sense. If we want to fix our use of identifier in one scenario, we'll likely need to fork the grammar production, and use identifier for identifiers, and attribute names for attribute names. Same goes for variant names.

@stasm
Copy link
Contributor Author

stasm commented Aug 28, 2018

I don't think that we should take attribute names as a driver for identifiers. foo.bar is a valid attribute name. Luckily, the full stop is explicitly excluded from NameStartChar. If we want to cover xml attribute names as fluent attribute names, we'll need to stop using the identifier production there.

I now better understand your position, thanks. Mid-term, I think we should allow foo.bar as a valid attribute name, and it will require a new grammar production. It's a different issue however.

I'm using untangling in this sense. If we want to fix our use of identifier in one scenario, we'll likely need to fork the grammar production, and use identifier for identifiers, and attribute names for attribute names. Same goes for variant names.

The outcome of #90 will impact whether we'll need a separate production for variant names. If it lands, all complex matching of variants can be achieved with the syntax proposed there. Which is why I don't want to over-engineer this issue right now if it doesn't need it. For our current use-cases, (one, windows, lowercase) are perfectly sufficient.

If this was only about VariantName, I'd prefer to use the Identifier AST node for it right now, and change it to a dedicated production and node later, when there's a use-case. However, given the attribute name issue, I see an opportunity to restrict variant names and relax attribute names in the future by following your proposal from #127 (comment):

I propose to keep a dedicated AST entry for VariantName, even if it's lexically similar to what Identifier might be at some point. I'm happy to remove the multi-word part of, as I agree that's pretty ill-defined right now.

My only change is to call this node Name, so that it can be used both for attribute names and variant names. I'll also file a new issue to relax its grammar to follow XML's Name production.

@stasm
Copy link
Contributor Author

stasm commented Aug 28, 2018

I see three paths forward:

  1. Restrict variant names to NumberLiterals and Identifiers in 0.7 (Restrict variant keys to Identifiers and NumberLiterals #157). In 0.8, introduce a new Name production allowing more characters than Identifier (at least . and :) and use it for attribute and variant names.

    • This is the safest path but it results in more AST churn.
  2. Introduce a new Name production which is identical to Identifier in 0.7 (Introduce the Name node for attribute and variant names #172). In 0.8, extend it to allow more characters (at least . and :, see Attribute names should allow dots #173).

    • I'm OK with this if there's a commitment to fixing Attribute names should allow dots #173 before 1.0. This approach reduces the AST churn and allows tools to prepare for using the new Name node. However, I would like to avoid adding a new node if it ends up being the same as Identifier in 1.0.
  3. Introduce a new Name production which allows . and : on top of what Identifier does in 0.7.

    • If allowing . and : is an easy decision, I'm OK making it all in 0.7.

I'd like to make a call by tomorrow.

@Pike
Copy link
Contributor

Pike commented Aug 28, 2018

I don't see the connection between attribute names and variant names.

I don't have an opinion on names and values of things that try to do multiple things at once.

@stasm
Copy link
Contributor Author

stasm commented Aug 28, 2018

There's a connection by the virtue of not being Identifiers, which forbid . for the purpose of AttributeExpression. If it wasn't for this purpose, I'd suggest using Name (or Identifier) for all discussed use-cases.

I don't have an opinion on names and values of things that try to do multiple things at once.

Sorry, but this isn't very helpful :( I was hoping my last comment (#127 (comment)) would help us move forward on this. I realize I've included attributes in this issue now, but this is a response to your feedback to keep VariantName, which I think has its merits in the larger picture. If we don't include attributes in this discussion, I don't see any reason to keep VariantName right now and I'd like to go ahead with option no. 1 to unblock 0.7.

@Pike
Copy link
Contributor

Pike commented Aug 28, 2018

All of the three options lead to Name, and I don't have something constructive to say about Name.

@stasm
Copy link
Contributor Author

stasm commented Aug 28, 2018

OK, thanks. I think it makes sense to go with the safest and the most restricted option right now. i.e. using only NumberLiterals and Identifiers as variant keys. #157 has the proposed implementation. I'd like to ask you for a technical review on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backwards incompatible Old files won't parse in new parsers. forwards incompatible Old parsers won't parse newer files. syntax
Projects
None yet
Development

No branches or pull requests

5 participants