Discussion thread for ballot: "effect of selectors on subsequent placeholders" #872

Closed
aphillips opened this issue Aug 28, 2024 · 16 comments · Fixed by #877

Comments

@aphillips
Member

This is the discussion thread for the current ballot.

@sffc
Member

sffc commented Aug 28, 2024

Commentary on my votes in order to make sure I understand things correctly:

My main priority is: Prevent situations where the same variable name means two different things in two different contexts. This includes when the variable is used in the .match and in match arms.

I put a higher value on clarity, readability, and simplicity than I do on brevity.

  • (A) Do nothing: This has the problem where the settings given to the variable in the .match don't appear to affect the resulting number. This is the worst because it is a big footgun and, in my mind, one of the motivations behind inventing a new MF syntax in the first place.
  • (B) Require annotation of selector variables in placeholders: Based on the design doc, it is still possible to construct code that looks very similar to (A), so I think I still reject this option for not achieving the goal.
  • (C) Allow both local and input declarative selectors with immutability: The resulting syntax is confusing, but I think it achieves my primary goal, so I have this as my last choice before the "!" mark (unacceptable).
  • (D) Allow mutable input declarative selectors: It seems bad to violate immutability. But, it appears that this option means that referencing a variable in match arms results in the same options as the .match clause, and it isn't possible to reference the "old" options associated with the variable inside the .match, so it is not unacceptable.
  • (E) Allow immutable input declarative selectors: This option seems to satisfy my requirements, and it allows a one-line match statement in simple cases. It is therefore my preferred choice.
  • (F) Match on variables instead of expressions: This also seems to satisfy my requirement; it is slightly more verbose than (E), but that doesn't concern me much, so I will rate it equally.
  • (G) Provide a #-like Feature: This is fine except for having to keep track of positional arguments, which could be confusing and harm readability. So, acceptable but not preferred.
  • (H) Hybrid approach, Match may mutate, no duplicates: This seems very similar to (D) and I do not really understand the subtle differences. I will therefore rate it equal to (D), listing it in front because I trust that it is intended as an improvement on (D).

@aphillips
Member Author

@sffc Your logic is very similar to mine.

I didn't vote for (G) because, while it solves a problem with the current syntax, it introduces new features and complexity that I think are distracting. In addition, it might represent additional hazard for translation processes, especially where translation memories are recalling individual pattern strings. The numbered items are linked to the specific key order rather than the selector names.

I did include (F) in my voting. I don't like the lack of {/} around the selectors. When the message is whitespace normalized, the selectors blend too easily into the first key list, but this is a problem we can fix if this approach were selected.

(chair hat ON)
As a reminder, = means > when it comes to tallying votes, although the result of this ballot is likely to be a discussion.

@sffc
Member

sffc commented Aug 28, 2024

I did include (F) in my voting. I don't like the lack of {/} around the selectors. When the message is whitespace normalized, the selectors blend too easily into the first key list, but this is a problem we can fix if this approach were selected.

My working assumption was that we are currently choosing the semantics, not the syntax, and if (F) is chosen, the exact placement and nature of the syntax punctuation characters could be subject to further discussion.

@macchiati
Member

My bottom line is like the above comments. At all costs, avoid the situation where the formatted value is not consistent with the selected value. So that eliminates A and B. I could have put G above those, but it is a big step backwards in terms of readability. About the others, I do favor H, because I think it has the most natural reading for anyone reading or constructing messages, and minimizes redundancy in the syntax. D would be next on the list for that reason.

@eemeli
Collaborator

eemeli commented Aug 29, 2024

The way I see it, MF2 will get read way more than it gets written, and most of the writing of it will consist of modifications of existing messages (e.g. translations thereof) rather than writing new messages.

I also presume that MF2 will be an auxiliary language for ~everyone interacting with it, and so it has a higher-than-usual bar for making intuitive sense, and not exhibiting surprising behaviour.

To that end, my strong first preference is to simplify .match to not allow annotation (option F), so that our existing .input and .local declarations can be used for their stated purposes.

If that's not acceptable, then I'd prefer to require the annotation when using the same variable in selectors and placeholders (option B), as that will guide message authors towards a very similar pattern as selecting on variables. This does not make it impossible to write bad messages, but it raises the bar so that doing so requires more work than doing the sensible thing. We're also never going to make bad messages impossible to write.

Adding {$0} etc. as syntax for referring to selector values (option G) takes the same approach of making the right thing simpler than the wrong thing, and it's effectively the choice that was made in MF1.

If we must complicate the .match so that it has a declarative side effect, then let's at least make sure that we make it as simple as possible, and do not allow the same variable name to have more than one meaning within a message (option E). I think this is unnecessarily duplicating the behaviour we have the explicit .input and .local for, and adding yet another way of doing the same thing. We should not be playing code golf with the syntax like this.

This is a real problem, and we should fix it, so doing nothing (option A) is not an acceptable option either. But at least it's not quite as bad as adding even more complexity and surprises to the syntax (options C, D, H).

@echeran
Collaborator

echeran commented Aug 30, 2024

@sffc said:

My main priority is: Prevent situations where the same variable name means two different things in two different contexts. This includes when the variable is used in the .match and in match arms.

+1 fair enough

I put a higher value on clarity, readability, and simplicity than I do on brevity.

+1 absolutely

  • (A) Do nothing: ... This is the worst because it is a big footgun and, in my mind, one of the motivations behind inventing a new MF syntax in the first place.

Disagree with this characterization. FWIW, our Why MF2.0? doc from the beginning still holds up well. The more basic design issues with MF1 that the data model & syntax needed to address first were:

  • Prevent nesting of messages inside messages when there are multiple selectors.
    • The nesting of messages in MF1 was effectively performing string concatenation, which is a rookie i18n mistake. MF1 effectively forced this pattern when multiple selectors were present for lack of a better option.
  • Allow users to bring their own formatting functions to the table
    • This is achievable through designing to interfaces. Having an interface for functions has allowed built-in and user functions to participate as equals, allows flexibility for implementers to wrap built-in functions, etc.
  • Add a way to declare and reuse a set of function arguments that are commonly reused
    • We have that with .local
      • .input is derivative of .local and is a convenience

It's important to point out that Option A does not literally mean "Do Nothing". You can solve the problem through the existing .local which was created for this problem, and if you want to get stricter and enforce, you can also enforce with a linter. In other words, requiring this via the spec/syntax is just one way to solve the problem, and does not incur incredible complexity to the mental model of readers.
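For illustration only, a lint check of the kind mentioned here might look roughly like the following Python sketch. The `Expression` and `Message` classes are hypothetical stand-ins for an already-parsed message, not part of any MF2 implementation's API, and the check compares only function names (a real linter would also compare options and resolve .input/.local declarations).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Expression:
    """Hypothetical parsed expression such as {$count :integer}; options omitted."""
    variable: str
    function: Optional[str] = None


@dataclass
class Message:
    """Hypothetical parsed message: its .match selectors and all variant placeholders."""
    selectors: List[Expression]
    placeholders: List[Expression] = field(default_factory=list)


def lint_selector_mismatch(message: Message) -> List[str]:
    """Warn when a selector's annotation differs from a placeholder's use of the same variable."""
    warnings = []
    for selector in message.selectors:
        if selector.function is None:
            continue  # a bare selector relies on the variable's declaration; nothing to compare
        for placeholder in message.placeholders:
            if placeholder.variable != selector.variable:
                continue
            if placeholder.function != selector.function:
                used = f":{placeholder.function}" if placeholder.function else "no annotation"
                warnings.append(
                    f"${selector.variable} is selected with :{selector.function} "
                    f"but formatted with {used}"
                )
    return warnings


# The Option A footgun: select with :integer, then format the bare variable.
message = Message(
    selectors=[Expression("count", "integer")],
    placeholders=[Expression("count")],
)
print(lint_selector_mismatch(message))
# ['$count is selected with :integer but formatted with no annotation']
```

A check like this keeps the syntax unchanged and moves enforcement into tooling, which is the trade-off being argued for.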

  • (D) Allow mutable input declarative selectors: ...it appears that this option means that referencing a variable in match arms results in the same options as the .match clause, and it isn't possible to reference the "old" options associated with the variable inside the .match, so ...

This is exactly the significant complexity that Options D, E & H all add. The comments here & previously from @aphillips @macchiati and @sffc talk in terms of a single benefit but ignore/minimize the cost of the tradeoff required to pay for it. I would like to see discussion there because that's where I disagree in my evaluation. The description of the benefits isn't wrong, but that isn't the whole story.

Because these Options D, E & H are allowing .match blocks to do shadowing. And shadowing is something that we all agreed was very undesirable -- did we decide that in early 2023 or was it 2022? (We seem to use the word "immutability" to describe not shadowing, but that's not what immutability is.) Violating "immutability" -- violating our long-held desire to not shadow -- is undoing our own principles. Furthermore, how are we going to explain .local and .input to users, but then say that .match does selection and also acts as one of those declarations, so the implicit declarations created by a .match don't interfere with previous declarations, or if they do and there is a corner-case conflict, we'll plug that hole with this extra rule, ....etc. etc. And it's all not confusing enough to be noteworthy.

@eemeli said:

If we must complicate the .match so that it has a declarative side effect, then let's at least make sure that we make it as simple as possible, and do not allow the same variable name to have more than one meaning within a message (option E). I think this is unnecessarily duplicating the behaviour we have the explicit .input and .local for, and adding yet another way of doing the same thing. We should not be playing code golf with the syntax like this.

+1. In fact, the past 11 months of post-Seville syntax discussions feel a lot like repeatedly paying for concision / ease with added complexity. Concision != simplicity. Ease != simplicity. Ease is a subjective measure, but complexity is objective. Options D, E & H are easy once you understand them, but they objectively complicate the reader's ability to understand the meaning of a variable, which now depends on the .locals and .inputs at the top but also on whether there's a .match and whether the same variable is referenced again, or... They make messages without .local and .input easier, but at what cost?

In Summary

Options D, E & H add a significant complexity to the ability for users to understand messages. I think the amount of complexity makes them a non-starter, so I vetoed them. Proponents of those options should at least acknowledge and address the negative impacts to the user, and whether other alternatives (Option F or Option A w/ linting) have such costs. The fact that these options re-introduce shadowing, which we previously rejected, raises a meta-concern that comes up repeatedly.

Option B requires & Option C allows syntax that would be verbose and is better cleaned up with .locals and such, so these options are strictly worse than the status quo (Option A).

Option G involves positional arguments, which have also been previously rejected by the group early on. In programming languages, they are less readable and lead to more brittle (non-forward-compatible) APIs. The costs of this option are obvious and other options have fewer costs, IMO.

I think only Option A & Option F address the problem without adding major extra complexity and without redundant requirements. I would be fine with either, although I prefer Option A with linting. It's worth pointing out that Option F is backwards compatible with Option A (status quo).

@sffc
Member

sffc commented Aug 30, 2024

Hmm, I mostly agree with @echeran except on two points.

First, option A not only allows writing bad messages, but it makes it easy to do so and actively hides that there might be an issue. You can always fix it, but the job of syntax and API design is to make the right things easy and the wrong thing hard. This is why I put option A in last place.

Second, I don't agree with the characterization that option E complicates things like D and H. As noted in the other thread, option E is equivalent to saying that .match with an expression is simply equivalent to .input followed by .match. No scopes, and easy to implement and reason about.
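The equivalence described here can be sketched as a desugaring step. The Python below is a hypothetical illustration, not the specification's algorithm: each annotated selector becomes an .input declaration followed by a bare .match operand, and annotating a variable that is already declared raises an error, as in Option E. The `Selector` class and the function names are invented for this example, and rejecting the same variable annotated twice within one .match is an additional assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Selector:
    """Hypothetical selector: {$var} or {$var :function ...}."""
    variable: str
    annotation: Optional[str] = None  # e.g. ":integer"


class DuplicateDeclarationError(Exception):
    """Raised when a .match annotation would redeclare an existing variable."""


def desugar_option_e(declared: Set[str], selectors: List[Selector]) -> Tuple[List[str], List[str]]:
    """Return (.input lines to prepend, bare .match operands)."""
    seen = set(declared)  # copy so the caller's set is not modified
    inputs: List[str] = []
    operands: List[str] = []
    for sel in selectors:
        if sel.annotation is not None:
            if sel.variable in seen:
                raise DuplicateDeclarationError(f"${sel.variable} is already declared")
            inputs.append(f".input {{${sel.variable} {sel.annotation}}}")
            seen.add(sel.variable)
        operands.append(f"{{${sel.variable}}}")
    return inputs, operands


# .match {$num :integer}  ==  .input {$num :integer} followed by .match {$num}
print(desugar_option_e(set(), [Selector("num", ":integer")]))
# (['.input {$num :integer}'], ['{$num}'])

# .input {$num :integer} followed by .match {$num :integer} is an error
try:
    desugar_option_e({"num"}, [Selector("num", ":integer")])
except DuplicateDeclarationError as err:
    print(err)  # $num is already declared
```

Under this reading, no scope is introduced: the annotated .match only adds a declaration at message level, and the error case is what keeps a name from having two meanings.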

I rejected A and B and not the others because my only hard line is that the wrong message should be harder to write than the correct message. The others are not bad enough for me to raise an objection, but I still prefer E and F.

@eemeli
Collaborator

eemeli commented Aug 30, 2024

I rejected A and B and not the others because my only hard line is that the wrong message should be harder to write than the correct message.

I don't think this is the case with option B. There should not be a case where with that option an incorrect message is easier to write than a correct one. Would you have an example in mind where this happens?

@aphillips
Member Author

@echeran noted:

Because these Options D, E & H are allowing .match blocks to do shadowing.

E doesn't shadow, since it forbids annotation of an already declared variable. This was an important part of my preferring this design.

Options D, E & H add a significant complexity to the ability for users to understand messages. I think the amount of complexity makes them a non-starter, so I vetoed them. Proponents of those options should at least acknowledge and address the negative impacts to the user, and whether other alternatives (Option F or Option A w/ linting) have such costs.

I am not sure what the significant complexity is for E?

.input {$num :integer}
.match {$num}
* {{This is the long hand}}

.match {$num :integer}
* {{This is E's shorthand for the above. What's hard to understand?}}

.input {$num :integer}
.match {$num :integer}
* {{This is an error because it redeclares {$num}. 
    Making it an error guards against mistakes
    made by translators and message authors
    who inadvertently made the selector different
    from the formatter via shadowing.}}

Option F is fine by me. It's the reverse of E, in that you have to put the annotation in a declaration in order to use a selector. It requires the use of .local to do selection-only annotation, such as @mihnita's example involving person names and gender selection. It requires writing a declaration every time one uses a selector, which I find unnecessarily verbose compared to E.

I personally think that .local should be avoided as much as possible because it introduces the possibility of message name collision with the passed in argument set. I have used the fact that one can pass in more arguments than are actually used in the message as a "feature" in MF1 extensively in the past. The .local feature is useful, but presents a risk of runtime error in such an environment. Requiring .local thus makes me nervous.

The problems with Option A are laid out by the design document (primarily in the Use Cases). I think there is a huge negative impact to the example message (a version of which I give just below), because the selector does the wrong thing. It is exceedingly weird to me to say that this message "does not shadow $num" because I can see $num being "declared" twice:

.input {$num :number minimumFractionDigits=1}
.match {$num :integer}
one {{You have {$num} banana. This message is wrong, because 1.1 banana is wrong}}
*   {{You have {$num} bananas.}}
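To make the failure concrete, here is a small Python simulation of this message. The two helpers are hypothetical stand-ins rather than real MF2 function implementations: the formatter keeps at least one fraction digit (mirroring minimumFractionDigits=1, with an assumed cap of three digits), and the selector drops the fraction and applies the English one/other plural rule (truncation is an assumption; for 1.1, rounding would pick the same variant).

```python
def format_min_one_fraction_digit(value: float) -> str:
    """Stand-in for {$num :number minimumFractionDigits=1}; assumes at most 3 fraction digits."""
    text = f"{value:.3f}".rstrip("0")
    return text + "0" if text.endswith(".") else text


def select_integer_en(value: float) -> str:
    """Stand-in for selecting on {$num :integer}: drop the fraction, then English plural rule."""
    return "one" if int(value) == 1 else "other"


def render_banana_message(num: float) -> str:
    formatted = format_min_one_fraction_digit(num)  # placeholder uses the .input annotation
    if select_integer_en(num) == "one":             # variant chosen by the .match annotation
        return f"You have {formatted} banana."
    return f"You have {formatted} bananas."


print(render_banana_message(1.1))  # You have 1.1 banana.
print(render_banana_message(2))    # You have 2.0 bananas.
```

The selector and the placeholder each follow their own annotation, so the variant and the displayed number disagree for 1.1.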

Going back to the design doc--if you think there are "negative impacts to the user", can you describe them in use case/user story language? Or are they already covered? Impacts should appear in the design doc, because it's hard to respond to assertions of impact without knowing what you mean. I'm not saying such impact doesn't exist, but making it visible allows us to have a conversation about the relative priority of each.

@bearfriend
Contributor

I strongly second this by @sffc:

I put a higher value on clarity, readability, and simplicity than I do on brevity.

Being able to intuit the outcome of one's message is my absolute highest priority. Convenience and brevity are near the bottom (within reason).

(A) Do nothing:
I don't personally find selectors having no effect on subsequent placeholders to be an issue. In fact, I find it intuitive. However, I understand I may be in the minority here and that this is an issue in actual practice, not just perceived. So, I rate this low, but not unacceptable.

(B) Require annotation of selector variables in placeholders:
Even with convenience and brevity being low priorities for me, this just strikes me as the "lazy" solution, and one that is not itself harmless, so I find it unacceptable.

(C) Allow both local and input declarative selectors with immutability:
I find the added complexity both to the syntax and cognitively to the user quite high, but it does follow easily definable behavior. This is my last acceptable option.

(D) Allow mutable input declarative selectors:
This option overall achieves the goals well, but the ability to redeclare inputs creates inconsistency I think unnecessarily.

(E) Allow immutable input declarative selectors:
All the benefits of (D) without the downside. While this option technically adds complexity, it almost hides itself, acting only as a shorthand the user doesn't even need to recognize. This is my first choice.

(F) Match on variables instead of expressions:
This creates extreme clarity, at what I judge as trivial expense. I rate this second only in acknowledgement that the vast majority of complex messages will use a single selector and, otherwise, no declarations.

(G) Provide a #-like Feature:
If the message is edited, either positional arguments will break, or a potentially less than ideal selector order is necessitated to avoid it. This makes it unacceptable. If this could be fleshed out to avoid that I would reconsider.

(H) Hybrid approach, Match may mutate, no duplicates:
This is also similar to (D), but fixes the mutability inconsistency by going the other direction from (E). That is, it allows even locals to be overwritten. The increased consistency is appreciated, but it feels overall like it moves in the wrong direction to achieve it (and incompletely at that).

@mihnita
Collaborator

mihnita commented Sep 1, 2024

At all costs, avoid the situation where the formatted value is not consistent with the selected value

There are use cases for not always doing that:

.match {$host :gender}
  female {{{$host} invited you to her party}}
  ...

With the type of $host being a Person class, I would expect that the formatted value uses a PersonNameFormatter.

I think most people would be surprised to see "female invited you to her party" as a result.

Sure, it can be solved by saying "if you don't want consistency between match and format, you must define a .local"
But the same answer can be given the other way around: "if you want consistency between match and format, you must define a .local"

I'm not arguing one or another.

Only saying that I don't see this as a blocker for A or B.

@mihnita
Collaborator

mihnita commented Sep 1, 2024

What I dislike about "(G) Provide a #-like Feature" is using a number.

That makes the message dependent on the order of the selector.
And that is completely irrelevant to the meaning of the text.

There are many reasons why we removed support for ordinal placeholders.
This solution would bring it back.

.match {$folderCount :integer} {$fileCount :integer}
* {{You deleted {$1} files in {$0} folders ...}}

Changing the order of the selectors in match changes the message. No reason why it should.
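A short Python sketch of this hazard, assuming the proposed {$N} refers to the N-th selector counting from zero, as in the example above (the `resolve_positional` helper is hypothetical): the pattern text stays the same, but swapping the two selectors swaps which count lands in which slot.

```python
import re
from typing import List

# Matches a hypothetical positional placeholder such as {$0} or {$1}.
POSITIONAL = re.compile(r"\{\$(\d+)\}")


def resolve_positional(pattern: str, selectors: List[str]) -> str:
    """Rewrite each {$N} as the name of the N-th .match selector (zero-based)."""
    return POSITIONAL.sub(lambda m: "{$" + selectors[int(m.group(1))] + "}", pattern)


pattern = "You deleted {$1} files in {$0} folders"
print(resolve_positional(pattern, ["folderCount", "fileCount"]))
# You deleted {$fileCount} files in {$folderCount} folders
print(resolve_positional(pattern, ["fileCount", "folderCount"]))
# You deleted {$folderCount} files in {$fileCount} folders
```

Nothing in the pattern itself signals the change, so a translation memory match on that string would silently carry the wrong meaning.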

That is bad for localization.
It is often the case that translators, or TM leveraging, are not aware of the whole context.
Or translators don't look at the whole context, and it is unclear what {$0} means.

It also implies that placeholders start at 0, for some reason. That is a bias.

.match {$host :gender} {$guestCount :integer}
* {{{$host} had {$1} guests at her party.}}

Why $1 and not $2?


Lastly, even if we ignore this, the solution does not solve much.
The burden is still on the dev to know that if they want consistency they MUST use numeric {$1} instead of {$fileCount}.

It is as error-prone as doing nothing, with added complexity with the numbers.
And the complexity is for implementation, devs using it, and especially localization.

@mihnita
Collaborator

mihnita commented Sep 1, 2024

+100 to sffc and bearfriend

I put a higher value on clarity, readability, and simplicity than I do on brevity.

We can assume that developers will have some kind of support from the IDE that will suggest [ .input, .local, .match ] as soon as I type . in code mode.
And will reduce it to .input (or even commit it) as soon as I type .i

Second, even if this is not the case, think about the time one spends on this as a developer.

I work on a feature for 2 weeks. Write a design document, write code, unit tests, deal with UX, security, accessibility, code review & feedback implementation, etc.

And for that I add 10 messages. One of them might have a selector, and we force them to add an explicit .local declaration.
No big deal.
We ask them to type some 10-20 extra characters in 2 weeks of development.

Brevity does not matter at all.

But if they save the characters and the result is not what they expected, the bug might be only detected later. Need to fix the message, code review again, maybe fix the translation in all the languages.

So clarity is paramount.

Hence my vote for F above all.

@echeran
Collaborator

echeran commented Sep 2, 2024

@aphillips said:

I am not sure what the significant complexity is for E?

Let's think about it from the point of view of a user who has to parse messages written according to Option E. Here is what we're telling them:

  • If you have expressions that repeat the same operand and annotation (function and options), then use a .local. Now you can reuse that variable.
  • If you want to reuse the name of an external input value but with the same annotation (function & options) applied automatically in multiple places, then use .input. In effect, .input shadows the external input value, and it is derivative (you can achieve the effect w/o shadowing via .local, and shadowing is less likely to be readable vs. .local).
  • So far, if you see a variable passed as an operand somewhere later (.local, .match, a placeholder's expression), you know what to expect
  • We have .match, which matches on selectors
  • But with Option E, we now can no longer pass previously declared variables into .match. Why not?
  • ...because now .match doesn't just match selectors, it is also doing double duty as an implicit .input
  • So now with Option E, if you want to reuse an annotation (function & options) of an external input value, you must provide it in a selector expression in the .match line...
  • ...and the formatted version of that external input takes on the same name as the original input value, which is confusing
  • Ex from @mihnita right above:
    .match {$host :gender}
    female {{{$host} invited you to her party}}
    ...
    
    In this example, $host now refers to male, female, etc. in the patterns
  • And now, if I want to satisfy an example from Use Case 6 of the design doc, which starts off as:
    .match {$person :gender}
    male {{Bienvenido {$person :personName}}}
    female {{Bienvenida {$person :personName}}}
    other {{Le damos la bienvenida {$person :personName}}}
    
    Option E gives very surprising behavior. Because what it would do is apply the :personName formatting function on values like male, female, etc.
  • Instead, Option E requires a .local before the .match in the above example
  • Why? Because Option E .match shadows $person, it doesn't just do matching.
  • It is objectively complex that .match causes dependencies between 2 distinct operations (matching, shadowing/scope/declaration). I'm not the first to bring this concern up, but here is where I explained this before.
  • Furthermore, the shadowing that Option E .match causes also leads to potentially confusing naming because of that shadowing. More complexity.

The series of rules and implications above are a consequence of the complexity -- distinct separate rules being made to interact with each other unnecessarily.

Option F is fine by me. ... It requires writing a declaration every time one uses a selector, which I find unnecessarily verbose compared to E.

I repeat my disagreement with "unnecessarily verbose". Again, Simplicity != concision. Simplicity != fewer things. We've made this point a half dozen times in this WG with the example of curly braces being optional around the branches of a C-style if statement when the branch body is only one statement. Perhaps it was made optional because it was "verbose" to type those extra curly braces, and that decision created a problem that has to be solved these days with linters requiring that we always type the curly braces, because the extra typing confers simplicity aka clarity & readability & avoids errors.

I far far prefer Option F over Options D, E, G, & H because of simplicity, which is achieved with a few extra characters, yes.

I would much rather type extra characters ("unnecessarily") than I would have distinct separate rules interacting with each other unnecessarily (complexity).

I personally think that .local should be avoided as much as possible because it introduces the possibility of message name collision with the passed in argument set. ... Requiring .local thus makes me nervous.

I appreciate the intent behind this statement, and maybe this statement explains a lot of the motivation of Option E. When it comes to .local vs .input when .input can apply, it's a reasonable instinct. We can document that, have linters, etc.

However, what if we also want to reduce the use of .local by making .match behave like .input too? Are the benefits worth the costs?

The problems with Option A are laid out by the design document (primarily in the Use Cases). ... It is exceedingly weird to me to say that this message "does not shadow $num" because I can see $num being "declared" twice:

.input {$num :number minimumFractionDigits=1}
.match {$num :integer}
one {{You have {$num} banana. This message is wrong, because 1.1 banana is wrong}}
*   {{You have {$num} bananas.}}

Yes, that is an error-prone situation. Both Option F and Option A+linting are alternative ways of solving the problem.

Or we could get rid of the .input construct (it is derivative of .local), which would also prevent the above example from being possible, since it avoids the naming confusion caused by the shadowing that .input does.

Going back to the design doc--if you think there are "negative impacts to the user", can you describe them in use case/user story language? Or are they already covered? Impacts should appear in the design doc, because it's hard to respond to assertions of impact without knowing what you mean. I'm not saying such impact doesn't exist, but making it visible allows us to have a conversation about the relative priority of each.

I created #867 to record these points & examples, in keeping with the concise writing style of design docs. In my previous attempt to be thorough in the Delimiting Variant Patterns design doc, you recommended staying concise by removing details, which I did (I relegated them to a collapsed section). Our problem these days isn't documentation, but rather adhering better to the conclusions and reviewing the decision processes recorded there.

@aphillips
Member Author

@echeran noted:

So now with Option E, if you want to reuse an annotation (function & options) of an external input value, you must provide it in a selector expression in the .match line...

I don't agree with this. If you want to "reuse an annotation", you just reuse it:

.input {$foo :function opt=val}
.match {$foo}
* {{{$foo} reuses the .input. Option E does not allow an expression in the .match}}

I also don't agree with the gender/personName examples. I think the disconnect is here:

In this example, $host now refers to male, female, etc. in the patterns

Actually, $host is immutable, so the annotation does not change its value. The key matching behavior of the function :gender doesn't alter $host in any way. You're already familiar with this from :number:

.local $x = {42}
.match {$x :number}
one {{The value of {$x} is never the string 'one' here.}}
few {{The value of {$x} is not 'few'. It is still 42. The selector matches the key 'few' in some locales}}
*   {{The meaning of life is {$x}}}

You cannot think of .input or .local as assigning anything. Their effect on the operand's value is an open question--we have use cases for functions affecting the value of the operand (e.g. {moo :uppercase} => "MOO" right?).
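A minimal Python sketch of this point, using hypothetical function names: selection only computes which key matches, and the operand itself is never changed. For brevity it hard-codes simplified CLDR cardinal plural rules for non-negative integers in English and Polish (Polish being one locale where 42 falls in "few"); this is an assumption for illustration, since real implementations use full CLDR data and also check exact numeric keys before plural categories.

```python
from typing import List


def plural_category(n: int, locale: str) -> str:
    """Simplified CLDR cardinal rules, valid only for non-negative integers."""
    if locale == "en":
        return "one" if n == 1 else "other"
    if locale == "pl":
        if n == 1:
            return "one"
        if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
            return "few"
        return "many"
    raise ValueError(f"unsupported locale: {locale}")


def select_variant(value: int, locale: str, keys: List[str]) -> str:
    """Return the matching key, or the catch-all '*'; `value` is only read, never changed."""
    category = plural_category(value, locale)
    return category if category in keys else "*"


x = 42
keys = ["one", "few"]
print(select_variant(x, "en", keys), x)  # * 42
print(select_variant(x, "pl", keys), x)  # few 42
```

In both locales the selected key differs, but the value printed alongside it is still 42.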


I'll have a look at #867.

@aphillips
Copy link
Member Author

See WG resolution at the end of #873

7 participants