Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relax the indentation requirement #87

Closed
stasm opened this issue Jan 30, 2018 · 5 comments
Closed

Relax the indentation requirement #87

stasm opened this issue Jan 30, 2018 · 5 comments
Labels
forwards incompatible Old parsers won't parse newer files. syntax

Comments

@stasm
Copy link
Contributor

stasm commented Jan 30, 2018

This is a follow-up to #42. I'd like to go further and relax the indentation requirement for everything other than text content. This includes attribute and variant keys as well as expressions.

This doesn't mean that serializers would stop indenting message bodies for readability. We'll still need to define the style guidelines for linting FTL files in #27.

Thanks to #63 this change shouldn't hinder parsers' ability to recover at the next nearest message definition.

A few canonical examples of what would become possible:

multiline-text =
    Lorem ipsum dolor sit amet, consectetur adipiscing elit,
    sed do eiusmod tempor incididunt ut labore et dolore
    magna aliqua.

block-expr = { $num ->
   *[one]
        Lorem ipsum dolor sit amet, consectetur adipiscing elit,
        sed do eiusmod tempor incididunt ut labore et dolore
        magna aliqua.
}

And a few extreme ones (please don't write your FTL this way):

key1 = {
$num ->
*[one]
 One
}

key2 =
.attr = Attribute

key3 = {
key1
}

key4 = {
DATETIME
(
$date,
weekday
:
"short"
)
}

The goal is not to encourage people to write FTL like the above but rather, to accept as much input as possible while being as forgiving as possible.

@stasm stasm added this to the Syntax 0.7 milestone Jan 30, 2018
@stasm stasm added the syntax label Mar 26, 2018
@stasm stasm added syntax and removed syntax labels May 15, 2018
@stasm stasm removed this from the Syntax 0.7 milestone May 23, 2018
@stasm stasm added the forwards incompatible Old parsers won't parse newer files. label May 23, 2018
@Pike
Copy link
Contributor

Pike commented Aug 3, 2018

I've looked at the implementation in #159, and I like it.

I like that we can relax the acceptance of the parser without reducing the readability of the spec.

I also agree that we should continue to serialize with indention.

I'd also stick to serializing with trailing-} indent for now. Mostly because that allows us to not immediately push all implementations to production ;-) I'm happy to have a separate conversation about that in a quarter or so, though.

@zbraniecki
Copy link
Collaborator

I like it.

There's one thing I'm wondering about, both from the parser, and from the human eye perspective and it taps to the long standing debate about how the reader spots new message while scanning a file.

One opinion is that it happens when the reader sees a line starting with an identifier followed by = mark. Another is that it happens when the reader sees a line starting with an identifier.

Let's take a (slightly messy) example. Try to quickly count how many Messages are in the following sample:

this-is-my-key = This is my value
  .this-attribute1 = This is an attribute
this-is-another-key =
  This is another value for
  this-is-my-key and friends
this-is-a-third-key =
  This is a multiline message for
  which always has = after the message id
this-is-a-fourth-key = Yet another message
  with another line of text
this-is-a-fifth-key =
  .this-attribute1 = Yet another attribute

If you're in camp 1, you likely quickly scanned for = signs and backpaddled to see if the = is after an identifier which should be at the beginning of a line.
If you're in camp 2, you just tried to see an identifier character at the beginning of the line, and roughly assumed that it should be a beginning of the message.

I was always in camp 2, and it made me struggle to see the value of enforced = because I didn't use it in examples like that, but for FTL lists that had a lot of messages with just attributes and no value (like this-is-a-fifth-key) it brought noise.

Anyway, the reason I'm bringing it up is that in the current proposal, you relax the indentation rules around a different concept. One, that is not aligned with my human reading patterns despite several months of working with = enforced syntax.

From the extreme examples you provided:

key1 = {
$num ->
*[one]
 One
}

key2 =
.attr = Attribute

key3 = {
key1
}

key4 = {
DATETIME
(
$date,
weekday
:
"short"
)
}

key1 works because the line beginnings are $ and * which I know to be sigils, and the only line that could be confusing, conveniently starts with capital letter and a space. While a single space is not a lot, it's enough that combined with capital letter, I take a deeper look to "parse" the message and confirm that One is not a new key.

key2 works in a similar fashion - I see a . which tells me it's not a new message.

key3 is the one that really gets me. It feels like there's a broken message inside of a message. I understand that lack of = should help me, but it doesn't work at all.

key4 has mostly sigils to start lines except of all-caps DATETIME, which is not confusing mostly because it is all-caps and my brain don't associate message ids to be all-caps. The weekday line gets me again.

So, what did I learn about my processing? I learned that in order to mimic it in the syntax, we could say that a a-z after new line is a new message and forbid it anywhere else.
This means that:

  • error recovery is trivial. If it's a-z at the beginning of new line, it's a new message.
  • readability is high for all in camp2, but I assume that camp1 wouldn't have a problem with simpler recognition either
  • in syntax all we have to forbid is a-z showing up in patterns without indent.

While the logic of this proposal would change, the outcome would change only in a couple extreme cases:

  • The key3 example would require the variable to be indented
  • message indentation would only be required if the second+ line of the message starts with a-z.

As for caps A-Z we could either limit Identifiers to not start with them, or add them to the limitation. I haven't seen anyone trying to use caps in identifiers in a long time in any programming language, and since we want to use them in built-ins, I think it would be ok to not use them in ids.

I know my comment is long, but the proposal alteration is not that significant. Just instead of relax the indentation requirement for everything other than text content we'd use relax the indentation requirement for everything except of identifier-start.

wdyt?

@stasm
Copy link
Contributor Author

stasm commented Aug 10, 2018

Thanks for sharing your thoughts, @zbraniecki!

Why do you think that this should be enforced by the grammar, rather than by linting and style guides? I honestly don't suppose anyone will write their FTL the way key4 is written. It's an extreme example of what's possible but definitely not of what's recommended. It's similar to how other languages operate: if you write obfuscated and minified JS by hand, you don't expect people to instantly get it.

We can't protect users from their sense of aesthetics (or the lack thereof). If, as a localizer or developer, you want readable translation files, Fluent should give you every tool you need to do so. But you can be a bit sloppy, too, and things should mostly work, too. That's the Tolerance design principle at work.

we'd use relax the indentation requirement for everything except of identifier-start.

This is much harder to implement in practice. Or at least I'm not sure how to go about it. The way I'm thinking about the relaxation allows the parser to switch into a completely whitespace-insensitive mode as soon as it enters a placeable. The way you propose it, there would still be restrictions that apply depending on what's inside of the placeable.

Also, would you allow {\n-3.14} or require {\n -3.14}?

@zbraniecki
Copy link
Collaborator

My only point is that error recovery (find the next start of the message) which partially informs the decision here, can (and I argue that it should) be aligned with human readability - because both, the parser and the human are tasked with the same task - "find the start of the message".

And what we're trying to do here is find what we have to forbid in order for that task to be easy/possible.

As I said, I'm not interested in blocking your proposal - just provided an alternative. :) I'm happy to go with your approach.

@stasm
Copy link
Contributor Author

stasm commented Aug 10, 2018

Thanks, @zbraniecki. I went ahead and merged the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
forwards incompatible Old parsers won't parse newer files. syntax
Projects
None yet
Development

No branches or pull requests

3 participants