Create explicit AST node for escape sequences #195
Comments
I've looked at how other languages deal with it and I didn't find any which would put escape sequences into separate AST nodes. I used https://astexplorer.net/ and https://docs.python.org/3/library/ast.html (also available online at https://python-ast-explorer.com/). Can we achieve the same thing by specifying in written text how the known escapes, which are well defined by the grammar, should be handled?
@Manishearth - can you advise us on this issue?
Looking at ECMA-262, the spec has a separate paragraph about the semantics of string values in which it defines the list of known escape sequences and their expected values. Maybe it would be enough to do the same in Fluent's specification?
Typically escape sequences are handled by the tokenizer, not the parser, and are absent from the AST.
One aspect that's perhaps a bit special about the tooling design around Fluent is that we care about round-tripping between parsing and serialization. In that sense, a parsing step that discards information is something we try to avoid. I'm also looking at this from the POV of other people implementing runtimes. The more explicit our reference implementation is, the easier it should be for others to write a spec-compliant Fluent implementation. I assume that it's easier to optimize things out of the algorithms in the reference implementation than to add them. That goes along the lines of what Manish said: runtime implementations will often deal with things like escape sequences really early and throw the detailed information away. I wonder if we should re-introduce a
Can you expand on why? You're already filtering out whitespace, presumably. Most languages deal with this by tagging the AST with spans, so you also know what the original code looked like if you ever need this from a tooling perspective (e.g. if writing an autoformatter). Spans are in general a really convenient way of supporting a lot of related tooling use cases and usually help keep the AST simple and usable.
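A minimal sketch of the span approach described in the comment above, written in Python for illustration. The `Span` and `StringLiteral` classes and the `source_text` helper are hypothetical names, not Fluent's actual AST; the point is only that a node holding the unescaped value plus offsets into the source can recover the original spelling on demand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # offset of the first character of the node in the source
    end: int    # offset one past the last character of the node


@dataclass(frozen=True)
class StringLiteral:
    value: str  # unescaped value, as a runtime would see it
    span: Span  # where the literal's content sits in the source


def source_text(source: str, node: StringLiteral) -> str:
    """Recover the literal exactly as it was written, escapes included."""
    return source[node.span.start:node.span.end]


# Hypothetical source line; the backslash is a literal character here.
source = 'key = { "a\\u2013z" }'
start = source.index('"') + 1
end = source.rindex('"')
node = StringLiteral(value="a\u2013z", span=Span(start, end))
```

With this shape the AST stays small, but every node is only meaningful together with the source text it was parsed from.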
Tools like Pontoon parse translations stored in Fluent files, let localizers edit them, and then serialize them back to Fluent. It's true that the serialization currently discards the original whitespace and uses its own pretty-printing rules. We still want to preserve as much of the original content as possible. If the localizer uses
This could be specified as an implementation note accompanying the
Well, you can go the other way and serialize as escapes, but that's not great either. Yeah, you want to use a span here so that the original string is obtainable. Alternatively, store the original string alongside the parsed one in case there were escape sequences. I think for tooling you'll eventually need spans anyway, but storing strings alongside is a valid solution that doesn't require adding support for spans.
So then we'd have Would you also recommend that
I should also mention that our tooling parsers do support spans. They are useful in many read-only scenarios, but are they also a good solution when tools like Pontoon allow localizers to edit the content of translations stored in the AST?
I'd have
Yep, because you don't need to store a
Please also consider that we're creating Fluent AST from scratch for migrations. In my experience, spans are often in the way of testing and mocking, as you end up creating fake content with fake offsets into it. |
In that case just store the optional rawValue :) Spans let you have an optimization so that you don't have to store rawValue, but that's just an optimization |
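As a rough sketch of the "optional rawValue" idea (Python, all names hypothetical): a node built by a parser carries the original spelling, while a node created from scratch, for example in a migration, omits it and lets the serializer produce escapes itself. The escaping policy in `escape` (backslash and double quote only) is an assumption made for this example, not something the thread specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StringLiteral:
    value: str                 # unescaped value
    raw: Optional[str] = None  # original spelling, if the node came from a parser


def escape(value: str) -> str:
    # Assumed policy for this sketch: escape only backslash and double quote.
    return value.replace("\\", "\\\\").replace('"', '\\"')


def serialize(node: StringLiteral) -> str:
    # Prefer the original spelling so that parse -> serialize round-trips;
    # fall back to escaping the value for nodes built without a raw form.
    raw = node.raw if node.raw is not None else escape(node.value)
    return '"' + raw + '"'


parsed = StringLiteral(value="a\u2013z", raw="a\\u2013z")
handmade = StringLiteral(value='say "hi"')
```

No fake offsets are needed to construct `handmade`, which is the testing and mocking concern raised about spans.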
Thanks for sharing your thoughts and experiences. I'd like to propose a way forward here. My goal is to keep things as simple as possible while still allowing the specification to be, well, specific about the expected behavior of the escapes. Following @Manishearth's advice, let's add a
{
"type": "StringLiteral",
"raw": "a\\u2013z",
"value": "a–z"
}
For simplicity, I would make I'll prepare a PR.
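To illustrate how the `raw` and `value` fields in the JSON above relate, here is a small Python sketch that derives one from the other. The set of recognized escapes (`\\`, `\"`, and `\uXXXX`) and the choice to raise on anything else are assumptions for the example; the function names are hypothetical and this is not the reference implementation.

```python
import re

# A backslash followed by either "u" plus four hex digits, or any single
# character (possibly none, for a trailing backslash).
_ESCAPE = re.compile(r'\\(?:u([0-9a-fA-F]{4})|(.?))')


def unescape(raw: str) -> str:
    """Turn the raw spelling of a string literal into its value."""
    def replace(match):
        hex_digits, char = match.group(1), match.group(2)
        if hex_digits is not None:
            return chr(int(hex_digits, 16))
        if char in ("\\", '"'):
            return char
        # Assumed behavior: unknown or incomplete escapes are errors.
        raise ValueError(f"Unknown escape sequence: \\{char}")
    return _ESCAPE.sub(replace, raw)


def string_literal(raw: str) -> dict:
    """Build a node shaped like the JSON example: raw kept next to value."""
    return {"type": "StringLiteral", "raw": raw, "value": unescape(raw)}


node = string_literal("a\\u2013z")
```

Keeping `raw` untouched means a serializer can write the literal back exactly as the localizer typed it, while runtimes only ever read `value`.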
#203 added a
Right now, the runtime behavior of \XXX is only implicitly defined by the JS runtime. To fix this, we should have a dedicated object in our AST that represents an escaped character, that is, Unicode escapes and escapes for syntax characters.
This is half of #156 with a different rationale. The AST might look similar to #123 (comment).