Change AST to allow for zero-copy parsing #156

zbraniecki · 2018-07-26T06:18:15Z

Zero-copy parsing can be extremely fast and brings significant memory savings.
What's more, the ability to zero-copy parse does not prevent the AST from taking ownership of the data, which allows the original string to be discarded and the AST to be transferred when needed.

In order to allow for zero-copy parsing, we'll need to introduce two changes:

Comments will become a vector of ropes (Rust: Vec<&str>) which will omit the comment sigil and store empty lines as "".
Pattern::TextElement will store unescaped strings, and the only escaping needed at the parser level is \{, which will not terminate the TextElement.
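
To make the first change concrete, here is a minimal sketch of a comment node that borrows from the source string. The `Comment` struct and `parse_comment` function are hypothetical names invented for this illustration, not the reference parser's API, and the sketch assumes only single-`#` comments at the start of the input.

```rust
/// Hypothetical borrowed comment node: every entry is a slice of the source,
/// with the `#` sigil (and one following space) left out.
#[derive(Debug, PartialEq)]
struct Comment<'s> {
    content: Vec<&'s str>,
}

/// Collects the run of `#` comment lines at the start of `source`.
/// Assumption for this sketch: only single-`#` comments are recognized.
fn parse_comment<'s>(source: &'s str) -> Comment<'s> {
    let mut content = Vec::new();
    for line in source.lines() {
        match line.strip_prefix('#') {
            // "# text" becomes "text"; a bare "#" becomes "".
            Some(rest) => content.push(rest.strip_prefix(' ').unwrap_or(rest)),
            // The first non-comment line ends the comment.
            None => break,
        }
    }
    Comment { content }
}

fn main() {
    let source = "# First line\n#\n# Third line\nkey = Value\n";
    let comment = parse_comment(source);
    // No String is allocated for the lines; only the Vec itself is.
    println!("{:?}", comment.content);
}
```

Because the node holds `&str` slices, it cannot outlive `source`; that constraint is the price of avoiding the copy.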

This would mean that we'd need a separate step to process the text element in pattern and comment when necessary. @Manishearth suggested using COW [0] to lazily resolve/process those two data structures into owned, unescaped, and processed structures when needed.

[0] https://doc.rust-lang.org/std/borrow/enum.Cow.html
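
The lazy step suggested above could look roughly like the following sketch. The `unescape` helper is a hypothetical name, and the set of escapes it handles (`\{` and `\\`) is an assumption made for illustration, not the grammar's definitive list. The point is only that `Cow` lets the common case (no escapes) stay borrowed, and allocates an owned `String` only when the text actually changes.

```rust
use std::borrow::Cow;

/// Hypothetical post-parse step: turn a raw TextElement slice into display text.
/// Assumption for this sketch: only `\{` and `\\` are treated as escapes;
/// any other backslash sequence is kept verbatim.
fn unescape(raw: &str) -> Cow<'_, str> {
    if !raw.contains('\\') {
        // Nothing to process: keep pointing at the source string.
        return Cow::Borrowed(raw);
    }
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            match chars.next() {
                Some(next) if next == '{' || next == '\\' => out.push(next),
                Some(other) => {
                    out.push('\\');
                    out.push(other);
                }
                None => out.push('\\'),
            }
        } else {
            out.push(c);
        }
    }
    Cow::Owned(out)
}

fn main() {
    for raw in ["Hello, world", r"Open \{ brace", r"Back\\slash"] {
        match unescape(raw) {
            Cow::Borrowed(s) => println!("borrowed: {}", s),
            Cow::Owned(s) => println!("owned:    {}", s),
        }
    }
}
```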

stasm · 2018-07-26T06:59:59Z

> This would mean that we'd need a separate step to process the text element in pattern

Can you explain what you mean by this? In 0.6, escape sequences are already stored in their raw form. Unescaping is left up to the resolver.

Furthermore, depending on the discussion in #115 and #123, we might be able to remove some escape sequences from the grammar.

Pike · 2018-07-26T09:09:44Z

I think that, at a high level, zero-copy parsing isn't a design goal for Fluent. I think that some implementations might benefit from it, but it does come with downsides. I'm not so happy about having designed compare-locales with a zero-copy parser, for example.

We shouldn't change the textual representation of a Fluent message or term in support of zero-copy parsing, I think.

I also have some more questions about the bigger picture:

How abstract is the abstract syntax tree?
Do Fluent parsers need to return the AST used in the reference parser?
-- if so, do they need to do so directly, or are intermediate representations OK?
Are AST nodes data containers or classes?
-- i.e., is Comment.content data or an @property (py), .content() (rs)
Follow-up question to that, is the AST read-only or mutable?

These questions are all around what it actually means to implement Fluent. What's normative, what's informative, what's "we just had to write something here". That's going to be more relevant as we see people-not-us implementing Fluent.

Detail implementation note: it might be beneficial to document text-specials, syntax highlighters probably want those. Comments might benefit from an internal refactor if we include semantic comments in a formal way. It's probably not totally random that these have overlap with zero-copying.

zbraniecki · 2018-07-26T16:03:13Z

> We shouldn't change the textual representation of a Fluent message or term in support of zero-copy parsing, I think.

I'm not sure if I agree.
In my view, parsing is one of the most "popular" steps across the widest variety of operations on FTL. Some of those use cases go on to "use" the patterns, like the runtime; others may just list IDs and peek into the pattern/value only occasionally.
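
As a rough illustration of the "just list IDs" use case, the sketch below walks a source string and yields borrowed `(id, raw value)` pairs without allocating per entry. The `entries` function is hypothetical, and it assumes a drastically simplified syntax: one `id = value` message per line and `#` comments only.

```rust
/// Hypothetical scanner: yields `(id, raw_value)` slices of `source`.
/// Assumption for this sketch: every message fits on one `id = value` line.
fn entries<'s>(source: &'s str) -> impl Iterator<Item = (&'s str, &'s str)> + 's {
    source.lines().filter_map(|line| {
        if line.starts_with('#') {
            return None; // skip comment lines
        }
        // Split on the first `=`; lines without one are ignored.
        let (id, value) = line.split_once('=')?;
        Some((id.trim(), value.trim()))
    })
}

fn main() {
    let source = "# Greetings\nhello = Hello, world\nbye = Goodbye, { $name }\n";
    // Listing IDs never touches the contents of the values.
    let ids: Vec<&str> = entries(source).map(|(id, _)| id).collect();
    println!("{:?}", ids);
}
```

A consumer that only needs IDs pays nothing for the values, while one that does need a value gets the raw slice and can process it on demand.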

Having a very robust and cheap parser benefits all of the use cases, so pushing everything that is not strictly necessary out of the parser seems like a good design decision.
Whether we should prioritize it is a different question from whether we should aim for it.

I don't have ready answers to your listed questions, and I agree that they're the right ones to ask and discuss, but I do have an answer to one of them in the context of my proposal:

> Follow-up question to that, is the AST read-only or mutable?

A zero-copy parser makes this transparent: it allows us to start with a cheap, read-only semantic slicing of the source string that can be extended and mutated lazily when needed. That's where Manish suggested COW as a common mechanism for that kind of behavior.
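
A minimal sketch of that read-only-then-mutable progression, assuming a hypothetical `Message` node with `Cow` fields and a toy one-line `parse_message` (neither is the reference AST). Reading keeps the fields borrowed, `to_mut` copies a field only on first mutation, and `into_owned` detaches the whole node from the source so the original string can be dropped.

```rust
use std::borrow::Cow;

/// Hypothetical AST node whose fields start out as slices of the source.
struct Message<'s> {
    id: Cow<'s, str>,
    value: Cow<'s, str>,
}

impl<'s> Message<'s> {
    /// Copies any still-borrowed fields so the node no longer depends on the source.
    fn into_owned(self) -> Message<'static> {
        Message {
            id: Cow::Owned(self.id.into_owned()),
            value: Cow::Owned(self.value.into_owned()),
        }
    }
}

/// Toy parser, assuming a single `id = value` line.
fn parse_message(line: &str) -> Option<Message<'_>> {
    let (id, value) = line.split_once('=')?;
    Some(Message {
        id: Cow::Borrowed(id.trim()),
        value: Cow::Borrowed(value.trim()),
    })
}

fn main() {
    let owned = {
        let source = String::from("hello = Hello, world");
        let mut msg = parse_message(&source).unwrap();
        // First mutation clones only this one field; `id` stays borrowed.
        msg.value.to_mut().push('!');
        msg.into_owned()
    }; // `source` is dropped here; `owned` remains valid.
    println!("{} = {}", owned.id, owned.value);
}
```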

stasm added the FUTURE Ideas and requests to consider after Fluent 1.0 label Jul 27, 2018

stasm added the syntax label Oct 16, 2018

This was referenced Oct 19, 2018:

Remove backslash escapes from TextElement #123 (Closed)
Create explicit AST node for escape sequences #195 (Closed)

stasm mentioned this issue Feb 21, 2019:

Remove unescaped string literals from the AST #243 (Closed)
