Change AST to allow for zero-copy parsing #156

Open
zbraniecki opened this issue Jul 26, 2018 · 3 comments
Labels
FUTURE Ideas and requests to consider after Fluent 1.0 syntax

Comments

@zbraniecki
Collaborator

Zero-copy parsing can be extremely fast and brings significant memory savings.
What's more, the ability to zero-copy parse does not prevent the AST from taking ownership of the data, which allows the original string to be discarded and the AST to be transferred when needed.

In order to allow for zero-copy parsing, we'll need to introduce two changes:

  • Comments will become a vector of string slices (Rust: Vec<&str>) which will omit the comment sigil and store empty lines as "".
  • Pattern::TextElement will store raw strings, with escape sequences left unprocessed; the only escape the parser needs to recognize is \{, which must not terminate the TextElement.
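
A minimal sketch of what these two changes could look like, assuming comment lines start with `#` followed by an optional space and that a text element ends at the first unescaped `{` or at the end of the line. The names `Comment`, `parse_comment` and `parse_text_element` are hypothetical and are not taken from any existing Fluent parser:

```rust
/// Hypothetical borrowed comment node: every line is a slice of the source.
#[derive(Debug, PartialEq)]
struct Comment<'s> {
    lines: Vec<&'s str>,
}

/// Collects the consecutive `#` lines at the start of `source`.
/// No text is copied; each entry points back into `source`.
fn parse_comment<'s>(source: &'s str) -> Comment<'s> {
    let mut lines = Vec::new();
    for line in source.lines() {
        match line.strip_prefix('#') {
            // "# text" keeps only "text"; a bare "#" is stored as "".
            Some(rest) => lines.push(rest.strip_prefix(' ').unwrap_or(rest)),
            None => break,
        }
    }
    Comment { lines }
}

/// Returns the raw text element at the start of `source`: everything up to
/// the first unescaped `{` or the end of the line. The sequence `\{` is
/// skipped over and left in the slice as-is (assumption: no other escape
/// matters at this level).
fn parse_text_element(source: &str) -> &str {
    let bytes = source.as_bytes();
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'\\' if bytes.get(i + 1) == Some(&b'{') => i += 2,
            b'{' | b'\n' => break,
            _ => i += 1,
        }
    }
    // `i` is either the source length or the index of an ASCII byte,
    // so it always falls on a character boundary.
    &source[..i]
}
```

Both functions only return slices of their input, so the resulting nodes are valid for as long as the source string is kept alive.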

This would mean that we'd need a separate step to process the text element in pattern and comment when necessary. @Manishearth suggested using COW [0] to lazily resolve these two data structures into owned, unescaped, processed structures when needed.
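
As a rough illustration of the COW idea, here is a sketch of lazy unescaping with `std::borrow::Cow`. The helper name `unescape` is hypothetical, and the set of escapes it resolves (`\{` and `\\`) is an assumption made for the example; the thread itself only names `\{`:

```rust
use std::borrow::Cow;

/// Hypothetical helper: resolves `\{` and `\\` in a raw text slice.
/// Borrows the input when there is nothing to unescape and allocates
/// a new String only when an escape is actually present.
fn unescape(raw: &str) -> Cow<'_, str> {
    if !raw.contains('\\') {
        return Cow::Borrowed(raw);
    }
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            // Recognized escapes: drop the backslash.
            Some(next) if next == '{' || next == '\\' => out.push(next),
            // Anything else is kept verbatim (assumption for this sketch).
            Some(other) => {
                out.push('\\');
                out.push(other);
            }
            None => out.push('\\'),
        }
    }
    Cow::Owned(out)
}
```

Calling `.into_owned()` on the result yields a `String` that no longer depends on the source, which is one way the AST could take ownership of the data when the original string needs to be discarded.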

[0] https://doc.rust-lang.org/std/borrow/enum.Cow.html

@stasm
Contributor

stasm commented Jul 26, 2018

> This would mean that we'd need a separate step to process the text element in pattern

Can you explain what you mean by this? In 0.6, escape sequences are already stored in their raw form. Unescaping is left up to the resolver.

Furthermore, depending on the discussion in #115 and #123, we might be able to remove some escape sequences from the grammar.

@Pike
Contributor

Pike commented Jul 26, 2018

I think that, high-level, zero-copy parsing isn't a design goal for Fluent. I think that some implementations might benefit from it, but it does come with downsides. I'm not so happy about having designed compare-locales with a zero-copy parser, for example.

We shouldn't change the textual representation of a Fluent message or term in support of zero-copy parsing, I think.

I also have some more questions on the big scheme:

  • How abstract is the abstract syntax tree?
  • Do Fluent parsers need to return the AST used in the reference parser?
    -- if so, do they need to do so directly, or are intermediate representations OK?
  • Are AST nodes data containers or classes?
    -- i.e., is Comment.content data or an @property (py), .content() (rs)
  • Follow-up question to that, is the AST read-only or mutable?

These questions are all around what it actually means to implement Fluent. What's normative, what's informative, what's "we just had to write something here". That's going to be more relevant as we see people-not-us implementing Fluent.

A note on implementation details: it might be beneficial to document text-specials, since syntax highlighters probably want those. Comments might benefit from an internal refactor if we include semantic comments in a formal way. It's probably not totally random that these overlap with zero-copying.

@zbraniecki
Collaborator Author

> We shouldn't change the textual representation of a Fluent message or term in support of zero-copy parsing, I think.

I'm not sure if I agree.
In my view parsing is one of the most "popular" steps in the widest variety of operations on FTL. Some of those use cases further "use" the patterns - like runtime, others may just list IDs and peek into the pattern/value only occasionally.

Having a very robust and cheap parser benefits all of the use cases, so pushing everything that isn't strictly necessary out of the parser seems like a good design decision.
Whether we should prioritize it is a different question from whether we should aim for it.

I don't have ready answers to the questions you listed, and I agree that they're the right ones to ask and discuss, but I do have an answer to one of them in the context of my proposal:

> Follow-up question to that, is the AST read-only or mutable?

A zero-copy parser makes this transparent: it allows us to start with a cheap, read-only semantic slicing of the source string that can be extended and mutated lazily when needed. That's where Manish suggested COW as a common mechanism for that kind of behavior.
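
To make the read-only versus mutable point concrete, here is a small sketch of a node that starts as borrowed slices and is copied only when it is mutated or detached. The `Message` type and the naive single-line `parse_message` function are hypothetical and exist only for illustration; they do not reflect the real Fluent grammar:

```rust
use std::borrow::Cow;

/// Hypothetical AST node whose fields start out as slices of the source.
#[derive(Debug, Clone)]
struct Message<'s> {
    id: Cow<'s, str>,
    value: Cow<'s, str>,
}

impl<'s> Message<'s> {
    /// Copies any still-borrowed fields so the node no longer depends
    /// on the source string.
    fn into_owned(self) -> Message<'static> {
        Message {
            id: Cow::Owned(self.id.into_owned()),
            value: Cow::Owned(self.value.into_owned()),
        }
    }
}

/// Naive illustration-only parser: splits a single `id = value` line
/// without copying any text.
fn parse_message(line: &str) -> Option<Message<'_>> {
    let (id, value) = line.split_once('=')?;
    Some(Message {
        id: Cow::Borrowed(id.trim()),
        value: Cow::Borrowed(value.trim()),
    })
}
```

Under these assumptions, a tool that only lists IDs never allocates, `msg.value.to_mut()` copies just the one field being edited, and `into_owned()` lets the source string be dropped afterwards.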

@stasm stasm added the FUTURE Ideas and requests to consider after Fluent 1.0 label Jul 27, 2018
@stasm stasm added the syntax label Oct 16, 2018