Change AST to allow for zero-copy parsing #156
Can you explain what you mean by this? In 0.6, escape sequences are already stored in their raw form, and unescaping is left up to the resolver. Furthermore, depending on the discussion in #115 and #123, we might be able to remove some escape sequences from the grammar.
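To illustrate what "unescaping is left up to the resolver" can look like, here is a minimal Rust sketch. The parser hands over the raw slice untouched, and a hypothetical resolver-side `unescape` helper returns `Cow::Borrowed` when there is nothing to do. It allocates only when it actually meets a backslash. The escape set handled here (`\\`, `\{`, `\uXXXX`) is an assumption for illustration, not a statement of the 0.6 grammar.

```rust
use std::borrow::Cow;

/// Hypothetical resolver-side helper. `raw` is the slice exactly as it
/// appears in the source. Assumed escape set (illustrative only):
/// `\\`, `\{` and `\uXXXX`; anything else is kept verbatim.
fn unescape(raw: &str) -> Cow<'_, str> {
    // Fast path: no backslash means nothing to unescape and no allocation.
    if !raw.contains('\\') {
        return Cow::Borrowed(raw);
    }
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('\\') => out.push('\\'),
            Some('{') => out.push('{'),
            Some('u') => {
                // Take up to four hex digits; if they don't form a valid
                // code point, keep the original text unchanged.
                let hex: String = chars.by_ref().take(4).collect();
                match u32::from_str_radix(&hex, 16).ok().and_then(char::from_u32) {
                    Some(ch) if hex.len() == 4 => out.push(ch),
                    _ => {
                        out.push_str("\\u");
                        out.push_str(&hex);
                    }
                }
            }
            // Unknown escape: keep the backslash and the character.
            Some(other) => {
                out.push('\\');
                out.push(other);
            }
            // Trailing lone backslash.
            None => out.push('\\'),
        }
    }
    Cow::Owned(out)
}

fn main() {
    // Plain text stays a borrowed slice of the source.
    assert!(matches!(unescape("plain text"), Cow::Borrowed(_)));
    // Escaped text is processed into an owned String on demand.
    assert_eq!(unescape(r"\{ literal brace"), "{ literal brace");
    assert_eq!(unescape(r"caf\u00e9"), "café");
}
```

The common case (text with no escapes) costs nothing beyond a scan for `\`, which is why deferring this step out of the parser is cheap.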
I think that, at a high level, zero-copy parsing isn't a design goal for Fluent. Some implementations might benefit from it, but it does come with downsides. I'm not so happy about having designed compare-locales with a zero-copy parser, for example. I think we shouldn't change the textual representation of a Fluent message or term to support zero-copy parsing. I also have some more questions about the bigger picture:
These questions are all about what it actually means to implement Fluent: what's normative, what's informative, and what's "we just had to write something here". That's going to become more relevant as we see people other than us implementing Fluent. An implementation detail: it might be beneficial to document text-specials, since syntax highlighters probably want those. Comments might benefit from an internal refactor if we include semantic comments in a formal way. It's probably no coincidence that these overlap with zero-copying.
I'm not sure I agree. Having a very robust and cheap parser benefits all of the use cases, so pushing everything that isn't strictly necessary out of the parser seems like a good design decision. I don't have ready answers to your listed questions, and I agree that they're the right ones to ask and discuss, but I do have an answer to one of them in the context of my proposal:
A zero-copy parser makes this transparent and lets us start with a cheap, read-only semantic slicing of the source string, which can be extended and mutated lazily when needed. That's where Manish suggested COW as a common mechanism for that kind of behavior.
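A minimal sketch of that COW behavior in Rust, assuming a hypothetical `TextElement` node (not the actual fluent-rs type): the node starts as a borrowed slice of the source, and `Cow::to_mut` copies the text into an owned `String` only the first time something mutates it.

```rust
use std::borrow::Cow;

/// Hypothetical AST node: a read-only view into the source until mutated.
#[derive(Debug)]
struct TextElement<'a> {
    value: Cow<'a, str>,
}

impl<'a> TextElement<'a> {
    fn new(slice: &'a str) -> Self {
        TextElement { value: Cow::Borrowed(slice) }
    }

    /// True while the node still points into the source string.
    fn is_borrowed(&self) -> bool {
        matches!(self.value, Cow::Borrowed(_))
    }

    /// `to_mut` clones borrowed data into an owned String on first use,
    /// then returns a mutable reference to it; later calls reuse it.
    fn append(&mut self, extra: &str) {
        self.value.to_mut().push_str(extra);
    }
}

fn main() {
    let source = String::from("key = Hello");
    // Slice out the value without copying.
    let mut elem = TextElement::new(&source[6..]);
    assert!(elem.is_borrowed());

    // The first mutation triggers the copy.
    elem.append(", world");
    assert!(!elem.is_borrowed());
    assert_eq!(elem.value, "Hello, world");
}
```

Readers that never mutate the node never pay for an allocation; only code paths that need to change the text do.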
Zero-copy parsing can be extremely fast and brings significant memory savings.
What's more, the ability to parse without copying does not prevent the AST from taking ownership of the data, which allows the original string to be discarded and the AST to be transferred when needed.
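One way to get both properties in Rust is sketched below, assuming a hypothetical `Message` node built from `Cow` fields and a toy `parse` function (neither is the real fluent-rs API). Parsing borrows from the source; a hypothetical `into_owned` method detaches the AST as `Message<'static>`, after which the source can be dropped and the AST moved to another thread. Only `Cow::into_owned` itself is standard library.

```rust
use std::borrow::Cow;

/// Hypothetical AST node that can either borrow from or own its text.
#[derive(Debug, Clone, PartialEq)]
struct Message<'a> {
    id: Cow<'a, str>,
    value: Cow<'a, str>,
}

impl<'a> Message<'a> {
    /// Copy any borrowed text so the node no longer depends on the source.
    fn into_owned(self) -> Message<'static> {
        Message {
            id: Cow::Owned(self.id.into_owned()),
            value: Cow::Owned(self.value.into_owned()),
        }
    }
}

/// Hypothetical toy parser: splits `id = value` without copying.
fn parse(line: &str) -> Option<Message<'_>> {
    let (id, value) = line.split_once(" = ")?;
    Some(Message { id: Cow::Borrowed(id), value: Cow::Borrowed(value) })
}

fn main() {
    let source = String::from("hello = Hello, world!");
    // Zero-copy parse, then detach the AST from the source buffer.
    let owned: Message<'static> = parse(&source).expect("valid line").into_owned();
    drop(source); // Allowed: `owned` no longer borrows from `source`.

    // A `'static` AST can be transferred to another thread.
    let handle = std::thread::spawn(move || owned.value.len());
    assert_eq!(handle.join().unwrap(), 13);
}
```

Code that keeps the source alive can skip `into_owned` entirely and work with the borrowed AST.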
In order to allow for zero-copy parsing, we'll need to introduce two changes:
1. Comments would be stored as a list of lines (`Vec<&str>`), which omits the comment sigil and stores empty lines as `""`.
2. Text elements would store escape sequences such as `\{` in their raw form; such sequences will not terminate the TextElement.

This would mean that we'd need a separate step to process text elements in patterns, and comments, when necessary. @Manishearth suggested using COW [0] to lazily resolve those two data structures into owned, unescaped, and processed structures when needed.
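The comment half of the first change can be sketched as follows. Both functions are hypothetical, and the sketch assumes a single `#` sigil followed by an optional space; Fluent's `##` and `###` comment levels are not handled. Each stored line is a slice pointing into the source, and joining into one owned string is the separate, on-demand step.

```rust
/// Hypothetical: split a comment block into borrowed lines, dropping the
/// `#` sigil (assumed: one `#` plus an optional space). An empty comment
/// line such as `#` is stored as "".
fn comment_lines(block: &str) -> Vec<&str> {
    block
        .lines()
        .map(|line| {
            let rest = line.strip_prefix('#').unwrap_or(line);
            rest.strip_prefix(' ').unwrap_or(rest)
        })
        .collect()
}

/// Hypothetical processing step, run only when a consumer needs the
/// comment as a single owned string.
fn comment_text(lines: &[&str]) -> String {
    lines.join("\n")
}

fn main() {
    let source = "# Hello\n#\n# world";
    let lines = comment_lines(source);
    assert_eq!(lines, vec!["Hello", "", "world"]);

    // The first slice points into `source`: nothing was copied.
    assert_eq!(lines[0].as_ptr(), source[2..].as_ptr());

    assert_eq!(comment_text(&lines), "Hello\n\nworld");
}
```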
[0] https://doc.rust-lang.org/std/borrow/enum.Cow.html