Initial AST dump #2
Comments
Should we overall start with AST or EBNF? Fluent's EBNF is here: https://github.com/projectfluent/fluent/blob/master/spec/fluent.ebnf |
I chose this subset because I think it captures the essence of multiple valuable traits of Fluent that I would like to offer for consideration for MF 2.0:
This allows per-environment to do:
and
which addresses the part of the MF2.0 purpose of "being more flexible" (unicode-org/message-format-wg#84). In particular, it makes |
I have some comments:
I'll stop there, and hopefully some of that makes sense. I may have misunderstood things about Fluent, so please correct (and @mihnita, chime in on corrections). |
How about if we started by example, in terms of the use cases we'd like to handle? I personally find it hard to figure out whether an AST or EBNF actually supports what I'd like to do by staring at a wall of text. :) |
IMHO, examples of what we want structured comes even before that.
There is also a practical point to keeping the "computational" part of a format string separate from the human-readable (human-translatable) string. At some point (looking back to the ICU conference last October), it seemed to make sense to separate out parameter binding, values based on those parameters, and pattern matching. This matters especially because I'd like to expand the set of possible transformations beyond plural and gender into inflections, and then things get increasingly interesting.
|
The main difference, in this mini-AST, would be that then each variant could have its own comments. I don't know if there's a value to that?
Good point. We can achieve it by doing:
  #[derive(Debug, PartialEq)]
  pub struct Variant {
-     pub key: VariantKey,
+     pub key: Vec<VariantKey>,
      pub value: Pattern,
      pub default: bool,
  }
  pub enum Expression {
      InlineExpression(InlineExpression),
      SelectExpression {
-         selector: InlineExpression,
+         selector: Vec<InlineExpression>,
          variants: Vec<Variant>,
      },
  }
Does it sound good?
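To make the multi-selector proposal above easier to picture, here is a minimal sketch of how variant lookup might work once both `selector` and `key` are vectors. Everything in it is an assumption for illustration: `VariantKey` is reduced to a string wrapper, `Pattern` is replaced by a plain `String`, and the `select` function is hypothetical and not part of fluent-rs.

```rust
// Hypothetical, simplified stand-ins for the AST types in the diff above.
#[derive(Debug, PartialEq, Clone)]
pub struct VariantKey(pub String);

#[derive(Debug, PartialEq)]
pub struct Variant {
    pub key: Vec<VariantKey>, // one key per selector, in selector order
    pub value: String,        // stands in for `Pattern`
    pub default: bool,
}

/// Returns the first variant whose keys equal the resolved selector values
/// position by position, or else the first variant flagged `default`.
pub fn select<'a>(variants: &'a [Variant], resolved: &[&str]) -> Option<&'a Variant> {
    variants
        .iter()
        .find(|v| {
            v.key.len() == resolved.len()
                && v.key
                    .iter()
                    .zip(resolved.iter())
                    .all(|(k, r)| k.0.as_str() == *r)
        })
        .or_else(|| variants.iter().find(|v| v.default))
}

/// Small helper (also hypothetical) to keep example data readable.
pub fn variant(keys: &[&str], value: &str, default: bool) -> Variant {
    Variant {
        key: keys.iter().map(|k| VariantKey(k.to_string())).collect(),
        value: value.to_string(),
        default,
    }
}
```

With two selectors (say, a gender and a plural category), a call such as `select(&variants, &["feminine", "one"])` would pick the variant keyed `[feminine, one]`, and any unmatched combination would fall back to the default variant.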
They don't yet in Fluent :( We so far only got to do it via nested selectors:
and plan to get back to flattening selectors here: projectfluent/fluent#4 to get
or
I believe we should support the flattened approach in MF 2.0. |
Some high-level thoughts about the things mentioned in this thread so far:
I'd suggest starting with the data model alone. No parsing, no EBNF. I think the prototype should be a vehicle for discussion about semantics, use-cases and requirements rather than about the syntax.
It would be interesting to experiment with a different approach than the one we know from MessageFormat and Fluent where select expressions go into placeables. I mean the approach where the branching logic happens first, before patterns are defined. I call this the exploded message approach; I'm sure there are better names ;) Rather than allow
I think there is! In fact, I think it would be interesting to consider what happens if all or most data nodes can have metadata attached to them. Things like: context, comments, examples, whether it can be translated, whether it can be re-positioned in the sentence, which grammatical case is used, etc. |
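One way to picture "metadata on most data nodes" is a generic wrapper that pairs any node with an optional metadata bag. The sketch below is only an illustration under assumptions: the `Meta` fields, the `Node<T>` wrapper, and the `PatternElement` enum are all hypothetical names chosen to mirror the items listed above, not an agreed design.

```rust
/// Hypothetical metadata bag; field names are illustrative only.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Meta {
    pub context: Option<String>,
    pub comment: Option<String>,
    pub example: Option<String>,
    pub do_not_translate: bool, // assumed flag: node must be kept verbatim
    pub fixed_position: bool,   // assumed flag: node may not be moved in the sentence
}

/// Wraps any data node together with its metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct Node<T> {
    pub data: T,
    pub meta: Meta,
}

impl<T> Node<T> {
    /// Creates a node with empty (default) metadata.
    pub fn new(data: T) -> Self {
        Node { data, meta: Meta::default() }
    }

    /// Builder-style helper that attaches a comment.
    pub fn with_comment(mut self, comment: &str) -> Self {
        self.meta.comment = Some(comment.to_string());
        self
    }
}

/// Simplified pattern elements (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum PatternElement {
    Text(String),
    Placeholder(String),
}

/// A pattern in which every element can carry its own metadata.
pub type Pattern = Vec<Node<PatternElement>>;
```

Because the wrapper is generic, the same `Node<T>` could wrap variants, selectors, or whole messages, which is what would let each variant carry its own comment.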
Thinking aloud: is there a requirement that MessageFormat 2.0 be encodable as a string? If it were encoded as a struct, it seems like the parsing machinery would not even be needed; or could reuse existing generic parsers like YAML. |
I agree that it would be interesting to try that. But we need to answer the question about nested selections in such a case. What happens when you have
This may be relatively easy to represent in the data model, but may be very hard to represent in textual form. Maybe it's OK to have a more open data model, and let the textual representation be capable of expressing just some of the metadata.
We're not certain yet. For now we focus on a non-textual representation, but I expect that for Web usage we'll want a resource format, similarly to how we don't encode CSS in JSON/YAML, but rather have its own dedicated textual format. Bottom line: I think for now we should focus on the AST and data model, but the way we imagine what we want to express should take into account that one day we'll want to express it in a human-readable/writable format. |
I opened #6 to discuss AST of selectors vs placeholders. |
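As a reference point for the selectors-versus-placeholders question, the "branching first, patterns second" shape described earlier in this thread could be sketched roughly as follows. This is a guess at one possible data model, not anything agreed here: `Message`, `Part`, and `format_message` are hypothetical names, and the arguments are assumed to arrive already resolved to category strings (no plural rules are run).

```rust
use std::collections::HashMap;

/// A flat pattern element: no selection logic is allowed inside a pattern.
pub enum Part {
    Text(String),
    Placeholder(String),
}

/// Hypothetical "exploded" message: selectors live at the top level and
/// every variant is a complete, self-contained pattern.
pub struct Message {
    pub selectors: Vec<String>,                  // argument names to branch on
    pub variants: Vec<(Vec<String>, Vec<Part>)>, // (keys, flat pattern)
}

/// Picks the variant whose keys equal the selector arguments, then fills in
/// placeholders. Returns None if an argument is missing or nothing matches.
pub fn format_message(msg: &Message, args: &HashMap<&str, &str>) -> Option<String> {
    // Step 1: branching happens first, on the whole message.
    let resolved: Vec<&str> = msg
        .selectors
        .iter()
        .map(|name| args.get(name.as_str()).copied())
        .collect::<Option<Vec<&str>>>()?;
    let variant = msg
        .variants
        .iter()
        .find(|v| v.0.iter().map(String::as_str).eq(resolved.iter().copied()))?;

    // Step 2: the chosen pattern is rendered with plain substitution.
    let mut out = String::new();
    for part in &variant.1 {
        match part {
            Part::Text(text) => out.push_str(text),
            Part::Placeholder(name) => out.push_str(args.get(name.as_str()).copied()?),
        }
    }
    Some(out)
}
```

The point of the sketch is the type of `Part`: since it has no select variant, a translator always sees whole sentences, at the cost of repeating shared text across variants.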
That's a great question, and I think it's something we can answer with a prototype :) Thanks for filing #6, I'll continue there. |
I think there is value. |
TLDR: I am with stasm@ on this one
So just data model + examples to show that it works. I think that EBNF focuses too much on the syntax part. It says stuff like:
when what we want is really: foo is an array of item(s).
If we look at the EBNF doc used by
So in this respect the Rust code is more readable:
(or the same thing in proto syntax, |
I think it is. But likely not at this stage. That would have several benefits:
|
For the initial work, I suggest we take the fluent-rs AST: https://github.com/projectfluent/fluent-rs/blob/master/fluent-syntax/src/ast.rs
and design a vastly simplified subset of it that captures a single Message.
Something along the lines of: