Refactor of the code, 0.8 compatibility, zero-copy AST/parsing, Rust 2018 goodies #76


Merged

merged 39 commits on Jan 24, 2019

Conversation

zbraniecki
Collaborator

This is a major rewrite, to the point where I'm not even sure how exactly to plan reviewing and landing it, but I think it is getting close to being done.

There's one open question about FluentBundle::add_messages which I posted in projectfluent/fluent.js#328

This, unfortunately, breaks a lot of examples/tests in fluent-bundle, since they previously passed &str freely and constructed FluentResource on the fly.
We'll need to answer that question, then clean up the code, make sure that this zero-copy abstraction works well for Amethyst, and begin reviewing it to land.

I believe that the proposed structure and memory allocation model should bring us close to the model we'll use for fluent-rs 1.0.

Add a simple binary for testing

Support menubar.ftl except for expressions

Support multiline patterns

Add EOF tests support

Support basic JSON fixtures

Support first set of comparison tests

Support most of the comments

Use serializer instead of deserializer

Uncomment some fixture tests with slight modifications

Support Junk

Use ps.take_if in more places

Support all reference fixtures

Fix benches

Support TermReference

Support variants and select expression

Update to rust 2018 preview 2

Use 2018 use extern macros

Clean up whitespace collecting

Remove try

Update bench to Rust 2018

Support parsing of 0.7 reference tests

Support CallExpressions
@zbraniecki
Collaborator Author

My initial thinking is to split this PR into 4 PRs, each updating a separate crate, and get them reviewed separately, releasing them in order fluent-syntax -> fluent-cli -> fluent-bundle -> fluent.

@zbraniecki
Collaborator Author

For fluent's ResourceManager I'll want to migrate it to use Manish's elsa - https://github.com/Manishearth/elsa

@zbraniecki
Collaborator Author

This also resolves #16

This is a massive rewrite of the whole crate.

It separates out three crates:
fluent-syntax becomes a zero-copy AST + parser (+ serializer in the future), compatible with Fluent 0.8
fluent-bundle becomes aligned with fluent-bundle in JS
fluent becomes a top-level crate with an added simple ResourceManager.

All of the code is compatible with current stable Rust 2018.
@stasm
Contributor

stasm commented Dec 27, 2018

My initial thinking is to split this PR into 4 PRs, each updating a separate crate, and get them reviewed separately, releasing them in order fluent-syntax -> fluent-cli -> fluent-bundle -> fluent.

What is the benefit of going for 4 separate packages? In my experience, too much granularity makes publishing new versions tedious and it also means that there are more permutations of packages installed in the environment at the same time, which makes it easy to run into errors due to mismatched dependencies.

I can see the benefit of publishing fluent-cli as a separate package, since that's a bin-type crate, rather than lib. I would recommend putting the other three in a single fluent crate.

@zbraniecki
Collaborator Author

What is the benefit of going for 4 separate packages?

Each package has a different set of requirements and "adds up". The fluent-syntax crate, much like in the case of JS and Python, will be useful for tooling environments which don't need anything else beyond the parse/ast/serialize cycle.
fluent-bundle adds the resolver but remains independent of any I/O, making it the right level for environments such as gaming systems or Gecko which provide their own resource management systems.
fluent is the complete package which provides its own, minimal ResourceManager, minimizing the amount of custom code one has to write to make their project localizable.
fluent-cli (or maybe fluent-cli-tools?) is a set of binaries to work with ftl/fluent which may eventually become an umbrella for a collection of separately released tools (fluent-cli-parse, fluent-cli-validator etc.).

Does it make sense?

In my experience, too much granularity makes publishing new versions tedious and it also means that there are more permutations of packages installed in the environment at the same time, which makes it easy to run into errors due to mismatched dependencies.

I think Cargo is really good with that, and the ecosystem actively encourages dividing code into many small crates, which allows others to better pick which parts they need (see the family of serde crates) and allows for parallel compilation of the crates.

I would recommend putting the other three in a single fluent crate.

It would mean that any tool would have to compile the Bundle/Resolver/ResourceManager parts of Fluent even if all they need is the FTL parser. I think that this use case is distinct enough to warrant splitting it out (much like librustc uses syntax, syntax_pos and even serialize as separate crates - https://github.com/rust-lang/rust/blob/master/src/librustc/Cargo.toml#L28).

As to whether fluent-bundle should be a separate crate, I'm less strongly opinionated.

@zbraniecki
Collaborator Author

Hooray! This branch now compiles and all tests pass! The performance is good, but can be better with time.

Initially, with no optimization work put into the parser, I'm seeing:

master:

test bench_menubar_parse ... bench:     407,840 ns/iter (+/- 32,920)
test bench_simple_parse  ... bench:      45,287 ns/iter (+/- 4,698)

zero-copy-resolver:

test bench_menubar_parse ... bench:     155,191 ns/iter (+/- 31,100)
test bench_simple_parse  ... bench:      22,147 ns/iter (+/- 15,138)

The resolver benchmarks are preserved.

I'd like to land this update soon now, to bring the rust implementation to Fluent 0.8 syntax and move to work on documentation, errors, ergonomics and higher level bits (like ResourceManager).

@zbraniecki
Collaborator Author

I asked @Manishearth to review the examples and crate split and he said that it seems like a good natural split of meta-crates.

I'd like to preserve the split between fluent and fluent-bundle and eventually consider moving the ResourceManager to its own crate like fluent-resource-manager and referencing it from the catch-all fluent crate.

@zbraniecki
Collaborator Author

@Xaeroxe - can you verify that the zero-copy-resolver branch works with Amethyst model? In particular I'd like to make sure that:

  • The zero-copy parser into fluent-bundle::resource::FluentResource which uses rental to allocate its String works well.
  • The API changes I'm making in fluent-bundle which allow libs like Amethyst to plug their own resource manager will work for amethyst-locale.
  • The POC of the fluent::resource_manager::ResourceManager is something that you'll either want to use, or replace with your own resource manager.

Thank you!

pub struct FluentResource(rentals::FluentResource);

impl FluentResource {
    pub fn try_new(source: String) -> Result<Self, (Self, Vec<ParserError>)> {
Contributor

@JohnHeitmann JohnHeitmann Jan 7, 2019

Can/should this take Into<String> for better caller ergonomics?

Collaborator Author

We considered that with @Manishearth and he recommended not doing it, in order to avoid hiding the allocation.
While ergonomically that would be nice, it would also make it harder for the reader to understand that we will allocate the String and store it inside the FluentResource.

In the future, when a static &str becomes implicitly convertible to String [0], this should be easier.

[0] rust-lang/rust#46993
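A minimal sketch of the trade-off discussed above. The FluentResource here is a simplified stand-in (not the real rental-backed struct), and try_new_into is a hypothetical name for the rejected Into<String> variant:

```rust
// Simplified stand-in for FluentResource; the real one wraps a rental struct.
struct FluentResource {
    source: String,
}

impl FluentResource {
    // Chosen design: the caller sees the allocation explicitly.
    fn try_new(source: String) -> Result<Self, ()> {
        Ok(FluentResource { source })
    }

    // Rejected alternative (hypothetical): more ergonomic, but the
    // to-String allocation is hidden inside the constructor.
    fn try_new_into(source: impl Into<String>) -> Result<Self, ()> {
        Ok(FluentResource { source: source.into() })
    }
}

fn main() {
    // Explicit: the reader sees that a String is allocated and moved in.
    let r1 = FluentResource::try_new(String::from("hello = Hello")).unwrap();
    // Hidden: the conversion/allocation happens behind the call.
    let r2 = FluentResource::try_new_into("hello = Hello").unwrap();
    assert_eq!(r1.source, r2.source);
}
```

Both produce the same resource; the disagreement is only about where the allocation is visible to the reader.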

Collaborator

Note: a thing that could be done here is to make the return value tuple into a proper public struct so it's easier to document

// XXX: In the future we'll want to get the partial
// value out of resolver and return it here.
// We also expect to get a Vec of errors out of resolver.
return Some((path.to_string(), errors));
Contributor

Can format_message be removed as part of this change? It looks like format does everything a caller might want. If not I'll add it to my doc pr.

Collaborator Author

It doesn't do everything. format allows you to get the message's value or a single attribute, but it doesn't allow you to retrieve value+attributes together.

format_message will be useful for higher level crates that will apply a localization message onto a widget/element.

Contributor

format_message will be called compound in fluent.js. Would you like to make this change in fluent-rs in this PR or in a follow-up?

@stasm
Contributor

stasm commented Jan 15, 2019

Each package has a different set of requirements and "adds up". The fluent-syntax crate, much like in the case of JS and Python, will be useful for tooling environments which don't need anything else beyond the parse/ast/serialize cycle.
fluent-bundle adds the resolver but remains independent of any I/O, making it the right level for environments such as gaming systems or Gecko which provide their own resource management systems.
fluent is the complete package which provides its own, minimal ResourceManager, minimizing the amount of custom code one has to write to make their project localizable.
fluent-cli (or maybe fluent-cli-tools?) is a set of binaries to work with ftl/fluent which may eventually become an umbrella for a collection of separately released tools (fluent-cli-parse, fluent-cli-validator etc.).

Does it make sense?

Yes, it makes sense, thanks. My question was driven by pragmatism and the following question: do we have use-cases for so much granularity? Sometimes it's better to design hierarchies based on the observed use-cases rather than on how things derive from each other logically.

However, since you already did all the work to separate them, it's probably best to keep them that way.

I think Cargo is really good with that, and the ecosystem actively encourages dividing code into many small crates, which allows others to better pick which parts they need (see the family of serde crates) and allows for parallel compilation of the crates.

This isn't as much about cargo as it is about the rigor to always update the readmes, the changelogs, the dependencies of higher-level packages etc. Having more packages to juggle also means that it's easier to create permutations of versions which we haven't thought about.

Contributor

@stasm stasm left a comment

I'll need a few more days to finish the review. Thanks for the patience. I have a few comments and questions which perhaps you could answer in the meantime.

It would indeed be easier to review this as a series of logically separate patches, as you suggested in an earlier comment. At the same time I realize that doing so would break some (or a lot of) tests, and fixing them in between the patches would be a lot of unnecessary work. In the future, I think I'd be OK with disabling tests aggressively until all parts of the refactor come together. For this PR, let's leave it as-is. (You might still want to clean up the commit history a bit, unless you're planning to squash all of it into a single commit).

I would also (strongly) prefer to land the ResourceManager in a separate PR altogether. Let's make this PR about fluent-syntax, fluent-bundle and fluent-cli.

I'll try to finish the review by the end of this week.


pub mod errors;
pub mod resolve;
pub mod resource;
pub mod types;
Contributor

Could we use pub use here to re-export the most common symbols, similar to how we do it in fluent.js?

For consumers, this would simplify their use imports:

// BEFORE
use fluent_bundle::bundle::FluentBundle;
use fluent_bundle::types::FluentValue;
use fluent_bundle::resource::FluentResource;
// AFTER
use fluent_bundle::{FluentBundle, FluentResource, FluentValue};
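A sketch of what the suggested re-exports could look like; inline modules stand in for the real bundle.rs / resource.rs / types.rs files, and the item bodies are placeholders:

```rust
mod fluent_bundle {
    pub mod bundle {
        pub struct FluentBundle; // placeholder body
    }
    pub mod types {
        #[derive(Debug, PartialEq)]
        pub enum FluentValue {
            None, // placeholder variant
        }
    }

    // Re-export the common symbols at the crate root, as suggested:
    pub use self::bundle::FluentBundle;
    pub use self::types::FluentValue;
}

fn main() {
    // Consumers can now use the short paths:
    let _b = fluent_bundle::FluentBundle;
    // The re-export and the original path name the same type:
    assert_eq!(fluent_bundle::FluentValue::None, fluent_bundle::types::FluentValue::None);
}
```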

Collaborator Author

I wanted to hold off on such convenience parts until the code stabilizes a little bit. In my mind it's easier to add it than to remove it.
Do you feel like those three structs are certain to be the ones we want to expose on the main crate?

Contributor

These are the three that we've seen used by the consumers of fluent in JS. I think it's OK to do it in a follow-up, too. Up to you.

_ => false,
};
pub fn get_slice(&self, start: usize, end: usize) -> &'p str {
    unsafe { str::from_utf8_unchecked(&self.source[start..end]) }
Contributor

What are the consequences of using unsafe and str::from_utf8_unchecked here? Does the code check somewhere that the input is valid UTF8? Perhaps we should allow the users of the library to opt into the use of the fast & unsafe code (when they are sure all input is valid UTF8), but the code should default to using str::from_utf8?

Collaborator Author

The consequence is that I'm using our collective brain power, instead of compiler's brain power, to ensure that I'm not slicing outside of the boundary.
The boundary checks are relatively expensive, and since we're a full byte parser I believe we always attempt to slice within the range.

So, it's not about whether the input is valid UTF8; it's only about whether the parser always selects a start/end combo that's within the source slice.

I'm going to use a fuzzer to further verify that claim, but I'd prefer not to bloat the lib with options that don't really mean much to the user. Currently, if you pick the wrong start/end the parser crashes, but if we instead use from_utf8 we'll likely use it with expect, which will panic as well :)

If while working on 1.0 we discover that there are a lot of places in the parser that pick start/end wrong, we can try to mitigate it here, but I'd honestly prefer to fix them in the parser.

Please note that unsafe code is not "bad" by design. It makes total sense to use it in many cases, and people like Henri are actively asking for improvements in how boundary-check requirements are calculated, to minimize when we need them: https://hsivonen.fi/rust2019/

Our problem is that we have code like:

let source = "id = Foo".as_bytes();
let mut ptr = 0;
while let Some(b) = source.get(ptr) {
    if *b == b'i' {
        if let Some(b2) = source.get(ptr + 1) {
            if *b2 == b'd' {
                // We just checked both bytes, so the slice is in bounds
                // and sits on ASCII boundaries.
                let id = unsafe { str::from_utf8_unchecked(&source[ptr..=ptr + 1]) };
            }
        }
    }
    ptr += 1;
}

(sorry for how contrived the code is). It would be insanely hard for the compiler to deduce that it can use bounds-check elision, but we can deduce it quite easily - we just checked what is at ptr, and what is at ptr + 1.

Collaborator

why not store the string as &str and use as_bytes() in your processing code everywhere, so that you can safely subslice it with byte indices?
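A minimal sketch of this suggestion: keep the source as &str, do the byte-level scanning via as_bytes(), then subslice the original &str. &str indexing is bounds- and char-boundary-checked, so a bad index panics instead of causing undefined behavior:

```rust
fn main() {
    let source: &str = "id = Foo";
    let bytes = source.as_bytes();

    // Find the end of the leading identifier by byte inspection.
    let mut end = 0;
    while let Some(b) = bytes.get(end) {
        if !b.is_ascii_alphabetic() {
            break;
        }
        end += 1;
    }

    // Subslicing the &str is checked; no unsafe needed.
    assert_eq!(&source[0..end], "id");
}
```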

Contributor

The consequence is that I'm using our collective brain power, instead of compiler's brain power, to ensure that I'm not slicing outside of the boundary.

I understand that unsafe is sometimes necessary and a good design may benefit from it. I question whether it's the right choice for the first iteration of this parser. Relying on collective brain power seems to be specifically what Rust was designed to work around, because the history shows us that brains are fallible.

The boundary checks are relatively expensive and since we're a full byte parser I believe we always attempt to slice within the range.

Just looking at how get_slice is implemented in skip_unicode_escape_sequence I was able to come up with the following test case which makes the parser panic:

x = {"\\u

I tested it with the following command:

./target/debug/fluent-cli <(echo -n 'x = {"\\u')

We should at least take special care when slicing using self.ptr + 1. With the checked version of from_utf8 we'd be forced to support the out of bounds index in the code, which would prevent the panic.

If while working on 1.0 we'll discover that we have a lot of places in the parser that pick start/end wrong we can try to mitigate it here, but I'd honestly prefer to fix them in the parser.

I'd prefer to start with the safer (even if slower) approach and only consider unsafe when we need to squeeze even more performance from the parser.

Contributor

Digging a bit deeper into this, the panic seems to originate from &self.source[start..end] rather than the unsafe call to from_utf8_unchecked. Is there a checked way to index into a slice?

Contributor

OK, so I think there are two things going on here. Can you please confirm that my understanding is correct?

  1. When creating the slice of bytes with &self.source[start..end], end might go out of bounds. We should check that it isn't or maybe use an iterator and the checked slice::get method. The x = {"\\u testcase exposes this bug.

  2. This new slice is then passed to from_utf8_unchecked which is an unsafe method. If the slice of bytes contains a sequence of bytes which is invalid in UTF8, this method will panic. Realistically this is unlikely to happen with files which are read from disk and validated as UTF8, but might still be an issue when the data is not guaranteed to be UTF8.
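A checked sketch of point 1, written as a free function over the byte source (get_slice_checked is a hypothetical name; the real method lives on ParserStream):

```rust
// Checked variant: slice::get returns None instead of panicking when the
// range is out of bounds, and str::from_utf8 validates instead of trusting.
fn get_slice_checked(source: &[u8], start: usize, end: usize) -> Option<&str> {
    let bytes = source.get(start..end)?; // None on out-of-bounds range
    std::str::from_utf8(bytes).ok()      // None on invalid UTF-8
}

fn main() {
    let src = b"x = {\"\\u"; // the crashing testcase from above
    assert_eq!(get_slice_checked(src, 0, 1), Some("x"));
    assert_eq!(get_slice_checked(src, 0, 100), None); // out of bounds, handled
}
```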

Contributor

(…) might still be an issue when the data is not guaranteed to be UTF8.

Is this even possible given that FluentResource::try_new takes a String?

Collaborator Author

The x = {"\u testcase exposes this bug.

I fixed it and reviewed other places where this may happen. Found no other so I think we're good.

This new slice is then passed to from_utf8_unchecked which is an unsafe method. If the slice of bytes contains a sequence of bytes which is invalid in UTF8, this method will panic

No, we don't care if the file is UTF8 or if it contains invalid UTF8 sequences, because the boundaries we set are validated. In other words, since Fluent tokens end on valid UTF8 chars like "\n", "{" etc., we can establish a range and know that we're slicing at the right place. What's inside doesn't really matter to us. So, if the entry looks like this:

key = This is a XXX value { $foo }

then we slice out three times:

  1. key which has a start and end known to be a valid alpha character and every character inside as well.
  2. This is a XXX value which has start as the first non-space char, and end as one-before { char, so also no risk of cutting in the middle of a multi-byte char.
  3. foo - similar to (1)

Even if in (2) the XXX is invalid UTF8, we don't care about it. It still slices properly because of what is a boundary in Fluent.

@@ -1,14 +1,842 @@
//! AST, parser and serializer operations
Contributor

Dir layout: Is there a tangible benefit to keeping the parser code in its own directory, and then the errors in parser/errors/mod.rs? For my own code, I try to avoid nested directory hierarchies. As a rule of thumb, a directory with one or two files in it is probably not worth it :) I'd recommend placing all the files in fluent-syntax/src/.

Collaborator Author

I think there's a value in keeping the parser in its own directory. This way I can easily tell which parts (ftlstream/errors) are parser specific, while ast stays crate specific (and in the future I expect serializer to get its own dir). I removed the errors dir.

Contributor

I still think that this is unnecessary proliferation of directories, but it's also a detail, so not blocking at all.

Collaborator Author

I prefer to keep it in its own directory. I'm experimenting with adding lexer.rs there as well and I like to keep parser specific code in it away from ast/serializer.

@zbraniecki
Collaborator Author

I would also (strongly) prefer to land the ResourceManager in a separate PR altogether. Let's make this PR about fluent-syntax, fluent-bundle and fluent-cli.

Done!

(You might still want to clean up the commit history a bit, unless you're planning to squash all of it into a single commit).

I do plan to squash.

I'm keeping @Manishearth's suggestion open, and I believe the rest of the feedback has been applied. I'll try to investigate his suggestion, but I believe this shouldn't block the rest of the review and can be done in a separate PR prior to release.

Contributor

@Pike Pike left a comment

I went through the parser/mod.rs this time, and I have to admit, I didn't understand all of it.

Some questions below are somewhat rust-related questions, some are just code questions.

I noticed that escapes in literals don't work yet. Also, I noticed two panics in the examples.

Ok(pattern.map(ast::Value::Pattern))
}

fn get_attributes<'p>(ps: &mut ParserStream<'p>) -> Result<Vec<ast::Attribute<'p>>> {
Contributor

Should we make explicit that this method is effectively infallible?

Collaborator Author

For now, yes. I'd like to consider improving the recovery here eventually: by recognizing that the error originated in the attribute, jumping to the next line that either is a start of a new entry or starts with ".", and if the latter, resuming parsing attributes for that entry.

This would allow us to recover from:

key = Value
    .attr = Value { $4 }
    .attr2 = Value 2

by salvaging the attr2 into key message. But that's beyond the scope of this patch, so for now, I fixed it :)

while let Some(b) = ps.source.get(ptr) {
    if ps.is_byte_alphabetic(*b) {
        ptr += 1;
    } else if ptr == ps.ptr {
Contributor

I find the start_of_id logic in the middle of the loop a tad hard to grasp. Is this actually simpler than doing an explicit first char?

Collaborator Author

Yes. Since get_identifier is such a hot loop I tested tons of variants and this is the fastest one. I'm going to switch to a more readable one for the sake of landing it sooner, and we can reevaluate perf optimizations later.

TextElementTermination::LineFeed,
));
}
b'\r' if ps.is_byte_at(b'\n', ps.ptr + 1) => {
Contributor

Can this go out-of-bounds for a stream ending in \r? No idea what happens in rust if that happens.

Contributor

@stasm stasm Jan 22, 2019

I think this is OK. ParserStream::is_byte_at uses the checked slice::get method which returns None whenever the unchecked one would panic.

};

if !is_valid {
//XXX: Generalize error type
Contributor

Or the opposite, make the error type more specific?

named,
}
} else {
return error!(ErrorKind::ForbiddenCallee, ps.ptr);
Contributor

Does rust do early returns? I.e, could we just return an error on invalid, and not indent the valid part?

.version("1.0")
.about("Parses FTL file into an AST")
.args_from_usage(
"-s, --silence 'disable output'
Contributor

Nit: call it --silent and use sentence case for the help text, Disables the output, to make it consistent with the rest of the usage description.

Contributor

Also, what is the goal of this option? It looks like it disables all output. I would expect it would silence the error reporting but still produce the AST.

Collaborator Author

I think I copied it from the JS impl and the goal was to be able to time the parser without timing the formatting and printing of the output. I changed it to only disable errors for now.

.about("Parses FTL file into an AST")
.args_from_usage(
"-s, --silence 'disable output'
<INPUT> 'Sets the input file to use'",
Contributor

Nit: Call it <FILE> and change the help text to Fluent file to parse.

if self.ch == Some(ch) {
    self.next();
    return true;
pub fn take_if(&mut self, b: u8) -> bool {
Contributor

Nit: Perhaps rename this to take_byte or consume_byte for symmetry with expect_byte?

Collaborator Author

expect_* is a fallible method that returns an error. I'll rename it to take_byte_if.


Contributor

@stasm stasm left a comment

Another round of comments. I'll finish the review today, I'd like to take another look at the ResolveValue trait.

I believe this shouldn't block the rest of review and can be done in a separate PR prior to release.

I don't mean to block this PR, but I really think there's something not right about the use of unsafe in ParserStream. Would you agree to use the checked variant of from_utf8 in this PR to land it without controversies, and then discuss switching to unsafe in a follow-up?

}
}
pub fn is_byte_digit(&self, b: u8) -> bool {
    b >= b'0' && b <= b'9'
Contributor

Use u8::is_ascii_digit here?

return Ok('-');
}
pub fn is_byte_alphabetic(&self, b: u8) -> bool {
    (b >= b'a' && b <= b'z') || (b >= b'A' && b <= b'Z')
Contributor

Use u8::is_ascii_alphabetic here?
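The std helpers suggested in these two comments are available on u8 in stable Rust, so both predicates could delegate to them. A quick sketch of the behavior:

```rust
// is_byte_digit / is_byte_alphabetic could delegate to u8's built-in
// ASCII predicates instead of hand-rolled range checks.
fn main() {
    assert!(b'7'.is_ascii_digit());
    assert!(!b'a'.is_ascii_digit());
    assert!(b'z'.is_ascii_alphabetic());
    assert!(b'Z'.is_ascii_alphabetic());
    assert!(!b'{'.is_ascii_alphabetic());
}
```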

_ => false,
};
pub fn get_slice(&self, start: usize, end: usize) -> &'p str {
unsafe { str::from_utf8_unchecked(&self.source[start..end]) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The consequence is that I'm using our collective brain power, instead of compiler's brain power, to ensure that I'm not slicing outside of the boundary.

I understand that unsafe is sometimes necessary and a good design may benefit from it. I question whether it's the right choice for the first iteration of this parser. Relying on collective brain power seems to be specifically what Rust was designed to work around, because the history shows us that brains are fallible.

The boundary checks are relatively expensive and since we're a full byte parser I believe we always attempt to slice within the range.

Just looking at how get_slice is implemented in skip_unicode_escape_sequence I was able to come up with the following test case which makes the parser panic:

x = {"\\u

I tested it with the following command:

./target/debug/fluent-cli <(echo -n 'x = {"\\u')

We should at least take special care when slicing using self.ptr + 1. With the checked version of from_utf8 we'd be forced to support the out of bounds index in the code, which would prevent the panic.

If while working on 1.0 we'll discover that we have a lot of places in the parser that pick start/end wrong we can try to mitigate it here, but I'd honestly prefer to fix them in the parser.

I'd prefer to start with the safer (even if slower) approach and only consider unsafe when we need to squeeze even more performance from the parser.
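A minimal sketch of the safer variant under discussion (a hypothetical free-function version, not the PR's method): checked slicing plus checked UTF-8 decoding turn both failure modes into a `None` that the caller must handle.

```rust
// Sketch: slice::get returns None for an out-of-bounds range instead
// of panicking, and str::from_utf8 rejects invalid UTF-8 instead of
// risking undefined behavior via from_utf8_unchecked.
fn get_slice(source: &[u8], start: usize, end: usize) -> Option<&str> {
    source
        .get(start..end)
        .and_then(|bytes| std::str::from_utf8(bytes).ok())
}

fn main() {
    let source = b"x = {\"\\u";
    assert_eq!(get_slice(source, 0, 1), Some("x"));
    // An end index past the buffer no longer panics:
    assert_eq!(get_slice(source, source.len(), source.len() + 1), None);
}
```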

Contributor

@stasm stasm left a comment

OK, I think I'm mostly done with the review. It's a lot of code; thanks for the patience. I'd like to see a version of this PR with my comments addressed. The most important ones are:

  1. Comment the enums and their role in get_pattern.
  2. Add protection against out-of-bounds indexing into ps.source. (And fix the bug I found yesterday.)
  3. Sometimes the code uses unwrap in places where I imagine panics are not possible if the AST was produced by the parser. Please consider starting a conversation about what we expect from this code when that assumption is not satisfied.
  4. File issues for the things we know are follow-up material.

Thanks!

match self {
ast::InlineExpression::StringLiteral { raw } => {
// XXX: We need to decode the raw into unicode here.
Ok(FluentValue::from(*raw))
Contributor

Please file an issue about this and reference it here.

Ok(FluentValue::from(*raw))
}
ast::InlineExpression::NumberLiteral { value } => {
Ok(FluentValue::as_number(*value).unwrap())
Contributor

I assume that you're using unwrap here (and in a few more places) because the as_number conversion cannot fail if the AST node comes from the parser. Any invalid input would be caught by the parser as a syntax error. This poses an interesting question about the goals of the resolver: what should it do with AST nodes that have been tinkered with in code? I think it's totally OK to expect them not to resolve properly. However, I'm not sure the program should panic in such cases. Do you have thoughts on this?

Collaborator Author

Not really. I see this patchset as bringing the parser to its current state, and I expect us to start cleaning up the resolver only after that, so I was kind of OK with introducing half-baked unwraps like this. I'll file a follow-up for them.
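One possible shape for that follow-up, sketched with a hypothetical stand-in for FluentValue (not the crate's actual type): a node that was tinkered with in code degrades to its source text instead of panicking.

```rust
// Sketch: instead of unwrapping the number conversion, fall back to
// the raw source text so hand-crafted AST nodes resolve to something
// rather than panic.
#[derive(Debug, PartialEq)]
enum FluentValue<'s> {
    String(&'s str),
    Number(f64),
}

fn resolve_number_literal(value: &str) -> FluentValue<'_> {
    match value.parse::<f64>() {
        Ok(n) => FluentValue::Number(n),
        // A node produced by the parser can't reach this arm; a node
        // tinkered with in code resolves to its source text instead.
        Err(_) => FluentValue::String(value),
    }
}

fn main() {
    assert_eq!(resolve_number_literal("3.14"), FluentValue::Number(3.14));
    assert_eq!(
        resolve_number_literal("not-a-number"),
        FluentValue::String("not-a-number")
    );
}
```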

ref positional,
ref named,
} => {
let mut resolved_unnamed_args = Vec::new();
Contributor

Perhaps rename this to resolved_positional_args?

},
ptr
);
} else if ps.is_byte_digit(*b) || [b'_', b'-'].contains(&b) {
Contributor

Nit: you don't need &b in contains() here.
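The nit follows from `slice::contains` taking `&T`; a minimal standalone sketch:

```rust
// Sketch: iterating bytes by reference already yields &u8, which is
// exactly the &T that contains() expects, so no extra & is needed.
fn main() {
    for b in b"x_-y".iter() {
        let is_sep = [b'_', b'-'].contains(b);
        assert_eq!(is_sep, *b == b'_' || *b == b'-');
    }
}
```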

#[derive(Debug)]
enum PatternElementPointers<'a> {
Placeable(ast::Expression<'a>),
TextElement(usize, usize, usize, TextElementPosition),
Contributor

Nit: maybe call it PatternElementPlaceholders?

}

#[derive(Debug, PartialEq)]
enum TextElementTermination {
Contributor

All the enums here need docs, which explain the why's of this code. I suggested how to improve the docs of CarriageReturn in a comment below.

Contributor

Can TextElementTermination be combined with TextElementPosition?

Collaborator Author

I'm not sure, and I'd prefer not to dig into rethinking of get_pattern now. Is it ok not to block on this?

Contributor

Sure.

#[derive(Debug, PartialEq)]
enum TextElementType {
Blank,
NonBlank,
Contributor

I'm not entirely sure why Blank vs. NonBlank are needed. Please add a comment.

if let Some(last_non_blank) = last_non_blank {
let elements = elements
.into_iter()
.take(last_non_blank + 1)
Contributor

Why +1?

Collaborator Author

Because of how take works. If last_non_blank is 0, we need to take 1.

Contributor

Yeah, but why? :) What does last_non_blank keep track of?

Collaborator Author

It keeps track of which element is the last non-blank one. Let's imagine this scenario:

key = This is
  multiline value

  and it still works


key2 = Value 2

For key it produces the following TextElements:

("This is\n", "   multiline value\n", "\n", "  and it still works\n", "\n", "\n")

So, we want to take only elements 0..=3. Since we already filter out blank lines, it may be better to remove that variable and rely on that filtering alone. I'll look into this after landing this :)

Collaborator Author

#81
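The trimming logic described in this thread can be sketched as a standalone example (not the parser's actual code): last_non_blank is an *index*, while `Iterator::take` wants a *count*, hence the `+ 1`.

```rust
// Sketch: drop trailing blank text elements from a pattern by taking
// everything up to and including the last non-blank element.
fn main() {
    let elements = vec![
        "This is\n",
        "   multiline value\n",
        "\n",
        "  and it still works\n",
        "\n",
        "\n",
    ];
    // rposition finds the index of the last non-blank element.
    let last_non_blank = elements
        .iter()
        .rposition(|e| !e.trim().is_empty())
        .unwrap();
    assert_eq!(last_non_blank, 3);
    // take() wants a count, so index 3 means take(4): elements 0..=3.
    let trimmed: Vec<&str> = elements.into_iter().take(last_non_blank + 1).collect();
    assert_eq!(trimmed.len(), 4);
    assert_eq!(trimmed.last(), Some(&"  and it still works\n"));
}
```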

}
}

fn get_literal<'p>(ps: &mut ParserStream<'p>) -> Result<ast::InlineExpression<'p>> {
Contributor

There are only two literals in Fluent spec: StringLiteral and NumberLiteral. In fluent.js we called this function getSimpleExpression.

@@ -0,0 +1,11 @@
FTL_FIXTURES := $(wildcard *.ftl)
Contributor

Remove this file please.

Zibi Braniecki added 2 commits January 23, 2019 13:37
@zbraniecki
Copy link
Collaborator Author

Applied all feedback.

OK, I think I'm mostly done with the review. It's a lot of code; thanks for the patience. I'd like to see a version of this PR with my comments addressed. The most important ones are:

1. Comment the enums and their role in `get_pattern`.

Added.

2. Add protection against out-of-bounds indexing into `ps.source`. (And fix the bug I found yesterday.)

Removed the unsafe code for now, fixed the bug and verified that no other place uses get_slice with pointers that are not already checked to be in range.

3. Sometimes the code uses `unwrap` in places where I imagine panics are not possible if the AST was produced by the parser. Please consider starting a conversation about what we expect from this code when that assumption is not satisfied.

#80

4. File issues for the things we know are follow-up material.

#79

@zbraniecki zbraniecki requested a review from stasm January 23, 2019 22:39
Contributor

@stasm stasm left a comment

Thanks for applying the feedback, @zbraniecki! Nice work on this refactor, let's ship it.

match res {
Ok(res) => print_entries_resource(&res),
Err((res, errors)) => {
print_entries_resource(&res);

if matches.is_present("silence") {
Contributor

Does this need to be called silent now?

@@ -186,7 +189,7 @@ impl<'p> ParserStream<'p> {
}

pub fn get_slice(&self, start: usize, end: usize) -> &'p str {
unsafe { str::from_utf8_unchecked(&self.source[start..end]) }
str::from_utf8(&self.source[start..end]).expect("Slicing the source failed")
Contributor

Thanks for removing the unsafe block here, at least for now. Would you mind filing a follow-up for discussing re-adding it, please? I'd like to look into it a bit more, given that we're likely protected from invalid UTF-8 input by the sheer fact of accepting a String into FluentResource::try_new.

Collaborator Author

#82
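For reference, a sketch of the invariant being discussed: once the input arrives as a String/&str, its bytes are valid UTF-8 by construction, so a checked from_utf8 on a sub-slice can only fail by cutting through a multi-byte character.

```rust
// Sketch: boundary-aligned slices of valid UTF-8 always round-trip;
// the failure mode the checked version catches is a slice that splits
// a multi-byte character.
fn main() {
    let source = String::from("key = värde");
    let bytes = source.as_bytes();
    // A boundary-aligned slice of valid UTF-8 decodes fine:
    assert_eq!(std::str::from_utf8(&bytes[0..3]).unwrap(), "key");
    // Cutting through the middle of the two-byte 'ä' is rejected
    // instead of producing an invalid &str:
    assert!(std::str::from_utf8(&bytes[7..8]).is_err());
}
```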

//
// CRLF variant is specific because we want
// to skip it in text elements production.
// For example `a\r\nb` will produce
Contributor

There's a missing space before b in the example here. Also (uber-nit), what's up with the narrow line width?

// proper state for the next line.
//
// CRLF variant is specific because we want
// to skip it in text elements production.
Contributor

…we want to skip CR but keep the LF….
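The CRLF behavior described in this comment can be illustrated with a minimal standalone sketch (not the parser's actual implementation):

```rust
// Sketch: in text-element production the CR of each CRLF pair is
// dropped while the LF is kept.
fn normalize_crlf(input: &str) -> String {
    input.replace("\r\n", "\n")
}

fn main() {
    assert_eq!(normalize_crlf("a\r\nb"), "a\nb");
    // A lone CR is not part of a CRLF pair and is left untouched:
    assert_eq!(normalize_crlf("a\rb"), "a\rb");
}
```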

// is blank or not.
// This is important to identify text elements
// which should not be taken into account
// when calculating common indent.
Contributor

Ah, nice, I understand the purpose of those enums better now, thanks!

@zbraniecki zbraniecki merged commit 12cc561 into projectfluent:master Jan 24, 2019
@zbraniecki zbraniecki deleted the copy-free-resolver branch January 24, 2019 20:16