Stop parsing Junk on lines which look like Entries #211

stasm · 2018-11-07T12:48:51Z

The second commit has a few more tests for this specific change. I didn't want them to get lost in the many changes to fixtures this PR has. I'll squash it when landing.

Pike

You mentioned in email that this method would be what the tooling parsers use.

Comparing with the results of fluent-syntax fixtures, it seems they stop on blank lines, too?

Pike · 2018-11-08T10:58:43Z

test/fixtures/call_expressions.json

@@ -213,7 +213,7 @@
        {
            "type": "Junk",
            "annotations": [],
-            "content": "shuffled-args = {FUN(1, x: 1, \"a\", y: \"Y\", msg)}\n"
+            "content": "shuffled-args = {FUN(1, x: 1, \"a\", y: \"Y\", msg)}\n\n"


Asking about these ^^^

Great question! I'm not happy with the answer, but the good news is that this PR will help in making this right.

The reference fixtures in fluent-syntax are copied directly from the reference parser's tests. They're not generated by the tooling parser like the other kinds of fixtures. I've taken a note to document this in a README.

So what you're seeing in fluent-syntax/test/fixtures_reference are actually the reference parser's fixtures from Syntax 0.7.

The actual output of the tooling parser includes those trailing blank lines. You can verify that in the Playground. The fixtures_reference tests pass in fluent-syntax because the test runner explicitly ignores Junk due to its being parsed differently in Syntax 0.7. With this PR, we're getting much closer to being able to test junk too :)

spec/fluent.ebnf

Pike

I did check a bit, and found a bug in c-l.

This is technically OK, but can we implement this in a way that the EBNF comes out right? 'cause I think that line - "#" doesn't mean anything? Maybe just a junk_block_line or so that starts with a negative regex?

Pike · 2018-11-08T12:49:37Z

spec/fluent.ebnf

+ * be a beginning of a new message, term, or a comment. Any whitespace
+ * following a broken Entry is also considered part of Junk.
+ */
+Junk                ::= junk_line (junk_line - "#" - "-" - [a-zA-Z])*


This rendering to ebnf doesn't make any sense, right?

I think it does. We're using the EBNF syntax as defined in the XML spec, with an extension of allowing regexes in a few places. The XML one reads:

A - B matches any string that matches A but does not match B.

So junk_line - "#" matches a line of junk which doesn't start with a #. I think that's exactly what we want to say here.
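As a rough illustration of that reading (a sketch only, not the reference parser's code; `consume_junk` and `LOOKS_LIKE_ENTRY` are hypothetical names), the rule can be modelled in Python like this: the first line is always Junk, and each following line is consumed unless it starts with `#`, `-`, or an ASCII letter.

```python
import re

# Hypothetical helper: a line "looks like an Entry" if it starts with
# "#" (comment), "-" (term), or an ASCII letter (message).
LOOKS_LIKE_ENTRY = re.compile(r"[#\-a-zA-Z]")


def consume_junk(lines, start):
    """Return (junk_text, next_index) for Junk beginning at lines[start].

    Models: Junk ::= junk_line (junk_line - "#" - "-" - [a-zA-Z])*
    """
    end = start + 1  # the first junk_line is always consumed
    while end < len(lines) and not LOOKS_LIKE_ENTRY.match(lines[end]):
        end += 1  # blank and indented lines stay inside the same Junk
    return "".join(lines[start:end]), end


source = [
    "broken = {FUN(\n",
    "\n",
    "key = Value\n",
]
junk, next_index = consume_junk(source, 0)
print(repr(junk), next_index)
```

In this sketch the blank line after the broken entry ends up inside the Junk content, and parsing can resume at `key = Value`.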

To me, - is only usefully defined on single-character productions in the XML spec. Or, reading "matches any string that matches A but does not match B" literally: # foo\n does match junk_line, but it doesn't match #, so it's a junk line.

# foo\n does match junk_line, but it doesn't match #, so it's a junk line.

Can you rephrase this please?

To me, the EBNF in this PR clearly expresses the intent. This is already a slippery slope because we're trying to define how to parse unparsed content. I don't want to overthink it. I'm also not sure how you'd like to write this differently. I could look at a PR if you'd like to prepare one :)

I just took the literal quote from the XML spec and replaced A and B with junk_line and #, respectively. Then I tested that against a candidate line, # foo\n.

I know that you don't believe in the value of the EBNF, and I don't care much either about this one.

I just took the literal quote from the XML spec and replaced A and B with junk_line and #, respectively. Then I tested that against a candidate line, # foo\n.

Ah, I see what you mean, thanks. # matches # foo\n partially and that's enough for a negative lookahead to work here. I guess we could try to refactor this into something like sequence(and(not("#"), any_char), junk_line) but it would require special handling of blank lines inside of junk. (any_char doesn't parse newlines.) All in all, I favor the expressiveness of the approach I implemented in this PR.
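The difference between the two readings can be shown with a small Python sketch. It assumes, for illustration only, that junk_line is "any text up to and including a newline"; `excluded_literal` and `excluded_lookahead` are hypothetical names, not part of the spec or the parser.

```python
import re

# Assumed shape of junk_line for this sketch: any text up to a newline.
JUNK_LINE = re.compile(r"[^\n]*\n")


def excluded_literal(line):
    """Literal reading of A - B: excluded only if the whole line matches "#"."""
    return re.fullmatch(r"#", line) is not None


def excluded_lookahead(line):
    """Lookahead reading: excluded as soon as the line starts with "#"."""
    return line.startswith("#")


candidate = "# foo\n"
is_junk_line = JUNK_LINE.fullmatch(candidate) is not None

print(is_junk_line)                   # True
print(excluded_literal(candidate))    # False
print(excluded_lookahead(candidate))  # True
```

Under the literal reading the candidate line would still count as a junk line. Under the lookahead reading it ends the Junk, which is the behavior this PR intends.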

stasm mentioned this pull request Nov 7, 2018

Adjacent broken entries shouldn't be merged into a single Junk #185

Closed

stasm requested a review from Pike November 7, 2018 12:51

Pike reviewed Nov 8, 2018


stasm added 3 commits November 8, 2018 16:48

Stop parsing Junk on lines which look like Entries

e6a2476

squash! Add tests specific to this change

09bac61

Capitalize Junk

c054562

stasm force-pushed the consecutive-junk branch from 09c634d to c054562 Compare November 8, 2018 15:48

Pike approved these changes Nov 8, 2018


stasm merged commit bf8fff4 into projectfluent:master Nov 8, 2018

stasm deleted the consecutive-junk branch November 8, 2018 17:35

Pike mentioned this pull request Nov 9, 2018

End junk on blank lines to produce smaller Junk entries projectfluent/fluent.js#298

Closed

This was referenced Nov 13, 2018

Multiple consecutive Junk entries should be allowed projectfluent/fluent.js#248

Closed

Implement Syntax 0.8 projectfluent/fluent.js#303

Closed

stasm mentioned this pull request Nov 27, 2018

Syntax 0.8, part 5: Update reference fixtures projectfluent/fluent.js#312

Merged

