Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop parsing Junk on lines which look like Entries #211

Merged
merged 3 commits into from
Nov 8, 2018

Conversation

stasm
Copy link
Contributor

@stasm stasm commented Nov 7, 2018

Fix #185.

The second commit has a few more tests for this specific change. I didn't want them to get lost in the many changes to fixtures this PR has. I'll squash it when landing.

Copy link
Contributor

@Pike Pike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mentioned in email that this method would be what the tooling parsers use.

Comparing with the results of fluent-syntax fixtures, it seems they stop on blank lines, too?

@@ -213,7 +213,7 @@
{
"type": "Junk",
"annotations": [],
"content": "shuffled-args = {FUN(1, x: 1, \"a\", y: \"Y\", msg)}\n"
"content": "shuffled-args = {FUN(1, x: 1, \"a\", y: \"Y\", msg)}\n\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asking about these ^^^

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question! I'm not happy with the answer, but the good news is that this PR will help in making this right.

The reference fixtures in fluent-syntax are copied directly from the reference parser's tests. They're not generated by the tooling parser like the other kinds of fixtures. I've taken a note to document this in a README.

So what you're seeing in fluent-syntax/test/fixtures_reference are actually the reference parser's fixtures from Syntax 0.7.

The actual output of the tooling parser includes those trailing blank lines. You can verify that in the Playground. The fixtures_reference tests pass in fluent-syntax because the test runner explicitly ignores Junk due to its being parsed differently in Syntax 0.7. With this PR, we're getting much closer to being able to test junk too :)

spec/fluent.ebnf Outdated Show resolved Hide resolved
Copy link
Contributor

@Pike Pike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did check a bit, and found a bug in c-l.

This is technically OK, but can we implement this in a way that the EBNF comes out right? 'cause I think that line - "#" doesn't mean anything? Maybe just a junk_block_line or so that starts with a negative regex?

* be a beginning of a new message, term, or a comment. Any whitespace
* following a broken Entry is also considered part of Junk.
*/
Junk ::= junk_line (junk_line - "#" - "-" - [a-zA-Z])*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rendering to ebnf doesn't make any sense, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it does. We're using the EBNF syntax as defined in the XML spec, with an extension of allowing regexes in a few places. The XML one reads:

 A - B
    matches any string that matches A but does not match B.

So junk_line - "#" matches a line of junk which doesn't start with a #. I think that's exactly what we want to say here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, - is defined usefully only on single character productions in the XML spec. Or literally using matches any string that matches A but does not match B, # foo\n does match junkline, but it doesn't match #, so it's a junk line.

Copy link
Contributor Author

@stasm stasm Nov 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# foo\n does match junkline, but it doesn't match #, so it's a junk line.

Can you rephrase this please?

To me, the EBNF in this PR clearly expresses the intent. This is already a slippery slope because we're trying to define how to parse unparsed content. I don't want to overthink it. I'm also not sure how you'd like to write this differently. I could look at a PR if you'd like to prepare one :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just took the literal quote from the xml spec, and replaced A and B with junk_line and #, resp. And tested that against a candidate line line # foo\n.

I know that you don't believe in the value of the EBNF, and I don't care much either about this one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just took the literal quote from the xml spec, and replaced A and B with junk_line and #, resp. And tested that against a candidate line line # foo\n.

Ah, I see what you mean, thanks. # matches # foo\n partially and that's enough for a negative lookahead to work here. I guess we could try to refactor this into something like sequence(and(not("#"), any_char), junk_line) but it would require special handling of blank lines inside of junk. (any_char doesn't parse newlines.) All in all, I favor the expressiveness of the approach I implemented in this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adjacent broken entries shouldn't be merged into a single Junk
2 participants