-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop parsing Junk on lines which look like Entries #211
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mentioned in email that this method would be what the tooling parsers use.
Comparing with the results of fluent-syntax
fixtures, it seems they stop on blank lines, too?
@@ -213,7 +213,7 @@ | |||
{ | |||
"type": "Junk", | |||
"annotations": [], | |||
"content": "shuffled-args = {FUN(1, x: 1, \"a\", y: \"Y\", msg)}\n" | |||
"content": "shuffled-args = {FUN(1, x: 1, \"a\", y: \"Y\", msg)}\n\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asking about these ^^^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great question! I'm not happy with the answer, but the good news is that this PR will help in making this right.
The reference fixtures in fluent-syntax
are copied directly from the reference parser's tests. They're not generated by the tooling parser like the other kinds of fixtures. I've taken a note to document this in a README
.
So what you're seeing in fluent-syntax/test/fixtures_reference
are actually the reference parser's fixtures from Syntax 0.7.
The actual output of the tooling parser includes those trailing blank lines. You can verify that in the Playground. The fixtures_reference
tests pass in fluent-syntax
because the test runner explicitly ignores Junk
due to its being parsed differently in Syntax 0.7. With this PR, we're getting much closer to being able to test junk too :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did check a bit, and found a bug in c-l.
This is technically OK, but can we implement this in a way that the EBNF comes out right? 'cause I think that line - "#"
doesn't mean anything? Maybe just a junk_block_line
or so that starts with a negative regex?
* be a beginning of a new message, term, or a comment. Any whitespace | ||
* following a broken Entry is also considered part of Junk. | ||
*/ | ||
Junk ::= junk_line (junk_line - "#" - "-" - [a-zA-Z])* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This rendering to ebnf doesn't make any sense, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it does. We're using the EBNF syntax as defined in the XML spec, with an extension of allowing regexes in a few places. The XML one reads:
A - B
matches any string that matches A but does not match B.
So junk_line - "#"
matches a line of junk which doesn't start with a #
. I think that's exactly what we want to say here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, -
is defined usefully only on single character productions in the XML spec. Or literally using matches any string that matches A but does not match B, # foo\n
does match junkline
, but it doesn't match #
, so it's a junk line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# foo\n
does match junkline, but it doesn't match #, so it's a junk line.
Can you rephrase this please?
To me, the EBNF in this PR clearly expresses the intent. This is already a slippery slope because we're trying to define how to parse unparsed content. I don't want to overthink it. I'm also not sure how you'd like to write this differently. I could look at a PR if you'd like to prepare one :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just took the literal quote from the xml spec, and replaced A
and B
with junk_line
and #
, resp. And tested that against a candidate line line # foo\n
.
I know that you don't believe in the value of the EBNF, and I don't care much either about this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just took the literal quote from the xml spec, and replaced A and B with junk_line and #, resp. And tested that against a candidate line line # foo\n.
Ah, I see what you mean, thanks. #
matches # foo\n
partially and that's enough for a negative lookahead to work here. I guess we could try to refactor this into something like sequence(and(not("#"), any_char), junk_line)
but it would require special handling of blank lines inside of junk. (any_char
doesn't parse newlines.) All in all, I favor the expressiveness of the approach I implemented in this PR.
09c634d
to
c054562
Compare
Fix #185.
The second commit has a few more tests for this specific change. I didn't want them to get lost in the many changes to fixtures this PR has. I'll squash it when landing.