Dangling rules #1

Open
serge1 opened this issue Sep 3, 2020 · 6 comments
Comments

@serge1
serge1 commented Sep 3, 2020

Hi,

Thank you for this effort. Having this extracted grammar is much more convenient for processing.

Would you please clarify why there are "dangling rules" in the grammar, that is, rules that are defined but not referenced from anywhere?
For example, rule [008] "c-sequence-start" is never used.

Thank you,
Serge

@perlpunk
Member

perlpunk commented Sep 3, 2020

The problem is that the spec defines rules like this:

[8]	c-sequence-start	::=	“[”

but then it simply uses [ directly instead of c-sequence-start.
I think @ingydotnet thought about fixing this in the generated grammar, but also had reasons not to.

@serge1
Author

serge1 commented Sep 3, 2020

But maybe it is worth removing the confusing rules that are not in use? I guess that, using your scripts, it is relatively simple to produce a list of them. As a bonus, the number of rules to implement will decrease.
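
To illustrate the idea of listing unused rules, here is a minimal Python sketch. It is not taken from this project's scripts: the `toy_grammar` dictionary, its `(all)`/`(any)` keys, and the function names are all hypothetical stand-ins, and the real extracted grammar may be shaped differently. The sketch treats a rule as dangling when no other rule mentions its name and it is not the start rule.

```python
def referenced_names(body, rule_names):
    """Collect every known rule name mentioned anywhere inside a rule body."""
    if isinstance(body, str):
        return {body} if body in rule_names else set()
    if isinstance(body, dict):
        found = set()
        for key, value in body.items():
            found |= referenced_names(key, rule_names)
            found |= referenced_names(value, rule_names)
        return found
    if isinstance(body, (list, tuple)):
        found = set()
        for item in body:
            found |= referenced_names(item, rule_names)
        return found
    return set()


def dangling_rules(grammar, start):
    """Return rules that are defined but never referenced by another rule."""
    names = set(grammar)
    used = {start}  # the start rule is used by definition
    for name, body in grammar.items():
        # A rule that only refers to itself still counts as dangling.
        used |= referenced_names(body, names) - {name}
    return sorted(names - used)


# Hypothetical toy grammar: literal characters are used directly,
# so the named character rules are never referenced.
toy_grammar = {
    "flow-sequence": {"(all)": ["[", "flow-entries", "]"]},
    "flow-entries": {"(any)": ["plain-scalar"]},
    "plain-scalar": "<regex placeholder>",
    "c-sequence-start": "[",
    "c-sequence-end": "]",
}

print(dangling_rules(toy_grammar, start="flow-sequence"))
```

On this toy input the two named character rules are reported, because the sequence rule writes the brackets literally instead of using the rule names.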

@ingydotnet
Member

Hi @serge1 :)

I'm glad you like this project. It should have been done many many years ago, imho.

I should point out that while Clark, Oren and I put equal time into inventing YAML, it was mostly Oren who finished the spec rules using a BNF form that I believe he made up himself. As I work through the YAML 1.2 Spec/Grammar trying to generate actual code, I am finding many things to be problematic. Hopefully this project makes everything clear for developers and produces 100% perfect parsers in every language that uses YAML.

We are working on simplifying the YAML grammar for YAML 1.3 and beyond. The YAML 1.2 grammar cannot really be changed at this point. It is what it is. And what it is is pretty horrifying. I'm adding comments throughout https://github.com/yaml/yaml-grammar/blob/master/spec-1.2.yaml to point out all the confusing parts.

We have the YAML Test Suite project here: https://github.com/yaml/yaml-test-suite/
You can see a comparison of several real parsers: https://matrix.yaml.io/ (made by @perlpunk)

Now that I have the 1.2 grammar in a YAML form I am using that to generate actual YAML parsers on this branch: https://github.com/yaml/yaml-grammar/tree/parser/parser/src

As you can see, I have 2 parsers generated, one in CoffeeScript (which I enjoy) and one in Perl. They are both running but are in their infancy. The CoffeeScript one can already parse simple YAML. The Perl one has a bug at the moment in the middle of a simple parse. You can run them using the instructions in this commit: 61bb3bd

Once we can generate perfect parsers in a dozen programming languages that pass the entire test suite, then we can start rearranging the grammar (for 1.3) to make it simpler to understand, i.e. we can make changes to the grammar and regenerate and test in every language in a matter of seconds.

@perlpunk right. On one hand it's nice to just use the plain characters in the grammar. On the other hand it's nice to define what they mean. Those rules end up being just documentation. I suspect that Oren originally used the named rules but later found it too hard to read. :-D

@serge1
Author

serge1 commented Sep 3, 2020

Thank you for your kind reply!
By no means did I propose changing the standard. My proposal is to clean up the extracted grammar used for parser generators within this project only. I don't think that having dead code in a parser is beneficial. Furthermore, it can potentially lead to ambiguity in the parser rules.
Thank you again,
Serge

@ingydotnet
Member

@serge1 Yes, true, the generator could certainly comment out the unreferenced rules in the grammar. I'll reopen this issue and do that soon. Thanks!
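
As a rough sketch of what commenting out unreferenced rules could look like, the Python snippet below emits one line per rule and prefixes the unreferenced ones with `#`. Everything here is an assumption made for illustration: the `emit_grammar` function, the one-line-per-rule output, and the sample rule bodies are hypothetical and do not describe this project's actual generator.

```python
def emit_grammar(rules, unreferenced):
    """Render rules one per line, commenting out the unreferenced ones."""
    lines = []
    for name, body in rules.items():
        line = f"{name}: {body!r}"
        if name in unreferenced:
            # Keep the rule visible as documentation, but inactive.
            line = f"# {line}"
        lines.append(line)
    return "\n".join(lines)


# Hypothetical sample rules; bodies are plain strings for simplicity.
sample_rules = {
    "c-sequence-start": "[",
    "flow-sequence": "[ flow-entries ]",
}

print(emit_grammar(sample_rules, unreferenced={"c-sequence-start"}))
```

Commenting the rule out rather than deleting it keeps its documentation value while removing it from what a parser has to implement.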

PS I just added a 3rd parser, for JavaScript, which was trivial since it is just generated from the CoffeeScript one. Of course it worked immediately. You can try it by running this command on the parser branch:

make -C parser/src/javascript test TRACE=1

ingydotnet reopened this Sep 3, 2020
@ingydotnet
Member

Sorry, forgot to push that branch. Pushed now.
