-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error generating Gocc using Gocc #13
Comments
gocc3 has not been worked on for a few years. I think "gocc3 user guide will be published shortly" should be removed from the manual. @goccmack would have to comment on what the primary changes would have been between gocc2 and gocc3. I can't remember. I think you can dive into the gocc code base confident that big changes are not going to happen anytime soon and that gocc is here to stay. |
gocc3 included two major changes:
I am of the opinion that EBNF encourages grammars which are outside LR(1). Most EBNF grammars I wrote had lots of LR(1) conflicts. I would advise against adding EBNF to gocc again. When I have the time again I would like to add Pager's PGM again because it generates smaller tables for full LR(1) grammars -- similar in size to LALR for the same grammar. I also think that the original paper is beautiful and that it is a pity that LALR became popular instead of this technique. The PGM parse table generator was working in gocc3 but not extensively tested. I recalled gocc3 because of errors in the translation of EBNF to BNF for parser generation, which I didn't consider worth fixing because I peferred to revert to BNF input. |
Hi Walter and Marius, Thanks for providing me with some background regarding Gocc 3. From what I understood, version 3 added EBNF which turned out to encourage grammars outside of LR(1), and is therefore not intended for inclusion in any future release of Gocc. It is good that you mentioned about Pager's PGM method, and that it generates tables within the same range as LALR, because I was thinking of working on adding LALR support for Gocc in the future. Maybe, I'll read Pager's paper instead. As you already have experience with the algorithm, it would probably be easier for you to add support for it to Gocc 2, but I'd still be happy to learn as I go :) I understand if both of you are busy, so I just wanted to ask one thing. How much work would you estimate is required to patch Gocc, or the grammar of Gocc 2 to be able to generate itself once more? Is the current master branch considered to be Gocc 2? The the code for Gocc 3 ever appear on the public repository? Cheers |
Hi Robin, The errors you got when you ran gocc on its own BNF were for the lexical elements which are not declared the grammar. gocc2 still uses a hand written scanner. If you look at the gocc BNF you will see that the lexical definitions are commented out. If I recall correctly I made gocc3 from gocc2 in about 3 weeks, but that included the PGM state machine generator and major refactoring of the package structure and AST. I also refactored the parse table generation so that I could use it for both Knuth and PGM state machines. gocc3 is still in the repo. The last release candidate is labelled "gocc3 rc3". SHA Id: 1e6bdfb To modify gocc to compile itself would require first of all to modify the BNF to contain the lexical elements used by the syntax part. Then modify main.go to use the generated lexer instead of the current scanner. There were one or two issues with name clashes between lexical elements in the target grammar and those used in gocc2 for generating the new gocc. You may have to rename things like char_lit to char_literal. I don't remember the details anymore but it wasn't very difficult. The PGM parser generator is in the gocc3 code base. If you look in package parser/lrk you will find knuth and pgm packages, which generate the respective state machines. Parser generation is in package parser/lrk/codegen. The reason for the elaborate package structure is that I prepared to add support for parser generation in other languages too. parser/lrk/codegen is where that would plug in. If you want to tackle this project I suggest you start by reading both codebases to understand the differences and then work on gocc3. Remove the gocc3 EBNF rewriting and use the gocc2 BNF for input. By the end of the project you will understand LR(1) code generation very well! Pager's paper isn't freely available. Here's the reference: It was published about a year after Knuths paper on LR(1) and a bit before the first LALR paper. I find PGM much easier to understand than most LALR descriptions. I bought the paper for the price of a small African farm from Springer who owns Acta Informatica's archives. Pager had a PhD student who evaluated his two LR(k) table compression methods against LALR and newer methods LR(2) methods. Perhaps I can send you more detail in a private email. Regards, |
Oh and Robin, I assume you will fix the lexer backtracking as part of such project:) |
llir/llvm@4c184db was supposed to reference #33 not #13. Sorry for the noise. |
I get the following errors when trying to generate Gocc from the gocc2.bnf grammar, using the gen.sh script.
The gocc2.bnf grammar contains the following TODO which seems related.
It is possible that the grammar has been left in an incomplete state, in wait for Gocc version 3. As mentioned in the readme (gocc3 user guide will be published shortly). Any news with regards to Gocc 3? What is the primary changes from version 2 to 3? Has the grammar been updated?
I'd love to dive into the Gocc code base and get thoroughly accustomed to it. Just want to make sure, I don't spend too much time reading code which has been made obsolte by a yet to be released version 3.
Cheers /u
The text was updated successfully, but these errors were encountered: