-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
is it okay for the D grammar to compile in 10 minutes and use 45 gigs of ram? #294
Comments
What version of Pegged are you using? If it is a beta, can you compare against v0.4.4 please? |
@veelo i'm on For completeness sake, i tried |
I appended
and ran But this generates the parser at compile-time, which is probably the reason why this takes so long. With bigger grammars it is better to use |
OK, I don't really know what we should do with this example. As it is now, the example is incomplete and merely defines a string enum, and there is not much point to test this in CI. I have no idea why this example is so demanding, FYI regarding your observation "leaks memory like crazy": this is considered a feature of the compiler, which refrains from freeing memory by default. The |
oh, i thought that like
My bad, dmd taking all that time made me think that there's something somehow stuck and using memory, want to note too that the resulting executable is 20Mb, isn't that bizarre? |
Yes, something is going on here that would be interesting to investigate. However, I myself don't consider it worth my time. I think this example grew out of a curiosity of Philippe to see how far the capabilities of Pegged stretch. It was never complete, is now out of date, and also has some extra experiments (such as macros). It is uncertain to me whether the complete D grammar can in fact be accurately defined as a PEG. And if possible, what point is there now that dmd can be used as a library? In my eyes resources are better spent on improving things for grammars that are in actual use, if necessary. Pegged is very heavy on templates, and it is well known that templates can result in considerable bloat in the executable. There are individuals that think they can develop a better template engine in the compiler (Stefan Koch) but as usual the problem is time (funding). Pegged itself also bears signs of unfinished experiments and unneeded complexity, possibly things can be improved a bit by a cleanup. |
@veelo really appreciate the input, when i asked that something might be wrong i meant a problem that prevailed in the |
Your concerns are legitimate, and there is no guarantee that you won't run into problems. As usual with volunteer-run Open Source projects, you may have to put in an unexpected amount of extra work, but at least you have the possibility to do that. I ran into a fundamental problem myself (left-recursion) and had to do what felt like moving a mountain, but succeeded at last. Now left-recursion is not a problem anymore. I ended up looking after Pegged as a maintainer, but it is just one link in a chain of tools that I use and hence does not receive my full attention. Out of curiosity, is your planned grammar that of an existing language, or will you be developing it as you go? PEGs are not a good match for every language, due to its use of prioritised choice. Pegged does support a longest match alternator to combat that limitation, but that breaks the linear time promise of a pure PEG. |
@veelo totally understand the oss and that no maintainer is obligated to do nothing if he/she doesn't feel like doing it, i was just sharing a concern nothing more, and if i get deep enough into pegged i'll be happy to contribute back, about the language, It's a C-like language that i am developing, the reason i picked PEGs is that coding recursive decent parsers gets really tedious after a while, and you end up with so much moving parts that even small changes become a hassle, so i started looking for a better, context free grammars in general didn't fit because more often than not they have to deal with ambiguity in weird ways, but PEGs on the other hand are recursive decent naturally. One more thing that made me consider it and feel safe that i am not making a stupid choice is that Python is moving to PEG for it's parser/lexer/ast combo and the author made a pretty nice argument for that decision. So yeah will see how it goes |
Interesting reference, nice to see they tackled the left-recursion challenge as well. Since you are developing a language as a PEG, I bet you'll have a much better experience than trying to convert an existing language into a PEG. Welcome on board! (The D language specification is reverse engineered from a manually written lexer+parser, and converting that into a PEG is a different game.) |
Why is PEG unsuitable for the D grammar?
I'm new to parser generators and combinators, and to PEGs too. My understanding is that the theory behind it models Recursive Descent parsers. Can you share your thoughts from that the time on where in the D grammar a PEG would fail? I've also been looking at the DMD implementation of the lexer/parser but it's quite big. Thank you :) cc @veelo |
Hi @munael. I think delimited strings is one aspect of the D grammar that is impossible to capture in a typical PEG parser generator: string hello = q"HELLO
This is a HELLO delimited string.
Double quotes " may be unbalanced!
HELLO"; This is because the delimiter is unknown until parsing is halfway through the expression to be parsed. However, Pegged is not typical, and I recently found out that you can use a user defined parser for these kinds of language constructs: #342. This is cool! So nowadays I would nuance myself and say that it may be possible to create a Pegged generated parser for the entire D language, but I can't be certain until somebody does it. Another aspect is mixins, which may be tricky. However, Pegged is currently not implemented optimally regarding speed, and trumping DMD in terms of speed is a tall order. So if anybody would be interested in working on this, the motivation would be something else than speed. And then there will always be the issue of keeping it up to date. |
That's a good example 👀 Semantic actions suggest no, but semantic actions act after rules and can't create their own captures as far as I understand it. The implementation may support it, but it doesn't look like a proper PEG. I would've expected delimited strings to be handled at the lexer level, to be honest. A custom parser could act as one, but what would the pre-parsed tokens in a delimited string look like? |
Yes. https://scg.unibe.ch/assets/download/lectures/cc/CC-09-PEGs.pdf
Correct.
A PEG parser is a lexer and parser in one, so there is no lexer level. |
Compiling the
dgrammer.d
takes close to 10 minutes and apparently leaks memory like crazy, with dmd taking 45 gigs of ramI am compiling on osx and here're the stats
Ram:
Pegged Version:
TIme:
dmd/dub Version:
The text was updated successfully, but these errors were encountered: