Add support for left recursion #123

Merged: 14 commits, merged on Sep 22, 2023

@sc07kvm commented Jun 24, 2023

This PR is the first step towards the implementation of #120.
The implementation details are borrowed from https://github.com/we-like-parsers/pegen_experiments/tree/master/pegen

Fixes: #120
It also fixes #79

@breml (Collaborator) left a comment

I skimmed this PR and I am far from through with it (in particular, I have not yet started looking into the parts for the left recursion handling). I have some general questions, though:

  • Can you please provide a general summary of how the left recursion handling is implemented?
  • Why do all the generated parsers change even if the flag support-left-recursion is not set to true? Can you please explain the changes.
  • How does this change impact the performance of the generated parsers (optimized and not optimized)? A good example for this is the json parser in examples/json.
  • Can you please check how the documentation would need to be changed (e.g. README.md, doc.go)?

Additionally, I left some minor initial comments.

test/left_recursion/left_recursion.peg (outdated)
test/left_recursion/left_recursion.peg (outdated)
test/left_recursion/left_recursion.peg (outdated)
test/max_expr_cnt/maxexpr.peg (outdated)
go.mod (outdated)
builder/scc_test.go (outdated)
Makefile
ast/ast.go (outdated)
builder/left_recursion_test.go (outdated)
builder/left_recursion_test.go

@sc07kvm (Author) commented Jun 25, 2023

> Can you please provide a general summary of how the left recursion handling is implemented?

For a left-recursive rule, we apply the rule in a loop, memoizing the result of its previous application. On the first iteration, the rule is treated as having failed. We stop as soon as an iteration no longer produces a longer match.

It is unlikely that I can describe it better than Guido van Rossum:

Priority support is implemented by keeping the rules and expressions being parsed on a stack. That way, when a new choice is made, we can check which choice was made last time.
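The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code pigeon generates: it handles the hypothetical grammar `Expr <- Expr '-' Num / Num`, and the names `parser`, `expr`, `exprBody` and `memoEntry` are made up for the example. The memo entry for `Expr` at a position starts out as a failure; each pass re-runs the rule body, whose recursive `Expr` reference is answered from the memo, and the loop stops once a pass no longer consumes more input.

```go
package main

import "fmt"

// memoEntry records the last result of Expr at a given start position.
type memoEntry struct {
	val int  // value computed so far
	end int  // input position just after the match
	ok  bool // whether the rule matched at all
}

// parser handles the hypothetical grammar:  Expr <- Expr '-' Num / Num
type parser struct {
	input string
	memo  map[int]memoEntry
}

// num matches one or more decimal digits starting at pos.
func (p *parser) num(pos int) (int, int, bool) {
	start, v := pos, 0
	for pos < len(p.input) && p.input[pos] >= '0' && p.input[pos] <= '9' {
		v = v*10 + int(p.input[pos]-'0')
		pos++
	}
	return v, pos, pos > start
}

// expr applies the rule body in a loop, growing the memoized result.
func (p *parser) expr(pos int) (int, int, bool) {
	if m, found := p.memo[pos]; found {
		// Recursive (or repeated) call: answer from the memo.
		return m.val, m.end, m.ok
	}
	// Seed: on the first pass, Expr is considered to have failed here.
	p.memo[pos] = memoEntry{}
	for {
		val, end, ok := p.exprBody(pos)
		last := p.memo[pos]
		// Stop once a pass no longer yields a longer match.
		if !ok || (last.ok && end <= last.end) {
			return last.val, last.end, last.ok
		}
		p.memo[pos] = memoEntry{val: val, end: end, ok: true}
	}
}

// exprBody is one application of:  Expr '-' Num / Num
func (p *parser) exprBody(pos int) (int, int, bool) {
	if l, end, ok := p.expr(pos); ok && end < len(p.input) && p.input[end] == '-' {
		if r, end2, ok := p.num(end + 1); ok {
			return l - r, end2, true
		}
	}
	return p.num(pos)
}

func main() {
	p := &parser{input: "10-3-2", memo: map[int]memoEntry{}}
	fmt.Println(p.expr(0)) // 5 6 true
}
```

Because each pass extends the previous result by exactly one `'-' Num`, `10-3-2` is grouped as `(10-3)-2`, which is the left associativity the grammar expresses.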

@sc07kvm (Author) commented Jun 25, 2023

> Why do all the generated parsers change even if the flag support-left-recursion is not set to true? Can you please explain the changes.

I tweaked the code a bit, so there are now fewer changes. The remaining changes are needed to integrate left recursion support more conveniently.

Major change:

```go
func (p *parser) parseRule(rule *rule) (any, bool) {
	// debug code
	// memoize code
	// parse rule
	// debug code
	// memoize code
	return val, ok
}
```

becomes:

```go
func (p *parser) parseRuleWrap(rule *rule) (any, bool) {
	// debug code
	// choose the right parsing method
	var (
		val any
		ok  bool
	)
	if ... {
		val, ok = p.parseRule(rule)
	} else if ... {
		val, ok = p.parseRuleMemoize(rule)
	} else if ... {
		...
	}
	// debug code
	return val, ok
}

func (p *parser) parseRule(rule *rule) (any, bool) {
	// parse rule
}

func (p *parser) parseRuleMemoize(rule *rule) (any, bool) {
	// memoize code
	val, ok := p.parseRule(rule)
	// memoize code
	return val, ok
}
```
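To make the refactoring concrete, here is a small, self-contained sketch of the same split. The types `rule`, `memoKey`, `memoResult` and the helper `matchByte` are hypothetical stand-ins, not pigeon's generated types; only the method names mirror the outline above. The wrapper keeps the debug code in one place and chooses between plain and memoized parsing, so a further strategy (such as the left recursion loop) can be added as another branch without touching `parseRule`.

```go
package main

import "fmt"

// rule is a hypothetical stand-in for the generated rule type.
type rule struct {
	name    string
	memoize bool                        // decided at generation time
	parse   func(p *parser) (any, bool) // the rule's expression
}

type memoKey struct {
	pos  int
	rule string
}

type memoResult struct {
	val any
	ok  bool
	end int // position after the rule ran
}

type parser struct {
	input string
	pos   int
	debug bool
	memo  map[memoKey]memoResult
	calls int // how often a rule body actually ran
}

// parseRuleWrap holds the shared concerns (debug output) and picks a strategy.
func (p *parser) parseRuleWrap(r *rule) (any, bool) {
	if p.debug {
		fmt.Println("enter", r.name)
	}
	var (
		val any
		ok  bool
	)
	if r.memoize {
		val, ok = p.parseRuleMemoize(r)
	} else {
		val, ok = p.parseRule(r)
	}
	if p.debug {
		fmt.Println("exit", r.name, ok)
	}
	return val, ok
}

// parseRule only parses: no debug or memoization code.
func (p *parser) parseRule(r *rule) (any, bool) {
	p.calls++
	return r.parse(p)
}

// parseRuleMemoize adds a memo lookup/store around parseRule.
func (p *parser) parseRuleMemoize(r *rule) (any, bool) {
	key := memoKey{pos: p.pos, rule: r.name}
	if res, found := p.memo[key]; found {
		p.pos = res.end
		return res.val, res.ok
	}
	val, ok := p.parseRule(r)
	p.memo[key] = memoResult{val: val, ok: ok, end: p.pos}
	return val, ok
}

// matchByte builds a hypothetical rule matching the single byte c.
func matchByte(name string, c byte, memoize bool) *rule {
	return &rule{name: name, memoize: memoize, parse: func(p *parser) (any, bool) {
		if p.pos < len(p.input) && p.input[p.pos] == c {
			p.pos++
			return string(c), true
		}
		return nil, false
	}}
}

func main() {
	p := &parser{input: "a", memo: map[memoKey]memoResult{}}
	r := matchByte("A", 'a', true)
	p.parseRuleWrap(r)
	p.pos = 0 // simulate backtracking to the same position
	p.parseRuleWrap(r)
	fmt.Println(p.calls) // 1
}
```

Running it prints `1`: the second call at the same position is served from the memo, so the rule body runs only once.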

@sc07kvm (Author) commented Jun 26, 2023

> How does this change impact the performance of the generated parsers (optimized and not optimized)? A good example for this is the json parser in examples/json.

Performance decreased by ~2% on examples/json.

Before (master):

```
goos: darwin
goarch: arm64
pkg: github.com/mna/pigeon/examples/json
BenchmarkPigeonJSONNoMemo-10                         157           7561956 ns/op         5003014 B/op     139782 allocs/op
BenchmarkPigeonJSONMemo-10                            49          24026827 ns/op        26645190 B/op     184036 allocs/op
BenchmarkPigeonJSONOptimized-10                      447           2694039 ns/op         2770617 B/op      69738 allocs/op
BenchmarkPigeonJSONOptimizedGrammar-10               164           7232733 ns/op         4998874 B/op     139647 allocs/op
BenchmarkStdlibJSON-10                              8139            138571 ns/op           74200 B/op       1010 allocs/op
PASS
ok      github.com/mna/pigeon/examples/json     7.893s
```

After (this branch):

```
goos: darwin
goarch: arm64
pkg: github.com/mna/pigeon/examples/json
BenchmarkPigeonJSONNoMemo-10                         153           7679382 ns/op         5003028 B/op     139781 allocs/op
BenchmarkPigeonJSONMemo-10                            45          24330618 ns/op        26642812 B/op     184029 allocs/op
BenchmarkPigeonJSONOptimized-10                      439           2711443 ns/op         2770550 B/op      69737 allocs/op
BenchmarkPigeonJSONOptimizedGrammar-10               162           7397897 ns/op         4998891 B/op     139647 allocs/op
BenchmarkStdlibJSON-10                              8037            139898 ns/op           74205 B/op       1010 allocs/op
PASS
ok      github.com/mna/pigeon/examples/json     8.478s
```
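The size of the regression can be checked with a little arithmetic on the ns/op column. This sketch assumes that the first run above is the baseline (master) and the second is this branch; the numbers are copied from the output, and the `bench` type and `pctChange` helper are illustrative only.

```go
package main

import "fmt"

// bench pairs the ns/op figures of one benchmark from the two runs.
type bench struct {
	name          string
	before, after float64 // ns/op
}

// pctChange returns the relative change from before to after, in percent.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	results := []bench{
		{"PigeonJSONNoMemo", 7561956, 7679382},
		{"PigeonJSONMemo", 24026827, 24330618},
		{"PigeonJSONOptimized", 2694039, 2711443},
		{"PigeonJSONOptimizedGrammar", 7232733, 7397897},
		{"StdlibJSON", 138571, 139898},
	}
	for _, r := range results {
		fmt.Printf("%-28s %+.2f%%\n", r.name, pctChange(r.before, r.after))
	}
}
```

This gives roughly +1.55% (NoMemo), +1.26% (Memo), +0.65% (Optimized) and +2.28% (OptimizedGrammar). StdlibJSON, which does not depend on pigeon at all, moved by about +0.96%, so part of the difference is likely run-to-run noise; repeating the benchmarks with `go test -bench . -count 10` and comparing the runs with `benchstat` (golang.org/x/perf/cmd/benchstat) would give a more reliable figure.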

@sc07kvm (Author) commented Jun 26, 2023

> Can you please check how the documentation would need to be changed (e.g. README.md, doc.go)?

Added documentation to doc.go and main.go.

@breml (Collaborator) commented Jun 26, 2023

> For a left-recursive rule, we apply the rule in a loop, memoizing the result of its previous application. On the first iteration, the rule is treated as having failed. We stop as soon as an iteration no longer produces a longer match.

If I understand this correctly, left recursion support is only possible if memoization is available. If so, the flags -optimize-parser and -support-left-recursion are not compatible, since -optimize-parser removes memoization support. Is this correct?

@breml (Collaborator) left a comment

I have now gone through the complete PR, and I have to admit that, after this first full read, I am not able to fully follow how it works. This currently makes me feel somewhat unsure about how to proceed. In the end, I need to be confident in this code in order to maintain it in the future.

Additional review comment:

  • I think map[...]bool can be replaced everywhere with map[...]struct{}, which is more memory efficient.

test/max_expr_cnt/maxexpr.peg (outdated)
doc.go (outdated)
doc.go (outdated)
builder/left_recursion_test.go
builder/builder.go (outdated)
builder/static_code.go (outdated)
builder/static_code.go (outdated)
builder/static_code.go (outdated)
builder/static_code.go (outdated)
builder/static_code.go (outdated)

@breml (Collaborator) commented Jun 27, 2023

@sc07kvm One more idea for gaining insight into the robustness of this implementation, as well as its coverage of edge cases, would be to have two grammars that parse the same input (e.g. arithmetic expressions), one with left recursion and one without, and then use fuzzing to compare the results of the two.
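A minimal sketch of that differential idea, assuming two hand-written parsers instead of two pigeon-generated ones: `parseLeftRec` handles the left-recursive grammar `Expr <- Expr '-' Digit / Digit` with a seed-growing loop, and `parseLoop` handles the equivalent `Expr <- Digit ('-' Digit)*`, folding left. Both names are hypothetical. A seeded random loop stands in for Go's native fuzzing; in a real `_test.go` file, the same comparison would go inside `f.Fuzz(func(t *testing.T, in string) { ... })` in a `FuzzXxx(f *testing.F)` function (Go 1.18+).

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// result of parsing a prefix of the input.
type result struct {
	val, end int
	ok       bool
}

func digit(in string, pos int) result {
	if pos < len(in) && in[pos] >= '0' && in[pos] <= '9' {
		return result{int(in[pos] - '0'), pos + 1, true}
	}
	return result{}
}

// parseLeftRec: Expr <- Expr '-' Digit / Digit  (seed-growing loop)
func parseLeftRec(in string) (int, int, bool) {
	memo := map[int]result{}
	var expr func(pos int) result
	expr = func(pos int) result {
		if r, found := memo[pos]; found {
			return r
		}
		memo[pos] = result{} // first attempt counts as a failure
		for {
			var r result
			if l := expr(pos); l.ok && l.end < len(in) && in[l.end] == '-' {
				if d := digit(in, l.end+1); d.ok {
					r = result{l.val - d.val, d.end, true}
				}
			}
			if !r.ok {
				r = digit(in, pos)
			}
			last := memo[pos]
			if !r.ok || (last.ok && r.end <= last.end) {
				return last // no longer match: keep the previous result
			}
			memo[pos] = r
		}
	}
	r := expr(0)
	return r.val, r.end, r.ok
}

// parseLoop: Expr <- Digit ('-' Digit)*  with a left fold in the "action".
func parseLoop(in string) (int, int, bool) {
	r := digit(in, 0)
	if !r.ok {
		return 0, 0, false
	}
	val, pos := r.val, r.end
	for pos < len(in) && in[pos] == '-' {
		d := digit(in, pos+1)
		if !d.ok {
			break
		}
		val, pos = val-d.val, d.end
	}
	return val, pos, true
}

func main() {
	rng := rand.New(rand.NewSource(1))
	const alphabet = "0123456789--x"
	for i := 0; i < 10000; i++ {
		var b strings.Builder
		for n := rng.Intn(10); n > 0; n-- {
			b.WriteByte(alphabet[rng.Intn(len(alphabet))])
		}
		in := b.String()
		v1, e1, ok1 := parseLeftRec(in)
		v2, e2, ok2 := parseLoop(in)
		if v1 != v2 || e1 != e2 || ok1 != ok2 {
			fmt.Printf("mismatch on %q: (%d %d %v) vs (%d %d %v)\n", in, v1, e1, ok1, v2, e2, ok2)
			return
		}
	}
	fmt.Println("no mismatches")
}
```

Subtraction is a useful choice here because it is not associative: a rewrite that accidentally became right-recursive (`Expr <- Digit '-' Expr / Digit`) would evaluate `9-3-2` to 8 instead of 4, and the comparison would flag it immediately.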

Another angle that worries me a little is that, at the moment, we do not have grammar examples combining left recursion with other features like throw-recover and state, so it is currently unknown whether this implementation behaves as expected in these cases.

Replace map[string]bool with map[string]struct{}
Add msgs in tests
Correct typos
Add testify library license
Fix flaky test

@sc07kvm (Author) commented Jun 28, 2023

> > For a left-recursive rule, we apply the rule in a loop, memoizing the result of its previous application. On the first iteration, the rule is treated as having failed. We stop as soon as an iteration no longer produces a longer match.
>
> If I understand this correctly, left recursion support is only possible if memoization is available. If so, the flags -optimize-parser and -support-left-recursion are not compatible, since -optimize-parser removes memoization support. Is this correct?

Conceptually yes, memoization is needed.

However, if both -optimize-parser and -support-left-recursion are used and the grammar contains left recursion, the memoization code is still added to the parser, and memoization is applied to only one rule in each cycle of the grammar.

I added a test for this case: test/left_recursion/optimized

testutils/testutils.go (outdated)
test/left_recursion/optimized/left_recursion_test.go (outdated)
Makefile (outdated)
doc.go (outdated)
doc.go (outdated)
builder/scc_test.go

@sc07kvm (Author) commented Jul 5, 2023

> Another angle that worries me a little is that, at the moment, we do not have grammar examples combining left recursion with other features like throw-recover and state, so it is currently unknown whether this implementation behaves as expected in these cases.

I need some time to do this

@sc07kvm (Author) commented Jul 6, 2023

> Another angle that worries me a little is that, at the moment, we do not have grammar examples combining left recursion with other features like throw-recover and state, so it is currently unknown whether this implementation behaves as expected in these cases.

Added:

  • ./test/left_recursion_state/ - a modified copy of ./test/left_recursion/, modeled on ./test/state
  • ./test/left_recursion_labeled_failures/ - a modified copy of ./test/labeled_failures/
  • ./test/left_recursion_thrownrecover/ - a modified copy of ./test/thrownrecover/

@breml (Collaborator) commented Jul 15, 2023

@sc07kvm I have not forgotten you or this PR. I am currently abroad and therefore only rarely online. I will try to have another look, but I cannot promise anything. I am sorry for the inconvenience.

@breml (Collaborator) commented Aug 26, 2023

@sc07kvm I am back at work and have checked this PR again. Thanks for all the work you have put into this. Would you mind giving me a brief summary of the current state of this PR? Do you have any open points you are working on, or is this PR ready from your point of view?

@sc07kvm (Author) commented Aug 29, 2023

> @sc07kvm I am back at work and have checked this PR again. Thanks for all the work you have put into this. Would you mind giving me a brief summary of the current state of this PR? Do you have any open points you are working on, or is this PR ready from your point of view?

In my opinion, the PR is ready. I did everything I wanted to do in it, and there are no unanswered questions about it. The status is as follows: support for left recursion has been implemented, a check for the presence of left recursion has been implemented, and tests have been written for left recursion in combination with all other grammar features.

@breml (Collaborator) commented Aug 30, 2023

> In my opinion, the PR is ready. I did everything I wanted to do in it, and there are no unanswered questions about it. The status is as follows: support for left recursion has been implemented, a check for the presence of left recursion has been implemented, and tests have been written for left recursion in combination with all other grammar features.

Thanks for the summary, this all sounds very good. I went through the PR again; this is one of the biggest PRs ever for pigeon (at least in recent times), and therefore I am still a little concerned about possible negative side effects.

To get some feedback from the community, I would like to ask all of you following this PR, as well as some power users of pigeon (@mna, @xcoulon, @flowchartsman), to give this PR a spin and report back whether everything still works as intended for your use cases. Thank you very much.

@breml (Collaborator) commented Aug 30, 2023

I just successfully tested pigeon built from this branch with:

@mna (Owner) commented Aug 30, 2023

Hey @breml , thanks for reaching out, I understand your concerns as this is a huge PR. Thanks to @sc07kvm for the significant contribution! I personally haven't used pigeon in a long time and I'm afraid I don't have any additional use-cases to check outside the test suite.

I think a valid question is whether or not left recursion is a big enough problem to add that complexity in the parser (vs fixing/adjusting the grammar)? I sincerely don't know the answer to that, left recursion has generally not been an issue in my grammars but then again they've been more or less fully under my control. And I haven't followed up on the previous discussions, so apologies if that has already been discussed at length (and no disrespect to the huge effort made to implement this! The answer may very well be that it's worth adding this, and I guess it's the likely answer at this point since there's been a good amount of back and forth from what I can see).

Sorry I cannot be of more help testing this.

@breml (Collaborator) commented Aug 31, 2023

> Hey @breml , thanks for reaching out, I understand your concerns as this is a huge PR. Thanks to @sc07kvm for the significant contribution! I personally haven't used pigeon in a long time and I'm afraid I don't have any additional use-cases to check outside the test suite.

@mna Thanks for your feedback, this is always appreciated.

> I think a valid question is whether or not left recursion is a big enough problem to add that complexity in the parser (vs fixing/adjusting the grammar)? I sincerely don't know the answer to that, left recursion has generally not been an issue in my grammars but then again they've been more or less fully under my control. And I haven't followed up on the previous discussions, so apologies if that has already been discussed at length (and no disrespect to the huge effort made to implement this! The answer may very well be that it's worth adding this, and I guess it's the likely answer at this point since there's been a good amount of back and forth from what I can see).

While it is true that grammars can always be rearranged so that they contain no left recursion, it is also true that this often comes at the cost of grammars that are less natural to read and therefore harder to maintain. So adding this complexity to pigeon actually removes complexity for its users.

As you have mentioned, we have already put quite some effort into testing and refining this PR. I am seriously considering adding this, but before doing so, I would like to have very high confidence that this does not break existing code.
Maintenance and bug fixing of this code could also become an issue due to the increased complexity.

@mna (Owner) commented Aug 31, 2023

> While it is true that grammars can always be rearranged so that they contain no left recursion, it is also true that this often comes at the cost of grammars that are less natural to read and therefore harder to maintain. So adding this complexity to pigeon actually removes complexity for its users.

Totally agree with you @breml .

> I am seriously considering adding this, but before doing so, I would like to have very high confidence that this does not break existing code.
> Maintenance and bug fixing of this code could also become an issue due to the increased complexity.

Yeah, that makes sense to me. The fact that pigeon is generally used as a tool (and not imported as a package) makes it harder to automate some kind of "testing in the wild", as no importers are reported on pkg.go.dev.

One thing that turned up a reasonable number of repos using pigeon is this GitHub search: https://github.com/search?q=pigeon+-o+language%3AGo+&type=code

This could be a starting point to automate testing the changes on a wide variety of grammars.

@breml (Collaborator) commented Sep 19, 2023

tl;dr: tests with pigeon built from the left recursion branch: ✅ 17 ➖ 13 ❌ 0
Bottom line: I was not able to find any blocker caused by the changes in pigeon, which gives me considerable confidence in releasing this PR.

Today I ran some tests with (popular) packages that use pigeon. I used this search https://github.com/search?utf8=%E2%9C%93&q=generate+pigeon+peg+path%3A*.go&type=code to find packages that have the keywords generate, pigeon and peg in one of their .go files.

Then I basically worked with the following sequence:

  • git clone
  • if the Go version in go.mod is < 1.18, update it (needed for `any`)
  • if no go.mod file: go mod init && go mod tidy
  • go test ./...
  • go generate ./...
  • verify that the changed file(s) are the generated parsers
  • go test ./...

This is based on the assumption that the packages have tests covering the correct behavior of the generated parser.

These are the results:

https://github.com/bytesparadise/libasciidoc - worked after updating the Go version in go.mod (for `any`)
https://github.com/kiteco/kiteco-public - failed to execute go test
https://github.com/flowchartsman/aql - worked after manually updating one of the tests, since pigeon returns a new error message (flowchartsman/aql#2)
https://github.com/hashicorp/go-bexpr
https://github.com/frankbraun/asciiart - worked after adding a go.mod file
https://github.com/owncloud/ocis
https://github.com/Workiva/frugal
https://github.com/mmcloughlin/addchain - worked after updating the Go version in go.mod (for `any`)
https://github.com/pinpt/go-common - failed to execute go test
https://github.com/lanl/QA-Prolog - no tests
https://github.com/samuel/go-thrift - worked after updating the Go version in go.mod (for `any`)
https://github.com/d4l3k/wikigopher - go test failed on the master branch, and generating the parser failed both with the version from this branch and with the latest master. This is a pity, since it would have been an interesting case: the PEG grammar is more than 2400 lines long
https://github.com/pinpt/go-common - failed to execute go test
https://github.com/rigetti/openapi-cli-generator - tests only worked for the parser sub-package; updating the Go version in go.mod (for `any`) was necessary
https://github.com/lanl/edif2qmasm - no tests
https://github.com/eiffel-community/eiffel-goer - no tests
https://github.com/tcard/queson-go - worked after updating the Go version in go.mod (for `any`)
https://github.com/fermuch/telemathings-analog-parser - failed to execute go test on the master branch
https://github.com/sylvinus/ifql - failed to execute go test ./...
https://github.com/philandstuff/dhall-golang - tests only worked for the parser sub-package; updating the Go version in go.mod (for `any`) was necessary
https://github.com/jacobsimpson/msh - worked with the new version of pigeon, but the tests failed on the original checkout 🤔
https://github.com/symbolicsoft/verifpal
https://github.com/tmc/graphql - failed to execute go test ./...
https://github.com/jacobsimpson/mp3tag - worked after updating the Go version in go.mod (for `any`)
https://github.com/SerenityHellp/thrift_parser_lib - tests only worked for the parser sub-package; updating the Go version in go.mod (for `any`) was necessary
https://github.com/vivekmurali/km - failed to execute go test ./...
https://github.com/Yelp/opa - failed to execute go test ./...
https://github.com/jacobsimpson/jt - failed to execute go test ./...
https://github.com/mmcloughlin/ec3 - tests only worked for the parser sub-package; updating the Go version in go.mod (for `any`) was necessary
https://github.com/mmcloughlin/ssarules - worked after updating the Go version in go.mod (for `any`)

Result:
✅ 17 ➖ 13 ❌ 0

Conclusion:
For none of the tested packages where the tests could be executed successfully did switching to the pigeon version from this branch (with left recursion support) break the tests available for the respective package.
For a surprisingly large number of packages, it was not straightforward to just execute the tests.
Bottom line: I was not able to find any blocker caused by the changes in pigeon, which gives me considerable confidence in releasing this PR.

Any other thoughts?

@mna (Owner) commented Sep 21, 2023

👍 Agree with the confidence boost that this provides.

@breml merged commit 35d4f9c into mna:master on Sep 22, 2023

@breml (Collaborator) commented Sep 22, 2023

🚀 It is merged! Thank you very much @sc07kvm for your great effort to make this happen. ❤️

A new release will follow soon.

flowchartsman added a commit to flowchartsman/aql that referenced this pull request Sep 26, 2023
Addresses issues mentioned in #2 necessary for testing mna/pigeon#123
- remove old go generate directive from package parser
- bring test in line with newer error message
- update to mna/pigeon@latest & regenerate parser.
flowchartsman added a commit to flowchartsman/aql that referenced this pull request Oct 4, 2023
Addresses issues mentioned in #2 necessary for testing mna/pigeon#123
- remove old go generate directive from package parser
- bring test in line with newer error message
- update to mna/pigeon@latest & regenerate parser.
Successfully merging this pull request may close these issues:

  • [Feature Request] Left recursion support
  • Should not allow left recursion grammar to pass conversion