The new parser segfaults when parsing invalid input #84838
The new PEG parser segfaults when parsing the attached reproducer. This is due to the fact that the exception set by |
I think we may need to test for the error indicator (and maybe PyErr_Occurred for safety) before every alternative. Something like:
diff --git a/Tools/peg_generator/pegen/c_generator.py b/Tools/peg_generator/pegen/c_generator.py
         with self.indent():
-            self.print("if (p->error_indicator) {")
-            with self.indent():
-                self.print("return NULL;")
-            self.print("}")
             self.print(f"{result_type} _res = NULL;")
             if memoize:
                 self.print(f"if (_PyPegen_is_memoized(p, {node.name}_type, &_res))")
@@ -685,6 +681,12 @@ class CParserGenerator(ParserGenerator, GrammarVisitor):
     def visit_Alt(
         self, node: Alt, is_loop: bool, is_gather: bool, rulename: Optional[str]
     ) -> None:
+        self.print("if (p->error_indicator == 1 || PyErr_Occurred()) {")
+        with self.indent():
+            self.print("p->error_indicator = 1;")
+            self.print("return NULL;")
+        self.print("}")
+
         self.print(f"{{ // {node}")
         with self.indent():
             # Prepare variable declarations for the alternative |
Indeed, that diff solves the problem |
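For readers who have not looked at the generator, here is a rough, self-contained sketch of what the proposed change does: a guard is emitted before every alternative instead of once per rule. `ToyEmitter`, `emit_alt_guard` and `emit_alt` are hypothetical names made up for this illustration, not the real `CParserGenerator` API; only the emitted C text is taken from the diff above.

```python
from contextlib import contextmanager


class ToyEmitter:
    """Hypothetical stand-in for the generator's print/indent helpers."""

    def __init__(self):
        self.lines = []
        self.level = 0

    def print(self, text=""):
        # Indent non-empty lines by the current nesting level.
        self.lines.append("    " * self.level + text if text else "")

    @contextmanager
    def indent(self):
        self.level += 1
        try:
            yield
        finally:
            self.level -= 1

    def emit_alt_guard(self):
        # Same C text as in the diff above: leave the rule early if an
        # error is already pending, and record it in the parser state.
        self.print("if (p->error_indicator == 1 || PyErr_Occurred()) {")
        with self.indent():
            self.print("p->error_indicator = 1;")
            self.print("return NULL;")
        self.print("}")

    def emit_alt(self, description):
        # The guard now precedes every alternative, not just the rule.
        self.emit_alt_guard()
        self.print(f"{{ // {description}")
        with self.indent():
            self.print("/* alternative body elided in this sketch */")
        self.print("}")


emitter = ToyEmitter()
for alt in ["NAME '=' expr", "expr"]:  # made-up alternatives
    emitter.emit_alt(alt)
generated = "\n".join(emitter.lines)
print(generated)
```

With two alternatives the guard shows up twice in the generated text, which is also why the cost of `PyErr_Occurred()` matters: it runs once per alternative attempted.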
How costly is PyErr_Occurred()? That worries me most, otherwise I’d accept this right away. |
A quick benchmark using xxl.py: Base time (master): With the patch in this issue: Sadly I could not test with PGO/LTO and without CPU isolation, so it would be great if someone could double-check these numbers. Also, I will be unable to do a PR until tonight or tomorrow morning (London time) :( |
I see almost no time difference for 'make time_stdlib': before 3.471, after 3.451. But I see a serious difference for 'make time_compile': before 3.474, after 4.996. That's over 40% slower (on the extreme case, xxl.py). I'll prepare a PR just in case. |
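As a sanity check on the "over 40%" figure, the relative change can be worked out directly from the four timings quoted above. `percent_slower` is just a helper name chosen for this sketch.

```python
def percent_slower(before: float, after: float) -> float:
    """Relative change in run time, in percent (positive means slower)."""
    return (after - before) / before * 100.0


# Timings quoted in the comment above.
time_stdlib = percent_slower(3.471, 3.451)
time_compile = percent_slower(3.474, 4.996)

print(f"time_stdlib:  {time_stdlib:+.1f}%")
print(f"time_compile: {time_compile:+.1f}%")
```

This gives roughly -0.6% for time_stdlib (within noise) and roughly +43.8% for time_compile, consistent with the numbers reported.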
I understand from Paul Ganssle that this bug was found using Hypothesmith in my stdlib property tests (reported at Zac-HD/stdlib-property-tests#14). As discussed in we-like-parsers#91 and https://pyfound.blogspot.com/2020/05/property-based-testing-for-python.html I'm keen to help out how I can, so if there's anything more specific than "write tools, write test, and wait" please let me know! Best, |
Zac: The reproducer here apparently uses a long string of weird accented characters. I'm not sure how to generalize from that to other things that Hypothes* could find. But maybe this helps: #20106 (comment) |
I don't know what else it might find either, but I still think it's worth running property-based tests in CI to find out! The demo I wrote for my language summit talk doesn't have any parser tests, but still would have caught this bug in the pull request that introduced it. The specific reproducer here is odd, because it's reported as an internal error in Hypothesmith - I use the It's structurally less complex than typical outputs because it's only a fragment of the tree being generated, but because shrinking doesn't run for generation-time errors it's also much harder to interpret than usual. |
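To give a flavour of the kind of check being discussed, here is a minimal stdlib-only sketch. It is not Hypothesmith and does no shrinking; it just feeds seeded random strings (including accented characters) to `compile()` and accepts either success or an ordinary Python exception. The alphabet, the function names and the list of accepted exception types are all assumptions made for this illustration. A segfault cannot be caught from Python, so in CI it would show up as the test process dying.

```python
import random
import warnings

# Assumed alphabet: a little syntax plus some non-ASCII letters.
ALPHABET = "abc 01()[]:=+\n\t'\"#\\éñüßøΩж"


def random_source(rng, max_len=40):
    """Build a short random string to hand to the parser."""
    length = rng.randrange(max_len)
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def check_compile_never_crashes(trials=500, seed=0):
    """Property: compile() either succeeds or raises a normal exception."""
    rng = random.Random(seed)
    outcomes = {"compiled": 0, "rejected": 0}
    with warnings.catch_warnings():
        # Invalid escape sequences and the like only warn; ignore them.
        warnings.simplefilter("ignore")
        for _ in range(trials):
            source = random_source(rng)
            try:
                compile(source, "<fuzz>", "exec")
            except (SyntaxError, ValueError, OverflowError,
                    RecursionError, MemoryError):
                outcomes["rejected"] += 1
            else:
                outcomes["compiled"] += 1
    return outcomes


if __name__ == "__main__":
    print(check_compile_never_crashes())
```

A real property-based tool generates far more structured inputs and minimises failures, so this only illustrates the shape of the property, not the coverage.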
Unfortunately, I do not understand enough about how Hypothes* works to be helpful here, other than by offering encouragement. (And please don't try to educate me. I have too many other things. Sorry.) |
I don't think that such a bug should block Python 3.9 beta1 (I'm talking about the "release blocker" priority). There is still time before 3.9 final to fix it. |
Okay, deferring the blocker. But I'll still land Lysandros' fix ASAP. |