Avoid error handling duplication for starred, yield, lambda expressions #10809
Merged: dhruvmanila merged 5 commits into dhruv/parser from dhruv/deduplicate-expr-error-handling, Apr 9, 2024
dhruvmanila force-pushed the dhruv/deduplicate-expr-error-handling branch from 0a27639 to 909d497 (April 7, 2024 03:54)
dhruvmanila force-pushed the dhruv/parser branch from ef330fb to d8cd78c (April 7, 2024 11:23)
dhruvmanila force-pushed the dhruv/deduplicate-expr-error-handling branch from 909d497 to 46b8821 (April 7, 2024 11:37)
MichaReiser approved these changes, Apr 8, 2024
crates/ruff_python_parser/tests/snapshots/invalid_syntax@expressions__if__recover.py.snap (resolved review thread)
dhruvmanila force-pushed the dhruv/deduplicate-expr-error-handling branch from 0ce341d to 1698b32 (April 9, 2024 08:48)
This was referenced Apr 9, 2024
dhruvmanila added a commit that referenced this pull request, Apr 10, 2024

## Summary

This PR removes a couple of inline test TODOs that were left over from #10809
dhruvmanila added a commit that referenced this pull request, Apr 11, 2024

…ns (#10809)

## Summary

This PR updates the error handling logic for certain expressions so that it is either performed automatically or exposed to the caller as an option. The expressions in question are `lambda`, starred, and `yield` expressions.

### Problem

The current parser allows these expressions in arbitrary contexts. This is because they are parsed by `parse_lhs_expression`, which is part of other higher-level grammar rules. The caller therefore needs to validate the parsed expression and report an error if it isn't allowed in that context. This gets cumbersome because it has to be done at every call site of the following methods:

1. `parse_expression_list`: 14 references
2. `parse_star_expression_list`: 4 references
3. `parse_star_expression_or_higher`: 8 references
4. `parse_named_expression_or_higher`: 10 references
5. `parse_conditional_expression_or_higher`: 25 references
6. `parse_simple_expression`: 4 references

The number next to each method is its reference count as of today. The list also follows the hierarchy of grammar precedence. For example, `parse_expression_list` calls into `parse_conditional_expression_or_higher` but not the other way around.

### Solution

We'll take the above expressions one at a time to understand the solution:

#### Lambda expression

Lambda expressions are only allowed in the `expression` grammar rule, which corresponds to `parse_conditional_expression_or_higher`. This means that a lambda expression is only allowed when using any of the following functions:

1. `parse_expression_list`
2. `parse_star_expression_list`
3. `parse_star_expression_or_higher`
4. `parse_named_expression_or_higher`
5. `parse_conditional_expression_or_higher`

The solution is to move the error handling into `parse_simple_expression` and parameterize it, so that each of the functions listed above always passes `AllowLambdaExpression::Yes`.
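As a rough illustration of the lambda handling described above, here is a minimal sketch, not the actual `ruff_python_parser` code. The `Token`, `Expr`, and `Parser` types are toy stand-ins invented for this example, and the definition of `AllowLambdaExpression` and the signature of `parse_simple_expression` are assumptions; only the names themselves come from the description.

```rust
/// Whether a lambda expression is allowed in the current context.
/// The name comes from the PR description; the definition is assumed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AllowLambdaExpression {
    Yes,
    No,
}

/// Toy token type (hypothetical; the real lexer is far richer).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Token {
    Lambda,
    Name,
}

/// Toy expression node (hypothetical).
#[derive(Debug, PartialEq, Eq)]
enum Expr {
    Lambda,
    Name,
    Missing,
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
    errors: Vec<String>,
}

impl Parser {
    fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, pos: 0, errors: Vec::new() }
    }

    /// Lowest layer: accepts any expression, including `lambda`.
    fn parse_lhs_expression(&mut self) -> Expr {
        let expr = match self.tokens.get(self.pos).copied() {
            Some(Token::Lambda) => Expr::Lambda,
            Some(Token::Name) => Expr::Name,
            None => {
                self.errors.push("expected an expression".to_string());
                return Expr::Missing;
            }
        };
        self.pos += 1;
        expr
    }

    /// The single place where a misplaced lambda is reported.
    /// The node is still returned so that parsing can continue.
    fn parse_simple_expression(&mut self, allow_lambda: AllowLambdaExpression) -> Expr {
        let expr = self.parse_lhs_expression();
        if expr == Expr::Lambda && allow_lambda == AllowLambdaExpression::No {
            self.errors
                .push("lambda expression cannot be used here".to_string());
        }
        expr
    }

    /// Higher-level rule: always allows lambda, so callers need no check.
    fn parse_conditional_expression_or_higher(&mut self) -> Expr {
        self.parse_simple_expression(AllowLambdaExpression::Yes)
    }
}
```

In this sketch, a caller that goes through `parse_conditional_expression_or_higher` never records a lambda error, while a direct call to `parse_simple_expression(AllowLambdaExpression::No)` records exactly one and still returns the lambda node.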
#### Starred expression

There are two grammar rules related to starred expressions:

1. `star_expression`, which corresponds to `parse_star_expression_or_higher`
2. `starred_expression`, which is parsed in LHS parsing

Remember that LHS parsing isn't accessed directly but only via the functions listed in the Problem section. Starred expressions are allowed in a lot of places, but sometimes in a limited capacity. For example, an assignment target can have a starred expression but only if it is a name node (`*x`).

The solution here is to adopt the approach used in star pattern matching, which is to use a parameter. The following functions are parameterized:

1. `parse_expression_list`
2. `parse_named_expression_or_higher`
3. `parse_conditional_expression_or_higher`

`parse_star_expression_list` and `parse_star_expression_or_higher` aren't parameterized because they handle the `star_expression` grammar, which means that the caller wants to parse a starred expression but with a limited precedence.

#### Yield expression

Yield expressions are only allowed in the following contexts:

1. Top level as a yield statement
2. Parenthesized
3. F-string expression
4. Assignment (including annotated and augmented) value

We could parameterize this in the same way as starred expressions, but that seems wasteful given the limited number of locations where they're allowed.

The solution is to add a `parse_yield_expression_or_else` method which parses a yield expression if the parser is at a `yield` token, or else calls the given method to parse the expression. The call site would look like:

```rs
// (yield_expr | named_expression)
self.try_parse_yield_expression()
    .unwrap_or_else(|| self.parse_named_expression_or_higher())

// (yield_expr | star_expressions)
self.try_parse_yield_expression()
    .unwrap_or_else(|| self.parse_star_expression_list())
```

An added benefit is that the call site looks exactly like the grammar.

## Review

* The reviewer would mainly just look at the de-duplication logic.
* The reviewer doesn't really need to verify the call sites as they're verified by existing test cases. Nodes which aren't yet tested will be covered in their own PRs.

## Test Plan

Run existing test cases and verify the snapshot updates. Additional test cases will be added when working on specific nodes.
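The call-site pattern from the Yield expression section can be sketched end to end as follows. This is a toy model under stated assumptions, not the real parser: `Token`, `Expr`, `Parser`, and `parse_parenthesized_element` are hypothetical, and the signatures of `try_parse_yield_expression` and `parse_yield_expression_or_else` are guesses based only on the names used in the description.

```rust
/// Toy token type (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Token {
    Yield,
    Name,
}

/// Toy expression node (hypothetical).
#[derive(Debug, PartialEq, Eq)]
enum Expr {
    Yield,
    Name,
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, pos: 0 }
    }

    fn at(&self, token: Token) -> bool {
        self.tokens.get(self.pos) == Some(&token)
    }

    /// Returns `Some` only when the parser is at a `yield` token.
    /// Assumed signature; nothing is consumed on `None`.
    fn try_parse_yield_expression(&mut self) -> Option<Expr> {
        if self.at(Token::Yield) {
            self.pos += 1;
            Some(Expr::Yield)
        } else {
            None
        }
    }

    /// Stand-in for the real rule: consumes one token if there is one.
    fn parse_named_expression_or_higher(&mut self) -> Expr {
        if self.pos < self.tokens.len() {
            self.pos += 1;
        }
        Expr::Name
    }

    /// Generic form: parse `yield` if present, else run the fallback.
    fn parse_yield_expression_or_else<F>(&mut self, parse_other: F) -> Expr
    where
        F: FnOnce(&mut Self) -> Expr,
    {
        match self.try_parse_yield_expression() {
            Some(expr) => expr,
            None => parse_other(self),
        }
    }

    /// (yield_expr | named_expression), written as at the call sites.
    fn parse_parenthesized_element(&mut self) -> Expr {
        self.try_parse_yield_expression()
            .unwrap_or_else(|| self.parse_named_expression_or_higher())
    }
}
```

The `Option`-returning form compiles because the mutable borrow taken by `try_parse_yield_expression` ends before the fallback closure borrows `self` again. Contexts that do not allow `yield` simply never call the helper.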
dhruvmanila added further commits that referenced this pull request, repeating the two commit messages above: Apr 11, Apr 15, Apr 16 (twice), Apr 17, and Apr 18, 2024
dhruvmanila added a commit that referenced this pull request, Apr 23, 2024

## Summary

This PR adds a new `ExpressionContext` struct which is used in expression parsing. This solves the following problems:

1. Allowing starred expressions with different precedence
2. Allowing yield expressions in certain contexts
3. Removing the ambiguity with the `in` keyword when parsing a `for ... in` statement

For context, (1) was solved by adding `parse_star_expression_list` and `parse_star_expression_or_higher` in #10623, (2) was solved by adding `parse_yield_expression_or_else` in #10809, and (3) was fixed in #11009. All of the mentioned functions have been removed in favor of the context flags.

As mentioned in #11009, an ideal solution would be to implement an expression context, which is what this PR does. The context is passed around as a function parameter, and the call stack is used to automatically reset it.

### Recovery

How should the parser recover if the target expression is invalid when an expression can consume the `in` keyword?

1. Should the `in` keyword be part of the target expression?
2. Or, should the expression parsing stop as soon as the `in` keyword is encountered, no matter the expression?

For example:

```python
for yield x in y: ...

# Here, should this be parsed as
for (yield x) in (y): ...
# Or
for (yield x in y): ...
# where the `in iter` part is missing
```

Or, for binary expression parsing:

```python
for x or y in z: ...

# Should this be parsed as
for (x or y) in z: ...
# Or
for (x or y in z): ...
# where the `in iter` part is missing
```

This need not be solved now, but is very easy to change. For context, this PR does the following:

* For binary, comparison, and unary expressions, stop at `in`
* For lambda and yield expressions, consume the `in`

## Test Plan

1. Add test cases for the `for ... in` statement and verify the snapshots
2. Make sure the existing test suite passes
3. Run the fuzzer on around 3000 generated source code samples
4. Run the updated logic on a dozen or so open source repositories (codename "parser-checkouts")
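To make the "passed as a parameter, reset by the call stack" idea concrete, here is a small sketch under loud assumptions: the fields and the `without_in` method of this `ExpressionContext` are invented for illustration and are not taken from the actual struct, and the toy `Parser` chains operators left to right without real Python precedence. It only demonstrates the "stop at `in`" behaviour listed above for binary expressions.

```rust
/// Hypothetical context flags; the real struct may look different.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct ExpressionContext {
    allow_starred: bool,
    allow_yield: bool,
    exclude_in: bool,
}

impl ExpressionContext {
    /// Returns a copy in which `in` is not treated as an operator.
    fn without_in(self) -> Self {
        Self { exclude_in: true, ..self }
    }
}

/// Toy token type (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Token {
    Name(&'static str),
    Or,
    In,
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, pos: 0 }
    }

    fn peek(&self) -> Option<Token> {
        self.tokens.get(self.pos).copied()
    }

    fn parse_name(&mut self) -> String {
        match self.peek() {
            Some(Token::Name(name)) => {
                self.pos += 1;
                name.to_string()
            }
            _ => "<missing>".to_string(),
        }
    }

    /// Left-to-right operator chain; `in` is skipped when excluded.
    fn parse_expression(&mut self, context: ExpressionContext) -> String {
        let mut left = self.parse_name();
        loop {
            let op = match self.peek() {
                Some(Token::Or) => "or",
                Some(Token::In) if !context.exclude_in => "in",
                _ => break,
            };
            self.pos += 1;
            let right = self.parse_name();
            left = format!("({} {} {})", left, op, right);
        }
        left
    }

    /// `for <target> in <iter>`: only the target excludes `in`.
    fn parse_for_header(&mut self) -> (String, String) {
        let context = ExpressionContext::default();
        let target = self.parse_expression(context.without_in());
        if self.peek() == Some(Token::In) {
            self.pos += 1;
        }
        // `context` is a `Copy` value, so it is unchanged after the call.
        let iter = self.parse_expression(context);
        (target, iter)
    }
}
```

Because the context is a `Copy` value handed down by argument, the modified copy disappears when the callee returns, which is one way to read "the call stack is used to automatically reset the context". For the tokens of `x or y in z`, `parse_for_header` stops the target at `in`, whereas `parse_expression` with the default context consumes it.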