Implement rules around star expressions with different precedence #10623

dhruvmanila · 2024-03-27T08:39:37Z

Summary

This PR adds implementations around the following grammar rules to support the other expression and statement parsing. This PR doesn't update the related expressions and statements as they will be done in their own PRs.

star_expressions
star_expression
'*' bitwise_or

Problem

There are two grammar entries around star expressions:

star_expression:
    | '*' bitwise_or 
    | expression

starred_expression:
    '*' expression

The main difference is the precedence at which the expression should be parsed if a * token is encountered. Currently, we parse both rules at the same precedence which makes the parser more lenient. There are various expressions and statements which uses either one of the above rules.

Solutions

Current solution

Add the following new methods to the parser:

parse_star_expression_list which would be similar to parse_expression_list and would match the star_expressions grammar rule
parse_star_expression_or_higher which would match the star_expression or star_named_expression grammar rule
parse_expression_with_bitwise_or_precedence which would match the bitwise_or grammar rule

Here, we could possibly merge the parse_expression_list and parse_star_expression_list into a single function as it only differs in the method it delegates the parsing to. The parameter could be called allow_star_expression. It could be confusing because even if it’s false, we would allow starred expression but at a higher binding power.

This solution closely resembles the actual grammar and is easier to follow in the parser code. It doesn't require a lot of verification by the caller as the precedence parser makes sure to not allow that.

Follow-up

There's a related follow-up issue which I'm working on and is related to starred expressions.
The starred expression is only allowed in certain expressions and context which the parser needs to make sure and report an error if found otherwise.

Test Plan

Tested this locally by changing various expression and statement to use these methods. As mentioned they will be in their own PR.

github-actions · 2024-03-27T08:58:51Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

MichaReiser · 2024-03-27T10:10:03Z

crates/ruff_python_parser/src/parser/expression.rs

@@ -83,7 +83,7 @@ impl<'src> Parser<'src> {
    /// Matches the `expressions` rule in the [Python grammar].
    ///
    /// [Python grammar]: https://docs.python.org/3/reference/grammar.html
-    pub(super) fn parse_expressions(&mut self) -> ParsedExpr {
+    pub(super) fn parse_expression_list(&mut self) -> ParsedExpr {


Nit: i find the postfix list a bit confusing because we aren't parsing a list (and the term is overloaded in the Parser context where we have parse_list and parse_list_expression) but I don't have a better name recommendation.

dhruvmanila · 2024-03-29T07:47:31Z

For future reference, this change is going to be updated to use the "other" solution mentioned in the PR description. Not all will be reverted but instead of having a parse_expression_with_bitwise_or_precedence, we'll allow the starred expression to be parsed with the highest precedence and limit it later by checking if it's allowed as per "bitwise or" precedence and report an error.

I don't want to add the change here as it'll require updating all of the PRs in the stack. I'll link the PR when it's opened.

…0623) ## Summary This PR adds implementations around the following grammar rules to support the other expression and statement parsing. This PR doesn't update the related expressions and statements as they will be done in their own PRs. * `star_expressions` * `star_expression` * `'*' bitwise_or` ### Problem There are two grammar entries around star expressions: ``` star_expression: | '*' bitwise_or | expression starred_expression: '*' expression ``` The main difference is the precedence at which the expression should be parsed if a `*` token is encountered. Currently, we parse both rules at the same precedence which makes the parser more lenient. There are various expressions and statements which uses either one of the above rules. ### Solutions #### Current solution Add the following new methods to the parser: 1. `parse_star_expression_list` which would be similar to `parse_expression_list` and would match the `star_expressions` grammar rule 2. `parse_star_expression_or_higher` which would match the `star_expression` or `star_named_expression` grammar rule 3. `parse_expression_with_bitwise_or_precedence` which would match the `bitwise_or` grammar rule Here, we could possibly merge the `parse_expression_list` and `parse_star_expression_list` into a single function as it only differs in the method it delegates the parsing to. The parameter could be called `allow_star_expression`. It could be confusing because even if it’s false, we would allow starred expression but at a higher binding power. This solution closely resembles the actual grammar and is easier to follow in the parser code. It doesn't require a lot of verification by the caller as the precedence parser makes sure to not allow that. #### Other solution Add a new `verify_bitwise_or_precedence` method which reports an error if the grammar expected a starred expression with `bitwise_or` precedence: ```rs pub(super) fn verify_star_expressions( &mut self, expr: &Expr, allow_named_expression: AllowNamedExpression, ) { match expr { Expr::Tuple(ast::ExprTuple { elts, .. }) => { for expr in elts { self.verify_star_expressions(expr, allow_named_expression); } } Expr::Starred(ast::ExprStarred { value, .. }) => { let expr_kind = match &**value { Expr::Compare(_) => "comparison", Expr::UnaryOp(ast::ExprUnaryOp { op: ast::UnaryOp::Not, .. }) => "unary `not`", Expr::BoolOp(_) => "boolean", Expr::If(_) => "`if-else`", Expr::Lambda(_) => "lambda", Expr::Named(_) if allow_named_expression.is_no() => "named", _ => return, }; self.add_error( ParseErrorType::OtherError(format!( "{expr_kind} expression cannot be used here", )), expr.range(), ); } _ => {} } } ``` For this solution, the parser will need to verify the precedence of a starred expression wherever grammar expected the `star_expressions`, `star_expression` or `star_named_expression` rule. We could possibly adopt this solution for better error recovery but I think for now having a strict parser is better suited because we don't really know what complete error recovery looks like. ### Follow-up There's a related follow-up issue which I'm working on and is related to starred expressions. The starred expression is only allowed in certain expressions and context which the parser needs to make sure and report an error if found otherwise. ## Test Plan Tested this locally by changing various expression and statement to use these methods. As mentioned they will be in their own PR.

## Summary This PR adds a new `ExpressionContext` struct which is used in expression parsing. This solves the following problem: 1. Allowing starred expression with different precedence 2. Allowing yield expression in certain context 3. Remove ambiguity with `in` keyword when parsing a `for ... in` statement For context, (1) was solved by adding `parse_star_expression_list` and `parse_star_expression_or_higher` in #10623, (2) was solved by by adding `parse_yield_expression_or_else` in #10809, and (3) was fixed in #11009. All of the mentioned functions have been removed in favor of the context flags. As mentioned in #11009, an ideal solution would be to implement an expression context which is what this PR implements. This is passed around as function parameter and the call stack is used to automatically reset the context. ### Recovery How should the parser recover if the target expression is invalid when an expression can consume the `in` keyword? 1. Should the `in` keyword be part of the target expression? 2. Or, should the expression parsing stop as soon as `in` keyword is encountered, no matter the expression? For example: ```python for yield x in y: ... # Here, should this be parsed as for (yield x) in (y): ... # Or for (yield x in y): ... # where the `in iter` part is missing ``` Or, for binary expression parsing: ```python for x or y in z: ... # Should this be parsed as for (x or y) in z: ... # Or for (x or y in z): ... # where the `in iter` part is missing ``` This need not be solved now, but is very easy to change. For context this PR does the following: * For binary, comparison, and unary expressions, stop at `in` * For lambda, yield expressions, consume the `in` ## Test Plan 1. Add test cases for the `for ... in` statement and verify the snapshots 2. Make sure the existing test suite pass 3. Run the fuzzer for around 3000 generated source code 4. Run the updated logic on a dozen or so open source repositories (codename "parser-checkouts")

Implement rules around star expressions with different precedence

27516a7

dhruvmanila added the parser Related to the parser label Mar 27, 2024

dhruvmanila requested a review from MichaReiser as a code owner March 27, 2024 08:39

dhruvmanila marked this pull request as draft March 27, 2024 08:57

This comment was marked as outdated.

Sign in to view

dhruvmanila marked this pull request as ready for review March 27, 2024 09:01

MichaReiser approved these changes Mar 27, 2024

View reviewed changes

dhruvmanila merged commit 499d412 into dhruv/parser Mar 29, 2024
17 checks passed

dhruvmanila deleted the dhruv/star-expression-list branch March 29, 2024 07:48

dhruvmanila mentioned this pull request Apr 20, 2024

Add ExpressionContext for expression parsing #11055

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement rules around star expressions with different precedence #10623

Implement rules around star expressions with different precedence #10623

dhruvmanila commented Mar 27, 2024

This comment was marked as outdated.

github-actions bot commented Mar 27, 2024

MichaReiser Mar 27, 2024

dhruvmanila commented Mar 29, 2024

Implement rules around star expressions with different precedence #10623

Implement rules around star expressions with different precedence #10623

Conversation

dhruvmanila commented Mar 27, 2024

Summary

Problem

Solutions

Current solution

Other solution

Follow-up

Test Plan

This comment was marked as outdated.

github-actions bot commented Mar 27, 2024

ruff-ecosystem results

Linter (stable)

Linter (preview)

Formatter (stable)

Formatter (preview)

MichaReiser Mar 27, 2024

Choose a reason for hiding this comment

dhruvmanila commented Mar 29, 2024

`ruff-ecosystem` results