Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid error handling duplication for starred, yield, lambda expressions #10809

Merged
merged 5 commits into from
Apr 9, 2024

Conversation

dhruvmanila
Copy link
Member

@dhruvmanila dhruvmanila commented Apr 7, 2024

Summary

This PR updates the error handling logic for certain expressions in a way to either perform it automatically or provide an option for the user. The expression in discussion here are lambda, starred and yield expression.

Problem

The current parser allows these expressions at arbitrary context. This is because the mentioned expressions are parsed using parse_lhs_expression which is part of other higher level grammar rules. This means that the caller needs to validate the parsed expression and report an error if it isn't allowed in that context. This can get quite cumbersome to do so as it needs to be done for all of the call sites for following methods:

  1. parse_expression_list: 14 references
  2. parse_star_expression_list: 4 references
  3. parse_star_expression_or_higher: 8 references
  4. parse_named_expression_or_higher: 10 references
  5. parse_conditional_expression_or_higher: 25 references
  6. parse_simple_expression: 4 references

The numbers corresponding to the methods are the number of references as of today. This list is also in the correct hierarchy of grammar precedence. For example, parse_expression_list calls into parse_conditional_expression_or_higher but not the other way around.

Solution

We'll take the above expression one at a time to understand the solution:

Lambda expression

Lambda expressions are only allowed in expression grammar rule which corresponds to parse_conditional_expression_or_higher. This means that this expression is only allowed when using either of the following functions:

  1. parse_expression_list
  2. parse_star_expression_list
  3. parse_star_expression_or_higher
  4. parse_named_expression_or_higher
  5. parse_conditional_expression_or_higher

The solution is to move the error handling in parse_simple_expression and parameterize it where any of the above listed function would always use AllowLambdaExpression::Yes.

Starred expression

There are two grammar rules related to starred expression:

  1. star_expression which corresponds to parse_star_expression_or_higher
  2. starred_expression which is parsed in LHS parsing

Remember that LHS parsing isn't accessed directly but only via any of the above listed functions in the problem section. Now, starred expressions are allowed in a lot of places but sometimes in a limited capacity. For example, an assignment target can have a starred expression but only if it is a name node (*x).

The solution here is to adopt the one used in star pattern matching which is to use a parameter. The following functions are parameterized:

  1. parse_expression_list
  2. parse_named_expression_or_higher
  3. parse_conditional_expression_or_higher

Now, parse_star_expression_list and parse_star_expression_or_higher aren't parameterized because they handle the star_expression grammar which means that the caller wants to parse a starred expression but with a limited precedence.

Yield expression

Yield expressions are only allowed in the following context:

  1. Top level as yield statement
  2. Parenthesized
  3. F-string expression
  4. Assignment (including annotated and augmented) value

We could parameterize it similar to starred expression but that seems like a waste given the limited number of locations they're allowed.

The solution is to add a parse_yield_expression_or_else method which parses a yield expression if the parser is at yield token or else calls the given method to parse the expression. The call site would like:

// (yield_expr | named_expression)
self.try_parse_yield_expression()
  .unwrap_or_else(|| self.parse_named_expression_or_higher())

// (yield_expr | star_expressions)
self.try_parse_yield_expression()
  .unwrap_or_else(|| self.parse_star_expression_list())

An added benefit for this is that the call site looks exactly like the grammar.

Review

  • The reviewer would mainly just look at the de-duplication logic.
  • The reviewer doesn't really need to verify the call sites as they're verified by existing test cases. For nodes which aren't yet tested, they will be done so in their own PR.

Test Plan

Run existing test cases and verify the snapshot updates.

Additional test cases will be added when working on specific nodes.

@dhruvmanila dhruvmanila added the internal An internal refactor or improvement label Apr 7, 2024
@dhruvmanila dhruvmanila force-pushed the dhruv/deduplicate-expr-error-handling branch from 0a27639 to 909d497 Compare April 7, 2024 03:54
Copy link
Contributor

github-actions bot commented Apr 7, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@dhruvmanila dhruvmanila force-pushed the dhruv/deduplicate-expr-error-handling branch from 909d497 to 46b8821 Compare April 7, 2024 11:37
@dhruvmanila dhruvmanila marked this pull request as ready for review April 7, 2024 11:42
@dhruvmanila dhruvmanila requested a review from MichaReiser as a code owner April 7, 2024 11:42
@dhruvmanila dhruvmanila force-pushed the dhruv/deduplicate-expr-error-handling branch from 0ce341d to 1698b32 Compare April 9, 2024 08:48
@dhruvmanila dhruvmanila merged commit cd1311e into dhruv/parser Apr 9, 2024
5 checks passed
@dhruvmanila dhruvmanila deleted the dhruv/deduplicate-expr-error-handling branch April 9, 2024 08:48
dhruvmanila added a commit that referenced this pull request Apr 10, 2024
## Summary

This PR removes a couple of inline tests TODO which was a leftover from
#10809
dhruvmanila added a commit that referenced this pull request Apr 11, 2024
…ns (#10809)

## Summary

This PR updates the error handling logic for certain expressions in a
way to either perform it automatically or provide an option for the
user. The expression in discussion here are `lambda`, starred and
`yield` expression.

### Problem

The current parser allows these expressions at arbitrary context. This
is because the mentioned expressions are parsed using
`parse_lhs_expression` which is part of other higher level grammar
rules. This means that the caller needs to validate the parsed
expression and report an error if it isn't allowed in that context. This
can get quite cumbersome to do so as it needs to be done for all of the
call sites for following methods:

1. `parse_expression_list`: 14 references
2. `parse_star_expression_list`: 4 references
3. `parse_star_expression_or_higher`: 8 references
4. `parse_named_expression_or_higher`: 10 references
5. `parse_conditional_expression_or_higher`: 25 references
6. `parse_simple_expression`: 4 references

The numbers corresponding to the methods are the number of references as
of today. This list is also in the correct hierarchy of grammar
precedence. For example, `parse_expression_list` calls into
`parse_conditional_expression_or_higher` but not the other way around.

### Solution

We'll take the above expression one at a time to understand the
solution:

#### Lambda expression

Lambda expressions are only allowed in `expression` grammar rule which
corresponds to `parse_conditional_expression_or_higher`. This means that
this expression is only allowed when using either of the following
functions:

1. `parse_expression_list`
2. `parse_star_expression_list`
3. `parse_star_expression_or_higher`
4. `parse_named_expression_or_higher`
5. `parse_conditional_expression_or_higher`

The solution is to move the error handling in `parse_simple_expression`
and parameterize it where any of the above listed function would always
use `AllowLambdaExpression::Yes`.

#### Starred expression

There are two grammar rules related to starred expression:
1. `star_expression` which corresponds to
`parse_star_expression_or_higher`
2. `starred_expression` which is parsed in LHS parsing

Remember that LHS parsing isn't accessed directly but only via any of
the above listed functions in the problem section. Now, starred
expressions are allowed in a lot of places but sometimes in a limited
capacity. For example, an assignment target can have a starred
expression but only if it is a name node (`*x`).

The solution here is to adopt the one used in star pattern matching
which is to use a parameter. The following functions are parameterized:
1. `parse_expression_list`
2. `parse_named_expression_or_higher`
3. `parse_conditional_expression_or_higher`

Now, `parse_star_expression_list` and `parse_star_expression_or_higher`
aren't parameterized because they handle the `star_expression` grammar
which means that the caller wants to parse a starred expression but with
a limited precedence.

#### Yield expression

Yield expressions are only allowed in the following context:
1. Top level as yield statement
2. Parenthesized
3. F-string expression
4. Assignment (including annotated and augmented) value

We could parameterize it similar to starred expression but that seems
like a waste given the limited number of locations they're allowed.

The solution is to add a `parse_yield_expression_or_else` method which
parses a yield expression if the parser is at `yield` token or else
calls the given method to parse the expression. The call site would
like:

```rs
// (yield_expr | named_expression)
self.try_parse_yield_expression()
  .unwrap_or_else(|| self.parse_named_expression_or_higher())

// (yield_expr | star_expressions)
self.try_parse_yield_expression()
  .unwrap_or_else(|| self.parse_star_expression_list())
```

An added benefit for this is that the call site looks exactly like the
grammar.

## Review

* The reviewer would mainly just look at the de-duplication logic.
* The reviewer doesn't really need to verify the call sites as they're
verified by existing test cases. For nodes which aren't yet tested, they
will be done so in their own PR.

## Test Plan

Run existing test cases and verify the snapshot updates.

Additional test cases will be added when working on specific nodes.
dhruvmanila added a commit that referenced this pull request Apr 11, 2024
## Summary

This PR removes a couple of inline tests TODO which was a leftover from
#10809
dhruvmanila added a commit that referenced this pull request Apr 15, 2024
…ns (#10809)

## Summary

This PR updates the error handling logic for certain expressions in a
way to either perform it automatically or provide an option for the
user. The expression in discussion here are `lambda`, starred and
`yield` expression.

### Problem

The current parser allows these expressions at arbitrary context. This
is because the mentioned expressions are parsed using
`parse_lhs_expression` which is part of other higher level grammar
rules. This means that the caller needs to validate the parsed
expression and report an error if it isn't allowed in that context. This
can get quite cumbersome to do so as it needs to be done for all of the
call sites for following methods:

1. `parse_expression_list`: 14 references
2. `parse_star_expression_list`: 4 references
3. `parse_star_expression_or_higher`: 8 references
4. `parse_named_expression_or_higher`: 10 references
5. `parse_conditional_expression_or_higher`: 25 references
6. `parse_simple_expression`: 4 references

The numbers corresponding to the methods are the number of references as
of today. This list is also in the correct hierarchy of grammar
precedence. For example, `parse_expression_list` calls into
`parse_conditional_expression_or_higher` but not the other way around.

### Solution

We'll take the above expression one at a time to understand the
solution:

#### Lambda expression

Lambda expressions are only allowed in `expression` grammar rule which
corresponds to `parse_conditional_expression_or_higher`. This means that
this expression is only allowed when using either of the following
functions:

1. `parse_expression_list`
2. `parse_star_expression_list`
3. `parse_star_expression_or_higher`
4. `parse_named_expression_or_higher`
5. `parse_conditional_expression_or_higher`

The solution is to move the error handling in `parse_simple_expression`
and parameterize it where any of the above listed function would always
use `AllowLambdaExpression::Yes`.

#### Starred expression

There are two grammar rules related to starred expression:
1. `star_expression` which corresponds to
`parse_star_expression_or_higher`
2. `starred_expression` which is parsed in LHS parsing

Remember that LHS parsing isn't accessed directly but only via any of
the above listed functions in the problem section. Now, starred
expressions are allowed in a lot of places but sometimes in a limited
capacity. For example, an assignment target can have a starred
expression but only if it is a name node (`*x`).

The solution here is to adopt the one used in star pattern matching
which is to use a parameter. The following functions are parameterized:
1. `parse_expression_list`
2. `parse_named_expression_or_higher`
3. `parse_conditional_expression_or_higher`

Now, `parse_star_expression_list` and `parse_star_expression_or_higher`
aren't parameterized because they handle the `star_expression` grammar
which means that the caller wants to parse a starred expression but with
a limited precedence.

#### Yield expression

Yield expressions are only allowed in the following context:
1. Top level as yield statement
2. Parenthesized
3. F-string expression
4. Assignment (including annotated and augmented) value

We could parameterize it similar to starred expression but that seems
like a waste given the limited number of locations they're allowed.

The solution is to add a `parse_yield_expression_or_else` method which
parses a yield expression if the parser is at `yield` token or else
calls the given method to parse the expression. The call site would
like:

```rs
// (yield_expr | named_expression)
self.try_parse_yield_expression()
  .unwrap_or_else(|| self.parse_named_expression_or_higher())

// (yield_expr | star_expressions)
self.try_parse_yield_expression()
  .unwrap_or_else(|| self.parse_star_expression_list())
```

An added benefit for this is that the call site looks exactly like the
grammar.

## Review

* The reviewer would mainly just look at the de-duplication logic.
* The reviewer doesn't really need to verify the call sites as they're
verified by existing test cases. For nodes which aren't yet tested, they
will be done so in their own PR.

## Test Plan

Run existing test cases and verify the snapshot updates.

Additional test cases will be added when working on specific nodes.
dhruvmanila added a commit that referenced this pull request Apr 15, 2024
## Summary

This PR removes a couple of inline tests TODO which was a leftover from
#10809
dhruvmanila added a commit that referenced this pull request Apr 16, 2024
…ns (#10809)

## Summary

This PR updates the error handling logic for certain expressions in a
way to either perform it automatically or provide an option for the
user. The expression in discussion here are `lambda`, starred and
`yield` expression.

### Problem

The current parser allows these expressions at arbitrary context. This
is because the mentioned expressions are parsed using
`parse_lhs_expression` which is part of other higher level grammar
rules. This means that the caller needs to validate the parsed
expression and report an error if it isn't allowed in that context. This
can get quite cumbersome to do so as it needs to be done for all of the
call sites for following methods:

1. `parse_expression_list`: 14 references
2. `parse_star_expression_list`: 4 references
3. `parse_star_expression_or_higher`: 8 references
4. `parse_named_expression_or_higher`: 10 references
5. `parse_conditional_expression_or_higher`: 25 references
6. `parse_simple_expression`: 4 references

The numbers corresponding to the methods are the number of references as
of today. This list is also in the correct hierarchy of grammar
precedence. For example, `parse_expression_list` calls into
`parse_conditional_expression_or_higher` but not the other way around.

### Solution

We'll take the above expression one at a time to understand the
solution:

#### Lambda expression

Lambda expressions are only allowed in `expression` grammar rule which
corresponds to `parse_conditional_expression_or_higher`. This means that
this expression is only allowed when using either of the following
functions:

1. `parse_expression_list`
2. `parse_star_expression_list`
3. `parse_star_expression_or_higher`
4. `parse_named_expression_or_higher`
5. `parse_conditional_expression_or_higher`

The solution is to move the error handling in `parse_simple_expression`
and parameterize it where any of the above listed function would always
use `AllowLambdaExpression::Yes`.

#### Starred expression

There are two grammar rules related to starred expression:
1. `star_expression` which corresponds to
`parse_star_expression_or_higher`
2. `starred_expression` which is parsed in LHS parsing

Remember that LHS parsing isn't accessed directly but only via any of
the above listed functions in the problem section. Now, starred
expressions are allowed in a lot of places but sometimes in a limited
capacity. For example, an assignment target can have a starred
expression but only if it is a name node (`*x`).

The solution here is to adopt the one used in star pattern matching
which is to use a parameter. The following functions are parameterized:
1. `parse_expression_list`
2. `parse_named_expression_or_higher`
3. `parse_conditional_expression_or_higher`

Now, `parse_star_expression_list` and `parse_star_expression_or_higher`
aren't parameterized because they handle the `star_expression` grammar
which means that the caller wants to parse a starred expression but with
a limited precedence.

#### Yield expression

Yield expressions are only allowed in the following context:
1. Top level as yield statement
2. Parenthesized
3. F-string expression
4. Assignment (including annotated and augmented) value

We could parameterize it similar to starred expression but that seems
like a waste given the limited number of locations they're allowed.

The solution is to add a `parse_yield_expression_or_else` method which
parses a yield expression if the parser is at `yield` token or else
calls the given method to parse the expression. The call site would
like:

```rs
// (yield_expr | named_expression)
self.try_parse_yield_expression()
  .unwrap_or_else(|| self.parse_named_expression_or_higher())

// (yield_expr | star_expressions)
self.try_parse_yield_expression()
  .unwrap_or_else(|| self.parse_star_expression_list())
```

An added benefit for this is that the call site looks exactly like the
grammar.

## Review

* The reviewer would mainly just look at the de-duplication logic.
* The reviewer doesn't really need to verify the call sites as they're
verified by existing test cases. For nodes which aren't yet tested, they
will be done so in their own PR.

## Test Plan

Run existing test cases and verify the snapshot updates.

Additional test cases will be added when working on specific nodes.
dhruvmanila added a commit that referenced this pull request Apr 16, 2024
## Summary

This PR removes a couple of inline tests TODO which was a leftover from
#10809
dhruvmanila added a commit that referenced this pull request Apr 16, 2024
…ns (#10809)

## Summary

This PR updates the error handling logic for certain expressions in a
way to either perform it automatically or provide an option for the
user. The expression in discussion here are `lambda`, starred and
`yield` expression.

### Problem

The current parser allows these expressions at arbitrary context. This
is because the mentioned expressions are parsed using
`parse_lhs_expression` which is part of other higher level grammar
rules. This means that the caller needs to validate the parsed
expression and report an error if it isn't allowed in that context. This
can get quite cumbersome to do so as it needs to be done for all of the
call sites for following methods:

1. `parse_expression_list`: 14 references
2. `parse_star_expression_list`: 4 references
3. `parse_star_expression_or_higher`: 8 references
4. `parse_named_expression_or_higher`: 10 references
5. `parse_conditional_expression_or_higher`: 25 references
6. `parse_simple_expression`: 4 references

The numbers corresponding to the methods are the number of references as
of today. This list is also in the correct hierarchy of grammar
precedence. For example, `parse_expression_list` calls into
`parse_conditional_expression_or_higher` but not the other way around.

### Solution

We'll take the above expression one at a time to understand the
solution:

#### Lambda expression

Lambda expressions are only allowed in `expression` grammar rule which
corresponds to `parse_conditional_expression_or_higher`. This means that
this expression is only allowed when using either of the following
functions:

1. `parse_expression_list`
2. `parse_star_expression_list`
3. `parse_star_expression_or_higher`
4. `parse_named_expression_or_higher`
5. `parse_conditional_expression_or_higher`

The solution is to move the error handling in `parse_simple_expression`
and parameterize it where any of the above listed function would always
use `AllowLambdaExpression::Yes`.

#### Starred expression

There are two grammar rules related to starred expression:
1. `star_expression` which corresponds to
`parse_star_expression_or_higher`
2. `starred_expression` which is parsed in LHS parsing

Remember that LHS parsing isn't accessed directly but only via any of
the above listed functions in the problem section. Now, starred
expressions are allowed in a lot of places but sometimes in a limited
capacity. For example, an assignment target can have a starred
expression but only if it is a name node (`*x`).

The solution here is to adopt the one used in star pattern matching
which is to use a parameter. The following functions are parameterized:
1. `parse_expression_list`
2. `parse_named_expression_or_higher`
3. `parse_conditional_expression_or_higher`

Now, `parse_star_expression_list` and `parse_star_expression_or_higher`
aren't parameterized because they handle the `star_expression` grammar
which means that the caller wants to parse a starred expression but with
a limited precedence.

#### Yield expression

Yield expressions are only allowed in the following context:
1. Top level as yield statement
2. Parenthesized
3. F-string expression
4. Assignment (including annotated and augmented) value

We could parameterize it similar to starred expression but that seems
like a waste given the limited number of locations they're allowed.

The solution is to add a `parse_yield_expression_or_else` method which
parses a yield expression if the parser is at `yield` token or else
calls the given method to parse the expression. The call site would
like:

```rs
// (yield_expr | named_expression)
self.try_parse_yield_expression()
  .unwrap_or_else(|| self.parse_named_expression_or_higher())

// (yield_expr | star_expressions)
self.try_parse_yield_expression()
  .unwrap_or_else(|| self.parse_star_expression_list())
```

An added benefit for this is that the call site looks exactly like the
grammar.

## Review

* The reviewer would mainly just look at the de-duplication logic.
* The reviewer doesn't really need to verify the call sites as they're
verified by existing test cases. For nodes which aren't yet tested, they
will be done so in their own PR.

## Test Plan

Run existing test cases and verify the snapshot updates.

Additional test cases will be added when working on specific nodes.
dhruvmanila added a commit that referenced this pull request Apr 16, 2024
## Summary

This PR removes a couple of inline tests TODO which was a leftover from
#10809
dhruvmanila added a commit that referenced this pull request Apr 17, 2024
…ns (#10809)

## Summary

This PR updates the error handling logic for certain expressions in a
way to either perform it automatically or provide an option for the
user. The expression in discussion here are `lambda`, starred and
`yield` expression.

### Problem

The current parser allows these expressions at arbitrary context. This
is because the mentioned expressions are parsed using
`parse_lhs_expression` which is part of other higher level grammar
rules. This means that the caller needs to validate the parsed
expression and report an error if it isn't allowed in that context. This
can get quite cumbersome to do so as it needs to be done for all of the
call sites for following methods:

1. `parse_expression_list`: 14 references
2. `parse_star_expression_list`: 4 references
3. `parse_star_expression_or_higher`: 8 references
4. `parse_named_expression_or_higher`: 10 references
5. `parse_conditional_expression_or_higher`: 25 references
6. `parse_simple_expression`: 4 references

The numbers corresponding to the methods are the number of references as
of today. This list is also in the correct hierarchy of grammar
precedence. For example, `parse_expression_list` calls into
`parse_conditional_expression_or_higher` but not the other way around.

### Solution

We'll take the above expression one at a time to understand the
solution:

#### Lambda expression

Lambda expressions are only allowed in `expression` grammar rule which
corresponds to `parse_conditional_expression_or_higher`. This means that
this expression is only allowed when using either of the following
functions:

1. `parse_expression_list`
2. `parse_star_expression_list`
3. `parse_star_expression_or_higher`
4. `parse_named_expression_or_higher`
5. `parse_conditional_expression_or_higher`

The solution is to move the error handling in `parse_simple_expression`
and parameterize it where any of the above listed function would always
use `AllowLambdaExpression::Yes`.

#### Starred expression

There are two grammar rules related to starred expression:
1. `star_expression` which corresponds to
`parse_star_expression_or_higher`
2. `starred_expression` which is parsed in LHS parsing

Remember that LHS parsing isn't accessed directly but only via any of
the above listed functions in the problem section. Now, starred
expressions are allowed in a lot of places but sometimes in a limited
capacity. For example, an assignment target can have a starred
expression but only if it is a name node (`*x`).

The solution here is to adopt the one used in star pattern matching
which is to use a parameter. The following functions are parameterized:
1. `parse_expression_list`
2. `parse_named_expression_or_higher`
3. `parse_conditional_expression_or_higher`

Now, `parse_star_expression_list` and `parse_star_expression_or_higher`
aren't parameterized because they handle the `star_expression` grammar
which means that the caller wants to parse a starred expression but with
a limited precedence.

#### Yield expression

Yield expressions are only allowed in the following context:
1. Top level as yield statement
2. Parenthesized
3. F-string expression
4. Assignment (including annotated and augmented) value

We could parameterize it similar to starred expression but that seems
like a waste given the limited number of locations they're allowed.

The solution is to add a `parse_yield_expression_or_else` method which
parses a yield expression if the parser is at `yield` token or else
calls the given method to parse the expression. The call site would
like:

```rs
// (yield_expr | named_expression)
self.try_parse_yield_expression()
  .unwrap_or_else(|| self.parse_named_expression_or_higher())

// (yield_expr | star_expressions)
self.try_parse_yield_expression()
  .unwrap_or_else(|| self.parse_star_expression_list())
```

An added benefit for this is that the call site looks exactly like the
grammar.

## Review

* The reviewer would mainly just look at the de-duplication logic.
* The reviewer doesn't really need to verify the call sites as they're
verified by existing test cases. For nodes which aren't yet tested, they
will be done so in their own PR.

## Test Plan

Run existing test cases and verify the snapshot updates.

Additional test cases will be added when working on specific nodes.
dhruvmanila added a commit that referenced this pull request Apr 17, 2024
## Summary

This PR removes a couple of inline tests TODO which was a leftover from
#10809
dhruvmanila added a commit that referenced this pull request Apr 18, 2024
…ns (#10809)

## Summary

This PR updates the error handling logic for certain expressions in a
way to either perform it automatically or provide an option for the
user. The expression in discussion here are `lambda`, starred and
`yield` expression.

### Problem

The current parser allows these expressions at arbitrary context. This
is because the mentioned expressions are parsed using
`parse_lhs_expression` which is part of other higher level grammar
rules. This means that the caller needs to validate the parsed
expression and report an error if it isn't allowed in that context. This
can get quite cumbersome to do so as it needs to be done for all of the
call sites for following methods:

1. `parse_expression_list`: 14 references
2. `parse_star_expression_list`: 4 references
3. `parse_star_expression_or_higher`: 8 references
4. `parse_named_expression_or_higher`: 10 references
5. `parse_conditional_expression_or_higher`: 25 references
6. `parse_simple_expression`: 4 references

The numbers corresponding to the methods are the number of references as
of today. This list is also in the correct hierarchy of grammar
precedence. For example, `parse_expression_list` calls into
`parse_conditional_expression_or_higher` but not the other way around.

### Solution

We'll take the above expression one at a time to understand the
solution:

#### Lambda expression

Lambda expressions are only allowed in `expression` grammar rule which
corresponds to `parse_conditional_expression_or_higher`. This means that
this expression is only allowed when using either of the following
functions:

1. `parse_expression_list`
2. `parse_star_expression_list`
3. `parse_star_expression_or_higher`
4. `parse_named_expression_or_higher`
5. `parse_conditional_expression_or_higher`

The solution is to move the error handling in `parse_simple_expression`
and parameterize it where any of the above listed function would always
use `AllowLambdaExpression::Yes`.

#### Starred expression

There are two grammar rules related to starred expression:
1. `star_expression` which corresponds to
`parse_star_expression_or_higher`
2. `starred_expression` which is parsed in LHS parsing

Remember that LHS parsing isn't accessed directly but only via any of
the above listed functions in the problem section. Now, starred
expressions are allowed in a lot of places but sometimes in a limited
capacity. For example, an assignment target can have a starred
expression but only if it is a name node (`*x`).

The solution here is to adopt the one used in star pattern matching
which is to use a parameter. The following functions are parameterized:
1. `parse_expression_list`
2. `parse_named_expression_or_higher`
3. `parse_conditional_expression_or_higher`

Now, `parse_star_expression_list` and `parse_star_expression_or_higher`
aren't parameterized because they handle the `star_expression` grammar
which means that the caller wants to parse a starred expression but with
a limited precedence.

#### Yield expression

Yield expressions are only allowed in the following context:
1. Top level as yield statement
2. Parenthesized
3. F-string expression
4. Assignment (including annotated and augmented) value

We could parameterize it similar to starred expression but that seems
like a waste given the limited number of locations they're allowed.

The solution is to add a `parse_yield_expression_or_else` method which
parses a yield expression if the parser is at `yield` token or else
calls the given method to parse the expression. The call site would
like:

```rs
// (yield_expr | named_expression)
self.try_parse_yield_expression()
  .unwrap_or_else(|| self.parse_named_expression_or_higher())

// (yield_expr | star_expressions)
self.try_parse_yield_expression()
  .unwrap_or_else(|| self.parse_star_expression_list())
```

An added benefit for this is that the call site looks exactly like the
grammar.

## Review

* The reviewer would mainly just look at the de-duplication logic.
* The reviewer doesn't really need to verify the call sites as they're
verified by existing test cases. For nodes which aren't yet tested, they
will be done so in their own PR.

## Test Plan

Run existing test cases and verify the snapshot updates.

Additional test cases will be added when working on specific nodes.
dhruvmanila added a commit that referenced this pull request Apr 18, 2024
## Summary

This PR removes a couple of inline tests TODO which was a leftover from
#10809
dhruvmanila added a commit that referenced this pull request Apr 23, 2024
## Summary

This PR adds a new `ExpressionContext` struct which is used in
expression parsing.

This solves the following problem:
1. Allowing starred expression with different precedence
2. Allowing yield expression in certain context
3. Remove ambiguity with `in` keyword when parsing a `for ... in`
statement

For context, (1) was solved by adding `parse_star_expression_list` and
`parse_star_expression_or_higher` in #10623, (2) was solved by by adding
`parse_yield_expression_or_else` in #10809, and (3) was fixed in #11009.
All of the mentioned functions have been removed in favor of the context
flags.

As mentioned in #11009, an ideal solution would be to implement an
expression context which is what this PR implements. This is passed
around as function parameter and the call stack is used to automatically
reset the context.

### Recovery

How should the parser recover if the target expression is invalid when
an expression can consume the `in` keyword?

1. Should the `in` keyword be part of the target expression?
2. Or, should the expression parsing stop as soon as `in` keyword is
encountered, no matter the expression?

For example:
```python
for yield x in y: ...

# Here, should this be parsed as
for (yield x) in (y): ...
# Or
for (yield x in y): ...
# where the `in iter` part is missing
```

Or, for binary expression parsing:
```python
for x or y in z: ...

# Should this be parsed as
for (x or y) in z: ...
# Or
for (x or y in z): ...
# where the `in iter` part is missing
```

This need not be solved now, but is very easy to change. For context
this PR does the following:
* For binary, comparison, and unary expressions, stop at `in`
* For lambda, yield expressions, consume the `in`

## Test Plan

1. Add test cases for the `for ... in` statement and verify the
snapshots
2. Make sure the existing test suite pass
3. Run the fuzzer for around 3000 generated source code
4. Run the updated logic on a dozen or so open source repositories
(codename "parser-checkouts")
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal An internal refactor or improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants