Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement the current set of rules for precedence partial ordering #1096

Merged
merged 7 commits into from
Mar 3, 2022
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
131 changes: 131 additions & 0 deletions executable_semantics/syntax/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,3 +15,134 @@ techniques can be applied to other kinds of AST nodes as needed. See the
handling of the `UNIMPL_EXAMPLE` token for an example of how this is done, and
see [`unimplemented_example_test.cpp`](unimplemented_example_test.cpp) for an
example of how to test it.

## Precedence and associativity
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking comment: reading this section, I still have a hard time working out what rules I should follow when adding/modifying grammar rules, or conversely how to read grammar rules to understand the structure they describe. I think part of the issue is that the "what" (as opposed to "why") part is split between the idealised rules on lines 45-63 and the modifications to that ideal on lines 96-102.

However, I'm not sure that trying to wordsmith this is a good use of time right now; we'll probably have a better idea of how to clarify this once we've worked with it for a while. I think all the necessary information is here (modulo my question below), and that's probably enough for now.


The [Bison expression grammar](parser.ypp) uses the
[precedence climbing method](https://en.wikipedia.org/wiki/Operator-precedence_parser#Precedence_climbing_method)
to model precedence and associativity, suitably modified to handle Carbon's
partial precedence order without grammar ambiguities.

Consider this example
[precedence diagram](/docs/design/expressions/README.md#precedence):

```mermaid
graph BT
%%{init: {'themeVariables': {'fontFamily': 'monospace'}}}%%
minus["minus<br>-x"]
mul>"mul<br>x * y"]
add>"add<br>x + y"]
mod["mod<br>x % y"]
eq["eq<br>x = y"]

eq --> add & mod
add --> mul
mul & mod --> minus
```

For each precedence level, we have up to three grammar productions:

- `foo_expression` represents an expression at that precedence level or
higher, and includes as productions all of the expression kinds that are
immediately higher in the precedence graph:
```bison
add_expression:
mul_expression | add_lhs '+' add_operand ;
```
- `foo_operand` represents an operand of a `foo_expression` that is not itself
a `foo_expression`.
```bison
eq_operand:
add_expression | mod_expression ;
```
- For left-associative operators, `foo_lhs` represents either a `foo_operand`
or a `foo_expression`.
```
add_lhs:
add_operand | add_expression ;
```

The above approach leads to (benign) reduce-reduce conflicts. In our example
precedence diagram, the expression `-x == y` has two different parses:

- _eq_expression_
- _eq_operand_
- _add_expression_
- _mul_expression_
- _minus_expression_
- `-`
- `x`
- `==`
- _eq_operand_
- ...
- `y`

and

- _eq_expression_
- _eq_operand_
- _mod_expression_
- _minus_expression_
- `-`
- `x`
- `==`
- _eq_operand_
- ...
- `y`

These would invoke the same parsing actions, so the states can be combined, but
Bison isn't smart enough to see that.

In order to eliminate these conflicts, if there are multiple paths through the
precedence graph between a higher-precedence level `foo` and some lower
precedence level `bar` -- that is, if there's a diamond in the precedence graph
with `foo` at the top and `bar` at the bottom -- `foo_expression`s are excluded
from all intermediate `_expression` productions on the diamond between `foo` and
`bar`, and are added back in the downstream `_operand` productions in the
diamond instead:

```bison
minus_expression:
identifier | '-' identifier ;

// In the real grammar, trivial productions like this are inlined.
mul_operand:
minus_expression ;
mul_lhs:
mul_operand | mul_expression ;
// A minus_expression is not a mul_expression, even though it's a
// higher-precedence expression, because there are multiple paths from
// eq_expression to minus_expression, and this production is on such a path.
mul_expression:
mul_lhs '*' mul_operand

// minus_expression is listed here because it is excluded from mul_expression.
add_operand:
minus_expression | mul_expression ;
// This is notionally
// add_operand | add_expression
// but that introduces a reduce-reduce conflict when interpreting a
// mul_expression as an add_lhs.
geoffromer marked this conversation as resolved.
Show resolved Hide resolved
add_lhs:
minus_expression | add_expression ;
// A mul_expression is an add_expression, because multiplication is
// higher-precedence, and mul is not at the top of a diamond in the precedence
// graph. minus_expression is excluded because we are within a diamond with it
// at the top.
add_expression:
mul_expression | add_lhs '+' add_operand ;

mod_operand:
minus_expression ;
mod_expression:
mod_operand '%' mod_operand ;

// We add back minus_expression here because it was excluded from add_expression
// and mod_expression.
eq_operand:
minus_expression | add_expression | mod_expression ;
// We also include minus_expression here because this is the bottom of the
// precedence diamond.
eq_expression:
minus_expression | add_expression | mod_expression | eq_operand '=' eq_operand ;
```
Loading