Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Consider all definitions after terminal statements unreachable #15676

Merged
merged 37 commits into from
Jan 29, 2025
Merged
Show file tree
Hide file tree
Changes from 21 commits
Commits
Show all changes
37 commits
Select commit Hold shift + click to select a range
e0d2130
Add failing test cases for terminal control flow
dcreager Jan 22, 2025
f45af24
Record unreachable visibility after terminal statements
dcreager Jan 22, 2025
70b681b
More thorough reveal_types
dcreager Jan 22, 2025
81a3164
Actually we do support those now!
dcreager Jan 22, 2025
93b149c
Merge branch 'main' into dcreager/terminal-visibility
dcreager Jan 23, 2025
912ed62
Use happy constant
dcreager Jan 23, 2025
4102c89
Record reachability in semantic index builder
dcreager Jan 24, 2025
60eccc9
Apply suggestions from code review
dcreager Jan 24, 2025
4190b5b
Add (failing) eager subscope example
dcreager Jan 24, 2025
641ccb8
Also skip merging when lhs is unreachable
dcreager Jan 24, 2025
efdf6c3
We support this one too
dcreager Jan 24, 2025
57ea7ca
Add failing try/finally example
dcreager Jan 24, 2025
db98926
Use for loops
dcreager Jan 24, 2025
a434762
Add Carl's suggested return error
dcreager Jan 24, 2025
b9f91fb
Merge branch 'main' into dcreager/terminal-visibility
dcreager Jan 24, 2025
1388eb0
Merge branch 'main' into dcreager/terminal-visibility
dcreager Jan 27, 2025
79de3eb
Add failing early return example
dcreager Jan 27, 2025
c4d1599
Add TODOs for function return example
dcreager Jan 27, 2025
82e6e67
Add TODO link for continue false positives
dcreager Jan 27, 2025
bfe76da
Prefer the faster option first
dcreager Jan 27, 2025
0b7da66
Apply suggestions from code review
dcreager Jan 28, 2025
e4771bf
Better descriptions of continue and break; add nested tests
dcreager Jan 28, 2025
9431fb9
Remove tomllib benchmark error reproduction
dcreager Jan 28, 2025
08b0522
Add "both nested branches" tests
dcreager Jan 28, 2025
ee3ac28
Add separate then/else branch tests
dcreager Jan 28, 2025
add8a57
Fix comment
dcreager Jan 28, 2025
a0e6980
Remove duplicate test
dcreager Jan 28, 2025
9fb8d2a
Fix header
dcreager Jan 28, 2025
3684739
Add raise tests
dcreager Jan 28, 2025
7fbec5f
Add return-in-try tests
dcreager Jan 28, 2025
5c45e88
Fix comment
dcreager Jan 28, 2025
d2b21fc
Add statically-known terminal test case
dcreager Jan 28, 2025
2f62eb0
Apply suggestions from code review
dcreager Jan 29, 2025
32e8131
Update TODOs to not assume how we'll change exception tracking
dcreager Jan 29, 2025
2932bb4
Merge branch 'main' into dcreager/terminal-visibility
dcreager Jan 29, 2025
1cf65f5
Linter
dcreager Jan 29, 2025
681fae2
Update crates/red_knot_python_semantic/resources/mdtest/terminal_stat…
dcreager Jan 29, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,217 @@
# Terminal statements

## Introduction

Terminal statements complicate a naive control-flow analysis.

As a simple example:

```py
def f(cond: bool) -> str:
if cond:
x = "test"
else:
raise ValueError
return x

def g(cond: bool):
if cond:
x = "test"
reveal_type(x) # revealed: Literal["test"]
else:
x = "unreachable"
reveal_type(x) # revealed: Literal["unreachable"]
carljm marked this conversation as resolved.
Show resolved Hide resolved
raise ValueError
reveal_type(x) # revealed: Literal["test"]
```

In `f`, we should be able to determine that the `else` branch ends in a terminal statement, and that
the `return` statement can only be executed when the condition is true. We should therefore consider
the reference always bound, even though `x` is only bound in the true branch.

Similarly, in `g`, we should see that the assignment of the value `"unreachable"` can never be seen
by the final `reveal_type`.

## `return` is terminal

```py
def f(cond: bool) -> str:
if cond:
x = "test"
else:
return "early"
return x # no possibly-unresolved-reference diagnostic!

def g(cond: bool):
if cond:
x = "test"
reveal_type(x) # revealed: Literal["test"]
else:
x = "unreachable"
reveal_type(x) # revealed: Literal["unreachable"]
carljm marked this conversation as resolved.
Show resolved Hide resolved
return
reveal_type(x) # revealed: Literal["test"]
```

## `continue` is terminal within its loop scope
carljm marked this conversation as resolved.
Show resolved Hide resolved

TODO: We are not currently modeling the cyclic control flow for loops, pending fixpoint support in
Salsa. The false positives in this section are because of that, and not our terminal statement
support. See [ruff#14160](https://github.com/astral-sh/ruff/issues/14160) for more details.

```py
def f(cond: bool) -> str:
while True:
if cond:
x = "test"
else:
continue
return x

def g(cond: bool, i: int):
x = "before"
for _ in range(i):
if cond:
x = "loop"
reveal_type(x) # revealed: Literal["loop"]
else:
x = "continue"
reveal_type(x) # revealed: Literal["continue"]
continue
reveal_type(x) # revealed: Literal["loop"]
# TODO: Should be Literal["before", "loop", "continue"]
reveal_type(x) # revealed: Literal["before", "loop"]
dcreager marked this conversation as resolved.
Show resolved Hide resolved
carljm marked this conversation as resolved.
Show resolved Hide resolved
```

## `break` is terminal within its loop scope
carljm marked this conversation as resolved.
Show resolved Hide resolved

```py
def f(cond: bool) -> str:
while True:
if cond:
x = "test"
else:
break
return x
return x # error: [unresolved-reference]

def g(cond: bool, i: int):
x = "before"
for _ in range(i):
if cond:
x = "loop"
reveal_type(x) # revealed: Literal["loop"]
else:
x = "break"
reveal_type(x) # revealed: Literal["break"]
break
reveal_type(x) # revealed: Literal["loop"]
reveal_type(x) # revealed: Literal["before", "loop", "break"]
```

## `return` is terminal in nested conditionals
carljm marked this conversation as resolved.
Show resolved Hide resolved

```py
def f(cond1: bool, cond2: bool) -> str:
if cond1:
if cond2:
x = "test1"
else:
return "early"
else:
x = "test2"
return x

def g(cond1: bool, cond2: bool):
if cond1:
if cond2:
x = "test1"
reveal_type(x) # revealed: Literal["test1"]
else:
x = "unreachable"
reveal_type(x) # revealed: Literal["unreachable"]
return
reveal_type(x) # revealed: Literal["test1"]
else:
x = "test2"
reveal_type(x) # revealed: Literal["test2"]
reveal_type(x) # revealed: Literal["test1", "test2"]
```

## Terminal in a `finally` block
carljm marked this conversation as resolved.
Show resolved Hide resolved
dcreager marked this conversation as resolved.
Show resolved Hide resolved

Control-flow through finally isn't working right yet:
dcreager marked this conversation as resolved.
Show resolved Hide resolved

```py
def f():
x = 1
while True:
try:
break
finally:
x = 2
# TODO: should be Literal[2]
reveal_type(x) # revealed: Literal[1]
```

## Early returns and nested functions

Free references inside of a function body refer to variables defined in the containing scope.
Function bodies are _lazy scopes_: at runtime, these references are not resolved immediately at the
point of the function definition. Instead, they are resolved _at the time of the call_, which means
that their values (and types) can be different for different invocations. For simplicity, we instead
resolve free references _at the end of the containing scope_. That means that in the examples below,
all of the `x` bindings should be visible to the `reveal_type`, regardless of where we place the
`return` statements.

TODO: These currently produce the wrong results, but not because of our terminal statement support.
See [ruff#15777](https://github.com/astral-sh/ruff/issues/15777) for more details.

```py
def top_level_return(cond1: bool, cond2: bool):
x = 1

def g():
# TODO eliminate Unknown
reveal_type(x) # revealed: Unknown | Literal[1, 2, 3]
dcreager marked this conversation as resolved.
Show resolved Hide resolved
if cond1:
if cond2:
x = 2
else:
x = 3
return

def return_from_if(cond1: bool, cond2: bool):
x = 1

def g():
# TODO: Literal[1, 2, 3]
reveal_type(x) # revealed: Unknown | Literal[1]
if cond1:
if cond2:
x = 2
else:
x = 3
return

def return_from_nested_if(cond1: bool, cond2: bool):
x = 1

def g():
# TODO: Literal[1, 2, 3]
reveal_type(x) # revealed: Unknown | Literal[1, 3]
if cond1:
if cond2:
x = 2
return
else:
x = 3
```

## Early returns and list comprehensions

```py
def f(x: str) -> int:
y = [x for i in range(len(x))]
return 4
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A return at the end of the function isn't really an "early return", contrary to the test title.

Is this test testing some code that was added in this PR? It doesn't clearly seem to test anything about terminality of return.

This test intersects with two known-incorrect areas (closed-over vars in scopes with return statements, modeling of eagerly-executing nested scopes), has no reveal_type (so asserts nothing more than "no diagnostics here") and doesn't demonstrate any TODOs. This makes me question its value as a test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a minimal reproduction of an error I was getting in the tomllib benchmark test. I thought to put it here to catch it earlier in the CI process, but since it's redundant with the tomllib test I'll remove it. (Maybe a better way to handle this is to move the assertions out of the benchmark and into a new test case that also analyzes tomllib? That way the benchmark is only concerned with performance, and the test with correctness.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine (even good) to take cases that we find from tomllib (or any other testing on real code), distill them down to their simplest form that illustrates a potential regression, and include them as mdtests. So that's not an issue. I think my question here really is trying to understand what the regression was (what did we do wrong in this example in some earlier version of this PR?) and clarify what behavior the test is trying to demonstrate (maybe just with some prose, maybe by adding a reveal_type, maybe both).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I was using visibility constraints, this was a regression because:

  • the return statement would mark the x parameter as non-visible for the remainder of the flow;
  • list comprehensions would resolve free variables as of the end of the containing scope,
  • which is technically after the return statement, and so the body of the list comprehension wouldn't see the x formal parameter as a visible definition.

It's the a lack of an unresolved-reference error that shows that the regression isn't there anymore.

Talking through it in detail like this, I think this is superfluous with the "Early returns and nested functions" tests, because it was the "closed-over vars in scopes with return statements" part that was relevant, and the "modeling of eagerly-executing nested scopes" was a red herring.

26 changes: 21 additions & 5 deletions crates/red_knot_python_semantic/src/semantic_index/builder.rs
Original file line number Diff line number Diff line change
Expand Up @@ -368,6 +368,12 @@ impl<'db> SemanticIndexBuilder<'db> {
.record_visibility_constraint(VisibilityConstraint::VisibleIf(constraint))
}

/// Records that all remaining statements in the current block are unreachable, and therefore
/// not visible.
fn mark_unreachable(&mut self) {
self.current_use_def_map_mut().mark_unreachable();
}

/// Records a [`VisibilityConstraint::Ambiguous`] constraint.
fn record_ambiguous_visibility(&mut self) -> ScopedVisibilityConstraintId {
self.current_use_def_map_mut()
Expand Down Expand Up @@ -1019,11 +1025,6 @@ where
}
self.visit_body(body);
}
ast::Stmt::Break(_) => {
if self.loop_state().is_inside() {
self.loop_break_states.push(self.flow_snapshot());
}
}

ast::Stmt::For(
for_stmt @ ast::StmtFor {
Expand Down Expand Up @@ -1270,6 +1271,21 @@ where
// - https://github.com/astral-sh/ruff/pull/13633#discussion_r1788626702
self.visit_body(finalbody);
}

ast::Stmt::Raise(_) | ast::Stmt::Return(_) | ast::Stmt::Continue(_) => {
walk_stmt(self, stmt);
// Everything in the current block after a terminal statement is unreachable.
self.mark_unreachable();
}

ast::Stmt::Break(_) => {
if self.loop_state().is_inside() {
self.loop_break_states.push(self.flow_snapshot());
}
// Everything in the current block after a terminal statement is unreachable.
self.mark_unreachable();
}

_ => {
walk_stmt(self, stmt);
}
Expand Down
25 changes: 25 additions & 0 deletions crates/red_knot_python_semantic/src/semantic_index/use_def.rs
Original file line number Diff line number Diff line change
Expand Up @@ -476,6 +476,7 @@ impl std::iter::FusedIterator for DeclarationsIterator<'_, '_> {}
pub(super) struct FlowSnapshot {
symbol_states: IndexVec<ScopedSymbolId, SymbolState>,
scope_start_visibility: ScopedVisibilityConstraintId,
reachable: bool,
}

#[derive(Debug)]
Expand Down Expand Up @@ -503,6 +504,8 @@ pub(super) struct UseDefMapBuilder<'db> {

/// Currently live bindings and declarations for each symbol.
symbol_states: IndexVec<ScopedSymbolId, SymbolState>,

reachable: bool,
}

impl Default for UseDefMapBuilder<'_> {
Expand All @@ -515,11 +518,16 @@ impl Default for UseDefMapBuilder<'_> {
bindings_by_use: IndexVec::new(),
definitions_by_definition: FxHashMap::default(),
symbol_states: IndexVec::new(),
reachable: true,
}
}
}

impl<'db> UseDefMapBuilder<'db> {
pub(super) fn mark_unreachable(&mut self) {
self.reachable = false;
}

pub(super) fn add_symbol(&mut self, symbol: ScopedSymbolId) {
let new_symbol = self
.symbol_states
Expand Down Expand Up @@ -656,6 +664,7 @@ impl<'db> UseDefMapBuilder<'db> {
FlowSnapshot {
symbol_states: self.symbol_states.clone(),
scope_start_visibility: self.scope_start_visibility,
reachable: self.reachable,
}
}

Expand All @@ -678,12 +687,25 @@ impl<'db> UseDefMapBuilder<'db> {
num_symbols,
SymbolState::undefined(self.scope_start_visibility),
);

self.reachable = snapshot.reachable;
}

/// Merge the given snapshot into the current state, reflecting that we might have taken either
/// path to get here. The new state for each symbol should include definitions from both the
/// prior state and the snapshot.
pub(super) fn merge(&mut self, snapshot: FlowSnapshot) {
// Unreachable snapshots should not be merged: If the current snapshot is unreachable, it
// should be completely overwritten by the snapshot we're merging in. If the other snapshot
// is unreachable, we should return without merging.
if !snapshot.reachable {
return;
}
if !self.reachable {
self.restore(snapshot);
return;
}
Comment on lines +698 to +707
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also the possibility that neither self nor snapshot is reachable. This code will correctly result in self still being marked unreachable in that case, but it seems a little odd that we keep the visible definitions state from self in that case. The logical extension of the idea that an unreachable state takes no part in a merge should be that in case neither are reachable, we reset self to a state with reachable: false and no visible definitions, right?

Not sure if it practically makes a difference, though; since the new state is still unreachable its visible definitions shouldn't "go" anywhere anyway, even if self is later merged into another state. I guess it will make a difference to the types we reveal in the following unreachable code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it seems a little odd that we keep the visible definitions state from self in that case

If we want this, I think it would be best to add an invariant that marking a flow as unreachable clears out all of its definitions, and update mark_unreachable and restore to maintain that invariant. Then this code in merge would do the right thing as you describe.

I guess it will make a difference to the types we reveal in the following unreachable code?

Yes, that's exactly right. (In the sense that that's what the code does, not necessarily that that's what we want it to do 😅) For now, I was punting on this, because this PR isn't currently addressing what we want to do for unreachable code. (Note that in the mdtests I've tried to not put in any reveal_types in unreachable positions.)

I think there are a couple of issues at play here. One is that, not even considering the merge, what do we want to report in the unreachable code within the same block after a terminal statement?

x = 2
return
reveal_type(x)  # ???

Should it be an unresolved-reference error? Or should it act as if the terminal statement weren't there, and show what x would be if control flow could somehow make it to that point? Or should we silence all diagnostics completely in unreachable code?

Whatever we choose, we should have the same result for

if cond:
    x = 2
    return
else:
    x = 3
    return
reveal_type(x)  # ???

If we decide that we want the first case to reveal Literal[2], then we'd want this case to reveal Literal[2, 3] — which means that we actually do want to merge all of the flows, even if they're unreachable, and it's just the visibility of the relevant symbols that needs to be tracked/adjusted somehow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that all makes sense, thanks for clarifying!

I think we should tackle this separately as a later problem; maybe file an issue for it? I don't think it's urgent, and I'm happy with doing the "least work" for now, even if it's less consistent (that is, neither eagerly clearing definitions when a branch becomes unreachable, nor merging definitions from two unreachable branches just so we can have fully consistent types in unreachable code).

Whatever we do for unreachable code should look consistent whether the origin of that unreachability is in terminals or in statically-known branches (that is, code under an if False should behave similarly to code after a return), which may place some additional constraints on how we handle it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Top-of-head thoughts on what behaviors we do/don't want (not for action now, just for consideration in writing up the issue):

  • I definitely don't think it would be useful to issue undefined-reference diagnostics for names used in unreachable code that would have been defined were the branch reachable.
  • In some sense I think Never is the "right" type for all expressions in unreachable code?
  • But I suspect that the most useful behavior is to check the unreachable code (and still raise diagnostics in it as normal), as if it were reachable.
  • We should look into mypy and pyright behavior here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mypy has terrible UX if it sees a reveal_type in a block of code it considers unreachable: it just doesn't emit any diagnostic for the reveal_type call at all. This has been the source of many bug reports at mypy over the years, because the rule that tells you off for having unreachable code in the first place is disabled by default, even if you opt into mypy's --strict flag. We shouldn't do what mypy does!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


// We never remove symbols from `symbol_states` (it's an IndexVec, and the symbol
// IDs must line up), so the current number of known symbols must always be equal to or
// greater than the number of known symbols in a previously-taken snapshot.
Expand All @@ -705,6 +727,9 @@ impl<'db> UseDefMapBuilder<'db> {
self.scope_start_visibility = self
.visibility_constraints
.add_or_constraint(self.scope_start_visibility, snapshot.scope_start_visibility);

// At least one of the two snapshots was reachable, so the merged result is too.
carljm marked this conversation as resolved.
Show resolved Hide resolved
self.reachable = true;
}

pub(super) fn finish(mut self) -> UseDefMap<'db> {
Expand Down
17 changes: 0 additions & 17 deletions crates/ruff_benchmark/benches/red_knot.rs
Original file line number Diff line number Diff line change
Expand Up @@ -26,25 +26,8 @@ const TOMLLIB_312_URL: &str = "https://raw.githubusercontent.com/python/cpython/
static EXPECTED_DIAGNOSTICS: &[&str] = &[
// We don't support `*` imports yet:
"error[lint:unresolved-import] /src/tomllib/_parser.py:7:29 Module `collections.abc` has no member `Iterable`",
// We don't support terminal statements in control flow yet:
"warning[lint:possibly-unresolved-reference] /src/tomllib/_parser.py:66:18 Name `s` used when possibly not defined",
"warning[lint:possibly-unresolved-reference] /src/tomllib/_parser.py:98:12 Name `char` used when possibly not defined",
"warning[lint:possibly-unresolved-reference] /src/tomllib/_parser.py:101:12 Name `char` used when possibly not defined",
"warning[lint:possibly-unresolved-reference] /src/tomllib/_parser.py:104:14 Name `char` used when possibly not defined",
"warning[lint:possibly-unresolved-reference] /src/tomllib/_parser.py:115:14 Name `char` used when possibly not defined",
"warning[lint:possibly-unresolved-reference] /src/tomllib/_parser.py:126:12 Name `char` used when possibly not defined",
"warning[lint:possibly-unresolved-reference] /src/tomllib/_parser.py:348:20 Name `nest` used when possibly not defined",
"warning[lint:possibly-unresolved-reference] /src/tomllib/_parser.py:353:5 Name `nest` used when possibly not defined",
"warning[lint:possibly-unresolved-reference] /src/tomllib/_parser.py:453:24 Name `nest` used when possibly not defined",
"warning[lint:possibly-unresolved-reference] /src/tomllib/_parser.py:455:9 Name `nest` used when possibly not defined",
"warning[lint:possibly-unresolved-reference] /src/tomllib/_parser.py:482:16 Name `char` used when possibly not defined",
"warning[lint:possibly-unresolved-reference] /src/tomllib/_parser.py:566:12 Name `char` used when possibly not defined",
"warning[lint:possibly-unresolved-reference] /src/tomllib/_parser.py:573:12 Name `char` used when possibly not defined",
"warning[lint:possibly-unresolved-reference] /src/tomllib/_parser.py:579:12 Name `char` used when possibly not defined",
"warning[lint:possibly-unresolved-reference] /src/tomllib/_parser.py:580:63 Name `char` used when possibly not defined",
carljm marked this conversation as resolved.
Show resolved Hide resolved
// We don't handle intersections in `is_assignable_to` yet
"error[lint:invalid-argument-type] /src/tomllib/_parser.py:626:46 Object of type `Unknown & ~AlwaysFalsy | @Todo & ~AlwaysFalsy` cannot be assigned to parameter 1 (`match`) of function `match_to_datetime`; expected type `Match`",
"warning[lint:possibly-unresolved-reference] /src/tomllib/_parser.py:629:38 Name `datetime_obj` used when possibly not defined",
"error[lint:invalid-argument-type] /src/tomllib/_parser.py:632:58 Object of type `Unknown & ~AlwaysFalsy | @Todo & ~AlwaysFalsy` cannot be assigned to parameter 1 (`match`) of function `match_to_localtime`; expected type `Match`",
"error[lint:invalid-argument-type] /src/tomllib/_parser.py:639:52 Object of type `Unknown & ~AlwaysFalsy | @Todo & ~AlwaysFalsy` cannot be assigned to parameter 1 (`match`) of function `match_to_number`; expected type `Match`",
"warning[lint:unused-ignore-comment] /src/tomllib/_parser.py:682:31 Unused blanket `type: ignore` directive",
Expand Down
Loading