Skip to content

Conversation

@MatthewMckee4
Copy link
Contributor

Summary

Fixes incorrect negated type eq and ne assertions in infer_binary_intersection_type_comparison

fixes #17360

Test Plan

Remove and update some now incorrect tests

@MatthewMckee4 MatthewMckee4 marked this pull request as ready for review April 12, 2025 23:21
@github-actions
Copy link
Contributor

github-actions bot commented Apr 12, 2025

mypy_primer results

No ecosystem changes detected ✅

@MatthewMckee4 MatthewMckee4 changed the title Fix eq static assert [red-knot] Fix eq static assert Apr 13, 2025
@AlexWaygood AlexWaygood added the ty Multi-file analysis & type inference label Apr 13, 2025
Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Comment on lines 53 to 54
reveal_type(x == "abc") # revealed: bool
reveal_type("abc" == x) # revealed: bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, this is a bit unfortunate. If x has type ~Literal[0], then you can't say for sure that it won't compare equal to 0, and you can't even say for sure that it won't compare equal to 0 if it has type int & ~Literal[0]: e.g. the runtime object False inhabits the type int & ~Literal[0], but it compares equal to 0.

The same goes if x has type ~Literal["abc"] or str & ~Literal["abc"]: you can't say anything in particular about whether x will compare equal to "abc" or not at runtime, since x could be an instance of a str subclass and still compare equal to "abc". But if x has type LiteralString & ~Literal["abc"], we can say for sure that it won't compare equal to "abc", since for x to inhabit the type LiteralString we know that it must have exactly str as its __class__ (it can't be an instance of a str subclass).

Ideally we'd find a way of fixing this issue that didn't regress on our ability to narrow from a LiteralString into a string-literal type. I suppose this may involve some special-casing in infer_binary_intersection_type_comparison.

Having said that, it feels like achieving this might be fairly complex, and our current inference is quite incorrect in some obvious cases. I'd be okay with going with what you have now if you added some # TODO comments above these reveal_type calls saying that ideally we'd be able to narrow the LiteralString type to string-literal types using == and !=. The same goes for the assertions on lines 58-59

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Am i right in saying we should be able to determine these (as Literal[False]) too?

reveal_type(x == "something else")  # revealed: bool
reveal_type("something else" == x)  # revealed: bool

@AlexWaygood AlexWaygood changed the title [red-knot] Fix eq static assert [red-knot] Do not assume that x != 0 if x inhabits ~Literal[0] Apr 13, 2025
@MatthewMckee4
Copy link
Contributor Author

@AlexWaygood currently reveal_type(not_one is not 1) revealed bool, with my changes it now returns Literal[True]

Comment on lines 5322 to 5327
let non_empty_intersection_positive = if intersection.positive(self.db()).is_empty() {
vec![Type::object(self.db())]
} else {
intersection.iter_positive(self.db()).collect()
};
for pos in non_empty_intersection_positive {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not the biggest fan of this implementation at the moment, but previously this function could return object, which seems wrong. This would happen when positive was empty. If anyone could advise on a better way to implement this that would be great

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should normalize our representation of negation-only types. This is probably necessary in order to get is_equivalent_to correct, also.

I think we should ensure that intersection.positive(db) (or a method we add that we use instead of accessing that directly) always returns at least object, and never empty; that will give the correct behavior in cases like this one.

Then the remaining question is whether we normalize to that by always ensuring in IntersectionBuilder that we add object to the positive elements of every union with otherwise empty positive elements, or whether we normalize to "empty positive elements", make positive private, and add a public method that returns an iterator over just object if positive elements is empty.

The former is simpler, but perhaps more costly. Not sure if negation types will be common enough that it matters, but we do generate them commonly in narrowing. I would probably go for the former first and see if it causes a detectable regression.

It might make sense to do this as a separate PR (with some tests for equivalence of such intersections), and then rebase this PR after landing that one?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're looking at this method, its implementation seems kind of wasteful, in that we might end up calling infer_binary_type_comparison twice for every positive member of the intersection. Seems like we could collapse the two loops over positive elements into one loop, with early short-circuit if possible? But that probably doesn't belong in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay sounds good, thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make positive private, and add a public method that returns an iterator over just object if positive elements is empty.

Lots of pre-existing code calls .iter() or .len() on positive and negative.

#[salsa::interned(debug)]
pub struct IntersectionType<'db> {
    /// The intersection type includes only values in all of these types.
    #[return_ref]
    _positive: FxOrderSet<Type<'db>>,

    /// The intersection type does not include any value in any of these types.
    ///
    /// Negation types aren't expressible in annotations, and are most likely to arise from type
    /// narrowing along with intersections (e.g. `if not isinstance(...)`), so we represent them
    /// directly in intersections rather than as a separate type.
    #[return_ref]
    _negative: FxOrderSet<Type<'db>>,
}

impl<'db> IntersectionType<'db> {
    pub fn positive(&self, db: &'db dyn Db) -> Box<dyn Iterator<Item = Type<'db>> + 'db> {
        if self._positive(db).is_empty() {
            Box::new(std::iter::once(Type::object(db)))
        } else {
            Box::new(self._positive(db).iter().copied())
        }
    }

    pub fn negative(&self, db: &'db dyn Db) -> Box<dyn Iterator<Item = Type<'db>> + 'db> {
        Box::new(self._negative(db).iter().copied())
    }

    pub fn positive_len(&self, db: &'db dyn Db) -> usize {
        self._positive(db).len()
    }

    pub fn negative_len(&self, db: &'db dyn Db) -> usize {
        self._negative(db).len()
    }
}

Do you think this is a good change as it will require lots of other updates to the codebase

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because object is so predictable in many cases (e.g. in subtyping), it's likely that in many/most cases we don't have to actually load object at all in order to provide the right behavior.

yeah, I've wondered in the past if we should have a Type::Object variant to avoid having so many typeshed lookups from loading the object type. object is such a special type that it feels like it would be pretty defensible to do that. But the refactor would probably be tricky to pull off.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we expect this test to pass, or should these types be bool?

def static_truthiness(not_one: Not[Literal[1]], not_int: Not[int]) -> None:
    ...
    reveal_type(isinstance(not_int, int))  # revealed: Literal[False]
    reveal_type(not isinstance(not_int, int))  # revealed: Literal[True]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think those tests are correct as you've written them. The type Not[int] excludes every object that is an instance of int.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then i'll have a look into why they're failing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can probably add this to a new PR

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for looking into this! This stuff is so subtle...

Comment on lines 5322 to 5327
let non_empty_intersection_positive = if intersection.positive(self.db()).is_empty() {
vec![Type::object(self.db())]
} else {
intersection.iter_positive(self.db()).collect()
};
for pos in non_empty_intersection_positive {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should normalize our representation of negation-only types. This is probably necessary in order to get is_equivalent_to correct, also.

I think we should ensure that intersection.positive(db) (or a method we add that we use instead of accessing that directly) always returns at least object, and never empty; that will give the correct behavior in cases like this one.

Then the remaining question is whether we normalize to that by always ensuring in IntersectionBuilder that we add object to the positive elements of every union with otherwise empty positive elements, or whether we normalize to "empty positive elements", make positive private, and add a public method that returns an iterator over just object if positive elements is empty.

The former is simpler, but perhaps more costly. Not sure if negation types will be common enough that it matters, but we do generate them commonly in narrowing. I would probably go for the former first and see if it causes a detectable regression.

It might make sense to do this as a separate PR (with some tests for equivalence of such intersections), and then rebase this PR after landing that one?

Comment on lines -5267 to -5272
(ast::CmpOp::Eq, Some(Type::BooleanLiteral(true))) => {
return Ok(Type::BooleanLiteral(false));
}
(ast::CmpOp::NotEq, Some(Type::BooleanLiteral(false))) => {
return Ok(Type::BooleanLiteral(true));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this removal is correct. I guess to address the TODO comments added above (where we intersect with LiteralString, we would need to introduce some Type method which would be true for LiteralString, which would mean "inhabitants of this type can never be equal to any object that doesn't inhabit this type, and all proper subtypes are single-valued." And then here, we could apply these removed cases only if all positive members of the intersection are such a type.

But I think maybe LiteralString is the only such type we need to care about, so a Type method might be overkill, we could also just special-case LiteralString here for now.

Copy link
Member

@AlexWaygood AlexWaygood Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think maybe LiteralString is the only such type we need to care about, so a Type method might be overkill, we could also just special-case LiteralString here for now.

No, I don't think so. bool can be safely narrowed to Literal[True] using an == True comparison. And in due course we'll want to be able to narrow <instance of enum class> to <Literal member of that enum> using equality as well (assuming all enum members have unique values, and the the enum class doesn't have a custom __eq__ method -- either case would mean such narrowing would not be sound).

Comment on lines 5322 to 5327
let non_empty_intersection_positive = if intersection.positive(self.db()).is_empty() {
vec![Type::object(self.db())]
} else {
intersection.iter_positive(self.db()).collect()
};
for pos in non_empty_intersection_positive {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're looking at this method, its implementation seems kind of wasteful, in that we might end up calling infer_binary_type_comparison twice for every positive member of the intersection. Seems like we could collapse the two loops over positive elements into one loop, with early short-circuit if possible? But that probably doesn't belong in this PR.

@MatthewMckee4 MatthewMckee4 requested a review from carljm April 15, 2025 23:30
@carljm carljm merged commit a2a7b1e into astral-sh:main Apr 16, 2025
23 checks passed
Comment on lines +30 to +34
# these are both boolean-literal types,
# since all possible runtime objects that are created by the literal syntax `1`
# are members of the type `Literal[1]`
reveal_type(not_one is not 1) # revealed: bool
reveal_type(not_one is 1) # revealed: bool
Copy link
Member

@AlexWaygood AlexWaygood Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment doesn't seem to reflect the code immediately below it anymore. The comment says "these are both boolean-literal types", but the assertion reveals bool, not Literal[True] or Literal[False]?

bool isn't incorrect here, but it would be great if we could fix this comment to something like

Suggested change
# these are both boolean-literal types,
# since all possible runtime objects that are created by the literal syntax `1`
# are members of the type `Literal[1]`
reveal_type(not_one is not 1) # revealed: bool
reveal_type(not_one is 1) # revealed: bool
# TODO: `bool` is not incorrect, but these would ideally be `Literal[True]` and `Literal[False]`
# respectively, since all possible runtime objects that are created by the literal syntax `1`
# are members of the type `Literal[1]`
reveal_type(not_one is not 1) # revealed: bool
reveal_type(not_one is 1) # revealed: bool

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines -5267 to -5272
(ast::CmpOp::Eq, Some(Type::BooleanLiteral(true))) => {
return Ok(Type::BooleanLiteral(false));
}
(ast::CmpOp::NotEq, Some(Type::BooleanLiteral(false))) => {
return Ok(Type::BooleanLiteral(true));
}
Copy link
Member

@AlexWaygood AlexWaygood Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think maybe LiteralString is the only such type we need to care about, so a Type method might be overkill, we could also just special-case LiteralString here for now.

No, I don't think so. bool can be safely narrowed to Literal[True] using an == True comparison. And in due course we'll want to be able to narrow <instance of enum class> to <Literal member of that enum> using equality as well (assuming all enum members have unique values, and the the enum class doesn't have a custom __eq__ method -- either case would mean such narrowing would not be sound).

sharkdp pushed a commit that referenced this pull request Apr 16, 2025
…17370)

## Summary

Fixes incorrect negated type eq and ne assertions in
infer_binary_intersection_type_comparison

fixes #17360

## Test Plan

Remove and update some now incorrect tests
dcreager added a commit that referenced this pull request Apr 16, 2025
* main: (44 commits)
  [`airflow`] Extend `AIR311` rules (#17422)
  [red-knot] simplify union size limit handling (#17429)
  [`airflow`] Extract `AIR311` from `AIR301` rules (`AIR301`, `AIR311`) (#17310)
  [red-knot] set a size limit on unions of literals (#17419)
  [red-knot] make large-union benchmark slow again (#17418)
  [red-knot] optimize building large unions of literals (#17403)
  [red-knot] Fix comments in type_api.md (#17425)
  [red-knot] Do not assume that `x != 0` if `x` inhabits `~Literal[0]` (#17370)
  [red-knot] make large-union benchmark more challenging (#17416)
  [red-knot] Acknowledge that `T & anything` is assignable to `T` (#17413)
  Update Rust crate clap to v4.5.36 (#17381)
  Raise syntax error when `\` is at end of file (#17409)
  [red-knot] Add regression tests for narrowing constraints cycles (#17408)
  [red-knot] Add some knowledge of `__all__` to `*`-import machinery (#17373)
  Update taiki-e/install-action digest to be7c31b (#17379)
  Update Rust crate mimalloc to v0.1.46 (#17382)
  Update PyO3/maturin-action action to v1.49.1 (#17384)
  Update Rust crate anyhow to v1.0.98 (#17380)
  dependencies: switch from `chrono` to `jiff`
  Update Rust crate bstr to v1.12.0 (#17385)
  ...
@MatthewMckee4 MatthewMckee4 deleted the fix-eq-static-assert branch April 19, 2025 13:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ty Multi-file analysis & type inference

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[red-knot] x != 0 is wrongly inferred as Literal[True] if type of x is any subtype of ~Literal[0]

3 participants