Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dyn Trait comparison should not include the vtable pointer #106447

Closed
Amanieu opened this issue Jan 4, 2023 · 32 comments
Closed

dyn Trait comparison should not include the vtable pointer #106447

Amanieu opened this issue Jan 4, 2023 · 32 comments
Labels
disposition-close This PR / issue is in PFCP or FCP with a disposition to close it. finished-final-comment-period The final comment period is finished for this PR / Issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@Amanieu
Copy link
Member

Amanieu commented Jan 4, 2023

This was previously raised in #103763 (comment). Currently, comparing a pair of *const dyn Trait raw pointers for equality will check that both the address and vtable are equal.

This is problematic because the vtable of a type is not guaranteed to be unique for each type: different types can share the same vtable and the same type can have its vtable instantiated multiple times.

To illustrate this issue, consider the following program which prints true in release mode and false in debug mode because LLVM is merging the two identical vtables (playground).

struct A;
struct B;

trait T {}
impl T for A {}
impl T for B {}

fn main() {
	let ab = (A, B);
    let a = &ab.0 as *const dyn T;
    let b = &ab.1 as *const dyn T;
    println!("{}", a == b);
    println!("{}", a as *const u8 == b as *const u8);
}

It is also possible to achieve the reverse effect where two dyn Trait pointers to the same value (with the same base type) compare unequal because a separate vtable was instantiated with each pointer (e.g. due to codegen units or separate crates).

In conclusion, vtables are completely useless as a source of truth for pointer equality, so we should only consider the pointer address when comparing dyn Trait raw pointers. We currently document this as a footgun (#103567), but it would be better to eliminate this footgun entirely from the language.

@Amanieu Amanieu added T-lang Relevant to the language team, which will review and decide on the PR/issue. I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Mar 31, 2023
@scottmcm scottmcm added the I-lang-nominated Nominated for discussion during a lang team meeting. label Apr 6, 2023
@nbdd0121
Copy link
Contributor

nbdd0121 commented Apr 6, 2023

This was already discussed in lang triage meeting on 2022-11-08 (notes: https://hackmd.io/@rust-lang-team/SyDavQuri) when Arc::ptr_eq was discussed.

@scottmcm
Copy link
Member

In conclusion, vtables are completely useless as a source of truth for pointer equality

I agree with this in the narrow, since for equality it's not useful.

But for inequality, one can rely on it today, no? For example,

#[derive(Debug)]
#[repr(transparent)]
struct Foo(u8);

let f = Foo(1);
let p1: *const dyn Debug = &f;
let p2: *const dyn Debug = &f.0;

assert_ne!(p1, p2); // always passes today

So is it possible that people could be relying on this side of it already?

@Amanieu
Copy link
Member Author

Amanieu commented Apr 21, 2023

That's only true because u8 and Foo have different Debug impls. However it is perfectly valid for the compiler to merge 2 vtables if they are identical:

trait T {}
impl T for u8 {}
impl T for Foo {}

#[repr(transparent)]
struct Foo(u8);

let f = Foo(1);
let p1: *const dyn T = &f;
let p2: *const dyn T = &f.0;

assert_ne!(p1, p2); // FAILS IN RELEASE MODE, passes in debug mode

@nbdd0121
Copy link
Contributor

We could change that by making vtable not unnamed_addr.

@Amanieu
Copy link
Member Author

Amanieu commented Apr 21, 2023

True, but that doesn't solve the issue where multiple instances of the same vtable are emitted in different codegen units. I feel that overall this is a footgun in the language which people are already having to work around by explicitly casting to *mut u8.

@nikomatsakis
Copy link
Contributor

Discussed in @rust-lang/lang today. Conclusions:

  • The proposal as stated (just compare data pointers) means that x == y being true no longer means "these values are semantically equivalent". We feel that would be a breaking change, in that there could be sound code that checks whether x == y that would be broken.
  • We agree that the existing semantics of x == y (basically: if true, it means equivalent, but it may fail "randomly") are not desirable.
  • We think the desired semantics are that the data pointer is the same and the vtables are for the same type, but we understand that making vtables unique may not be viable across dynamic linking boundaries and so forth.
  • Are there other ways forward here? Do vtables always contain type-id, for example?

@Amanieu
Copy link
Member Author

Amanieu commented May 3, 2023

x == y can return true even if the values are not semantically equivalent. See my example in #106447 (comment) where the values compare equal despite having different types.

So today x == y provides no information: it can have false positives if the vtables are merged, and false negatives if multiple instances of the same vtable are emitted.

@nbdd0121
Copy link
Contributor

nbdd0121 commented May 3, 2023

Well, x == y does mean that they will behave equivalently. The type might be different, but the behaviour is since the code is the same.

@tmandry
Copy link
Member

tmandry commented May 3, 2023

The type might be different, but the behaviour is since the code is the same.

I'm not sure that's true at the semantic level. The behavior is the same given the same bit patterns, but (if I understand correctly) hypothetically you could have two types with the same bit patterns that have differing meanings, and the same machine code would therefore have differing semantic meanings. Plus, we are now exposing implementation details of an impl at the level of language semantics.

For trait objects I can't come up with a realistic example other than what @Amanieu showed above, though.

@tmandry
Copy link
Member

tmandry commented May 3, 2023

I would add that merging of functions is not a purely hypothetical concern. For instance, I've seen it break backtraces where a function that calls abort() gets merged with a function in a completely unrelated file (another part of libc, or something written in another language) that does the same thing. So you get something like this:

Abort
  1: abort in libc.so
  2: some_random_function in some_random_lib.so
  3: function_that_never_calls_the_above in your_code
  4: something_else in your_code
  ...
  N: main

This can be super confusing, especially when combined with inlining, when lacking line numbers (or they're unreliable) in a backtrace, and when (3) is a large function! And while it doesn't threaten language semantics to break backtraces (they're always best effort), it's an example of a situation where reasoning only about a function's behavior is not sufficient to recover the semantic meaning.

@nikomatsakis
Copy link
Contributor

So today x == y provides no information: it can have false positives if the vtables are merged, and false negatives if multiple instances of the same vtable are emitted.

Fair, but I think it is close to the sense that I specified. It seems like we could prevent this merging by including e.g. type-id in the vtable.

@Amanieu
Copy link
Member Author

Amanieu commented May 11, 2023

I think that x == y should be entirely deterministic (assuming both x and y point to live objects). This means that it should always produce a correct result, no matter what optimization level is used or how the compiler splits codegen units.

Practically speaking, this leaves 2 options:

  • Only compare the data pointer, not the vtable. This is what I am proposing here.
  • Include type-ids in the vtable and do type comparison by comparing the type-ids instead of the vtable pointer. This comes at a cost in vtable size though.

Either option is fine and would make dyn trait comparison have actually sane results.

@apiraino apiraino removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label May 16, 2023
@nikomatsakis
Copy link
Contributor

nikomatsakis commented May 16, 2023

Crazy idea: deprecate pointer comparison in Rust 2024 in favor of some methods (cc @scottmcm)

@pnkfelix
Copy link
Member

pnkfelix commented May 16, 2023

I'm trying to understand the scope of the problem that we face here.

  1. There is the immediate issue of "what is the semantics of the expression a == b, occurring immediately in source code, where a and b are both fat native pointers (*const dyn Trait). There's various ways to address this, including "add new methods to directly express your intent", which to me makes this variant of the problem not terribly interesting to debate.

  2. But there's the broader question of "a == b" can result from generated code, namely from #[derive(PartialEq, Eq)]. One cannot readily just "add new methods" to deal with that.

    • but we, as the Rust project, can provide better tools for that case. Namely, we could provide a lint that warns people if they have a derive(PartialEq) on a struct that contains a *const or *mut dyn Trait, saying "you should provide a wrapper around this that expresses your actual intent for how the equality should go.
  3. My own gut feeling is that we should follow the principle of least-surprise and include the type-id in the comparison by default. This will presumably add the cost of putting the type-id into the vtables for all *dyn Trait, unless we also do some kind of global analysis to deduce conservative bounds on where the values flow to. (That, and/or add a compiler flag that just doesn't include the type-id in the vtable nor in the comparison, but then we definitely need a lint warning people about a == b, and about the inclusion of *dyn Trait fields in structs that derive PartialEq.)

@pnkfelix
Copy link
Member

pnkfelix commented May 16, 2023

@Amanieu wrote (and @tmandry commented on):

That's only true because u8 and Foo have different Debug impls. However it is perfectly valid for the compiler to merge 2 vtables if they are identical:

trait T {}
impl T for u8 {}
impl T for Foo {}

#[repr(transparent)]
struct Foo(u8);

let f = Foo(1);
let p1: *const dyn T = &f;
let p2: *const dyn T = &f.0;

assert_ne!(p1, p2); // FAILS IN RELEASE MODE, passes in debug mode

How is it ever observably a problem to treat two such cases as equal pointers? Given that the two vtables themselves are equal-enough to be merged, I would think that those two pointers must be contextually-equivalent to each other (in the sense that there is no way to construct a context that distinguishes them), and therefore reporting that they are equivalent is semantically valid.


In other words, can you supply me an example of a context that can have differing behavior for a vs b, but where a and b have compared as equal solely due the aforementioned vtable merging? (Notably, if I understand correctly, they each may have differing concrete types, but the dyn Trait that they are exposing to the outside world is the same, and thus the set of available methods is both the same set of method-names and each method-name maps to the same concrete implementation, which means IIUC that the difference in concrete type cannot be observed?)

@Amanieu
Copy link
Member Author

Amanieu commented May 16, 2023

In other words, can you supply me an example of a context that can have differing behavior for a vs b, but where a and b have compared as equal solely due the aforementioned vtable merging? (Notably, if I understand correctly, they each may have differing concrete types, but the dyn Trait that they are exposing to the outside world is the same, and thus the set of available methods is both the same set of method-names and each method-name maps to the same concrete implementation, which means IIUC that the difference in concrete type cannot be observed?)

The difference can be observed in that release and debug builds provide different results, which is usually a sign of a compiler bug and definitely very confusing for users who typically run their tests in debug mode.

Here's an example with actual methods in the trait: https://play.rust-lang.org/?version=stable&mode=release&edition=2021&gist=892da2cc585f50f1f33dffe3488dd842

The vtable merging occurs if LLVM decides that all vtable methods have equivalent LLVM IR. This only works in recent LLVM versions with opaque pointer types. I don't think the Rust language should let the definition of what is "equivalent" be decided by a LLVM IR optimization.

@scottmcm
Copy link
Member

Do we have a survey of when people use == between raw pointers to dyn Trait?

I don't think I have a good sense of when people do that, and what they're attempting to accomplish when they do so.

@nikomatsakis
Copy link
Contributor

nikomatsakis commented May 23, 2023

@rfcbot fcp close

We discussed this issue in the lang team meeting again. Meeting consensus was that while we don't know the best way to resolve the behavior of a == b (status quo leaks more impl details than we would like, for sure), we also don't like the actual proposal on the table. As @tmandry put it, "I don't think that a dyn Trait representing a tuple and a dyn trait representing the first element of the tuple should be equal".

Presuming this FCP completes, we could open an issue describing the problem today (but not proposing this solution) and linking to it.

@rfcbot
Copy link

rfcbot commented May 23, 2023

Team member @nikomatsakis has proposed to close this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

cc @rust-lang/lang-advisors: FCP proposed for lang, please feel free to register concerns.
See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-close This PR / issue is in PFCP or FCP with a disposition to close it. final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels May 23, 2023
@rfcbot
Copy link

rfcbot commented May 23, 2023

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot removed the proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. label May 23, 2023
@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. to-announce Announce this issue on triage meeting and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Jun 2, 2023
@rfcbot
Copy link

rfcbot commented Jun 2, 2023

The final comment period, with a disposition to close, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Jun 15, 2023
@scottmcm
Copy link
Member

scottmcm commented Sep 30, 2023

As a small incomplete step, I've proposed ptr::addr_eq to at least have a specific way to intentionally just compare the address, not any metadata, in rust-lang/libs-team#274

EDIT: This was added to nightly in #116325

bors added a commit to rust-lang-ci/rust that referenced this issue Oct 2, 2023
bors added a commit to rust-lang-ci/rust that referenced this issue Oct 4, 2023
Use `addr_eq` in `{Arc,Rc}::ptr_eq`

Since it's made for stuff like this (see rust-lang#106447)
@Amanieu
Copy link
Member Author

Amanieu commented Oct 18, 2023

We did discuss about potentially deprecating dyn Trait pointer comparison during the meeting, but it's hard to so because we can't deprecate trait impl.

I would like to propose the following plan:

  • Change the semantics of dyn trait pointer comparison to only look at the address. This is necessary to ensure that such pointer comparisons produce consistent results rather than depending on codegen unit splitting and how LLVM merges vtables.
  • Deprecate comparisons (Eq/Ord) on dyn trait pointers by adding a lint which encourages people to case the pointer to *mut u8 first. This makes the intent of such comparisons much clearer.

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Nov 8, 2023

We discussed this in the rust-lang/lang meeting.

Meeting consensus was that we are

  • in favor of the lint
  • but are disinclined to change the behavior of ==.

Overall we want == to mean "if true, you cannot observe a difference whichever value you use". Currently this is what it means but it sometimes returns spurious false values. Under the proposed change, it would no longer mean that.

We do agree that it would be better to avoid == comparisons because of those spurious false values, so yes in favor of the lint.

Also, the bar is high for this change because it would be a subtle breaking change to existing stabilized behavior and could cause breakage (if someone was correctly relying on the invariant above). If the new rules were strictly better, that might be fine, but the new rules don't seem strictly better.

@nikomatsakis
Copy link
Contributor

Per the earlier FCP, I am going to close this issue. I appreciate @Amanieu your doggedness in trying to resolve this behavior. =)

@traviscross traviscross removed the I-lang-nominated Nominated for discussion during a lang team meeting. label Nov 8, 2023
@nikomatsakis
Copy link
Contributor

@scottmcm is opening a new issue to track the lint.

@nikomatsakis
Copy link
Contributor

On a personal note, I would much prefer to fix the existing behavior by adding more info into the vtable.

@scottmcm
Copy link
Member

scottmcm commented Nov 8, 2023

Issue for adding a lint: #117717

bors added a commit to rust-lang-ci/rust that referenced this issue Dec 11, 2023
…, r=davidtwco

Add lint against ambiguous wide pointer comparisons

This PR is the resolution of rust-lang#106447 decided in rust-lang#117717 by T-lang.

## `ambiguous_wide_pointer_comparisons`

*warn-by-default*

The `ambiguous_wide_pointer_comparisons` lint checks comparison of `*const/*mut ?Sized` as the operands.

### Example

```rust
let ab = (A, B);
let a = &ab.0 as *const dyn T;
let b = &ab.1 as *const dyn T;

let _ = a == b;
```

### Explanation

The comparison includes metadata which may not be expected.

-------

This PR also drops `clippy::vtable_address_comparisons` which is superseded by this one.

~~One thing: is the current naming right? `invalid` seems a bit too much.~~

Fixes rust-lang#117717
github-actions bot pushed a commit to rust-lang/miri that referenced this issue Dec 12, 2023
…twco

Add lint against ambiguous wide pointer comparisons

This PR is the resolution of rust-lang/rust#106447 decided in rust-lang/rust#117717 by T-lang.

## `ambiguous_wide_pointer_comparisons`

*warn-by-default*

The `ambiguous_wide_pointer_comparisons` lint checks comparison of `*const/*mut ?Sized` as the operands.

### Example

```rust
let ab = (A, B);
let a = &ab.0 as *const dyn T;
let b = &ab.1 as *const dyn T;

let _ = a == b;
```

### Explanation

The comparison includes metadata which may not be expected.

-------

This PR also drops `clippy::vtable_address_comparisons` which is superseded by this one.

~~One thing: is the current naming right? `invalid` seems a bit too much.~~

Fixes rust-lang/rust#117717
bjorn3 pushed a commit to rust-lang/rustc_codegen_cranelift that referenced this issue Dec 13, 2023
…twco

Add lint against ambiguous wide pointer comparisons

This PR is the resolution of rust-lang/rust#106447 decided in rust-lang/rust#117717 by T-lang.

## `ambiguous_wide_pointer_comparisons`

*warn-by-default*

The `ambiguous_wide_pointer_comparisons` lint checks comparison of `*const/*mut ?Sized` as the operands.

### Example

```rust
let ab = (A, B);
let a = &ab.0 as *const dyn T;
let b = &ab.1 as *const dyn T;

let _ = a == b;
```

### Explanation

The comparison includes metadata which may not be expected.

-------

This PR also drops `clippy::vtable_address_comparisons` which is superseded by this one.

~~One thing: is the current naming right? `invalid` seems a bit too much.~~

Fixes rust-lang/rust#117717
flip1995 pushed a commit to flip1995/rust-clippy that referenced this issue Dec 16, 2023
…twco

Add lint against ambiguous wide pointer comparisons

This PR is the resolution of rust-lang/rust#106447 decided in rust-lang/rust#117717 by T-lang.

## `ambiguous_wide_pointer_comparisons`

*warn-by-default*

The `ambiguous_wide_pointer_comparisons` lint checks comparison of `*const/*mut ?Sized` as the operands.

### Example

```rust
let ab = (A, B);
let a = &ab.0 as *const dyn T;
let b = &ab.1 as *const dyn T;

let _ = a == b;
```

### Explanation

The comparison includes metadata which may not be expected.

-------

This PR also drops `clippy::vtable_address_comparisons` which is superseded by this one.

~~One thing: is the current naming right? `invalid` seems a bit too much.~~

Fixes rust-lang/rust#117717
liketechnik added a commit to cmjava2023/vm that referenced this issue Dec 19, 2023
We store class instances as Rc<dyn ClassInstanc> values.
The pointers to the inner values consist of
1. a pointer to the vtable of the trait
2. a pointer to the data itself
Due to rust internals, the vtable pointer can be different
for Rc<dyn ClassInstance> values that point to the same *data*,
effectively making any comparison involing the vtable pointer moot.
This uses a function rust just introduced to compare such pointers.
(cf. rust-lang/rust#106447,
rust-lang/rust#117717,
rust-lang/rust#116325)
lnicola pushed a commit to lnicola/rust-analyzer that referenced this issue Apr 7, 2024
…twco

Add lint against ambiguous wide pointer comparisons

This PR is the resolution of rust-lang/rust#106447 decided in rust-lang/rust#117717 by T-lang.

## `ambiguous_wide_pointer_comparisons`

*warn-by-default*

The `ambiguous_wide_pointer_comparisons` lint checks comparison of `*const/*mut ?Sized` as the operands.

### Example

```rust
let ab = (A, B);
let a = &ab.0 as *const dyn T;
let b = &ab.1 as *const dyn T;

let _ = a == b;
```

### Explanation

The comparison includes metadata which may not be expected.

-------

This PR also drops `clippy::vtable_address_comparisons` which is superseded by this one.

~~One thing: is the current naming right? `invalid` seems a bit too much.~~

Fixes rust-lang/rust#117717
RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this issue Apr 27, 2024
…twco

Add lint against ambiguous wide pointer comparisons

This PR is the resolution of rust-lang/rust#106447 decided in rust-lang/rust#117717 by T-lang.

## `ambiguous_wide_pointer_comparisons`

*warn-by-default*

The `ambiguous_wide_pointer_comparisons` lint checks comparison of `*const/*mut ?Sized` as the operands.

### Example

```rust
let ab = (A, B);
let a = &ab.0 as *const dyn T;
let b = &ab.1 as *const dyn T;

let _ = a == b;
```

### Explanation

The comparison includes metadata which may not be expected.

-------

This PR also drops `clippy::vtable_address_comparisons` which is superseded by this one.

~~One thing: is the current naming right? `invalid` seems a bit too much.~~

Fixes rust-lang/rust#117717
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-close This PR / issue is in PFCP or FCP with a disposition to close it. finished-final-comment-period The final comment period is finished for this PR / Issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

10 participants