Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CTFE keeps values in padding, breaking structural equality of const generic args. #70889

Closed
eddyb opened this issue Apr 7, 2020 · 15 comments
Closed
Labels
A-const-generics Area: const generics (parameters and arguments) A-valtree Area: Value trees or fixed by value trees C-bug Category: This is a bug. F-const_generics `#![feature(const_generics)]` requires-nightly This issue requires a nightly compiler in some way. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@eddyb
Copy link
Member

eddyb commented Apr 7, 2020

This example should do one of these three things, but it doesn't (playground):

  • compile successfully (PADDED == FILLED after all, field-wise)
  • error on FILLED's definition (due to evaluated constant not fitting type)
  • error when FILLED is used as an argument to Phantom
#![feature(const_generics)]

union Transmute<T: Copy, U: Copy> {
    from: T,
    to: U,
}

const PADDED: (u8, u16) = (0, 0);
const FILLED: (u8, u16) = unsafe { Transmute { from: [0u8; 4] }.to };

struct Phantom<const X: (u8, u16)>;

fn test(x: Phantom<PADDED>) -> Phantom<FILLED> {
    x
}

Instead, Phantom<PADDED> and Phantom<FILLED> are considered different types.

If we want to make it compile, we could normalize FILLED to also have that padding byte marked as "undef", but I'm not sure if we can do this normalization if the values were behind a reference.

So we might want to error if e.g. &[0u8; 4] was transmuted to &(u8, u16), because normalizing it would change what runtime code would see. Again, we have two places where we can error.


If we want to error without causing no breaking changes, either always or just indirect case, we can do so by introducing a second, stricter, validity check in ty::Const Well-Formed rules (which we should be able to post-#70107).
That check should enforce that the value is a tree of constructors (&_ would be treated as a constructor) with integer leaves (no relocations, i.e. no raw/fn pointers), where any user ADTs are structurally-matchable, and all padding bytes (not occupied by leaves) are "undef".

cc @rust-lang/wg-const-eval @varkor @yodaldevoid

@eddyb eddyb added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. C-bug Category: This is a bug. A-const-generics Area: const generics (parameters and arguments) F-const_generics `#![feature(const_generics)]` labels Apr 7, 2020
@pnkfelix pnkfelix added the requires-nightly This issue requires a nightly compiler in some way. label Apr 7, 2020
@eddyb
Copy link
Member Author

eddyb commented Apr 7, 2020

Actually, I think the error is because of a different bug, the allocation values in error the look identical.

I only noticed that because I expected this to work, but it errors (playground):

trait Foo {}
impl Foo for Phantom<PADDED> {}
impl Foo for Phantom<FILLED> {}

And the indirect version errors in an unexpected way, probably because different parts of the compiler handle by-ref ty::Consts differently.

@RalfJung
Copy link
Member

RalfJung commented Apr 7, 2020

Instead, Phantom<PADDED> and Phantom<FILLED> are considered different types.

TBH, at least from a pure const-eval perspective, this seems fine to me.
Actually formalizing Rust's "value domain" is a huge undertaking, and even then (ignoring const generics) I am not sure if we want to use that in const-eval.

So I think it would be acceptable if const generics used "identity of the underlying storage". But maybe I just want to avoid implementation complexity. :P

Though, on the other hand... does this mean that if we ever change the Miri engine to no longer preserve padding on copies, that would be a breaking change with const-generics? That seems quite problematic, then.

@eddyb
Copy link
Member Author

eddyb commented Apr 7, 2020

TBH, at least from a pure const-eval perspective, this seems fine to me.

It's not, v0 mangling will require a pure tree of integer leaves to function. Encoding the underlying representation is not an option (a hash thereof might be, but it's a terrible compromise).

Actually formalizing Rust's "value domain" is a huge undertaking

Not if you treat it as a constructor tree with integral leaves. I agree that anything else is hard, but const generics must fundamentally use purely structural values for the typesystem to be able to embed them.

@RalfJung
Copy link
Member

RalfJung commented Apr 7, 2020

Not if you treat it as a constructor tree with integral leaves.

Hm... I suppose things are easier because we only have to compare values of equal types, so we don't actually need the full complexity of the "value domain".

But still, you keep talking about "trees" where if you turn the memory links into edges, you actually get a general graph. What notion would you use for graph equivalence? Two (&i32, &i32) where in one both have the same address but in the other we have two distinct shared references -- do you consider them equal?

And what about floating point leaves? NaNs are not usually equal to themselves even.

@eddyb
Copy link
Member Author

eddyb commented Apr 7, 2020

But still, you keep talking about "trees" where if you turn the memory links into edges, you actually get a general graph.

It's a DAG (enforced by miri atm IIRC, or at least I remember @oli-obk implying that it is enforced), where sharing is an unobservable implementation optimization (same as with Ty). Therefore: tree.

Two (&i32, &i32) where in one both have the same address but in the other we have two distinct shared references -- do you consider them equal?

Of course, const generics are at least as restrictive as pattern-matching, with its structural == requirement. There's no concept of "address", and interning must be effective enough to normalize such differences (except for references to statics but those can be disallowed entirely in values passed to const generics).

And what about floating point leaves? NaNs are not usually equal to themselves even.

Floating-point values are not structurally ==, nor Eq.

It's funny you should mention this, given the history of type-level values and floating-point.
IIRC, idris-lang/Idris-dev#2609 was discovered after the subject of floating-point values in future Rust const-generics was brought up in #rust-offtopic - also see rust-lang/rfcs#1038 (comment).

@RalfJung
Copy link
Member

RalfJung commented Apr 8, 2020

It's a DAG (enforced by miri atm IIRC, or at least I remember @oli-obk implying that it is enforced), where sharing is an unobservable implementation optimization (same as with Ty). Therefore: tree.

Once const-code can mutate things, it's not a DAG any more but a general tree.
Also, why is sharing unobservable, assuming CTFE allows raw ptr comparisons eventually?

Of course, const generics are at least as restrictive as pattern-matching, with its structural == requirement.

Oh I see, so I cannot use just any type in const generics, but only types that have certain properties? Okay that's a totally different game, then the "value domain" is even more restricted of course.

I enforcing via some validity checks that padding is "undef" is the wrong approach though. It seems much more sensible to me to have a smarter comparison function that takes the type of the to-be-compared values into account -- basically, an interpreter-level implementation of == for supported types.

@eddyb
Copy link
Member Author

eddyb commented Apr 8, 2020

Also, why is sharing unobservable, assuming CTFE allows raw ptr comparisons eventually?

Sorry, unobservable by type equality. The same way you can't tell from within the language that this always results in a non-tree DAG of interned types, for any type passed to it:

type Dup<X> = (X, X);

e.g.:

Dup<u8> = (u8, u8)
           \   /
             u8

It seems much more sensible to me to have a smarter comparison function that takes the type of the to-be-compared values into account

The typesystem does not admit such an approach for normalized types, a type can either:

  • have "unnormalized" components (like ty::Projection or ty::ConstKind::Unevaluated)
    • normalization should fully replace these components, when monomorphic
    • these components must be cheaply detectable
  • be equal to itself, and only to itself, in its generic "domain" (and modulo lifetime subtyping)
    • when monomorphic, this is even stronger, as the entire type is known

Once you put a constant value in a type, all of the same rules apply to it as well.
If you have some way to create structurally-equal but distinct constants, you can do one of:

  1. ban them in types (via WF checks)
  2. normalize the difference away before they ever reach types
  3. track them as unnormalized and have a normalization rule
    • this is 2. but deferred and probably the worst of all worlds, implementation-wise

@RalfJung
Copy link
Member

RalfJung commented Apr 9, 2020

Sorry, unobservable by type equality.

Okay, so type equality has its own notion of equivalence, distinct from what CTFE can observe. Are there some crucial properties that relate the two, like one being strictly more fine-grained than the other or so? I suppose CTFE-observable-equal consts should always be type-equal (because why not), but I am not sure if that is crucial for soundness.

You seem to say that if CONST1 == CONST2 (in the sense of looking up the PartialEq instance and evaluating that code) then they must be type-equal. That seems sensible but I do not understand why it is crucial for soundness. Is this all written down somewhere?

If equality of references is defined by equality of the referent, then indeed a tree makes sense. I suppose raw pointers will not be permitted?

The typesystem does not admit such an approach for normalized types

So you are saying, for the type system we need to define equality by normalization, not via a comparison function. I can see how that helps with interning etc, but it is quite the complication here.

However, I think I'd be opposed to banning non-undef padding in "normal" CTFE results. That's a huge foot-gun, it would break consts like MaybeUninit::<(u8, u16)>::zeroed().assume_init(). And if we ban it just for consts used in types, that's still a big foot-gun as it means a random CONST I find somewhere could work fine for normal use until I try to put it into a type, and then the compiler complains about it having non-undef padding.

I suppose we could apply "typed copy" rules on the result of CTFE, and "reset" the padding on the final CTFE value, e.g. as part of interning. But this will be really hard to do for "nested" allocations where we might not have reliable type information, so that's not great either.

In that light, transforming the constant into some kind of tree representation designed specifically for this purpose makes a lot of sense indeed. Hopefully this can share at least something with ty::Const so we can avoid having three representations of constants in the compiler...

@eddyb
Copy link
Member Author

eddyb commented Apr 9, 2020

You seem to say that if CONST1 == CONST2 (in the sense of looking up the PartialEq instance and evaluating that code) then they must be type-equal. That seems sensible but I do not understand why it is crucial for soundness. Is this all written down somewhere?

The reason we require "structurally matchable" ADTs (i.e. with #[derive(PartialEq, Eq)], not custom) is mostly for consistency (with regular == and patterns), because anything embedded into the typesystem is closer to a bitwise comparison (except references, which recurse), and if we ever extend this in the future, it could require e.g. const impl PartialEq (although this seems unlikely).

What's crucial about soundness is how this is all implemented in the compiler.
Oh and we need to be able to hash types (for TypeId and symbol names - the new mangling scheme is even stricter, but worst case it could fall back to hashes), in a way that, barring hash collisions, == on Ty is equivalent with first applying Hash/StableHash (the latter having to remain stable between compilations) and then comparing.

So variation in undef bytes might be allowable (the same way bitcast NaN floats would be if we went the "bitwise comparison even when it doesn't match ==).

I suppose the thing that would really not be is addresses (of anything other than a static), because there's no way to "purify" them, other than interning them by-value (this might also mean non-const-generic uses of const-evaluation may be incrementally-unsound without said interning).

Overall, I'm not sure what the best path forward is, but I'd rather start out with very tight restrictions as far as const generics are concerned, and try to go from there.

"Trees of ADTs/references with integer leaves" is the "obviously pure and sound" subset, everything else is a minefield that has to be carefully navigated, and I'd prefer not to let in haphazard implementation-dictated behavior.

I suppose we could apply "typed copy" rules on the result of CTFE, and "reset" the padding on the final CTFE value, e.g. as part of interning.

We might be doing that already, see #70889 (comment).

I'm not sure what's going on, maybe ConstValues can compare equal but result in distinct ty::Consts?

@RalfJung
Copy link
Member

We might be doing that already,

I highly doubt it, that would be a non-trivial piece of code.
Miri might want something like that, too, but for every copy.

Overall, I'm not sure what the best path forward is, but I'd rather start out with very tight restrictions as far as const generics are concerned, and try to go from there.

Fully agreed.

I'm not sure what's going on, maybe ConstValues can compare equal but result in distinct ty::Consts?

Could it be a ByRef vs. "immediate representation" (the other ConstValue variants) thing? I think what we do is normalizing in the sense that constants with the same type should result in the same representation, but I know that at least when I reviewed that code I didn't know that would be important one day.

@RalfJung
Copy link
Member

RalfJung commented May 5, 2020

@eddyb I recently learned that raw pointers and function pointers are also allowed as constants in patterns -- and thus maybe also in const generics? For those, your integer-based "value tree" doesn't work as their equality test compares pointers, which are nothing like integers.

@oli-obk
Copy link
Contributor

oli-obk commented Jun 28, 2020

Const generics now statically exclude function pointers and raw pointers. Not sure how that holds up if they are fields of aggregates, but once we move to a value tree format, we won't see any of this anymore anyway.

@eddyb
Copy link
Member Author

eddyb commented Jul 16, 2020

@RalfJung Yes, and they're "nothing like integers" in ways that are not suitable for embedding into a typesystem, IMO.

I would be fine with raw pointers containing integer addresses, and maybe even those pointing to statics (which have unique addresses, so we can use the identity of the static as a "nominal" value), but anything else I'd avoid.

@RalfJung
Copy link
Member

For now I propose we just do not allow raw pointers and function pointers when converting to the valtree (which will be a type-directed translation I assume).

@oli-obk oli-obk added the A-valtree Area: Value trees or fixed by value trees label Mar 15, 2021
@lcnr
Copy link
Contributor

lcnr commented Mar 30, 2022

#70889 (comment)

Also, why is sharing unobservable, assuming CTFE allows raw ptr comparisons eventually?

This is the only thing not currently tracked in #29 or the valtree design. Opened a topic for this on zulip and am going to close this issue.

@lcnr lcnr closed this as completed Mar 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-const-generics Area: const generics (parameters and arguments) A-valtree Area: Value trees or fixed by value trees C-bug Category: This is a bug. F-const_generics `#![feature(const_generics)]` requires-nightly This issue requires a nightly compiler in some way. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants