Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compute LLVM-agnostic type layouts in rustc. #32939

Merged
merged 5 commits into from
Apr 20, 2016
Merged

Conversation

eddyb
Copy link
Member

@eddyb eddyb commented Apr 13, 2016

Layout for monomorphic types, and some polymorphic ones (e.g. &T where T: Sized),
can now be computed by rustc without involving LLVM in the actual process.

This gives rustc the ability to evaluate size_of or align_of, as well as obtain field offsets.
MIR-based CTFE will eventually make use of these layouts, as will MIR trans, shortly.

Layout computation also comes with a [breaking-change], or two:

  • "data-layout" is now mandatory in custom target specifications, reverting the decision from Upgrade to LLVM's 3.7 release branch #27076.
    This string is needed because it describes endianness, pointer size and alignments for various types.
    We have the first two and we could allow tweaking alignments in target specifications.
    Or we could also extract the data layout from LLVM and feed it back into rustc.
    However, that can vary with the LLVM version, which is fragile and undermines stability.
    For built-in targets, I've added a check that the hardcoded data-layout matches LLVM defaults.
  • transmute calls are checked in a stricter fashion, which fixes Unexpected tail in unsized_info_ty: usize for ty=process::Core<isize, isize, isize> #32377

To expand on transmute, there are only 2 allowed patterns: between types with statically known sizes and between pointers with the same potentially-unsized "tail" (which determines the type of unsized metadata they use, if any).
If you're affected, my suggestions are:

  • try to use casts (and raw pointer deref) instead of transmutes
  • really try to avoid transmute where possible
  • if you have a structure, try working on individual fields and unpack/repack the structure instead of transmuting it whole, e.g. transmute::<RefCell<Box<T>>, RefCell<*mut T>>(x) doesn't work, but RefCell::new(Box::into_raw(x.into_inner())) does (and Box::into_raw is just a transmute)

@rust-highfive
Copy link
Collaborator

r? @nrc

(rust_highfive has picked a reviewer for you, use r? to override)

@eddyb eddyb force-pushed the layout branch 2 times, most recently from c6564a1 to b9022d2 Compare April 13, 2016 17:55
@eddyb eddyb added the S-waiting-on-crater Status: Waiting on a crater run to be completed. label Apr 13, 2016
}
}

// Odd unit types.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No Layout::Unit? Also Layout::Empty might be nice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see the need for a dedicated unit layout, but Layout::Empty would probably be useful in refining the set of variants, to make sizeof(Result<T, Void>) just sizeof(T).

However, just like invalid value range reusal (i.e. extending non-zero optimizations), it cannot be done before moving trans to use only ty::Layout.

I was hoping we can test the worst-case transmute approach before the beta, but there were complications I had to address.

@eddyb eddyb changed the title [WIP] Compute LLVM-agnostic type layouts in rustc. Compute LLVM-agnostic type layouts in rustc. Apr 13, 2016
}

impl Default for TargetDataLayout {
fn default() -> TargetDataLayout {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eddyb How were these defaults chosen? When do they get used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the defaults match LLVMs defaults for the data layout string. The data layout string isn't required to specify every value, so the defaults are used to fill in the gaps.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, thanks.

@alexcrichton
Copy link
Member

My main worry on these data layouts was just that no one actually understands them and they're just opaque blobs we copy around without any thought, and then once that caused a bug one day and LLVM had reasonable defaults for everything it seemed "why not just use that?"

If we validate that our data layout is the same as LLVM, however, then this seems fine to me. I'd be curious if LLVM actually changes anything here between releases (wouldn't that be a huge breaking change for C as well?), but doing this ourselves also seems fine to me.

@solson
Copy link
Member

solson commented Apr 14, 2016

This PR will be a big help for implementing Miri more properly, which hopefully will lead to MIR-based CTFE as @eddyb mentioned in the description.

In general, I think it's a good idea to make rustc less dependent on LLVM where it isn't too difficult to do so.

@eddyb
Copy link
Member Author

eddyb commented Apr 14, 2016

@alexcrichton They're documented, though: http://llvm.org/docs/LangRef.html#data-layout.
Also, they used to be much more verbose for redundancy's sake, but they're manageably small now.
The bug you mention was precisely my point: if we get the datalayout from LLVM and there's a bug fix, we don't get notified.

@eddyb
Copy link
Member Author

eddyb commented Apr 14, 2016

I've taken a look at the preliminary list of regressions, and even though almost half of them were build failures (timeouts?), the rest displayed a clear pattern:

  • in postgres-binary-copy-0.2.1 Option<&'a T> is transmuted to Option<&'b T> - we could try to accommodate for this by comparing the two types, with regions substituted away - or handle Option<&T> like *const T (see below)
  • everywhere else, the errors come from a transmute between two pointer types, some of which are wrapped in newtypes, both potentially-fat with the exact same "tail", which is a type parameter

That is, in the latter case, you might have &T -> Rc<RefCell<T>> with T: ?Sized (not that you could make use of that, but it's an extreme example), and both of those types are potentially-fat pointers with the same metadata, depending only on T, now and in the future.

Assuming we want to allow the pattern of casting between maybe-newtyped pointers to maybe-unsized maybe-wrapped type parameters, how should we go about doing it?

The cleanest solution I have in mind, although not the most principled one, is to have two checks:

  • first, we try to obtain static sizes for the source and destination and compare them
  • if that fails, we attempt to extract potentially-unsized "pointer skeletons" from both types, and compare the pointee "tail" type; this technique also allows transmutes between pointers to ?Sized associated types

While it may be possible to integrate unknown sizes into the regular Layout to get a more principled(?) solution, it would complicate any code working with layouts in monomorphic code.

EDIT: I should mention that if breaking all of those crates is an option, most of them can use pointer casts and dereferences instead - even when a newtype or Option is involved.

@eddyb eddyb added I-needs-decision Issue: In need of a decision. and removed S-waiting-on-crater Status: Waiting on a crater run to be completed. labels Apr 14, 2016
@Aatch
Copy link
Contributor

Aatch commented Apr 15, 2016

  • in postgres-binary-copy-0.2.1 Option<&'a T> is transmuted to Option<&'b T> - we could try to accommodate for this by comparing the two types, with regions substituted away - or handle Option<&T> like *const T (see below)

I think regions should be ignored/substituted here. Makes more general sense, since the regions shouldn't affect the layout of the type.

@nrc
Copy link
Member

nrc commented Apr 15, 2016

r? @nikomatsakis

@rust-highfive rust-highfive assigned nikomatsakis and unassigned nrc Apr 15, 2016
@nikomatsakis
Copy link
Contributor

@eddyb

Assuming we want to allow the pattern of casting between maybe-newtyped pointers to maybe-unsized maybe-wrapped type parameters, how should we go about doing it?

I think we do.

While it may be possible to integrate unknown sizes into the regular Layout to get a more principled(?) solution, it would complicate any code working with layouts in monomorphic code.

I don't have an informed enough opinion yet. I'll try to take a look at your code but it may not happen until Monday, since I'm kind of busy today with other (non-Rust-related) things. But it seems to me like we could probably use a similar trick to what we do today, where for things where T: ?Sized we compute the layout multiple times, under different assumptions (T is definitely sized, T is definitely unsized, etc). But maybe your suggestion turns out nicer -- it just seems like it may wind up duplicating some layout logic. But perhaps not.

@nikomatsakis
Copy link
Contributor

in postgres-binary-copy-0.2.1 Option<&'a T> is transmuted to Option<&'b T> - we could try to accommodate for this by comparing the two types, with regions substituted away - or handle Option<&T> like *const T (see below)

Oh, and clearly regions shouldn't affect the result, but I would expect this equivalency to fallout in a more general way than strict type equality?

@eddyb
Copy link
Member Author

eddyb commented Apr 15, 2016

@nikomatsakis Just trying out combinations is pretty fragile, if we ever want to add more kinds of fat pointer metadata (or if we make it fully custom).

The "perfect" solution IMO is to enforce static sizes because that would allow a size_of::<T>() == size_of::<U>() bound on transmute in the future, but I am open to non-static, yet strict, solutions.

That's because we could end up simplifying, e.g. size_of::<*mut T>() == size_of::<Option<&T>>() to true based on the "pointer skeleton" rule mentioned above.

@nikomatsakis
Copy link
Contributor

@eddyb

Just trying out combinations is pretty fragile, if we ever want to add more kinds of fat pointer metadata (or if we make it fully custom).

I see. Good point.

pub struct Size {
raw: u64
pub bytes: u64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really care, but I imagine the use of privacy here was to help ensure people don't do stupid things like let mut size = ...; size.bytes *= 2;, but rather encourage them to go through these (presumably more careful) APIs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually initially wanted to use the highest bit to indicate that the rest of the size is the base size of an unsized type, instead of an exact size, but I ended up dealing with unsized types differently.
I could go back to private raw and add more checks for 2^61 overflow if you want to.

@bors
Copy link
Contributor

bors commented Apr 20, 2016

📌 Commit c7d564d has been approved by nikomatsakis

@bors
Copy link
Contributor

bors commented Apr 20, 2016

⌛ Testing commit c7d564d with merge 542c7d1...

@bors
Copy link
Contributor

bors commented Apr 20, 2016

💔 Test failed - auto-win-msvc-32-opt

@eddyb
Copy link
Member Author

eddyb commented Apr 20, 2016

@bors retry

@bors
Copy link
Contributor

bors commented Apr 20, 2016

⌛ Testing commit c7d564d with merge 6ece144...

bors added a commit that referenced this pull request Apr 20, 2016
Compute LLVM-agnostic type layouts in rustc.

Layout for monomorphic types, and some polymorphic ones (e.g. `&T` where `T: Sized`),
can now be computed by rustc without involving LLVM in the actual process.

This gives rustc the ability to evaluate `size_of` or `align_of`, as well as obtain field offsets.
MIR-based CTFE will eventually make use of these layouts, as will MIR trans, shortly.

Layout computation also comes with a `[breaking-change]`, or two:
* `"data-layout"` is now mandatory in custom target specifications, reverting the decision from #27076.
This string is needed because it describes endianness, pointer size and alignments for various types.
We have the first two and we could allow tweaking alignments in target specifications.
Or we could also extract the data layout from LLVM and feed it back into rustc.
However, that can vary with the LLVM version, which is fragile and undermines stability.
For built-in targets, I've added a check that the hardcoded data-layout matches LLVM defaults.
* `transmute` calls are checked in a stricter fashion, which fixes #32377

To expand on `transmute`, there are only 2 allowed patterns: between types with statically known sizes and between pointers with the same potentially-unsized "tail" (which determines the type of unsized metadata they use, if any).
If you're affected, my suggestions are:
* try to use casts (and raw pointer deref) instead of transmutes
* *really* try to avoid `transmute` where possible
* if you have a structure, try working on individual fields and unpack/repack the structure instead of transmuting it whole, e.g. `transmute::<RefCell<Box<T>>, RefCell<*mut T>>(x)` doesn't work, but `RefCell::new(Box::into_raw(x.into_inner()))` does (and `Box::into_raw` is just a `transmute`)
@@ -1410,6 +1410,32 @@ It is not possible to use stability attributes outside of the standard library.
Also, for now, it is not possible to write deprecation messages either.
"##,

E0512: r##"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh nice! Thanks for adding it! \o/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only moved it.

@bors bors merged commit c7d564d into rust-lang:master Apr 20, 2016
@eddyb eddyb deleted the layout branch April 20, 2016 16:58
@eddyb eddyb added the beta-nominated Nominated for backporting to the compiler in the beta channel. label Apr 20, 2016
@eddyb
Copy link
Member Author

eddyb commented Apr 20, 2016

Nominated for backporting to beta to fix #32377.

}

/// Helper function for normalizing associated types in an inference context.
fn normalize_associated_type<'a, 'tcx>(infcx: &InferCtxt<'a, 'tcx>,
Copy link
Member

@solson solson Apr 22, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eddyb In Miri I use rustc::infer::normalize_associated_type. Its source code is slightly different and has a warning about only being callable from trans, but most of it is duplicated. Could they be unified somehow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See how I call ty.layout(&infcx) from trans. The effect is the same in that case, it's only different elsewhere (intrinsicck).

hannobraun added a commit to hannobraun/embedded that referenced this pull request Apr 26, 2016
The data layout had become optional at some point. Some time after that,
it started causing a compiler error, so I removed it. From the Rust
side, those changes are documented in the following issue:
rust-lang/rust#31367

This is the pull request that made the data layout non-optional again,
is this one:
rust-lang/rust#32939

I took the layout I added here from the Rust compiler code. The various
built-in ARM targets seem to have mostly[1] the same target layout, which
makes sense, as the target layout describes mostly hardware
characteristics that shouldn't change between operation systems.

The layout I copied is from the `arm-unknown-linux-gnueabi` target,
here:
https://github.com/rust-lang/rust/blob/253b7c1e1a919a6b722c29a04241d6f08ff8c79a/src/librustc_back/target/arm_unknown_linux_gnueabi.rs#L19

I double-checked with the LLVM documentation on data layouts, and
everything seems legit, as far as I can tell. I don't completely
understand everything about it, though, so I can't give a 100%
guarantee. The LLVM documentation on data layouts is located here:
http://llvm.org/docs/LangRef.html#data-layout

In any case, the program compiles and works fine with the new layout, so
I'm assuming it's correct.

[1] "Mostly", because the one exception I see is the name mangline
    option ("m:"), which I set to "e", meaning "ELF". This doesn't seem
    terribly relevant. The only case, that I can think of, that might
    make it relevant is if we had C code calling into our Rust code, but
    then we would mark the called Rust functions as "#[no_mangle]"
    anyway.
@pnkfelix
Copy link
Member

after a week of exercising in the nightly release, the compiler team has decided to accept this for beta, largely because it fixes a real regression (#32377), and that outweighed the relative risk of backporting such a largish change to beta.

@pnkfelix pnkfelix added the beta-accepted Accepted for backporting to the compiler in the beta channel. label Apr 28, 2016
@brson brson removed the beta-nominated Nominated for backporting to the compiler in the beta channel. label May 4, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
beta-accepted Accepted for backporting to the compiler in the beta channel.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unexpected tail in unsized_info_ty: usize for ty=process::Core<isize, isize, isize>