Compute LLVM-agnostic type layouts in rustc. #32939

eddyb · 2016-04-13T16:49:36Z

Layout for monomorphic types, and some polymorphic ones (e.g. &T where T: Sized),
can now be computed by rustc without involving LLVM in the actual process.

This gives rustc the ability to evaluate size_of or align_of, as well as obtain field offsets.
MIR-based CTFE will eventually make use of these layouts, as will MIR trans, shortly.
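
For readers who want to see what "size, alignment and field offsets" mean in practice, here is a small user-level sketch. It does not touch the compiler internals from this PR; `Packet` and `field_offsets` are hypothetical names made up for illustration, and `#[repr(C)]` is used only so that the field order is fixed.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical example type. `repr(C)` keeps the fields in declaration
/// order, so the layout does not depend on rustc's field reordering.
#[repr(C)]
struct Packet {
    tag: u8,
    len: u32,
    id: u16,
}

/// Byte offset of each field, measured from the start of the struct.
fn field_offsets(p: &Packet) -> (usize, usize, usize) {
    let base = p as *const Packet as usize;
    (
        &p.tag as *const u8 as usize - base,
        &p.len as *const u32 as usize - base,
        &p.id as *const u16 as usize - base,
    )
}

fn main() {
    let p = Packet { tag: 1, len: 2, id: 3 };
    println!("size  = {}", size_of::<Packet>());
    println!("align = {}", align_of::<Packet>());
    println!("field offsets = {:?}", field_offsets(&p));
}
```

The exact numbers depend on the target's alignment rules, which is the information the data layout supplies.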

Layout computation also comes with a [breaking-change], or two:

* "data-layout" is now mandatory in custom target specifications, reverting the decision from #27076 (Upgrade to LLVM's 3.7 release branch).
  This string is needed because it describes endianness, pointer size and alignments for various types.
  We have the first two and we could allow tweaking alignments in target specifications.
  Or we could also extract the data layout from LLVM and feed it back into rustc.
  However, that can vary with the LLVM version, which is fragile and undermines stability.
  For built-in targets, I've added a check that the hardcoded data-layout matches LLVM defaults.
* transmute calls are checked in a stricter fashion, which fixes #32377 (Unexpected tail in unsized_info_ty: usize for ty=process::Core<isize, isize, isize>).
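
To make the role of the data-layout string concrete, here is a minimal sketch of pulling endianness and pointer size out of one. This is not the parser added in this PR: `LayoutSketch`, `Endian` and `parse_data_layout` are hypothetical names, only the `e`/`E` and `p:<size>:<abi>` specifications are handled, and the 64-bit pointer fallback follows the defaults listed in the LLVM LangRef. The sample string has the shape used for 32-bit ARM targets.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Endian {
    Little,
    Big,
}

/// Hypothetical, heavily reduced view of a data layout.
#[derive(Debug, PartialEq)]
struct LayoutSketch {
    /// `None` when the string does not mention endianness.
    endian: Option<Endian>,
    pointer_bits: u64,
    pointer_align_bits: u64,
}

fn parse_data_layout(s: &str) -> Result<LayoutSketch, String> {
    // Start from defaults; specifications in the string override them.
    let mut dl = LayoutSketch {
        endian: None,
        pointer_bits: 64,
        pointer_align_bits: 64,
    };
    for spec in s.split('-') {
        let parts: Vec<&str> = spec.split(':').collect();
        match parts.as_slice() {
            ["e"] => dl.endian = Some(Endian::Little),
            ["E"] => dl.endian = Some(Endian::Big),
            // Pointer spec for the default address space: p:<size>:<abi>[:<pref>]
            ["p" | "p0", size, abi, ..] => {
                dl.pointer_bits = size
                    .parse::<u64>()
                    .map_err(|_| format!("bad pointer size in `{}`", spec))?;
                dl.pointer_align_bits = abi
                    .parse::<u64>()
                    .map_err(|_| format!("bad pointer alignment in `{}`", spec))?;
            }
            // Everything else (m:, i64:, v128:, a:, n, S, ...) is ignored here.
            _ => {}
        }
    }
    Ok(dl)
}

fn main() {
    let dl = parse_data_layout("e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64");
    println!("{:?}", dl);
}
```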

To expand on transmute, there are only 2 allowed patterns: between types with statically known sizes and between pointers with the same potentially-unsized "tail" (which determines the type of unsized metadata they use, if any).
If you're affected, my suggestions are:

* try to use casts (and raw pointer deref) instead of transmutes
* really try to avoid transmute where possible
* if you have a structure, try working on individual fields and unpack/repack the structure instead of transmuting it whole, e.g. transmute::<RefCell<Box<T>>, RefCell<*mut T>>(x) doesn't work, but RefCell::new(Box::into_raw(x.into_inner())) does (and Box::into_raw is just a transmute)
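
The suggestions above can be sketched as follows. The helper names `into_raw_cell` and `as_raw` are hypothetical and exist only for this illustration; the first is the unpack/repack pattern from the last bullet, the second replaces a reference-to-pointer transmute with a plain cast.

```rust
use std::cell::RefCell;

/// Unpack the `RefCell`, turn the inner `Box` into a raw pointer, repack.
/// This replaces `transmute::<RefCell<Box<T>>, RefCell<*mut T>>(x)`.
fn into_raw_cell<T: ?Sized>(x: RefCell<Box<T>>) -> RefCell<*mut T> {
    RefCell::new(Box::into_raw(x.into_inner()))
}

/// A cast instead of a transmute; works for sized and unsized `T` alike.
fn as_raw<T: ?Sized>(r: &T) -> *const T {
    r as *const T
}

fn main() {
    let cell = into_raw_cell(RefCell::new(Box::new(7u32)));
    let raw: *mut u32 = cell.into_inner();
    // SAFETY: `raw` came from `Box::into_raw` and is reclaimed exactly once.
    let boxed = unsafe { Box::from_raw(raw) };
    println!("{}", *boxed);

    let s = "layout";
    let p = as_raw(s);
    // SAFETY: `p` points at `s`, which is still alive here.
    let back: &str = unsafe { &*p };
    println!("{}", back);
}
```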

rust-highfive · 2016-04-13T16:49:48Z

r? @nrc

(rust_highfive has picked a reviewer for you, use r? to override)

arielb1 · 2016-04-13T18:48:08Z

src/librustc/ty/layout.rs

+                }
+            }
+
+            // Odd unit types.


No Layout::Unit? Also Layout::Empty might be nice.

I don't really see the need for a dedicated unit layout, but Layout::Empty would probably be useful in refining the set of variants, to make sizeof(Result<T, Void>) just sizeof(T).

However, just like invalid value range reuse (i.e. extending non-zero optimizations), it cannot be done before moving trans to use only ty::Layout.

I was hoping we can test the worst-case transmute approach before the beta, but there were complications I had to address.

solson · 2016-04-13T22:36:12Z

src/librustc/ty/layout.rs

+}
+
+impl Default for TargetDataLayout {
+    fn default() -> TargetDataLayout {


@eddyb How were these defaults chosen? When do they get used?

They're LLVM's defaults: http://llvm.org/docs/LangRef.html#data-layout.

Looks like the defaults match LLVM's defaults for the data layout string. The data layout string isn't required to specify every value, so the defaults are used to fill in the gaps.

That makes sense, thanks.

alexcrichton · 2016-04-13T23:32:57Z

My main worry on these data layouts was just that no one actually understands them and they're just opaque blobs we copy around without any thought, and then once that caused a bug one day and LLVM had reasonable defaults for everything it seemed "why not just use that?"

If we validate that our data layout is the same as LLVM, however, then this seems fine to me. I'd be curious if LLVM actually changes anything here between releases (wouldn't that be a huge breaking change for C as well?), but doing this ourselves also seems fine to me.

solson · 2016-04-14T00:43:23Z

This PR will be a big help for implementing Miri more properly, which hopefully will lead to MIR-based CTFE as @eddyb mentioned in the description.

In general, I think it's a good idea to make rustc less dependent on LLVM where it isn't too difficult to do so.

eddyb · 2016-04-14T04:13:43Z

@alexcrichton They're documented, though: http://llvm.org/docs/LangRef.html#data-layout.
Also, they used to be much more verbose for redundancy's sake, but they're manageably small now.
The bug you mention was precisely my point: if we get the datalayout from LLVM and there's a bug fix, we don't get notified.

eddyb · 2016-04-14T17:12:09Z

I've taken a look at the preliminary list of regressions, and even though almost half of them were build failures (timeouts?), the rest displayed a clear pattern:

* in postgres-binary-copy-0.2.1 Option<&'a T> is transmuted to Option<&'b T> - we could try to accommodate for this by comparing the two types, with regions substituted away - or handle Option<&T> like *const T (see below)
* everywhere else, the errors come from a transmute between two pointer types, some of which are wrapped in newtypes, both potentially-fat with the exact same "tail", which is a type parameter

That is, in the latter case, you might have &T -> Rc<RefCell<T>> with T: ?Sized (not that you could make use of that, but it's an extreme example), and both of those types are potentially-fat pointers with the same metadata, depending only on T, now and in the future.

Assuming we want to allow the pattern of casting between maybe-newtyped pointers to maybe-unsized maybe-wrapped type parameters, how should we go about doing it?

The cleanest solution I have in mind, although not the most principled one, is to have two checks:

* first, we try to obtain static sizes for the source and destination and compare them
* if that fails, we attempt to extract potentially-unsized "pointer skeletons" from both types, and compare the pointee "tail" type; this technique also allows transmutes between pointers to ?Sized associated types
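
As a toy model of the two checks (not the code in this PR), the decision could be sketched like this. `SizeInfo` and `transmute_allowed` are hypothetical names, and the "tail" is represented as a plain string purely for illustration.

```rust
/// Hypothetical stand-in for what the checker knows about a type's size.
#[derive(Debug, Clone, PartialEq)]
enum SizeInfo {
    /// The size in bytes is statically known.
    Known(u64),
    /// A (possibly newtyped) pointer whose metadata depends only on `tail`.
    Pointer { tail: String },
    /// Neither a static size nor a pointer skeleton could be computed.
    Unknown,
}

/// First compare static sizes; failing that, compare pointer "tails".
fn transmute_allowed(from: &SizeInfo, to: &SizeInfo) -> bool {
    match (from, to) {
        (SizeInfo::Known(a), SizeInfo::Known(b)) => a == b,
        (SizeInfo::Pointer { tail: a }, SizeInfo::Pointer { tail: b }) => a == b,
        _ => false,
    }
}

fn main() {
    // `&T` and `Rc<RefCell<T>>` with `T: ?Sized` share the tail `T`.
    let reference = SizeInfo::Pointer { tail: "T".to_string() };
    let rc = SizeInfo::Pointer { tail: "T".to_string() };
    println!("{}", transmute_allowed(&reference, &rc));

    // A statically sized type never matches a maybe-fat pointer.
    println!("{}", transmute_allowed(&SizeInfo::Known(8), &reference));
    println!("{}", transmute_allowed(&SizeInfo::Unknown, &SizeInfo::Unknown));
}
```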

While it may be possible to integrate unknown sizes into the regular Layout to get a more principled(?) solution, it would complicate any code working with layouts in monomorphic code.

EDIT: I should mention that if breaking all of those crates is an option, most of them can use pointer casts and dereferences instead - even when a newtype or Option is involved.

Aatch · 2016-04-15T00:28:39Z

> in postgres-binary-copy-0.2.1 Option<&'a T> is transmuted to Option<&'b T> - we could try to accommodate for this by comparing the two types, with regions substituted away - or handle Option<&T> like *const T (see below)

I think regions should be ignored/substituted here. Makes more general sense, since the regions shouldn't affect the layout of the type.

nrc · 2016-04-15T02:08:53Z

r? @nikomatsakis

nikomatsakis · 2016-04-15T09:44:36Z

@eddyb

> Assuming we want to allow the pattern of casting between maybe-newtyped pointers to maybe-unsized maybe-wrapped type parameters, how should we go about doing it?

I think we do.

> While it may be possible to integrate unknown sizes into the regular Layout to get a more principled(?) solution, it would complicate any code working with layouts in monomorphic code.

I don't have an informed enough opinion yet. I'll try to take a look at your code but it may not happen until Monday, since I'm kind of busy today with other (non-Rust-related) things. But it seems to me like we could probably use a similar trick to what we do today, where for things where T: ?Sized we compute the layout multiple times, under different assumptions (T is definitely sized, T is definitely unsized, etc). But maybe your suggestion turns out nicer -- it just seems like it may wind up duplicating some layout logic. But perhaps not.

nikomatsakis · 2016-04-15T09:45:46Z

> in postgres-binary-copy-0.2.1 Option<&'a T> is transmuted to Option<&'b T> - we could try to accommodate for this by comparing the two types, with regions substituted away - or handle Option<&T> like *const T (see below)

Oh, and clearly regions shouldn't affect the result, but I would expect this equivalency to fall out in a more general way than strict type equality?

eddyb · 2016-04-15T09:50:19Z

@nikomatsakis Just trying out combinations is pretty fragile, if we ever want to add more kinds of fat pointer metadata (or if we make it fully custom).

The "perfect" solution IMO is to enforce static sizes because that would allow a size_of::<T>() == size_of::<U>() bound on transmute in the future, but I am open to non-static, yet strict, solutions.

That's because we could end up simplifying, e.g. size_of::<*mut T>() == size_of::<Option<&T>>() to true based on the "pointer skeleton" rule mentioned above.
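
The equality mentioned here can be observed from user code. The sketch below uses a hypothetical helper `same_size`; it only checks the sizes at runtime and says nothing about how the compiler would simplify such a bound.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Compares a raw pointer with an optional reference to the same pointee.
/// Both are thin or fat together, depending only on `T`.
fn same_size<T: ?Sized>() -> bool {
    size_of::<*mut T>() == size_of::<Option<&T>>()
}

fn main() {
    println!("u8:        {}", same_size::<u8>());
    println!("str:       {}", same_size::<str>());
    println!("[u32]:     {}", same_size::<[u32]>());
    println!("dyn Debug: {}", same_size::<dyn Debug>());
}
```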

nikomatsakis · 2016-04-15T20:19:34Z

@eddyb

> Just trying out combinations is pretty fragile, if we ever want to add more kinds of fat pointer metadata (or if we make it fully custom).

I see. Good point.

nikomatsakis · 2016-04-15T20:22:52Z

src/librustc/ty/layout.rs

 pub struct Size {
-    raw: u64
+    pub bytes: u64


I don't really care, but I imagine the use of privacy here was to help ensure people don't do stupid things like let mut size = ...; size.bytes *= 2;, but rather encourage them to go through these (presumably more careful) APIs.

I actually initially wanted to use the highest bit to indicate that the rest of the size is the base size of an unsized type, instead of an exact size, but I ended up dealing with unsized types differently.
I could go back to private raw and add more checks for 2^61 overflow if you want to.
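
A sketch of what the private-field variant with overflow checks might look like. This is not the code in the PR: the method names are hypothetical, and the reading of the 2^61 bound (keeping the size in bits representable in a `u64`) is an assumption made for this example.

```rust
/// Hypothetical size newtype with a private field, so callers cannot
/// write `size.raw *= 2` and skip the overflow checks.
#[derive(Copy, Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Size {
    raw: u64,
}

impl Size {
    /// Assumed bound: sizes stay below 2^61 bytes so `bits()` fits in a u64.
    const LIMIT: u64 = 1 << 61;

    pub fn from_bytes(bytes: u64) -> Option<Size> {
        if bytes < Self::LIMIT {
            Some(Size { raw: bytes })
        } else {
            None
        }
    }

    pub fn bytes(self) -> u64 {
        self.raw
    }

    pub fn bits(self) -> u64 {
        // Cannot overflow: `raw` is below 2^61 by construction.
        self.raw * 8
    }

    pub fn checked_add(self, other: Size) -> Option<Size> {
        self.raw.checked_add(other.raw).and_then(Size::from_bytes)
    }

    pub fn checked_mul(self, count: u64) -> Option<Size> {
        self.raw.checked_mul(count).and_then(Size::from_bytes)
    }
}

fn main() {
    let word = Size::from_bytes(4).expect("4 bytes is within the limit");
    println!("{:?}", word.checked_mul(3).map(Size::bytes));
    println!("{:?}", Size::from_bytes(1 << 61));
}
```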


bors · 2016-04-20T08:34:03Z

📌 Commit c7d564d has been approved by nikomatsakis

bors · 2016-04-20T10:44:57Z

⌛ Testing commit c7d564d with merge 542c7d1...

bors · 2016-04-20T11:58:43Z

💔 Test failed - auto-win-msvc-32-opt

eddyb · 2016-04-20T14:16:42Z

@bors retry

bors · 2016-04-20T14:27:59Z

⌛ Testing commit c7d564d with merge 6ece144...


GuillaumeGomez · 2016-04-20T15:35:25Z

src/librustc/diagnostics.rs

@@ -1410,6 +1410,32 @@ It is not possible to use stability attributes outside of the standard library.
 Also, for now, it is not possible to write deprecation messages either.
 "##,

+E0512: r##"


Oh nice! Thanks for adding it! \o/

I only moved it.

bors · 2016-04-20T16:57:54Z

eddyb · 2016-04-20T16:58:42Z

Nominated for backporting to beta to fix #32377.

solson · 2016-04-22T05:21:18Z

src/librustc/ty/layout.rs

+}
+
+/// Helper function for normalizing associated types in an inference context.
+fn normalize_associated_type<'a, 'tcx>(infcx: &InferCtxt<'a, 'tcx>,


@eddyb In Miri I use rustc::infer::normalize_associated_type. Its source code is slightly different and has a warning about only being callable from trans, but most of it is duplicated. Could they be unified somehow?

See how I call ty.layout(&infcx) from trans. The effect is the same in that case, it's only different elsewhere (intrinsicck).

The data layout had become optional at some point. Some time after that, it started causing a compiler error, so I removed it. From the Rust side, those changes are documented in the following issue: rust-lang/rust#31367

The pull request that made the data layout non-optional again is this one: rust-lang/rust#32939

I took the layout I added here from the Rust compiler code. The various built-in ARM targets seem to have mostly[1] the same target layout, which makes sense, as the target layout mostly describes hardware characteristics that shouldn't change between operating systems. The layout I copied is from the `arm-unknown-linux-gnueabi` target, here: https://github.com/rust-lang/rust/blob/253b7c1e1a919a6b722c29a04241d6f08ff8c79a/src/librustc_back/target/arm_unknown_linux_gnueabi.rs#L19

I double-checked with the LLVM documentation on data layouts, and everything seems legit, as far as I can tell. I don't completely understand everything about it, though, so I can't give a 100% guarantee. The LLVM documentation on data layouts is located here: http://llvm.org/docs/LangRef.html#data-layout

In any case, the program compiles and works fine with the new layout, so I'm assuming it's correct.

[1] "Mostly", because the one exception I see is the name mangling option ("m:"), which I set to "e", meaning "ELF". This doesn't seem terribly relevant. The only case I can think of that might make it relevant is if we had C code calling into our Rust code, but then we would mark the called Rust functions as "#[no_mangle]" anyway.

pnkfelix · 2016-04-28T20:31:15Z

After a week of exercising in the nightly release, the compiler team has decided to accept this for beta, largely because it fixes a real regression (#32377), and that outweighed the relative risk of backporting such a largish change to beta.

rust-highfive assigned nrc Apr 13, 2016

eddyb force-pushed the layout branch 2 times, most recently from c6564a1 to b9022d2 Compare April 13, 2016 17:55

eddyb added the S-waiting-on-crater Status: Waiting on a crater run to be completed. label Apr 13, 2016

arielb1 reviewed Apr 13, 2016

eddyb changed the title ~~[WIP] Compute LLVM-agnostic type layouts in rustc.~~ Compute LLVM-agnostic type layouts in rustc. Apr 13, 2016

solson reviewed Apr 13, 2016

eddyb added I-needs-decision Issue: In need of a decision. and removed S-waiting-on-crater Status: Waiting on a crater run to be completed. labels Apr 14, 2016

rust-highfive assigned nikomatsakis and unassigned nrc Apr 15, 2016

eddyb mentioned this pull request Apr 15, 2016

Promote ! to a type. rust-lang/rfcs#1216

Merged

nikomatsakis reviewed Apr 15, 2016

eddyb added 5 commits April 19, 2016 16:08

Make data-layout mandatory in target specs.

0776399

Parse data-layout specifications.

efd0ea5

Compute LLVM-agnostic type layouts in rustc.

fe48a4a

# Conflicts: # src/librustc/ty/layout.rs

Guard against rustc::layout diverging from rustc_trans.

24ca1ec

Check transmutes between types without statically known sizes.

c7d564d

eddyb force-pushed the layout branch from b9022d2 to c7d564d Compare April 19, 2016 15:56

GuillaumeGomez reviewed Apr 20, 2016

bors merged commit c7d564d into rust-lang:master Apr 20, 2016

eddyb deleted the layout branch April 20, 2016 16:58

eddyb added the beta-nominated Nominated for backporting to the compiler in the beta channel. label Apr 20, 2016

bors mentioned this pull request Apr 20, 2016

Expose target options via JSON #32988

Closed

Aatch mentioned this pull request Apr 20, 2016

Various improvements to MIR and LLVM IR Construction #32980

Merged

solson reviewed Apr 22, 2016

solson mentioned this pull request Apr 22, 2016

Allow box allocations rust-lang/miri#1

Closed

phil-opp mentioned this pull request Apr 22, 2016

Custom target: data-layout changed #31367

Closed

NilSet mentioned this pull request Apr 23, 2016

Add data-layout fields to build targets. redox-os/redox#622

Merged

arielb1 mentioned this pull request Apr 27, 2016

tuples with unsized tails cause an error when compiled #33241

Closed

pnkfelix added the beta-accepted Accepted for backporting to the compiler in the beta channel. label Apr 28, 2016

alexcrichton mentioned this pull request May 4, 2016

Merge beta-accepted into beta #33407

Merged

brson removed the beta-nominated Nominated for backporting to the compiler in the beta channel. label May 4, 2016

gllghr mentioned this pull request May 8, 2016

Compiler panics when 'data-layout' field is missing in LLVM target json #33497

Closed

eddyb mentioned this pull request Aug 31, 2016

repr(packed) allows invalid unaligned loads #27060

Closed

eddyb mentioned this pull request Feb 10, 2017

Transmute trait rust-lang/rfcs#1891

Closed

eddyb mentioned this pull request Jun 24, 2017

RFC - Zero-Sized References rust-lang/rfcs#2040

Closed

PlasmaPower mentioned this pull request Jul 7, 2017

Raise alignment limit from 2^15 to 2^31 - 1 #43097

Merged
