Compute LLVM-agnostic type layouts in rustc. #32939
r? @nrc (rust_highfive has picked a reviewer for you, use r? to override) |
}
}

// Odd unit types.
No `Layout::Unit`? Also `Layout::Empty` might be nice.
I don't really see the need for a dedicated unit layout, but `Layout::Empty` would probably be useful in refining the set of variants, to make `sizeof(Result<T, Void>)` just `sizeof(T)`.
However, just like reuse of invalid value ranges (i.e. extending non-zero optimizations), it cannot be done before moving trans to use only `ty::Layout`.
I was hoping we could test the worst-case transmute approach before the beta, but there were complications I had to address.
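To make the size comparison concrete, here is a small sketch using `std::mem::size_of`. The `Void` enum and the `result_void_sizes` helper are hypothetical names for illustration. Only the `Option<&T>` equality is something the language guarantees; whether `Result<T, Void>` shrinks to `T` depends on the compiler, so the sketch prints that comparison rather than asserting it.

```rust
use std::mem::size_of;

/// Hypothetical uninhabited type, standing in for `Void` in the comment above.
#[allow(dead_code)]
enum Void {}

/// Returns (size of `Result<T, Void>`, size of `T`) so the two can be compared.
fn result_void_sizes<T>() -> (usize, usize) {
    (size_of::<Result<T, Void>>(), size_of::<T>())
}

fn main() {
    // Guaranteed non-zero (null pointer) optimization: no separate tag is stored.
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());

    // Not guaranteed: depends on whether the compiler drops the uninhabited variant.
    let (wrapped, plain) = result_void_sizes::<u32>();
    println!("Result<u32, Void>: {} bytes, u32: {} bytes", wrapped, plain);
}
```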
}

impl Default for TargetDataLayout {
fn default() -> TargetDataLayout {
@eddyb How were these defaults chosen? When do they get used?
They're LLVM's defaults: http://llvm.org/docs/LangRef.html#data-layout.
Looks like the defaults match LLVM's defaults for the data layout string. The data layout string isn't required to specify every value, so the defaults are used to fill in the gaps.
That makes sense, thanks.
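The gap-filling behaviour discussed above could be sketched roughly as follows. This is not the code from the PR: `MiniDataLayout` and `parse_data_layout` are hypothetical names, only the `e`/`E` and `p:<size>:<abi>` specifications are handled, and the default values are placeholders (the authoritative defaults are the ones listed in the LLVM LangRef).

```rust
/// Hypothetical, heavily reduced stand-in for a target data layout.
#[derive(Debug, Clone, PartialEq)]
struct MiniDataLayout {
    big_endian: bool,
    pointer_bits: u64,
    pointer_abi_align_bits: u64,
}

impl Default for MiniDataLayout {
    fn default() -> Self {
        // Placeholder defaults for illustration only; see LangRef for the real ones.
        MiniDataLayout { big_endian: false, pointer_bits: 64, pointer_abi_align_bits: 64 }
    }
}

/// Start from the defaults and override only what the string mentions.
fn parse_data_layout(s: &str) -> Result<MiniDataLayout, String> {
    let mut dl = MiniDataLayout::default();
    for spec in s.split('-') {
        let parts: Vec<&str> = spec.split(':').collect();
        match parts.as_slice() {
            ["e"] => dl.big_endian = false,
            ["E"] => dl.big_endian = true,
            ["p", size, abi, ..] => {
                dl.pointer_bits = size
                    .parse()
                    .map_err(|err| format!("bad pointer size in `{}`: {}", spec, err))?;
                dl.pointer_abi_align_bits = abi
                    .parse()
                    .map_err(|err| format!("bad pointer alignment in `{}`: {}", spec, err))?;
            }
            // Integer, float, vector, mangling, etc. are ignored in this sketch.
            _ => {}
        }
    }
    Ok(dl)
}

fn main() {
    // Only endianness and pointer layout are given; nothing else is touched.
    println!("{:?}", parse_data_layout("e-p:32:32-i64:64"));
    // Nothing about pointers here, so the pointer fields keep their defaults.
    println!("{:?}", parse_data_layout("E"));
}
```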
My main worry on these data layouts was just that no one actually understands them and they're just opaque blobs we copy around without any thought. Once that caused a bug, and since LLVM had reasonable defaults for everything, it seemed like "why not just use that?" If we validate that our data layout is the same as LLVM's, however, then this seems fine to me. I'd be curious whether LLVM actually changes anything here between releases (wouldn't that be a huge breaking change for C as well?), but doing this ourselves also seems fine to me.
@alexcrichton They're documented, though: http://llvm.org/docs/LangRef.html#data-layout.
I've taken a look at the preliminary list of regressions, and even though almost half of them were build failures (timeouts?), the rest displayed a clear pattern:
That is, in the latter case, you might have Assuming we want to allow the pattern of casting between maybe-newtyped pointers to maybe-unsized maybe-wrapped type parameters, how should we go about doing it? The cleanest solution I have in mind, although not the most principled one, is to have two checks:
While it may be possible to integrate unknown sizes into the regular EDIT: I should mention that if breaking all of those crates is an option, most of them can use pointer casts and dereferences instead - even when a newtype or |
I think regions should be ignored/substituted here. Makes more general sense, since the regions shouldn't affect the layout of the type.
I think we do.
I don't have an informed enough opinion yet. I'll try to take a look at your code but it may not happen until Monday, since I'm kind of busy today with other (non-Rust-related) things. But it seems to me like we could probably use a similar trick to what we do today, where for things where |
Oh, and clearly regions shouldn't affect the result, but I would expect this equivalency to fall out in a more general way than strict type equality?
@nikomatsakis Just trying out combinations is pretty fragile, if we ever want to add more kinds of fat pointer metadata (or if we make it fully custom). The "perfect" solution IMO is to enforce static sizes because that would allow a That's because we could end up simplifying, e.g. |
I see. Good point. |
 pub struct Size {
-    raw: u64
+    pub bytes: u64
I don't really care, but I imagine the use of privacy here was to help ensure people don't do stupid things like `let mut size = ...; size.bytes *= 2;`, but rather encourage them to go through these (presumably more careful) APIs.
I actually initially wanted to use the highest bit to indicate that the rest of the size is the base size of an unsized type, instead of an exact size, but I ended up dealing with unsized types differently.
I could go back to private `raw` and add more checks for 2^61 overflow if you want to.
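For illustration, the private-field variant being discussed might look something like the sketch below. It is not the PR's actual `Size` type: the method names and the exact bound are assumptions, chosen so that a byte count below 2^61 can always be converted to bits without overflowing a `u64`.

```rust
/// Hypothetical size wrapper with a private field, so callers cannot write
/// `size.bytes *= 2` and bypass the overflow checks.
#[derive(Copy, Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Size {
    raw: u64,
}

impl Size {
    /// Assumed bound: sizes stay below 2^61 bytes so the bit count fits in a u64.
    const LIMIT: u64 = 1 << 61;

    /// Returns `None` if `bytes` is at or above the limit.
    pub fn from_bytes(bytes: u64) -> Option<Size> {
        if bytes < Self::LIMIT {
            Some(Size { raw: bytes })
        } else {
            None
        }
    }

    pub fn bytes(self) -> u64 {
        self.raw
    }

    /// Cannot overflow, because `raw < 2^61`.
    pub fn bits(self) -> u64 {
        self.raw * 8
    }

    pub fn checked_add(self, other: Size) -> Option<Size> {
        self.raw.checked_add(other.raw).and_then(Size::from_bytes)
    }

    pub fn checked_mul(self, count: u64) -> Option<Size> {
        self.raw.checked_mul(count).and_then(Size::from_bytes)
    }
}

fn main() {
    let elem = Size::from_bytes(16).expect("small size is in range");
    // An array of 4 elements; the multiplication is checked against the limit.
    println!("{:?}", elem.checked_mul(4).map(Size::bytes));
    // Too large: the result would not be representable in bits.
    println!("{:?}", elem.checked_mul(1 << 60));
}
```

The trade-off is the one raised in the review: a public `bytes` field is simpler to use, while the private field forces every arithmetic operation through a checked API.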
# Conflicts: # src/librustc/ty/layout.rs
📌 Commit c7d564d has been approved by |
⌛ Testing commit c7d564d with merge 542c7d1... |
💔 Test failed - auto-win-msvc-32-opt |
@bors retry |
Compute LLVM-agnostic type layouts in rustc.

Layout for monomorphic types, and some polymorphic ones (e.g. `&T` where `T: Sized`), can now be computed by rustc without involving LLVM in the actual process. This gives rustc the ability to evaluate `size_of` or `align_of`, as well as obtain field offsets.

MIR-based CTFE will eventually make use of these layouts, as will MIR trans, shortly.

Layout computation also comes with a `[breaking-change]`, or two:
* `"data-layout"` is now mandatory in custom target specifications, reverting the decision from #27076. This string is needed because it describes endianness, pointer size and alignments for various types. We have the first two and we could allow tweaking alignments in target specifications. Or we could also extract the data layout from LLVM and feed it back into rustc. However, that can vary with the LLVM version, which is fragile and undermines stability. For built-in targets, I've added a check that the hardcoded data-layout matches LLVM defaults.
* `transmute` calls are checked in a stricter fashion, which fixes #32377

To expand on `transmute`, there are only 2 allowed patterns: between types with statically known sizes and between pointers with the same potentially-unsized "tail" (which determines the type of unsized metadata they use, if any).

If you're affected, my suggestions are:
* try to use casts (and raw pointer deref) instead of transmutes
* *really* try to avoid `transmute` where possible
* if you have a structure, try working on individual fields and unpack/repack the structure instead of transmuting it whole, e.g. `transmute::<RefCell<Box<T>>, RefCell<*mut T>>(x)` doesn't work, but `RefCell::new(Box::into_raw(x.into_inner()))` does (and `Box::into_raw` is just a `transmute`)
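The suggestions above can be sketched as follows. The `RefCell` expression is the one quoted in the message; the helper names `into_raw_cell` and `as_bytes` are made up for this example, and the byte-view function is just one assumed instance of "casts and raw pointer deref" replacing a `transmute`.

```rust
use std::cell::RefCell;

/// Unpack the `RefCell`, convert the `Box` to a raw pointer, and repack,
/// instead of transmuting the whole structure.
fn into_raw_cell<T>(x: RefCell<Box<T>>) -> RefCell<*mut T> {
    RefCell::new(Box::into_raw(x.into_inner()))
}

/// Pointer cast plus dereference in place of `transmute::<&u32, &[u8; 4]>`.
fn as_bytes(v: &u32) -> &[u8; 4] {
    // Sound here because `[u8; 4]` has the same size as `u32` and alignment 1.
    unsafe { &*(v as *const u32 as *const [u8; 4]) }
}

fn main() {
    let cell = into_raw_cell(RefCell::new(Box::new(7u32)));
    let raw = cell.into_inner();
    // Reclaim ownership so the allocation is freed again.
    let boxed = unsafe { Box::from_raw(raw) };
    println!("{}", *boxed);
    println!("{:?}", as_bytes(&0u32));
}
```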
@@ -1410,6 +1410,32 @@ It is not possible to use stability attributes outside of the standard library.
Also, for now, it is not possible to write deprecation messages either.
"##,

E0512: r##"
Oh nice! Thanks for adding it! \o/
I only moved it.
Nominated for backporting to beta to fix #32377.
}

/// Helper function for normalizing associated types in an inference context.
fn normalize_associated_type<'a, 'tcx>(infcx: &InferCtxt<'a, 'tcx>,
@eddyb In Miri I use `rustc::infer::normalize_associated_type`. Its source code is slightly different and has a warning about only being callable from trans, but most of it is duplicated. Could they be unified somehow?
See how I call `ty.layout(&infcx)` from trans. The effect is the same in that case, it's only different elsewhere (intrinsicck).
The data layout had become optional at some point. Some time after that, it started causing a compiler error, so I removed it. From the Rust side, those changes are documented in the following issue: rust-lang/rust#31367

The pull request that made the data layout non-optional again is this one: rust-lang/rust#32939

I took the layout I added here from the Rust compiler code. The various built-in ARM targets seem to have mostly[1] the same target layout, which makes sense, as the target layout describes mostly hardware characteristics that shouldn't change between operating systems. The layout I copied is from the `arm-unknown-linux-gnueabi` target, here: https://github.com/rust-lang/rust/blob/253b7c1e1a919a6b722c29a04241d6f08ff8c79a/src/librustc_back/target/arm_unknown_linux_gnueabi.rs#L19

I double-checked with the LLVM documentation on data layouts, and everything seems legit, as far as I can tell. I don't completely understand everything about it, though, so I can't give a 100% guarantee. The LLVM documentation on data layouts is located here: http://llvm.org/docs/LangRef.html#data-layout

In any case, the program compiles and works fine with the new layout, so I'm assuming it's correct.

[1] "Mostly", because the one exception I see is the name mangling option ("m:"), which I set to "e", meaning "ELF". This doesn't seem terribly relevant. The only case I can think of that might make it relevant is if we had C code calling into our Rust code, but then we would mark the called Rust functions as "#[no_mangle]" anyway.
After a week of exercising in the nightly release, the compiler team has decided to accept this for beta, largely because it fixes a real regression (#32377), and that outweighed the relative risk of backporting such a largish change to beta.
Layout for monomorphic types, and some polymorphic ones (e.g. `&T` where `T: Sized`), can now be computed by rustc without involving LLVM in the actual process.
This gives rustc the ability to evaluate `size_of` or `align_of`, as well as obtain field offsets.

MIR-based CTFE will eventually make use of these layouts, as will MIR trans, shortly.

Layout computation also comes with a `[breaking-change]`, or two:
* `"data-layout"` is now mandatory in custom target specifications, reverting the decision from #27076 (Upgrade to LLVM's 3.7 release branch).
  This string is needed because it describes endianness, pointer size and alignments for various types.
  We have the first two and we could allow tweaking alignments in target specifications.
  Or we could also extract the data layout from LLVM and feed it back into rustc.
  However, that can vary with the LLVM version, which is fragile and undermines stability.
  For built-in targets, I've added a check that the hardcoded data-layout matches LLVM defaults.
* `transmute` calls are checked in a stricter fashion, which fixes #32377 (Unexpected tail in unsized_info_ty: usize for ty=process::Core<isize, isize, isize>)

To expand on `transmute`, there are only 2 allowed patterns: between types with statically known sizes and between pointers with the same potentially-unsized "tail" (which determines the type of unsized metadata they use, if any).

If you're affected, my suggestions are:
* try to use casts (and raw pointer deref) instead of transmutes
* *really* try to avoid `transmute` where possible
* if you have a structure, try working on individual fields and unpack/repack the structure instead of transmuting it whole, e.g. `transmute::<RefCell<Box<T>>, RefCell<*mut T>>(x)` doesn't work, but `RefCell::new(Box::into_raw(x.into_inner()))` does (and `Box::into_raw` is just a `transmute`)
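As a rough illustration of what computing sizes, alignments and field offsets without LLVM involves, here is a sketch of a plain C-like struct layout. It is an assumption-laden toy, not the algorithm from this PR: `align_to` and `struct_layout` are hypothetical helpers, fields are kept in declaration order, and alignments are assumed to be powers of two.

```rust
/// Round `offset` up to the next multiple of `align` (a power of two).
fn align_to(offset: u64, align: u64) -> u64 {
    (offset + align - 1) & !(align - 1)
}

/// Given `(size, align)` per field in declaration order, return the field
/// offsets together with the total size and alignment of the struct.
fn struct_layout(fields: &[(u64, u64)]) -> (Vec<u64>, u64, u64) {
    let mut offsets = Vec::with_capacity(fields.len());
    let mut offset = 0;
    let mut max_align = 1;
    for &(size, align) in fields {
        // Insert padding so the field starts at a properly aligned offset.
        offset = align_to(offset, align);
        offsets.push(offset);
        offset += size;
        max_align = max_align.max(align);
    }
    // Trailing padding makes the size a multiple of the alignment.
    (offsets, align_to(offset, max_align), max_align)
}

fn main() {
    // Fields shaped like (u8, u32, u16), given as (size, align) pairs.
    let (offsets, size, align) = struct_layout(&[(1, 1), (4, 4), (2, 2)]);
    println!("offsets = {:?}, size = {}, align = {}", offsets, size, align);
}
```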