Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow enum discriminants to not be uint, and use smallest possible size by default. #9613

Merged
merged 17 commits into from
Oct 30, 2013

Conversation

jld
Copy link
Contributor

@jld jld commented Sep 30, 2013

Allows an enum with a discriminant to use any of the primitive integer types to store it. By default the smallest usable type is chosen, but this can be overridden with an attribute: #[repr(int)] etc., or #[repr(C)] to match the target's C ABI for the equivalent C enum.

Also adds a lint pass for using non-FFI safe enums in extern declarations, checks that specified discriminants can be stored in the specified type if any, and fixes assorted code that was assuming int.

ty::ty_enum(def_id, ref substs) => {
let cases = get_cases(tcx, def_id, substs);
// Univariant => like struct/tuple.
if cases.len() <= 2 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really univariant? (Should it be 1 rather than 2?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should be 1 (or < 2). Nice catch; thanks. And this should have tests, because that would probably have caught this.

@huonw
Copy link
Member

huonw commented Sep 30, 2013

Are there any tests for sizing of non C-like enum, e.g. enum Foo { A, B } being a single byte (assuming that this is implemented)?

@jld
Copy link
Contributor Author

jld commented Oct 1, 2013

Nothing for enum Foo { A, B } yet, but it'd be easy to add. That's equivalent to enum Foo { A = 0, B = 1 }, so it would definitely be one byte.

pub fn is_ffi_safe(tcx: ty::ctxt, def_id: ast::DefId) -> bool {
match ty::get(ty::lookup_item_type(tcx, def_id).ty).sty {
ty::ty_enum(def_id, ref substs) => {
let cases = get_cases(tcx, def_id, substs);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problem: these seem to be the wrong substs. Maybe I'm getting the formal paramters of the enum instead of the actual parameters of the use? Do I need to actually resolve the type here? (Edit: or, if not here, then in lint.rs.)

Alternately, I could just allow any non-C-like enums (ones where any variant has a non-zero number of fields), because those are less likely to be used accidentally, and file an issue to tighten up the warning in the future. Thoughts?

@jld
Copy link
Contributor Author

jld commented Oct 1, 2013

cc: @thestinger @bjz @jdm @cmr @michaelwoerister, who commented on the last attempt and might want to comment here; and re-cc: @brson @nikomatsakis.

cx.span_lint(ctypes, ty.span,
"found enum type without foreign-function-safe \
representation annotation in foreign module");
// NOTE this message could be more helpful
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this is definitely worthy of a span_note explaining a little bit about repr, although that may require reorganization of linting somewhat to pass through the note.

@alexcrichton
Copy link
Member

I didn't see much from the tests, but how well does this play along with enums that have tuple or struct variants? If I have something like:

#[repr(C)]
enum A { 
  B(int),
  D { foo: float },
}

Did I just create a "C-compatible" union for an int and a float?

@jld
Copy link
Contributor Author

jld commented Oct 1, 2013

(Deleted previous comment because I misread the question.)
What happens in that case is sort of like this:

enum A_discrim { B, D };
struct A {
  union { enum A_discrim discrim; int unnamed; } B;
  union { enum A_discrim discrim; float foo; } D;
};

So the discriminant is sized like the corresponding C enum, but the “general” Rust enum representation applies otherwise.

As for tests: now that you mention it, I'm thinking that there ought to be some more tests, at the very least for basic cases like Either<u8, i8> just to make sure non-C-like enums are being sized as expected.

@alexcrichton
Copy link
Member

Ok cool, just wanted to make sure that these were considered. I think today that enum { A(i8), B(i8) } is the size of two words because the discriminant is uint and the payload is i32, but in theory the enum can be a 16-bit struct of two i8's. I was hoping this pull request would allow for that, and it sounds like it does!

Some tests would be nice though to ensure that that indeed is the case.

@huonw
Copy link
Member

huonw commented Oct 10, 2013

Is this blocked on something?

@jld
Copy link
Contributor Author

jld commented Oct 11, 2013

I think people are waiting on me to add some more tests (nothing too major, I think/hope) before doing final review. With the Mozilla Summit and everything the last week has been a little busy. I'll see if I can get to this sometime tomorrow....

@jld
Copy link
Contributor Author

jld commented Oct 13, 2013

…and it's bit-rotted. Actually, things have been broken since… maybe since const_eval landed in its current form? But with the test suite as it stands, it only shows up with run-pass/deriving-primitive.rs on a 32-bit platform, where my changes make the enum discriminant a u64 because int::max_value is 0xffffffff7fffffff (at least from check_enum_variants's point of view).

@jld
Copy link
Contributor Author

jld commented Oct 21, 2013

And it's even worse than that. I'm glad @alexcrichton suggested some tests of enums with fields, because we have a problem there: Either<u8, i16> is 4 bytes long and therefore immediate, and its type_of is {i8, i16, [0 x i8]} (taking the highest-alignment case and padding it, like Clang does for unions), which means that the body of the Left case is lost and replaced with stack garbage because the second byte the LLVM type is alignment padding.

In contrast, before the immediate type stuff landed, it didn't matter that fields of one case might overlap alignment padding of another, because we'd always do pointer casts and access the memory at the correct type. And without the enum discriminant size changes, an enum with any non-unit-like fields is necessarily larger than an int and so not immediate.

I think I know how to fix this.

@alexcrichton
Copy link
Member

Sadly I just got bit by this :(, I made a function in uvll.rs which returns uv_handle_type, an enum which we define in rust to mirror its C counterpart. I had a function which returned uv_handle_type as part of FFI, but apparently this codegen for this ended up causing segfaults as soon as you called the function which returned the enum.

This pull request would have totally fixed it though with the lint error on missing #[repr(C)] on an FFI function returning an enum.

@jld
Copy link
Contributor Author

jld commented Oct 27, 2013

This almost works. I'm seeing breakage in the rustpkg tests due to stack exhaustion while generating drop glue, which goes away if I raise RUST_MIN_STACK to 4194304 (rather than the default of 4000000). This suggests that rustpkg is invoking rustc without going through rustc::monitor, which seems like a bug. I'm not sure what the right fix there is. I also don't know why my changes would increase stack size.

But the rest of make check passes for me (Linux, x86_64 and i686).

Also export enum attrs into metadata, and add a convenient interface for
obtaining the repr hint from either a local or remote definition.
Allows an enum with a discriminant to use any of the primitive integer
types to store it.  By default the smallest usable type is chosen, but
this can be overridden with an attribute: `#[repr(int)]` etc., or
`#[repr(C)]` to match the target's C ABI for the equivalent C enum.

This commit breaks a few things, due to transmutes that now no longer
match in size, or u8 enums being passed to C that expects int, or
reflection; later commits on this branch fix them.
Note that raising an error during trans doesn't stop the compile or cause
rustc to exit with a failure status, currently, so this is of more than
cosmetic importance.
The variant used in debug-info/method-on-enum.rs had its layout changed
by the smaller discriminant, so that the `u32` no longer overlaps both
of the `u16`s, and thus the debugger is printing partially uninitialized
data when it prints the wrong variant.

Thus, the test runner is modified to accept wildcards (using a string
that should be unlikely to occur literally), to allow for this.
Not only can discriminants be smaller than int now, but they can be
larger than int on 32-bit targets.  This has obvious implications for the
reflection interface.  Without this change, things fail with LLVM
assertions when we try to "extend" i64 to i32.
Otherwise, run-pass/deriving-primitive.rs breaks on 32-bit platforms,
because `int::min_value` is `0xffffffff7fffffff` when evaluated for the
discriminant declaration.
The previous implementation, when combined with small discriminants and
immediate types, caused problems for types like `Either<u8, i16>` which
are now small enough to be immediate and can have fields intersecting
the highest-alignment variant's alignment padding (which LLVM doesn't
preserve).  So let's not do that.
The actual fix would be to make rustpkg use `rustc::monitor` so it picks
up anything special that rustc needs, but for now let's keep the tests
from breaking.
@jld
Copy link
Contributor Author

jld commented Oct 29, 2013

And now it passes make check, optimized and unoptimized, x86_64 and i686.

@pcwalton
Copy link
Contributor

Do we know what the effect on rustc memory usage is, if any, with this patch?

bors added a commit that referenced this pull request Oct 30, 2013
Allows an enum with a discriminant to use any of the primitive integer types to store it.  By default the smallest usable type is chosen, but this can be overridden with an attribute: `#[repr(int)]` etc., or `#[repr(C)]` to match the target's C ABI for the equivalent C enum.

Also adds a lint pass for using non-FFI safe enums in extern declarations, checks that specified discriminants can be stored in the specified type if any, and fixes assorted code that was assuming int.
@bors bors merged commit 86a710e into rust-lang:master Oct 30, 2013
@sanxiyn
Copy link
Member

sanxiyn commented Oct 30, 2013

@pcwalton Is Rust Slim Yet? shows no effect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants