Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tags should be represented using null ptrs where possible #1271

Closed
nikomatsakis opened this issue Dec 8, 2011 · 17 comments
Closed

Tags should be represented using null ptrs where possible #1271

nikomatsakis opened this issue Dec 8, 2011 · 17 comments
Labels
A-codegen Area: Code generation C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@nikomatsakis
Copy link
Contributor

A tag with one nullary variant and one variant that accepts only a pointer should be implemented using a null pointer at runtime. This would save space and be generally nifty.

@boggle
Copy link
Contributor

boggle commented Dec 13, 2011

Hmm would be nice to have this in other cases, too, by annotating the NULLary case that should be represented by null.

Additonally, now that I think about it, might be desirable to assign explicit values to the different nullary cases for representing C enums.
Reason to do this instead of using constants would be checking of exhaustiveness in alt expressions.

@nikomatsakis
Copy link
Contributor Author

Yes, I like the idea of adding explicit annotations to help guide the representation. This seems related to supporting C-style enums, where the value of each case can be specified explicitly.

@nikomatsakis
Copy link
Contributor Author

We could in principle support "single-word" variant type representation (i.e., use NULL) for tags with a small number of variants, but we would have to use "tagged pointers" (i.e., set the low bits). In this way we could support up to 4 nullary variants and 4 pointer variants. I would certainly be inclined to start with NULL or not NULL (i.e., option) but I think tagged ptrs would be as general as we could get.

@graydon
Copy link
Contributor

graydon commented Dec 16, 2011

A long time ago I set out to make this part of the mandatory representation of tags in rust (including the bit-packing on unused pointer bits). I'd be willing to revive the concept, though I'll admit it feels weird to encode as a first-class language feature, and likely hard to cover all cases.

@nikomatsakis
Copy link
Contributor Author

It doesn't seem like such a weird thing to me, though I wouldn't call it a super high priority feature. Still, I'm pretty keen on the idea of Rust giving you very good control of how data is represented in memory, and I think being able to present a null pointer as an option type can be really important for something like a big array of data where you don't want to spend an extra word per item. Tagged pointers would just be icing on the cake.

@kevina
Copy link
Contributor

kevina commented Jan 17, 2012

I just noticed this bug and was wondering about the issue.

Not directly related, but another thing I would like to see is using special values to represent the case when there is only one unary variant and the rest are nullary to avoid having to have a separate tag field. I am guessing the typestate system can be used to enforce the constrain that the value not be one of the special values. Naturally this is something the user need to request explicitly somehow.

However, I agree the special case of one unary that is a pointer and one nullary is the most important.

@pcwalton
Copy link
Contributor

pcwalton commented Feb 1, 2012

The only issue to me with this is that it makes the rules about memory layout more complex with tags. This matters for C interop. As an alternative, we could build option into the language.

@nikomatsakis
Copy link
Contributor Author

Yes, it does make things more complex. Though, frankly, most C types probably use a null pointer for this case rather than a (uint, ptr*) pair! Basically, enums with data don't have a very natural C representation imo. I don't know that most unions are immediately preceded by a uint saying which variant of the union to use. Building option<> into the language would be an "ok" compromise. Worse to be sure, but is it also better?

I could also imagine some attributes that specify the layout as well.

@graydon
Copy link
Contributor

graydon commented Feb 15, 2012

I would prefer not to build option into the language. System malloc usually aligns to eight bytes; on x64 the high 16 bits of every pointer are also free (all implementations are currently 48 bit addressed). It's not too hard for us to wire these (or a similar assumptions that the compiler can "know") into our handling of boxed values, and it means a bunch of cases drop to half their size.

Issue #1704 requests a packed modifier. I'm ok with packed enum { ... } knowing enough about pointers to pull this trick when it can find the bits to stuff the tag values into. It's not like we have to maintain compatibility with C handling here. C doesn't have disjoint unions, and most C code that emulates this sort of thing does in fact do just this.

@graydon
Copy link
Contributor

graydon commented Feb 15, 2012

Oh, also, removing RFC tag. This doesn't really require "everyone nodding" consensus. It's mostly a codegen thing; users aren't even that likely to notice most of the time.

@ghost ghost assigned nikomatsakis Apr 12, 2012
@catamorphism
Copy link
Contributor

You don't need to build option into the language to do the "nullary tags are null pointers" optimization. Rather, the compiler can do that optimization for any data type that's isomorphic to option. There's a bag of tricks like this -- since I didn't see this paper mentioned in comments so far:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.156.3526

and also a chapter in Appel's Compiling with Continuations.

@nikomatsakis
Copy link
Contributor Author

Agreed, building in option is more complex than is warranted. To address @pcwalton's objection, just have an annotation for specifying the layout of an enum from the various choices we have available. In the absence of such an annotation, we just select as we think best.

@catamorphism
Copy link
Contributor

When I was looking at work related to this for my dissertation proposal, I found http://people.ischool.berkeley.edu/~parikh/projects/WIL/ -- a proposal that talks about just such annotations (in the context of the Whirlwind compiler) -- though the work was never implemented and I couldn't find anything close to it. So the problem of how to specify "hints" to the compiler about how to lay out data structures is novel/interesting in and of itself, IMO.

@catamorphism
Copy link
Contributor

For ease of searching: this issue is about representation optimizations for enum variants. (The issue talks about tags but that's very old terminology.)

@sanxiyn
Copy link
Member

sanxiyn commented Jan 25, 2013

I will tackle this next week.

@jld
Copy link
Contributor

jld commented Apr 7, 2013

I think I have this working, modulo some cleanup and a bug in LLVM.

@graydon
Copy link
Contributor

graydon commented May 1, 2013

Fixed now, closing.

@graydon graydon closed this as completed May 1, 2013
bjorn3 added a commit to bjorn3/rust that referenced this issue Aug 24, 2022
Support compiling codegen units in parallel
coastalwhite pushed a commit to coastalwhite/rust that referenced this issue Aug 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

No branches or pull requests

8 participants