Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change pointers to zero-bit types to be actual pointers with addresses #6706

Closed
tadeokondrak opened this issue Oct 17, 2020 · 10 comments
Closed
Labels
accepted This proposal is planned. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@tadeokondrak
Copy link
Contributor

I don't think this optimization pulls its weight.

#6239 #1553 #4537 #6700

With #6432, you can't cast *[1]u8 to *[0]u8.
I think that's a reasonable cast, and standard library relies on this working.

There are special cases that happen for all zero-bit types and should still be handled by generic code, but when some generic code is using a type solely as a pointer, it still needs to deal with them.

I think the main user of the void type in generic code is HashMap, and it doesn't seem to use *V anywhere.

@tadeokondrak tadeokondrak changed the title Reconsider whether pointers to zero-bit types should be zero-bits Reconsider whether pointers to zero-bit types should be zero bits Oct 17, 2020
@tadeokondrak
Copy link
Contributor Author

tadeokondrak commented Oct 17, 2020

Regarding the actual value of the pointer, it should be legal to have any number but 0, since the compiler won't let you actually dereference it. To get a pointer to one, you could use @intToPtr(*T, 1).
As far as I know this is what Rust does.

@Snektron
Copy link
Collaborator

I would also like to note that this optimization makes Zig more complex, and makes it harder to communicate intent.

@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Oct 17, 2020
@andrewrk andrewrk added this to the 0.8.0 milestone Oct 17, 2020
@foobles
Copy link
Contributor

foobles commented Nov 6, 2020

I totally agree with this. It's such an edge-case optimization that reduces the general consistency of the language. Also, it forces the compiler to do a lot of special cases for whether or not a pointer is zero sized.

It would make sense for any type T with size 0, that any non-null pointer with proper alignment is a valid pointer to that type (this is what Rust does).

@katesuyu
Copy link
Contributor

katesuyu commented Nov 6, 2020

Another case where this optimization is fairly annoying is generic code that performs type erasure and requires an opaque "context pointer." In such a case, the 8 bytes are already "wasted", so the optimization does not help. In practice, it simply leads to special-cased boilerplate involving @bitSizeOf(T) == 0.

I argue that, in practice, this optimization is so rare that it causes more special casing to sidestep it than we'd have to apply to generic code upon removal of the optimization. By the way, what examples do we even have of this optimization generating better code in the standard library?

@andrewrk andrewrk added the accepted This proposal is planned. label Nov 19, 2020
@andrewrk andrewrk changed the title Reconsider whether pointers to zero-bit types should be zero bits change pointers to zero-bit types to be actual pointers with addresses Nov 19, 2020
@N00byEdge
Copy link
Contributor

What does this mean for addresses of zero sized struct members? Are a and b be defined to have the same address here, or will that be undefined?

pub S = struct {
  a: u0,
  b: u8,
};

@SpexGuy
Copy link
Contributor

SpexGuy commented Nov 21, 2020

Some notes on this decision:

The original motivation behind zero-sized pointers was not so much an optimization, but the result of an observation that a pointer is how you find a thing, and since zero-sized values can be anywhere, you don't need any information to find one. The insight that we found most persuasive in reversing this is that pointers are also how you find a part of something larger, and you do need information to find a zero-sized piece of a larger structure or array.

Since this is a big change for the language, @andrewrk has suggested that work on this issue be put in a branch to make sure there aren't any major unforseen issues. For now I'm going to close or accept any design issues related to this, but leave any stage 1 bugs involving zero-sized types open until the temporary branch is merged into the main line. We can probably close all of these when that happens:
#6983, #6951, #6947, #6937, #6936, #6861, #4282, #4246, and #3610.

Multiple zero-sized values may share the same address. In particular, this means that the following loop is not safe with zero sized types:

var curr = slice.ptr;
const end = slice.ptr + slice.len;
while (curr != end) : (curr += 1) { ... }

This will execute zero times with a zero sized type, because all elements in the slice share the same address.

No guarantees are made about the address of globals or stack variables that are zero-sized, except that their values are nonzero and have the correct alignment. Zero-sized variables that are fields or array items are pointers to their parent with the appropriate offset. So for @N00byEdge 's example:

pub S = struct {
  a: u0,
  b: u8,
};

Zig makes no guarantees about the order of fields in a bare struct, but the address of a is guaranteed to be within the range of the instance of S. It may be at the beginning or end of S, or between fields if there were more nonzero-sized fields. But it can't be inside of a larger field. If S were a packed struct, where ordering is guaranteed, a and b would both have the same address as the parent instance.

@zigazeljko
Copy link
Contributor

The important observation here is that:

(a) a pointer is how you find a thing, and since zero-sized values can be anywhere, you don't need any information to find one.

and

(b) pointers are also how you find a part of something larger, and you do need information to find a zero-sized piece of a larger structure or array.

This proposal considers all pointers to fall into category (b), but what if we made a distinction in the type system between these two categories?

Assuming attribute noexpand stands for (a) and expand for (b), we would have:

  • @sizeOf(*noexpand u0) == 0
  • @sizeOf(*expand u0) != 0

While it is most evident with zero-sized types, this distinction is also important for non-zero-sized types. Consider this example:

fn foo(str: *const [4]u8) void { ... }

fn bar() void {
    foo("somelongstring"[0..4]);
}

Is the compiler here allowed to perform the following optimization?

fn bar() void {
    foo("some");
}

Currently, no, since the compiler can't know if str falls into category (a) or (b).

With this attributes, we could declare foo as fn foo(str: *noexpand const [4]u8) void, which would allow the optimization above to happen.

@ghost
Copy link

ghost commented Dec 11, 2020

Sounds like this issue would close #2325 as well.

@tadeokondrak
Copy link
Contributor Author

@zigazeljko,

I think what you're proposing would work, but it'd still keep a lot of the complexity of zero-sized pointers in the language.
I haven't seen any real code where zero-sized pointers would provide a benefit, so I think the optimization isn't worth it even if it's turned off by default.

@andrewrk andrewrk modified the milestones: 0.8.0, 0.9.0 May 19, 2021
andrewrk added a commit that referenced this issue Oct 16, 2021
 * LLVM backend: The `alloc` AIR instruction as well as pointer
   constants which point to a 0-bit element type now call a common
   codepath to produce a `*const llvm.Value` which is a non-zero pointer
   with a bogus-but-properly-aligned address.
 * LLVM backend: improve the lowering of optional types.
 * Type: `hasCodeGenBits()` now returns `true` for pointers even when
   it returns `false` for their element types.

Effectively, #6706 is now implemented in stage2 but not stage1.
@andrewrk andrewrk modified the milestones: 0.9.0, 0.10.0 Nov 20, 2021
@andrewrk
Copy link
Member

This is implemented in self-hosted which is now the default compiler.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted This proposal is planned. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants