Skip to content

Guarantee that heap::EMPTY.offset(0) has defined behaviour #25718

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Gankra opened this issue May 22, 2015 · 13 comments
Closed

Guarantee that heap::EMPTY.offset(0) has defined behaviour #25718

Gankra opened this issue May 22, 2015 · 13 comments

Comments

@Gankra
Copy link
Contributor

Gankra commented May 22, 2015

Currently we have an awkward situation where heap::EMPTY (1 as *mut _) is frequently used to represent zero-sized types or other "non-allocations". It is often desirable to offset-by-0 from this in general code to avoid complicating things with special-cases for ZSTs or empty array iterators.

Unfortunately it is possible to interpret the LLVM GEP docs to determine that heap::EMPTY.offset(0) (which is marked as inbounds unconditionally) has undefined behaviour, because heap::EMPTY is not actually allocated. I see two ways forward for this: mark heap::EMPTY as allocated in some way for LLVM, or ensure offset(0) is always safe. I favour the latter, simply because it's plausible to want to offset by 0 off some other ptrs. However the former is acceptable if this is not possible.

@Gankra
Copy link
Contributor Author

Gankra commented May 22, 2015

CC @Aatch @dotdash

@huonw
Copy link
Member

huonw commented May 22, 2015

The LLVM docs can't be talking about literal malloc allocations, since e.g. one can perfectly well index into static data.

@Gankra
Copy link
Contributor Author

Gankra commented May 22, 2015

"allocated" is just the actual terminology they use in the documentation. I have no idea what it properly entails (LLVM vagueness prevails).

If the inbounds keyword is present, the result value of the getelementptr is a poison value if the base pointer is not an in bounds address of an allocated object, or if any of the addresses that would be formed by successive addition of the offsets implied by the indices to the base address with infinitely precise signed arithmetic are not an in bounds address of that allocated object. The in bounds addresses for an allocated object are all the addresses that point into the object, plus the address one byte past the end. In cases where the base is a vector of pointers the inbounds keyword applies to each of the computations element-wise.

(emphasis mine)

via http://llvm.org/docs/LangRef.html#getelementptr-instruction

@Gankra
Copy link
Contributor Author

Gankra commented May 22, 2015

I can't imagine heap::EMPTY is allocated by any reasonable definition, though.

@huonw
Copy link
Member

huonw commented May 23, 2015

Yes, I know the terminology used by the docs, in just pointing out that one has to be careful to avoid conflating it with malloc (doing so is something I've seen happen a lot in this sort of discussion).

In any case, due to the zero size, I think there are definitions of allocated which work fine with EMPTY.

@huonw
Copy link
Member

huonw commented May 23, 2015

(One way to guarantee this would be to lower offsets on zero sized types to a noop.)

@Gankra
Copy link
Contributor Author

Gankra commented May 23, 2015

Special-casing ZSTs covers half the usecase, but it doesn't cover creating the iterator for a empty slice of non-ZSTs.

e.g.

let vec: Vec<u8> = Vec::new();
for i in vec { .. }

where Vec::new sets ptr to heap::EMPTY and Vec's iterator code (for non ZSTs) amounts to:

let mut start = vec.ptr;
let end = start.offset(vec.len); // does this work if vec.ptr == heap::EMPTY and vec.len == 0?
while start != end {
  let elem = ptr::read(start);
  // ... use elem
  start = start.offset(1);
} 

@Gankra
Copy link
Contributor Author

Gankra commented May 23, 2015

CC @pczarn and #24604

@dotdash
Copy link
Contributor

dotdash commented May 25, 2015

Adding a zero check to every offset operation seems bad. Vec should probably add a special case for zero length as done in #24604, and if you want to just mess around with possibly "invalid" pointers, there's the arith_offset intrinsic that uses GEP without the inbounds modifier.

@Gankra
Copy link
Contributor Author

Gankra commented May 25, 2015

I was more hoping at codegen time we could just convince LLVM that everything's alright (or possibly someone with a better understanding could simply determine that this isn't actually a concern -- certainly LLVM doesn't seem to actually produce UB when we do this today). A check on every offset is definitely a non-starter.

@bluss
Copy link
Member

bluss commented May 25, 2015

sdf Edit: that's right.

@Gankra
Copy link
Contributor Author

Gankra commented May 25, 2015

@bluss That discussion doesn't seem relevant to ptr::offset?

@Gankra
Copy link
Contributor Author

Gankra commented Jun 24, 2015

I talked this out with @sunfish on IRC. Evidently allocation is a very abstract concept for LLVM where it's basically "as long as you're consistent it's cool". This means that we can pretend heap::EMPTY is an allocated ZST without telling LLVM anything, and so offset-by-0 is legal since that's 1-past-the-end.

However offset-by-0 off heap::EMPTY as *mut T is not sound in general as if e.g. T is a HugeStruct it could actually alias with some real allocated memory somewhere.

Therefore we don't need to worry about ZSTs (yay!) but do need to care about empty arrays. As such I'm closing this as basically resolved.

@Gankra Gankra closed this as completed Jun 24, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants