translate-c: handle negative array indices #8589
Let me CC @SpexGuy here: what does the specification say about pointer arithmetic? From an LLVM point of view, we're codegen'ing pointer arithmetic using
We'd be on the safer side by codegen'ing the non-wrapping arithmetic using non-inbounds GEPs and the wrapping one using |
Good question. We haven't really thought about this yet. I expect that in the end either |
Oh, my bad, yeah, looks like
// 1. check the sign, good for alias analysis, bad for everything else
res = if (offset >= 0) (ptr + @intCast(usize, offset)) else (ptr - @intCast(usize, -offset));
// 2. use integers, bad for alias analysis, good for everything else
res = @intToPtr(*T, @ptrToInt(ptr) +% @bitCast(usize, offset) * @sizeOf(T));
I'll bring this up at a design meeting. It's the same problem that prevents us from adding a signed value to an unsigned value without either giving up unsigned range or giving up overflow checks. |
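For readers more at home in C, the two options above can be sketched roughly as follows. This is only an illustration under assumptions: the function names are hypothetical, the element type is fixed to `int`, and option 2 relies on the implementation-defined (but common) behaviour that a pointer round-trips through `uintptr_t` as a flat address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Option 1 (hypothetical name): branch on the sign so that only an
 * unsigned magnitude is ever added to or subtracted from the pointer. */
static int *offset_by_sign_check(int *ptr, ptrdiff_t offset) {
    assert(ptr != NULL);
    if (offset >= 0) {
        return ptr + (size_t)offset;
    }
    /* 0 - (size_t)offset is the magnitude, computed in unsigned
     * arithmetic so that even PTRDIFF_MIN does not overflow. */
    return ptr - ((size_t)0 - (size_t)offset);
}

/* Option 2 (hypothetical name): do the arithmetic on integers.
 * Converting a negative offset to uintptr_t wraps modulo 2^N, so the
 * addition below lands on the same address as a signed addition would. */
static int *offset_by_integer_math(int *ptr, ptrdiff_t offset) {
    assert(ptr != NULL);
    uintptr_t addr = (uintptr_t)ptr + (uintptr_t)offset * sizeof(int);
    return (int *)addr;
}
```

Given `int a[5]` and `int *mid = &a[2]`, both `offset_by_sign_check(mid, -2)` and `offset_by_integer_math(mid, -2)` should yield `&a[0]` on mainstream platforms.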
Cool, I can do option 1 for now. Should I also update the pointer arithmetic code from my previously-linked PR to use the same approach? |
Yeah, that's probably safer until we figure out what |
We talked about this at the design meeting, and decided
So the long term path here is to use |
I take it this is also the case for a RHS of type |
There's another problem of little practical interest that I've stumbled into while implementing the safety checks for pointer arithmetic. |
Hmmm. If we used Maybe it makes sense to not do inbounds for pointer arithmetic. We would still be using inbounds for everything else. I need to think about this for a while. Feel free to propose ideas. |
Would it make sense to lower pointer addition with a usize as bitCast to isize + add? |
There's a bit more to it than this: the use of inbounds GEPs limits the addressable memory to the lower half, and so array indices, |
Without modular (aka wrapping) pointer arithmetic, how are you going to translate
extern char mem[4]; int main(){
}
(https://godbolt.org/z/f1TY9d88x)

Of course this is a contrived example, but someday people will write something like
(char*)p + (offsetof(struct foo, baz) - offsetof(struct foo, bar))
which (since all arithmetic with unsigned integers in C is modular, aka wrapping, and the result of offsetof is an unsigned size_t) means, oops, the result of the subtraction is a very big unsigned number. Maybe we should pedantically have written
((char*)p + offsetof(struct foo, baz)) - offsetof(struct foo, bar)
or
(char*)p + (ptrdiff_t)(offsetof(struct foo, baz) - offsetof(struct foo, bar))
but the hardware only knows about modular (aka wrapping) arithmetic, so the difference is harmless.

To define +% for pointers and unsigned integers: if ptr: [*]T, then set
ptr +% uoffset == @intToPtr([*]T, @ptrToInt(ptr) +% @as(usize, uoffset) *% @sizeOf(T))
while in the signed case, with ptr: [*]T,
ptr +% soffset == ptr +% @bitCast(usize, @as(isize, soffset))
and a modular pointer difference, for ptr1, ptr2: [*]T,
ptr1 -% ptr2 == @divExact(@bitCast(isize, @ptrToInt(ptr1) -% @ptrToInt(ptr2)), @sizeOf(T)) // preserve sign

Defined in this way, the expected associativity property holds for n, m: isize or n, m: usize:
ptr +% (n +% m) == (ptr +% n) +% m
Provided no overflow occurs, we have for signed integers
@as(isize, n + m) == @as(isize, n) + @as(isize, m)
(and similarly for smaller unsigned integers with usize), so for all integer types, provided no overflow occurs,
ptr +% (n + m) == (ptr +% n) +% m

(Note that in terms of modular (aka wrapping) arithmetic, i.e. if we reduce the offset mod 2^N, where N is the bit size of usize, and if there existed a modular (aka wrapping) integer type msize (#7512), then one could simply define, for ptr: [*]T,
ptr +% offset == @intToPtr([*]T, @as(msize, @ptrToInt(ptr)) +% offset *% @sizeOf(T))
and all the integer casting in the definitions above just amounts to @as(msize, ), i.e. reduction mod 2^N.)
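The offsetof argument can be checked with a small C sketch. Everything named here is hypothetical (the layout of `struct foo`, the helper names), and it assumes a mainstream platform where `size_t` and `uintptr_t` have the same width and pointers convert to flat integer addresses. The wrapping is done on `uintptr_t` rather than on the pointer itself, because adding a huge unsigned value directly to a `char *` is undefined behaviour in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: baz comes before bar, so
 * offsetof(baz) - offsetof(bar) wraps to a very large size_t. */
struct foo {
    char baz;
    int bar;
};

/* Step from the bar field back to the baz field of the same struct,
 * using only modular (wrapping) unsigned arithmetic. */
static char *bar_to_baz(int *bar_field) {
    assert(bar_field != NULL);
    size_t delta = offsetof(struct foo, baz) - offsetof(struct foo, bar);
    return (char *)((uintptr_t)bar_field + (uintptr_t)delta);
}

/* Wrapping "address + count * elem_size", all modulo 2^N. */
static uintptr_t wrapping_add(uintptr_t addr, uintptr_t count, size_t elem_size) {
    return addr + count * (uintptr_t)elem_size;
}
```

With `struct foo f`, `bar_to_baz(&f.bar)` should compare equal to `&f.baz` even though `delta` is larger than `SIZE_MAX / 2`. The associativity claim corresponds to `wrapping_add(wrapping_add(a, n, s), m, s) == wrapping_add(a, n + m, s)`, which holds for any `n` and `m` because every step is reduced mod 2^N.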
Arguably, if you want to stress that pointer arithmetic is an intrinsically dangerous operation that you should try to avoid if you can, but that makes sense on the low, close to the hardware level, only having modular aka wrapping pointer arithmetic +% (and -%, and pointer difference -% with values in msize, or in its absence isize) may be the sane way to go. |
b0c35f5 to 780ed01
The generated code is pretty wild but it works now (on my machine, we'll see what CI says). Hopefully with less undefined behavior. |
157e5d3 to 33535d5
A rather complicated workaround for handling signed array subscripts. Once `[*]T + isize` is allowed, this can be removed. Fixes ziglang#8556
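As a reminder of the C behaviour being translated, here is a minimal sketch (the helper names are hypothetical). In C, `p[i]` is defined as `*(p + i)`, so a signed, possibly negative, index is valid as long as the resulting element lies inside the same array.

```c
#include <assert.h>
#include <stddef.h>

/* Reads the element just before p; valid only if p points at least
 * one element past the start of its array. */
static int read_before(const int *p) {
    assert(p != NULL);
    return p[-1];
}

/* Subscripting spelled out as pointer arithmetic with a signed index. */
static int read_via_sum(const int *p, int i) {
    assert(p != NULL);
    return *(p + i);
}
```

For `int a[3] = {1, 2, 3}`, `read_before(&a[1])` and `read_via_sum(&a[2], -2)` both read `a[0]`.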
33535d5 to c6f0b24
#8589 introduced correct handling of signed (possibly negative) array access of pointers. Since unadorned integer literals in C are signed, this resulted in inefficient generated code when indexing a pointer by a non-negative integer literal.
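The point about literals can be seen with a small C11 sketch. The `TYPE_NAME` macro and the two helper functions are hypothetical names introduced only for this illustration: a plain decimal literal such as `1` has type `int`, which is signed, and only a suffix such as `u` makes it unsigned.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: map an expression's type to a printable name. */
#define TYPE_NAME(x) _Generic((x), \
    int: "int",                    \
    unsigned int: "unsigned int",  \
    long: "long",                  \
    unsigned long: "unsigned long", \
    default: "other")

/* The index in p[1] is an unadorned literal, hence a signed int. */
static const char *type_of_plain_literal(void) {
    return TYPE_NAME(1);
}

/* The same value with a 'u' suffix is unsigned. */
static const char *type_of_suffixed_literal(void) {
    return TYPE_NAME(1u);
}
```

So a subscript like `p[1]` reaches the translator with a signed index even though its value is known to be non-negative.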
Take advantage of wrapping pointer arithmetic to enable negative array
subscripts.
Fixes #8556