Missed opportunities to eliminate bounds checks #35981

StefanoD · 2016-08-25T09:44:39Z

A user reported a code example that shows that in Go 1.7 there is bounds check elimination and in rust not.

https://users.rust-lang.org/t/bounds-check-elimination-in-go-1-7/7008

jrmuizel · 2016-08-25T16:02:39Z

A reduced test case would help here.

StefanoD · 2016-08-25T19:37:15Z

The thread has a simple test case. Is this not sufficient? Maybe the author of the thread could write something here. I linked this bug report in the thread.

jrmuizel · 2016-08-25T19:43:19Z

It's not clear from the thread exactly which bounds checks are not being eliminated that are eliminated by Go. A reduced test case would ideally point out a single array access that has a bounds check in Rust but not in Go.

brson · 2016-08-26T02:34:21Z

There are many examples in the linked thread, and a quick read doesn't make it obvious where there are bounds checks to eliminate. Clearly there are optimization opportunities here, but this thread needs reduced examples with a clear description of which bounds checks are not being removed.

Deewiant · 2016-09-16T18:19:23Z

Disclaimer: I know very little Rust and even less Go. I'm somewhat familiar with LLVM, though, so I took a look, comparing the original Go results at http://www.tapirgames.com/blog/golang-1.7-bce to the Rust as given in the forum post.

I'm using rustc 1.11.0 and LLVM 3.8.1.

Note that the Rust translations do not seem to be quite equivalent: the Go code seems to perform no-op accesses on array slices whereas the Rust versions increment the array values. This complicates the generated code heavily, so to simplify the analysis I replaced increment ops like s[i] += 1 with no-op accesses like s[i]. This did not seem to have an effect on the amount of bounds checks in the generated code but made my life easier.

I'll go through the functions in order.

f1, f2, f3, f4

rustc -C opt-level=3 eliminates the same bounds checks as Go does, as is to be expected for these simple cases.

f5

Here's the discrepancy. The Go code:

func f5(s []int) {
    for i := range s {
        _ = s[i]
        _ = s[i:len(s)]
        _ = s[:i+1]
    }
}

The Rust translation from the forum post, with increment operations removed and usize changed to isize to be pedantic:

fn f5(s: &mut [isize]) {
    for i in 0 .. s.len() {
        s[i];
        for j in i .. s.len() { s[j]; }
        for j in 0 .. i + 1 { s[j]; }
    }
}

Looking at the output of rustc -C opt-level=3 --emit llvm-ir, the first access and the first inner loop are completely optimized out here, but the second inner loop remains, and there is a bounds check on the s[j] access therein.

Constructing what I believe to be equivalent C we can see whether this is a rustc or LLVM problem:

#include <stdlib.h>
void f5(long* s, size_t len) {
    for (size_t i = 0; i < len; i++) {
        for (size_t j = 0; j <= i + 1; j++) {
            if (j >= len) {
                abort();
            }
        }
    }
}

The LLVM IR output by clang -O3 -S -emit-llvm is practically identical, so we can't blame rustc here: LLVM doesn't realize that j >= len will never be true here.

Going back to the original Go, we see that it's constructing array slices instead of doing individual element access like the Rust translation. So we could write a Rust version of the function that loops over slices instead:

fn f5(s: &mut [isize]) {
    for i in 0 .. s.len() {
        s[i];
        let len = s.len();
        for x in s[i .. len].iter_mut() { *x; }
        for x in s[0 .. i + 1].iter_mut() { *x; }
    }
}

No bounds checks are emitted for the above code, but there's still a no-op loop that is not eliminated. I'm not sure why that happens but I'm sure it's tied with the bounds check issues.

Going even closer to the original Go:

fn f5(s: &mut [isize]) {
    for i in 0 .. s.len() {
        s[i];
        let len = s.len();
        &s[i .. len];
        &s[0 .. i + 1];
    }
}

The bounds check is back: LLVM doesn't realize that i + 1 > len cannot be true.

f6

Codegen is practically identical to f5, no further insights available here.

f7, f8, f9

Same results as Go, no problems here.

Conclusions

I made some incorrect conclusions here, please see my comment below for better ones. LLVM bug 810 is not strongly related to this and my examples aren't entirely relevant.

In the end I think this boils down to the now over ten years old LLVM bug 810. (There may be more specific bugs for these kinds of use cases related to array bounds checks but I didn't find any.) LLVM simply doesn't have the smarts required to perform reasoning like "if i < n, then we know that !(i + 1 > n)". The following C function, which LLVM 3.8.1 can't optimize to a no-op, demonstrates this fairly succintly (doing the subtraction with a signed int is required so that LLVM can assume that it does not overflow — I believe that's how Rust's integers work as well?):

#include <stdlib.h>
void check(unsigned int i, unsigned int len) {
    if (i < len) {
        if (i > (int)len - 1) {
            abort();
        }
    }
}

Note that if the inner condition is changed to the equivalent i >= len, LLVM does know that it's false, presumably because it recognizes it as the negation of i < len. This suggests that if LLVM were to canonicalize i > x - 1 and i + 1 > x as i >= x this particular case would be solved. But this more obtuse example would still have a problem:

fn obtuse(s: &mut [isize]) {
    if s.len() >= 2 {
        for i in 0 .. s.len() {
            if i < s.len() - 2 {
                &s[0 .. i + 2];
            }
        }
    }
}

Just like this C code does:

#include <stdlib.h>
void check(unsigned int i, unsigned int len) {
    if (len >= 2) {
        if (i < len - 2) {
            if ((int)i + 2 > len) {
                abort();
            }
        }
    }
}

For what it's worth, GCC 6.2.1 can't eliminate the checks in any of the above C snippets either, so this doesn't seem like a major deficiency in LLVM. In the end, kudos to the Go folks for beating industry standard C compilers, but this seems like LLVM's problem, not Rust's.

StefanoD · 2016-09-16T20:37:44Z

So, does it make any sense to extend the old LLVM bug report?
@Deewiant Thank you, very much!!!

Deewiant · 2016-09-17T16:38:05Z

No, I don't think it does. I thought about this some more and did my research more thoroughly and came to the conclusion that while a fix for the old bug 810 might be helpful it's not really pertinent to this case. I'll edit my comment.

Instead, this is a combo of two things:

LLVM does in fact canonicalize i + 1 > x as i >= x (under the assumption that i + 1 does not overflow), but only for signed comparisons! I submitted a patch for the unsigned case, so hopefully this will be taken care of soon: https://reviews.llvm.org/D24700. In terms of the Rust examples this would allow recognizing that i + 1 > s.len() is equivalent to i >= s.len(), which is then recognized as the negation of i < s.len() and so can be assumed to be false inside the loop.
LLVM can't recognize that i + 1 does not overflow even though we know that i < s.len(). Previously I assumed that Rust integers had undefined behaviour on overflow, but I noticed that the emitted LLVM didn't match that assumption and Myths and Legends about Integer Overflow in Rust confirmed that this is not the case. Therefore we need LLVM to infer that overflow is not possible here, so that it can proceed with the canonicalization of i + 1 > x to i >= x. I filed LLVM#30428 for this because I'm not sure how it should be implemented.

For the general case such as my more obtuse example where I used len - 2 and i + 2, I submitted LLVM#30429, but that one's not important for this issue.

StefanoD · 2016-09-17T17:30:58Z

I appreciate your efforts! – Thank you, very much!

nox · 2018-04-02T15:40:23Z

@eddyb I wonder if the root cause of this issue is related to the one of #13018.

Real world code sometimes cannot just use the slice iterator etc and must instead rely on using ranges of indices, it would be nice to be able to fix this, though I suspect this is a pretty difficult task.

Godbolt playground with f5 and obtuse, where we can see the bound checks still exist in current nightly: https://godbolt.org/g/e4NpSz

Cc @rust-lang/wg-codegen

Restioson · 2018-05-01T20:10:26Z

Perhaps miri could help here to more generally find where bounds checks can be removed? I don't know much about how, but perhaps if it could figure out the way to push up the index as much as it can then it can interpret it and figure out whether it's still out of bounds in that situation?

nikic · 2021-02-27T22:19:32Z

Looks like there's no bounds checks on f5 either anymore (since 1.45): https://rust.godbolt.org/z/f3aja5

With that, I think we can call this resolved, and other bounds check elimination failures should be reported separately.

brson changed the title ~~Bounds Check Elimination~~ Missed opportunities to eliminate bounds checks Aug 26, 2016

Mark-Simulacrum added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Jul 26, 2017

tustvold mentioned this issue Apr 23, 2020

ARROW-8516: [Rust] Improve PrimitiveBuilder::append_slice performance apache/arrow#6980

Closed

nikic closed this as completed Feb 27, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Missed opportunities to eliminate bounds checks #35981

Missed opportunities to eliminate bounds checks #35981

StefanoD commented Aug 25, 2016

jrmuizel commented Aug 25, 2016

StefanoD commented Aug 25, 2016

jrmuizel commented Aug 25, 2016

brson commented Aug 26, 2016

Deewiant commented Sep 16, 2016 •

edited

Loading

StefanoD commented Sep 16, 2016 •

edited

Loading

Deewiant commented Sep 17, 2016

StefanoD commented Sep 17, 2016 •

edited

Loading

nox commented Apr 2, 2018

Restioson commented May 1, 2018 •

edited

Loading

nikic commented Feb 27, 2021

Missed opportunities to eliminate bounds checks #35981

Missed opportunities to eliminate bounds checks #35981

Comments

StefanoD commented Aug 25, 2016

jrmuizel commented Aug 25, 2016

StefanoD commented Aug 25, 2016

jrmuizel commented Aug 25, 2016

brson commented Aug 26, 2016

Deewiant commented Sep 16, 2016 • edited Loading

f1, f2, f3, f4

f5

f6

f7, f8, f9

Conclusions

StefanoD commented Sep 16, 2016 • edited Loading

Deewiant commented Sep 17, 2016

StefanoD commented Sep 17, 2016 • edited Loading

nox commented Apr 2, 2018

Restioson commented May 1, 2018 • edited Loading

nikic commented Feb 27, 2021

Deewiant commented Sep 16, 2016 •

edited

Loading

StefanoD commented Sep 16, 2016 •

edited

Loading

StefanoD commented Sep 17, 2016 •

edited

Loading

Restioson commented May 1, 2018 •

edited

Loading