Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

a case where bounds checks should be eliminated #58602

Closed
oconnor663 opened this issue Feb 20, 2019 · 6 comments
Closed

a case where bounds checks should be eliminated #58602

oconnor663 opened this issue Feb 20, 2019 · 6 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-llvm Working group: LLVM backend code generation

Comments

@oconnor663
Copy link
Contributor

https://godbolt.org/z/y-GVc2

const INPUT: usize = 10;
const OUTPUT: usize = 3;

pub fn get_output1(input: &[u8; INPUT], offset: usize) -> Option<&[u8]> {
    if offset <= INPUT - OUTPUT {
        // This version optimizes out the extra bounds checks for the slice.
        Some(&input[offset..offset + OUTPUT])
    } else {
        None
    }
}

pub fn get_output2(input: &[u8; INPUT], offset: usize) -> Option<&[u8]> {
    if offset <= INPUT - OUTPUT {
        // This version fails to optimize bounds checks.
        Some(&input[offset..][..OUTPUT])
    } else {
        None
    }
}

This is a simplified version of a pessimization that came up when I was trying to optimize the array_ref! macro: droundy/arrayref#16

@estebank estebank added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 20, 2019
@luqmana
Copy link
Member

luqmana commented Mar 27, 2019

So this is what the LLVM IR looks like optimized:

; foo::get_output1
; Function Attrs: nonlazybind uwtable
define { i8*, i64 } @_ZN3foo11get_output117haa6e1bf361064f05E([10 x i8]* noalias readonly align 1 dereferenceable(10) %input, i64 %offset) unnamed_addr #0 {
start:
  %0 = icmp ult i64 %offset, 8
  %1 = getelementptr inbounds [10 x i8], [10 x i8]* %input, i64 0, i64 %offset
  %_0.sroa.0.0 = select i1 %0, i8* %1, i8* null
  %2 = insertvalue { i8*, i64 } undef, i8* %_0.sroa.0.0, 0
  %3 = insertvalue { i8*, i64 } %2, i64 3, 1
  ret { i8*, i64 } %3
}

; foo::get_output2
; Function Attrs: nonlazybind uwtable
define { i8*, i64 } @_ZN3foo11get_output217h09c70bf053786004E([10 x i8]* noalias readonly align 1 dereferenceable(10) %input, i64 %offset) unnamed_addr #0 {
start:
  %0 = icmp ult i64 %offset, 8
  br i1 %0, label %"_ZN4core5slice74_$LT$impl$u20$core..ops..index..Index$LT$I$GT$$u20$for$u20$$u5b$T$u5d$$GT$5index17h0c1ade7fc9d72442E.exit", label %bb5

"_ZN4core5slice74_$LT$impl$u20$core..ops..index..Index$LT$I$GT$$u20$for$u20$$u5b$T$u5d$$GT$5index17h0c1ade7fc9d72442E.exit": ; preds = %start
  %1 = sub i64 10, %offset
  %2 = icmp ult i64 %1, 3
  br i1 %2, label %bb4.i.i.i, label %"_ZN4core5slice74_$LT$impl$u20$core..ops..index..Index$LT$I$GT$$u20$for$u20$$u5b$T$u5d$$GT$5index17hbbe883cdf661ef0bE.exit"

bb4.i.i.i:                                        ; preds = %"_ZN4core5slice74_$LT$impl$u20$core..ops..index..Index$LT$I$GT$$u20$for$u20$$u5b$T$u5d$$GT$5index17h0c1ade7fc9d72442E.exit"
; call core::slice::slice_index_len_fail
  tail call void @_ZN4core5slice20slice_index_len_fail17h3b63b7b5b8377837E(i64 3, i64 %1)
  unreachable

"_ZN4core5slice74_$LT$impl$u20$core..ops..index..Index$LT$I$GT$$u20$for$u20$$u5b$T$u5d$$GT$5index17hbbe883cdf661ef0bE.exit": ; preds = %"_ZN4core5slice74_$LT$impl$u20$core..ops..index..Index$LT$I$GT$$u20$for$u20$$u5b$T$u5d$$GT$5index17h0c1ade7fc9d72442E.exit"
  %3 = getelementptr inbounds [10 x i8], [10 x i8]* %input, i64 0, i64 %offset
  br label %bb5

bb5:                                              ; preds = %start, %"_ZN4core5slice74_$LT$impl$u20$core..ops..index..Index$LT$I$GT$$u20$for$u20$$u5b$T$u5d$$GT$5index17hbbe883cdf661ef0bE.exit"
  %_0.sroa.0.0 = phi i8* [ %3, %"_ZN4core5slice74_$LT$impl$u20$core..ops..index..Index$LT$I$GT$$u20$for$u20$$u5b$T$u5d$$GT$5index17hbbe883cdf661ef0bE.exit" ], [ null, %start ]
  %4 = insertvalue { i8*, i64 } undef, i8* %_0.sroa.0.0, 0
  %5 = insertvalue { i8*, i64 } %4, i64 3, 1
  ret { i8*, i64 } %5
}

Looking at it we can tell that %2 = icmp ult i64 %1, 3 should be implied by the previous %0 = icmp ult i64 %offset, 8 and in fact LLVM does try to do this opt (from a quick looking in multiple passes instcombine, simplifycfg, instsimplify). Unfortunately, it doesn't seem to be able to relate %1 and %offset.

Just manually rewriting that condition as %2 = icmp ugt i64 %offset, 7 causes it to optimize get_output2 to the same as get_output. But that's not exactly a valid general transformation.

@luqmana luqmana added the WG-llvm Working group: LLVM backend code generation label Mar 27, 2019
@nikic
Copy link
Contributor

nikic commented Mar 27, 2019

This could be decomposed into three optimizations: First, it should be possible to determine that the sub is nuw. I think this is something that CVP should be capable of determining, at least in principle. Second, %y = sub nuw C, %x; icmp P %y, C2 should be instcombined to icmp swap(P) %y, C-C2. Finally CSE will see that the conditions are now the same (this part already works).

@luqmana
Copy link
Member

luqmana commented Mar 28, 2019

@luqmana
Copy link
Member

luqmana commented Apr 20, 2019

The (icmp P (sub nuw|nsw C2, Y), C) -> (icmp swap(P) Y, C2-C) part of the change landed already in https://reviews.llvm.org/D59916.
The change to teach CVP to add nuw/nsw to sub is at https://reviews.llvm.org/D60036 but unfortunately that optimization is currently disabled by default due to another LLVM bug: https://bugs.llvm.org/show_bug.cgi?id=31181

@nikic
Copy link
Contributor

nikic commented Jun 29, 2019

CVP nowrap inference reenabled in https://reviews.llvm.org/D62776, so this issue should be fixed once LLVM is updated.

@mati865
Copy link
Contributor

mati865 commented Jul 19, 2019

LLVM was upgraded and both functions generate equal assembly now.

@nikic nikic closed this as completed Jul 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-llvm Working group: LLVM backend code generation
Projects
None yet
Development

No branches or pull requests

5 participants