Skip to content

Codegen creates an extra branch in a binary search #36492

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
llvmbot opened this issue Apr 16, 2018 · 3 comments
Open

Codegen creates an extra branch in a binary search #36492

llvmbot opened this issue Apr 16, 2018 · 3 comments
Labels
bugzilla Issues migrated from bugzilla

Comments

@llvmbot
Copy link
Member

llvmbot commented Apr 16, 2018

Bugzilla Link 37144
Version unspecified
OS Linux
Reporter LLVM Bugzilla Contributor
CC @adibiagio,@Dushistov,@LebedevRI,@RKSimon,@rui314,@rotateright

Extended Description

lld has a binary search function written as:


while (Size != 1) {
size_t H = Size / 2;
const It MI = First + H;
Size -= H;
First = Comp(Value, *MI) ? First : First + H;
}
return Comp(Value, *First) ? First : First + 1;

The objective is to avoid branch misprediction. The body has no branches and the loop has no early exits, so it is very predictable.

Everything is fine in the IR. The loop is a single BB and a select is used:

%12 = phi i32* [ %0, %8 ], [ %19, %10 ]
...
%14 = getelementptr inbounds i32, i32* %12, i64 %13
...
%18 = icmp ugt i64 %17, %2
%19 = select i1 %18, i32* %12, i32* %14

But codegen decides to use branches:

    cmpq    %rdx, %rcx
    ja      .LBB81_4

%bb.3: # in Loop: Header=BB81_2 Depth=1

    leaq    (%rdi,%rax,4), %rdi

.LBB81_4: # in Loop: Header=BB81_2 Depth=1
cmpq $1, %rsi
jne .LBB81_2

Which causes massive amounts of branch misprediction. LLD's largest allocation is a hash table that exists just to avoid using the binary search as often since it is so expensive.

GCC gets this right. In a lld version without the extra hash table it looks like fixing this bug would result in a 12% speedup of the entire link is some testcases.

@nikic
Copy link
Contributor

nikic commented Nov 27, 2021

mentioned in issue llvm/llvm-bugzilla-archive#40027

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
@davidbolvansky
Copy link
Collaborator

After https://reviews.llvm.org/D118118 this flag / metadata is propagated to the X86 backend and with builtin __builtin_unpredictable or attached unpredictable metadata on selects you can force LLVM to codegen CMovs.

@davidbolvansky
Copy link
Collaborator

In a lld version without the extra hash table it looks like fixing this bug would result in a 12% speedup of the entire link is some testcases.

This is interesting. LLD developers now can use __builtin_unpredictable to get this nice speedup.

@MaskRay

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

3 participants