Skip to content

Codegen creates an extra branch in a binary search #36492

Open
@llvmbot

Description

@llvmbot
Bugzilla Link 37144
Version unspecified
OS Linux
Reporter LLVM Bugzilla Contributor
CC @adibiagio,@Dushistov,@LebedevRI,@RKSimon,@rui314,@rotateright

Extended Description

lld has a binary search function written as:


while (Size != 1) {
size_t H = Size / 2;
const It MI = First + H;
Size -= H;
First = Comp(Value, *MI) ? First : First + H;
}
return Comp(Value, *First) ? First : First + 1;

The objective is to avoid branch misprediction. The body has no branches and the loop has no early exits, so it is very predictable.

Everything is fine in the IR. The loop is a single BB and a select is used:

%12 = phi i32* [ %0, %8 ], [ %19, %10 ]
...
%14 = getelementptr inbounds i32, i32* %12, i64 %13
...
%18 = icmp ugt i64 %17, %2
%19 = select i1 %18, i32* %12, i32* %14

But codegen decides to use branches:

    cmpq    %rdx, %rcx
    ja      .LBB81_4

%bb.3: # in Loop: Header=BB81_2 Depth=1

    leaq    (%rdi,%rax,4), %rdi

.LBB81_4: # in Loop: Header=BB81_2 Depth=1
cmpq $1, %rsi
jne .LBB81_2

Which causes massive amounts of branch misprediction. LLD's largest allocation is a hash table that exists just to avoid using the binary search as often since it is so expensive.

GCC gets this right. In a lld version without the extra hash table it looks like fixing this bug would result in a 12% speedup of the entire link is some testcases.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugzillaIssues migrated from bugzilla

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions