You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
while (Size != 1) {
size_t H = Size / 2;
const It MI = First + H;
Size -= H;
First = Comp(Value, *MI) ? First : First + H;
}
return Comp(Value, *First) ? First : First + 1;
The objective is to avoid branch misprediction. The body has no branches and the loop has no early exits, so it is very predictable.
Everything is fine in the IR. The loop is a single BB and a select is used:
.LBB81_4: # in Loop: Header=BB81_2 Depth=1
cmpq $1, %rsi
jne .LBB81_2
Which causes massive amounts of branch misprediction. LLD's largest allocation is a hash table that exists just to avoid using the binary search as often since it is so expensive.
GCC gets this right. In a lld version without the extra hash table it looks like fixing this bug would result in a 12% speedup of the entire link is some testcases.
The text was updated successfully, but these errors were encountered:
After https://reviews.llvm.org/D118118 this flag / metadata is propagated to the X86 backend and with builtin __builtin_unpredictable or attached unpredictable metadata on selects you can force LLVM to codegen CMovs.
Extended Description
lld has a binary search function written as:
while (Size != 1) {
size_t H = Size / 2;
const It MI = First + H;
Size -= H;
First = Comp(Value, *MI) ? First : First + H;
}
return Comp(Value, *First) ? First : First + 1;
The objective is to avoid branch misprediction. The body has no branches and the loop has no early exits, so it is very predictable.
Everything is fine in the IR. The loop is a single BB and a select is used:
%12 = phi i32* [ %0, %8 ], [ %19, %10 ]
...
%14 = getelementptr inbounds i32, i32* %12, i64 %13
...
%18 = icmp ugt i64 %17, %2
%19 = select i1 %18, i32* %12, i32* %14
But codegen decides to use branches:
%bb.3: # in Loop: Header=BB81_2 Depth=1
.LBB81_4: # in Loop: Header=BB81_2 Depth=1
cmpq $1, %rsi
jne .LBB81_2
Which causes massive amounts of branch misprediction. LLD's largest allocation is a hash table that exists just to avoid using the binary search as often since it is so expensive.
GCC gets this right. In a lld version without the extra hash table it looks like fixing this bug would result in a 12% speedup of the entire link is some testcases.
The text was updated successfully, but these errors were encountered: