Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inlining fails when there is no default branch #76772

Open
DianQK opened this issue Jan 3, 2024 · 6 comments
Open

Inlining fails when there is no default branch #76772

DianQK opened this issue Jan 3, 2024 · 6 comments

Comments

@DianQK
Copy link
Member

DianQK commented Jan 3, 2024

From #76669 (comment).

In the following code, with the -O2 -inline-threshold=20 argument, bar1 will be inlined, but bar2 will not.
I expect that if bar1 will be inlined, bar2 should also be inlined. If bar2 is not inlined, bar1 should not be inlined either.

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define i64 @bar1(i64 %0) {
  %2 = and i64 %0, 6
  switch i64 %2, label %default_branch [
    i64 0, label %branch_0
    i64 2, label %branch_2
    i64 4, label %branch_4
  ]

branch_0:                                         ; preds = %1
  br label %exit

branch_2:                                         ; preds = %1
  br label %exit

branch_4:                                         ; preds = %1
  br label %exit

default_branch:                                   ; preds = %1
  br label %exit

exit:                                             ; preds = %default_branch, %branch_4, %branch_2, %branch_0
  %3 = phi i64 [ 5, %branch_0 ], [ 9, %branch_2 ], [ 2, %branch_4 ], [ 7, %default_branch ]
  ret i64 %3
}

define i64 @bar2(i64 %0) {
  %2 = and i64 %0, 6
  switch i64 %2, label %unreachabledefault [
    i64 0, label %branch_0
    i64 2, label %branch_2
    i64 4, label %branch_4
    i64 6, label %branch_6
  ]

branch_0:                                         ; preds = %1
  br label %exit

branch_2:                                         ; preds = %1
  br label %exit

branch_4:                                         ; preds = %1
  br label %exit

branch_6:                                         ; preds = %1
  br label %exit

unreachabledefault:                               ; preds = %1
  unreachable

exit:                                             ; preds = %branch_6, %branch_4, %branch_2, %branch_0
  %3 = phi i64 [ 5, %branch_0 ], [ 9, %branch_2 ], [ 2, %branch_4 ], [ 7, %branch_6 ]
  ret i64 %3
}

define i64 @foo1(i64 %0) {
  %2 = call i64 @bar1(i64 %0)
  ret i64 %2
}

define i64 @foo2(i64 %0) {
  %2 = call i64 @bar2(i64 %0)
  ret i64 %2
}

Generally, If a version with a default branch can be inlined, it should also get inlined without a default branch. I think a switch without a default branch generates fewer instructions.

cc @dtcxzyw

@DianQK
Copy link
Member Author

DianQK commented Jan 10, 2024

I checked the relevant code and I found a lot of interesting things.

1. [DONE] InlineCost.cpp does not consider the default branch.

unsigned JumpTableSize = 0;
BlockFrequencyInfo *BFI = GetBFI ? &(GetBFI(F)) : nullptr;
unsigned NumCaseCluster =
TTI.getEstimatedNumberOfCaseClusters(SI, JumpTableSize, PSI, BFI);
onFinalizeSwitch(JumpTableSize, NumCaseCluster);

unsigned N = SI.getNumCases();
const TargetLoweringBase *TLI = getTLI();
const DataLayout &DL = this->getDataLayout();
JumpTableSize = 0;
bool IsJTAllowed = TLI->areJTsAllowed(SI.getParent()->getParent());
// Early exit if both a jump table and bit test are not allowed.
if (N < 1 || (!IsJTAllowed && DL.getIndexSizeInBits(0u) < N))
return N;

Godbolt:


2. Inconsistent minimum cases for lookup table

The SimplifyCFG uses 3.

// Ignore switches with less than three cases. Lookup tables will not make
// them faster, so we don't analyze them.
if (SI->getNumCases() < 3)
return false;
.

The InlineCost uses getMinimumJumpTableEntries (default is 4).

unsigned TargetLoweringBase::getMinimumJumpTableEntries() const {
return MinimumJumpTableEntries;
}


3. Large cost of instructions for lookup tables?

I feel like we should reduce lookup table cost.

// If suitable for a jump table, consider the cost for the table size and
// branch to destination.
// Maximum valid cost increased in this function.
if (JumpTableSize) {
int64_t JTCost =
static_cast<int64_t>(JumpTableSize) * InstrCost + 4 * InstrCost;
addCost(JTCost);
return;
}

If inline cost considers the number of instructions, we should not calculate the length of the lookup table.


I don't know if I'm right to think so. But I believe we should address at least the first two.

@dtcxzyw
Copy link
Member

dtcxzyw commented Jan 10, 2024

I checked the relevant code and I found a lot of interesting things.

1. InlineCost.cpp does not consider the default branch.

unsigned JumpTableSize = 0;
BlockFrequencyInfo *BFI = GetBFI ? &(GetBFI(F)) : nullptr;
unsigned NumCaseCluster =
TTI.getEstimatedNumberOfCaseClusters(SI, JumpTableSize, PSI, BFI);
onFinalizeSwitch(JumpTableSize, NumCaseCluster);

unsigned N = SI.getNumCases();
const TargetLoweringBase *TLI = getTLI();
const DataLayout &DL = this->getDataLayout();
JumpTableSize = 0;
bool IsJTAllowed = TLI->areJTsAllowed(SI.getParent()->getParent());
// Early exit if both a jump table and bit test are not allowed.
if (N < 1 || (!IsJTAllowed && DL.getIndexSizeInBits(0u) < N))
return N;

Godbolt: https://llvm.godbolt.org/z/fd17b4nMj

Yeah, we should add an extra cost for the reachable default case. The inline cost of switch should be lower than the original form after #76669.

@DianQK
Copy link
Member Author

DianQK commented Jan 10, 2024

I checked the relevant code and I found a lot of interesting things.

1. InlineCost.cpp does not consider the default branch.

Yeah, we should add an extra cost for the reachable default case. The inline cost of switch should be lower than the original form after #76669.

If both will become lookup tables or neither will become lookup tables. I think so. This test case happens to encounter a boundary value. The cost of a normal switch is much less than a lookup table. Sad...

Anyway, I'll start by creating two draft PRs for the first two issues to illustrate my ideas.

@DianQK
Copy link
Member Author

DianQK commented Jan 10, 2024

If both will become lookup tables or neither will become lookup tables. I think so. This test case happens to encounter a boundary value. The cost of a normal switch is much less than a lookup table.

But the unreachable default branch should always be no worse than the default reachable branch. So this must be the result of inconsistent information between the inline and lookup tables.

DianQK added a commit that referenced this issue Feb 11, 2024
First step in fixing #76772.

This PR considers the default branch as a case branch. This will give
the unreachable default branch fair consideration.
@DianQK
Copy link
Member Author

DianQK commented Feb 11, 2024

Let's move on to step two.

@DianQK
Copy link
Member Author

DianQK commented May 3, 2024

  1. InlineCost.cpp does not consider the default branch.
  2. Inconsistent minimum cases for lookup table
  3. Large cost of instructions for lookup tables?

I think I've figured out the problem here. Among the three issues I mentioned earlier, the second and third ones are maybe incorrect. The thresholds/costs for these two problems are related to the backend and are unrelated to the intermediate IR. For example, the lookup table here isn't transformed into a GEP instruction in the IR; instead, it's a machine code jump instruction: https://llvm.godbolt.org/z/Ps3Mhxjcx.

So, this result seems expected as well. I'm considering closing this issue. Unless we encounter actual performance regressions, we can reconsider how to add additional rules for handling. (Perhaps we should revisit the third problem, aiming to reduce the cost overhead of the backend lookup table. Or converting the small switch statement with no default branch into several comparison instructions.)

cc @dtcxzyw

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants