Inlining fails when there is no default branch #76772
I checked the relevant code and I found a lot of interesting things.
1. [DONE]
unsigned JumpTableSize = 0;
BlockFrequencyInfo *BFI = GetBFI ? &(GetBFI(F)) : nullptr;
unsigned NumCaseCluster =
    TTI.getEstimatedNumberOfCaseClusters(SI, JumpTableSize, PSI, BFI);
onFinalizeSwitch(JumpTableSize, NumCaseCluster);
llvm-project/llvm/include/llvm/CodeGen/BasicTTIImpl.h
Lines 448 to 457 in 7954c57
unsigned N = SI.getNumCases();
const TargetLoweringBase *TLI = getTLI();
const DataLayout &DL = this->getDataLayout();
JumpTableSize = 0;
bool IsJTAllowed = TLI->areJTsAllowed(SI.getParent()->getParent());
// Early exit if both a jump table and bit test are not allowed.
if (N < 1 || (!IsJTAllowed && DL.getIndexSizeInBits(0u) < N))
  return N;
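To make the early-exit condition easier to reason about, here is a minimal standalone sketch of just that predicate. The function name `takesEarlyExit` and its plain-integer parameters are hypothetical stand-ins for the values the quoted code reads from `SwitchInst`, `TargetLoweringBase`, and `DataLayout`; this is not LLVM code.

```cpp
// Hypothetical model of the early-exit check quoted above (not LLVM API).
// NumCases        ~ SI.getNumCases()
// IsJTAllowed     ~ TLI->areJTsAllowed(...)
// IndexSizeInBits ~ DL.getIndexSizeInBits(0u)
constexpr bool takesEarlyExit(unsigned NumCases, bool IsJTAllowed,
                              unsigned IndexSizeInBits) {
  // Exit early when there are no cases, or when jump tables are disallowed
  // and there are more cases than bits in an index-sized integer.
  return NumCases < 1 || (!IsJTAllowed && IndexSizeInBits < NumCases);
}

// A switch with no cases always exits early.
static_assert(takesEarlyExit(0, true, 64), "no cases");
// Jump tables allowed: the analysis continues regardless of case count.
static_assert(!takesEarlyExit(100, true, 64), "JT allowed");
// Jump tables disallowed but few cases: the analysis still continues.
static_assert(!takesEarlyExit(3, false, 64), "bit test may still apply");
// Jump tables disallowed and more cases than index bits: early exit.
static_assert(takesEarlyExit(100, false, 64), "neither JT nor bit test");
```

In this model, a small switch never takes the early exit, so its estimated cluster count and `JumpTableSize` come from the later part of the function, which is not quoted here.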
Godbolt:
- cost: https://llvm.godbolt.org/z/fd17b4nMj (same cost!)
- ASM (without lookup table): https://llvm.godbolt.org/z/M6MMozxcn
2. Inconsistent minimum cases for lookup table
The SimplifyCFG uses 3.
llvm-project/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
Lines 6579 to 6582 in 7954c57
// Ignore switches with less than three cases. Lookup tables will not make
// them faster, so we don't analyze them.
if (SI->getNumCases() < 3)
  return false;
The InlineCost uses getMinimumJumpTableEntries (default is 4).
llvm-project/llvm/lib/CodeGen/TargetLoweringBase.cpp
Lines 2037 to 2039 in 7954c57
unsigned TargetLoweringBase::getMinimumJumpTableEntries() const {
  return MinimumJumpTableEntries;
}
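The mismatch between the two thresholds can be sketched as follows. This is a simplified model, assuming the only criterion is the case count: the real checks also look at table density, case ranges, and target hooks, which are ignored here. All names are hypothetical; only the constants 3 and 4 come from the snippets and the comment above.

```cpp
// Simplified model of the two thresholds discussed above (not LLVM API).
constexpr unsigned MinLookupTableCases = 3;  // from `SI->getNumCases() < 3`
constexpr unsigned MinJumpTableEntries = 4;  // default stated in this comment

// SimplifyCFG-style gate: switches with fewer than three cases are skipped.
constexpr bool passesLookupTableGate(unsigned NumCases) {
  return NumCases >= MinLookupTableCases;
}

// Jump-table-style gate, reduced here to a bare case-count comparison.
constexpr bool passesJumpTableGate(unsigned NumCases) {
  return NumCases >= MinJumpTableEntries;
}

// True when the two gates give different answers for the same switch.
constexpr bool gatesDisagree(unsigned NumCases) {
  return passesLookupTableGate(NumCases) != passesJumpTableGate(NumCases);
}

static_assert(!gatesDisagree(2), "both reject two cases");
static_assert(gatesDisagree(3), "only the lookup-table gate accepts three");
static_assert(!gatesDisagree(4), "both accept four cases");
```

Under these assumptions, a switch with exactly three cases is the only size on which the two gates disagree.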
3. Large cost of instructions for lookup tables?
I feel like we should reduce lookup table cost.
llvm-project/llvm/lib/Analysis/InlineCost.cpp
Lines 704 to 713 in 7954c57
// If suitable for a jump table, consider the cost for the table size and
// branch to destination.
// Maximum valid cost increased in this function.
if (JumpTableSize) {
  int64_t JTCost =
      static_cast<int64_t>(JumpTableSize) * InstrCost + 4 * InstrCost;
  addCost(JTCost);
  return;
}
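As a worked example of the formula above, the sketch below evaluates `JTCost` for a few table sizes. The value `InstrCost = 5` is an assumption about `InlineConstants::InstrCost` at this revision; check it in your tree. The function name `jumpTableCost` is hypothetical.

```cpp
#include <cstdint>

// Assumed value of InlineConstants::InstrCost; verify against your checkout.
constexpr int64_t AssumedInstrCost = 5;

// Mirrors the quoted expression: one InstrCost per table entry plus a fixed
// charge of four instructions for the branch to the destination.
constexpr int64_t jumpTableCost(unsigned JumpTableSize,
                                int64_t InstrCost = AssumedInstrCost) {
  return static_cast<int64_t>(JumpTableSize) * InstrCost + 4 * InstrCost;
}

static_assert(jumpTableCost(4) == 40, "4 * 5 + 4 * 5");
static_assert(jumpTableCost(16) == 100, "16 * 5 + 4 * 5");
```

The cost grows linearly with the table length even though the number of executed instructions does not, which is the concern raised in this item. Under the assumed `InstrCost`, a four-entry table is charged 40, which is already above the `-inline-threshold=20` used in the report.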
If the inline cost is meant to count instructions, it should not include the length of the lookup table.
I don't know if I'm right to think so, but I believe we should address at least the first two.
Yeah, we should add an extra cost for the reachable default case. The inline cost of switch should be lower than the original form after #76669.
If both become lookup tables, or neither does, then I think so. This test case happens to hit a boundary value. The cost of a normal switch is much less than that of a lookup table. Sad... Anyway, I'll start by creating two draft PRs for the first two issues to illustrate my ideas.
But an unreachable default branch should never be worse than a reachable default branch. So this must be the result of inconsistent information between the inline cost and the lookup table logic.
First step in fixing #76772. This PR considers the default branch as a case branch. This will give the unreachable default branch fair consideration.
Let's move on to step two.
I think I've figured out the problem here. Among the three issues I mentioned earlier, the second and third may be incorrect. The thresholds/costs for these two are related to the backend and are unrelated to the IR. For example, the lookup table here isn't transformed into a GEP instruction in the IR; instead, it's a machine code jump instruction: https://llvm.godbolt.org/z/Ps3Mhxjcx. So, this result seems expected as well. I'm considering closing this issue. If we encounter actual performance regressions, we can reconsider how to add additional rules to handle them. (Perhaps we should revisit the third problem, aiming to reduce the cost overhead of the backend lookup table, or convert a small switch statement with no default branch into several comparison instructions.) cc @dtcxzyw
From #76669 (comment).
In the following code, with the -O2 -inline-threshold=20 argument, bar1 will be inlined, but bar2 will not.
I expect that if bar1 will be inlined, bar2 should also be inlined. If bar2 is not inlined, bar1 should not be inlined either.
Generally, if a version with a default branch can be inlined, it should also get inlined without a default branch. I think a switch without a default branch generates fewer instructions.
cc @dtcxzyw