The attached file (a preprocessed copy of AArch64InstPrinter.cpp) takes 80s to compile locally with -O3 [1], and 400s with -O3 + UBSan [2].
The vast majority of that time is spent computing dominator trees for the JumpThreading pass in two auto-generated functions (AArch64InstPrinter::printAliasInstr and AArch64AppleInstPrinter::printAliasInstr), where inlining significantly blows up the number of basic blocks (from ~5k with -O1 to ~25k with -O3 and ~55k with -O3 + UBSan). Running CFGSimplification before JumpThreading reduces the number of BBs by ~10k and improves the runtime by ~3x, but the compile with -O3 + UBSan still takes ~2 minutes. I'm not sure whether making that change is justified/sound.
The issue exposed by the reproducer is that JumpThreading tries to apply a very large number of updates to the dominator tree. This performs poorly because the time complexity of the incremental updating algorithm is proportional to the number of updates.
Fixed in r345353 (https://reviews.llvm.org/rL345353) by rebuilding the dominator tree from scratch in this case. With that commit, the time spent updating the dominator tree drops from 297s to 0.15s when compiling the reproducer with -O3 + UBSan locally.
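To illustrate the trade-off described above, here is a minimal sketch of choosing between incremental updates and a full rebuild based on batch size. It is not the code from r345353: the names `UpdateStrategy`, `chooseStrategy` and `kRatio` are hypothetical, and the cutoff value is an assumption picked only for illustration.

```cpp
#include <cstddef>

// Hypothetical names throughout; this is not LLVM's API.
enum class UpdateStrategy { Incremental, Recalculate };

// Assumed cutoff: fall back to a full rebuild once the batch of pending
// updates exceeds 1/kRatio of the tree's node count. Illustrative value only.
constexpr std::size_t kRatio = 40;

UpdateStrategy chooseStrategy(std::size_t numTreeNodes,
                              std::size_t numPendingUpdates) {
    // Incremental cost grows with the number of updates, while a rebuild
    // costs roughly the same no matter how large the batch is.
    if (numPendingUpdates > numTreeNodes / kRatio)
        return UpdateStrategy::Recalculate;
    return UpdateStrategy::Incremental;
}
```

Under these assumed numbers, a function with ~55k basic blocks and a handful of pending updates would still be updated incrementally, while a batch of tens of thousands of updates would trigger a rebuild.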
Extended Description
Filing a bug for this: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20180625/563662.html
[1] time clang -cc1 -triple x86_64-unknown-linux-gnu -O3 -std=c++11 -emit-llvm -o /dev/null preprocessed.cpp
[2] time clang -cc1 -triple x86_64-unknown-linux-gnu -O3 -std=c++11 -emit-llvm -o /dev/null preprocessed.cpp -fsanitize=alignment,array-bounds,bool,builtin,enum,float-cast-overflow,float-divide-by-zero,integer-divide-by-zero,nonnull-attribute,null,object-size,pointer-overflow,return,returns-nonnull-attribute,shift-base,shift-exponent,signed-integer-overflow,unreachable,vla-bound