-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Refactor Loop TapirTarget / Add GPU Backend #73
base: master
Are you sure you want to change the base?
Conversation
How much of this is NVPTX specific? Would be great to also have this working on the AMDGPU backend. |
AMD Gpu backend is in the works. This is also staging for larger GPU codegen which is to come shortly! |
}); | ||
// Report success. | ||
ORE.emit(OptimizationRemark(LS_NAME, "DACSpawning", DLoc, Header) | ||
<< "spawning iterations using divide-and-conquer"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line looks like it reports the wrong message.
// Report failure. | ||
ORE.emit(OptimizationRemarkMissed(LS_NAME, "NoDACSpawning", DLoc, | ||
Header) | ||
<< "cannot spawn iterations using divide-and-conquer"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line looks like it reports the wrong message.
@@ -1417,6 +1427,35 @@ bool LoopSpawningImpl::processLoop(Loop *L) { | |||
case LoopSpawningHints::ST_SEQ: | |||
DEBUG(dbgs() << "LS: Hints dictate sequential spawning.\n"); | |||
break; | |||
case LoopSpawningHints::ST_GPU: | |||
DEBUG(dbgs() << "LS: Hints dictate DAC spawning.\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This debug statement will print the wrong message.
lib/Transforms/Tapir/OpenMPABI.cpp
Outdated
@@ -488,19 +488,19 @@ Function* formatFunctionToTask(Function* extracted, CallInst* cal) { | |||
IRBuilder<> CallerIRBuilder(cal); | |||
auto *SharedsTySize = | |||
CallerIRBuilder.getInt64(DL.getTypeAllocSize(SharedsTy)); | |||
auto *KmpTaskTTy = createKmpTaskTTy(C); | |||
//unused -- auto *KmpTaskTTy = createKmpTaskTTy(C); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just delete this line and the other //unused
dead lines in this file? Are we worried we'll need them later?
@@ -803,6 +803,8 @@ void llvm::OpenMPABI::postProcessFunction(Function &F) { | |||
OpenMPRuntimeFunction::OMPRTL__kmpc_fork_call, F.getParent()); | |||
// Replace the old call with __kmpc_fork_call | |||
auto *ForkCall = emitRuntimeCall(ForkRTFn, OMPRegionFnArgs, "", b); | |||
assert(ForkCall != 0); // play it safe -- something better to do here? | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm being nit-picky, but for matching code style, I think we would write assert(ForkCall && "Some helpful error message.");
or assert(ForkCall != nullptr && "Some helpful error message.");
return true; | ||
} | ||
|
||
bool PTXABILoopSpawning::processLoop(){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bunch of the code in this method seems to recreate the functionality of the methods in Transforms/Tapir/Outline.cpp, which were built specifically to handle function outlining of this nature and address lots of hairy corner cases. I would prefer that those functions be used to reduce code and functionality duplication.
This PR now also includes a giant refactor of loop-based tapir targets (to ease sharing with GPU and other backends) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I like the refactoring
@@ -17,7 +17,8 @@ jobs: | |||
command: | | |||
mkdir build | |||
cd build | |||
cmake .. -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD=host -DLLVM_BUILD_TESTS=ON -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_ENABLE_LTO=OFF -DLLVM_USE_LINKER=gold -DLLVM_PARALLEL_COMPILE_JOBS=2 -DLLVM_PARALLEL_LINK_JOBS=1 -DLLVM_BINUTILS_INCDIR=/usr/include -DLLVM_LIT_ARGS="-sv -j 2" | |||
cmake .. -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="X86;NVPTX" -DLLVM_BUILD_TESTS=ON -DLLVM_ENABLE_ASSERTIONS=ON -DCOMPILER_RT_BUILD_KITSUNE=OFF -DLLVM_ENABLE_LTO=OFF -DLLVM_USE_LINKER=gold -DLLVM_PARALLEL_COMPILE_JOBS=2 -DLLVM_PARALLEL_LINK_JOBS=1 -DLLVM_BINUTILS_INCDIR=/usr/include -DLLVM_LIT_ARGS="-sv -j 2" | |||
#cmake .. -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="X86;NVPTX" -DLLVM_BUILD_TESTS=ON -DLLVM_ENABLE_ASSERTIONS=ON -DCOMPILER_RT_BUILD_KITSUNE=ON -DLLVM_ENABLE_LTO=OFF -DLLVM_USE_LINKER=gold -DLLVM_PARALLEL_COMPILE_JOBS=2 -DLLVM_PARALLEL_LINK_JOBS=1 -DLLVM_BINUTILS_INCDIR=/usr/include -DLLVM_LIT_ARGS="-sv -j 2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can probably delete these comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair, though I'll change to a reminder to re-enable kitsune after we move to a CI with gpu's
OpenMP = 3, | ||
CilkR = 4, | ||
Qthreads = 5 | ||
CilkLegacy = 3, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure we want new targets for the same backends with different loop strategies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm leaning towards doing so, with cilk legacy denoting the ABI version of the divide and conquer and Cilk denoting our version (building in the compiler). Therefore it you could have any of the 4 (serial vs DAC) x (Cilk vs CilkLegacy), each with different meanings. There are a couple of small differences between the Cilk/Cilk Legacy (such as passing in everything via registers vs a struct and loading), and if the code is there it would be nice to have an explicit path to enable it for an apples-to-apples comparison w other compilers that simply use the abi.
lib/Transforms/Utils/LLVMBuild.txt
Outdated
@@ -19,4 +19,4 @@ | |||
type = Library | |||
name = TransformUtils | |||
parent = Transforms | |||
required_libraries = Analysis Core Support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably can toss this change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
No description provided.