Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Refactor Loop TapirTarget / Add GPU Backend #73

Open
wants to merge 16 commits into
base: master
Choose a base branch
from
Open

Conversation

wsmoses
Copy link
Owner

@wsmoses wsmoses commented Jul 23, 2018

No description provided.

@vchuravy
Copy link

How much of this is NVPTX specific? Would be great to also have this working on the AMDGPU backend.

@wsmoses
Copy link
Owner Author

wsmoses commented Jul 23, 2018

AMD Gpu backend is in the works. This is also staging for larger GPU codegen which is to come shortly!

@neboat neboat self-assigned this Jul 23, 2018
});
// Report success.
ORE.emit(OptimizationRemark(LS_NAME, "DACSpawning", DLoc, Header)
<< "spawning iterations using divide-and-conquer");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line looks like it reports the wrong message.

// Report failure.
ORE.emit(OptimizationRemarkMissed(LS_NAME, "NoDACSpawning", DLoc,
Header)
<< "cannot spawn iterations using divide-and-conquer");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line looks like it reports the wrong message.

@@ -1417,6 +1427,35 @@ bool LoopSpawningImpl::processLoop(Loop *L) {
case LoopSpawningHints::ST_SEQ:
DEBUG(dbgs() << "LS: Hints dictate sequential spawning.\n");
break;
case LoopSpawningHints::ST_GPU:
DEBUG(dbgs() << "LS: Hints dictate DAC spawning.\n");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This debug statement will print the wrong message.

@@ -488,19 +488,19 @@ Function* formatFunctionToTask(Function* extracted, CallInst* cal) {
IRBuilder<> CallerIRBuilder(cal);
auto *SharedsTySize =
CallerIRBuilder.getInt64(DL.getTypeAllocSize(SharedsTy));
auto *KmpTaskTTy = createKmpTaskTTy(C);
//unused -- auto *KmpTaskTTy = createKmpTaskTTy(C);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just delete this line and the other //unused dead lines in this file? Are we worried we'll need them later?

@@ -803,6 +803,8 @@ void llvm::OpenMPABI::postProcessFunction(Function &F) {
OpenMPRuntimeFunction::OMPRTL__kmpc_fork_call, F.getParent());
// Replace the old call with __kmpc_fork_call
auto *ForkCall = emitRuntimeCall(ForkRTFn, OMPRegionFnArgs, "", b);
assert(ForkCall != 0); // play it safe -- something better to do here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm being nit-picky, but for matching code style, I think we would write assert(ForkCall && "Some helpful error message."); or assert(ForkCall != nullptr && "Some helpful error message.");

return true;
}

bool PTXABILoopSpawning::processLoop(){
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bunch of the code in this method seems to recreate the functionality of the methods in Transforms/Tapir/Outline.cpp, which were built specifically to handle function outlining of this nature and address lots of hairy corner cases. I would prefer that those functions be used to reduce code and functionality duplication.

@wsmoses wsmoses changed the title add ptx backend [WIP] Add GPU Backend Jul 23, 2018
@wsmoses wsmoses changed the title [WIP] Add GPU Backend [WIP] Refactor Loop TapirTarget / Add GPU Backend Jul 26, 2018
@wsmoses
Copy link
Owner Author

wsmoses commented Jul 26, 2018

This PR now also includes a giant refactor of loop-based tapir targets (to ease sharing with GPU and other backends)

Copy link
Collaborator

@stelleg stelleg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I like the refactoring

@@ -17,7 +17,8 @@ jobs:
command: |
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD=host -DLLVM_BUILD_TESTS=ON -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_ENABLE_LTO=OFF -DLLVM_USE_LINKER=gold -DLLVM_PARALLEL_COMPILE_JOBS=2 -DLLVM_PARALLEL_LINK_JOBS=1 -DLLVM_BINUTILS_INCDIR=/usr/include -DLLVM_LIT_ARGS="-sv -j 2"
cmake .. -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="X86;NVPTX" -DLLVM_BUILD_TESTS=ON -DLLVM_ENABLE_ASSERTIONS=ON -DCOMPILER_RT_BUILD_KITSUNE=OFF -DLLVM_ENABLE_LTO=OFF -DLLVM_USE_LINKER=gold -DLLVM_PARALLEL_COMPILE_JOBS=2 -DLLVM_PARALLEL_LINK_JOBS=1 -DLLVM_BINUTILS_INCDIR=/usr/include -DLLVM_LIT_ARGS="-sv -j 2"
#cmake .. -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="X86;NVPTX" -DLLVM_BUILD_TESTS=ON -DLLVM_ENABLE_ASSERTIONS=ON -DCOMPILER_RT_BUILD_KITSUNE=ON -DLLVM_ENABLE_LTO=OFF -DLLVM_USE_LINKER=gold -DLLVM_PARALLEL_COMPILE_JOBS=2 -DLLVM_PARALLEL_LINK_JOBS=1 -DLLVM_BINUTILS_INCDIR=/usr/include -DLLVM_LIT_ARGS="-sv -j 2"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can probably delete these comments.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, though I'll change to a reminder to re-enable kitsune after we move to a CI with gpu's

OpenMP = 3,
CilkR = 4,
Qthreads = 5
CilkLegacy = 3,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure we want new targets for the same backends with different loop strategies?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaning towards doing so, with cilk legacy denoting the ABI version of the divide and conquer and Cilk denoting our version (building in the compiler). Therefore it you could have any of the 4 (serial vs DAC) x (Cilk vs CilkLegacy), each with different meanings. There are a couple of small differences between the Cilk/Cilk Legacy (such as passing in everything via registers vs a struct and loading), and if the code is there it would be nice to have an explicit path to enable it for an apples-to-apples comparison w other compilers that simply use the abi.

@@ -19,4 +19,4 @@
type = Library
name = TransformUtils
parent = Transforms
required_libraries = Analysis Core Support
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably can toss this change

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants