Skip to content

[SYCL] Generate volatile load/store/memcpy instructions in LowerWGScope #1257

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 8 additions & 8 deletions clang/lib/CodeGen/SYCLLowerIR/LowerWGScope.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -300,10 +300,10 @@ shareOutputViaLocalMem(Instruction &I, BasicBlock &BBa, BasicBlock &BBb,
// 2) Generate a store of the produced value into the WG local var
IRBuilder<> Bld(Ctx);
Bld.SetInsertPoint(I.getNextNode());
Bld.CreateStore(&I, WGLocal);
Bld.CreateStore(&I, WGLocal, true);
// 3) Generate a load in the "worker" BB of the value stored by the leader
Bld.SetInsertPoint(&BBb.front());
auto *WGVal = Bld.CreateLoad(WGLocal, "wg_val_" + Twine(I.getName()));
auto *WGVal = Bld.CreateLoad(WGLocal, true, "wg_val_" + Twine(I.getName()));
// 4) Finally, replace usages of I ouside the scope
for (auto *U : Users)
U->replaceUsesOfWith(&I, WGVal);
Expand Down Expand Up @@ -396,17 +396,17 @@ static void copyBetweenPrivateAndShadow(Value *L, GlobalVariable *Shadow,
auto SizeVal = M.getDataLayout().getTypeStoreSize(T);
auto Size = ConstantInt::get(getSizeTTy(M), SizeVal);
if (Loc2Shadow)
Builder.CreateMemCpy(Shadow, ShdAlign, L, LocAlign, Size);
Builder.CreateMemCpy(Shadow, ShdAlign, L, LocAlign, Size, true);
else
Builder.CreateMemCpy(L, LocAlign, Shadow, ShdAlign, Size);
Builder.CreateMemCpy(L, LocAlign, Shadow, ShdAlign, Size, true);
} else {
Value *Src = L;
Value *Dst = Shadow;

if (!Loc2Shadow)
std::swap(Src, Dst);
Value *LocalVal = Builder.CreateLoad(Src, "mat_ld");
Builder.CreateStore(LocalVal, Dst);
Value *LocalVal = Builder.CreateLoad(Src, true, "mat_ld");
Builder.CreateStore(LocalVal, Dst, true);
}
}

Expand Down Expand Up @@ -673,7 +673,7 @@ static void fixupPrivateMemoryPFWILambdaCaptures(CallInst *PFWICall) {

if (ValAS != PtrAS)
Val = Bld.CreateAddrSpaceCast(Val, NewGEP->getResultElementType());
Bld.CreateStore(Val, NewGEP);
Bld.CreateStore(Val, NewGEP, true);
}
}

Expand Down Expand Up @@ -921,7 +921,7 @@ Value *spirv::genLinearLocalID(Instruction &Before, const Triple &TT) {
unsigned Align = M.getDataLayout().getPreferredAlignment(G);
G->setAlignment(Align);
}
Value *Res = new LoadInst(G, "", &Before);
Value *Res = new LoadInst(G, "", true, &Before);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the wrong way to fix the problem.
Please, add "convergent" attribute to the barrier function at line 944.
NOTE: it's not the only function missing this attribute.

Please, add a unit test for this pass to clang LIT test suite.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Naghasan, do we need add this attribute to _Z22__spirv_ControlBarrierN5__spv5ScopeES0_j implementation in libclc?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's probably safer to do so, regardless of how llvm handles attribute merging.

Copy link
Contributor Author

@againull againull Mar 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx, I am going to try add attribute but I am not sure that this will fix the problem.
Could you please take a look at the example I provided in the issue: #1258

Declaration contains convergent attribute:

; Function Attrs: convergent nounwind
declare void @llvm.nvvm.barrier0() #3

Nevertheless illegal transformation happens.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  tail call void @llvm.nvvm.barrier0() #4
...
attributes #3 = { convergent nounwind }
attributes #4 = { nounwind }

Call must be convergent too i.e. #3.
Note these might require it too:

; Function Attrs: nounwind readnone
declare i32 @llvm.nvvm.read.ptx.sreg.tid.x() #2

; Function Attrs: nounwind readnone
declare i32 @llvm.nvvm.read.ptx.sreg.tid.y() #2

; Function Attrs: nounwind readnone
declare i32 @llvm.nvvm.read.ptx.sreg.tid.z() #2

; Function Attrs: nounwind readnone
declare i32 @llvm.nvvm.read.ptx.sreg.ntid.y() #2

; Function Attrs: nounwind readnone
declare i32 @llvm.nvvm.read.ptx.sreg.ntid.z() #2

; Function Attrs: nounwind readnone
declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() #2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm almost 100% sure that applying convergent will fix the problem.
I'm 146% sure that using volatile for load/store instructions is not the right approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed with Alexey offline, unfortunately convergent attribute doesn't help. As I described the problem, it happens because of globals AA, convergent attribute doesn't affect alias analysis. It looks like problem is how global variables are created in LowerWGScope pass. They are created with internal linkage. As a result alias analysis thinks that this global doesn't escape and gives wrong answer that barrier references but doesn't modify memory location of the global. It looks like globals variables produced by LowerWGScope should have external linkage. I am working on the new patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created separate PR with new patch #1257. For some reasons, I am not able to add Victor as a reviewer. This PR is going to be closed.

return Res;
}
}
Expand Down
3 changes: 0 additions & 3 deletions sycl/test/hier_par/hier_par_basic.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -12,9 +12,6 @@
// RUN: %GPU_RUN_PLACEHOLDER %t.out
// RUN: %ACC_RUN_PLACEHOLDER %t.out

// TODO: ptxas fatal : Unresolved extern function '__spirv_ControlBarrier'
// XFAIL: cuda

// This test checks hierarchical parallelism invocation APIs, but without any
// data or code with side-effects between the work group and work item scopes.

Expand Down
5 changes: 1 addition & 4 deletions sycl/test/hier_par/hier_par_wgscope.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -12,15 +12,12 @@
// RUN: %GPU_RUN_PLACEHOLDER %t.out
// RUN: %ACC_RUN_PLACEHOLDER %t.out

// RUN: %clangxx -O0 -fsycl %s -o %t.out
// RUN: %clangxx -O0 -fsycl -fsycl-targets=%sycl_triple %s -o %t.out
// RUN: env SYCL_DEVICE_TYPE=HOST %t.out
// RUN: %CPU_RUN_PLACEHOLDER %t.out
// RUN: %GPU_RUN_PLACEHOLDER %t.out
// RUN: %ACC_RUN_PLACEHOLDER %t.out

// TODO: ptxas fatal : Unresolved extern function '__spirv_ControlBarrier'
// UNSUPPORTED: cuda

// This test checks correctness of hierarchical kernel execution when there is
// code and data in the work group scope.

Expand Down