-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TOPI]fix scatterND large shape problem #12200
Conversation
Do you know why cc @tkonolige |
Looks like this is an issue in the TVM thread pool. I ran with TVM compiled with address sanitizer and I get the following error. @tqchen I think you have the most experience here. @yidawang wrote the code but I'm not sure if they are still active in the community.
|
Would be good to know why. To see if it is an issue with particular thread pool, we can try to switch to openmp https://github.com/apache/tvm/blob/main/cmake/config.cmake#L178 |
It also errors with the openmp thread pool:
|
That might hint something is wrong higher up? |
Following the hint of sanitizer that something related to stack overflow happens, there is a possible llvm issue found. The full dumped ll: scatter_nd_stackoverflow.ll.txt ; Function Attrs: noinline
define internal fastcc i32 @tvmgen_default_fused_scatter_nd_compute_(i8* noalias align 128 %0, i8* noalias nocapture readonly align 128 %1, i8* noalias align 128 %2, i8* noalias align 128 %3) unnamed_addr #2 {
entry:
call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 128 dereferenceable(1080000) %0, i8* noundef nonnull align 128 dereferenceable(1080000) %1, i64 1080000, i1 false)
br label %for_body_j
for_body_j: ; preds = %for_body_j, %entry
%j1 = phi i32 [ 0, %entry ], [ %13, %for_body_j ]
%4 = alloca %closure_loop_parallel_k, align 8
%5 = getelementptr inbounds %closure_loop_parallel_k, %closure_loop_parallel_k* %4, i64 0, i32 0
store i8* %0, i8** %5, align 8
%6 = getelementptr inbounds %closure_loop_parallel_k, %closure_loop_parallel_k* %4, i64 0, i32 1
store i8* %2, i8** %6, align 8
%7 = getelementptr inbounds %closure_loop_parallel_k, %closure_loop_parallel_k* %4, i64 0, i32 2
store i32 %j1, i32* %7, align 8
%8 = getelementptr inbounds %closure_loop_parallel_k, %closure_loop_parallel_k* %4, i64 0, i32 3
store i8* %3, i8** %8, align 8
%9 = load i32 (i32 (i32, %0*, i8*)*, i8*, i32)*, i32 (i32 (i32, %0*, i8*)*, i8*, i32)** @__TVMBackendParallelLaunch, align 8, !tbaa !5
%10 = bitcast %closure_loop_parallel_k* %4 to i8*
%11 = call i32 %9(i32 (i32, %0*, i8*)* nonnull @__tvm_parallel_lambda, i8* nonnull %10, i32 0)
%12 = icmp ne i32 %11, 0
%13 = add nuw nsw i32 %j1, 1
%exitcond.not = icmp eq i32 %13, 270000
%or.cond = select i1 %12, i1 true, i1 %exitcond.not
br i1 %or.cond, label %common.ret, label %for_body_j, !prof !142
common.ret: ; preds = %for_body_j
ret i32 %11
} The alloca A quick experimental modification to https://github.com/apache/tvm/blob/main/src/target/llvm/codegen_cpu.cc#L645 make segfault disappear in my environment. auto cur_pt = builder_->GetInsertBlock();
builder_->SetInsertPoint(&(*(function_->getEntryBlock().getFirstInsertionPt())));
llvm::Value* cvalue = builder_->CreateAlloca(ctype, ConstInt32(1)); // alloca at function begin
builder_->SetInsertPoint(cur_pt ); So I think we could either
|
@wrongtest-intellif This sounds like a bug in StorageRewrite and/or PlanAndUpdateBufferAllocationLocation where the allocation should be lifted outside the loop but isn't. |
Emmmm I think it is not. https://github.com/apache/tvm/blob/main/src/target/llvm/codegen_cpu.cc#L645 is an dedicated usage of llvm alloca for thread pool, not from tvm level allocation. |
cc @tkonolige @kparzysz-quic Hi~ could you kindly review the fix again? |
src/target/llvm/codegen_cpu.cc
Outdated
@@ -642,7 +642,16 @@ CodeGenLLVM::TypedPointer CodeGenCPU::PackClosureData(const Array<Var>& vfields, | |||
} | |||
llvm::StructType* ctype = struct_name.size() ? llvm::StructType::create(fields, struct_name) | |||
: llvm::StructType::create(fields); | |||
llvm::Value* cvalue = builder_->CreateAlloca(ctype, ConstInt32(1)); | |||
// create ctype alloca at function entry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add an explanation of why we allocate at the function entry.
@kparzysz-quic Looks like CI has a error. Can you help check it ? thanks. |
@tvm-bot rerun |
1 similar comment
@tvm-bot rerun |
@driazati Could you take a look at what happened here with the bot? We couldn't get it to rerun the CI. |
@masahi Could you review the fix again? thanks. |
* fix scatterND large shape problem * fix thread pool alloca * add scatternd unit test * update with comment * Empty Co-authored-by: wrongtest <wrongtest0@gmail.com>
test case:
when i run the case, I will get a Segmentation fault.
Then I remove the
kind
param, it can run success.I don't know whether the
kind
param use 'unroll' is better.