Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug][VTA][OpenCL] If allowed to allocate in stack, VTA multiple target test will fail #8978

Closed
zhanghaohit opened this issue Sep 10, 2021 · 4 comments

Comments

@zhanghaohit
Copy link
Contributor

zhanghaohit commented Sep 10, 2021

In src/tir/transforms/lower_tvm_builtin.cc:

    if (device_type_.defined()) {
      if (const auto* dev_type = device_type_.as<IntImmNode>()) {
        if (dev_type->value == kDLCPU) {
          int32_t constant_size = op->constant_allocation_size();
          if (constant_size > 0 && constant_size * nbytes < runtime::kMaxStackAlloca) {
            return stmt;
          }
        }
      }
    }

If the above optimization code is here, it will raise LLVM function signature errors when there are multiple targets. This piece of code was removed in #6126 to fix this issue. But it was reverted back in #8274, as removal of this code will cause some VM test failed.

To quick fix, the multiple target unittests are disabled in the current main branch. But we need further investigation on how to resolve this nicely.

cc @echuraev @elvin-n

@zhanghaohit zhanghaohit changed the title [Bug][VTA][OpenCL] [Bug][VTA][OpenCL] If allowed to allocate in stack, VTA multiple target test will fail Sep 10, 2021
@manupak
Copy link
Contributor

manupak commented Sep 15, 2021

Thanks for flagging this! and following up from #8274

IIUC,

This is problematic for micro because now the constant sized allocates are forced to be placed on stack (bypassing TVMPlatformAllocate abstraction) because that is how the codegen_c lowers the allocate.

void CodeGenC::VisitStmt_(const AllocateNode* op) {
ICHECK(!is_zero(op->condition));
std::string vid = AllocVarID(op->buffer_var.get());
this->PrintIndent();
int32_t constant_size = op->constant_allocation_size();
ICHECK_GT(constant_size, 0) << "Can only handle constant size stack allocation for now";
auto scope = GetPtrStorageScope(op->buffer_var);
alloc_storage_scope_[op->buffer_var.get()] = scope;
PrintStorageScope(scope, stream);
PrintType(op->dtype, stream);
stream << ' ' << vid << '[' << constant_size << "];\n";
RegisterHandleType(op->buffer_var.get(), op->dtype);
this->PrintStmt(op->body);
}

I dont think is desired.

Can someone explain why the guarded TVMBackendAllocWorkspace calls to allocate memory is problematic ? -- which are the ones I see generated in TIR prior to the last revert : #8274 ?

cc: @jroesch @mbrookhart

@manupak
Copy link
Contributor

manupak commented Sep 15, 2021

Also for some reason, if we really need the tir.allocate to be translated down to as a stack placement, we should probably make the tir.allocate with storage_scope = "local" to PrimFunc that is placed on CPU.

keep the global ones served via TVMBAWs, because TVMBAW could serve as the 'global' allocator for memory.
In this way, in micro we could still use TVMBAWs to serve memory from the application/platform layer for 'global' allocates.

@mbrookhart
Copy link
Contributor

FYI - this PR renables the test that this change breaks #9019

@mbrookhart
Copy link
Contributor

@mbs-octoml opened this issue to track an approach to better distinguish multiple targets #9022

@areusch areusch added the needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it label Oct 19, 2022
@Lunderberg Lunderberg added backend:opencl vta and removed needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it labels Nov 16, 2022
@tqchen tqchen closed this as completed Sep 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants