Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG][CodeGenC]“tvm_call_packed” crashes when build with target “c” #7596

Closed
zhuwenxi opened this issue Mar 5, 2021 · 22 comments
Closed

Comments

@zhuwenxi
Copy link
Contributor

zhuwenxi commented Mar 5, 2021

Forum discussion thread attached here: https://discuss.tvm.apache.org/t/bug-codegenc-tvm-call-packed-crashes-when-build-with-target-c/9291

@tqchen Could you bridge me someone who's familiar with C backend to look into this? I highly doubt this is a known issue.

@leeexyz
Copy link
Contributor

leeexyz commented Mar 7, 2021

Hi @zhuwenxi , that's a bug. Actually, this is a case from tutorial extern op, link . After investigating, I have found this issue was introduced by #5417. See diff.

You can comment out this ICHECK as a workaround.

CHECK(GetUniqueName(packed_func_name) == packed_func_name) <<
        "Expected name " << packed_func_name << " to not be taken";

@tqchen
Copy link
Member

tqchen commented Mar 7, 2021

cc @areusch

@zhuwenxi
Copy link
Contributor Author

zhuwenxi commented Mar 9, 2021

@leeexyz It works when I comment out the ICHECK line, but only for "tvm.build()". It still crashes when trying to run the module:
image

Below is my code to reproduce the crash:

import tvm
from tvm import te

n = 1024
l = 128
m = 235
bias = te.var("bias", dtype="float32")`
A = te.placeholder((n, l), name="A")`
B = te.placeholder((l, m), name="B")`
C = te.extern(`
    (n, m),
    [A, B],
    lambda ins, outs: tvm.tir.call_packed(`
        "tvm.contrib.cblas.matmul", ins[0], ins[1], outs[0], False, False`
    ),`
    name="C",`
)
D = te.compute(C.shape, lambda i, j: C[i, j] + bias, name="D")`
s = te.create_schedule(D.op)`

ctx = tvm.cpu(0)`
f = tvm.build(s, [A, B, D, bias], "c")
a = tvm.nd.array(np.random.uniform(size=(n, l)).astype(A.dtype), ctx)
b = tvm.nd.array(np.random.uniform(size=(l, m)).astype(B.dtype), ctx)
d = tvm.nd.array(np.zeros((n, m), dtype=D.dtype), ctx)
bb = 10.0
f(a, b, d, bb)

@leeexyz
Copy link
Contributor

leeexyz commented Mar 10, 2021

@leeexyz It works when I comment out the ICHECK line, but only for "tvm.build()". It still crashes when trying to run the module:
image

Below is my code to reproduce the crash:

import tvm
from tvm import te

n = 1024
l = 128
m = 235
bias = te.var("bias", dtype="float32") A = te.placeholder((n, l), name="A")
B = te.placeholder((l, m), name="B") C = te.extern(
(n, m),
[A, B],
lambda ins, outs: tvm.tir.call_packed( "tvm.contrib.cblas.matmul", ins[0], ins[1], outs[0], False, False
), name="C",
)
D = te.compute(C.shape, lambda i, j: C[i, j] + bias, name="D") s = te.create_schedule(D.op)

ctx = tvm.cpu(0)`
f = tvm.build(s, [A, B, D, bias], "c")
a = tvm.nd.array(np.random.uniform(size=(n, l)).astype(A.dtype), ctx)
b = tvm.nd.array(np.random.uniform(size=(l, m)).astype(B.dtype), ctx)
d = tvm.nd.array(np.zeros((n, m), dtype=D.dtype), ctx)
bb = 10.0
f(a, b, d, bb)

It caused by CSourceModuleNode does not support to get function runtime::symbol::tvm_module_main. You can compare with the LLVMModuleNode and add it.

src/target/source/source_module.cc:GetFunction
src/target/llvm/llvm_module.cc:GetFunction

@zhuwenxi
Copy link
Contributor Author

@leeexyz , I looked into the test cases for tvmc (https://github.com/apache/tvm/blob/main/tests/python/unittest/test_target_codegen_c_host.py#L36) and found that all the cases export their module as library and then load the module, before actually run the module, like this:

mhost = tvm.build(s, [A, B, C], "c", name="test_fadd")
**mhost.export_library(path_dso)
m = tvm.runtime.load_module(path_dso)**
fadd = m["test_fadd"]
fadd(a, b, c)

My question is, is this a desired behavior, rather than a bug that tvmc does not support to run a module directly?

# Following code pattern is not desired in tvmc?
f = tvm.build(s, [A, B, D, bias], "c")
f(a, b, d, bb)

@leeexyz
Copy link
Contributor

leeexyz commented Mar 10, 2021

@zhuwenxi I think it does not support it, namely, you cannot run CSourceModule directly. It seems CSourceModule only is used for demonstration purposes.

@zhuwenxi
Copy link
Contributor Author

Hi @zhuwenxi , that's a bug. Actually, this is a case from tutorial extern op, link . After investigating, I have found this issue was introduced by #5417. See diff.

You can comment out this ICHECK as a workaround.

CHECK(GetUniqueName(packed_func_name) == packed_func_name) <<
        "Expected name " << packed_func_name << " to not be taken";

@leeexyz I found that 1) comment the ICHECK line 2) use export_library() and load_module() didn't solve all the problem. Still, the program is not able to compile:
image

I think the root cause is "tvm.contrib.cblas.matmul" is not a qualified c variable name. Below is the c source file generated by tvmc backend:
WeChatWorkScreenshot_c4c8f1de-7a24-4c90-b39c-3e8a7f58088a

We probably need to do some major change in tvmc to enable packed func call.

@leeexyz
Copy link
Contributor

leeexyz commented Mar 18, 2021

@zhuwenxi Did you build tvm with cblas support?

@zhuwenxi
Copy link
Contributor Author

@zhuwenxi Did you build tvm with cblas support?

Yes, definitely.

@zhuwenxi
Copy link
Contributor Author

@leeexyz , could you please check my screenshot posted above? This is definitely an issue that a variable named "tvm.contrib.cblas.matmul_packed" generated in the c source file, which is not a qualified c variable name (not allowed to have a "." symbol).

@leeexyz
Copy link
Contributor

leeexyz commented Mar 24, 2021

@zhuwenxi Let me try it :)

@leeexyz
Copy link
Contributor

leeexyz commented Mar 26, 2021

@leeexyz , could you please check my screenshot posted above? This is definitely an issue that a variable named "tvm.contrib.cblas.matmul_packed" generated in the c source file, which is not a qualified c variable name (not allowed to have a "." symbol).

Yes, you are right. It was caused by the same PR I mentioned. Try to modify like

    // src/target/source/codegen_c_host.cc:void CodeGenCHost::VisitExpr_(const CallNode* op, std::ostream& os);
    std::string packed_func_name = GetUniqueName(func_name + "_packed");
    if (declared_globals_.insert(packed_func_name).second) {
      // Still reserve the name among unique names.
      // ICHECK(GetUniqueName(packed_func_name) == packed_func_name)
      //     << "Expected name " << packed_func_name << " to not be taken";
      decl_stream << "static void* " << packed_func_name << " = NULL;\n";
    }

and add

    // src/target/source/source_module.cc:GetFunction
    } else if (name == runtime::symbol::tvm_module_main) {
      return PackedFunc(
          [sptr_to_self, this](TVMArgs args, TVMRetValue* rv) { *rv = this->func_names_[0]; });

@zhuwenxi
Copy link
Contributor Author

zhuwenxi commented Mar 29, 2021

@leeexyz , Your fix does make some sense to me. But unfortunately it still crashes my test case:
image

This is the c source file generated:
image

Have you consider to throw a PR to fix all these bunch of problems? All existing approaches we discussed to fix this problem seem incomplete, we probably need some large changes, from an overall perspectives, to make sure it works for all cases.

@leeexyz
Copy link
Contributor

leeexyz commented Mar 30, 2021

Hi @zhuwenxi, I'm not sure why introduced the Map declared_globals_, it is better to make it more clear before sending a PR. Can you please give me the tests? I'd like to reproduce these issues :)

@zhuwenxi
Copy link
Contributor Author

Thanks @leeexyz , this is the code to reproduce the issue.

import tvm
from tvm import te
import numpy as np
#from tvm.contrib import utils

n = 1024
l = 128
m = 235
bias = te.var("bias", dtype="float64")
A = te.placeholder((n, l), name="A", dtype="float64")
B = te.placeholder((l, m), name="B", dtype="float64")
C = te.extern(
    (n, m),
    [A, B],
    lambda ins, outs: tvm.tir.call_packed(
        "tvm.contrib.cblas.matmul", ins[0], ins[1], outs[0], False, False
    ),
    name="C",
)
D = te.compute(C.shape, lambda i, j: C[i, j] + bias, name="D")
s = te.create_schedule(D.op)

ctx = tvm.cpu(0)
f = tvm.build(s, [A, B, D, bias], "c", name="test_add")
f.save("gen_c_code.c", "c")
path_dso = "test_add.so"
f.export_library(path_dso)
m = tvm.runtime.load_module(path_dso)
f = m["test_add"]
ctx = tvm.cpu(0)
a = tvm.nd.array(np.random.uniform(size=(n, l)).astype(A.dtype), ctx)
b = tvm.nd.array(np.random.uniform(size=(l, m)).astype(B.dtype), ctx)
d = tvm.nd.array(np.zeros((n, m), dtype=D.dtype), ctx)
bb = 10.0
f(a, b, d, bb)

@zhuwenxi
Copy link
Contributor Author

Ping @leeexyz, any updates?

@leeexyz
Copy link
Contributor

leeexyz commented Apr 15, 2021

Ping @leeexyz, any updates?

Sorry for the late, I've been busy lately :(. I will try it this week.

@leeexyz
Copy link
Contributor

leeexyz commented Apr 15, 2021

@zhuwenxi I modified the C source manually and it can be compiled. I will start the patch soon.

@leeexyz
Copy link
Contributor

leeexyz commented Apr 15, 2021

@zhuwenxi Would you please try this quick patch? leeexyz@786f8cd

@zhuwenxi
Copy link
Contributor Author

@leeexyz Sure, let me have a try.

@zhuwenxi
Copy link
Contributor Author

@leeexyz Cool, it works!

@tqchen
Copy link
Member

tqchen commented Apr 26, 2021

Closed by #7911. Thanks @zhuwenxi @leeexyz !

@tqchen tqchen closed this as completed Apr 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants