-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
@def_
Caveat: No KernelDef in IR Unless Function Is Imported Locally
#387
Comments
@def_
caveat: local import is required or we get no KernelDef node@def_
Caveat: No KernelDef in IR Unless Function Is Imported Locally
Following-up on this issue -- is there a proper fix planned? The local import/declaration work-around seem to have additional restrictions. For example:
As is, the code works. But move the def_ inside the do function (i.e., as local as it can get) and it generates a segfault. So it looks like @def_ functions can't be at the top-level nor can it be at some lower/local-level (speculating that this is because it is getting defined within a compute api context). This makes it difficult to define @def_'ed modular building-blocks . |
Our current support for @hcl.def is very preliminary. I plan to solve the top-level declaration issue first. For the inner- or lower-level declaration, you suspect correctly. Our original design thought is more close to c++, which also doesn't allow low-level function declaration (unless you use a lambda function). I will think about adding the low-level support later. |
Function Outlining API ProposalAs this issue points out, using Python decorator to specify function outlining could lead to scope issue: when submodules defined in a different Python are imported at the global level, the generated outlined function ( Therefore, we propose a new API called
The compiler uses the input stages to extract the subgraph to be outlined. When there’s one stage input, only the input stage is outlined as a function. For the outlined function, the compiler also infers its input and output arguments, builds the function body, and inserts a Example: 2MMOutlining single stages2MM performs two matrix multiplications followed by an element-wise addition. It has three stages: We use A = hcl.placeholder((P, Q), "A")
B = hcl.placeholder((Q, R), "B")
C = hcl.placeholder((R, S), "C")
D = hcl.placeholder((P, S), "D")
def kernel_2mm(A, B, C, D):
r = hcl.reduce_axis(0, Q, "r")
out_AB = hcl.compute(
(P, R),
lambda x, y: hcl.sum(A[x, r] * B[r, y], axis=r, dtype=dtype),
name="out_AB",
)
k = hcl.reduce_axis(0, R, "k")
out_ABC = hcl.compute(
(P, S),
lambda x, y: hcl.sum(out_AB[x, k] * C[k, y], axis=k, dtype=dtype),
name="out_ABC",
)
E = hcl.compute(
D.shape,
lambda x, y: (out_ABC[x, y] + D[x, y]),
dtype=dtype,
name="E",
)
return E
s = hcl.create_schedule([A, B, C, D], kernel_2mm)
s.outline(kernel_2mm.out_AB)
s.outline(kernel_2mm.out_ABC) The IR before outlining: module {
func @top(%arg0: memref<16x22xf32>, %arg1: memref<22x18xf32>, %arg2: memref<18x24xf32>, %arg3: memref<16x24xf32>) -> memref<16x24xf32> {
// Stage out_AB
%0 = memref.alloc() : memref<16x18xf32>
affine.for %arg4 = 0 to 16 {
affine.for %arg5 = 0 to 18 {
affine.for %arg6 = 0 to 22 {
...
} {loop_name = "r"}
} {loop_name = "y"}
} {loop_name = "x", stage_name = "out_AB"}
// Stage out_ABC
%1 = memref.alloc() : memref<16x24xf32>
affine.for %arg4 = 0 to 16 {
affine.for %arg5 = 0 to 24 {
affine.for %arg6 = 0 to 18 {
...
} {loop_name = "k"}
} {loop_name = "y"}
} {loop_name = "x", stage_name = "out_ABC"}
// Stage E
%2 = memref.alloc() : memref<16x24xf32>
affine.for %arg4 = 0 to 16 {
affine.for %arg5 = 0 to 24 {
...
} {loop_name = "y"}
} {loop_name = "x", stage_name = "E"}
return %2 : memref<16x24xf32>
}
} The IR after outlining: module {
func @Stage_out_AB(%arg0: memref<16x22xf32>, %arg1: memref<22x18xf32>, %arg3: memref<16x18xf32>) ->() {
...
}
func @Stage_out_ABC(%arg0: memref<16x18xf32>, %arg1: memref<18x24xf32>, %arg3: memref<16x24xf32>) ->() {
...
}
func @top(%arg0: memref<16x22xf32>, %arg1: memref<22x18xf32>, %arg2: memref<18x24xf32>, %arg3: memref<16x24xf32>) -> memref<16x24xf32> {
// Stage out_AB
%0 = memref.alloc() : memref<16x18xf32>
call @Stage_out_AB(%arg0, %arg1, %0)
// Stage out_ABC
%1 = memref.alloc() : memref<16x24xf32>
call @Stage_out_AB(%0, %arg2, %1)
// Stage E
%2 = memref.alloc() : memref<16x24xf32>
affine.for %arg4 = 0 to 16 {
affine.for %arg5 = 0 to 24 {
...
} {loop_name = "y"}
} {loop_name = "x", stage_name = "E"}
return %2 : memref<16x24xf32>
}
} Outlining multiple stages as a single functionWe use s = hcl.create_schedule([A, B, C, D], kernel_2mm)
s.outline(kernel_2mm.out_AB, kernel_2mm.out_ABC) The IR after outlining: module {
func @Stage_outAB_outABC(%arg0: memref<16x22xf32>, %arg1: memref<22x18xf32>, %arg3: memref<16x18xf32>, %arg4: memref<18x24xf32>, %arg5: memref<16x24xf32>) ->() {
...
}
func @top(%arg0: memref<16x22xf32>, %arg1: memref<22x18xf32>, %arg2: memref<18x24xf32>, %arg3: memref<16x24xf32>) -> memref<16x24xf32> {
// Stage out_AB and out_ABC
%0 = memref.alloc() : memref<16x18xf32>
%1 = memref.alloc() : memref<16x24xf32>
call @Stage_outAB_outABC(%arg0, %arg1, %0, %arg2, %1)
// Stage E
%2 = memref.alloc() : memref<16x24xf32>
affine.for %arg4 = 0 to 16 {
affine.for %arg5 = 0 to 24 {
...
} {loop_name = "y"}
} {loop_name = "x", stage_name = "E"}
return %2 : memref<16x24xf32>
}
} |
Description
For function outlining with
@def_
decorator, a local import is required to correctly generate backend code. For HLS backends, the HeteroCL will still run, but we don't get outlined function in generated HLS code. For LLVM backend, the HeteroCL code will fail to run, and throw a segmentation fault, which is difficult to debug.Minimum Example
main.py
submodule.py
Running
python main.py
will get a segmentation fault for LLVM backend.If we comment out the global import and release the local import, we get correct result.
Cause
When import is done in global scope, the submodule function definition is not run during
hcl.create_schedule
, so we don't haveKernelDef
stmt in the IR, only theCall
stmt.Proposal: we can add an IR pass to check if all
Call
stmts have correspondingKernelDef
to detect this problem.The text was updated successfully, but these errors were encountered: