[UMA] UMA v1.0 (apache#12087)
* Add minimal working structure for generic interface

* Separate target definition from codegen

* Update file structure to support multiple NPU targets

* Add scheduling and pass support to codegen

* Update schedule function and pass registration

* Add generic partitioner for relay graph partitioning

* Add pattern-based relay graph partitioning and AOT codegen

* Update API

* Add UltraTrail relay passes and schedule function

* Update UltraTrail relay passes

* Add tir_to_runtime hook for UltraTrail

* Add operator strategy registration to lowering

* Add option to pass constants as attributes

* Refactor naming: Generic to UMA

* Change API to single user-facing backend class UMABackend

* Add initial codegen API

* [UMA] add a generic packed function to register targets

* Restructure files and add initial codegen

* Minor code cleanup

* Add UMA config and MergeCompilerRegion example

* Move UMA configuration to init parameters

* Add python hooks for C-codegen. Still has known restrictions

* Fix relay_to_tir hook to keep virtual device in main function

* Remove register schedules, scheduling is moved to passes for now

* Remove extract constants since non-scalar constants are now supported by TVM

* API documentation and some code fixes and cleanup

* Fix typo

* Fix UMA lowering

* Prototype for UMA-based target attribute registration

* Add default option and type deduction to register_target_attr

* Change pass phases to enum

* [Relay] Plumb external codegen target via Target.current() for all external codegen paths

(See https://discuss.tvm.apache.org/t/byoc-supporting-cutlass-byoc-with-collage/12796/6 for
context, which in turn is part of Collage (https://github.com/apache/tvm-rfcs/blob/main/rfcs/0062-collage.md)).

We want both old-style (via relay.ext.$toolchain) and new-style (via "RelayToTIR" Pass
attribute on target kind) external codegen to be able to access the current 'external codegen'
Target instance via Target.current().

 - For old-style, plumb the true Target through TECompiler and push it on the context
   stack before calling relay.ext.$toolchain.

 - For new-style, pass the CompilationConfig to the RelayToTIRTargetHook pass, make the jump from
   "Compiler" attribute value to Target via the new CompilationConfig::FindPrimitiveTargetForKind
   method, and push on the stack before invoking the custom "RelayToTIR" pass.

While working on this, discovered that RelayToTIRTargetHook was incompatible with the VM's compilation
flow, since RelayToTIRTargetHook assumes all "Compiler"-attributed functions are inlined. Generalize
it to support both inline and global function styles.

Extend Target::IsExternalCodegen to recognize target kinds with "RelayToTIR" attributes as
external.

Update target hooks unit test to exercise new support for outline-style, picking up the current target,
and compiling via the VM.

* Use current target in lowering

* Use attr:kRelayToTIR

* Remove erroneously committed quick fix

* Towards test cases for uma

* Add test_uma

* Initial UMA structure for version 1

* [UMA]: conv2d unit test

* [UMA] update of tutorial

* [UMA] update of pass format, still issue with conv2d c code

* [UMA] refactoring of test_uma_lowering_with_umalower.py

* [UMA] refactoring of test_uma_lowering_with_umalower.py

* [UMA] Adding backend, codegen, patterns, strategies and run file for MyAiHw

* [UMA] update towards my_ai_hw usecase

* [UMA] working testcase for conv2d with uma

* [UMA] testcase

* [UMA] uma lower.py: replaced outdated function create_prim_func_from_outputs to be compatible with the latest content of "main"

* UMA: Move torch import to top to avoid free(): invalid pointer error

* Add stub files for targets

* Add tests for ultratrail codegen

* Adopt my_ai_hw accelerator for new target definition

* Add unit test for target attributes

* Test string arguments

* Extend target test

* [UMA] tutorial first version

* [UMA] moved unit tests to contrib

* [UMA] renaming interfaces

* Fix umalower_tests in ci

* make uma a python module

* [UMA] Update of UMAv1 API + added testcases + tutorialV1

* [UMA] UMAv1

* [UMA] cmake file updated

* AOT test infrastructure adapted

* UMA: add __init__.py for uma.api

* Finish uma tests

* Use upstream version of dmlc-core

* [UMA] tir_to_runtime documentation update

* [UMA] cleanup

* [UMA] fix for test_partition

* [UMA] lint fix

* [UMA] lint fix

* [UMA] lint fix

* [UMA] lint fix

* [UMA] fix of build scripts for arm and i386

* Fix remaining linter errors

* [UMA] CMakeLists.txt added UMA tvm_option

* [UMA] added UMA tvm_option

* [UMA] guard against multiple registrations

* [UMA] fixed comments as pointed out in PR 12087

* [UMA] fixed comments as pointed out in PR 12087

* [UMA] skip uma tests if uma is not available

* [UMA] added UMA rst

* [UMA] Moved tutorial to RST file in gallery

* [UMA] moved uma cli to apps

* [UMA] change requests according to PR-12087

* [UMA] update and sync of uma_cli and tutorial

* [UMA] update of template pass: remove Pad block of Conv2D

* [UMA] lint updates

* [UMA] Test updates

* [UMA] fixes according to comments from PR 12087 discussion

* [UMA] lint updates

* [UMA] moved UMA _template file to apps

* [UMA] lint

* [UMA] Remove exceptions when dispatching over targets

* [UMA] vanilla pattern update

* [UMA] added mobilenet integration test

* [UMA] clang lint

* Remove tir to runtime

* [UMA] Use sequential for UMA relay passes

* Use comparison against BYOC flow in test_partition

* [UMA] tutorial update: moved code blocks to RST

* [UMA] tutorial update and lint fixes

* [UMA] removing UMA from i386 build, as the CI pipeline fails due to missing CLANG for i386

* [BYOC-DNNL] covered case for sum node without attr

* [UMA] pylint

* [UMA] pylint

* [UMA] aot fix

* [UMA] Changes PR review

* [UMA] cc lint

* [UMA] cc lint

* Use better function name for te_lowering and annotate current target at TE functions

Co-authored-by: Paul Palomero Bernardo <paulpb@outlook.com>
Co-authored-by: Christoph Gerum <christoph.gerum@uni-tuebingen.de>
Co-authored-by: mbs-octoml <mbs@octoml.ai>
Co-authored-by: Christoph Gerum <gerum@informatik.uni-tuebingen.de>
5 people authored and Mikael Sevenier committed Aug 12, 2022
1 parent eb3e09f commit d7f6e7f
Showing 39 changed files with 2,605 additions and 5 deletions.
2 changes: 2 additions & 0 deletions CMakeLists.txt
@@ -113,6 +113,7 @@ tvm_option(USE_VITIS_AI "Build with VITIS-AI Codegen support" OFF)
tvm_option(SUMMARIZE "Print CMake option summary after configuring" OFF)
tvm_option(USE_CLML "Build with CLML Codegen support" OFF)
tvm_option(USE_CLML_GRAPH_EXECUTOR "Build with CLML graph runtime" OFF)
tvm_option(USE_UMA "Build with UMA support" OFF)

# include directories
include_directories(${CMAKE_INCLUDE_PATH})
@@ -497,6 +498,7 @@ include(cmake/modules/contrib/TensorRT.cmake)
include(cmake/modules/contrib/VitisAI.cmake)
include(cmake/modules/contrib/Verilator.cmake)
include(cmake/modules/contrib/CLML.cmake)
include(cmake/modules/contrib/UMA.cmake)
include(cmake/modules/Git.cmake)
include(cmake/modules/LibInfo.cmake)
include(cmake/modules/RustExt.cmake)
22 changes: 22 additions & 0 deletions apps/uma/_template/__init__.py
@@ -0,0 +1,22 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""
Template files for UMA tutorial
"""
45 changes: 45 additions & 0 deletions apps/uma/_template/backend.py
@@ -0,0 +1,45 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""UMA backend for the my_ai_hw accelerator"""
from passes import MyAiHwConv2dPass
from tvm.relay.backend.contrib.uma.api.utils import PassPhase
from tvm.relay.backend.contrib.uma.backend import UMABackend
from codegen import gen_includes
from patterns import conv2d_pattern


class MyAiHwBackend(UMABackend):
"""UMA backend for the MyAiHw accelerator."""

def __init__(self):
super().__init__()

# Target configuration
self._register_target_attr("dimension")

# Relay Pattern registration
self._register_pattern("conv2d", conv2d_pattern())

# Relay to TIR function registration
self._register_tir_pass(PassPhase.TIR_PHASE_0, MyAiHwConv2dPass())

# TIR to runtime function registration
self._register_codegen(fmt="c", includes=gen_includes)

@property
def target_name(self):
return "my_ai_hw"
28 changes: 28 additions & 0 deletions apps/uma/_template/codegen.py
@@ -0,0 +1,28 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""UMA codegen for the my_ai_hw accelerator"""

import tvm
import pathlib


def gen_includes() -> str:
topdir = pathlib.Path(__file__).parent.absolute()

includes = ""
includes += f'#include "{topdir}/conv2dnchw.cc"'
return includes
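
The include string returned here is injected into the generated C source, so the reference kernel below gets compiled together with the emitted operator calls. A quick, hypothetical sanity check (the printed path is machine-specific):

# Hypothetical check, not part of this commit: inspect the include line that
# gen_includes() hands to the C codegen (the absolute path depends on the checkout).
from codegen import gen_includes
print(gen_includes())  # e.g. #include "/.../apps/uma/_template/conv2dnchw.cc"
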
96 changes: 96 additions & 0 deletions apps/uma/_template/conv2dnchw.cc
@@ -0,0 +1,96 @@
/*
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
*/
#include <stdlib.h>

// TODO(mjklaiber): leverage pragma import_c in the future
#ifdef __cplusplus
extern "C"
#endif

/*!
 * \brief Conv2D function for mock-accelerator examples. Limited to same-padded Conv2D with
 * stride (1,1) and datatype float.
 *
 * \param ifmap Pointer to input feature map data of size iw*ih*ic*sizeof(float).
 * \param weights Pointer to weight data of size kh*kw*ic*oc*sizeof(float).
 * \param result Pointer to output feature map data of size iw*ih*oc*sizeof(float).
 * \param oc Number of channels of output feature map.
 * \param iw Width of input feature map, ifmap.
 * \param ih Height of input feature map, ifmap.
 * \param ic Number of channels of input feature map.
 * \param kh Height of convolution kernels.
 * \param kw Width of convolution kernels.
 *
 * \return error code
 */
int
my_ai_hw_conv2dnchw(float* ifmap, float* weights, float* result, int oc, int iw, int ih, int ic,
int kh, int kw) {

int kw_low = kw / 2;
int kh_low = kh / 2;
int kw_high = iw + kw / 2;
int kh_high = ih + kh / 2;

int padded_iw = iw + 2 * kw_low;
int padded_ih = ih + 2 * kh_low;

// This is only example code. A real hardware accelerator would call a device specific malloc
// function.
float* pad_temp = (float*)malloc(
(((ic * padded_iw * padded_ih) + (padded_ih * padded_iw)) + padded_iw) * sizeof(float));

if (pad_temp == NULL) {
return -1;
}

for (int i1 = 0; i1 < ic; ++i1) {
for (int i2 = 0; i2 < padded_ih; ++i2) {
for (int i3 = 0; i3 < padded_iw; ++i3) {
((float*)pad_temp)[(((i1 * padded_iw * padded_ih) + (i2 * padded_iw)) + i3)] =
(((((kh_low <= i2) && (i2 < kh_high)) && (kw_low <= i3)) && (i3 < kw_high))
? ifmap[((((i1 * iw * ih) + ((i2 - kh_low) * iw)) + i3 - kw_low))]
: 0.000000e+00f);
}
}
}
for (int i11 = 0; i11 < oc; ++i11) {
for (int i21 = 0; i21 < ih; ++i21) {
for (int i31 = 0; i31 < iw; ++i31) {
for (int i4 = 0; i4 < ic; ++i4) {
for (int i5 = 0; i5 < kh; ++i5) {
for (int i6 = 0; i6 < kw; ++i6) {
int cse_var_1 = (((i11 * iw * ih) + (i21 * iw)) + i31);
if (((i4 == 0) && (i5 == 0)) && (i6 == 0)) {
result[cse_var_1] = 0.000000e+00f;
}
result[cse_var_1] =
(result[cse_var_1] +
(((float*)
pad_temp)[i4 * padded_iw * padded_ih + (i21 + i5) * padded_iw + i31 + i6] *
weights[((((i11 * ic * kh * kw) + (i4 * kh * kw)) + (i5 * kw)) + i6)]));
}
}
}
}
}
}

// This is only example code. A real hardware accelerator would call a device specific free
// function.
free(pad_temp);
return 0;
}
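
The argument order of this kernel (the three buffers followed by the extents oc, iw, ih, ic, kh, kw) is exactly what the TIR pass in passes.py reproduces when it emits the extern call. Below is a hedged sketch (not part of this commit) of exercising the kernel stand-alone via ctypes; the compiler invocation, paths, and shapes are assumptions:

# Hypothetical stand-alone test of the reference kernel, not part of this commit.
# Assumes g++ is available and this script sits next to conv2dnchw.cc.
import ctypes
import pathlib
import subprocess
import numpy as np

src = pathlib.Path(__file__).parent / "conv2dnchw.cc"
lib = pathlib.Path("/tmp/conv2dnchw.so")
subprocess.run(["g++", "-shared", "-fPIC", "-o", str(lib), str(src)], check=True)

oc, ic, ih, iw, kh, kw = 4, 3, 8, 8, 3, 3
ifmap = np.random.rand(ic, ih, iw).astype("float32")
weights = np.random.rand(oc, ic, kh, kw).astype("float32")
result = np.zeros((oc, ih, iw), dtype="float32")

f32p = lambda a: a.ctypes.data_as(ctypes.POINTER(ctypes.c_float))
kernel = ctypes.CDLL(str(lib)).my_ai_hw_conv2dnchw
# Buffers first, then extents: oc, iw, ih, ic, kh, kw (same order as the TIR pass emits).
ret = kernel(f32p(ifmap), f32p(weights), f32p(result), oc, iw, ih, ic, kh, kw)
assert ret == 0
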
136 changes: 136 additions & 0 deletions apps/uma/_template/passes.py
@@ -0,0 +1,136 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Transform passes for the my_ai_hw accelerator"""

import tvm
from tvm import tir
from tvm.relay.backend.contrib.uma.api.utils import add_llvm_to_block


@tvm.tir.transform.prim_func_pass(opt_level=2)
class MyAiHwConv2dPass:
_EXTERNAL_FUNCTION_NAME = "my_ai_hw_conv2dnchw"
_TVM_BLOCK_MATCH_NAME = "conv2d_nchw"

def transform_function(
self, func: tvm.tir.PrimFunc, mod: tvm.ir.IRModule, ctx: tvm.ir.transform.PassContext
) -> tvm.tir.PrimFunc:
return self._my_ai_hw_conv2d_pass(func, mod, ctx)

@classmethod
def _my_ai_hw_conv2d_pass(cls, func, mod, ctx):
_loops = dict()
_handles = []
_entry_node = None

def _has_block(name: str, func: tvm.tir.PrimFunc) -> bool:
"""
Determine whether a tir.Block with `name` exists in `func`
"""

def _hb(op):
if isinstance(op, tvm.tir.Block):
_found_blocks.append(op.name_hint)

_found_blocks = []
tvm.tir.stmt_functor.post_order_visit(func.body, _hb)
return name in _found_blocks

def _detect_and_replace_conv2d(
func: tvm.tir.PrimFunc, mod: tvm.ir.IRModule, ctx: tvm.ir.transform.PassContext
) -> tvm.tir.PrimFunc:
def _replace_conv2d(op):
if op == _entry_node:
irb = tvm.tir.ir_builder.create()
# Collect the buffer addresses
buffers = [b[1].data for b in _handles]
# Extract the loop offsets
for k, v in _loops.items():
assert v.min.value == 0
offset_order = ["co", "w", "h", "ci", "kh", "kw"]
offsets = [_loops[i].extent.value for i in offset_order]
args = buffers + offsets
irb.emit(tir_call(irb, True, cls._EXTERNAL_FUNCTION_NAME, *args))
irb_result = irb.get()
return irb_result
elif isinstance(op, tvm.tir.SeqStmt):
# Remove the pad block of TOPI's conv2d_nchw by returning only the 2nd statement
return op.seq[1]
return op

sch = tir.Schedule(func)

if _has_block(cls._TVM_BLOCK_MATCH_NAME, func):
conv2d_block = sch.get_block(cls._TVM_BLOCK_MATCH_NAME)
rv_loops = sch.get_loops(conv2d_block)
assert len(rv_loops) == 7
loops = dict(
n=rv_loops[0],
co=rv_loops[1],
h=rv_loops[2],
w=rv_loops[3],
ci=rv_loops[4],
kh=rv_loops[5],
kw=rv_loops[6],
)
_entry_node = sch.get(rv_loops[1])
_loops = {k: sch.get(v) for k, v in loops.items()}
_handles = func.buffer_map.items()

x = tvm.tir.stmt_functor.ir_transform(
func.body, None, _replace_conv2d, ["tir.For", "tir.SeqStmt"]
)
return func.with_body(x)
else:
return func

r = _detect_and_replace_conv2d(func, mod, ctx)
return r


def tir_call(ib: tvm.tir.ir_builder, extern: bool, name: str, *args):
"""
ib: ir_builder
extern: bool
True --> tvm.tir.call_extern
False --> tvm.tir.call_packed
name: str
function name
*args:
arguments for function call
"""

def buf_from_array(ib, arr, dtype):
# Allocate enough memory to store the whole array
var = ib.allocate("int32", (len(arr),), scope="global")
for i, v in enumerate(arr):
var[i] = v
# Declare a buffer, which is basically a view on the chunk of memory that we allocated
buf = tvm.tir.decl_buffer((len(arr),), dtype, data=var, scope="global")
return buf

if extern:
args = [i.data if isinstance(i, tvm.tir.Buffer) else i for i in args]
return tvm.tir.call_extern("int32", name, *args)
else:
args = [
buf_from_array(ib, i, "int32")
if isinstance(i, (tuple, list, tvm.ir.container.Array))
else i
for i in args
]
return tvm.tir.call_packed(name, *args)
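
As a rough illustration (not part of this commit), the pass can be applied to a PrimFunc lowered from TOPI's conv2d_nchw compute, which contains a block named "conv2d_nchw"; the te/topi plumbing below mirrors what the UMA lowering does but is an assumption here:

# Hypothetical sketch, not part of this commit: apply MyAiHwConv2dPass to a PrimFunc
# lowered from TOPI's conv2d_nchw.
import tvm
from tvm import te, topi
from passes import MyAiHwConv2dPass

ifmap = te.placeholder((1, 3, 32, 32), dtype="float32", name="ifmap")
weights = te.placeholder((8, 3, 3, 3), dtype="float32", name="weights")
out = topi.nn.conv2d_nchw(ifmap, weights, stride=1, padding=1, dilation=1)

mod = tvm.IRModule.from_expr(te.create_prim_func([ifmap, weights, out]))
mod = MyAiHwConv2dPass()(mod)  # the conv2d_nchw block becomes a call to my_ai_hw_conv2dnchw
print(mod.script())
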
25 changes: 25 additions & 0 deletions apps/uma/_template/patterns.py
@@ -0,0 +1,25 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Relay graph patterns for the my_ai_hw accelerator"""

from tvm.relay.dataflow_pattern import is_op, wildcard


def conv2d_pattern():
pattern = is_op("nn.conv2d")(wildcard(), wildcard())
pattern = pattern.has_attr({"strides": [1, 1], "groups": 1})
return pattern
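
A small hypothetical check (not part of this commit) of what this pattern accepts; it assumes the upstream dataflow_pattern matching semantics for list-valued attributes:

# Hypothetical check, not part of this commit: the pattern matches a stride-(1, 1),
# groups=1 nn.conv2d and rejects a strided one.
from tvm import relay
from patterns import conv2d_pattern

x = relay.var("x", shape=(1, 3, 32, 32), dtype="float32")
w = relay.var("w", shape=(8, 3, 3, 3), dtype="float32")

assert conv2d_pattern().match(relay.nn.conv2d(x, w, padding=(1, 1)))
assert not conv2d_pattern().match(relay.nn.conv2d(x, w, strides=(2, 2), padding=(1, 1)))
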