[UMA] UMA v1.0 (apache#12087)
* Add minimal working structure for generic interface

* Separate target definition from codegen

* Update file structure to support multiple NPU targets

* Add scheduling and pass support to codegen

* Update schedule function and pass registration

* Add generic partitioner for relay graph partitioning

* Add pattern-based relay graph partitioning and AOT codegen

* Update API

* Add UltraTrail relay passes and schedule function

* Update UltraTrail relay passes

* Add tir_to_runtime hook for UltraTrail

* Add operator strategy registration to lowering

* Add option to pass constants as attributes

* Refactor naming: Generic to UMA

* Change API to single user-facing backend class UMABackend

* Add initial codegen API

* [UMA] add a generic packed function to register targets

* Restructure files and add initial codegen

* Minor code cleanup

* Add UMA config and MergeCompilerRegion example

* Move UMA configuration to init parameters

* Add python hooks for C-codegen. Still has known restrictions

* Fix relay_to_tir hook to keep virtual device in main function

* Remove register schedules, scheduling is moved to passes for now

* Remove extract constants since non-scalar constants are now supported by TVM

* API documentation and some code fixes and cleanup

* Fix typo

* Fix UMA lowering

* Prototype for UMA-based target attribute registration

* Add default option and type deduction to register_target_attr

* Change pass phases to enum

* [Relay] Plumb external codegen target via Target.current() for all external codegen paths

(See https://discuss.tvm.apache.org/t/byoc-supporting-cutlass-byoc-with-collage/12796/6 for
context, which in turn is part of Collage (https://github.com/apache/tvm-rfcs/blob/main/rfcs/0062-collage.md)).

We want both old-style (via relay.ext.$toolchain) and new-style (via "RelayToTIR" Pass
attribute on target kind) external codegen to be able to access the current 'external codegen'
Target instance via Target.current().

 - For old-style, plumb the true Target through TECompiler and push it on the context
   stack before calling relay.ext.$toolchain.

 - For new-style, pass the CompilationConfig to the RelayToTIRTargetHook pass, make the jump from
   "Compiler" attribute value to Target via the new CompilationConfig::FindPrimitiveTargetForKind
   method, and push on the stack before invoking the custom "RelayToTIR" pass.

While working on this, discovered that RelayToTIRTargetHook was incompatible with the VM's compilation
flow, since RelayToTIRTargetHook assumes all "Compiler"-attributed functions are inlined. Generalize
it to support both inline and global function styles.

Extend Target::IsExternalCodegen to recognize target kinds with "RelayToTIR" attributes as
external.

Update target hooks unit test to exercise new support for outline-style, picking up the current target,
and compiling via the VM.

* Use current target in lowering

* Use attr:kRelayToTIR

* Remove erroneously committed quick fix

* Towards test cases for uma

* Add test_uma

* Initial UMA structure for version 1

* [UMA]: conv2d unit test

* [UMA] update of tutorial

* [UMA] update of pass format, still issue with conv2d c code

* [UMA] refactoring of test_uma_lowering_with_umalower.py

* [UMA] refactoring of test_uma_lowering_with_umalower.py

* [UMA] Adding backend, codegen, patterns, strategies and run file for MyAiHw

* [UMA] update towards my_ai_hw usecase

* [UMA] working testcase for conv2d with uma

* [UMA] testcase

* [UMA] uma lower.py: replaced outdated function create_prim_func_from_outputs to be compatible with the latest content of "main"

* UMA: Move torch import to top to avoid free(): invalid pointer error

* Add stub files for targets

* Add tests for ultratrail codegen

* Adopt my_ai_hw accelerator for new target definition

* Add unit test for target attributes

* Test string arguments

* Extend target test

* [UMA] tutorial first version

* [UMA] moved unit tests to contrib

* [UMA] renaming interfaces

* Fix umalower_tests in ci

* make uma a python module

* [UMA] Update of UMAv1 API + added testcases + tutorialV1

* [UMA] UMAv1

* [UMA] cmake file updated

* AOT test infrastructure adapted

* UMA: add __init__.py for uma.api

* Finish uma tests

* Use upstream version of dmlc-core

* [UMA] tir_to_runtime documentation update

* [UMA] cleanup

* [UMA] fix for test_partition

* [UMA] lint fix

* [UMA] lint fix

* [UMA] lint fix

* [UMA] lint fix

* [UMA] fix of build scripts for arm and i386

* Fix remaining linter errors

* [UMA] CMakeLists.txt added UMA tvm_option

* [UMA] added UMA tvm_option

* [UMA] guard against multiple registrations

* [UMA] fixed comments as pointed out in PR 12087

* [UMA] fixed comments as pointed out in PR 12087

* [UMA] skip uma tests if uma is not available

* [UMA] added UMA rst

* [UMA] Moved tutorial to RST file in gallery

* [UMA] moved uma cli to apps

* [UMA] change requests according to PR-12087

* [UMA] update and sync of uma_cli and tutorial

* [UMA] update of template pass: remove Pad block of Conv2D

* [UMA] lint updates

* [UMA] Test updates

* [UMA] fixes according to comments from PR 12087 discussion

* [UMA] lint updates

* [UMA] moved UMA _template file to apps

* [UMA] lint

* [UMA] Remove exceptions when dispatching over targets

* [UMA] vanilla pattern update

* [UMA] added mobilenet integration test

* [UMA] clang lint

* Remove tir to runtime

* [UMA] Use sequential for UMA relay passes

* Use comparison against BYOC flow in test_partition

* [UMA] tutorial update: moved code blocks to RST

* [UMA] tutorial update and lint fixes

* [UMA] removing UMA from i386 build, as the CI pipeline fails due to missing CLANG for i386

* [BYOC-DNNL] covered case for sum node without attr

* [UMA] pylint

* [UMA] pylint

* [UMA] aot fix

* [UMA] Changes PR review

* [UMA] cc lint

* [UMA] cc lint

* Use better function name for te_lowering and annotate current target at TE functions

Co-authored-by: Paul Palomero Bernardo <paulpb@outlook.com>
Co-authored-by: Christoph Gerum <christoph.gerum@uni-tuebingen.de>
Co-authored-by: mbs-octoml <mbs@octoml.ai>
Co-authored-by: Christoph Gerum <gerum@informatik.uni-tuebingen.de>
5 people authored and Mikael Sevenier committed Aug 12, 2022
1 parent eb3e09f commit d7f6e7f
Showing 39 changed files with 2,605 additions and 5 deletions.
2 changes: 2 additions & 0 deletions CMakeLists.txt
@@ -113,6 +113,7 @@ tvm_option(USE_VITIS_AI "Build with VITIS-AI Codegen support" OFF)
tvm_option(SUMMARIZE "Print CMake option summary after configuring" OFF)
tvm_option(USE_CLML "Build with CLML Codegen support" OFF)
tvm_option(USE_CLML_GRAPH_EXECUTOR "Build with CLML graph runtime" OFF)
tvm_option(USE_UMA "Build with UMA support" OFF)

# include directories
include_directories(${CMAKE_INCLUDE_PATH})
@@ -497,6 +498,7 @@ include(cmake/modules/contrib/TensorRT.cmake)
include(cmake/modules/contrib/VitisAI.cmake)
include(cmake/modules/contrib/Verilator.cmake)
include(cmake/modules/contrib/CLML.cmake)
include(cmake/modules/contrib/UMA.cmake)
include(cmake/modules/Git.cmake)
include(cmake/modules/LibInfo.cmake)
include(cmake/modules/RustExt.cmake)
22 changes: 22 additions & 0 deletions apps/uma/_template/__init__.py
@@ -0,0 +1,22 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""
Template files for UMA tutorial
"""
45 changes: 45 additions & 0 deletions apps/uma/_template/backend.py
@@ -0,0 +1,45 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""UMA backend for the my_ai_hw accelerator"""
from passes import MyAiHwConv2dPass
from tvm.relay.backend.contrib.uma.api.utils import PassPhase
from tvm.relay.backend.contrib.uma.backend import UMABackend
from codegen import gen_includes
from patterns import conv2d_pattern


class MyAiHwBackend(UMABackend):
"""UMA backend for the MyAiHw accelerator."""

def __init__(self):
super().__init__()

# Target configuration
self._register_target_attr("dimension")

# Relay Pattern registration
self._register_pattern("conv2d", conv2d_pattern())

# Relay to TIR function registration
self._register_tir_pass(PassPhase.TIR_PHASE_0, MyAiHwConv2dPass())

# TIR to runtime function registration
self._register_codegen(fmt="c", includes=gen_includes)

@property
def target_name(self):
return "my_ai_hw"
28 changes: 28 additions & 0 deletions apps/uma/_template/codegen.py
@@ -0,0 +1,28 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""UMA codegen for the my_ai_hw accelerator"""

import tvm
import pathlib


def gen_includes() -> str:
topdir = pathlib.Path(__file__).parent.absolute()

includes = ""
includes += f'#include "{topdir}/conv2dnchw.cc"'
return includes
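
The include string returned here is injected into the generated C source, so the reference kernel below gets compiled together with the emitted operator calls. A quick, hypothetical sanity check (the printed path is machine-specific):

# Hypothetical check, not part of this commit: inspect the include line that
# gen_includes() hands to the C codegen (the absolute path depends on the checkout).
from codegen import gen_includes
print(gen_includes())  # e.g. #include "/.../apps/uma/_template/conv2dnchw.cc"
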
96 changes: 96 additions & 0 deletions apps/uma/_template/conv2dnchw.cc
@@ -0,0 +1,96 @@
/*
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
*/
#include <stdlib.h>

// TODO(mjklaiber): leverage pragma import_c in the future
#ifdef __cplusplus
extern "C"
#endif

/*!
 * \brief Conv2D function for mock-accelerator examples. Limited to same-padded Conv2D with
 * stride (1,1) and datatype float.
 *
 * \param ifmap Pointer to input feature map data of size iw*ih*ic*sizeof(float).
 * \param weights Pointer to weight data of size kh*kw*ic*oc*sizeof(float).
 * \param result Pointer to output feature map data of size iw*ih*oc*sizeof(float).
 * \param oc Number of channels of output feature map.
 * \param iw Width of input feature map, ifmap.
 * \param ih Height of input feature map, ifmap.
 * \param ic Number of channels of input feature map.
 * \param kh Height of convolution kernels.
 * \param kw Width of convolution kernels.
 *
 * \return error code
 */
int
my_ai_hw_conv2dnchw(float* ifmap, float* weights, float* result, int oc, int iw, int ih, int ic,
int kh, int kw) {

int kw_low = kw / 2;
int kh_low = kh / 2;
int kw_high = iw + kw / 2;
int kh_high = ih + kh / 2;

int padded_iw = iw + 2 * kw_low;
int padded_ih = ih + 2 * kh_low;

// This is only example code. A real hardware accelerator would call a device specific malloc
// function.
float* pad_temp = (float*)malloc(
(((ic * padded_iw * padded_ih) + (padded_ih * padded_iw)) + padded_iw) * sizeof(float));

if (pad_temp == NULL) {
return -1;
}

for (int i1 = 0; i1 < ic; ++i1) {
for (int i2 = 0; i2 < padded_ih; ++i2) {
for (int i3 = 0; i3 < padded_iw; ++i3) {
((float*)pad_temp)[(((i1 * padded_iw * padded_ih) + (i2 * padded_iw)) + i3)] =
(((((kh_low <= i2) && (i2 < kh_high)) && (kw_low <= i3)) && (i3 < kw_high))
? ifmap[((((i1 * iw * ih) + ((i2 - kh_low) * iw)) + i3 - kw_low))]
: 0.000000e+00f);
}
}
}
for (int i11 = 0; i11 < oc; ++i11) {
for (int i21 = 0; i21 < ih; ++i21) {
for (int i31 = 0; i31 < iw; ++i31) {
for (int i4 = 0; i4 < ic; ++i4) {
for (int i5 = 0; i5 < kh; ++i5) {
for (int i6 = 0; i6 < kw; ++i6) {
int cse_var_1 = (((i11 * iw * ih) + (i21 * iw)) + i31);
if (((i4 == 0) && (i5 == 0)) && (i6 == 0)) {
result[cse_var_1] = 0.000000e+00f;
}
result[cse_var_1] =
(result[cse_var_1] +
(((float*)
pad_temp)[i4 * padded_iw * padded_ih + (i21 + i5) * padded_iw + i31 + i6] *
weights[((((i11 * ic * kh * kw) + (i4 * kh * kw)) + (i5 * kw)) + i6)]));
}
}
}
}
}
}

// This is only example code. A real hardware accelerator would call a device specific free
// function.
free(pad_temp);
return 0;
}
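
The argument order of this kernel (the three buffers followed by the extents oc, iw, ih, ic, kh, kw) is exactly what the TIR pass in passes.py reproduces when it emits the extern call. Below is a hedged sketch (not part of this commit) of exercising the kernel stand-alone via ctypes; the compiler invocation, paths, and shapes are assumptions:

# Hypothetical stand-alone test of the reference kernel, not part of this commit.
# Assumes g++ is available and this script sits next to conv2dnchw.cc.
import ctypes
import pathlib
import subprocess
import numpy as np

src = pathlib.Path(__file__).parent / "conv2dnchw.cc"
lib = pathlib.Path("/tmp/conv2dnchw.so")
subprocess.run(["g++", "-shared", "-fPIC", "-o", str(lib), str(src)], check=True)

oc, ic, ih, iw, kh, kw = 4, 3, 8, 8, 3, 3
ifmap = np.random.rand(ic, ih, iw).astype("float32")
weights = np.random.rand(oc, ic, kh, kw).astype("float32")
result = np.zeros((oc, ih, iw), dtype="float32")

f32p = lambda a: a.ctypes.data_as(ctypes.POINTER(ctypes.c_float))
kernel = ctypes.CDLL(str(lib)).my_ai_hw_conv2dnchw
# Buffers first, then extents: oc, iw, ih, ic, kh, kw (same order as the TIR pass emits).
ret = kernel(f32p(ifmap), f32p(weights), f32p(result), oc, iw, ih, ic, kh, kw)
assert ret == 0
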
136 changes: 136 additions & 0 deletions apps/uma/_template/passes.py
@@ -0,0 +1,136 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Transform passes for the my_ai_hw accelerator"""

import tvm
from tvm import tir
from tvm.relay.backend.contrib.uma.api.utils import add_llvm_to_block


@tvm.tir.transform.prim_func_pass(opt_level=2)
class MyAiHwConv2dPass:
_EXTERNAL_FUNCTION_NAME = "my_ai_hw_conv2dnchw"
_TVM_BLOCK_MATCH_NAME = "conv2d_nchw"

def transform_function(
self, func: tvm.tir.PrimFunc, mod: tvm.ir.IRModule, ctx: tvm.ir.transform.PassContext
) -> tvm.tir.PrimFunc:
return self._my_ai_hw_conv2d_pass(func, mod, ctx)

@classmethod
def _my_ai_hw_conv2d_pass(cls, func, mod, ctx):
_loops = dict()
_handles = []
_entry_node = None

def _has_block(name: str, func: tvm.tir.PrimFunc) -> bool:
"""
Determine whether a tir.Block with `name` exists in `func`
"""

def _hb(op):
if isinstance(op, tvm.tir.Block):
_found_blocks.append(op.name_hint)

_found_blocks = []
tvm.tir.stmt_functor.post_order_visit(func.body, _hb)
return name in _found_blocks

def _detect_and_replace_conv2d(
func: tvm.tir.PrimFunc, mod: tvm.ir.IRModule, ctx: tvm.ir.transform.PassContext
) -> tvm.tir.PrimFunc:
def _replace_conv2d(op):
if op == _entry_node:
irb = tvm.tir.ir_builder.create()
# Collect the buffer addresses
buffers = [b[1].data for b in _handles]
# Extract the loop offsets
for k, v in _loops.items():
assert v.min.value == 0
offset_order = ["co", "w", "h", "ci", "kh", "kw"]
offsets = [_loops[i].extent.value for i in offset_order]
args = buffers + offsets
irb.emit(tir_call(irb, True, cls._EXTERNAL_FUNCTION_NAME, *args))
irb_result = irb.get()
return irb_result
elif isinstance(op, tvm.tir.SeqStmt):
# Remove the pad block of TOPI's conv2d_nchw by returning only the 2nd statement
return op.seq[1]
return op

sch = tir.Schedule(func)

if _has_block(cls._TVM_BLOCK_MATCH_NAME, func):
conv2d_block = sch.get_block(cls._TVM_BLOCK_MATCH_NAME)
rv_loops = sch.get_loops(conv2d_block)
assert len(rv_loops) == 7
loops = dict(
n=rv_loops[0],
co=rv_loops[1],
h=rv_loops[2],
w=rv_loops[3],
ci=rv_loops[4],
kh=rv_loops[5],
kw=rv_loops[6],
)
_entry_node = sch.get(rv_loops[1])
_loops = {k: sch.get(v) for k, v in loops.items()}
_handles = func.buffer_map.items()

x = tvm.tir.stmt_functor.ir_transform(
func.body, None, _replace_conv2d, ["tir.For", "tir.SeqStmt"]
)
return func.with_body(x)
else:
return func

r = _detect_and_replace_conv2d(func, mod, ctx)
return r


def tir_call(ib: tvm.tir.ir_builder, extern: bool, name: str, *args):
"""
ib: ir_builder
extern: bool
True --> tvm.tir.call_extern
False --> tvm.tir.call_packed
name: str
function name
*args:
arguments for function call
"""

def buf_from_array(ib, arr, dtype):
# Allocate enough memory to store the whole array
var = ib.allocate("int32", (len(arr),), scope="global")
for i, v in enumerate(arr):
var[i] = v
# Declare a buffer, which is basically a view on the chunk of memory that we allocated
buf = tvm.tir.decl_buffer((len(arr),), dtype, data=var, scope="global")
return buf

if extern:
args = [i.data if isinstance(i, tvm.tir.Buffer) else i for i in args]
return tvm.tir.call_extern("int32", name, *args)
else:
args = [
buf_from_array(ib, i, "int32")
if isinstance(i, (tuple, list, tvm.ir.container.Array))
else i
for i in args
]
return tvm.tir.call_packed(name, *args)
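
As a rough illustration (not part of this commit), the pass can be applied to a PrimFunc lowered from TOPI's conv2d_nchw compute, which contains a block named "conv2d_nchw"; the te/topi plumbing below mirrors what the UMA lowering does but is an assumption here:

# Hypothetical sketch, not part of this commit: apply MyAiHwConv2dPass to a PrimFunc
# lowered from TOPI's conv2d_nchw.
import tvm
from tvm import te, topi
from passes import MyAiHwConv2dPass

ifmap = te.placeholder((1, 3, 32, 32), dtype="float32", name="ifmap")
weights = te.placeholder((8, 3, 3, 3), dtype="float32", name="weights")
out = topi.nn.conv2d_nchw(ifmap, weights, stride=1, padding=1, dilation=1)

mod = tvm.IRModule.from_expr(te.create_prim_func([ifmap, weights, out]))
mod = MyAiHwConv2dPass()(mod)  # the conv2d_nchw block becomes a call to my_ai_hw_conv2dnchw
print(mod.script())
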
25 changes: 25 additions & 0 deletions apps/uma/_template/patterns.py
@@ -0,0 +1,25 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Relay graph patterns for the my_ai_hw accelerator"""

from tvm.relay.dataflow_pattern import is_op, wildcard


def conv2d_pattern():
pattern = is_op("nn.conv2d")(wildcard(), wildcard())
pattern = pattern.has_attr({"strides": [1, 1], "groups": 1})
return pattern
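
A small hypothetical check (not part of this commit) of what this pattern accepts; it assumes the upstream dataflow_pattern matching semantics for list-valued attributes:

# Hypothetical check, not part of this commit: the pattern matches a stride-(1, 1),
# groups=1 nn.conv2d and rejects a strided one.
from tvm import relay
from patterns import conv2d_pattern

x = relay.var("x", shape=(1, 3, 32, 32), dtype="float32")
w = relay.var("w", shape=(8, 3, 3, 3), dtype="float32")

assert conv2d_pattern().match(relay.nn.conv2d(x, w, padding=(1, 1)))
assert not conv2d_pattern().match(relay.nn.conv2d(x, w, strides=(2, 2), padding=(1, 1)))
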