[RUNTIME][uTVM] AutoTVM + uTVM for Cortex-M7 #5417

areusch · 2020-04-23T00:01:41Z

This PR contains changes that implement the prototype of microTVM according to the RFC. It can be tested using the scripts in this demo repository. Additional documentation and changes will follow with finer granularity, but I would like to get feedback on the implementation so far.

tqchen · 2020-04-23T00:09:51Z

cc @tmoreau89 @liangfu @weberlo @u99127

tmoreau89 · 2020-04-23T01:52:30Z

cc @u99127

python/tvm/autotvm/tuner/tuner.py

weberlo · 2020-04-24T18:06:36Z

python/tvm/contrib/binutil.py

+    gdb_init_dir = os.environ.get('MICRO_GDB_INIT_DIR')
+    if gdb_init_dir is not None:
+        gdb_init_path = f'{gdb_init_dir}/.gdbinit'
+        with open(gdb_init_path, 'r') as f:
+            gdbinit_contents = f.read().split('\n')
+        new_contents = []
+        for line in gdbinit_contents:
+            new_contents.append(line)
+            if line.startswith('target'):
+                new_contents.append(f'add-symbol-file {rel_obj_path}')
+        with open(gdb_init_path, 'w') as f:
+            f.write('\n'.join(new_contents))


It might be worth splitting these lines into a separate µTVM debugging tools PR

I think that's also going to change soon, so would prefer to fix then

python/tvm/contrib/debugger/debug_runtime.py

weberlo · 2020-04-24T18:16:04Z

python/tvm/micro/base.py

+def _calc_max_workspace_usage(src):
+    # TODO factor in alignment to the calculation (alloc sizes will be aligned up to the word size)
+    alloc_re = re.compile(
+        r'.*\* ?(.+) = (\(.+\))? TVMBackendAllocWorkspace\(.+, .+, \(uint64_t\)(.+), .+, .+\).*')
+    free_re = re.compile(r'.*if \(TVMBackendFreeWorkspace\(.+, .+, (\(void\*\))? (.+)\) != 0\) {.*')
+    max_usage = 0
+    alloc_map = {}
+    for line in src.split('\n'):
+        if line.strip().startswith('//'):
+            continue
+        match = alloc_re.match(line)
+        if match is not None:
+            alloc_map[match.group(1)] = int(match.group(3))
+            max_usage = max(max_usage, sum(alloc_map.values()))
+        else:
+            match = free_re.match(line)
+            if match is not None:
+                print(alloc_map)
+                del alloc_map[match.group(2)]
+    return max_usage
+
+


this is for sure a hacky way to calculate the memory footprint of workspace allocations.
in a followup PR, we should move this calculation further upstream and instead use a visitor to find workspace allocs in the AST.
in the meantime, let's just make sure it doesn't ever crash when src doesn't match the format expected by the regexes.

yeah, more robust memory analysis will come in a follow-on. can you explain what you mean by crashing if src doesn't match the expected format? I don't think it will crash if it finds 0 allocs; it will just not catch anything

for posterity, i think it would crash if free_re matches, but alloc_re doesn't, so it would attempt to delete an entry in the alloc_map that isn't there. C that's coming from codegen should never have this problem, but I remember running into it when writing wrappers for CMSIS-NN by hand.
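
For reference, a minimal sketch of a variant that avoids the crash described above. It reuses the two regexes from the diff unchanged and only swaps `del` for `dict.pop` with a default, so a free with no matching alloc is ignored instead of raising `KeyError`. The name `calc_max_workspace_usage_safe` is hypothetical and not part of this PR, and the sketch still ignores alignment, as the original TODO notes.

```python
import re

# Regexes copied unchanged from the diff above.
ALLOC_RE = re.compile(
    r'.*\* ?(.+) = (\(.+\))? TVMBackendAllocWorkspace\(.+, .+, \(uint64_t\)(.+), .+, .+\).*')
FREE_RE = re.compile(
    r'.*if \(TVMBackendFreeWorkspace\(.+, .+, (\(void\*\))? (.+)\) != 0\) {.*')


def calc_max_workspace_usage_safe(src):
    """Hypothetical helper: peak sum of live workspace allocations in `src`."""
    max_usage = 0
    alloc_map = {}
    for line in src.split('\n'):
        if line.strip().startswith('//'):
            continue
        match = ALLOC_RE.match(line)
        if match is not None:
            alloc_map[match.group(1)] = int(match.group(3))
            max_usage = max(max_usage, sum(alloc_map.values()))
            continue
        match = FREE_RE.match(line)
        if match is not None:
            # pop() with a default tolerates a free whose alloc was never
            # seen, e.g. in hand-written wrappers.
            alloc_map.pop(match.group(2), None)
    return max_usage
```

Silently ignoring an unmatched free is one choice; logging a warning there would be another, if hiding malformed input is a concern.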

weberlo · 2020-04-24T18:27:21Z

python/tvm/micro/device/host.py

+    else:
+        options = list(options)
+    # Cannot increase optimization level on host due to code loading method.
+    options.append('-O0')


-Os (and maybe -O1) work. it's just -O2 that's been causing problems on the host

let's just leave as is for now--the runtime will change a lot soon

weberlo · 2020-04-24T22:23:52Z

src/runtime/micro/micro_session.cc

+    // TODO(weberlo): add a `clear_batch_timer` func
+  } else if (name == "get_last_batch_time") {
+    return PackedFunc([sptr_to_self, this](TVMArgs args, TVMRetValue* rv) {
+      *rv = this->GetLastBatchTime();
+    });
+    // TODO(weberlo): remove this func
+  } else if (name == "get_last_batch_cycles") {
+    return PackedFunc([sptr_to_self, this](TVMArgs args, TVMRetValue* rv) {
+      *rv = this->GetLastBatchCycles();
+    });


perhaps we should rename these functions to GetLastBatchHostTime and GetLastBatchDevTime. I think having both would be of use, for example, if a user wants to verify their device timer impl with host timings, or if a device doesn't have a timer (i think this case is rare tho).

we may also want to rethink the timing API, because resetting the batch time to 0 when GetLastBatchTime is called isn't very user-friendly.

right now it returns either last host or last device time based on use_device_timer. I agree we should rethink the timing API, but perhaps we can move it to next PR, when I would want to implement an API between the host and device, and we can better define the concept of batch time (and the units it is logged in) there?
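
One possible shape for the timing API discussed above, as a sketch only. `BatchTimings`, `BatchTimer`, and their fields are hypothetical names that do not exist in this PR. The idea is that host and device times are reported side by side, reading them does not reset anything, and the reset is an explicit call (in the spirit of the `clear_batch_timer` TODO).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BatchTimings:
    """Timings for the most recent batch (hypothetical structure)."""
    host_seconds: float
    device_seconds: Optional[float]  # None if the device has no timer


class BatchTimer:
    """Hypothetical holder for the last batch's timings."""

    def __init__(self):
        self._last = None

    def record(self, host_seconds, device_seconds=None):
        """Store timings for a finished batch, replacing the previous ones."""
        self._last = BatchTimings(host_seconds, device_seconds)

    def last_batch(self):
        """Return the last timings without clearing them."""
        return self._last

    def clear(self):
        """Explicitly reset the stored timings."""
        self._last = None
```

Keeping both values in one record would let a caller compare the device timer against host timings, which is the verification use case mentioned above.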

weberlo · 2020-04-24T22:27:01Z

src/runtime/micro/openocd_low_level_device.cc

@@ -210,9 +210,9 @@ class OpenOCDLowLevelDevice final : public LowLevelDevice {
  // NOTE: OpenOCD will call any request larger than this constant an "absurd
  // request".
  /*! \brief maximum number of bytes allowed in a single memory transfer */
-  static const constexpr ssize_t kMemTransferLimit = 64000;
+  static const constexpr ssize_t kMemTransferLimit = 8000;


i'm curious what openocd version you're running, because it seems like the standard for an "absurd request" is 64k (line 4274)

I can't remember anymore exactly where this limit hits, but iirc it's due to mac os x pipe buffering. I think it's because we are reading the pipe line by line on the TVM side, but if you issue a memory transfer that prints more than ~24k of characters, the os pipe buffer fills up before the newline char is sent and we deadlock. updated comment.

okay. we might want some preprocessor magic to detect if the platform is linux or mac and set this constant accordingly, because leaving it at 8k means linux is issuing 8 times more requests than it needs to.
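
To illustrate the trade-off, a small Python sketch of a platform-dependent limit and the resulting number of requests. The real constant lives in C++ (`kMemTransferLimit`), so this is only an illustration; the function names are hypothetical, and treating macOS as the only platform that needs the lower limit is an assumption based on the recollection above, not something verified here.

```python
import sys

DEFAULT_LIMIT = 64000  # previous value of kMemTransferLimit in the diff
MACOS_LIMIT = 8000     # value this PR switches to


def mem_transfer_limit(platform=sys.platform):
    """Pick a per-request byte limit (hypothetical helper)."""
    # Assumption: only macOS needs the lower limit.
    return MACOS_LIMIT if platform == 'darwin' else DEFAULT_LIMIT


def chunk_ranges(num_bytes, limit):
    """Yield (offset, length) pairs covering num_bytes in pieces <= limit."""
    offset = 0
    while offset < num_bytes:
        length = min(limit, num_bytes - offset)
        yield offset, length
        offset += length


# A 64000-byte transfer takes 8 requests at the 8000-byte limit
# and a single request at the 64000-byte limit.
requests_small = len(list(chunk_ranges(64000, MACOS_LIMIT)))
requests_large = len(list(chunk_ranges(64000, DEFAULT_LIMIT)))
```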

weberlo · 2020-04-24T22:30:36Z

tests/python/unittest/test_runtime_micro.py

+# # Use the host emulated micro device.
+DEV_CONFIG_A = micro.device.host.generate_config()
+DEV_CONFIG_B = micro.device.host.generate_config()
+TARGET = 'c -device=micro_dev'


these can be re-collapsed into a single DEV_CONFIG. I separated them for prototyping, because you need separate server ports for physical devices

we still have test_interleave_sessions for now though?

topi/python/topi/arm_cpu/conv2d_spatial_pack.py

topi/python/topi/arm_cpu/cortex_m7/micro_kernel/gemm.py

liangfu

Great work! @areusch Thank you very much for your contribution. I left some review comments, mostly regarding coding style.

Makefile

liangfu · 2020-04-26T09:51:07Z

python/tvm/micro/device/arm/stm32f746xx.py

+        '-DARM_MATH_CM7',
+        '-D__FPU_PRESENT=1U',
+        '-DARM_MATH_DSP',
+        '-Wno-unused-variable',


Is this really necessary? Why can't we remove the unused variables?

these are in the generated GEMM code. for example:

/var/folders/9y/3j808g591ln3kys4qpyl3qmc0000gn/T/tmpb4sk1ylf/temp.c:104:11: error: unused variable \'arg1_code\' [-Werror=unused-variable]

it would be nice if we did not need to include this, but i'm hoping we can merge this PR as-is and incrementally improve things like this (especially since I didn't author most of the code generation stuff and will need a bit of time to understand how to improve it to remove errors like this). the next PR will look at the generated code more in detail, so if it's a quick fix, I can fix it then.

umm.. it seems to be a problem in codegen, please feel free to leave this line as-is.

liangfu · 2020-04-26T09:56:42Z

python/tvm/micro/device/base.py

-        "-nostdlib",
-        "-fdata-sections",
-        "-ffunction-sections",
+        f'{toolchain_prefix}gcc',


Please undo the changes in converting double quotes to single quotes, and please undo other similar changes as well.

undid all the quote conversions, except those that made sense (i.e. to avoid escapes, as in f'unknown variable "{var}"'). can you point out any other changes you want me to revert? I didn't originally author most of this code so I don't have all the context in my head.

python/tvm/micro/device/base.py

python/tvm/micro/device/host.py

src/runtime/micro/micro_common.h

src/target/source/codegen_c_host.cc

liangfu · 2020-04-26T10:12:57Z

topi/python/topi/arm_cpu/cortex_m7/conv2d/direct.py

+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# pylint: disable=invalid-name


Please put this at the specific line of code, instead of leaving this on the top.

this placement seems to be pretty conventional for topi, since most of the variable names are too short to be considered valid?

topi/python/topi/arm_cpu/cortex_m7/conv2d/direct.py

src/runtime/micro/micro_session.cc

tests/python/unittest/test_runtime_micro.py

topi/python/topi/arm_cpu/cortex_m7/micro_kernel/gemm.py

python/tvm/autotvm/tuner/callback.py

tmoreau89

Thanks @areusch for the great work, that is quite a large PR! I made a few comments that are mostly cosmetic. I suggest that we keep changes to autoTVM minimal, and that instead we make the non uTVM changes as part of an isolated RFC+PR combo if needed.

u99127

> Thanks @areusch for the great work, that is quite a large PR! I made a few comments that are mostly cosmetic. I suggest that we keep changes to autoTVM minimal, and that instead we make the non uTVM changes as part of an isolated RFC+PR combo if needed.

Yes please.

I've only done a quick scan and I promise a more detailed review when this gets split up further.

Ramana

python/tvm/autotvm/measure/local_executor.py

python/tvm/micro/device/arm/stm32f746xx.py

tmoreau89 · 2020-04-27T21:59:42Z

@u99127 the pieces that changed the behavior of autoTVM are a small fraction of the overall PR; so post-changes, the PR will remain significantly large / unchanged.

Let us know if you'd like to be able to do a review before it gets merged!

python/tvm/exec/rpc_server.py

tqchen · 2020-04-29T14:12:32Z

@weberlo @tmoreau89 @liangfu @u99127 please take another look and approve or request changes explicitly, per http://tvm.apache.org/docs/contribute/code_review.html#approve-and-request-changes-explicitly

liangfu

Aside from the missing comments, LGTM.

liangfu · 2020-04-30T06:06:55Z

python/tvm/micro/base.py

+    lib_headers: TODO
+        e.g., `['cmsis_gcc.h', 'arm_math.h']`
+
+    lib_include_paths: TODO


Please add a meaningful comment here.

liangfu · 2020-04-30T06:08:38Z

python/tvm/micro/device/arm/stm32f746xx.py

@@ -36,23 +55,40 @@ def create_micro_lib(obj_path, src_path, lib_type, options=None):

    options : Optional[List[str]]
        additional options to pass to GCC
+
+    lib_src_paths : Optional[List[str]]
+        TODO


Please put a meaningful comment here as well.

liangfu · 2020-04-30T06:10:09Z

python/tvm/micro/device/riscv_spike.py

@@ -62,56 +78,31 @@ def default_config(base_addr, server_addr, server_port):
    server_port : int
        port of OpenOCD server to connect to

+    TODO correct type annotation?
+    section_constraints: Optional[Dict[str, Tuple[Number, MemConstraint]]]
+        TODO


leave a meaningful comment

tmoreau89

Thanks @areusch for addressing the comments/reviews. the PR is approved

tmoreau89 · 2020-04-30T17:59:56Z

Thanks @areusch , @liangfu @weberlo @u99127 the PR has been merged

* Prototype for micro TVM.
* Cleanup and sync micro tvm prototype.
* Use /std:c++14 with MSVC.
* Per tqchen: project has already moved to C++14
* Presubmit failed for code that built locally on gcc.
* fix ASF lint, and fix add_asf_header too
* Compiles with USE_MICRO=OFF.
* Cleanup TargetPtr and word size representations.
* fix compile warning
* address logan's comments
* address logan and liangfu comments
* address thierry's comments
* address u99127, liangfu, tmoreau89 comments

Co-authored-by: Logan Weber <weberlo@cs.washington.edu>

tqchen added the status: need review label Apr 23, 2020

tqchen changed the title ~~micro TVM prototype~~ [RUNTIME][uTVM] Arm Cortext-M support Apr 23, 2020

tqchen changed the title ~~[RUNTIME][uTVM] Arm Cortext-M support~~ [RUNTIME][uTVM] AutoTVM + uTVM Apr 23, 2020

tqchen changed the title ~~[RUNTIME][uTVM] AutoTVM + uTVM~~ [RUNTIME][uTVM] AutoTVM + uTVM for Cortext-M Apr 23, 2020

areusch force-pushed the areusch/utvm-merge branch 2 times, most recently from 54b9410 to c4dfa29 Compare April 23, 2020 23:13

areusch changed the title ~~[RUNTIME][uTVM] AutoTVM + uTVM for Cortext-M~~ [RUNTIME][uTVM] AutoTVM + uTVM for Cortex-M7 Apr 23, 2020

weberlo suggested changes Apr 24, 2020

areusch force-pushed the areusch/utvm-merge branch from 62727a7 to 6719913 Compare April 24, 2020 23:00

liangfu requested changes Apr 26, 2020

weberlo and others added 9 commits April 27, 2020 10:27

Prototype for micro TVM.

ab7f5cd

Cleanup and sync micro tvm prototype.

91e132d

Use /std:c++14 with MSVC.

80c6470

* Per tqchen: project has already moved to C++14
* Presubmit failed for code that built locally on gcc.

fix ASF lint, and fix add_asf_header too

ce06883

Compiles with USE_MICRO=OFF.

9a8e2b3

Cleanup TargetPtr and word size representations.

25961d6

fix compile warning

03782a8

address logan's comments

d351b9c

address logan and liangfu comments

4d426ab

areusch force-pushed the areusch/utvm-merge branch from 4811d4b to 4d426ab Compare April 27, 2020 17:50