-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RUNTIME][uTVM] AutoTVM + uTVM for Cortex-M7 #5417
Conversation
cc @u99127 |
54b9410
to
c4dfa29
Compare
python/tvm/contrib/binutil.py
Outdated
gdb_init_dir = os.environ.get('MICRO_GDB_INIT_DIR') | ||
if gdb_init_dir is not None: | ||
gdb_init_path = f'{gdb_init_dir}/.gdbinit' | ||
with open(gdb_init_path, 'r') as f: | ||
gdbinit_contents = f.read().split('\n') | ||
new_contents = [] | ||
for line in gdbinit_contents: | ||
new_contents.append(line) | ||
if line.startswith('target'): | ||
new_contents.append(f'add-symbol-file {rel_obj_path}') | ||
with open(gdb_init_path, 'w') as f: | ||
f.write('\n'.join(new_contents)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth splitting these lines into a separate µTVM debugging tools PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's also going to change soon, so would prefer to fix then
python/tvm/micro/base.py
Outdated
def _calc_max_workspace_usage(src): | ||
# TODO factor in alignment to the calculation (alloc sizes will be aligned up to the word size) | ||
alloc_re = re.compile( | ||
r'.*\* ?(.+) = (\(.+\))? TVMBackendAllocWorkspace\(.+, .+, \(uint64_t\)(.+), .+, .+\).*') | ||
free_re = re.compile(r'.*if \(TVMBackendFreeWorkspace\(.+, .+, (\(void\*\))? (.+)\) != 0\) {.*') | ||
max_usage = 0 | ||
alloc_map = {} | ||
for line in src.split('\n'): | ||
if line.strip().startswith('//'): | ||
continue | ||
match = alloc_re.match(line) | ||
if match is not None: | ||
alloc_map[match.group(1)] = int(match.group(3)) | ||
max_usage = max(max_usage, sum(alloc_map.values())) | ||
else: | ||
match = free_re.match(line) | ||
if match is not None: | ||
print(alloc_map) | ||
del alloc_map[match.group(2)] | ||
return max_usage | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is for sure a hacky way to calculate the memory footprint of workspace allocations.
in a followup PR, we should move this calculation further upstream and instead use a visitor to find workspace allocs in the AST.
in the meantime, let's just make sure it doesn't ever crash when src
doesn't match the format expected by the regexes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah more robust memory analysis will come in a follow-on. can you explain what you mean by crash if it doesn't match the format given? I don't know it will crash if it finds 0 allocs, it will just not catch anything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for posterity, i think it would crash if free_re
matches, but alloc_re
doesn't, so it would attempt to delete an entry in the alloc_map that isn't there. C that's coming from codegen should never have this problem, but I remember running into it when writing wrappers for CMSIS-NN by hand.
python/tvm/micro/device/host.py
Outdated
else: | ||
options = list(options) | ||
# Cannot increase optimization level on host due to code loading method. | ||
options.append('-O0') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-Os
(and maybe -O1
) work. it's just -O2
that's been causing problems on the host
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's just leave as is for now--the runtime will change a lot soon
// TODO(weberlo): add a `clear_batch_timer` func | ||
} else if (name == "get_last_batch_time") { | ||
return PackedFunc([sptr_to_self, this](TVMArgs args, TVMRetValue* rv) { | ||
*rv = this->GetLastBatchTime(); | ||
}); | ||
// TODO(weberlo): remove this func | ||
} else if (name == "get_last_batch_cycles") { | ||
return PackedFunc([sptr_to_self, this](TVMArgs args, TVMRetValue* rv) { | ||
*rv = this->GetLastBatchCycles(); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps we should rename these functions to GetLastBatchHostTime
and GetLastBatchDevTime
. I think having both would be of use, for example, if a user wants to verify their device timer impl with host timings, or if a device doesn't have a timer (i think this case is rare tho).
we may also want to rethink the timing API, because resetting the batch time to 0 when GetLastBatchTime
is called isn't very user-friendly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right now it returns either last host or last device time based on use_device_timer
. I agree we should rethink the timing API, but perhaps we can move it to next PR, when I would want to implement an API between the host and device, and we can better define the concept of batch time (and the units it is logged in) there?
@@ -210,9 +210,9 @@ class OpenOCDLowLevelDevice final : public LowLevelDevice { | |||
// NOTE: OpenOCD will call any request larger than this constant an "absurd | |||
// request". | |||
/*! \brief maximum number of bytes allowed in a single memory transfer */ | |||
static const constexpr ssize_t kMemTransferLimit = 64000; | |||
static const constexpr ssize_t kMemTransferLimit = 8000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm curious what openocd version you're running, because it seems like the standard for an "absurd request" is 64k (line 4274)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't remember anymore exactly where this limit hits, but iirc it's due to mac os x pipe buffering. I think it's because we are reading the pipe line by line on the TVM side, but if you issue a memory transfer that prints more than ~24k of characters, the os pipe buffer fills up before the newline char is sent and we deadlock. updated comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay. we might want some preprocessor magic to detect if the platform is linux or mac and set this constant accordingly, because leaving it at 8k means linux is issuing 8 times more requests than needs to.
# # Use the host emulated micro device. | ||
DEV_CONFIG_A = micro.device.host.generate_config() | ||
DEV_CONFIG_B = micro.device.host.generate_config() | ||
TARGET = 'c -device=micro_dev' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these can be re-collapsed into a single DEV_CONFIG
. I separated them for prototyping, because you need separate server ports for physical devices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we still have test_interleave_sessions for now though?
62727a7
to
6719913
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! @areusch Thank you very much for your contribution. I left some review comments, mostly regard to the coding styles.
'-DARM_MATH_CM7', | ||
'-D__FPU_PRESENT=1U', | ||
'-DARM_MATH_DSP', | ||
'-Wno-unused-variable', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really necessary? Why can't we remove the unused variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are in the generated GEMM code. for example:
/var/folders/9y/3j808g591ln3kys4qpyl3qmc0000gn/T/tmpb4sk1ylf/temp.c:104:11: error: unused variable \'arg1_code\' [-Werror=unused-variable]
it would be nice if we did not need to include this, but i'm hoping we can merge this PR as-is and incrementally improve things like this (especially since I didn't author most of the code generation stuff and will need a bit of time to understand how to improve it to remove errors like this). the next PR will look at the generated code more in detail, so if it's a quick fix, I can fix it then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
umm.. it seems to be a problem in codegen, please feel free to leave this line as-is.
python/tvm/micro/device/base.py
Outdated
"-nostdlib", | ||
"-fdata-sections", | ||
"-ffunction-sections", | ||
f'{toolchain_prefix}gcc', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please undo the changes in converting double quotes to single quotes, and please undo other similar changes as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
undid all the quote conversions, except those that made sense (i.e. to avoid escapes, as in f'unknown variable "{var}"'). can you point out any other changes you want me to revert? I didn't originally author most of this code so I don't have all the context in my head.
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
# pylint: disable=invalid-name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put this at the specific line of code, instead of leaving this on the top.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this placement seems to be pretty conventional for topi
, since most of the variable names are too short to be considered valid?
* Per tqchen: project has already moved to C++14 * Presubmit failed for code that built locally on gcc.
4811d4b
to
4d426ab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @areusch for the great work, that is quite a large PR! I made a few comments that are mostly costmetic. I suggest that we keep changes to autoTVM minimal, and that instead we make the non uTVM changes as part of an isolated RFC+PR combo if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @areusch for the great work, that is quite a large PR! I made a few comments that are mostly costmetic. I suggest that we keep changes to autoTVM minimal, and that instead we make the non uTVM changes as part of an isolated RFC+PR combo if needed.
Yes please.
I've only done a quick scan and I promise a more detailed review when this gets split up further.
Ramana
@u99127 the pieces that changed the behavior of autoTVM are a small fraction of the overall PR; so post-changes, the PR will remain significantly large / unchanged. Let us know if you'd like to be able to do a review before it gets merged! |
@weberlo @tmoreau89 @liangfu @u99127 please take another look and http://tvm.apache.org/docs/contribute/code_review.html#approve-and-request-changes-explicitly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from the missing comments, LGTM.
lib_headers: TODO | ||
e.g., `['cmsis_gcc.h', 'arm_math.h']` | ||
|
||
lib_include_paths: TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a meaningful comment here.
@@ -36,23 +55,40 @@ def create_micro_lib(obj_path, src_path, lib_type, options=None): | |||
|
|||
options : Optional[List[str]] | |||
additional options to pass to GCC | |||
|
|||
lib_src_paths : Optional[List[str]] | |||
TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put a meaningful comment here as well.
@@ -62,56 +78,31 @@ def default_config(base_addr, server_addr, server_port): | |||
server_port : int | |||
port of OpenOCD server to connect to | |||
|
|||
TODO correct type annotation? | |||
section_constraints: Optional[Dict[str, Tuple[Number, MemConstraint]]] | |||
TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leave a meaningful comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @areusch for addressing the comments/reviews. the PR is approved
* Prototype for micro TVM. * Cleanup and sync micro tvm prototype. * Use /std:c++14 with MSVC. * Per tqchen: project has already moved to C++14 * Presubmit failed for code that built locally on gcc. * fix ASF lint, and fix add_asf_header too * Compiles with USE_MICRO=OFF. * Cleanup TargetPtr and word size representations. * fix compile warning * address logan's comments * address logan and liangfu comments * address thierry's comments * address u99127, liangfu, tmoreau89 comments Co-authored-by: Logan Weber <weberlo@cs.washington.edu>
* Prototype for micro TVM. * Cleanup and sync micro tvm prototype. * Use /std:c++14 with MSVC. * Per tqchen: project has already moved to C++14 * Presubmit failed for code that built locally on gcc. * fix ASF lint, and fix add_asf_header too * Compiles with USE_MICRO=OFF. * Cleanup TargetPtr and word size representations. * fix compile warning * address logan's comments * address logan and liangfu comments * address thierry's comments * address u99127, liangfu, tmoreau89 comments Co-authored-by: Logan Weber <weberlo@cs.washington.edu>
* Prototype for micro TVM. * Cleanup and sync micro tvm prototype. * Use /std:c++14 with MSVC. * Per tqchen: project has already moved to C++14 * Presubmit failed for code that built locally on gcc. * fix ASF lint, and fix add_asf_header too * Compiles with USE_MICRO=OFF. * Cleanup TargetPtr and word size representations. * fix compile warning * address logan's comments * address logan and liangfu comments * address thierry's comments * address u99127, liangfu, tmoreau89 comments Co-authored-by: Logan Weber <weberlo@cs.washington.edu>
This PR contains changes that implement the prototype of microTVM according to the RFC. It can be tested using the scripts in this demo repository. Additional documentation and changes will follow with finer granularity, but would like to get feedback on the implementation thus far.