[µTVM] Enable AutoTVM for ARM STM32F746XX Boards #4274
Force-pushed from f75c516 to ebaac5d.
quick comments
@@ -0,0 +1,102 @@
#ifdef __cplusplus
extern "C" {
.cc to be consistent with the rest of the stack
This is meant to be compiled and loaded on the device. The #ifdef __cplusplus
is just in case a C++ compiler is run over it.
@tqchen It looks like the CI doesn't allow assembly files, namely utvm_init.s. That file is required to enable the Cortex-M7 FPU and set up the stack pointer. Can you add it as an exception? We might also want an assembly file whitelist for the entire
Force-pushed from 1ca1cda to 2279ce9.
Please rebase against master due to #4286, and also fix the CI error.
Force-pushed from 5b86677 to babeb97.
Force-pushed from cd15454 to c9629a8.
Aren't the commented bits part of BinaryContents?
I've spent a couple of hours this evening reviewing this and I have some initial minor corrections for what struck my eye when reading this.
It would be good to consider the following points for future direction. I am not asking for this to be fixed as part of this PR.
In the Arm architecture, there is an architecture level, and there are optional features in the architecture, usually an FPU. There are multiple implementations for a particular architecture level, and finally multiple devices for each of those CPU implementations. There are many differences between the multiple devices, but in the context of uTVM the differences we need to start by worrying about are really the memory maps and which optional features of the ISA are implemented in each device.
Now, why is this important? There are multiple implementations with a Cortex-M7 and an FPv5-SP-D16 FPU init, but the memory maps might well be different between boards from different manufacturers, so having an easy, first-class way of describing only those differences is useful.
regards,
Ramana
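One way to picture the layering described in this comment is a small data model in which the CPU description (architecture level, implementation, optional FPU) is shared and each board only contributes its memory map. This is a minimal sketch and not code from the PR: the `CpuConfig` and `Board` classes, the `differing_regions` helper, the board names, and all addresses and sizes are hypothetical placeholders. The hard-float ABI flag is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CpuConfig:
    """Properties shared by every device built around the same CPU implementation."""
    arch: str  # architecture level, e.g. "armv7e-m"
    cpu: str   # CPU implementation, e.g. "cortex-m7"
    fpu: str   # optional ISA feature; empty string when no FPU is present

    def compiler_flags(self) -> Tuple[str, ...]:
        # GCC-style flags; choosing the hard-float ABI is an assumption of this sketch.
        flags = [f"-mcpu={self.cpu}"]
        if self.fpu:
            flags += [f"-mfpu={self.fpu}", "-mfloat-abi=hard"]
        return tuple(flags)


@dataclass(frozen=True)
class Board:
    """A concrete device: a shared CpuConfig plus its own memory map."""
    name: str
    cpu: CpuConfig
    memory_map: Dict[str, Tuple[int, int]]  # region -> (base address, size in bytes)


def differing_regions(a: Board, b: Board) -> List[str]:
    """Names of the memory regions whose base or size differ between two boards."""
    names = a.memory_map.keys() | b.memory_map.keys()
    return sorted(n for n in names if a.memory_map.get(n) != b.memory_map.get(n))


# One CPU description reused by two hypothetical boards.
CORTEX_M7_SP = CpuConfig(arch="armv7e-m", cpu="cortex-m7", fpu="fpv5-sp-d16")

# Placeholder addresses and sizes, not taken from any datasheet.
BOARD_A = Board("vendor_a_board", CORTEX_M7_SP,
                {"flash": (0x0800_0000, 0x10_0000), "sram": (0x2000_0000, 0x4_0000)})
BOARD_B = Board("vendor_b_board", CORTEX_M7_SP,
                {"flash": (0x0000_0000, 0x20_0000), "sram": (0x1000_0000, 0x8_0000)})
```

With this split, adding a board from another manufacturer would mean writing only a new `memory_map`, while the FPU and compiler settings come from the shared `CpuConfig`.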
@u99127 This is good to know. We should evolve the design to accommodate these instances as they crop up.
next stop: Tcl-driven
actually relocating the binary now
but now floating point instructions don't work
Also,
- templatize `EncoderAppend`
- remove `DeviceLocation` class
- add `TargetVal` union that `DevPtr` uses
Force-pushed from 4674aa0 to 9d77f82.
One last change incoming. Forgot to move
Sorry about the time it's taken.
While this feels like a very initial integration, I think the Arm backend parts should certainly be made more modular to make board addition simpler, and the overflow counting for performance counters needs to be handled in the future.
regards
Ramana
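To make the overflow concern concrete, here is a minimal host-side sketch of how elapsed cycles could be recovered from two reads of a free-running counter that wraps around. It is not the PR's timer code: the function name `elapsed_cycles`, the assumption of an up-counting counter, and the `wraps` bookkeeping (for example, a count maintained by an overflow interrupt) are all illustrative.

```python
from typing import Optional


def elapsed_cycles(start: int, end: int, wraps: Optional[int] = None,
                   width_bits: int = 32) -> int:
    """Cycles between two reads of an up-counting counter `width_bits` wide.

    `wraps` is the number of times the counter rolled over between the two
    reads. If it is None, the result is only correct when fewer than
    2**width_bits cycles elapsed.
    """
    modulus = 1 << width_bits
    if not (0 <= start < modulus and 0 <= end < modulus):
        raise ValueError("counter readings must fit in width_bits")
    if wraps is None:
        # Modular subtraction absorbs at most one rollover.
        return (end - start) % modulus
    elapsed = end - start + wraps * modulus
    if elapsed < 0:
        raise ValueError("readings are inconsistent with the overflow count")
    return elapsed
```

For example, `elapsed_cycles(0xFFFF_FFF0, 0x10)` yields 32 cycles across a single rollover. Measurements longer than one full counter period need the explicit `wraps` count, which is the part that would have to be tracked on the device.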
This PR adds support for autotuning via MicroTVM. To test this infrastructure on a physical board, I have added support for ARM STM32F746XX boards, featuring Cortex-M7 CPUs. As a followup to this PR, I will write a tutorial for tuning conv2d.
Here are the most notable changes:
- `micro.device`: a Python namespace featuring a global registry of all supported devices. The registry is indexed by device ID (e.g., `host`, `riscv_spike`, or `arm.stm32f746xx`) and maps to dictionaries containing two functions: `create_micro_lib` (for creating libraries specific to that device) and `default_config` (for generating default device-specific config).
- `src/runtime/micro/device`: a folder which mirrors the structure of the `micro.device` folder and includes device initialization and timer implementations for each device.
- `MicroTimeEvaluator`: used when possible, to make use of cycle-accurate timings available on microcontrollers, instead of using wall clock time (which would include communication overhead).

Many thanks to @tqchen for discussing the design with me!
CC @u99127 @ajtulloch @jwfromm