[uTVM][Runtime] Deprecate uTVM Standalone Runtime #5060
Cross-posting here. I think it is worth thinking about the memory allocation strategy. Specifically, we should design an API that contains a simple allocator (arena-like: it allocates memory from a stack and releases everything once done), and use that allocator for all memory in the program (including data structures and tensors). This will completely eliminate the use of system calls and allow the program to run on bare metal. Example API:

```c
// use a system call to get the memory, or point directly at memory segments on the microcontroller
UTVMAllocator* arena = UTVMCreateArena(10000);
// Subsequent data structures are allocated from the allocator.
// The free calls will recycle data into the allocator;
// the simplest strategy is not to recycle at all.
UTVMSetAllocator(arena);
// normal TVM API calls
```
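The arena API sketched above could be backed by a minimal bump-pointer allocator. The sketch below is illustrative only: the `UTVMAllocator` layout and the `UTVMArenaAlloc`/`UTVMArenaReset` helpers are hypothetical names, not TVM APIs, and on bare metal `base` would point at a fixed memory segment instead of coming from `malloc()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical arena matching the API sketched above. */
typedef struct {
  uint8_t *base;   /* start of the backing buffer */
  size_t   size;   /* total capacity in bytes */
  size_t   offset; /* bump pointer: next free byte */
} UTVMAllocator;

UTVMAllocator *UTVMCreateArena(size_t size) {
  /* On a microcontroller, replace these malloc() calls with a
   * pointer to a reserved memory segment. */
  UTVMAllocator *a = malloc(sizeof(UTVMAllocator));
  a->base = malloc(size);
  a->size = size;
  a->offset = 0;
  return a;
}

/* Bump allocation: O(1), no per-block metadata, no fragmentation. */
void *UTVMArenaAlloc(UTVMAllocator *a, size_t bytes) {
  size_t aligned = (bytes + 7u) & ~(size_t)7u; /* 8-byte alignment */
  if (a->offset + aligned > a->size) return NULL; /* arena exhausted */
  void *p = a->base + a->offset;
  a->offset += aligned;
  return p;
}

/* "Release everything once done": reset the bump pointer. */
void UTVMArenaReset(UTVMAllocator *a) { a->offset = 0; }
```

Because freeing is a single pointer reset, this matches the "simplest strategy is not to recycle at all" comment in the snippet above: individual frees are no-ops until the whole arena is reset.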
@liangfu regarding "superseding uTVM standalone runtime", will the MISRA-C runtime support running on bare-metal systems?
@ajtulloch @weberlo @u99127 (this might be of interest to you)
Yes, at least it is intended to be; but how shall we provide a proper demo of this? Any ideas?
Excellent idea. Perhaps we can also test the bare-metal demo in CI, with a simple RISC-V processor like picorv32.
@tqchen Removing all external allocator use and going with an embedded arena allocator sounds a little fishy. Bare-metal platforms do not necessarily lack a proper allocator;
In PR #5124, we have a reference allocator, which implements vmalloc, vrealloc, and vfree. When necessary, I think we can redirect the function calls to different implementations, e.g. dlmalloc in newlib, jemalloc, and many others. I would agree with @KireinaHoro on using the implementations in newlib for bare-metal applications. For an arena-like allocator, I have concerns about how we would deal with large memory reuse between conv layers if we don't release allocated workspaces in a timely manner.
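One way to make the redirection described here concrete is a function-pointer table. The sketch below is a hypothetical illustration (the `UTVMAllocatorOps` struct and `utvm_*` wrappers are not taken from PR #5124): the runtime allocates only through the table, and a platform can swap in dlmalloc, jemalloc, or newlib's allocator at startup without touching call sites.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical indirection table over the reference allocator's
 * entry points (vmalloc/vrealloc/vfree, per PR #5124). */
typedef struct {
  void *(*vmalloc)(size_t size);
  void *(*vrealloc)(void *ptr, size_t size);
  void  (*vfree)(void *ptr);
} UTVMAllocatorOps;

/* Default: delegate to the C library (newlib on many
 * bare-metal toolchains). */
static UTVMAllocatorOps g_alloc_ops = { malloc, realloc, free };

/* Runtime code allocates only through the table. */
void *utvm_malloc(size_t size) { return g_alloc_ops.vmalloc(size); }
void  utvm_free(void *ptr)     { g_alloc_ops.vfree(ptr); }

/* A platform installs its own implementation at startup. */
void UTVMSetAllocatorOps(UTVMAllocatorOps ops) { g_alloc_ops = ops; }
```

The indirection costs one pointer load per allocation but keeps the runtime itself free of any hard dependency on a particular allocator.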
The workspace memory could use a different strategy. The way it works is that we create a separate arena for the workspace, along with a counter.
This works because all workspace memory is temporary; it also guarantees constant-time allocation. As a generalization: if most memory allocation happens in an RAII-style lifecycle, e.g. everything is deallocated once we exit a scope, then the counter-based strategy (per scope) should work pretty well. I am not fixated on the arena allocator, but would like to challenge us to think a bit about how much simpler we can make the allocation strategy, given what we know about the workload. Of course, we could certainly bring in sub-allocator strategies that are more complicated, or fall back to libraries when needed.
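The counter-based workspace strategy described above can be sketched as follows. All names here are illustrative, not from the TVM codebase: allocations bump a pointer and a live-allocation counter; frees only decrement the counter; when it reaches zero the scope has been exited and the whole region is recycled at once, so both operations are O(1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter-based workspace arena. */
typedef struct {
  uint8_t *base;
  size_t   size;
  size_t   offset; /* bump pointer */
  int      live;   /* outstanding allocations in the current scope */
} WorkspaceArena;

void *WorkspaceAlloc(WorkspaceArena *w, size_t bytes) {
  size_t aligned = (bytes + 7u) & ~(size_t)7u; /* 8-byte alignment */
  if (w->offset + aligned > w->size) return NULL;
  void *p = w->base + w->offset;
  w->offset += aligned;
  w->live += 1;
  return p;
}

void WorkspaceFree(WorkspaceArena *w, void *ptr) {
  (void)ptr; /* blocks are never returned piecemeal */
  if (--w->live == 0) w->offset = 0; /* scope exited: recycle all */
}
```

Because temporary workspaces between conv layers follow exactly this scope-shaped lifetime, the counter also addresses the earlier concern about releasing workspaces in time: the space is recycled as soon as the last workspace of a layer is freed.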
Thanks for pointing this out to me @tmoreau89, and thank you for this work @liangfu. Very interesting, and good questions to ask. From a design point of view for microcontrollers, I'd like to take this one step further and challenge folks to think about whether this can be achieved with static allocation rather than any form of dynamic allocation. The hypothesis is that at compile time one would know how much temporary space is needed between layers, rather than having to face a runtime failure. Dynamic allocation on microcontrollers suffers from fragmentation issues; furthermore, do we want dynamic allocation in the runtime on microcontrollers at all? The model being executed will also be part of a larger application: how can we allow our users to specify the amount of heap available for, or consumed by, executing their model? It would be better to provide that with diagnostics at link time or compile time rather than at runtime. @mshawcroft might have more to add. And yes, in our opinion, for microcontrollers one of the challenges is the availability and usage of temporary storage for working-set calculations between layers. 2 further design questions.
Purely a nit, but from a rationale point of view I would say that the uTVM runtime not being tested in CI is technical debt :) Regards
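The fully static scheme suggested in this comment might look like the fragment below. Both macros are illustrative placeholders (not existing TVM output): the idea is that the model compiler emits the peak workspace requirement as a constant, the application declares its budget, and the check fails at compile time instead of surfacing as a runtime allocation failure.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_WORKSPACE_BYTES 4096  /* hypothetically emitted by the model compiler */
#define APP_HEAP_BUDGET_BYTES 8192  /* chosen by the application developer */

/* C11 compile-time diagnostic: the build fails if the model does
 * not fit, so there is no runtime out-of-memory path. */
_Static_assert(MODEL_WORKSPACE_BYTES <= APP_HEAP_BUDGET_BYTES,
               "model workspace exceeds the heap budget for this board");

/* The whole workspace lives in .bss; nothing is allocated at runtime. */
static uint8_t g_workspace[MODEL_WORKSPACE_BYTES];
```

A linker-script region of a fixed size would give the equivalent link-time diagnostic when the buffer is placed in a dedicated memory section.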
Re: the fragmentation issue, I think choosing the allocation strategy carefully and adopting an arena-style allocator (counter-based, as above) can likely resolve fragmentation. In terms of the total memory cost, we can indeed find the cost at compile time for simple graph programs.
It's very interesting to see that TFLite is using an arena-like allocator for microcontrollers. See how Adafruit demonstrates its PyBadge board with TFLite here.
@liangfu can you try an arena-based approach, given that it is simpler? We could adopt the counter-based approach to enable early freeing of sub-arenas (when the free counter in an arena decreases to zero, we can free the space).
Sure, as this is definitely the direction we should follow, I can do that. And maybe we need a separate PR for the arena allocator feature.
Hi @liangfu, is there any update on your current implementation efforts? We are really looking forward to it!
Hi @Robeast, thanks for your attention. I only have a draft version of the new allocator for now; I'd like to send a PR later this week.
Can we close this?
Since the MISRA-C runtime has been merged in PR #3934 and discussed in RFC #3159, I think it's now time to deprecate the uTVM standalone runtime (introduced in PR #3567).
Rationale
Actionable Items
Please leave your comment.
cc @areusch