[Bug] [microTVM] Truncated names of fused operators exceed CRT graph executor buffer lengths #8953
Comments
I just ran into another issue related to the limited maximum function name length in the graph runtime. I had to use a TFLite model with tensor names that are 136 characters long. This exceeds the maximum allowed size of 120 characters and therefore leads to a crash at runtime. I am not sure we can truncate the tensor names on the TVM side, because the original names may be used to access model inputs and outputs in the target software. Increasing the buffer size can fix the issue at relatively small cost, but if a default array size >120 is unwanted, it might be possible to make this configurable (using …). I am aware that the standalone graph executor will be replaced by the AoT executor for most use cases in the future, but as long as it is still relevant, people might run into these issues as well.
cc @mehrdadh could you take a look?
@PhilippvK I'm working on reproducing this today. Sorry for the late response.
@PhilippvK update: I built three models (kws, vww and ic) with Zephyr using these settings, and they work fine:
I will try to replicate it for CRT.
Only the graph runtime is affected, so the AOT executor works fine. Thank you so much for investigating this issue. It should also be reproducible for other models by increasing the length of the model name passed to the …
Gotcha, okay, so we are just exercising these models with AOT at the moment. I think reducing kMaxFuncNameLength is not a great option, because it would constrain all other use cases just to make the C runtime use case work. As an aside, for microTVM we are generally in favor of deprecating the GraphExecutor as soon as AOT reaches parity. I'm okay with making the …
@areusch I agree that a low-effort approach such as making the array length configurable would be the best way to tackle this issue, given the low relevance of the graph executor in microTVM's future. Preparing a PR for that should hopefully be fairly simple, so I would be happy to do that.
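For illustration, a configurable array length could look roughly like the C sketch below. The macro name `TVM_CRT_MAX_STRLEN_FUNCTION_NAME` is the one introduced by the eventual fix; the `#ifndef` default, the `OpParamSketch` struct and the `op_param_set_func_name` helper are hypothetical stand-ins, not the CRT's actual code. The helper also shows the failure mode from this issue: a name that does not fit (together with its NUL terminator) is rejected instead of overflowing the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: the buffer length comes from a macro that a project
 * can override at build time (e.g. -DTVM_CRT_MAX_STRLEN_FUNCTION_NAME=160).
 * The #ifndef default of 120 is an assumption for illustration. */
#ifndef TVM_CRT_MAX_STRLEN_FUNCTION_NAME
#define TVM_CRT_MAX_STRLEN_FUNCTION_NAME 120
#endif

/* Simplified stand-in for TVMOpParam: only the name field is modeled. */
typedef struct {
  char func_name[TVM_CRT_MAX_STRLEN_FUNCTION_NAME];
} OpParamSketch;

/* Copies `name` into param->func_name. Returns false, leaving the field
 * untouched, when the name plus its NUL terminator does not fit, instead
 * of overflowing or silently truncating. */
static bool op_param_set_func_name(OpParamSketch* param, const char* name) {
  assert(param != NULL && name != NULL);
  size_t len = strlen(name);
  if (len >= sizeof(param->func_name)) {
    return false;
  }
  memcpy(param->func_name, name, len + 1); /* includes the NUL terminator */
  return true;
}
```

With the default of 120, the 136-character tensor names mentioned above are rejected. Building with a larger value would accept them, at the cost of the extra bytes in every such struct in this sketch.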
@areusch I was going to implement this, and I found an inconsistency I want to get rid of first: two different constant values (…
It makes sense to me that 1. and 2. match, but the name … Please let me know if I am missing anything.
Ah, I think I see what happened: since we started including the mangling and the post-TE-compiler refactor, we haven't updated …
I sort of prefer 1, and adding a comment next to kMaxFuncNameLength to keep it in sync with crt_config.h and vice versa. What do you think? This would mean changing 80 to 120, then.
@areusch You are more or less right.
The latter does not make sense to me, so it feels wrong to just increase …
@PhilippvK you are right, I think I see what happened here. #8380 was authored a long time ago and synced up several times, and I think as part of a "cleanup" these constants were changed from … So it seems like the "correct" thing to do is to define …
@areusch thanks for clearing things up. I should now be able to get rid of these inconsistencies in a PR. What would be a suitable value for …
Updates the value of `TVM_CRT_MAX_STRLEN_FUNCTION_NAME` from `80` to `120`. Replaces all occurrences of `[120]` with `[TVM_CRT_MAX_STRLEN_FUNCTION_NAME]` to maintain consistency and make the array lengths user-configurable. Introduces `TVM_CRT_MAX_STRLEN_PARAM_NAME`, used for parameter names only. Adds comments to the `kMaxFuncNameLength` variable in src/relay/backend/te_compiler_cache.cc to make sure that the values are kept "in sync" (sort of). See apache#8953 for more context. The actual bug reported there, however, can only be fixed by increasing `TVM_CRT_MAX_STRLEN_FUNCTION_NAME` to a value larger than the maximum possible truncated function name length (including prefixes and suffixes). Example: 6 ['tvmgen' prefix length] + 7 ['default' model name length] + 5 ['fused' fused function name prefix length] + 80 [truncated function name length] + 19 [length of appended hash] + 4 [number of '_' between components] = 121
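The arithmetic in this commit message can be turned into a small helper that answers the sizing question from the thread: how large must `TVM_CRT_MAX_STRLEN_FUNCTION_NAME` be for a given model name and `kMaxFuncNameLength`? A minimal C sketch, assuming the component lengths listed in the commit message (including the 19-character hash) plus one byte for the NUL terminator; the helper and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Component lengths from the commit-message breakdown (assumed fixed). */
#define PREFIX_LEN 6        /* "tvmgen" */
#define FUSED_PREFIX_LEN 5  /* "fused" */
#define HASH_LEN 19         /* appended hash, as counted in the commit message */
#define SEPARATOR_COUNT 4   /* '_' between the five components */

/* Smallest char-array length that can hold the worst-case truncated
 * function name for the given model name length and truncation limit,
 * including the terminating NUL. */
static size_t required_func_name_buffer(size_t mod_name_len,
                                        size_t max_func_name_length) {
  return PREFIX_LEN + mod_name_len + FUSED_PREFIX_LEN + max_func_name_length +
         HASH_LEN + SEPARATOR_COUNT + 1;
}
```

For the default model name (7 characters) and a truncation limit of 80, this gives 122 bytes, so a 120-byte array is two bytes short; the two constants are coupled, which is why they need to be kept in sync.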
@PhilippvK following up on the PR
* Clean up redundant code in graph_executor.cc. How did these lines end up here? * Fix inconsistencies in graph_executor function name handling. Updates the value of `TVM_CRT_MAX_STRLEN_FUNCTION_NAME` from `80` to `120`. Replaces all occurrences of `[120]` with `[TVM_CRT_MAX_STRLEN_FUNCTION_NAME]` to maintain consistency and make the array lengths user-configurable. Introduces `TVM_CRT_MAX_STRLEN_PARAM_NAME`, used for parameter names only. Adds comments to the `kMaxFuncNameLength` variable in src/relay/backend/te_compiler_cache.cc to make sure that the values are kept "in sync" (sort of). See #8953 for more context. The actual bug reported there, however, can only be fixed by increasing `TVM_CRT_MAX_STRLEN_FUNCTION_NAME` to a value larger than the maximum possible truncated function name length (including prefixes and suffixes). Example: 6 ['tvmgen' prefix length] + 7 ['default' model name length] + 5 ['fused' fused function name prefix length] + 80 [truncated function name length] + 19 [length of appended hash] + 4 [number of '_' between components] = 121
@areusch This is resolved. Can we close it?
Yep, let's close. @PhilippvK please re-open if you still see a problem here!
This can mitigate issues described in apache#8953 without increasing memory requirements
…e#9255) * Clean up redundant code in graph_executor.cc. How did these lines end up here? * Fix inconsistencies in graph_executor function name handling. Updates the value of `TVM_CRT_MAX_STRLEN_FUNCTION_NAME` from `80` to `120`. Replaces all occurrences of `[120]` with `[TVM_CRT_MAX_STRLEN_FUNCTION_NAME]` to maintain consistency and make the array lengths user-configurable. Introduces `TVM_CRT_MAX_STRLEN_PARAM_NAME`, used for parameter names only. Adds comments to the `kMaxFuncNameLength` variable in src/relay/backend/te_compiler_cache.cc to make sure that the values are kept "in sync" (sort of). See apache#8953 for more context. The actual bug reported there, however, can only be fixed by increasing `TVM_CRT_MAX_STRLEN_FUNCTION_NAME` to a value larger than the maximum possible truncated function name length (including prefixes and suffixes). Example: 6 ['tvmgen' prefix length] + 7 ['default' model name length] + 5 ['fused' fused function name prefix length] + 80 [truncated function name length] + 19 [length of appended hash] + 4 [number of '_' between components] = 121
…ache#9787) This can mitigate issues described in apache#8953 without increasing memory requirements
I realized that two models (the `int8` quantized `vww` and `aww` of the MLPerf Tiny benchmark), which worked fine with the graph executor a few months ago, stopped working with the latest TVM version.

Expected behavior
Running the built models (on x86 for simplicity) via the CRT graph runtime should not result in any error messages, segmentation faults or other crashes.
Actual behavior
The following is printed to the terminal during initialization of the graph executor (e.g. while parsing the JSON graph):
The actual error is the first line; the others just seem to be consequences of that problem.
Investigations
The big question for me was why this only happens for these two models. As the error message is about string sizes, I figured out that it is related to the `char func_name[120];` array in `struct TVMOpParam` (see include/tvm/runtime/crt/graph_executor.h). Looking at the `graph.json` file, I found that there are in fact function names slightly longer than 120 characters. I could also get rid of the error temporarily by limiting the maximum number of fused ops (using `relay.FuseOps.max_depth`), which results in shorter function names.

Here is the code used by TVM to generate unique truncated function names by appending a hash:
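A rough, hypothetical C approximation of this scheme: cut the candidate name to a maximum length, then append a hash of the full name so that different long names stay distinct after truncation. It uses 64-bit FNV-1a purely as a stand-in hash; TVM's actual hash function, its exact separators and the later `tvmgen_<mod_name>_` prefixing are not modeled, and `MAX_FUNC_NAME_LENGTH` / `truncate_func_name` are illustrative names.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_FUNC_NAME_LENGTH 80 /* plays the role of kMaxFuncNameLength */

/* 64-bit FNV-1a: a stand-in hash, not necessarily the one TVM uses. */
static uint64_t fnv1a64(const char* s) {
  uint64_t h = 14695981039346656037ULL;
  for (; *s != '\0'; ++s) {
    h ^= (unsigned char)*s;
    h *= 1099511628211ULL;
  }
  return h;
}

/* Writes a name for `candidate` into out[out_size]. Names of at most
 * MAX_FUNC_NAME_LENGTH characters are kept as-is; longer ones are cut to
 * MAX_FUNC_NAME_LENGTH characters and suffixed with '_' plus the decimal
 * hash of the full name. Returns the length the result needs (snprintf
 * semantics), so a caller can detect that out_size was too small. */
static int truncate_func_name(const char* candidate, char* out, size_t out_size) {
  assert(candidate != NULL && out != NULL);
  if (strlen(candidate) <= MAX_FUNC_NAME_LENGTH) {
    return snprintf(out, out_size, "%s", candidate);
  }
  return snprintf(out, out_size, "%.*s_%" PRIu64, MAX_FUNC_NAME_LENGTH,
                  candidate, fnv1a64(candidate));
}
```

Because the hash is appended after truncation, the result can be longer than the truncation limit by one separator plus up to 20 decimal digits (the width of the largest 64-bit value), and any prefixes added afterwards come on top of that. This slack is what a fixed-size consumer buffer has to account for.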
Then I did the following calculation based on this (example) JSON `func_name` entry: `tvmgen_default_fused_nn_conv2d_add_cast_multiply_add_right_shift_cast_add_clip_cast_clip_cast_s_9959535092109263429__2`
This leads to the conclusion that the `mod_name` passed to the `relay.build()` function must have at most 5 characters, which is suboptimal, especially considering that the default name seems to be `default` (e.g. `tvmgen_default`).

I suppose that the easiest possible fixes would be either:

- increasing the size of the `func_name[]` array by a few characters, or
- reducing `kMaxFuncNameLength` by some characters.

In addition, it might be good to limit the length of the model name that can be specified by the user, to make sure that this does not break again for overly long model names in the future.
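The model-name limit suggested here follows directly from the same length budget. A minimal C sketch of such a pre-build check, assuming the fixed component lengths from the commit message above (6 + 5 + 80 + 19 + 4 = 114 characters) and a `char[buf_size]` destination; `max_mod_name_len` and `mod_name_is_safe` are hypothetical helpers, not TVM APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Fixed parts of a worst-case truncated name (assumed, per the commit message):
 * "tvmgen" (6) + "fused" (5) + truncated name (80) + hash (19) + four '_' (4). */
#define FIXED_NAME_PARTS ((size_t)(6 + 5 + 80 + 19 + 4))

/* Longest model name for which the worst-case function name, plus its NUL
 * terminator, still fits into a char[buf_size] array. Returns 0 if there is
 * no room for a non-empty model name. */
static size_t max_mod_name_len(size_t buf_size) {
  assert(buf_size > 0);
  size_t usable = buf_size - 1; /* reserve one byte for the NUL terminator */
  return usable > FIXED_NAME_PARTS ? usable - FIXED_NAME_PARTS : 0;
}

/* Hypothetical check a build flow could run on a user-supplied mod_name. */
static bool mod_name_is_safe(const char* mod_name, size_t buf_size) {
  assert(mod_name != NULL);
  return FIXED_NAME_PARTS + strlen(mod_name) + 1 <= buf_size;
}
```

For a 120-byte array this yields 5 characters, matching the conclusion above, so `default` (7 characters) is rejected while a 5-character name passes.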
Why does this only happen for a few models? Most operator/function names after the `FuseOps` transformation do not exceed `kMaxFuncNameLength` and therefore do not need to be truncated.

Environment
apps/bundle_deploy/bundle_static.c)

Steps to reproduce
Here is a branch I set up to reproduce this issue using the aforementioned TFLite models:
Step-by-step instructions for reproducing the issue can be found here.
There is also a GitHub Action that runs a demonstration of the bug. The log output can be inspected here: https://github.com/PhilippvK/tvm/runs/3531514621?check_suite_focus=true