-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sanitize names of input tensors in interface header #8720
Conversation
ff94e42
to
82f24d3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small nit, otherwise LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for interface_api to be in Python?
If not, shall we move the interface_api.py to C++ ?
I feel sanitizing and keeping alignment on names at two places to be a bit fragile.
We could use a same utility for sanitizing then.
Hi @manupa-arm, The answer is that writing it in Python was easier for the initial version of the interface API. I agree eventually it'd be good to port it to C++ but believe it to be out of scope of this PR, it appears to be easily tested so we can guarantee it works despite it being implemented twice. |
Ack, agree that it being out of scope. If we agree with the direction, any thoughts on maintaining a sanitize utility only in C++ exposed to python? |
I'd suggest we implement the test cases and the desired behaviour here and refactor it altogether in a later PR? It'll be a bit wonky either way and we can do a single refactor to C++ rather than a hybrid using the function registry now. |
82f24d3
to
b8384f9
Compare
Cool! make sense if we are following up with that. For better transparency, I'd suggest we place a TODO for that and get this in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've raised #8792 to track the consolidation rather than adding a TODO. Otherwise LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
cc : @areusch @jroesch @mbs-octoml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - lets leave it here a bit more, trying to get others to have a look. Will merge it tomorrow morning in case nobody disagrees.
LGTM. Agree the cc/py split is unfortunate and a comment cross-linking them with the TODO would be nice. Is the plan to move all code generation to the cc side at some point? |
b4afd3c
to
3b0e7bf
Compare
Thanks for pointing that out @areusch . I've pushed an update now that raises an exception in the case that 2 sanitized names clash. I've also added a test case for that. |
5c44fcb
to
fddc3df
Compare
fddc3df
to
0c18758
Compare
Change-Id: I7f02a993887bf84316262cd2586a734a9079c338
Change-Id: I157d8d8d607de2904285e403893f146e97b510d5
0c18758
to
8c28118
Compare
Change-Id: I9082ae32079a1a3924c06c7f26c757aafa46dec2
8c28118
to
1e07a0e
Compare
Hi @areusch , would you mind taking a look at this again and see whether you're happy with the change I made to address the case where multiple tensor names with punctuation are identical after sanitizing. If this happens now, an exception is raised during code generation. |
@areusch can you take another look ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @grant-arm this looks good now
To address some of the concerns raised in apache#8280 and apache#8720 I've put together a series of functions to combine together names to re-use between these areas. These are meant to be a starting point to fix up the name generation to use the TVM C conventions and port the interface API header to C++. These functions will also be used for constructing the names in the C Device API (apache/tvm-rfcs#31).
To address some of the concerns raised in apache#8280 and apache#8720 I've put together a series of functions to combine together names to re-use between these areas. These are meant to be a starting point to fix up the name generation to use the TVM C conventions and port the interface API header to C++. These functions will also be used for constructing the names in the C Device API (apache/tvm-rfcs#31).
To address some of the concerns raised in apache#8280 and apache#8720 I've put together a series of functions to combine together names to re-use between these areas. These are meant to be a starting point to fix up the name generation to use the TVM C conventions and port the interface API header to C++. These functions will also be used for constructing the names in the C Device API (apache/tvm-rfcs#31).
To address some of the concerns raised in apache#8280 and apache#8720 I've put together a series of functions to combine together names to re-use between these areas. These are meant to be a starting point to fix up the name generation to use the TVM C conventions and port the interface API header to C++. These functions will also be used for constructing the names in the C Device API (apache/tvm-rfcs#31).
To address some of the concerns raised in apache#8280 and apache#8720 I've put together a series of functions to combine together names to re-use between these areas. These are meant to be a starting point to fix up the name generation to use the TVM C conventions and port the interface API header to C++. These functions will also be used for constructing the names in the C Device API (apache/tvm-rfcs#31).
* Sanitize names of input tensors in interface header Change-Id: I7f02a993887bf84316262cd2586a734a9079c338 * Update tensor name sanitizer tests to parameterize them. Change-Id: I157d8d8d607de2904285e403893f146e97b510d5 * Only test unpacked, C interface API, AOT case Change-Id: I9082ae32079a1a3924c06c7f26c757aafa46dec2
* Introduce centralised name transformation functions To address some of the concerns raised in #8280 and #8720 I've put together a series of functions to combine together names to re-use between these areas. These are meant to be a starting point to fix up the name generation to use the TVM C conventions and port the interface API header to C++. These functions will also be used for constructing the names in the C Device API (apache/tvm-rfcs#31). * Improve error handling and error messages * Sanitize sanitise to sanitize This patch aims to sanitize uses of sanitise to the form of sanitize to be consistent with the overall codebases use of American English.
* Introduce centralised name transformation functions To address some of the concerns raised in apache#8280 and apache#8720 I've put together a series of functions to combine together names to re-use between these areas. These are meant to be a starting point to fix up the name generation to use the TVM C conventions and port the interface API header to C++. These functions will also be used for constructing the names in the C Device API (apache/tvm-rfcs#31). * Improve error handling and error messages * Sanitize sanitise to sanitize This patch aims to sanitize uses of sanitise to the form of sanitize to be consistent with the overall codebases use of American English.
* Sanitize names of input tensors in interface header Change-Id: I7f02a993887bf84316262cd2586a734a9079c338 * Update tensor name sanitizer tests to parameterize them. Change-Id: I157d8d8d607de2904285e403893f146e97b510d5 * Only test unpacked, C interface API, AOT case Change-Id: I9082ae32079a1a3924c06c7f26c757aafa46dec2
* Introduce centralised name transformation functions To address some of the concerns raised in apache#8280 and apache#8720 I've put together a series of functions to combine together names to re-use between these areas. These are meant to be a starting point to fix up the name generation to use the TVM C conventions and port the interface API header to C++. These functions will also be used for constructing the names in the C Device API (apache/tvm-rfcs#31). * Improve error handling and error messages * Sanitize sanitise to sanitize This patch aims to sanitize uses of sanitise to the form of sanitize to be consistent with the overall codebases use of American English.
This PR adds code to sanitize the names of input tensors (by replacing special characters with underscores) before using them in the header file generated by the generate_c_interface_header() function. In particular, this is necessary because using the Tensorflow API to create models can result in input names containing colons e.g.
serving_default_x:0_int8
. Using these tensor names directly as variable names, results in invalid C code.@Mousius @manupa-arm @areusch