Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitize names of input tensors in interface header #8720

Merged
merged 3 commits into from
Sep 3, 2021

Conversation

grant-arm
Copy link
Contributor

@grant-arm grant-arm commented Aug 11, 2021

This PR adds code to sanitize the names of input tensors (by replacing special characters with underscores) before using them in the header file generated by the generate_c_interface_header() function. In particular, this is necessary because using the Tensorflow API to create models can result in input names containing colons e.g. serving_default_x:0_int8 . Using these tensor names directly as variable names, results in invalid C code.

@Mousius @manupa-arm @areusch

Copy link
Contributor

@leandron leandron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small nit, otherwise LGTM

python/tvm/micro/interface_api.py Outdated Show resolved Hide resolved
Copy link
Contributor

@manupak manupak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@grant-arm @Mousius,

Is there a reason for interface_api to be in Python?
If not, shall we move the interface_api.py to C++ ?

I feel sanitizing and keeping alignment on names at two places to be a bit fragile.
We could use a same utility for sanitizing then.

@Mousius
Copy link
Member

Mousius commented Aug 18, 2021

Hi @manupa-arm,

The answer is that writing it in Python was easier for the initial version of the interface API. I agree eventually it'd be good to port it to C++ but believe it to be out of scope of this PR, it appears to be easily tested so we can guarantee it works despite it being implemented twice.

@manupak
Copy link
Contributor

manupak commented Aug 18, 2021

Ack, agree that it being out of scope.

If we agree with the direction, any thoughts on maintaining a sanitize utility only in C++ exposed to python?

@Mousius
Copy link
Member

Mousius commented Aug 18, 2021

I'd suggest we implement the test cases and the desired behaviour here and refactor it altogether in a later PR? It'll be a bit wonky either way and we can do a single refactor to C++ rather than a hybrid using the function registry now.

@grant-arm grant-arm force-pushed the sanitize_input_tensor_names branch from 82f24d3 to b8384f9 Compare August 18, 2021 18:42
@manupak
Copy link
Contributor

manupak commented Aug 19, 2021

Cool! make sense if we are following up with that.

For better transparency, I'd suggest we place a TODO for that and get this in.

Copy link
Member

@Mousius Mousius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've raised #8792 to track the consolidation rather than adding a TODO. Otherwise LGTM!

Copy link
Contributor

@manupak manupak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@leandron leandron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - lets leave it here a bit more, trying to get others to have a look. Will merge it tomorrow morning in case nobody disagrees.

@mbs-octoml
Copy link
Contributor

LGTM. Agree the cc/py split is unfortunate and a comment cross-linking them with the TODO would be nice. Is the plan to move all code generation to the cc side at some point?

@grant-arm grant-arm force-pushed the sanitize_input_tensor_names branch 3 times, most recently from b4afd3c to 3b0e7bf Compare August 20, 2021 13:34
@grant-arm
Copy link
Contributor Author

@grant-arm @manupa-arm @Mousius sorry for the late review here. i do think we need to consider the case that multiple distinct punctuation are mangled into identical variable names. could you propose a solution for this?

Thanks for pointing that out @areusch . I've pushed an update now that raises an exception in the case that 2 sanitized names clash. I've also added a test case for that.

@grant-arm grant-arm requested a review from areusch August 20, 2021 17:29
@grant-arm grant-arm force-pushed the sanitize_input_tensor_names branch 2 times, most recently from 5c44fcb to fddc3df Compare August 25, 2021 15:39
tests/python/relay/aot/test_crt_aot.py Outdated Show resolved Hide resolved
@grant-arm grant-arm force-pushed the sanitize_input_tensor_names branch from fddc3df to 0c18758 Compare August 31, 2021 16:00
Change-Id: I7f02a993887bf84316262cd2586a734a9079c338
Change-Id: I157d8d8d607de2904285e403893f146e97b510d5
@grant-arm grant-arm force-pushed the sanitize_input_tensor_names branch from 0c18758 to 8c28118 Compare September 1, 2021 11:01
Change-Id: I9082ae32079a1a3924c06c7f26c757aafa46dec2
@grant-arm grant-arm force-pushed the sanitize_input_tensor_names branch from 8c28118 to 1e07a0e Compare September 1, 2021 13:57
@grant-arm
Copy link
Contributor Author

Hi @areusch , would you mind taking a look at this again and see whether you're happy with the change I made to address the case where multiple tensor names with punctuation are identical after sanitizing. If this happens now, an exception is raised during code generation.

@manupak
Copy link
Contributor

manupak commented Sep 3, 2021

@areusch can you take another look ?

Copy link
Contributor

@areusch areusch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @grant-arm this looks good now

@areusch areusch merged commit 06a0d63 into apache:main Sep 3, 2021
Mousius added a commit to Mousius/tvm that referenced this pull request Sep 23, 2021
To address some of the concerns raised in apache#8280 and apache#8720 I've put together a series of functions to combine together names to re-use between these areas. These are meant to be a starting point to fix up the name generation to use the TVM C conventions and port the interface API header to C++.

These functions will also be used for constructing the names in the C Device API (apache/tvm-rfcs#31).
Mousius added a commit to Mousius/tvm that referenced this pull request Sep 23, 2021
To address some of the concerns raised in apache#8280 and apache#8720 I've put together a series of functions to combine together names to re-use between these areas. These are meant to be a starting point to fix up the name generation to use the TVM C conventions and port the interface API header to C++.

These functions will also be used for constructing the names in the C Device API (apache/tvm-rfcs#31).
Mousius added a commit to Mousius/tvm that referenced this pull request Sep 23, 2021
To address some of the concerns raised in apache#8280 and apache#8720 I've put together a series of functions to combine together names to re-use between these areas. These are meant to be a starting point to fix up the name generation to use the TVM C conventions and port the interface API header to C++.

These functions will also be used for constructing the names in the C Device API (apache/tvm-rfcs#31).
Mousius added a commit to Mousius/tvm that referenced this pull request Sep 23, 2021
To address some of the concerns raised in apache#8280 and apache#8720 I've put together a series of functions to combine together names to re-use between these areas. These are meant to be a starting point to fix up the name generation to use the TVM C conventions and port the interface API header to C++.

These functions will also be used for constructing the names in the C Device API (apache/tvm-rfcs#31).
Mousius added a commit to Mousius/tvm that referenced this pull request Sep 24, 2021
To address some of the concerns raised in apache#8280 and apache#8720 I've put together a series of functions to combine together names to re-use between these areas. These are meant to be a starting point to fix up the name generation to use the TVM C conventions and port the interface API header to C++.

These functions will also be used for constructing the names in the C Device API (apache/tvm-rfcs#31).
ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021
* Sanitize names of input tensors in interface header

Change-Id: I7f02a993887bf84316262cd2586a734a9079c338

* Update tensor name sanitizer tests to parameterize them.

Change-Id: I157d8d8d607de2904285e403893f146e97b510d5

* Only test unpacked, C interface API, AOT case

Change-Id: I9082ae32079a1a3924c06c7f26c757aafa46dec2
areusch pushed a commit that referenced this pull request Sep 30, 2021
* Introduce centralised name transformation functions

To address some of the concerns raised in #8280 and #8720 I've put together a series of functions to combine together names to re-use between these areas. These are meant to be a starting point to fix up the name generation to use the TVM C conventions and port the interface API header to C++.

These functions will also be used for constructing the names in the C Device API (apache/tvm-rfcs#31).

* Improve error handling and error messages

* Sanitize sanitise to sanitize

This patch aims to sanitize uses of sanitise to the form of sanitize to be consistent with the overall codebases use of American English.
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 7, 2022
* Introduce centralised name transformation functions

To address some of the concerns raised in apache#8280 and apache#8720 I've put together a series of functions to combine together names to re-use between these areas. These are meant to be a starting point to fix up the name generation to use the TVM C conventions and port the interface API header to C++.

These functions will also be used for constructing the names in the C Device API (apache/tvm-rfcs#31).

* Improve error handling and error messages

* Sanitize sanitise to sanitize

This patch aims to sanitize uses of sanitise to the form of sanitize to be consistent with the overall codebases use of American English.
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
* Sanitize names of input tensors in interface header

Change-Id: I7f02a993887bf84316262cd2586a734a9079c338

* Update tensor name sanitizer tests to parameterize them.

Change-Id: I157d8d8d607de2904285e403893f146e97b510d5

* Only test unpacked, C interface API, AOT case

Change-Id: I9082ae32079a1a3924c06c7f26c757aafa46dec2
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
* Introduce centralised name transformation functions

To address some of the concerns raised in apache#8280 and apache#8720 I've put together a series of functions to combine together names to re-use between these areas. These are meant to be a starting point to fix up the name generation to use the TVM C conventions and port the interface API header to C++.

These functions will also be used for constructing the names in the C Device API (apache/tvm-rfcs#31).

* Improve error handling and error messages

* Sanitize sanitise to sanitize

This patch aims to sanitize uses of sanitise to the form of sanitize to be consistent with the overall codebases use of American English.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants