[TFLite] TFLite FP16 Post Quantization Support #5823

Closed
FrozenGene opened this issue Jun 16, 2020 · 2 comments

Comments

@FrozenGene (Member) commented Jun 16, 2020

TensorFlow Lite now supports converting weights to 16-bit floating point values during model conversion from TensorFlow to TensorFlow Lite's flat buffer format. This results in a 2x reduction in model size.
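
For reference, such a model is typically produced on the TensorFlow side with the converter's fp16 option; a minimal sketch (the `saved_model_dir` path and output filename are placeholders):

```python
import tensorflow as tf

# Post-training fp16 quantization: weights are stored as float16 in the
# flatbuffer, while the graph still computes in float32 at runtime.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_fp16_model = converter.convert()

with open("model_fp16.tflite", "wb") as f:
    f.write(tflite_fp16_model)
```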

However, the conversion inserts new Dequantize ops in front of ops such as Conv2D to convert the fp16 weights back to fp32, like this:

[screenshot: TFLite graph with a Dequantize node turning the fp16 weights into fp32 inputs for Conv2D]

TVM doesn't support this behavior yet. The main things we need to do:

  • Support float16 type inside tflite parser
  • Extend dequantize to support fp16 to fp32
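
A minimal sketch of what the second item amounts to in Relay terms, assuming a float16 DEQUANTIZE is lowered to a plain cast of the weight constant (the names and the exact lowering are illustrative; the actual TFLite frontend change may differ):

```python
import numpy as np
import tvm
from tvm import relay

# Hypothetical illustration: an fp16 weight constant is "dequantized" to fp32
# with a cast before being fed to conv2d, mirroring what TFLite's DEQUANTIZE
# op does for fp16 post-quantized models.
weight_fp16 = relay.const(np.zeros((64, 3, 3, 3), dtype="float16"))
weight_fp32 = relay.cast(weight_fp16, "float32")

data = relay.var("data", shape=(1, 3, 224, 224), dtype="float32")
out = relay.nn.conv2d(data, weight_fp32, kernel_size=(3, 3), channels=64)
mod = tvm.IRModule.from_expr(relay.Function([data], out))
print(mod)
```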

Related issue: #5774

@onkar-sima-ai (Contributor) commented Nov 18, 2021

@FrozenGene Is this issue still open?

@FrozenGene (Member, Author) commented

> @FrozenGene Is this issue still open?

I think #7093 is enough and we can close this now.
