Investigate using quantization to reduce model's size #151

Open
haritha-mohan opened this issue Jan 5, 2024 · 1 comment
Assignees
Labels
enhancement New feature or request

Comments

@haritha-mohan
Collaborator

haritha-mohan commented Jan 5, 2024

ONNX Runtime offers several precision options to choose from.

By default, fp32 (float32) is used. However, alternatives such as float16 or mixed precision can cut the model's size roughly in half and may significantly improve performance, especially as the model scales in the future.
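To make the size claim concrete, here is a minimal sketch using only the Python standard library. It does not touch ONNX Runtime at all; the `weights` list and both helper functions are hypothetical stand-ins for a model's parameters, chosen just to show that float16 takes 2 bytes per value versus 4 for float32, at the cost of some rounding error.

```python
import struct


def packed_size(values, fmt):
    """Return the number of bytes needed to store values with a struct format code."""
    return len(struct.pack(f"<{len(values)}{fmt}", *values))


def round_trip_fp16(value):
    """Store a Python float as IEEE 754 half precision and read it back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Hypothetical weights standing in for a model's parameters (assumption, not real data).
weights = [0.1, -1.5, 3.14159, 0.000123, 42.0]

fp32_bytes = packed_size(weights, "f")  # "f" = float32, 4 bytes per value
fp16_bytes = packed_size(weights, "e")  # "e" = float16, 2 bytes per value

# Largest rounding error introduced by storing the weights in the narrower format.
max_error = max(abs(w - round_trip_fp16(w)) for w in weights)

print(f"fp32: {fp32_bytes} bytes, fp16: {fp16_bytes} bytes, max error: {max_error:.6f}")
```

A real model also contains graph structure and metadata that do not shrink, so the on-disk saving is likely to be a bit under half, and the accuracy impact would need to be measured on our own data.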

Will explore https://github.com/microsoft/onnxruntime further to customize the precision of the model to best suit our use case.

Refs:
https://medium.com/data-science-at-microsoft/model-compression-and-optimization-why-think-bigger-when-you-can-think-smaller-216ec096f68b
https://onnxruntime.ai/docs/performance/model-optimizations/float16.html

@haritha-mohan haritha-mohan changed the title from "Investigate using quantization to reduce model" to "Investigate using quantization to reduce model's size" Jan 5, 2024
@haritha-mohan haritha-mohan self-assigned this Jan 5, 2024
@haritha-mohan haritha-mohan added the enhancement New feature or request label Jan 5, 2024
@haritha-mohan
Collaborator Author

fp16 support for the JS API is a work in progress: microsoft/onnxruntime#17230
