Investigate using quantization to reduce model's size #151

haritha-mohan · 2024-01-05T23:42:59Z

ONNX Runtime offers several numeric precision options.

By default, fp32 (float32) is used. Switching to float16, or to mixed precision (which keeps some operations in fp32), can cut the model's size by up to half and may significantly improve performance, especially as the model grows.
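
To illustrate where the "half the size" figure comes from, here is a minimal, stdlib-only sketch. It does not use ONNX Runtime. It serializes a list of hypothetical weights once as float32 (4 bytes each) and once as float16 (2 bytes each) using Python's `struct` format codes `f` and `e`, then measures the rounding error that float16 introduces. Actually converting an ONNX model is covered by the linked float16 documentation (via the onnxconverter-common package). The weight values and helper names below are made up for illustration.

```python
import struct


def pack_weights(weights, fmt_char):
    """Serialize floats as little-endian values of one struct format.

    'f' = IEEE 754 float32 (4 bytes), 'e' = IEEE 754 float16 (2 bytes).
    """
    return struct.pack(f"<{len(weights)}{fmt_char}", *weights)


def unpack_weights(blob, fmt_char):
    """Inverse of pack_weights: decode a blob back into Python floats."""
    count = len(blob) // struct.calcsize(f"<{fmt_char}")
    return list(struct.unpack(f"<{count}{fmt_char}", blob))


# Hypothetical weights standing in for one model tensor (range -0.5 .. 99.4).
weights = [0.1 * i - 0.5 for i in range(1000)]

fp32_blob = pack_weights(weights, "f")
fp16_blob = pack_weights(weights, "e")

# Storage size in bytes: float16 needs half as many bytes as float32.
print(len(fp32_blob), len(fp16_blob))

# Precision cost: float16 keeps only 10 mantissa bits, so values are rounded.
restored = unpack_weights(fp16_blob, "e")
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max abs rounding error: {max_err:.6f}")
```

The size halves exactly, but the rounding error grows with the magnitude of the values (float16 steps are 0.0625 apart for values between 64 and 128), which is why mixed precision keeps precision-sensitive operations in fp32.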

Will explore https://github.com/microsoft/onnxruntime further to see how the model's precision can be customized to best suit our use case.

Refs:
https://medium.com/data-science-at-microsoft/model-compression-and-optimization-why-think-bigger-when-you-can-think-smaller-216ec096f68b
https://onnxruntime.ai/docs/performance/model-optimizations/float16.html

haritha-mohan · 2024-02-09T03:24:49Z

FP16 support in the JS API is a work in progress: microsoft/onnxruntime#17230

haritha-mohan changed the title ~~Investigate using quantization to reduce model~~ Investigate using quantization to reduce model's size Jan 5, 2024

haritha-mohan self-assigned this Jan 5, 2024

haritha-mohan added the enhancement New feature or request label Jan 5, 2024
