Is there any possibility of optimizing the Flair model (e.g. INT8 quantization)? #2317

abhipn · 2021-06-25T14:13:52Z

I have been using Flair in our production environment for some time now and haven't faced any issues so far. However, not every organization uses GPUs for inference, and running inference on a CPU is not ideal when latency becomes important.

I was wondering whether there is a way to convert flair.pt to flair.onnx so that integer quantization can be applied; trading a small amount of accuracy for performance is not a bad idea. I have gone through the docs but couldn't find any reference to optimization, distillation, etc.

If someone has managed to do it, I would really appreciate it if you could share the details.
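To make the accuracy-for-performance trade-off concrete, here is a minimal sketch of what symmetric INT8 quantization does to a handful of weights. It is plain Python and not Flair code: the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical, chosen only for illustration. PyTorch ships a comparable facility, `torch.quantization.quantize_dynamic`, but whether it works well on a given Flair model is not something this thread confirms.

```python
# Hypothetical illustration of symmetric per-tensor INT8 quantization.
# Not Flair's API; all names and values here are made up for the sketch.

def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the stored integers."""
    return [v * scale for v in q]

weights = [0.82, -1.27, 0.003, 0.5, -0.31]  # sample values, not real model weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q)
print(max_error)
```

Each weight is stored in one byte instead of four, and the error introduced per weight is bounded by half the scale, which is where the "small trade-off of accuracy" comes from. Small weights such as 0.003 collapse to zero, so the real impact on a tagger would need to be measured on held-out data.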


stale · 2021-10-24T12:36:32Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix ("This will not be worked on") label Oct 24, 2021

stale bot closed this as completed Nov 1, 2021

helpmefindaname mentioned this issue Feb 20, 2022 in ONNX compatible models #2640 (closed)