Add unit tests
@MahmoudWahdan How do I run inference with a compressed TFLite model?
Hi @deathsurgeon1 Please refer to the example script.
First, you need to save the model in TFLite format:
```python
nlu.save(save_path, save_tflite=True, conversion_mode="hybrid_quantization")
```
where `conversion_mode` can be one of the following modes: `normal`, `fp16_quantization`, or `hybrid_quantization`.
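For example, a minimal sketch of saving with a different mode (the output path here is just a placeholder; only `nlu.save` and the mode names come from the steps above):

```python
# "models/nlu_tflite_fp16" is a hypothetical path; swap in your own save directory.
nlu.save("models/nlu_tflite_fp16", save_tflite=True, conversion_mode="fp16_quantization")
```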
Then, depending on the conversion mode and your environment, you may need to disable the GPU at the beginning of your script:
```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
```
Then, load the model with `quantized=True` and `num_process=1` (or any number of processes you want) and run prediction:
```python
nlu = TransformerNLU.load(model_path, quantized=True, num_process=1)
utterance = "add sabrina salerno to the grime instrumentals playlist"
result = nlu.predict(utterance)
```
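Putting the inference side together, a minimal sketch of a standalone prediction script might look like this. The import path, model directory, and printed output are assumptions; only `TransformerNLU.load`, `quantized`, `num_process`, and `predict` come from the steps above.

```python
import os

# Disable the GPU before TensorFlow initializes; this may be required
# depending on the conversion mode and your environment.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

# NOTE: this import path is an assumption; adjust it to match how
# TransformerNLU is exposed in your installation.
from dialognlu import TransformerNLU

model_path = "saved_models/nlu_tflite"  # placeholder path to the saved TFLite model

# Load the TFLite-quantized model with a single worker process.
nlu = TransformerNLU.load(model_path, quantized=True, num_process=1)

# Run prediction on a sample utterance.
utterance = "add sabrina salerno to the grime instrumentals playlist"
result = nlu.predict(utterance)
print(result)
```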
I hope this helps. I'm planning to provide more examples and notebooks. Documentation is in our plans.
Kindly post your question in the relevant issue or open a new issue. Thanks.
Thanks a lot for such a detailed response :)
@deathsurgeon1 You are most welcome!