Hello, I was wondering how the process works for taking a BERT model, exporting it to ONNX, quantizing it, and then using it for embedding retrieval. I have been using this BERT model (finbert): https://huggingface.co/yiyanghkust/finbert-pretrain. I have tried multiple combinations but will just show this code; maybe you can provide some direction, as I don't know much about these sorts of techniques:
Export Model to ONNX
```python
from transformers.convert_graph_to_onnx import convert
from transformers import BertTokenizer, BertModel

model_name = 'yiyanghkust/finbert-pretrain'
onnx_model_path = 'path/to/newfinbertmodel.onnx'

tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertModel.from_pretrained(model_name).to('cpu')

# Convert the model to ONNX
convert(framework='pt', model=model, output=onnx_model_path, opset=11, tokenizer=tokenizer)
```
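One sanity check I have found useful right after the export, before optimizing: list the input names the ONNX graph actually declares, since a mismatch there is exactly what produces "invalid feed input name" errors at inference time. A minimal sketch (the path is the export path from the snippet above; this assumes onnxruntime is installed):

```python
def get_onnx_input_names(model_path):
    """Return the input tensor names an exported ONNX model expects to be fed."""
    import onnxruntime as ort  # imported lazily here; assumes onnxruntime is installed
    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    return [inp.name for inp in session.get_inputs()]

if __name__ == "__main__":
    # 'path/to/newfinbertmodel.onnx' is the export path used above. A BERT export
    # typically declares input_ids, attention_mask, and token_type_ids --
    # note there is no 'token_lengths' among them.
    print(get_onnx_input_names('path/to/newfinbertmodel.onnx'))
```

Whatever names this prints are the only keys a feed dict may contain when running the model.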
Optimize the Model Using onnxruntime
```python
from onnxruntime.transformers import optimizer
from onnxruntime.transformers.onnx_model_bert import BertOptimizationOptions

optimized_model_path = 'path/to/optimized_model.onnx'

# Define optimization options
opt_options = BertOptimizationOptions('bert')
opt_options.enable_embed_layer_norm = False

# Optimize the model
opt_model = optimizer.optimize_model(
    onnx_model_path,
    'bert',
    num_heads=12,
    hidden_size=768,
    optimization_options=opt_options
)
opt_model.save_model_to_file(optimized_model_path)
print(f"Optimized model saved to: {optimized_model_path}")
```
Quantize the Model Using onnxruntime
```python
from onnxruntime.quantization import quantize_dynamic, QuantType

quantized_onnx_model_path = 'path/to/quantized_model.onnx'

# Quantize
quantize_dynamic(optimized_model_path, quantized_onnx_model_path, weight_type=QuantType.QInt8)
print(f"Quantized model saved to: {quantized_onnx_model_path}")
```
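For the embedding-retrieval part, the pattern I understand to be common is: tokenize, run the quantized session, then mean-pool the last hidden state over non-padding tokens. A hedged sketch (the `tokenizer` and `session` objects, and the input names `input_ids` / `attention_mask` / `token_type_ids`, are assumptions based on a standard BERT export; verify the names against `session.get_inputs()` first):

```python
import numpy as np

def mean_pool(last_hidden_state, attention_mask):
    """Average token embeddings into one vector, ignoring padding positions."""
    mask = attention_mask[..., None].astype(np.float32)  # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(axis=1)      # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)       # avoid divide-by-zero
    return summed / counts

def embed(texts, tokenizer, session):
    """Sketch: tokenize, run the ONNX session, pool to one vector per text.
    'tokenizer' is the HF tokenizer and 'session' an onnxruntime
    InferenceSession over the quantized model (both assumed to exist)."""
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="np")
    feeds = {
        "input_ids": enc["input_ids"].astype(np.int64),
        "attention_mask": enc["attention_mask"].astype(np.int64),
        "token_type_ids": enc["token_type_ids"].astype(np.int64),
    }
    last_hidden_state = session.run(None, feeds)[0]
    return mean_pool(last_hidden_state, enc["attention_mask"])
```

The resulting vectors can then be compared with cosine similarity for retrieval.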
This code gives me an error when I then try to use the quantized model with Flair for embedding retrieval: "ONNXRuntimeError: INVALID_ARGUMENT: Invalid Feed Input Name: token_lengths". I know this indicates that the ONNX model I am using does not have an input named token_lengths. I tried going the route of fixing that but have also come up short. Like I said, I've tried a combination of things and have seen multiple errors no matter what I do, so I am not necessarily looking for a solution to this specific error; rather, maybe someone can shed light on the proper direction. Thanks!
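On that specific error: "Invalid Feed Input Name" means the dict passed to `session.run()` contains a key the graph never declared; `token_lengths` is an extra tensor the calling code feeds, not an input a plain transformers BERT export has. One pragmatic direction (a sketch of the general idea, not any library's actual API) is to filter the feed dict down to the names the session reports before running:

```python
def filter_feeds(feeds, valid_input_names):
    """Keep only the entries an ONNX session actually declares as inputs.
    In practice valid_input_names comes from:
        {i.name for i in session.get_inputs()}"""
    return {name: value for name, value in feeds.items() if name in valid_input_names}

# Hypothetical feed dict containing the stray 'token_lengths' key from the error;
# real code would pass numpy arrays instead of placeholder strings.
feeds = {
    "input_ids": "...",
    "attention_mask": "...",
    "token_lengths": "...",  # not declared by the exported graph
}
clean = filter_feeds(feeds, {"input_ids", "attention_mask", "token_type_ids"})
print(sorted(clean))  # 'token_lengths' is gone
```

If Flair is the consumer, it may also be worth checking whether the Flair version in use ships its own ONNX export path, since Flair feeds extra tensors (like `token_lengths`) that a plain transformers export never declares.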