### Version

0.18.0

### Description
When an ONNX model's input shapes have been set to strings (indicating that the axes are dynamic), making a prediction fails with an error of this kind:

```
cortex.lib.exceptions.UserException: error: key 'input_ids' for model '_cortex_default': failed to convert to NumPy array for model '_cortex_default': cannot reshape array of size 6 into shape (1,1)
```
Here's an example of a model's input shapes:
| model input    | type  | shape             |
| -------------- | ----- | ----------------- |
| attention_mask | int64 | (batch, sequence) |
| input_ids      | int64 | (batch, sequence) |
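For reference, once the model has been exported (see the steps below), the input shapes can be inspected with onnxruntime; dynamic axes are reported as strings. A minimal sketch, assuming the export path used in the reproduction steps:

```python
# inspect_inputs.py -- sketch; the model path is an assumption matching the steps below
import onnxruntime as ort

session = ort.InferenceSession("./output/xlm-roberta-base.onnx")
for model_input in session.get_inputs():
    # dynamic axes show up as strings, e.g. ['batch', 'sequence']
    print(model_input.name, model_input.type, model_input.shape)
```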
### Steps to reproduce

Run all of the following steps within the same directory.
#### Creating the environment/model

Create a virtual environment for Python 3.6.9 and install the following pip dependencies:

```
onnxruntime==1.3.0
torch==1.5.0
transformers==3.0.0
scipy==1.4.1
```
Within that environment, run the following to export the XLM-Roberta model to ONNX format:

```python
from transformers.convert_graph_to_onnx import convert

convert(framework="pt", model="xlm-roberta-base", output="./output/xlm-roberta-base.onnx", opset=11)
```
Now run the following to optimize the model and convert it to float16 (this step uses the `onnxruntime_tools` package, in addition to the dependencies listed above):

```bash
python -m onnxruntime_tools.optimizer_cli --input ./output/xlm-roberta-base.onnx --output ./output/xlm-roberta-base.onnx --model_type bert --float16
```
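As a quick sanity check (a sketch, assuming the `onnx` package is installed), the optimized model can be loaded to confirm that the dynamic axes survived the optimization:

```python
# check_inputs.py -- sketch; assumes the onnx package is available in the environment
import onnx

model = onnx.load("./output/xlm-roberta-base.onnx")
for graph_input in model.graph.input:
    # dynamic dims carry a name (e.g. 'batch', 'sequence') instead of an integer value
    dims = [d.dim_param or d.dim_value for d in graph_input.type.tensor_type.shape.dim]
    print(graph_input.name, dims)
```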
#### Creating the Cortex deployment
Create a `cortex.yaml` config file with the following content:

```yaml
# cortex.yaml

- name: api
  predictor:
    type: onnx
    model_path: ./output/xlm-roberta-base.onnx
    path: predictor.py
    image: cortexlabs/onnx-predictor-cpu:0.18.0
```
Create a `predictor.py` script with the following content:

```python
# predictor.py

from transformers import XLMRobertaTokenizer
from scipy.special import softmax
import time


class ONNXPredictor:
    def __init__(self, onnx_client, config):
        self.client = onnx_client
        self.tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")

    def predict(self, payload):
        start = time.time()
        # tokenize the payload; yields (1, sequence) tensors
        model_inputs = self.tokenizer.encode_plus(payload["text"], max_length=512, return_tensors="pt", truncation=True)
        inputs_onnx = {k: v.cpu().detach().numpy() for k, v in model_inputs.items()}
        # debug: print the input signatures the ONNX client detected for the model
        print(self.client._signatures)
        output = self.client.predict(inputs_onnx)
        output = softmax(output[0], axis=1)[0].tolist()
        end = time.time()
        return {"output": output, "time": end - start}
```
Copy the pip dependencies listed above into a `requirements.txt` file and, from the same directory as the `cortex.yaml` config file, run `cortex deploy -e local`. Wait for the API to be live, then run:

```bash
curl http://localhost:8888 -X POST -H "Content-Type: application/json" -d '{"text": "That is a nice"}'
```
### Error

The above command will return a non-200 response code. Inspect the logs with `cortex get api`. The expected error is:

```
cortex.lib.exceptions.UserException: error: key 'input_ids' for model '_cortex_default': failed to convert to numpy array for model '_cortex_default': cannot reshape array of size 6 into shape (1,1)
```
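For context on the numbers in the error: the payload "That is a nice" tokenizes to 6 token ids, which is where the "size 6" comes from, while Cortex appears to be reshaping to (1,1), presumably treating the dynamic (batch, sequence) axes as fixed dimensions of 1. The sketch below prints the tokenized shapes and runs the same inputs through onnxruntime directly; it assumes a plain float32 export at the same path (i.e. the output of the convert step, before the `--float16` optimization):

```python
# repro_direct.py -- sketch; assumes a float32 export of xlm-roberta-base at this path
import onnxruntime as ort
from transformers import XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
model_inputs = tokenizer.encode_plus("That is a nice", max_length=512, return_tensors="pt", truncation=True)
inputs_onnx = {k: v.cpu().detach().numpy() for k, v in model_inputs.items()}

# each array has shape (1, 6): one batch entry, six token ids -- the "size 6" in the error
for name, arr in inputs_onnx.items():
    print(name, arr.shape)

# onnxruntime itself accepts the (batch, sequence) inputs without any reshaping
session = ort.InferenceSession("./output/xlm-roberta-base.onnx")
outputs = session.run(None, inputs_onnx)
print([o.shape for o in outputs])
```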