Sending an object dtype array as the request JSON for a Model API #461

Closed
nsriram13 opened this issue Mar 7, 2019 · 5 comments
@nsriram13

I am building an sklearn pipeline that does some pre-processing (specifically, one-hot encoding the categorical variables and scaling the numeric variables) and then feeds the result into a classifier. The pipeline is provided below (inspired by this blog):

import joblib
import pandas as pd

from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.impute import SimpleImputer

# Load the Titanic dataset and split off the target and the feature columns
titanic = pd.read_csv("https://raw.githubusercontent.com/amueller/scipy-2017-sklearn/master/notebooks/datasets/titanic3.csv")
target = titanic.survived.values
features = titanic[['pclass', 'sex', 'age', 'fare', 'embarked']].copy()
X_train, X_test, y_train, y_test = train_test_split(features, target, random_state=0)

# Boolean masks selecting the numeric and categorical columns
numerical_features = features.dtypes == 'float'
categorical_features = ~numerical_features

# Impute + scale the numeric columns; impute + one-hot encode the categorical columns
preprocess = make_column_transformer(
    (make_pipeline(SimpleImputer(strategy='median'), StandardScaler()), numerical_features),
    (make_pipeline(SimpleImputer(strategy='constant', fill_value='missing'), OneHotEncoder(handle_unknown='ignore')), categorical_features))

# Full pipeline: preprocessing followed by a logistic regression classifier
model = make_pipeline(
    preprocess,
    LogisticRegression(solver='lbfgs'))

model.fit(X_train, y_train)
print("logistic regression score: %f" % model.score(X_test, y_test))

# Persist the fitted pipeline so the serving wrapper can load it
joblib.dump(model, 'model.joblib')

I am saving the trained model and using it to create a Docker image with s2i, based on the model template app (link). I am able to build the image successfully and run it locally.
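For reference, the s2i build for the model template app is driven by an .s2i/environment file; a minimal sketch (the values below are assumptions for this Titanic example, not taken from the issue) would look like:

MODEL_NAME=TitanicServe
API_TYPE=REST
SERVICE_TYPE=MODEL
PERSISTENCE=0

Here MODEL_NAME is assumed to match the Python file and class that load model.joblib and expose a predict method.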

Now I am trying to test it by sending a curl request to the predict endpoint, but I am not sure what the input format should look like. An example input to the model is an array like this:

array([[1, 'female', 29.0, 211.3375, 'S']], dtype=object)

All the examples seem to cover the case where the input array is entirely numeric; I did not find one where a mixed object-type array is sent.

Can you please let me know whether this is possible in seldon-core, or should I always assume that the input has to be numeric? If so, is there a recommended way of deploying this pipeline?

@ukclivecox
Contributor

You can send mixed-type data with the ndarray payload:

google.protobuf.ListValue ndarray = 3;

An example JSON payload for this would be:

{"data":{"ndarray":["a",1,"b"]}}

@nsriram13
Author

Thanks for the speedy response. Based on the above, I issued the following curl command:

curl -g http://localhost:5000/predict --data-urlencode 'json={"data": {"ndarray": [[1, "female", 29, 211.33, "S"]]}}'

and I get the following error:

ValueError: SimpleImputer does not support data with dtype <U21. Please provide either a numeric array (with a floating point or integer dtype) or categorical data represented either as an array with integer dtype or an array of string values with an object dtype.

A more detailed log can be seen here. Any ideas on how to troubleshoot?

@ukclivecox
Contributor

When the payload is converted to a numpy array automatically, its dtype becomes <U21 (21-character unicode strings).
So I assume you may need to cast it to dtype=object yourself before passing it to the sklearn pipeline.
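A minimal sketch of what happens (plain numpy, outside Seldon Core) when a mixed row is converted and then cast to object:

import numpy as np

# Mixed numbers and strings are coerced to a fixed-width unicode dtype,
# which SimpleImputer rejects with the error shown above
X = np.array([[1, "female", 29.0, 211.3375, "S"]])
print(X.dtype)   # a unicode dtype such as <U21 or <U32

# Casting to object gives a dtype the sklearn pipeline accepts
X = X.astype(object)
print(X.dtype)   # object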

@nsriram13
Author

When the payload is converted to a numpy array automatically, its dtype becomes <U21 (21-character unicode strings).
So I assume you may need to cast it to dtype=object yourself before passing it to the sklearn pipeline.

Thanks for the tip @cliveseldon. I have this working now. Basically, I had to modify the predict function to cast X to an object dtype, as shown below:

import joblib

class TitanicServe(object):

    def __init__(self, model_file='model.joblib'):
        # Load the fitted sklearn pipeline saved during training
        self.model = joblib.load(model_file)

    def predict(self, X, features_names):
        # The ndarray payload arrives as a unicode array; cast to object
        # so the sklearn preprocessing steps accept it
        X = X.astype(object)
        return self.model.predict_proba(X)

PS: thanks for giving the community this awesome toolkit for model deployment! 👍

@ukclivecox
Contributor

Glad it's working. Keep us updated on how you're using Seldon Core.
I'll close this now. Feel free to open future issues.
