Sending an object dtype array as the request JSON for a Model API #461

Closed
nsriram13 opened this issue Mar 7, 2019 · 5 comments
@nsriram13

I am building an sklearn pipeline that does some pre-processing (specifically, one-hot encoding the categorical variables and scaling the numeric variables) and then feeds the result into a classifier. The pipeline is provided below (inspired by this blog):

import joblib
import pandas as pd

from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.impute import SimpleImputer

# Load the Titanic dataset and split off the target and the feature columns
titanic = pd.read_csv("https://raw.githubusercontent.com/amueller/scipy-2017-sklearn/master/notebooks/datasets/titanic3.csv")
target = titanic.survived.values
features = titanic[['pclass', 'sex', 'age', 'fare', 'embarked']].copy()
X_train, X_test, y_train, y_test = train_test_split(features, target, random_state=0)

# Boolean masks selecting the numeric and categorical columns
numerical_features = features.dtypes == 'float'
categorical_features = ~numerical_features

# Impute + scale the numeric columns; impute + one-hot encode the categorical columns
preprocess = make_column_transformer(
    (make_pipeline(SimpleImputer(strategy='median'), StandardScaler()), numerical_features),
    (make_pipeline(SimpleImputer(strategy='constant', fill_value='missing'), OneHotEncoder(handle_unknown='ignore')), categorical_features))

# Full pipeline: preprocessing followed by a logistic regression classifier
model = make_pipeline(
    preprocess,
    LogisticRegression(solver='lbfgs'))

model.fit(X_train, y_train)
print("logistic regression score: %f" % model.score(X_test, y_test))

# Persist the fitted pipeline so the serving wrapper can load it
joblib.dump(model, 'model.joblib')

I am saving the trained model and using it to create a Docker image with s2i, based on the model template app (link). I am able to build the image successfully and run it locally.
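For reference, the s2i build for the model template app is driven by an .s2i/environment file; a minimal sketch (the values below are assumptions for this Titanic example, not taken from the issue) would look like:

MODEL_NAME=TitanicServe
API_TYPE=REST
SERVICE_TYPE=MODEL
PERSISTENCE=0

Here MODEL_NAME is assumed to match the Python file and class that load model.joblib and expose a predict method.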

Now I am trying to test it by sending a curl request to the predict endpoint, but I am not sure what the input format should look like. An example input to the model is an array like this:

array([[1, 'female', 29.0, 211.3375, 'S']], dtype=object)

All the examples seem to cover the case where the input array is entirely numeric; I did not find one where a mixed object-type array is sent.

Can you please let me know whether this is possible in seldon-core, or should I always assume that the input has to be numeric? If so, is there a recommended way of deploying this pipeline?

@ukclivecox
Contributor

You can send mixed-type data with the ndarray payload:

google.protobuf.ListValue ndarray = 3;

An example JSON payload for this would be:

{"data":{"ndarray":["a",1,"b"]}}

@nsriram13
Author

Thanks for the speedy response. Based on the above, I issued the following curl command:

curl -g http://localhost:5000/predict --data-urlencode 'json={"data": {"ndarray": [[1, "female", 29, 211.33, "S"]]}}'

and I get the following error:

ValueError: SimpleImputer does not support data with dtype <U21. Please provide either a numeric array (with a floating point or integer dtype) or categorical data represented either as an array with integer dtype or an array of string values with an object dtype.

A more detailed log can be seen here. Any ideas on how to troubleshoot?

@ukclivecox
Contributor

When the payload is converted to a numpy array automatically, its dtype becomes <U21 (21-character unicode strings).
So I assume you may need to cast it to dtype=object yourself before passing it to the sklearn pipeline.
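A minimal sketch of what happens (plain numpy, outside Seldon Core) when a mixed row is converted and then cast to object:

import numpy as np

# Mixed numbers and strings are coerced to a fixed-width unicode dtype,
# which SimpleImputer rejects with the error shown above
X = np.array([[1, "female", 29.0, 211.3375, "S"]])
print(X.dtype)   # a unicode dtype such as <U21 or <U32

# Casting to object gives a dtype the sklearn pipeline accepts
X = X.astype(object)
print(X.dtype)   # object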

@nsriram13
Author

When the payload is converted to a numpy array automatically, its dtype becomes <U21 (21-character unicode strings).
So I assume you may need to cast it to dtype=object yourself before passing it to the sklearn pipeline.

Thanks for the tip @cliveseldon. I have this working now. Basically, I had to modify the predict function to cast X to an object dtype, as shown below:

import joblib

class TitanicServe(object):

    def __init__(self, model_file='model.joblib'):
        # Load the fitted sklearn pipeline saved during training
        self.model = joblib.load(model_file)

    def predict(self, X, features_names):
        # The ndarray payload arrives as a unicode array; cast to object
        # so the sklearn preprocessing steps accept it
        X = X.astype(object)
        return self.model.predict_proba(X)

PS: thanks for giving the community this awesome toolkit for model deployment! 👍

@ukclivecox
Contributor

Glad it's working. Keep us updated on how you're using Seldon Core.
I'll close this now. Feel free to open future issues.
