diff --git a/examples/models/statsmodels/statsmodels.ipynb b/examples/models/statsmodels/statsmodels.ipynb new file mode 100644 index 0000000000..044e814d64 --- /dev/null +++ b/examples/models/statsmodels/statsmodels.ipynb @@ -0,0 +1,402 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Deploying Time-Series Models on Seldon " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following notebook are steps to deploy your first time-series model on Seldon. The first step is to install statsmodels on our local system, along with s2i. s2i will be used to convert the source code to a docker image and stasmodels is a python library to build statistical models. \n", + "\n", + "Dependencies:\n", + "\n", + "1. Seldon-core (https://docs.seldon.io/projects/seldon-core/en/v1.1.0/workflow/install.html) \n", + "\n", + "2. s2i - Source to Image (https://rb.gy/jgybo9)\n", + "\n", + "3. statsmodels (https://www.statsmodels.org/stable/index.html) \n", + "\n", + "\n", + "\n", + "Assuming you have installed statsmodels and s2i, the next step is to create a joblib file of your time-series model. The sample code is given below . Here we have considered a Holt- Winter's seasonal model and the shampoo sales dataset as a basic example. \n", + " \n", + " \n", + "The univariate dataset : https://raw.githubusercontent.com/jbrownlee/Datasets/master/shampoo.csv " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install statsmodels" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Code snippet to create a joblib file :\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "from statsmodels.tsa.holtwinters import ExponentialSmoothing\n", + "import numpy as np\n", + "import joblib\n", + "\n", + "df=pd.read_csv('https://raw.githubusercontent.com/jbrownlee/Datasets/master/shampoo.csv')\n", + "\n", + "#Taking a test-train split of 80 %\n", + "train=df[0:int(len(df)*0.8)] \n", + "test=df[int(len(df)*0.8):]\n", + "\n", + "#Pre-processing the Month field\n", + "train.Timestamp = pd.to_datetime(train.Month,format='%m-%d') \n", + "train.index = train.Timestamp \n", + "test.Timestamp = pd.to_datetime(test.Month,format='%m-%d') \n", + "test.index = test.Timestamp \n", + "\n", + "#fitting the model based on optimal parameters\n", + "model = ExponentialSmoothing(np.asarray(train['Sales']) ,seasonal_periods=7 ,trend='add', seasonal='add',).fit()\n", + "joblib.dump(model,'model.sav')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The Next step is to write the code in a format defined by s2i as given below : " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile holt_winter.py\n", + "\n", + "import joblib\n", + "class holt_winter(object):\n", + " \"\"\"\n", + " Model template. You can load your model parameters in __init__ from a location accessible at runtime\n", + " \"\"\"\n", + " \n", + " def __init__(self):\n", + " \n", + " \"\"\"\n", + " Add any initialization parameters. These will be passed at runtime from the graph definition parameters defined in your seldondeployment kubernetes resource manifest.\n", + " \n", + " loading the joblib file \n", + " \"\"\"\n", + " self.model = joblib.load('model.sav')\n", + " print(\"Initializing ,inside constructor\")\n", + "\n", + "\n", + " def predict(self,X,feature_names):\n", + " \"\"\"\n", + " Return a prediction.\n", + " Parameters\n", + " ----------\n", + " X : array-like\n", + " feature_names : array of feature names (optional)\n", + " \n", + " This space can be used for data pre-processing as well\n", + " \"\"\"\n", + " print(X)\n", + " print(\"Predict called - will run idenity function\")\n", + " return self.model.forecast(X)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "After saving the code, we now create an environment_rest file and add the following lines: \n", + "\n", + "MODEL_NAME=holt_winter
\n", + "API_TYPE=REST
\n", + "SERVICE_TYPE=MODEL
\n", + "PERSISTENCE =0
\n", + "\n", + "\n", + "MODEL_NAME:
\n", + "The name of the class containing the model. Also the name of the python file which will be imported.
\n", + "\n", + "API_TYPE:
\n", + "API type to create. Can be REST or GRPC
\n", + "\n", + "SERVICE_TYPE:
\n", + "The service type being created. Available options are:
\n", + "1. MODEL
\n", + "2. ROUTER
\n", + "3. TRANSFORMER
\n", + "4. COMBINER
\n", + "5. OUTLIER_DETECTOR
\n", + "\n", + "\n", + "\n", + "PERSISTENCE:
\n", + "Set either to 0 or 1. Default is 0. If set to 1 then your model will be saved periodically to redis and loaded from redis (if exists) or created fresh if not.
\n", + "\n", + "\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile requirements.txt\n", + "joblib\n", + "statsmodels\n", + "pandas\n", + "numpy\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile environment_rest\n", + "\n", + "MODEL_NAME=holt_winter\n", + "API_TYPE=REST \n", + "SERVICE_TYPE=MODEL\n", + "PERSISTENCE =0\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we build the image using the s2i command, replace \"seldonio/statsmodel-holts:0.1\" with the image name of your choice :" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!s2i build -E environment_rest . seldonio/seldon-core-s2i-python3:0.18 seldonio/statsmodel-holts:0.1\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Running the docker image created:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!docker run --name \"holt_predictor\" -d --rm -p 5000:5000 seldonio/statsmodel-holts:0.1\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The code is now running at the local host at port 5000. It can be tested by sending a curl command, here we are sending a request to the model to predict the sales for the next 3 weeks." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!curl -s http://localhost:5000/predict -H \"Content-Type: application/json\" -d '{\"data\":{\"ndarray\":3}}'\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The next step is to push the code into the docker registry, you are free to use the docker hub or the private registry in your cluster. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!docker push seldonio/statsmodel-holts:0.1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The final step is to deploy the configuration file on your cluster as shown below." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile model.yaml\n", + "\n", + "apiVersion: machinelearning.seldon.io/v1alpha2\n", + "kind: SeldonDeployment\n", + "metadata:\n", + " name: holt-predictor\n", + "spec:\n", + " name: holt-predictor\n", + " predictors:\n", + " - componentSpecs:\n", + " - spec:\n", + " containers:\n", + " - image: seldonio/statsmodel-holts:0.1\n", + " imagePullPolicy: IfNotPresent\n", + " name: holt-predictor\n", + " graph:\n", + " children: []\n", + " endpoint:\n", + " type: REST\n", + " name: holt-predictor\n", + " type: MODEL\n", + " name: holt-predictor\n", + " replicas: 1" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!kubectl apply -f model.yaml\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Your model will now be deployed as a service, create a route in order for external traffic to access it . A sample curl request (with a dummy I.P, replace it with the route created by you) for the model is :" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!curl -s -d '{\"data\": {\"ndarray\":2}}' -X POST http://160.11.22.334:4556/seldon/testseldon/holt-predictor/api/v1.0/predictions -H \"Content-Type: application/json\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the above command, we send a request to get a prediction of the sales of the shampoo for the next 2 days. testseldon is the namespace, you can replace it with the namespace created by you where the model is deployed .\n", + "\n", + "\n", + "The response we get is : \n", + "\n", + "{\"data\":{\"names\":[],\"ndarray\":[487.86681173,415.82743026 ]},\"meta\":{}}\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The data returned is an n-dimensional array with 2 values which is the predicted values by the model, in this case the sales of the shampoo." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note: it is suggested that you try the model on your local system before deploying it on the cluster." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Model Monitoring" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Once the model is deployed, you can now monitor various metrics, the 2 main ones being:\n", + "\n", + "1. Requests per second
\n", + "2. Latency in serving the request\n", + "\n", + "\n", + "\n", + "\n", + "The model deployed on Seldon can be monitored using build in metrics dashboard on Grafana. Here is the link to deploy metrics dashboard: https://docs.seldon.io/projects/seldon-core/en/v1.1.0/analytics/analytics.html.
\n", + "The screenshot of a sample dashboard is given below:
\n", + "![dashboard_image1](dashboard_image.png)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Summary" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This documentation covers deploying time series model on Seldon, this model could be inferenced for forecasting values from a given data set. This is very useful for customers who want to deploy time series alogithm for forecasting models.\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 2 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython2", + "version": "2.7.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}