Add final example of a learning API.
ihuston committed Jun 17, 2015
1 parent 4ffc4ac commit c4209ca
Showing 9 changed files with 467 additions and 0 deletions.
27 changes: 27 additions & 0 deletions 04-learning-api/LICENSE.txt
@@ -0,0 +1,27 @@
Copyright (c) 2015, Alexander Kagoshima, Pivotal Software Inc.
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

* Neither the name of ds-cfpylearning nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
60 changes: 60 additions & 0 deletions 04-learning-api/README.md
@@ -0,0 +1,60 @@
# Simple Cloud Foundry based machine learning API

Modified from code originally written by Alexander Kagoshima.
See the full version at https://github.com/alexkago/ds-cfpylearning

This app demonstrates a very simple API for creating model instances, feeding data to them and letting the models retrain periodically. Currently it uses Redis to store model instances, model state and data. For scalability and distributed processing of data, Redis should be replaced by a distributed data store.

For all the examples below replace ```http://<model_domain>``` with your Cloud Foundry app domain.


Create a model
--

```
curl -i -X POST -H "Content-Type: application/json" -d '{"model_name": "model1", "model_type": "LinearRegression", "retrain_counter": 10}' http://<model_domain>/createModel
```


Add in some data
--

This example shows how to send data to the model created above so that the linear regression model becomes y = x. Since we set `retrain_counter` to 10 previously, the model retrains after every 10th data instance it receives, so the first training run happens after the last request below.

```
curl -i -X POST -H "Content-Type: application/json" -d '{"model_name": "model1", "input": 1, "label": 1}' http://<model_domain>/ingest
curl -i -X POST -H "Content-Type: application/json" -d '{"model_name": "model1", "input": 2, "label": 2}' http://<model_domain>/ingest
curl -i -X POST -H "Content-Type: application/json" -d '{"model_name": "model1", "input": 3, "label": 3}' http://<model_domain>/ingest
curl -i -X POST -H "Content-Type: application/json" -d '{"model_name": "model1", "input": 4, "label": 4}' http://<model_domain>/ingest
curl -i -X POST -H "Content-Type: application/json" -d '{"model_name": "model1", "input": 5, "label": 5}' http://<model_domain>/ingest
curl -i -X POST -H "Content-Type: application/json" -d '{"model_name": "model1", "input": 6, "label": 6}' http://<model_domain>/ingest
curl -i -X POST -H "Content-Type: application/json" -d '{"model_name": "model1", "input": 7, "label": 7}' http://<model_domain>/ingest
curl -i -X POST -H "Content-Type: application/json" -d '{"model_name": "model1", "input": 8, "label": 8}' http://<model_domain>/ingest
curl -i -X POST -H "Content-Type: application/json" -d '{"model_name": "model1", "input": 9, "label": 9}' http://<model_domain>/ingest
curl -i -X POST -H "Content-Type: application/json" -d '{"model_name": "model1", "input": 10, "label": 10}' http://<model_domain>/ingest
```
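
The ten requests above can also be generated in a loop. The following is a minimal Python 3 sketch using only the standard library; it is not part of the original app. The helper names `build_ingest_payloads` and `send_all` are hypothetical, and `base_url` stands in for `http://<model_domain>`.

```python
import json
from urllib import request


def build_ingest_payloads(model_name, n):
    """Return n payloads with input == label, i.e. points on y = x."""
    return [{"model_name": model_name, "input": i, "label": i}
            for i in range(1, n + 1)]


def send_all(base_url, payloads):
    """POST each payload to <base_url>/ingest and return the HTTP status codes."""
    statuses = []
    for payload in payloads:
        req = request.Request(
            base_url + "/ingest",
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with request.urlopen(req) as resp:
            statuses.append(resp.status)
    return statuses


payloads = build_ingest_payloads("model1", 10)
# send_all("http://<model_domain>", payloads)  # uncomment with your real app domain
```

Building the payloads separately from sending them keeps the data easy to inspect before anything reaches the app.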


Look at all created models
--

There's a very rudimentary view of the Redis set holding the names of all models that have been created:

```
http://<model_domain>/models
```


Look at model details
--

This lets you check out the status of the previously created model as well as its trained parameters:

```
http://<model_domain>/models/model1
```
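
Once the ten points from the ingest example have been received, the trained parameters should describe the line y = x, that is a slope of 1 and an intercept of 0. As a sanity check that is independent of the app, the closed-form least-squares fit for a single input can be computed directly. This is an illustrative sketch; `fit_line` is a hypothetical helper and not the code the app uses for training.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (single input)."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = list(range(1, 11))  # the inputs 1..10 sent to /ingest
ys = list(range(1, 11))  # the matching labels
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

How the parameters are named and formatted in the model details view depends on the model class, so compare the values rather than the exact output.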

License
--

This application is released under the Modified BSD license. Please see the LICENSE.txt file for details.
22 changes: 22 additions & 0 deletions 04-learning-api/environment.yml
@@ -0,0 +1,22 @@
name: cfpylearning
dependencies:
- flask=0.10.1=py27_1
- itsdangerous=0.24=py27_0
- jinja2=2.7.3=py27_1
- markdown=2.6.2=py27_0
- markupsafe=0.23=py27_0
- nose=1.3.7=py27_0
- numpy=1.9.2=py27_0
- openssl=1.0.1k=1
- pip=7.0.3=py27_0
- python=2.7.10=0
- readline=6.2=2
- scikit-learn=0.16.1=np19py27_0
- scipy=0.15.1=np19py27_0
- setuptools=17.1.1=py27_0
- sqlite=3.8.4.1=1
- tk=8.5.18=0
- werkzeug=0.10.4=py27_0
- zlib=1.2.8=0
- pip:
  - redis==2.10.3
178 changes: 178 additions & 0 deletions 04-learning-api/main.py
@@ -0,0 +1,178 @@
import os
import json
import redis
import pickle
from markdown import markdown
from flask import Flask, request, jsonify, abort, make_response, Markup, render_template, g
from models.StandardModels import LinearRegression
from models import ModelFactory

app = Flask(__name__)

# Get hostname
cf_app_env = os.getenv('VCAP_APPLICATION')
if cf_app_env is not None:
    host = json.loads(cf_app_env)['application_uris'][0]
else:
    host = 'localhost'

# initialize redis connection for local and CF deployment
def connect_db():
    if os.environ.get('VCAP_SERVICES') is None:  # running locally
        DB_HOST = 'localhost'
        DB_PORT = 6379
        DB_PW = ''
        REDIS_DB = 1 if app.config["TESTING"] else 0  # use other db for testing

    else:  # running on CF
        env_vars = os.environ['VCAP_SERVICES']
        rediscloud_service = json.loads(env_vars)['rediscloud'][0]
        credentials = rediscloud_service['credentials']
        DB_HOST = credentials['hostname']
        DB_PORT = credentials['port']
        DB_PW = credentials['password']
        REDIS_DB = 0

    # keep the connection on the app object so the routes can reach it
    app.r = redis.StrictRedis(host=DB_HOST,
                              port=DB_PORT,
                              password=DB_PW,
                              db=REDIS_DB)


# define routes
@app.route('/')
def hello():
    return render_template('help.html', host=host)


@app.route('/flushDB')
def flushDB():
    app.r.flushdb()
    return 'db flushed', 200


@app.route('/createModel', methods=['POST'])
def createModel():
    json_data = request.get_json(force=True)

    # check if all fields are there
    if json_data.get('model_name') is None:
        abort(make_response("model_name field is missing.\n", 422))

    if json_data.get('model_type') is None:
        abort(make_response("model_type field is missing.\n", 422))

    if json_data.get('retrain_counter') is None:
        abort(make_response("no retrain information set.\n", 422))

    # build the model first so that unknown model types are never registered
    mdl = ModelFactory.createModel(json_data.get('model_type'),
                                   json_data.get('model_name'),
                                   json_data.get('retrain_counter'))

    if mdl is None:
        abort(make_response("No model available of type " +
                            json_data.get('model_type') + "\n",
                            422))

    # add model to list of models and save model definition
    app.r.sadd('models', json_data.get('model_name'))
    app.r.set(json_data.get('model_name') + '_object', pickle.dumps(mdl))

    return "created model: " + str(mdl) + "\n", 201


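@app.route('/models')
def modelOverview():
    return str(app.r.smembers('models')), 200


@app.route('/models/<model_name>')
def modelInfo(model_name):
    pickled_mdl = app.r.get(model_name + '_object')
    if pickled_mdl is None:  # unknown model name
        abort(make_response("No model named " + model_name + "\n", 404))
    return str(pickle.loads(pickled_mdl)), 200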


@app.route('/ingest', methods=['POST'])
def ingest():
    json_data = request.get_json(force=True)

    if json_data.get('model_name') is None:
        abort(make_response("model_name field is missing.\n", 422))

    # prepare db keys
    mdl_key = json_data.get('model_name') + '_object'
    data_key = json_data.get('model_name') + '_data'

    # get the model from the db
    pickled_mdl = app.r.get(mdl_key)
    if pickled_mdl is None:  # model was never created
        abort(make_response("Model does not exist.\n", 404))
    mdl = pickle.loads(pickled_mdl)

    # pre-process data: every remaining field is a data column
    del json_data['model_name']
    col_names = json_data.keys()

    # update the model: the first data instance fixes the data format
    if mdl.available_data == 0:
        mdl.set_data_format(col_names)
    else:
        if mdl.col_names != col_names:
            abort(make_response("Data format changed!\n", 422))

    mdl.avail_data_incr()

    # save data to redis
    app.r.rpush(data_key, json.dumps(json_data))

    # kick off re-training on every retrain_counter-th data instance
    if (mdl.available_data % mdl.retrain_counter) == 0:
        data = app.r.lrange(data_key, 0, -1)  # -1 reads to the end of the list
        mdl.train(data)

    # save model file
    app.r.set(mdl_key, pickle.dumps(mdl))

    return json.dumps(json_data) + " added at " + data_key + "\n", 201

@app.route('/score', methods=['POST'])
def score():
    json_data = request.get_json(force=True)

    if json_data.get('model_name') is None:
        abort(make_response("model_name field is missing.\n", 422))

    # prepare db keys
    mdl_key = json_data.get('model_name') + '_object'
    pickled_mdl = app.r.get(mdl_key)
    if pickled_mdl is None:  # model was never created
        abort(make_response("Model does not exist.\n", 404))
    mdl = pickle.loads(pickled_mdl)

    if not mdl.trained:
        abort(make_response("Model has not been trained yet!\n", 404))

    train_data = dict(json_data)
    del train_data['model_name']

    # expected inputs are the training columns without the label;
    # build a new list so that mdl.col_names is not modified
    input_keys = [key for key in mdl.col_names if key != 'label']

    if input_keys != list(train_data.keys()):
        abort(make_response("Data format for training is different!\n", 422))

    pred_val = mdl.score([train_data[key] for key in input_keys])

    prediction = {'predicted_label': pred_val[0], 'request': json_data}

    return json.dumps(prediction), 201

# run app
if __name__ == "__main__":
    if os.environ.get('VCAP_SERVICES') is None:  # running locally
        PORT = 8080
        DEBUG = True
    else:  # running on CF
        PORT = int(os.getenv("VCAP_APP_PORT"))
        DEBUG = False

    connect_db()
    app.run(host='0.0.0.0', port=PORT, debug=DEBUG)
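
The Redis settings chosen in `connect_db` in main.py depend on whether `VCAP_SERVICES` is set. The sketch below isolates that lookup as a pure function so it can be checked without a Redis server. It is an illustration only: `redis_settings` and the sample JSON are hypothetical, the `rediscloud` key and the credential field names are taken from main.py, and the `int()` cast on the port is an added assumption.

```python
import json


def redis_settings(vcap_services, testing=False):
    """Return keyword arguments for a Redis client.

    vcap_services is the raw VCAP_SERVICES string, or None when running locally.
    """
    if vcap_services is None:  # running locally
        return {"host": "localhost", "port": 6379, "password": "",
                "db": 1 if testing else 0}  # separate db for tests
    credentials = json.loads(vcap_services)["rediscloud"][0]["credentials"]
    return {"host": credentials["hostname"],
            "port": int(credentials["port"]),  # assumption: port may be a string
            "password": credentials["password"],
            "db": 0}


# Made-up example value; the real one is injected by Cloud Foundry.
sample = json.dumps({"rediscloud": [{"credentials": {
    "hostname": "redis.example.com", "port": "12345", "password": "secret"}}]})
settings = redis_settings(sample)
local = redis_settings(None, testing=True)
```

The returned dictionary could be passed on as `redis.StrictRedis(**settings)`, which matches the keyword arguments used in main.py.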
12 changes: 12 additions & 0 deletions 04-learning-api/manifest.yml
@@ -0,0 +1,12 @@
---
applications:
- name: learning-api
  memory: 512M
  instances: 1
  domain: cfapps.io
  random-route: true
  path: .
  buildpack: https://github.com/ihuston/python-conda-buildpack.git
  command: python main.py
  services:
  - myredis
80 changes: 80 additions & 0 deletions 04-learning-api/models/ModelFactory.py
@@ -0,0 +1,80 @@
import json
import abc

class ModelInterface:
    __metaclass__ = abc.ABCMeta

    def __init__(self, model_name, retrain_counter, model_type):
        self.model_name = model_name
        self.model_type = model_type
        self.trained = False
        self.available_data = 0
        self.used_training_data = 0
        self.retrain_counter = retrain_counter

    def avail_data_incr(self):
        self.available_data += 1

    def set_data_format(self, col_names):
        self.col_names = col_names

    def update_mdl_state(self):
        self.used_training_data = self.available_data
        self.trained = True

    @abc.abstractmethod
    def get_parameters(self):
        """This method needs to be implemented"""

    @abc.abstractmethod
    def train(self, train_data):
        """This method needs to be implemented"""

    @abc.abstractmethod
    def score(self, score_data):
        """This method needs to be implemented"""

    def __eq__(self, other):
        return (isinstance(other, self.__class__)
                and self.__dict__ == other.__dict__)

    def __str__(self):
        # copy the attributes so that printing does not add a
        # 'parameters' attribute to the instance itself
        obj_dict = dict(self.__dict__)
        if self.trained:
            obj_dict['parameters'] = self.get_parameters()
        return str(obj_dict)


def train_wrapper(func):
    def wrapper(self, data):
        # pre-process data
        dict_data = [json.loads(el) for el in data]
        col_names = dict_data[0].keys()

        # # run some update functions on the object
        # if not self.trained:
        #     self.set_data_format(col_names)
        # else:
        #     if self.col_names != col_names:
        #         raise InputError('Data format is not the same as used before.')

        # run the actual training function
        val = func(self, dict_data, col_names)

        # update the model state
        self.update_mdl_state()

        return val

    return wrapper


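def createModel(model_type, model_name, retrain_counter):
    # look for the model class in StandardModels first, then in CustomModels
    try:
        import StandardModels
        return getattr(StandardModels, model_type)(model_name, retrain_counter)
    except Exception:
        try:
            import CustomModels
            return getattr(CustomModels, model_type)(model_name, retrain_counter)
        except Exception:
            # no module provides a usable class of this name
            return None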
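
To illustrate what `createModel` expects from a model class, here is a minimal, self-contained sketch of a class with the same constructor arguments and the three methods that `ModelInterface` declares abstract. `MeanModel` is hypothetical and does not appear in this commit. To stay runnable on its own it does not subclass `ModelInterface` or use `train_wrapper`; a class in the repository would presumably do both.

```python
import json


class MeanModel(object):
    """Hypothetical model that predicts the mean of all labels seen in training."""

    def __init__(self, model_name, retrain_counter):
        self.model_name = model_name
        self.model_type = "MeanModel"
        self.retrain_counter = retrain_counter
        self.trained = False
        self.mean = None

    def train(self, train_data):
        # train_data is a list of JSON strings, as stored by the /ingest route
        rows = [json.loads(el) for el in train_data]
        labels = [row["label"] for row in rows]
        self.mean = sum(labels) / float(len(labels))
        self.trained = True

    def score(self, score_data):
        # main.py reads pred_val[0], so return a one-element list
        return [self.mean]

    def get_parameters(self):
        return {"mean": self.mean}


mdl = MeanModel("model1", 10)
mdl.train([json.dumps({"input": i, "label": i}) for i in range(1, 11)])
prediction = mdl.score([11])
```

Because `createModel` looks classes up by name with `getattr`, the `model_type` sent to `/createModel` has to match the class name exactly.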