Note: Heroku's free tier has been discontinued, so the Heroku deployment described below may no longer be reachable. I plan to move the app to another cloud platform.
This GitHub repository contains an online API for a simple classification model that predicts salary from the Census Income Data Set. The API is deployed on Heroku and can be found at: https://uscensus-fastapi.herokuapp.com/. The app is a fast, type-checked, and auto-documented API built with FastAPI. I've also created a simple front-end for the API using Anvil. You can interact with the app through the front-end at https://census-salary-predictor.anvil.app/
The machine learning model is a very simple random forest classifier and can easily be replaced with a better model. However, the point of this project was to:
- implement production frameworks such as Continuous Integration and Continuous Deployment
- ensure pipelines pass unit tests before deployment
- test both the local and the live API
- use a remote data pipeline and storage with AWS S3 and implement DVC (data version control) with git.
This is a project completed as part of the Udacity Machine Learning DevOps Engineer Nanodegree.
POST requests are used to send data to the API. You can use the API to predict the salary by:
- using the docs page on Heroku at https://uscensus-fastapi.herokuapp.com/docs/
- sending a request for an individual with the Python requests module (see the example in this repository)
- using curl; an example curl command would be:
curl -X 'POST' \
'https://uscensus-fastapi.herokuapp.com/prediction' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"age": 20,
"workclass": "Self-emp-not-inc",
"fnlgt": 205100,
"education": "HS-grad",
"education_num": 9,
"marital_status": "Married-civ-spouse",
"occupation": "Exec-managerial",
"relationship": "Wife",
"race": "White",
"sex": "Female",
"capital_gain": 0,
"capital_loss": 0,
"hours_per_week": 40,
"native_country": "United-States"
}'
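The same POST request can be sketched in Python. This is a minimal illustration rather than the example script from the repository: it uses the standard-library `urllib` instead of the requests module to avoid an extra dependency, the helper names `build_prediction_request` and `predict` are hypothetical, and the shape of the JSON response is not assumed.

```python
import json
import urllib.request

# Endpoint and payload copied from the curl example above.
API_URL = "https://uscensus-fastapi.herokuapp.com/prediction"

EXAMPLE_PERSON = {
    "age": 20,
    "workclass": "Self-emp-not-inc",
    "fnlgt": 205100,
    "education": "HS-grad",
    "education_num": 9,
    "marital_status": "Married-civ-spouse",
    "occupation": "Exec-managerial",
    "relationship": "Wife",
    "race": "White",
    "sex": "Female",
    "capital_gain": 0,
    "capital_loss": 0,
    "hours_per_week": 40,
    "native_country": "United-States",
}


def build_prediction_request(person, url=API_URL):
    """Build (but do not send) a JSON POST request for one individual."""
    body = json.dumps(person).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def predict(person, url=API_URL, timeout=10):
    """Send the request and return the decoded JSON response as-is."""
    request = build_prediction_request(person, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `predict(EXAMPLE_PERSON)` sends the request; it raises an error from `urllib` if the deployment is unreachable.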
Test coverage is now measured automatically with pytest-cov whenever commits are pushed. The report can be seen on the GitHub Actions page, in the most recent build workflow under the pytest heading.
Models and data are stored in an AWS S3 bucket and pulled by DVC on Heroku when the API starts.
Continuous Integration and Continuous Deployment (CI/CD) practices were used. Every push triggers a GitHub workflow, and the pytest unit tests must run before the master branch is automatically deployed to Heroku.
The badge above tracks whether CI is passing. More details can be found on the Actions page.
- Model performance on data slices for categories education, sex and race can be found at slice_outputs/slice_output.txt
- Screenshot of example json on FastAPI docs page
- Screenshot of browser contents of GET
- Screenshot of a successful test of POST requests to the API
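The idea behind the per-slice performance report mentioned in the list above can be sketched as follows. This is an illustration under stated assumptions, not the code that produced slice_outputs/slice_output.txt: the function name `slice_metrics`, the row layout (dicts with `label` and `prediction` keys), the choice of precision and recall as metrics, and the convention of reporting 1.0 when a denominator is zero are all hypothetical.

```python
from collections import defaultdict


def slice_metrics(rows, feature, positive=">50K"):
    """Compute precision and recall separately for each value of `feature`.

    `rows` is an iterable of dicts holding the feature value, the true
    `label`, and the model `prediction` (assumed layout).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for row in rows:
        # Looking up the slice also registers slices with only true negatives.
        slice_counts = counts[row[feature]]
        actual = row["label"] == positive
        predicted = row["prediction"] == positive
        if actual and predicted:
            slice_counts["tp"] += 1
        elif predicted:
            slice_counts["fp"] += 1
        elif actual:
            slice_counts["fn"] += 1

    results = {}
    for value, c in counts.items():
        predicted_positive = c["tp"] + c["fp"]
        actual_positive = c["tp"] + c["fn"]
        # Assumed convention: report 1.0 when the denominator is zero.
        precision = c["tp"] / predicted_positive if predicted_positive else 1.0
        recall = c["tp"] / actual_positive if actual_positive else 1.0
        results[value] = {"precision": precision, "recall": recall}
    return results
```

Running this once per category (for example `education`, `sex`, and `race`) gives one set of metrics per category value, which makes it easier to spot groups on which the model underperforms.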
Heroku needs to be able to pull data with DVC when the app starts. To enable this, install a buildpack that allows apt packages to be installed, then define an Aptfile that contains the path to a DVC release. In the CLI, run:
heroku buildpacks:add --index 1 heroku-community/apt
Then, in your root project folder, create a file called Aptfile that specifies the release of DVC you want installed, e.g.
https://github.com/iterative/dvc/releases/download/2.0.18/dvc_2.0.18_amd64.deb
Add the following code block to your main.py:
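import os

# Only run on Heroku (which sets the DYNO variable) when DVC metadata is present.
if "DYNO" in os.environ and os.path.isdir(".dvc"):
    # Allow DVC to run without a Git repository.
    os.system("dvc config core.no_scm true")
    if os.system("dvc pull") != 0:
        exit("dvc pull failed")
    # Clean up DVC files that are no longer needed after the pull.
    os.system("rm -r .dvc .apt/usr/lib/dvc")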