This repository contains an example MLOps pipeline for the Porto Seguro Safe Driver Prediction dataset. It demonstrates how to train and deploy a machine learning model with open-source tools and frameworks, including Docker, Kubernetes, and FastAPI, and is intended as a reference for scalable, production-ready machine learning system design.
Predicting insurance risk scores is a critical task for reducing losses in the insurance industry. Using the Porto Seguro Safe Driver Prediction dataset, this project develops a machine learning model to predict the likelihood of claims, with an emphasis on high accuracy and operational scalability.
- Machine Learning Model:
  - The model is built using XGBoost and optimized using Bayesian Optimization.
  - Performance is measured using the Gini coefficient, a common metric in risk prediction tasks.
- Model Serving:
  - A FastAPI-based REST API serves predictions and provides health checks.
- Scalability:
  - The application is containerized with Docker and deployed using Kubernetes.
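As an illustration of the evaluation metric, the sketch below computes a normalized Gini coefficient from binary labels and predicted scores, using the identity Gini = 2 * AUC - 1. It assumes the metric meant here is the normalized Gini for binary targets; the function names `roc_auc` and `normalized_gini` are hypothetical and are not taken from this repository's training script.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic, using average ranks for ties."""
    n = len(y_true)
    order = sorted(range(n), key=lambda idx: y_score[idx])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of samples that share the same score.
        while j + 1 < n and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined unless both classes are present")

    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


def normalized_gini(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Normalized Gini for binary labels: 1 is a perfect ranking, 0 is random."""
    return 2.0 * roc_auc(y_true, y_score) - 1.0


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(normalized_gini(labels, scores))  # 0.5
```

Because the metric depends only on how the predictions are ranked, rescaling the scores monotonically leaves it unchanged.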
porto-seguro-mlops/
├── data/
├── models/
│ ├── xgboost_test_gini.pkl
│ ├── model_metadata.json
├── serving/ # FastAPI app for model serving
│ ├── predict.py # Prediction and health-check endpoints
│ ├── requirements.txt
├── k8s/ # Kubernetes manifests
│ ├── deployment.yaml
│ ├── service.yaml
├── scripts/
│ ├── train_model.py # Script to train the model
│ ├── create_observation.py
│ ├── run_predict.py # Script to test the FastAPI app
├── Dockerfile # Image for the serving app
├── README.txt
├── requirements.txt
Install and set up the following programs:
- Python 3.10+
- Docker (Desktop)
- Kubernetes (Minikube for local testing)
- Git
git clone https://github.com/<your-username>/porto-seguro-mlops.git
cd porto-seguro-mlops
Train and save the machine learning model with metadata:
python scripts/train_model.py
This will generate:
- models/xgboost_gini.pkl: Serialized trained model.
- models/model_metadata.json: Metadata containing feature details.
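The general pattern of storing a pickled model next to a JSON metadata file can be sketched as follows. This is a minimal illustration, not the contents of scripts/train_model.py: the helper names and the metadata keys `feature_names` and `n_features` are assumptions, and a plain dictionary stands in for the trained XGBoost model.

```python
import json
import pickle
from pathlib import Path

MODEL_FILE = "xgboost_gini.pkl"        # file name listed above
METADATA_FILE = "model_metadata.json"  # file name listed above


def save_model_with_metadata(model, feature_names, model_dir):
    """Pickle the model and write feature details to a JSON file beside it."""
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    with open(model_dir / MODEL_FILE, "wb") as f:
        pickle.dump(model, f)
    # Hypothetical metadata layout; the real file may use different keys.
    metadata = {
        "feature_names": list(feature_names),
        "n_features": len(feature_names),
    }
    with open(model_dir / METADATA_FILE, "w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=2)
    return metadata


def load_model_with_metadata(model_dir):
    """Load the pickled model and its metadata from the same directory."""
    model_dir = Path(model_dir)
    with open(model_dir / MODEL_FILE, "rb") as f:
        model = pickle.load(f)
    with open(model_dir / METADATA_FILE, encoding="utf-8") as f:
        metadata = json.load(f)
    return model, metadata
```

Keeping the feature details in a separate, human-readable file lets a serving app check incoming requests against the expected feature count without unpickling the model first.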
Build the Docker Image:
docker build -t porto-seguro-api .
Tag the Docker Image:
docker tag porto-seguro-api jellewillekes/porto-seguro-api:latest
Replace jellewillekes with the Docker Hub username under which the image is stored.
Push the Docker Image:
docker push jellewillekes/porto-seguro-api:latest
Run the Docker Container Locally:
docker run -p 8080:8080 porto-seguro-api
Test the API:
python scripts/run_predict.py
The data/observation.csv file contains input data with 91 features. To use the deployed endpoint:
- Ensure the API is running and accessible. For a Kubernetes deployment, retrieve the service URL:
  minikube service porto-seguro-service --url
  Replace <SERVICE_URL> below with the retrieved URL.
- Send a request to the endpoint:
  curl -X POST -H "Content-Type: application/json" -d '{"features": [<list_of_91_features>]}' <SERVICE_URL>/predict
  Replace <list_of_91_features> with the actual feature values.
- Alternatively, use the provided script to send requests:
  python scripts/run_predict.py
  This script reads the input data from data/observation.csv, formats it, and sends a request to the endpoint.
The response is a predicted likelihood of an insurance claim based on the provided features.
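For reference, the curl request above can be reproduced with the Python standard library. This is a minimal sketch, not the contents of scripts/run_predict.py: the function names are hypothetical, the CSV is assumed to have one header row followed by one row of numeric values, and the response is assumed to be JSON, since its exact shape is not documented here.

```python
import csv
import json
import urllib.request

N_FEATURES = 91  # feature count stated for data/observation.csv


def read_observation(csv_path):
    """Read one observation. Assumes a header row, then one numeric data row."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the assumed header row
        row = next(reader)
    return [float(value) for value in row]


def build_predict_request(service_url, features):
    """Build the same POST request that the curl command sends to /predict."""
    if len(features) != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} features, got {len(features)}")
    body = json.dumps({"features": list(features)}).encode("utf-8")
    return urllib.request.Request(
        url=service_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(service_url, features, timeout=10.0):
    """Send the request and return the parsed response (assumed to be JSON)."""
    request = build_predict_request(service_url, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running, `predict("<SERVICE_URL>", read_observation("data/observation.csv"))` would send the observation and return the decoded response. Checking the feature count on the client side catches malformed input before it reaches the endpoint.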