
MLOPS MEDICAL INSURANCE COSTS PREDICTION ⚱️


This is a personal MLOps project based on a Kaggle dataset for medical insurance cost prediction. It contains several AWS SageMaker pipelines covering preprocessing through deployment, inference, and monitoring.

Feel free to ⭐ and clone this repo 😉

Tech Stack

Visual Studio Code, Jupyter Notebook, Python, Pandas, NumPy, Matplotlib, scikit-learn, Flask, Anaconda, Linux, AWS, Git

Project Structure

The project has been structured with the following folders and files:

  • .github/workflows: contains the CI/CD files (GitHub Actions)
  • aws_pipelines: AWS pipelines from preprocessing to deployment and monitoring (a sketch of the conditional registration logic follows this list)
    • preprocessing_pipeline.py: data preprocessing
    • training_pipeline.py: model training
    • tuning_pipeline.py: model fine-tuning
    • evaluate_pipeline.py: model evaluation
    • register_pipeline.py: model registry
    • cond_register_pipeline.py: model conditional registry (based on an MAE threshold)
    • deployment_pipeline.py: model automatic deployment
    • manual_deployment_pipeline.py: model manual deployment (requires manual approval on AWS)
    • inference_pipeline.py: model automatic deployment and endpoint creation
    • data_quality_pipeline.py: model registry with data quality baseline
    • model_quality_pipeline.py: model registry with data and model quality baseline
    • monitoring_pipeline.py: data and model monitor schedules creation
  • data: raw and clean data
  • Notebooks: Exploratory Data Analysis
  • src: code scripts for processing, training, evaluation, serving (Flask), Lambda, inference, and endpoint testing
  • .env_sample: sample environment variables
  • .flake8: Flake8 configuration
  • .gitattributes: Git attributes
  • Makefile: install requirements, formatting, testing, linting, coverage report, and clean up
  • pyproject.toml: linting and formatting configuration
  • requirements.txt: project requirements
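
As an illustration, this is roughly how the conditional registration in cond_register_pipeline.py can be wired together with the SageMaker Python SDK. The step names, the evaluation.json property file layout, and the default threshold are placeholders rather than the repository's exact values, and evaluate_step/register_step stand in for steps defined earlier in the pipeline file:

# Sketch of a conditional model registration step (names and values are placeholders)
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.conditions import ConditionLessThanOrEqualTo
from sagemaker.workflow.functions import JsonGet
from sagemaker.workflow.parameters import ParameterFloat
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.properties import PropertyFile

# Property file the evaluation step writes its metrics to (layout assumed)
evaluation_report = PropertyFile(
    name="EvaluationReport",
    output_name="evaluation",
    path="evaluation.json",
)

# MAE threshold the model must beat to be registered (default value is illustrative)
mae_threshold = ParameterFloat(name="MAEThreshold", default_value=3000.0)

# Read the MAE produced by the evaluation step
mae = JsonGet(
    step_name=evaluate_step.name,          # evaluate_step: the evaluation ProcessingStep
    property_file=evaluation_report,
    json_path="regression_metrics.mae.value",
)

# Register the model only when the MAE is at or below the threshold
condition_step = ConditionStep(
    name="CheckMAE",
    conditions=[ConditionLessThanOrEqualTo(left=mae, right=mae_threshold)],
    if_steps=[register_step],              # register_step: the RegisterModel step
    else_steps=[],
)

pipeline = Pipeline(
    name="cond-register-pipeline",
    parameters=[mae_threshold],
    steps=[evaluate_step, condition_step],
)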

Project Description

The dataset was obtained from Kaggle and contains 1338 rows and 7 columns for predicting health insurance costs. An Exploratory Data Analysis was conducted to prepare the data for modeling. For modeling, the categorical features were encoded, TensorFlow was used as the model, and a mean absolute error (MAE) threshold was selected as the criterion for model registration.
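
As a rough illustration of that modeling setup, the sketch below one-hot encodes the categorical columns and fits a small TensorFlow regression network on the Kaggle data. The file name data/insurance.csv, the network architecture, and the hyperparameters are assumptions, not the repository's exact code:

# Sketch of the encoding + TensorFlow regression setup (file name and
# hyperparameters are assumptions)
import pandas as pd
import tensorflow as tf
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.read_csv("data/insurance.csv")
X, y = df.drop(columns=["charges"]), df["charges"]

# Scale the numeric features, one-hot encode the categorical ones
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["age", "bmi", "children"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["sex", "smoker", "region"]),
])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
X_train = preprocessor.fit_transform(X_train)
X_test = preprocessor.transform(X_test)

# Small fully connected regression network trained on MAE
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mae", metrics=["mae"])
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=50)

print("Test MAE:", model.evaluate(X_test, y_test, verbose=0)[1])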

Project Set Up

The Python version used for this project is Python 3.10.

  1. Clone the repo (or download it as a zip file):

    git clone https://github.com/benitomartin/mlops-aws-insurance.git
  2. Create the virtual environment named main-env using Conda with Python version 3.10:

    conda create -n main-env python=3.10
    conda activate main-env
  3. Install the project dependencies included in requirements.txt, either with pip or via the Makefile:

    pip install -r requirements.txt
    
    or
    
    make install

Additionally, please note that an AWS account, credentials, and proper policies with full access to SageMaker, S3, and Lambda are necessary for the project to function correctly. Make sure to configure the appropriate credentials to interact with AWS services.
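
As a quick sanity check that your credentials are picked up, you can ask STS who you are (a sketch; it assumes the AWS CLI or environment variables are already configured):

import boto3

# Print the account and caller identity resolved from the configured credentials
identity = boto3.client("sts").get_caller_identity()
print("Account:", identity["Account"])
print("Caller ARN:", identity["Arn"])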

Pipeline Deployment

All pipelines were deployed on AWS SageMaker, together with the Model Registry and Endpoints. At the end of each pipeline there is a line that must be uncommented to run it on AWS:

# Start the pipeline execution (if required)
evaluation_pipeline.start()
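
If the pipeline definition has not been created or updated on SageMaker yet, it needs to be upserted first. A sketch of the full flow, where the role ARN is a placeholder for your own SageMaker execution role:

# Register (or update) the pipeline definition, then start a run and wait for it
evaluation_pipeline.upsert(role_arn="arn:aws:iam::123456789012:role/SageMakerRole")  # placeholder ARN
execution = evaluation_pipeline.start()
execution.wait()
print(execution.describe()["PipelineExecutionStatus"])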

Additionally, the experiments were tracked on Comet ML.
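
A minimal sketch of how metrics can be logged to Comet ML during training; the project name is a placeholder, and the API key is assumed to be available via the COMET_API_KEY environment variable:

from comet_ml import Experiment

# Create an experiment (reads COMET_API_KEY from the environment)
experiment = Experiment(project_name="mlops-aws-insurance")  # placeholder project name
experiment.log_parameter("epochs", 50)   # example hyperparameter
experiment.log_metric("mae", 2500.0)     # example metric value
experiment.end()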