Drift Detection using EvidentlyAI

This repository demonstrates how to detect data drift in a machine learning pipeline using EvidentlyAI integrated with Valohai. It showcases the steps to preprocess data, train a model, and monitor data drift, with automated retraining triggered upon drift detection.

What is Data Drift?

Data drift in machine learning refers to the change in input data distribution or the relationship between input and output data over time, which can adversely affect model performance. Monitoring and managing drift is crucial to maintaining model accuracy and reliability in production.

Training Pipeline

This pipeline preprocesses the data and trains the model. A minimal code sketch of both steps is shown after the list below.

Pipeline Steps:

  1. Data Preprocessing:

    • Load the dataset from Valohai inputs or fetch the California Housing dataset if not available.
    • Preprocess the data.
    • Save the processed data to Valohai with an alias.
  2. Model Training:

    • Load the preprocessed data.
    • Train the model using scikit-learn.
    • Save the trained model with a Valohai alias.
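
A minimal sketch of these two steps, assuming the California Housing fallback and a scikit-learn RandomForestRegressor (the input name, file paths, and model choice below are illustrative assumptions, not necessarily the repository's exact code):

    import os
    import joblib
    import pandas as pd
    from sklearn.datasets import fetch_california_housing
    from sklearn.ensemble import RandomForestRegressor

    # Load the dataset from Valohai inputs, or fall back to California Housing.
    input_path = "/valohai/inputs/dataset/dataset.csv"  # assumed input name
    if os.path.exists(input_path):
        df = pd.read_csv(input_path)
    else:
        df = fetch_california_housing(as_frame=True).frame

    # Preprocess: here simply drop rows with missing values.
    df = df.dropna()

    # Save the processed data to Valohai outputs; an alias can be attached via valohai.yaml or the UI.
    os.makedirs("/valohai/outputs", exist_ok=True)
    df.to_csv("/valohai/outputs/preprocessed.csv", index=False)

    # Train a regression model on the processed data.
    X, y = df.drop(columns=["MedHouseVal"]), df["MedHouseVal"]
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X, y)

    # Save the trained model so it can be aliased and reused for inference.
    joblib.dump(model, "/valohai/outputs/model.pkl")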

Training Pipeline view in Valohai:

training_pipeline.png

Drift Detection Pipeline

This pipeline performs inference with the trained model and detects data drift using EvidentlyAI. A minimal drift-report sketch is shown after the list below.

Pipeline Steps:

  1. Inference and Drift Detection:

    • Load the reference dataset, current dataset, and the trained model.
    • Perform inference on the current dataset.
    • Generate data drift reports using EvidentlyAI.
    • Save the drift reports in various formats (JSON, HTML).
  2. Conditional Retraining:

    • Check if drift is detected based on the reports.
    • If drift is detected, update the status and trigger the retraining pipeline.
    • If no drift is detected, stop the pipeline.
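
A minimal sketch of the drift-report step with Evidently's DataDriftPreset (file paths and input names are assumptions; the preset-based API shown here matches Evidently 0.4-style releases):

    import json
    import pandas as pd
    from evidently.report import Report
    from evidently.metric_preset import DataDriftPreset

    # Load the reference (training-time) data and the current (production) data.
    reference = pd.read_csv("/valohai/inputs/reference/reference.csv")  # assumed input names
    current = pd.read_csv("/valohai/inputs/current/current.csv")

    # Build and run a data drift report comparing the two datasets.
    report = Report(metrics=[DataDriftPreset()])
    report.run(reference_data=reference, current_data=current)

    # Save the report for humans (HTML) and for the pipeline condition (JSON).
    report.save_html("/valohai/outputs/drift_report.html")
    report.save_json("/valohai/outputs/drift_report.json")

    # The preset exposes a dataset-level drift flag that the retraining step can check;
    # printing it as JSON also makes it visible as Valohai execution metadata.
    drift_detected = report.as_dict()["metrics"][0]["result"]["dataset_drift"]
    print(json.dumps({"drift_detected": bool(drift_detected)}))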

Drift Detection Pipeline view in Valohai:

drift_pipeline.png

Overall Flow of the Project

  1. Data is preprocessed and stored.
  2. Model is trained and evaluated.
  3. Inference is performed on new data to detect drift.
  4. If drift is detected, the pipeline triggers retraining with human approval.
  5. If no drift is detected, the pipeline stops.

Visual Representation:

flow_chart.png

Running on Valohai

Configuring the Repository:

To run this code on Valohai from your terminal, follow these steps:

  1. Install the Valohai CLI:

    pip install valohai-cli
  2. Log in to Valohai from the terminal:

    vh login
  3. Create a directory for your project and initialize a Valohai project:

    mkdir valohai-evidently-example
    cd valohai-evidently-example
    vh project create
  4. Clone this repository into your project directory:

    git clone https://github.com/valohai/evidently-example.git .

Running Executions:

To run individual steps:

vh execution run <step-name> --adhoc

Example to run the preprocessing step:

vh execution run preprocess --adhoc

Running Pipelines:

To run the entire pipeline:

vh pipeline run <pipeline-name> --adhoc

Example to run the drift detection pipeline:

vh pipeline run inference-drift-detection-pipeline --adhoc

Working with Secrets

In this project, you need a private API token to call the Valohai API from call-retrain.py.

Note that you should never include the token in your version control. Instead of pasting it directly into your code, we recommend storing it as a secret environment variable; a sketch of reading it inside an execution is shown after the list below.

You can add environment variables in a couple of ways in Valohai.

  • Add the environment variable when creating an execution from the UI (Create Execution -> Environment Variables). The variable is then available only in the execution where it was created.
  • Add a project environment variable (Project Settings -> "Environment Variables" tab -> check the "Secret" checkbox). In this case, the variable is available to all executions in the project.
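
A minimal sketch of how call-retrain.py might read the token inside an execution and authenticate against the Valohai API (the variable name, endpoint, and payload below are illustrative assumptions; see the Valohai API documentation for the exact request schema):

    import os
    import requests

    # Read the secret token from the environment instead of hard-coding it.
    token = os.environ["VH_API_TOKEN"]  # assumed environment variable name

    headers = {"Authorization": f"Token {token}"}

    # Trigger the retraining pipeline through the Valohai REST API.
    payload = {
        "project": "<project-id>",     # placeholder values; the real payload is built
        "title": "training-pipeline",  # from the pipeline definition in valohai.yaml
    }
    response = requests.post("https://app.valohai.com/api/v0/pipelines/", headers=headers, json=payload)
    response.raise_for_status()

Because the token is injected as an environment variable, the same code runs unchanged whether the variable was set per execution or at the project level.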
