mlflow-airflow-exercise

An implementation of continuous training and continuous model deployment using dockerized Airflow & MLflow. ARIMA and auto-ARIMA models are trained on the BTC closing price every day. Data validation results, statistics, and trained models are logged to MLflow, and outdated model versions are deleted automatically.
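
Pruning stale model versions is possible through the MLflow client API. The sketch below is a minimal illustration of that idea, not the repository's actual code; the model name arima-btc and the keep_last policy are assumptions:

    from mlflow.tracking import MlflowClient

    def prune_old_versions(model_name: str = "arima-btc", keep_last: int = 3) -> None:
        """Delete all but the newest `keep_last` versions of a registered model."""
        client = MlflowClient()  # uses MLFLOW_TRACKING_URI from the environment
        versions = client.search_model_versions(f"name='{model_name}'")
        # Sort newest first by version number, then drop everything past keep_last.
        versions = sorted(versions, key=lambda v: int(v.version), reverse=True)
        for v in versions[keep_last:]:
            client.delete_model_version(name=model_name, version=v.version)

    prune_old_versions()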

 

Sample Image

 

Prerequisites

  • Install docker
  • Install docker-compose
  • Create a .env file like the one below for Docker Compose
    • Cloud or external server: port forwarding on ports 8080, 9000, 9001, and 5000 is needed. Use the external IP address below.
    • Local: localhost does not work. Use an internal IP such as 192.x.x.x. How to find your internal IP address: https://www.avast.com/c-how-to-find-ip-address
    MLFLOW_S3_ENDPOINT_URL=http://<EXTERNAL/INTERNAL IP>:9000
    MLFLOW_TRACKING_URI=http://<EXTERNAL/INTERNAL IP>:5000
    
    _AIRFLOW_WWW_USER_USERNAME=airflow
    _AIRFLOW_WWW_USER_PASSWORD=airflow
    
    AWS_ACCESS_KEY_ID=mlflow
    AWS_SECRET_ACCESS_KEY=mlflow
    
    AWS_REGION=us-east-1
    AWS_BUCKET_NAME=mlflow
    MYSQL_DATABASE=mlflow
    MYSQL_USER=mlflow
    MYSQL_PASSWORD=mlflow
    MYSQL_ROOT_PASSWORD=mlflow_pwd
    
    MONGODB_USER=airflow
    MONGODB_PWD=airflow
    MONGODB_HOST=mongoservice
    MONGODB_PORT=27017
    
    # required only if you want to use FRED data
    FRED_API_KEY=<FRED_API_KEY>
    
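Once the services are up (see How to run below), the endpoint values above can be sanity-checked from Python. A minimal sketch, assuming MLflow 1.28+ (for search_experiments) and the tracking URI from .env:

    import mlflow

    # Points at the tracking server configured via MLFLOW_TRACKING_URI in .env.
    mlflow.set_tracking_uri("http://<EXTERNAL/INTERNAL IP>:5000")
    # Listing experiments is a cheap round-trip that verifies connectivity.
    print(mlflow.search_experiments())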

 

How to run

  • Run docker-compose

    $ docker-compose up
    $ docker-compose up --build --remove-orphans --force-recreate
    $ docker-compose up --build --remove-orphans --force-recreate --detach
  • Stop docker-compose

    $ docker-compose down
    $ docker-compose down --volumes --remove-orphans
  • Remove data

    # clean up Airflow & MLflow data, keeping MongoDB data (BTC, ETH, Google News, FRED)
    $ make clean_up
    # clean up all data
    $ make clean_up_all
  • To collect BTC OHLCV data, run the de-upbit2db DAG (DAGs can be triggered from the Airflow UI or programmatically; see the sketch after this list)

  • To train ARIMA and auto-ARIMA, run the ml-arima_pipeline DAG

    • At least 121 days of BTC OHLCV data must be collected before the ARIMA DAG can run.
    • You can change the DAG configs in each DAG's Python file to require fewer days of OHLCV data.
  • Airflow UI: http://<EXTERNAL/INTERNAL IP>:8080

  • MLflow UI: http://<EXTERNAL/INTERNAL IP>:5000

  • MinIO console: http://<EXTERNAL/INTERNAL IP>:9001 (S3 API on port 9000)
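
A DAG run can also be triggered without the web UI, through Airflow's stable REST API. A minimal sketch, assuming Airflow 2.x with the basic-auth API backend enabled and the credentials from .env; replace the placeholder host with your IP:

    import requests

    AIRFLOW_URL = "http://<EXTERNAL/INTERNAL IP>:8080"  # Airflow webserver port

    # Trigger the de-upbit2db DAG; credentials come from .env
    # (_AIRFLOW_WWW_USER_USERNAME / _AIRFLOW_WWW_USER_PASSWORD).
    resp = requests.post(
        f"{AIRFLOW_URL}/api/v1/dags/de-upbit2db/dagRuns",
        json={"conf": {}},
        auth=("airflow", "airflow"),
    )
    resp.raise_for_status()
    print(resp.json()["dag_run_id"])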

 

Data Time

  • Data Request Time
    • Upbit Data: every hour from 00:00 (UTC)
    • Google News: every day at 00:00 (Eastern Time: EST/EDT)
    • FRED Data: every day at 00:00 (Eastern Time: EST/EDT); missing on weekends & holidays
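
In Airflow, schedules like these are usually expressed as cron strings with a timezone-aware start date. A minimal sketch, assuming Airflow 2.3+; the DAG ids and tasks are hypothetical, not the repository's own:

    import pendulum
    from airflow import DAG
    from airflow.operators.empty import EmptyOperator

    # Hourly at minute 0, anchored to UTC (Upbit-style schedule).
    with DAG(
        dag_id="hourly_utc_example",
        schedule_interval="0 * * * *",
        start_date=pendulum.datetime(2022, 1, 1, tz="UTC"),
        catchup=False,
    ):
        EmptyOperator(task_id="fetch")

    # Daily at 00:00 Eastern Time (Google News / FRED-style schedule);
    # pendulum handles the EST/EDT switch automatically.
    with DAG(
        dag_id="daily_eastern_example",
        schedule_interval="0 0 * * *",
        start_date=pendulum.datetime(2022, 1, 1, tz="America/New_York"),
        catchup=False,
    ):
        EmptyOperator(task_id="fetch")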

 

Check data

$ docker ps -a --filter name=mongo
$ docker exec -it <CONTAINER ID> /bin/bash
$ mongo -u airflow -p airflow
> show dbs
> use test_db
> show collections
> db["<collection_name>"].find()
> db["USDT-BTC"].find({}).sort({"candle_date_time_utc": 1}).limit(1)
> db["USDT-BTC"].find({}).sort({"utc_time": -1}).limit(1)
> db["news"].find({}).sort({"etz_time": -1}).limit(1)
> db["fred"].find({}).sort({"etz_time": -1}).limit(1)
> db.dropDatabase()  // drops the current database
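
The same checks can be run from Python with pymongo. A minimal sketch, assuming the MONGODB_* values from .env and the test_db database used above:

    from pymongo import MongoClient

    # Host, port, and credentials match the MONGODB_* entries in .env.
    client = MongoClient("mongodb://airflow:airflow@<EXTERNAL/INTERNAL IP>:27017")
    db = client["test_db"]

    print(db.list_collection_names())

    # Newest BTC candle by UTC time, mirroring the mongo-shell query above.
    for doc in db["USDT-BTC"].find().sort("utc_time", -1).limit(1):
        print(doc)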

 

(Optional) Prerequisites for a local MongoDB

  • Create a MongoDB user
  • Allow connections from a Docker container to the local MongoDB
  • Use docker-compose.localdb.yaml instead of docker-compose.yaml:
    $ docker-compose -f docker-compose.localdb.yaml up
    $ docker-compose -f docker-compose.localdb.yaml up --build --remove-orphans --force-recreate
    $ docker-compose -f docker-compose.localdb.yaml up --build --remove-orphans --force-recreate --detach
    $ docker-compose -f docker-compose.localdb.yaml down
    $ docker-compose -f docker-compose.localdb.yaml down --volumes --remove-orphans

