hsuyuming/airflow_mlflow_kaggle_practice

In a real-world ML system, the ML code is much smaller than the surrounding infrastructure.

Deep learning use cases in the real world.

Machine Learning Platform

About this tutorial

After this tutorial, you will be familiar with:

Apache Airflow - a platform to programmatically author, schedule, and monitor workflows.

Kaggle - an online community of data scientists and machine learners. Kaggle lets users find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

Jupyter Notebook - an open-source web application that lets you create and share documents containing live code, equations, visualizations, and narrative text.

MLflow - an open source platform for the machine learning lifecycle.

Prerequisites

  • Ubuntu >= 16.04
  • Docker
  • docker-compose
  • Memory >= 5 GB

Installation

sudo apt install docker-compose  # install docker-compose
sudo apt-get install docker.io   # install docker
service docker status            # check that the Docker daemon is running

On a Mac, install Docker Desktop instead; see the Docker documentation.

Join Kaggle Competition

House Prices: Advanced Regression Techniques

Set your Kaggle username and API key in kaggle.json

Create a Kaggle API key

cd airflow
vim kaggle.json
# {"username":"<Kaggle account username>", "key":"<API key>"}
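As an alternative to editing the file by hand, the same kaggle.json could be written with a few lines of Python. This is a minimal sketch: the helper name `write_kaggle_credentials` is hypothetical, and restricting the file to the owner is an assumption based on the Kaggle CLI warning about token files that other users can read.

```python
import json
import os
import stat
from pathlib import Path


def write_kaggle_credentials(directory, username, key):
    """Write kaggle.json into `directory` and return its path.

    Hypothetical helper, not part of this repository.
    """
    path = Path(directory) / "kaggle.json"
    path.write_text(json.dumps({"username": username, "key": key}))
    # Assumption: keep the token private to the current user (chmod 600).
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return path
```

For example, `write_kaggle_credentials("airflow", "<Kaggle account username>", "<API key>")` would produce the file described above inside the airflow directory.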

Build

sudo docker-compose build

Usage

sudo docker-compose -f docker-compose.yml up

UI Links

  • mlflow : localhost:5000
  • jupyter notebook : localhost:7000
  • airflow : localhost:8080
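Before opening the links, it can help to confirm that each container is actually listening. The sketch below only checks that a TCP connection succeeds on the three ports listed above; it assumes the services run on the local machine, and the names `SERVICES`, `is_port_open`, and `check_services` are illustrative.

```python
import socket

# Ports taken from the UI links above; the host is assumed to be localhost.
SERVICES = {"mlflow": 5000, "jupyter notebook": 7000, "airflow": 8080}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host="localhost", services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'up' if up else 'not reachable'}")
```

An open port only shows that something is listening there; it does not prove the web UI has finished starting.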

1. Turn on the Airflow DAG

2. Trigger the DAG

3. Open data_visualization.ipynb and start visualizing data

4. Compare ML experiments in MLflow

5. Try to optimize the ML model

# open ./dags/src/training.py and tune parameters
params = {
    "colsample_bytree": 0.4603,
    "gamma": 0.0468,
    "learning_rate": 0.05,
    "max_depth": 20,
    "min_child_weight": 2,
    "n_estimators": 2200,
    "reg_alpha": 0.4640,
    "reg_lambda": 0.8571,
    "subsample": 0.5213,
    "random_state": 7,
    "nthread": -1
}
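Rather than changing one value at a time, candidate settings can be generated systematically and tried one by one. The following is a rough sketch using only the standard library: `BASE_PARAMS` copies a subset of the values above, while `param_grid` and the search ranges are hypothetical and chosen only for illustration. It produces the dictionaries; running each one through training.py is left to you.

```python
import itertools

# Baseline values copied from the params dict above (subset for brevity).
BASE_PARAMS = {
    "learning_rate": 0.05,
    "max_depth": 20,
    "n_estimators": 2200,
    "random_state": 7,
}


def param_grid(base, overrides):
    """Yield one copy of `base` per combination of the override values."""
    names = list(overrides)
    for values in itertools.product(*(overrides[n] for n in names)):
        candidate = dict(base)  # copy so the baseline stays untouched
        candidate.update(zip(names, values))
        yield candidate


if __name__ == "__main__":
    # Hypothetical search ranges, not recommendations.
    grid = {"learning_rate": [0.01, 0.05], "max_depth": [3, 6, 20]}
    for candidate in param_grid(BASE_PARAMS, grid):
        print(candidate)
```

Each generated dictionary can be pasted in place of `params`, so that every DAG run shows up as a separate experiment to compare in MLflow.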

6. Kaggle Leaderboard

Leaderboard

vim /usr/local/airflow/.local/lib/python3.7/site-packages/mlflow/lightgbm.py
:set nu
90gg
lgb_model.booster_.save_model(model_data_path)
