After this tutorial, you will know:

- Apache Airflow - a platform to programmatically author, schedule, and monitor workflows.
- Kaggle - an online community of data scientists and machine-learning practitioners. Kaggle lets users find and publish datasets, explore and build models in a web-based data-science environment, collaborate with other data scientists and machine-learning engineers, and enter competitions to solve data-science challenges.
- Jupyter Notebook - an open-source web application for creating and sharing documents that contain live code, equations, visualizations, and narrative text.
- MLflow - an open-source platform for managing the machine-learning lifecycle.
- Ubuntu >= 16.04
- Docker
- Docker-compose
- Memory >= 5 GB
sudo apt install docker-compose # install docker-compose
sudo apt-get install docker.io # install docker
service docker status # check that the docker daemon is running
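Before moving on, you can confirm both tools are on your PATH with a quick check (a sketch using only Python's standard library; run it with `python3`):

```python
import shutil

# Report whether docker and docker-compose are installed and where they live.
for cmd in ("docker", "docker-compose"):
    found = shutil.which(cmd)
    print(f"{cmd}: {found if found else 'not installed'}")
```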
House Prices: Advanced Regression Techniques
cd airflow
vim kaggle.json
# {"username":"<Kaggle account username>", "key":"<API key>"}
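If you prefer to script this step instead of editing the file in vim, a minimal sketch (the username/key values below are placeholders for your own API token from the Kaggle account page):

```python
import json
import os

# Placeholder credentials -- replace with the values from your Kaggle API token.
creds = {"username": "<Kaggle account username>", "key": "<API key>"}

# Write kaggle.json in the current directory (the airflow/ directory here).
path = "kaggle.json"
with open(path, "w") as f:
    json.dump(creds, f)

# The Kaggle CLI warns when the credentials file is world-readable,
# so restrict permissions to the owner.
os.chmod(path, 0o600)
```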
sudo docker-compose build
sudo docker-compose -f docker-compose.yml up
- mlflow : localhost:5000
- jupyter notebook : localhost:7000
- airflow : localhost:8080
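Once `docker-compose up` finishes, you can verify that each service is listening with a small TCP probe (a sketch; the host/port pairs are taken from the list above, and a service may take a minute to come up after the containers start):

```python
import socket

def is_up(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Ports as exposed by the docker-compose setup described above.
services = {"mlflow": 5000, "jupyter notebook": 7000, "airflow": 8080}
for name, port in services.items():
    status = "up" if is_up("localhost", port) else "not reachable yet"
    print(f"{name} (localhost:{port}): {status}")
```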
# open ./dags/src/training.py and tune parameters
params = {
    "colsample_bytree": 0.4603,
    "gamma": 0.0468,
    "learning_rate": 0.05,
    "max_depth": 20,
    "min_child_weight": 2,
    "n_estimators": 2200,
    "reg_alpha": 0.4640,
    "reg_lambda": 0.8571,
    "subsample": 0.5213,
    "random_state": 7,
    "nthread": -1
}
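These are XGBoost-style hyperparameter names. Before kicking off a long training run, it can be worth sanity-checking the dict (a sketch; the range checks below are illustrative assumptions, not rules from the repo):

```python
# The parameter dict from ./dags/src/training.py (XGBoost-style names).
params = {
    "colsample_bytree": 0.4603,
    "gamma": 0.0468,
    "learning_rate": 0.05,
    "max_depth": 20,
    "min_child_weight": 2,
    "n_estimators": 2200,
    "reg_alpha": 0.4640,
    "reg_lambda": 0.8571,
    "subsample": 0.5213,
    "random_state": 7,
    "nthread": -1,
}

# Sampling fractions must lie in (0, 1].
for key in ("colsample_bytree", "subsample"):
    assert 0 < params[key] <= 1, f"{key} must be a fraction in (0, 1]"

# Tree counts and depth must be positive integers.
for key in ("max_depth", "n_estimators"):
    assert isinstance(params[key], int) and params[key] > 0, f"bad {key}"

# mlflow.log_params(params) would record these against the tracked run.
print(f"{len(params)} parameters validated")
```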
# Fix mlflow's lightgbm model saving: open the file below, turn on line numbers
# (:set nu), jump to line 90 (90gg), and make the save call read:
#   lgb_model.booster_.save_model(model_data_path)
vim /usr/local/airflow/.local/lib/python3.7/site-packages/mlflow/lightgbm.py
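If you'd rather apply the same one-line change non-interactively (e.g. during the image build), it can be scripted; a sketch, assuming the path and line shown above, which may shift between mlflow versions:

```python
from pathlib import Path

def patch_mlflow_lightgbm(path: str) -> bool:
    """Make mlflow's lightgbm module save the sklearn wrapper's underlying Booster.

    Replaces `lgb_model.save_model(` with `lgb_model.booster_.save_model(`.
    Returns True if the file was changed, False if it was already patched.
    """
    p = Path(path)
    src = p.read_text()
    if "lgb_model.booster_.save_model(" in src:
        return False  # already patched
    p.write_text(src.replace("lgb_model.save_model(",
                             "lgb_model.booster_.save_model("))
    return True

# Example (path inside the airflow container, as shown above):
# patch_mlflow_lightgbm(
#     "/usr/local/airflow/.local/lib/python3.7/site-packages/mlflow/lightgbm.py")
```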