This repository accompanies the blog post:
Creating Configurable Data Pre-Processing Pipelines by Combining Hydra and Sklearn - by Eli Simhayev & Benjamin Bodner
When I wrote this blog post, the stable version of Hydra was 1.1. The stable version is now 1.3, so note that this code works with Hydra 1.1 :)
Run:
python main.py preprocessing_pipeline=decision_tree
to execute the decision_tree preprocessing pipeline. You can also run any other pipeline defined in configs/preprocessing_pipeline by changing the override:
python main.py preprocessing_pipeline=<your-pipeline>
Hydra also supports tab completion for config options on the command line.
To add a new pipeline, create a YAML configuration file for it in configs/preprocessing_pipeline.
You can also add other configuration groups, such as which model to use or which visualizations to produce. Learn more here: Hydra — A fresh look at configuration for machine learning projects