
# Lale.jl: Julia wrapper of Python's lale package


Lale.jl is a Julia wrapper of Python's Lale library for semi-automated data science. Lale makes it easy to automatically select algorithms and tune hyperparameters of pipelines that are compatible with scikit-learn, in a type-safe fashion.

More details of the design can be found in the paper: Lale AutoML@KDD.

Instructions for Lale developers can be found here.

For a quick demo, see the Lale Notebook Demo, or view it with NBViewer.

## Package Features

- automation: provides a consistent high-level interface to existing pipeline search tools, including Hyperopt, GridSearchCV, and SMAC
- correctness checks: uses JSON Schema to catch mistakes such as a mismatch between a hyperparameter and its type, or between data and operators
- interoperability: supports a growing library of transformers and estimators

Here is an example of a typical Lale pipeline using the following processing elements: Principal Component Analysis (PCA), NoOp (no operation), Random Forest Regression (RFR), and Decision Tree Regression (DTree):

```julia
lalepipe = (PCA + NoOp) >> (RFR | DTree)
laleopt  = LalePipeOptimizer(lalepipe, max_evals = 10, cv = 3)
laletr   = fit!(laleopt, Xtrain, Ytrain)
pred     = transform!(laletr, Xtest)
```

This block of code jointly searches the hyperparameters of both the Random Forest and Decision Tree learners, selects whichever learner performs better, and at the same time tunes the hyperparameters of the PCA.

The pipe combinator, `p1 >> p2`, first runs sub-pipeline `p1` and then pipes its output into sub-pipeline `p2`. The union combinator, `p1 + p2`, runs sub-pipelines `p1` and `p2` separately over the same data and then concatenates the output columns of both. The or combinator, `p1 | p2`, creates an algorithmic choice that lets the optimizer search and select whichever of `p1` and `p2` yields the better results.
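As a minimal sketch (using only the `laleoperator` constructors and combinators that appear in the sample workflow below), the three combinators can be used individually or composed into a single search space:

```julia
using Lale

# wrapped operators, constructed exactly as in the sample workflow below
pca   = laleoperator("PCA")
noop  = laleoperator("NoOp", "lale")
rfr   = laleoperator("RandomForestRegressor")
dtree = laleoperator("DecisionTreeRegressor")

piped   = pca >> rfr    # pipe: PCA's output feeds the regressor
unioned = pca + noop    # union: concatenate PCA's columns with the untouched input
choice  = rfr | dtree   # or: an algorithmic choice the optimizer resolves

# combinators nest, so a whole search space is one expression
searchspace = (pca + noop) >> (rfr | dtree)
```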

## Installation

Lale is in the Julia General package registry. The latest release can be installed from the Julia REPL prompt:

```julia
julia> using Pkg
julia> Pkg.update()
julia> Pkg.add("Lale")
```

or from Julia's pkg shell, which is triggered by typing `]`:

```julia
julia> ]
pkg> update
pkg> add Lale
```
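As a quick sanity check that the installation works (a minimal sketch using only functions that appear in the sample workflow below):

```julia
julia> using Lale
julia> pca = laleoperator("PCA");   # wrap an operator from scikit-learn
julia> iris = getiris();            # small bundled dataset used in the examples below
```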

## Sample Lale Workflow

```julia
using Lale
using DataFrames: DataFrame

# load data
iris = getiris()
Xreg = iris[:, 1:3] |> DataFrame
Yreg = iris[:, 4]   |> Vector
Xcl  = iris[:, 1:4] |> DataFrame
Ycl  = iris[:, 5]   |> Vector

# regression dataset: hold out 20% for testing
regsplit = train_test_split(Xreg, Yreg; testprop = 0.20)
trXreg, trYreg, tstXreg, tstYreg = regsplit

# classification dataset: hold out 20% for testing
clsplit = train_test_split(Xcl, Ycl; testprop = 0.20)
trXcl, trYcl, tstXcl, tstYcl = clsplit

# lale operators
pca     = laleoperator("PCA")
rb      = laleoperator("RobustScaler")
noop    = laleoperator("NoOp", "lale")
rfr     = laleoperator("RandomForestRegressor")
rfc     = laleoperator("RandomForestClassifier")
treereg = laleoperator("DecisionTreeRegressor")

# Lale regression
lalepipe  = (pca + noop) >> (rfr | treereg)
lale_hopt = LalePipeOptimizer(lalepipe, max_evals = 10, cv = 3)
laletrain = fit(lale_hopt, trXreg, trYreg)
lalepred  = transform(laletrain, tstXreg)
score(:rmse, lalepred, tstYreg) |> println

# Lale classification
lalepipe  = (rb + pca) >> rfc
lale_hopt = LalePipeOptimizer(lalepipe, max_evals = 10, cv = 3)
laletrain = fit(lale_hopt, trXcl, trYcl)
lalepred  = transform(laletrain, tstXcl)
score(:accuracy, lalepred, tstYcl) |> println
```

Moreover, Lale is also compatible with the AutoMLPipeline `@pipeline` macro syntax, where `|>` denotes the pipe combinator:

```julia
# regression pipeline
regpipe       = @pipeline (pca + rb) |> rfr
regmodel      = fit(regpipe, trXreg, trYreg)
regpred       = transform(regmodel, tstXreg)
regperf(x, y) = score(:rmse, x, y)
regperf(regpred, tstYreg) |> println
crossvalidate(regpipe, Xreg, Yreg, regperf)

# classification pipeline
clpipe          = @pipeline (pca + noop) |> rfc
clmodel         = fit(clpipe, trXcl, trYcl)
clpred          = transform(clmodel, tstXcl)
classperf(x, y) = score(:accuracy, x, y)
classperf(clpred, tstYcl) |> println
crossvalidate(clpipe, Xcl, Ycl, classperf)
```