Remove the need to annotate supervised models with a fitresult type #93

Closed
ablaom opened this issue Feb 26, 2019 · 2 comments

ablaom commented Feb 26, 2019

Recall that presently one is required to declare the type of the fitresult returned by a supervised model's fit method in the model struct declaration, as in:

const FitResultType{F} = Tuple{Matrix{F},Vector{F}}
mutable struct KNNRegressor{F} <: Deterministic{FitResultType{F}}
    ...
end

One is allowed to use Any as the fitresult type, but for efficient ensembling of your model the type should be concrete. This is a pain point because the concrete type can be quite complicated. It may also depend on the data and/or hyperparameters in ways that are difficult to anticipate - DecisionTree.jl is a case in point - although this is arguably a design flaw at the level of the algorithm being wrapped.
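
To see how a fitresult type can depend on the data, here is a toy illustration (a stand-in for the phenomenon, not DecisionTree.jl itself):

# Toy fit whose return type tracks the eltype of its input data,
# mimicking the DecisionTree.jl situation described above:
toyfit(x) = (coefs = 2 .* x,)

typeof(toyfit([1, 2, 3]))    # a NamedTuple over Vector{Int64}
typeof(toyfit([1.0, 2.0]))   # a NamedTuple over Vector{Float64}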

There may be other reasons for declaring this type, but I think ensembling is the main one. However, on reviewing the ensembling code (a legacy of Koala), I don't see why the type could not be inferred by the compiler if this code is refactored appropriately, and we could then simplify the API.
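
To sketch what "inferred by the compiler" could look like (hypothetical stand-ins here, not the actual MLJ internals): instead of pre-allocating a vector typed by the declared fitresult type, build the collection of fitresults with a comprehension and let Julia infer a concrete element type on its own:

# Hypothetical stand-in for fitting one atomic model; note that its
# return type is never declared anywhere.
train_one(x::Vector{Float64}) = (mean = sum(x)/length(x), n = length(x))

bags = [rand(100) for _ in 1:500]

# The comprehension lets the compiler infer a concrete element type
# for the fitresults, keeping downstream ensembling code type-stable:
fitresults = [train_one(bag) for bag in bags]
@assert isconcretetype(eltype(fitresults))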

Investigating this is not currently a priority for me, but I am optimistically opening this issue anyway. For now I think model API implementers should feel free to declare their fitresult types as Any.
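
For example, following the declaration pattern above (with a placeholder supertype, so the snippet stands alone):

# Placeholder standing in for the Deterministic{R} of the actual API,
# defined here only to make the example self-contained:
abstract type Deterministic{R} end

# Declaring the fitresult type as Any sidesteps spelling out a
# complicated concrete type:
mutable struct MyKNNRegressor <: Deterministic{Any}
    K::Int
end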

ablaom commented May 1, 2019

Progress report:

I have benchmarked 9 atomic models in ensembles of size 500, run both on the existing code and on smarter alternative code that does not need the fitresult types (see below). The only statistically significant differences detected are, from a practical point of view, slight. The left and right columns give the endpoints of 95% confidence intervals for the run times.

So I will be proceeding with the refactoring of MLJBase, MLJ and MLJModels to eliminate the pesky fitresult type requirement. This will simplify the model interface and eliminate some confusion. This refactoring does not affect regular MLJ users, apart from advanced users building learning networks, and even then only in a minuscule way. I hope to have the changes ready in 2-4 days.
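
Concretely, after the refactoring a declaration like the one in the original post should reduce to something like this (a sketch; the final supertype names may differ slightly):

# No fitresult type parameter on the supertype, and no model type
# parameter needed just to carry it (placeholder supertype again):
abstract type Deterministic end

mutable struct KNNRegressor <: Deterministic
    K::Int
end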

I am sorry I did not get this right from the beginning. This represents some of the earliest code I wrote for Koala.

Run on alternative code (no use of the atomic model's fitresult type):

│ Row │ model                         │ left      │ right     │ memory    │ allocs │
│     │ Union{Missing, String}        │ Float64⍰  │ Float64⍰  │ Int64⍰    │ Int64⍰ │
├─────┼───────────────────────────────┼───────────┼───────────┼───────────┼────────┤
│ 1   │ ConstantClassifier @ 1…95     │ 2.05649e7 │ 2.09617e7 │ 9334208   │ 56142  │
│ 2   │ SVMClassifier @ 7…15          │ 5.60209e8 │ 5.8366e8  │ 26074448  │ 296399 │
│ 3   │ GPClassifier @ 6…49           │ 6.46297e8 │ 7.47014e8 │ 358806368 │ 75646  │
│ 4   │ DecisionTreeClassifier @ 1…62 │ 5.94981e7 │ 6.07129e7 │ 16535376  │ 103086 │
│ 5   │ OLSRegressor @ 5…03           │ 1.92026e8 │ 2.29164e8 │ 121889568 │ 73646  │
│ 6   │ RidgeRegressor @ 1…21         │ 9.29217e7 │ 9.48835e7 │ 78529600  │ 148147 │
│ 7   │ ConstantRegressor @ 1…62      │ 2.71332e7 │ 2.77541e7 │ 29765552  │ 49146  │
│ 8   │ KNNRegressor @ 1…05           │ 7.81976e7 │ 1.03665e8 │ 49393600  │ 52646  │
│ 9   │ DecisionTreeRegressor @ 1…49  │ 5.36255e8 │ 5.88652e8 │ 80310016  │ 255238 │

Run on master (using the declared fitresult type):

julia> master = CSV.read("master.csv")
9×5 DataFrames.DataFrame
│ Row │ model                         │ left      │ right     │ memory    │ allocs │
│     │ Union{Missing, String}        │ Float64⍰  │ Float64⍰  │ Int64⍰    │ Int64⍰ │
├─────┼───────────────────────────────┼───────────┼───────────┼───────────┼────────┤
│ 1   │ ConstantClassifier @ 1…84     │ 2.00471e7 │ 2.0614e7  │ 9338224   │ 56138  │
│ 2   │ SVMClassifier @ 2…15          │ 5.0585e8  │ 5.20881e8 │ 25998624  │ 291405 │
│ 3   │ GPClassifier @ 8…22           │ 6.35922e8 │ 7.409e8   │ 358810384 │ 75642  │
│ 4   │ DecisionTreeClassifier @ 1…37 │ 5.87301e7 │ 6.0047e7  │ 16539120  │ 103075 │
│ 5   │ OLSRegressor @ 4…98           │ 2.00281e8 │ 2.28798e8 │ 121893584 │ 73642  │
│ 6   │ RidgeRegressor @ 1…24         │ 9.58506e7 │ 1.04885e8 │ 78533616  │ 148143 │
│ 7   │ ConstantRegressor @ 8…27      │ 2.69583e7 │ 2.75019e7 │ 29769568  │ 49142  │
│ 8   │ KNNRegressor @ 6…08           │ 1.01598e8 │ 1.33261e8 │ 68845616  │ 54142  │
│ 9   │ DecisionTreeRegressor @ 2…85  │ 5.37627e8 │ 5.78552e8 │ 80313952  │ 255233 │

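For the record, confidence intervals of this kind can be obtained along the following lines (a sketch using BenchmarkTools and a normal approximation; this is not the actual script behind the tables above):

using BenchmarkTools, Statistics

# Hypothetical stand-in for training one ensemble of 500 atoms:
train_ensemble() = [sum(rand(10_000)) for _ in 1:500]

t = @benchmark train_ensemble()

# t.times holds the raw sample times in nanoseconds; a 95% confidence
# interval for the mean run time is then:
m, s, n = mean(t.times), std(t.times), length(t.times)
left, right = m - 1.96 * s / sqrt(n), m + 1.96 * s / sqrt(n)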

ablaom commented May 9, 2019

Resolved on the master branches of MLJBase, MLJModels and MLJ. Live in the upcoming 0.2.0 releases of each.

ablaom closed this as completed May 9, 2019