Curated list of models #716

juliohm · 2020-11-29T01:29:45Z

I am opening this issue to discuss the possibility of a curated list of models.

Right now end-users are forced to rely on a non-trivial macro @load that fails depending on the scope (local vs. global) and can be considered advanced for newcomers.

My opinion is that a curated list should be the recommended workflow where users don't need to bother installing dependencies manually:

using MLJ

# well-tested models available
m1 = DecisionTreeClassifier()
m2 = KNeighborsClassifier()
...

This curated list could be made a dependency of the umbrella package. I don't think users would complain about too many dependencies given that any modern ML pipeline nowadays runs dozens of models at least.

cc: @DilumAluthge

DilumAluthge · 2020-11-29T05:22:42Z

> This curated list could be made a dependency of the umbrella package. I don't think users would complain about too many dependencies given that any modern ML pipeline nowadays runs dozens of models at least.

Can you clarify what the "umbrella package" is?

If the "umbrella package" is MLJ.jl, then I would definitely complain. I don't want ] add MLJ to install the entire kitchen sink.

What's wrong with asking users to install MLJCuratedModels.jl if they want the curated list?

DilumAluthge · 2020-11-29T05:30:06Z

> If the "umbrella package" is MLJ.jl, then I would definitely complain. I don't want ] add MLJ to install the entire kitchen sink.

For example, the ensemble functionality lives inside MLJ.jl. I would be quite annoyed if I had to install a whole bunch of unrelated packages just so I could use MLJ's ensemble functionality.

DilumAluthge · 2020-11-29T05:33:37Z

Now, on the other hand, if we first moved ALL of the functionality out of MLJ.jl into other repos, then I would have no problem adding a whole bunch of dependencies to MLJ.jl.

But as long as there is functionality in MLJ.jl that is not available in another package (MLJBase.jl, etc.), then I am opposed to adding lots of dependencies to MLJ.jl.

DilumAluthge · 2020-11-29T05:37:36Z

So I guess the two options are:

1. Keep MLJ.jl the way it is, and put the curated list in a separate MLJCuratedModels.jl package.
2. Move ALL of the actual features/functionality out of MLJ.jl into separate packages. Once this process is done, we can add MLJCuratedModels.jl as a dependency of MLJ.jl.
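
To make the first option concrete, here is a minimal sketch of what a curated package could look like: a thin module that re-exports one chosen model type per task, so that a single `using` brings the constructors into scope. Everything here is hypothetical. `StubTreeInterface` stands in for a real interface package so that the sketch runs without installing anything, and `CuratedModelsSketch` is not the actual layout of any MLJ package.

```julia
# Stand-in for a real model interface package (hypothetical stub).
module StubTreeInterface
export DecisionTreeClassifier

struct DecisionTreeClassifier
    max_depth::Int
end

# Keyword constructor with a default, mimicking how model types are usually built.
DecisionTreeClassifier(; max_depth = -1) = DecisionTreeClassifier(max_depth)
end

# Hypothetical curated package: it only imports and re-exports chosen models.
module CuratedModelsSketch
using ..StubTreeInterface: DecisionTreeClassifier
export DecisionTreeClassifier
end

using .CuratedModelsSketch

# The constructor is now available without any @load call.
m1 = DecisionTreeClassifier()
```

In a real package the stub would be replaced by dependencies on the chosen interface packages, which is exactly the dependency cost being debated above.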

ablaom · 2020-11-30T02:56:06Z

For the record, MLJ is not intended to load any code, but still has the ensemble.jl stuff. The plan has always been to remove this. Maybe there are a few other small things too, I forget.

Also, @load has been recently improved to eliminate some possible strange behaviour. And - after JuliaAI/MLJModels.jl#244 is complete (almost there!) - @load should work from within packages for any model (only KNN models still use Requires.jl).

I very much like @DilumAluthge's proposal JuliaAI/MLJModels.jl#346 to address the beginner's problem.

@juliohm What do you think?

ablaom · 2020-11-30T20:36:07Z

Also, if you want to load a model directly (no macros), you can call load_path to find out where it lives:

julia> load_path("PCA")
"MLJMultivariateStatsInterface.PCA"

julia> load_path("RandomForestRegressor")
ERROR: ArgumentError: Ambiguous model name. Use pkg=... .
The model RandomForestRegressor is provided by these packages:
 ["DecisionTree", "ScikitLearn"].

Stacktrace:
 [1] info(::String; pkg::Nothing) at /Users/anthony/.julia/packages/MLJModels/GyILf/src/model_search.jl:80
 [2] load_path(::String; pkg::Nothing) at /Users/anthony/.julia/packages/MLJModels/GyILf/src/loading.jl:32
 [3] load_path(::String) at /Users/anthony/.julia/packages/MLJModels/GyILf/src/loading.jl:32
 [4] top-level scope at REPL[16]:1

julia> load_path("RandomForestRegressor", pkg="ScikitLearn")
"MLJScikitLearnInterface.RandomForestRegressor"

julia> using MLJScikitLearnInterface

julia> import MLJScikitLearnInterface.RandomForestRegressor

julia> RandomForestRegressor()
RandomForestRegressor(
    n_estimators = 100,
    criterion = "mse",
    max_depth = nothing,
    min_samples_split = 2,
    min_samples_leaf = 1,
    min_weight_fraction_leaf = 0.0,
    max_features = "auto",
    max_leaf_nodes = nothing,
    min_impurity_decrease = 0.0,
    bootstrap = true,
    oob_score = false,
    n_jobs = nothing,
    random_state = nothing,
    verbose = 0,
    warm_start = false,
    ccp_alpha = 0.0,
    max_samples = nothing) @245
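
The manual steps in the session above (look up the path, then import it) can also be scripted. The following is only a sketch: `import_expr` is a hypothetical helper, not part of MLJ or MLJModels, and it assumes the path has the "Package.Model" form that load_path returned above. It builds the import statement as an expression and leaves the evaluation to the caller, since that step needs the package to be installed.

```julia
# Hypothetical helper: turn a "Package.Model" path into an `import` expression.
function import_expr(path::AbstractString)
    parts = Symbol.(split(path, '.'))
    length(parts) >= 2 ||
        throw(ArgumentError("expected a path like \"Package.Model\", got \"$path\""))
    # Same structure the parser produces for `import Package.Model`.
    return Expr(:import, Expr(:., parts...))
end

ex = import_expr("MLJScikitLearnInterface.RandomForestRegressor")

# With the interface package installed, the import could then be run
# at top level with:
#     eval(ex)
```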

juliohm · 2020-11-30T20:59:03Z

I think my concern is twofold:

(1) We still need manual intervention to get a new model into an existing session. This could be addressed with a yes/no installation prompt triggered by @load whenever a package is missing, so that the user can just press ENTER.

(2) We have too many implementations of the same model, and the user doesn't know which one to use. This could be solved with a curated list of the "best" well-maintained, pure-Julia implementations. For example, DecisionTree.jl is quite mature now, and it doesn't make much sense to load sklearn trees or tree implementations from other languages. I guess we can find similar examples where a single best pure-Julia implementation could be promoted to new Julia users.

Keep in mind that a beginner just wants to load a decision tree, no matter where it comes from or how it is implemented internally. They just want something well-tested that works.
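
Both ideas can be sketched in a few lines. This is an illustration under stated assumptions, not how @load works: `CURATED`, `curated_pkg` and `confirm_install` are hypothetical names, and the registry entries just reflect the DecisionTree.jl preference expressed above. The prompt treats a bare ENTER as "yes"; the actual installation step is left as a comment.

```julia
# Hypothetical curated registry: one recommended package per model name.
const CURATED = Dict(
    "DecisionTreeClassifier" => "DecisionTree",
    "RandomForestRegressor"  => "DecisionTree",
)

# Resolve a model name to its curated package, or fail with a clear message.
curated_pkg(model::AbstractString) = get(CURATED, model) do
    throw(ArgumentError("no curated implementation for $model"))
end

# Ask before installing; an empty answer (just ENTER) counts as "yes".
function confirm_install(pkg::AbstractString; input::IO = stdin, output::IO = stdout)
    print(output, "Package $pkg is not installed. Install it now? [Y/n] ")
    answer = lowercase(strip(readline(input)))
    return answer in ("", "y", "yes")
end

pkg = curated_pkg("RandomForestRegressor")

# In an interactive session one might then do:
#     confirm_install(pkg) && Pkg.add(pkg)   # requires `import Pkg`
```

Passing `input` and `output` as keyword arguments keeps the prompt testable without a terminal.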

juliohm · 2020-12-13T23:28:11Z

> For the record, MLJ is not intended to load any code, but still has the ensemble.jl stuff. The plan has always been to remove this. Maybe there are a few other small things too, I forget.

I fully support this idea. MLJ.jl would then provide a more user-friendly installation for users who are not writing packages but are writing ML pipelines to solve their problems with various models from a curated list. Advanced users seeking a more lightweight dependency for their own packages could use a subpackage of the MLJ.jl stack, like MLJBase.jl or MLJModelInterface.jl, and possibly an MLJEnsemble.jl.

In summary, one must always keep in mind two types of users:

1. Users who want to write ML pipelines with well-tested and readily available models, who don't care about a long list of dependencies in their final application or Pluto notebook.
2. Package writers who want to interface with the MLJ stack and use a subset of the functionality found in subpackages like MLJBase.jl, but cannot afford a dependency on model packages like DecisionTree.jl.

juliohm mentioned this issue Nov 29, 2020

Unsupported const declaration #715

Closed

juliohm closed this as completed Apr 2, 2024
