Relax autotest importance and selected features naming requirement #495

Open
RaphaelS1 opened this issue Apr 29, 2020 · 4 comments

Comments

@RaphaelS1
Contributor

There are a few learners which, for one reason or another (usually conversion to a model matrix), change the original variable names. This means that variable importance and selected features could be offered by the learner but can't be, because the autotest requires these to return the same variable names as the original task. It's a shame to have to get rid of these useful methods because of this restriction.

A few alternative suggestions:

  1. Allow importance and selected_features to be added to the exclude argument in run_autotest
  2. In the autotest, first check if these names match original task names and then check if they match a model.matrix name
  3. Remove this check altogether
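
Suggestion 2 could be sketched roughly as follows, using base R only. This is a minimal illustration under stated assumptions, not the actual autotest code: `original_features()` is a hypothetical helper, the toy data frame stands in for a task, and it assumes the learner's names follow the default `model.matrix()` naming (treatment contrasts). It uses the `"assign"` attribute of the model matrix to map each column back to the feature it was derived from.

```r
# Toy data standing in for a task with one numeric and one factor feature.
data = data.frame(
  cmedv = c(24.0, 21.6, 34.7, 33.4),
  town = factor(c("Salem", "Natick", "Salem", "Lexington"))
)

# Hypothetical helper (not part of mlr3): map names reported by a model
# back to the original feature names.
original_features = function(model_names, data) {
  mm = model.matrix(~ ., data = data)
  labels = attr(terms(~ ., data = data), "term.labels")
  # "assign" gives, per column, the index of the originating term (0 = intercept)
  lookup = setNames(c("(Intercept)", labels)[attr(mm, "assign") + 1L], colnames(mm))
  # Step 1: keep names that already match a feature.
  # Step 2: otherwise look the name up among the model.matrix columns.
  mapped = ifelse(model_names %in% names(data), model_names, lookup[model_names])
  unique(setdiff(unname(mapped), "(Intercept)"))
}

selected = original_features(c("(Intercept)", "cmedv", "townSalem"), data)
print(selected)  # expected: "cmedv" "town"

# The check the autotest could then perform on the mapped names.
all(selected %in% names(data))
```

A name that matches neither the features nor a model matrix column maps to `NA`, so the final check would still fail for genuinely unknown names.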

Reprex below for an important example (i.e. one where selection is often of interest):

library(mlr3); library(mlr3learners.mboost)
learn = lrn("regr.glmboost")
task = tsk("boston_housing")
learn$train(task)
variable.names(learn$model, usedonly = TRUE)
#>             (Intercept)                   cmedv   townBoston Savin Hill 
#>           "(Intercept)"                 "cmedv" "townBoston Savin Hill" 
#>           townLexington              townNatick               townSalem 
#>         "townLexington"            "townNatick"             "townSalem" 
#>            townWinthrop 
#>          "townWinthrop"

Created on 2020-04-29 by the reprex package (v0.3.0)

@pat-s
Member

pat-s commented Apr 29, 2020

> It's a shame to have to get rid of these useful methods because of this restriction.

I'd say it's unfortunate, not a shame.

I'd say 2) sounds good. In your example, are the only differences the quotes, with each variable therefore appearing twice? This could easily be accounted for internally. Are there any other, more complicated cases of name alteration?

@RaphaelS1
Contributor Author

RaphaelS1 commented May 7, 2020

Sorry, I forgot to reply before. The example above omits the original feature names; see the reprex below. Contrast the names returned by the model (first output) with the task's original feature names (second output).

library(mlr3); library(mlr3learners.mboost)
learn = lrn("regr.glmboost")
task = tsk("boston_housing")
learn$train(task)
variable.names(learn$model, usedonly = TRUE)
#>             (Intercept)                   cmedv   townBoston Savin Hill 
#>           "(Intercept)"                 "cmedv" "townBoston Savin Hill" 
#>           townLexington              townNatick               townSalem 
#>         "townLexington"            "townNatick"             "townSalem" 
#>            townWinthrop 
#>          "townWinthrop"
task$feature_names
#>  [1] "age"     "b"       "chas"    "cmedv"   "crim"    "dis"     "indus"  
#>  [8] "lat"     "lon"     "lstat"   "nox"     "ptratio" "rad"     "rm"     
#> [15] "tax"     "town"    "tract"   "zn"

Created on 2020-05-07 by the reprex package (v0.3.0)

mllg modified the milestone: v1.0.0 (stable) Jul 30, 2020
@mllg
Member

mllg commented Jul 30, 2020

Related to #401.

@mllg
Member

mllg commented Jul 9, 2021

selected_features can easily be fixed. For importance, it is more difficult because it is unclear how to aggregate multiple scores into a single one.
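
To illustrate why the aggregation is ambiguous, here is a small sketch with made-up scores. The values, the `feature_of` mapping, and both aggregation rules are assumptions chosen for illustration and are not something mlr3 prescribes. Summing and taking the maximum of the absolute per-column scores give different feature rankings for the same input.

```r
# Made-up scores for model.matrix-style columns (illustrative values only).
scores = c(cmedv = 1.0, townNatick = -0.5, townSalem = 0.75)

# Assumed mapping from each column to the feature it was derived from.
feature_of = c(cmedv = "cmedv", townNatick = "town", townSalem = "town")

# Group the absolute scores by original feature.
groups = split(abs(scores), feature_of[names(scores)])

# Two plausible aggregations, neither obviously the right one.
agg_sum = vapply(groups, sum, numeric(1))  # cmedv = 1.0, town = 1.25
agg_max = vapply(groups, max, numeric(1))  # cmedv = 1.0, town = 0.75

names(agg_sum)[which.max(agg_sum)]  # "town"
names(agg_max)[which.max(agg_max)]  # "cmedv"
```

With the sum, a factor with many levels tends to accumulate a larger score; with the maximum, a single strong level decides the ranking.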

Do you have an overview of which learners need to be patched?
