aorsf - engine: model fit fails if mtry is specified #1276

Open
MasterLuke84 opened this issue Aug 8, 2024 · 3 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@MasterLuke84

Hi,

The model fit fails if mtry is specified for the aorsf engine. If it is not specified, the fit works with the engine's default value.

library(bonsai)
#> Loading required package: parsnip

# This works with default mtry value
rf_mod <- 
  rand_forest() %>%
  set_engine(engine = "aorsf") %>%
  set_mode(mode = "regression") %>% 
  set_args(min_n = 1, trees = 2, importance = "permute") %>% 
  fit(
    formula = mpg  ~ . , 
    data = mtcars 
  )


rf_mod
#> parsnip model object
#> 
#> ---------- Oblique random regression forest
#> 
#>      Linear combinations: Accelerated Linear regression
#>           N observations: 32
#>                  N trees: 2
#>       N predictors total: 10
#>    N predictors per node: 4
#>  Average leaves per tree: 7.5
#> Min observations in leaf: 1
#>           OOB stat value: 0.27
#>            OOB stat type: RSQ
#>      Variable importance: permute
#> 
#> -----------------------------------------



# Error occurs...
rf_mod_w_mtry <- 
  rand_forest() %>%
  set_engine(engine = "aorsf") %>%
  set_mode(mode = "regression") %>% 
  set_args(mtry = 3, min_n = 1, trees = 2, importance = "permute") %>% 
  fit(
    formula = mpg  ~ . , 
    data = mtcars 
  )
#> Error in ncol(source): object 'x' not found

Created on 2024-08-08 with reprex v2.0.2


Thank you in advance and best regards
@MasterLuke84 MasterLuke84 changed the title aorsf - enginemodel fit fails if mtry is specified aorsf - engine: model fit fails if mtry is specified Aug 8, 2024
@simonpcouch
Contributor

Thanks for the issue! Just confirming that I can reproduce this, and that 1) it does seem to be aorsf-specific (e.g. the xgboost engine is not affected) and 2) it doesn't seem to be due to any changes in parsnip (the issue persists with parsnip v1.0.0). min_cols() seems to be evaluated in a different environment than usual.

@bcjaeger
Contributor

bcjaeger commented Sep 21, 2024

Just saw this and thought I'd check it out. The issue appears to occur in parsnip:::make_form_call:

If we modify:

  if (object$engine == "spark") {
    env$x <- env$data
  }

to

  if (object$engine %in% c("spark", "aorsf")) {
    env$x <- env$data
  }

the x object will be recognized when you specify mtry.

@simonpcouch - would this solution be too hacky? A more general approach might be to change the call to aorsf::orsf so that it uses mtry = min_cols(~3, data) instead of mtry = min_cols(~3, x).

@EmilHvitfeldt EmilHvitfeldt added the bug an unexpected problem or unintended behavior label Jun 11, 2025
@EmilHvitfeldt EmilHvitfeldt transferred this issue from tidymodels/bonsai Jun 11, 2025
@EmilHvitfeldt
Member

This is what happens:

For the ranger engine, you end up in form_xy(), where x is placed in the evaluation environment:

parsnip/R/fit_helpers.R

Lines 148 to 157 in 6d4c684

env$x <- data_obj$x
env$y <- data_obj$y
res <- xy_xy(
  object = object,
  env = env, #weights!
  control = control,
  target = target,
  call = call
)

On the other hand, with the aorsf engine, you end up in form_form().

parsnip/R/fit_helpers.R

Lines 40 to 55 in 6d4c684

fit_call <- make_form_call(object, env = env)
res <- list(
  lvl = y_levels$lvl,
  ordered = y_levels$ordered,
  spec = object
)
time <- proc.time()
res$fit <- eval_mod(
  fit_call,
  capture = control$verbosity == 0,
  catch = control$catch,
  envir = env,
  ...
)

The generated call is

aorsf::orsf(formula = mpg ~ ., data = data, mtry = min_cols(~3, 
    x), n_tree = ~2, leaf_min_obs = ~1, n_thread = 1, verbose_progress = FALSE)

and env contains only the elements "formula", "data", and "weights".

This leads to the error: min_cols(~3, x) tries to evaluate x, which does not exist in env or any of its parent environments.
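The mechanism can be reproduced in plain base R, without parsnip or aorsf. This is a sketch under stated assumptions: min_cols_sketch() is a hypothetical stand-in for parsnip's min_cols(), and the two environments only imitate what form_xy() and form_form() place in env as described in this comment. It is not parsnip's code.

```r
# Hypothetical stand-in for parsnip's min_cols(): caps the requested number
# of columns at ncol(source). The real function also warns; this one does not.
min_cols_sketch <- function(num_cols, source) {
  n <- ncol(source)  # forces the `source` promise, i.e. looks up `x` in the calling env
  as.integer(min(num_cols, n))
}

# Evaluation environment with baseenv() as parent, so a stray global `x`
# cannot mask the failure. Only the helper is copied in.
new_fit_env <- function() {
  env <- new.env(parent = baseenv())
  env$min_cols_sketch <- min_cols_sketch
  env
}

# The mtry part of the generated call refers to `x`.
mtry_call <- quote(min_cols_sketch(3, x))

# form_xy()-like path (ranger): `x` (the predictors) is placed in env.
xy_env <- new_fit_env()
xy_env$x <- mtcars[, setdiff(names(mtcars), "mpg")]
xy_env$y <- mtcars$mpg
xy_result <- eval(mtry_call, envir = xy_env)

# form_form()-like path (aorsf): only formula, data and weights exist.
form_env <- new_fit_env()
form_env$formula <- mpg ~ .
form_env$data <- mtcars
form_env$weights <- NULL
form_result <- tryCatch(
  eval(mtry_call, envir = form_env),
  error = function(e) conditionMessage(e)
)

print(xy_result)    # [1] 3
print(form_result)  # "object 'x' not found" (message text may vary by locale)
```

The same call succeeds or fails depending only on whether x is bound in the environment it is evaluated in, which matches the "object 'x' not found" error in the original report.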

We could either do what @bcjaeger suggests or add env$x <- env$data in form_form().
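Both options can be compared in a similar base R sketch. Again, min_cols_sketch() is a hypothetical stand-in for min_cols() that caps silently instead of warning, and the environment only imitates the one form_form() builds; neither is parsnip's implementation.

```r
# Hypothetical stand-in for parsnip's min_cols() (caps without warning).
min_cols_sketch <- function(num_cols, source) {
  as.integer(min(num_cols, ncol(source)))
}

# Mimic form_form(): formula, data and weights only, no `x`.
# baseenv() as parent keeps a stray global `x` out of the lookup.
new_form_env <- function() {
  env <- new.env(parent = baseenv())
  env$min_cols_sketch <- min_cols_sketch
  env$formula <- mpg ~ .
  env$data <- mtcars
  env$weights <- NULL
  env
}

# Option 1: alias `x` to the data, as make_form_call() already does for spark.
env_alias <- new_form_env()
env_alias$x <- env_alias$data
mtry_alias <- eval(quote(min_cols_sketch(3, x)), envir = env_alias)

# Option 2: generate the call with `data` in place of `x`.
env_data <- new_form_env()
mtry_data <- eval(quote(min_cols_sketch(3, data)), envir = env_data)

# `data` still contains the outcome (mpg), so the cap is ncol(mtcars) = 11
# rather than the 10 predictors.
cap <- eval(quote(min_cols_sketch(20, data)), envir = env_data)
```

In this sketch both options make the lookup succeed. One difference from the form_xy() path may be worth checking: data still includes the outcome column, so the upper bound becomes 11 instead of the 10 predictors that x holds in the ranger case. Whether that matters for the real min_cols() check is left open here; the sketch does not establish it.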


4 participants