
Don't coerce sparse data to non-sparse during predict() #950


Merged: 5 commits, merged May 17, 2023


EmilHvitfeldt (Member)

This bug was first reported here: https://community.rstudio.com/t/predict-not-working-with-ranger-model-when-using-sparse-data/163352/7

When you call predict(), it eventually runs prepare_data(), which didn't know about the allow_sparse_x encoding, so it would try to turn sparse input into a matrix or data.frame. This was bad for two reasons. First, we lose performance by densifying data the engine could have used as-is. Second, the ranger method wants its data as a data.frame, and a sparse matrix can't be coerced to one, which yields the error shown below.
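The branching the fix needs can be sketched in a few lines of R. The helper `prepare_new_data_sketch()` below is hypothetical and is not parsnip's actual prepare_data(); it only assumes the engine's allow_sparse_x flag is known at prediction time. A sparse matrix is passed through untouched when the engine accepts it, and is densified explicitly (via as.matrix()) otherwise, since as.data.frame() has no method for a dgCMatrix.

```r
library(Matrix)

# Hypothetical helper, NOT parsnip's prepare_data(): decides how new data
# is handed to an engine's predict method, given the engine's
# allow_sparse_x setting.
prepare_new_data_sketch <- function(new_data, allow_sparse_x) {
  is_sparse <- methods::is(new_data, "sparseMatrix")

  # Engine accepts sparse input: pass it through without densifying.
  if (is_sparse && allow_sparse_x) {
    return(new_data)
  }

  # Engine needs dense input: densify first, because calling
  # as.data.frame() directly on a dgCMatrix errors out.
  if (is_sparse) {
    new_data <- as.matrix(new_data)
  }
  as.data.frame(new_data)
}

# Usage: a small 3 x 2 sparse matrix with two non-zero entries.
m <- Matrix::sparseMatrix(i = c(1, 3), j = c(1, 2), x = c(2, 5), dims = c(3, 2))
kept <- prepare_new_data_sketch(m, allow_sparse_x = TRUE)
dense <- prepare_new_data_sketch(m, allow_sparse_x = FALSE)
```

Keeping the sparse branch first means engines that support sparse input never pay for densification, which is the point of the change.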

Main

library(parsnip)

data(agaricus.train, package = 'xgboost')
train <- agaricus.train

rf_model <- parsnip::rand_forest(trees = 100) %>% 
  set_engine("ranger") %>% 
  set_mode("classification")

rf_fit <- fit_xy(rf_model, x = train$data, y = factor(train$label))

predict(rf_fit, train$data)
#> Error in as.data.frame.default(new_data): cannot coerce class 'structure("dgCMatrix", package = "Matrix")' to a data.frame

This PR

library(parsnip)

data(agaricus.train, package = 'xgboost')
train <- agaricus.train

rf_model <- parsnip::rand_forest(trees = 100) %>% 
  set_engine("ranger") %>% 
  set_mode("classification")

rf_fit <- fit_xy(rf_model, x = train$data, y = factor(train$label))

predict(rf_fit, train$data)
#> # A tibble: 6,513 × 1
#>    .pred_class
#>    <fct>      
#>  1 1          
#>  2 0          
#>  3 0          
#>  4 1          
#>  5 0          
#>  6 0          
#>  7 0          
#>  8 1          
#>  9 0          
#> 10 0          
#> # ℹ 6,503 more rows

EmilHvitfeldt (Member, Author) commented Apr 12, 2023

Small speedup benchmark. (The size of the gain is likely to vary a lot between datasets, but it should almost always be in the right direction.)

Code used for the following reprexes
library(parsnip)

data(agaricus.train, package = 'xgboost')
train <- agaricus.train

xg_model <- parsnip::boost_tree(trees = 100) %>% 
  set_engine("xgboost") %>% 
  set_mode("classification")

xg_fit <- fit_xy(xg_model, x = train$data, y = factor(train$label))

Main

bench::mark(
  old = predict(xg_fit, train$data)
)
#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 old          8.96ms   9.55ms      105.     7.2MB     60.4

This PR

bench::mark(
  new = predict(xg_fit, train$data)
)
#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new          6.24ms   6.65ms      150.    1.13MB     4.17
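Part of the drop in mem_alloc comes from no longer densifying the predictors. A rough way to see the cost of densifying is to compare object sizes for a mostly-zero matrix before and after as.matrix(). This is an illustrative sketch with simulated data (about 1% non-zero entries, loosely mimicking one-hot encoded predictors), not the agaricus data used in the benchmark, so the exact sizes will differ.

```r
library(Matrix)

# Simulated 2000 x 500 predictor matrix with roughly 1% non-zero entries.
set.seed(1)
n_row <- 2000
n_col <- 500
n_nonzero <- n_row * n_col / 100

# Duplicate (i, j) pairs are summed by sparseMatrix(); x = 1 is recycled.
sparse_x <- Matrix::sparseMatrix(
  i = sample(n_row, n_nonzero, replace = TRUE),
  j = sample(n_col, n_nonzero, replace = TRUE),
  x = 1,
  dims = c(n_row, n_col)
)

# Densifying stores every entry, zeros included, as an 8-byte double.
dense_x <- as.matrix(sparse_x)

print(object.size(sparse_x), units = "Kb")
print(object.size(dense_x), units = "Mb")
```

The dense copy grows with rows times columns, while the sparse one grows with the number of non-zero entries, which is why skipping the conversion helps most on wide, sparse data.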

simonpcouch (Contributor) left a comment


Love it. :)

Closes #694. Does this also address #690?

Could we also clarify in these docs that allow_sparse_x now applies at predict() time, too? (I think it does?) Are there any engines that would allow sparsity at fit() time but not at predict() time?

parsnip/R/aaa_models.R, lines 553 to 555 in 51b0cd7:

#' Finally, `allow_sparse_x` specifies whether the model function can natively
#' accommodate a sparse matrix representation for predictors during fitting
#' and tuning.

topepo merged commit 145bac2 into main on May 17, 2023
topepo deleted the sparse-predict branch on May 17, 2023 at 01:00
github-actions commented:

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

github-actions bot locked and limited conversation to collaborators on May 31, 2023

3 participants