CppMethod
error when applying prepped UMAP recipe after saving/reading as .rds
#84
Comments
I am very late to discovering this, but yes this is almost certainly because of the underlying UMAP package (uwot), which uses RcppAnnoy, which itself wraps the C++ library Annoy to find approximate nearest neighbors. The I do intend to fix this but my current solution involves writing an entirely new approximate nearest neighbors package. As that and maintaining
Thanks for the message @jlmelville and for your work on uwot! 🙌 We are also thinking about serialization for trained model objects such as xgboost, torch, etc., that have native methods for saving/loading. Definitely an area that needs some attention from all of us!
This has now been solved with the new bundle package:
library(tidymodels)
library(tidyverse)
library(embed)
split <- seq.int(1, 150, by = 9)
tr <- iris[-split, ]
te <- iris[ split, ]
set.seed(11)
supervised <-
  recipe(Species ~ ., data = tr) %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors()) %>%
  step_umap(all_predictors(), outcome = vars(Species), num_comp = 2) %>%
  prep(training = tr)
library(bundle)
temp_file <- fs::file_temp(pattern = "umap", ext = "rds")
bundle(supervised) %>% write_rds(temp_file)
saved_rec <- read_rds(temp_file)
unbundle(saved_rec) %>% bake(new_data = te)
#> # A tibble: 17 × 3
#>    Species     UMAP1  UMAP2
#>    <fct>       <dbl>  <dbl>
#>  1 setosa      13.3    2.93
#>  2 setosa      12.0    4.69
#>  3 setosa      14.5    3.12
#>  4 setosa      13.5    3.07
#>  5 setosa      13.4    2.99
#>  6 setosa      12.0    4.86
#>  7 versicolor -10.1    8.80
#>  8 versicolor  -9.79   8.28
#>  9 versicolor  -4.91 -11.6
#> 10 versicolor  -9.66   6.12
#> 11 versicolor -10.1    6.61
#> 12 versicolor -10.3    6.98
#> 13 virginica   -4.14 -11.6
#> 14 virginica   -2.69 -12.1
#> 15 virginica   -4.06 -10.3
#> 16 virginica   -1.73 -11.5
#> 17 virginica   -2.33 -10.9
Created on 2022-09-16 with reprex v2.0.2
We should document somewhere that this step needs to be bundled for use in a new session. How do you all want to do that?
Looks like I need to get in on this bundle thing...
I think we should document it as a section. Like we do with
Agreed. We just did this for the parsnip engine docs.
Seems like there is a bug 🐛 for step_umap() when trying to save a prepped recipe as .rds and reading it back to apply it to new data.
Created on 2021-08-02 by the reprex package (v2.0.0)
I'm sure this is not us (i.e. not the embed package) but I wonder if there is anything we can do about this.
The recipe is fine if you don't save as .rds and then read it back.