Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grf qr engine #360

Merged
merged 18 commits into from
Aug 27, 2024
Merged

Grf qr engine #360

merged 18 commits into from
Aug 27, 2024

Conversation

dajmcdon
Copy link
Contributor

Checklist

Please:

  • Make sure this PR is against "dev", not "main".
  • Request a review from one of the current epipredict main reviewers:
    dajmcdon.
  • Make sure to bump the version number in DESCRIPTION and NEWS.md.
    Always increment the patch version number (the third number), unless you are
    making a release PR from dev to main, in which case increment the minor
    version number (the second number).
  • Describe changes made in NEWS.md, making sure breaking changes
    (backwards-incompatible changes to the documented interface) are noted.
    Collect the changes under the next release number (e.g. if you are on
    0.7.2, then write your changes under the 0.8 heading).

@dajmcdon dajmcdon requested a review from dshemetov July 12, 2024 21:30
@dajmcdon dajmcdon requested a review from dsweber2 August 12, 2024 23:09
Copy link
Contributor

@dsweber2 dsweber2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing make_quantile_reg and make_grf_quantiles, it looks like they're both using the same model/engine/mode settings. I remember some discussion about maybe switching so that quantile_regression was a separate mode? I guess if that's happening it's happening some later version?

Also, I'm assuming all parameters not here are just set to their defaults, so sample.fraction=0.5; is there some reason for this particular trio of parameters, and not including say sample.fraction, honesty, honesty.fraction, honesty.prune.leaves, etc?

Overall, mostly just some clarification stuff and some bike-shedding w.r.t which parameters we include. I think it's good to go.

set.seed(12345)
library(grf)
tib <- tibble(
y = rnorm(100), x = rnorm(100), z = rnorm(100),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
y = rnorm(100), x = rnorm(100), z = rnorm(100),
+ y = rnorm(100), x = rnorm(100), z = y-x+ .01 * rnorm(100)+1,

if we make it so that there's something other than noise, the fit models are the same regardless of the state of the seed.
So I tried this, and got different results still, unfortunately.

colnames(p_manual) <- c("0.1", "0.5", "0.9")
p_manual <- tibble::as_tibble(p_manual)
# not equal despite the seed, etc
# expect_equal(p_pars, p_manual)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess is that this is because fseed specifies the c++ seed, whereas set_engine(...seed=fseed) is specifying an R seed, which isn't getting passed down. We may just want to set the tolerances large on expect_equal or something along those lines

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the manual twice and got something roughly at the same size. Given that this is just a wrapper I'm not sure we need to do too extensive of testing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that these two should result in the same thing. So, I'm a bit miffed it doesn't.

library(epipredict)
library(grf)
tib <- tibble::tibble(
  y = rnorm(100), x = rnorm(100), z = rnorm(100),
  f = factor(sample(letters[1:3], 100, replace = TRUE))
)
fseed <- 12345
spec_seed <- rand_forest(mode = "regression", mtry = 2L, min_n = 10) %>%
  set_engine("grf_quantiles", seed = fseed)

manual <- quantile_forest(
  as.matrix(tib[, 2:3]), tib$y,
  quantiles = c(0.1, 0.5, 0.9), seed = fseed,
  mtry = 2L, min.node.size = 10
)
translate(spec_seed)
#> Random Forest Model Specification (regression)
#> 
#> Main Arguments:
#>   mtry = 2
#>   min_n = 10
#> 
#> Engine-Specific Arguments:
#>   seed = fseed
#> 
#> Computational engine: grf_quantiles 
#> 
#> Model fit template:
#> grf::quantile_forest(X = missing_arg(), Y = missing_arg(), mtry = min_cols(~2L, 
#>     x), min.node.size = min_rows(~10, x), seed = fseed, quantiles = c(0.1, 
#> 0.5, 0.9), num.threads = 1L)

Created on 2024-08-13 with reprex v2.1.1

The function is being called with the same args, and passing the seed. I'm not concerned, but I wish the were the same.

@dajmcdon
Copy link
Contributor Author

As for the overall comment, yes, the plan is for {parsnip} to add a mode = "quantile_regression" setting. At that point, we'll probably pivot all of the engines here to that setup, and potentially remove them from here.

@dajmcdon dajmcdon merged commit f187192 into dev Aug 27, 2024
2 checks passed
@dajmcdon dajmcdon deleted the grf-qr-engine branch September 20, 2024 21:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants