test parsnip "overhead" #1071

Merged: 9 commits merged into main from the timings branch on Feb 28, 2024

Conversation

@simonpcouch (Contributor)

Some minimal testing to alert us when we've allowed additional "overhead" to creep in. A truer test of overhead would measure differences rather than ratios, but those numbers are largely system-dependent.

My intent here is not that we'd automatically reject a PR that causes this test to fail, but that it would let us know when a PR changes these ratios and give us a chance to consider whether the slowdown is worth the benefit or ought to be addressed.

time_parsnip_form <-
  timing(fit(parsnip::linear_reg(), mpg ~ ., mtcars))
time_parsnip_xy <-
  timing(fit_xy(parsnip::linear_reg(), mtcars[2:11], mtcars[1]))
@simonpcouch (Contributor, Author)

The bracket subsetting does add some additional compute that's not "our fault," though it's a fraction of time_engine.
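
A minimal sketch of the kind of ratio test being described (the threshold values here are placeholders, not the ones in this PR, and the timing() helper matches the one shown later in the thread):

library(testthat)
library(parsnip)

# total elapsed seconds over 100 evaluations of `expr`
timing <- function(expr) {
  expr <- substitute(expr)
  system.time(replicate(100, eval(expr)))[["elapsed"]]
}

test_that("parsnip overhead stays within expected ratios", {
  skip_on_cran()

  time_engine <- timing(lm(mpg ~ ., mtcars))
  time_parsnip_form <- timing(fit(linear_reg(), mpg ~ ., mtcars))
  time_parsnip_xy <- timing(fit_xy(linear_reg(), mtcars[2:11], mtcars[1]))

  # permissive placeholder cutoffs: flag only a clear increase in relative overhead
  expect_lt(time_parsnip_form / time_engine, 10)
  expect_lt(time_parsnip_xy / time_engine, 10)
})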

@simonpcouch simonpcouch requested a review from topepo February 26, 2024 16:52
@simonpcouch (Contributor, Author)

ubuntu-latest (devel) failure is due to unrelated #1074.

@EmilHvitfeldt (Member)

I feel a little weird about trying to test these, but I agree with the sentiment. I totally agree about the skip on CRAN!

Firstly, how variable are the ratios when you test them locally?

@simonpcouch (Contributor, Author)

> Firstly, how variable are the ratios when you test them locally?

Heard. Here are the distributions of those ratios locally:

library(parsnip)

# elapsed seconds for 100 evaluations of `expr`
timing <- function(expr) {
  expr <- substitute(expr)
  system.time(replicate(100, eval(expr)))[["elapsed"]]
}

time_engine <-
  replicate(100, timing(lm(mpg ~ ., mtcars)))
time_parsnip_form <-
  replicate(100, timing(fit(linear_reg(), mpg ~ ., mtcars)))
time_parsnip_xy <-
  replicate(100, timing(fit_xy(linear_reg(), mtcars[2:11], mtcars[1])))

# distributions of parsnip-to-engine ratios for the formula and xy interfaces
hist(time_parsnip_form / time_engine)

hist(time_parsnip_xy / time_engine)

Created on 2024-02-26 with reprex v2.1.0

@simonpcouch (Contributor, Author)

Given the current timings, we could even use 1000 replicates for each and this test would only take a couple seconds, if we wanted to cut down on that variability further. I've set the thresholds to be quite permissive generally, though.

@EmilHvitfeldt (Member)

That's not a bad idea. Up to you 😄

@topepo (Member) left a comment

Maybe instead of summing the elapsed times for 100 reps, take the median of the reps (or mean) and do a ratio of those.
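
A rough sketch of that suggestion, timing each rep individually (helper and object names here are illustrative):

library(parsnip)

# time a single evaluation of `expr` (elapsed seconds)
time_once <- function(expr) {
  expr <- substitute(expr)
  system.time(eval(expr))[["elapsed"]]
}

reps_engine <- replicate(100, time_once(lm(mpg ~ ., mtcars)))
reps_parsnip <- replicate(100, time_once(fit(linear_reg(), mpg ~ ., mtcars)))

# ratio of medians (or means) of the per-rep timings, rather than of their sums
median(reps_parsnip) / median(reps_engine)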

@simonpcouch (Contributor, Author)

I like the idea of using the median timing! The only issue is that individual fits go very fast, and system.time() only measures in 1000ths of a second, so timings on individual runs would be either 0.000 or 0.001.
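
To illustrate (the exact granularity is system-dependent):

# a single fit lands at or below system.time()'s resolution (about a
# millisecond on many systems), so the elapsed value comes back as 0.000 or 0.001
system.time(lm(mpg ~ ., mtcars))[["elapsed"]]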

@topepo (Member)

topepo commented Feb 27, 2024

You gotta bump them numbers up, e.g. mtcars[rep(1:32, 100), ], or use a different data set (Sacramento?).

@simonpcouch (Contributor, Author)

simonpcouch commented Feb 27, 2024

The idea to use medians was spot on. Prioritizing the consistency of these tests (i.e. avoiding false failures), I propose we add a Suggests on bench and use it in place of system.time(); bench lets us time quickly-running expressions and returns the median run's timing by default. The variability of those ratios decreases drastically with the transition to testing medians rather than sums. :)

library(ggplot2)
library(parsnip)

bm <- function() {
  res <- bench::mark(
    time_engine = lm(mpg ~ ., mtcars),
    time_parsnip_form = fit(linear_reg(), mpg ~ ., mtcars),
    time_parsnip_xy = fit_xy(linear_reg(), mtcars[2:11], mtcars[1]),
    relative = TRUE,
    check = FALSE
  )

  # relative medians for the formula and xy interfaces (the engine fit is the baseline)
  c(form = res$median[2], xy = res$median[3])
}

ratios <- replicate(100, bm())
ratios <- data.frame(t(ratios))

ggplot(ratios) + 
  aes(x = form) + 
  geom_histogram() +
  geom_vline(xintercept = 3)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(ratios) + 
  aes(x = xy) + 
  geom_histogram() +
  geom_vline(xintercept = 3.5)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Created on 2024-02-27 with reprex v2.1.0

The alternative is to test this in extratests to avoid the Suggests, but these aren't really integration tests and moving them brings them farther away from our development cycle.
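
For reference, a minimal sketch of what a bench-based check along these lines could look like; the 3 and 3.5 cutoffs mirror the vertical lines in the reprex above, and the actual merged test may differ in its details:

library(testthat)
library(parsnip)

test_that("fit() and fit_xy() overhead stays within expected ratios", {
  skip_on_cran()
  skip_if_not_installed("bench")

  # with relative = TRUE, `median` is a ratio to the fastest expression
  # (assumed here to be the bare lm() fit)
  res <- bench::mark(
    time_engine = lm(mpg ~ ., mtcars),
    time_parsnip_form = fit(linear_reg(), mpg ~ ., mtcars),
    time_parsnip_xy = fit_xy(linear_reg(), mtcars[2:11], mtcars[1]),
    relative = TRUE,
    check = FALSE
  )

  expect_lt(res$median[2], 3)    # formula interface
  expect_lt(res$median[3], 3.5)  # xy interface
})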

@simonpcouch (Contributor, Author)

The decreased ratio cutoffs in the above comment are due to speedups merged upstream from PRs in the last two days!

@simonpcouch simonpcouch requested a review from topepo February 27, 2024 22:33
@topepo (Member) left a comment

Looks great with bench. Not much dependency overhead either.

@simonpcouch simonpcouch merged commit 36ad315 into main Feb 28, 2024
7 checks passed
@simonpcouch simonpcouch deleted the timings branch February 28, 2024 13:31

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 14, 2024