Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Post-processing write-up #27

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Post-processing write-up #27

wants to merge 4 commits into from

Conversation

topepo
Copy link
Member

@topepo topepo commented May 12, 2023

No description provided.


wflow_individ <-
wflow_2 %>%
add_prob_calibration(cal_object) %>%
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a note about where cal_object comes from

# 'object' is a pre-made calibration object
# Data Inputs: class probabilities
# Data Outputs: class probabilities and recomputed class predictions
add_prob_calibration(object, .priority = 1.0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these also have x as a first argument. Where x is the workflow

# Potentially tunable
# Data Inputs: class probabilities
# Data Outputs: class predictions
add_prob_threshold(threshold = numeric(), .priority = 2.0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might need a levels argument and an ordered argument? Like probably::make_two_class_pred()

# Potentially tunable
# Data Inputs: class probabilities
# Data Outputs: class predictions
add_cls_eq_zone(value = numeric(), threshold = numeric(), .priority = 3.0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In probably probably::make_class_pred() had a buffer argument that created a range of [threshold - buffer[1], threshold + buffer[2]] where anything inside the buffer range was marked equivocal. Maybe you could use the buffer arg here?


Maybe also name it something similar to add_prob_threshold() like add_prob_threshold_buffered() where:

  • add_prob_threshold() always returns a factor (maybe ordered)
  • add_prob_threshold_buffered() always returns a <class_pred> from probably

# User will have to always set the priority
# Data Inputs: all predictions
# Data Outputs: all predictions (only these columns are retained)
add_post_mutate(..., .priority)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider giving all of these functions a common prefix that differentiates them from the other add_*() functions, like add_post_*():

add_post_calibration() # do you need prob vs reg calibration? can you just "figure it out"? can it be an argument?
add_post_threshold()
add_post_threshold_buffered()
add_post_mutate()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conflicted on this. I agree this would be nice for tab completion, but is inconsistent with the naming convention for preprocessors: add_variables(), add_recipe(), add_formula().

- A container for the list of possible post-processors specified by the user.
- A validation system to resolve conflicts in type or priority.
- An interface to apply the operations to the predicted values.
- The requisite package dependencies (primarily the probably package)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am mildly worried about the number of Imports that dev probably has. It is fairly high, and might be worth it to go back and see if some of them can be moved to Suggests as optional deps


- `new_stage_post()`, and `new_action_post()` are existing constructors.

We will require a `.fit_post(workflow, data)` that will execute _only_ the post-processing operations; the `workflows` object will already have trained the `pre` and `fit` stages.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is really an "internal" function, I think we can just assume that the workflow has already trained the pre and fit stages, i.e. we don't need to try to do any checks to see if that is true or not. It should only be used by workflows internally and tune

@simonpcouch
Copy link

Looking at this now—will merge some of Davis' suggestions and push some small edits. Will leave a review when finished!

simonpcouch and others added 2 commits May 16, 2023 12:50
Co-authored-by: Davis Vaughan <davis@rstudio.com>
Copy link

@simonpcouch simonpcouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few big-picture edits from me beyond Davis' comments!

- An interface to apply the operations to the predicted values.
- The requisite package dependencies (primarily the probably package).

The dwai code may eventually make its way into workflows or probably.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is our plan, I might argue we put this functionality in workflows or probably from the get-go. Feels a bit like this living its own package could eventually feel like technical debt.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants