-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Post-processing write-up #27
base: main
Are you sure you want to change the base?
Conversation
|
||
wflow_individ <- | ||
wflow_2 %>% | ||
add_prob_calibration(cal_object) %>% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a note about where cal_object
comes from
# 'object' is a pre-made calibration object | ||
# Data Inputs: class probabilities | ||
# Data Outputs: class probabilities and recomputed class predictions | ||
add_prob_calibration(object, .priority = 1.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these also have x
as a first argument. Where x
is the workflow
# Potentially tunable | ||
# Data Inputs: class probabilities | ||
# Data Outputs: class predictions | ||
add_prob_threshold(threshold = numeric(), .priority = 2.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might need a levels
argument and an ordered
argument? Like probably::make_two_class_pred()
# Potentially tunable | ||
# Data Inputs: class probabilities | ||
# Data Outputs: class predictions | ||
add_cls_eq_zone(value = numeric(), threshold = numeric(), .priority = 3.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In probably probably::make_class_pred()
had a buffer
argument that created a range of [threshold - buffer[1], threshold + buffer[2]]
where anything inside the buffer range was marked equivocal. Maybe you could use the buffer
arg here?
Maybe also name it something similar to add_prob_threshold()
like add_prob_threshold_buffered()
where:
add_prob_threshold()
always returns a factor (maybe ordered)add_prob_threshold_buffered()
always returns a<class_pred>
from probably
# User will have to always set the priority | ||
# Data Inputs: all predictions | ||
# Data Outputs: all predictions (only these columns are retained) | ||
add_post_mutate(..., .priority) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would consider giving all of these functions a common prefix that differentiates them from the other add_*()
functions, like add_post_*()
:
add_post_calibration() # do you need prob vs reg calibration? can you just "figure it out"? can it be an argument?
add_post_threshold()
add_post_threshold_buffered()
add_post_mutate()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conflicted on this. I agree this would be nice for tab completion, but is inconsistent with the naming convention for preprocessors: add_variables()
, add_recipe()
, add_formula()
.
post-processing/readme.qmd
Outdated
- A container for the list of possible post-processors specified by the user. | ||
- A validation system to resolve conflicts in type or priority. | ||
- An interface to apply the operations to the predicted values. | ||
- The requisite package dependencies (primarily the probably package) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am mildly worried about the number of Imports that dev probably has. It is fairly high, and might be worth it to go back and see if some of them can be moved to Suggests as optional deps
|
||
- `new_stage_post()`, and `new_action_post()` are existing constructors. | ||
|
||
We will require a `.fit_post(workflow, data)` that will execute _only_ the post-processing operations; the `workflows` object will already have trained the `pre` and `fit` stages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is really an "internal" function, I think we can just assume that the workflow has already trained the pre
and fit
stages, i.e. we don't need to try to do any checks to see if that is true or not. It should only be used by workflows internally and tune
Looking at this now—will merge some of Davis' suggestions and push some small edits. Will leave a review when finished! |
Co-authored-by: Davis Vaughan <davis@rstudio.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few big-picture edits from me beyond Davis' comments!
- An interface to apply the operations to the predicted values. | ||
- The requisite package dependencies (primarily the probably package). | ||
|
||
The dwai code may eventually make its way into workflows or probably. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is our plan, I might argue we put this functionality in workflows or probably from the get-go. Feels a bit like this living its own package could eventually feel like technical debt.
No description provided.