find-threshold: CLI command for multi-label classifier threshold tuning #11280
I realize this is a draft, but some general concerns:
The |
Should be possible. Would it be acceptable if we ditch the automatic component recognition then and always require naming the component to be evaluated?
Wasn't aware of this.
I'll look into harmonizing the arguments.
Can you elaborate on how modifying thresholds would affect annotations for other components?
In general the situation is that you have a component that has:
Examples:
I would initially think that |
Are there smart(-ish) ways to...
We can hardcode this ofc, but I was wondering whether there's a better way to do this. 1. could be done by checking for the existence of a |
No, I think we need to rely on the user to provide a path to the threshold in the config and the scores key to optimize. In v4, I'm planning to move all these settings out of |
It's not just the scores key, I think. The scoring method in Clarification: I interpret "scores key" to be the |
The component already has a registered scorer, so what I mean by "scores key" is the entry in the output of |
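To make the two user-supplied pieces discussed above concrete (a path to the threshold in the component config, and the scores key to optimize), here is a minimal sketch in plain Python. It is not the code from this PR: `set_by_path`, the example config, and the example scores dict are all hypothetical and only illustrate the idea.

```python
from typing import Any, Dict, Sequence


def set_by_path(config: Dict[str, Any], path: Sequence[str], value: Any) -> Dict[str, Any]:
    """Return a copy of `config` with the nested entry at `path` set to `value`.

    Hypothetical helper: the input dict is not modified, and a missing key
    raises KeyError instead of silently creating a new setting.
    """
    if not path:
        raise ValueError("path must contain at least one key")
    updated = dict(config)
    key = path[0]
    if len(path) == 1:
        if key not in updated:
            raise KeyError(key)
        updated[key] = value
    else:
        updated[key] = set_by_path(config[key], path[1:], value)
    return updated


# Hypothetical component config and scorer output, for illustration only.
component_cfg = {"threshold": 0.5, "scorer": {"@scorers": "hypothetical.scorer.v1"}}
scores = {"cats_macro_f": 0.71, "cats_micro_f": 0.74}

# The user names the path to the threshold and the scores key to optimize.
trial_cfg = set_by_path(component_cfg, ["threshold"], 0.3)
metric = scores["cats_macro_f"]
```

Taking the path as a sequence of keys keeps the helper independent of where a given component stores its threshold, which is the point of asking the user for it rather than hardcoding it.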
… 'spacy evaluate' CLI.
The latest commit should be closer to a generic solution. Two remarks:
|
Added a draft for integrating |
I don't think beta makes sense as a
The existing textcat scorer is kind of a bad example because For example, you could have two spancat components with different |
Let me know if changes for |
Can you have a look at the conflicts?
34c6c3b to 188a7d0
This is quite weird. Apparently my |
Should be fine now. Are we ok with this? Then I'd update the docs.
This will be a useful CLI tool to have, nice work!
I mainly had some comments around UX and documentation. It would be a good idea to document some standard settings for this command (like spacy find-threshold my_nlp data.spacy textcat_multilabel threshold cats_macro_f) that users can just copy-paste if they're working with standard pipelines/configs.
I'll include some in the docs. Do you have any suggestions other than this one you'd like to have included?
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
It'd be nice to include one for each of the main pipeline components we see as relevant - currently mainly multilabel textcat & spancat, no?
# Conflicts: # website/docs/api/cli.md
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Looks good to me! I'll leave it open for one more day in case anyone else wanted to do a final review.
Goal
Add a find-threshold CLI command that investigates different threshold values for classification models and returns the ones maximizing the specified score.
Description
find-threshold; API call is find_threshold().
spacy.tests.test_cli.
Supported options are:
pipe_name: Which pipe to evaluate (for pipelines with multiple MultiLabel_TextCategorizer components the name has to be specified; otherwise it's optional).
average: Whether to use micro or macro to compute the F-score over all labels.
n_trials: Number of sample points in threshold space between 0 and 1.
beta: Beta coefficient for F-score calculation.
Types of change
New feature.
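For readers who want to see how the options in the description fit together, here is a rough standalone sketch of a threshold sweep. It is not the implementation in this PR and uses no spaCy APIs: `f_beta`, `score_at`, `find_best_threshold`, and the toy data are hypothetical names. It assumes per-document label scores and boolean gold labels, samples `n_trials` evenly spaced thresholds in [0, 1], and keeps the lowest threshold that reaches the best micro- or macro-averaged F-beta score.

```python
from typing import Dict, List, Sequence, Tuple

Preds = List[Dict[str, float]]  # per document: label -> predicted score
Gold = List[Dict[str, bool]]    # per document: label -> gold annotation


def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta from raw counts; defined as 0.0 when there is nothing to count."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def score_at(threshold: float, preds: Preds, gold: Gold, labels: Sequence[str],
             average: str = "macro", beta: float = 1.0) -> float:
    """Score one threshold: a label is predicted if its score >= threshold."""
    counts = {label: [0, 0, 0] for label in labels}  # [tp, fp, fn] per label
    for doc_scores, doc_gold in zip(preds, gold):
        for label in labels:
            predicted = doc_scores.get(label, 0.0) >= threshold
            actual = doc_gold.get(label, False)
            if predicted and actual:
                counts[label][0] += 1
            elif predicted:
                counts[label][1] += 1
            elif actual:
                counts[label][2] += 1
    if average == "micro":
        # Pool the counts over all labels, then compute a single F-score.
        tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
        return f_beta(tp, fp, fn, beta)
    if average == "macro":
        # Compute an F-score per label, then take the unweighted mean.
        return sum(f_beta(*c, beta) for c in counts.values()) / len(labels)
    raise ValueError(f"average must be 'micro' or 'macro', got {average!r}")


def find_best_threshold(preds: Preds, gold: Gold, labels: Sequence[str],
                        n_trials: int = 11, average: str = "macro",
                        beta: float = 1.0) -> Tuple[float, float]:
    """Try n_trials evenly spaced thresholds in [0, 1]; ties keep the lowest."""
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    best_threshold, best_score = 0.0, -1.0
    for i in range(n_trials):
        threshold = i / (n_trials - 1)
        score = score_at(threshold, preds, gold, labels, average, beta)
        if score > best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Toy data, for illustration only.
toy_preds = [{"A": 0.9, "B": 0.2}, {"A": 0.4, "B": 0.7}, {"A": 0.1, "B": 0.6}]
toy_gold = [{"A": True, "B": False}, {"A": True, "B": True}, {"A": False, "B": True}]
best = find_best_threshold(toy_preds, toy_gold, ["A", "B"], n_trials=11)
```

A grid of evenly spaced points is the simplest reading of "sample points in threshold space"; the resolution of the result is limited to 1 / (n_trials - 1), so a coarse grid can miss a narrow optimum.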
Checklist