get_bugbug_labels no longer adds nobug type to regression training data #3396
#539

Modified `get_bugbug_labels` in `defect.py` to include only those data points that are labelled either `regression` or `bug_no_regression` in the training set.

Training the model without changes
72486 non-regression bugs
Cross Validation scores:
Accuracy: f0.9731263445549161 (+/- 0.0012810455820845609)
Precision: f0.9560802008310938 (+/- 0.006503421458310747)
Recall: f0.9316432362619518 (+/- 0.0042866900183067425)
Training the model after changes
71597 non-regression bugs (889 dropped)
Cross Validation scores:
Accuracy: f0.9739072259525028 (+/- 0.0019480324611321944)
Precision: f0.9561803892880535 (+/- 0.006928496874119621)
Recall: f0.9358629670750973 (+/- 0.0045683573571298)
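Minor improvement in recall (0.9316 to 0.9359). Precision is essentially unchanged (0.9561 vs. 0.9562), and all three differences are within the reported cross-validation spread.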
Should the categories `task`, `enhancement`, and `feature` also be removed from the training data for regression?

Please let me know if I have misunderstood the task.
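
For reviewers, here is a minimal sketch of the filtering idea described above. It is not the actual diff: `KEPT_TYPES`, `regression_classes`, and the sample rows are hypothetical names and made-up data used only for illustration. The label type names other than `regression` and `bug_no_regression` are placeholders.

```python
# Hypothetical sketch: keep only the label types that are meaningful
# for the regression model and map them to binary classes.
KEPT_TYPES = {"regression", "bug_no_regression"}  # assumed allow-list


def regression_classes(rows, kept_types=KEPT_TYPES):
    """Map bug_id -> 1 (regression) or 0 (non-regression).

    Rows whose label type is not in kept_types are skipped entirely,
    so they never reach the training set.
    """
    classes = {}
    for bug_id, label_type in rows:
        if label_type not in kept_types:
            continue  # e.g. "nobug" is dropped instead of counted as non-regression
        classes[int(bug_id)] = 1 if label_type == "regression" else 0
    return classes


# Made-up sample rows: (bug_id, label_type)
rows = [
    (1, "regression"),
    (2, "bug_no_regression"),
    (3, "nobug"),
    (4, "bug_unknown_regression"),
]
classes = regression_classes(rows)
print(classes)
```

With an allow-list like this, any other category is excluded by default unless it is explicitly added to the set, which is one way the question about `task`, `enhancement`, and `feature` could be settled.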