-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Grouped instance metric inherit from InstanceMetrics #452
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @sam-data-guy-iam thanks for this new PR, looks good!
i left a few comments, please have a look..
src/unitxt/version.py
Outdated
@@ -1 +1 @@ | |||
version = "1.4.3" | |||
version = "1.4.3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this file shouldn't be in the PR, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes I guess this can be deleted from the PR, since it is just from the merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I updated the branch with the merge, but I'm not sure what to do to fix this in particular. I guess this file can just be removed from the PR?
src/unitxt/metrics.py
Outdated
group_total_scores = [ | ||
score for score in group_total_scores if not np.isnan(score) | ||
] | ||
# ignore NaNs in aggregation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is not what the line below does.. should it be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a warnings catch for the RuntimeWarning from nanmean
src/unitxt/metrics.py
Outdated
ci = bootstrap( | ||
(scores,), | ||
statistic=mean, | ||
(identifiers,), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please explain why pass the identifiers
and not the instances
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I found a work-around. I had copied this part of the code from the GlobalMetric (which might not need it either).
src/unitxt/metrics.py
Outdated
try: | ||
return aggregation_func(instances, score_name) | ||
except Exception as e: | ||
# this happens in edge cases, for example, when the sampling creates a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this something that happens? because the comment talks about bleu which is a GlobalMetric.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right, I copied this from GlobalMetric confidence intervals. In this case, since the instance scores are already computed, there is no additional computation on the instances (unless there is a corner case where one designs an aggregation function to do something weird like this) there should be no issue, and the confidence interval already deals with the case where there are NaNs. I am removing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @sam-data-guy-iam tks for making the changes, i went over the changes in metrics.py again, pls have a look ..
src/unitxt/metrics.py
Outdated
|
||
@property | ||
@abstractmethod | ||
def reduction_map(self) -> dict: | ||
pass | ||
|
||
def _validate_group_mean_reduction(self): | ||
if "group_mean" in self.reduction_map: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since _validate_group_mean_reduction
is called after checking if reduction == "group_mean":
then perhaps there is no need to check again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no problem, adding it as an assert without an if condition, just in case
src/unitxt/metrics.py
Outdated
if reduction == "mean": | ||
from statistics import mean | ||
@staticmethod | ||
def aggregate(instances, field_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def aggregate(instances, field_name): | |
def average_instance_scores(instances, field_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renaming this and the group function
src/unitxt/metrics.py
Outdated
def process(self, stream: Stream, stream_name: Optional[str] = None) -> Generator: | ||
instances, global_score = self.compute_instance_scores(stream, stream_name) | ||
|
||
for reduction, fields in self.reduction_map.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for reduction, fields in self.reduction_map.items(): | |
for reduction_type, reduction_params in self.reduction_map.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improved readability IMO, pls also see suggestions below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accepted
src/unitxt/metrics.py
Outdated
|
||
if reduction == "group_mean": | ||
self._validate_group_mean_reduction() | ||
score_fields = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
score_fields = ( | |
reduction_fields = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accepted
src/unitxt/metrics.py
Outdated
|
||
aggregation_func = None | ||
if reduction == "mean": | ||
aggregation_func = self.aggregate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aggregation_func = self.aggregate | |
aggregation_func = self.aggregate | |
reduction_fields = reduction_params |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accepted
src/unitxt/metrics.py
Outdated
score_names = ( | ||
self.ci_scores if self.ci_scores is not None else [self.main_score] | ||
) | ||
if score_names is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest to always explicitly pass this param, i.e. remove the default, and update the relevant calls
src/unitxt/metrics.py
Outdated
with warnings.catch_warnings(): | ||
# in case instances is empty, return NaN but avoid printing a RuntimeWarning | ||
warnings.simplefilter("ignore", category=RuntimeWarning) | ||
return np.nanmean(scores) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happened before your changes if some of the instance scores were NaN? Is it still the same behaviour?
…ted CIs. score_based_confidence_interval accepts list of score fields without definining bootstrap function
move aggregate_instance_scores as static method to MetricWithConfidenceInterval so can be used in score_based_confidence_interval
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #452 +/- ##
==========================================
+ Coverage 88.03% 88.40% +0.36%
==========================================
Files 87 87
Lines 7440 7699 +259
==========================================
+ Hits 6550 6806 +256
- Misses 890 893 +3 ☔ View full report in Codecov by Sentry. |
…aphrase_accuracy.json rename to norm hedges g
…/unitxt into aggregation_inherit_instance
I used deepcopy to copy the instances because I didn't want to tie the resamples list to instances by assignment because we later modify the instances, and I didn't want there to be unexpected results (have had this happen in the past). |
@sam-data-guy-iam |
fair enough. I will remove them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @sam-data-guy-iam , pls see my comments regarding the tests.
also left some very minor suggestions for the non-test code
…n of prediction and reference to strings internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! thanks @sam-data-guy-iam for this new feature!
…ues, so CI will not be NaN
…-NaN values, so CI will not be NaN" This reverts commit 6249538.
…ues, so CI will not be NaN
Rebase failed
Rebase failed
Please ignore the earlier PR for aggregation metrics and use this branch instead. I took @matanor s suggestion to modify InstanceMetric and make the grouped mean type metrics inherit from the InstanceMetric classes that define the instance calculation.
Also: Closes #389