Aggregation metrics #430
Conversation
add effect sizes
add metric_significant_pairs_graph
…erence p-value represents
… prediction rather than the reverse.
…ean and PDR for both exact and string_containment instance accuracies
add tests for global and confidence interval for string containment mean and pdr
Codecov Report

Attention: additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #430      +/-   ##
==========================================
+ Coverage   93.37%   93.46%   +0.08%
==========================================
  Files         177      178       +1
  Lines        7340     7512     +172
==========================================
+ Hits         6854     7021     +167
- Misses        486      491       +5
```

☔ View full report in Codecov by Sentry.
```diff
@@ -8,3 +8,5 @@ scikit-learn
 dpath
 jiwer
 editdistance
+statsmodels
```
Sam. Are these requirements needed for the change? I don't see their use.
src/unitxt/metrics.py (outdated):

```python
from statistics import mean

metric = Accuracy
```
Probably best not to have a default to make sure people override it.
Also, why not initialize with the class object? Meaning: metric = Accuracy()
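A minimal sketch of the options being discussed, using a stand-in Accuracy class rather than unitxt's real one (the surrounding metric machinery is omitted):

```python
class Accuracy:
    def compute(self, references, predictions):
        return float(predictions in references)

class GroupedMetricA:
    metric = Accuracy    # class object: must still be instantiated before use

class GroupedMetricB:
    metric = Accuracy()  # an instance, as suggested above

class GroupedMetricBase:
    metric = None        # no default: forces subclasses to set their own metric
```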
src/unitxt/metrics.py (outdated):

```python
references, predictions, additional_inputs
):
    # allow any number of columns to be used as the identifier
    keyname = tuple(inputs_dict.values())
```
I think the user needs to explicitly specify which fields from additional_inputs to take. There could be many that are irrelevant.
In cards, is there a way for the user to specify which fields are relevant to be included in additional_inputs? I assume perhaps there are cases where additional_inputs fields could be used for calculation of the accuracy metric but NOT the grouping (which the aggregation function operates on). I assumed that all fields in additional_inputs would be used for grouping. What did you envision?
Here is the question: in other cases with metrics, there are class attributes like main_score, reduction_map, etc. which are integral to defining the metric and are fixed for each version of the metric. However, the grouping variables in additional_inputs would depend on the dataset, not the metric. Therefore, it made sense to me that the user would specify these fields externally, such as in the cards for the dataset, and that they would just affect additional_inputs directly. Currently there aren't any metrics that I can see that actually use additional_inputs, so it is unclear to me which examples to follow.
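A hypothetical sketch of the explicit-fields approach under discussion; `grouping_fields` is an invented attribute name, not unitxt's actual API:

```python
class GroupedInstanceMetricSketch:
    # User-specified and dataset-dependent, e.g. set via the dataset card.
    grouping_fields = ["document_id"]

    def group_key(self, additional_inputs: dict) -> tuple:
        # Build the identifier only from the requested fields rather than every
        # column, since additional_inputs may carry many irrelevant entries.
        return tuple(additional_inputs[f] for f in self.grouping_fields)
```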
src/unitxt/metrics.py (outdated):

```python
metric = Accuracy
group_score_func = mean
group_score_name = "mean"
```
Is 'group_score_name' used?
src/unitxt/metrics.py (outdated):

```python
from statistics import mean

metric = Accuracy
group_score_func = mean
```
Maybe call it 'group_score_aggregation_func'?
Overall, looks good and useful. A few requested changes and questions.
I'd prefer that @elronbandel also reviews.
Hi @sam-data-guy-iam, looks good! Please see my comments.
src/unitxt/metrics.py (outdated):

```
@@ -1430,3 +1431,116 @@ def _compute(
        for k in self.k_list:
            result[self.score_name(measure_name, k)] = measure_array[min(k, max_k)]
        return result


class GroupedInstanceMetric(GlobalMetric):
```
I am not sure that deriving from GlobalMetric is suitable here, since the underlying metric is computed per instance. So it's a bit confusing: the goal of a global metric is to allow computation that depends on a set of instances, but here we are doing per-instance computations.

I think a better option could be modifying InstanceMetric.process(...) to support grouping. After the loop that calculates the scores per instance, you could group the scores by the group key, if one is provided. This requires keeping track of the group key per score. Then the code could loop over the reduction map and do the computation per group. If no group key is specified, there will be only one group containing all the instances. The final step would be to average over all group scores (which would be a single score if no groups are defined). What do you think?
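A rough sketch of the flow proposed above, not unitxt's actual process(); `instances` and `score_instance` stand in for the real stream and per-instance metric computation:

```python
from collections import defaultdict
from statistics import mean

def grouped_reduction(instances, score_instance, group_key=None):
    groups = defaultdict(list)
    for inst in instances:
        score = score_instance(inst)              # existing per-instance loop
        key = inst.get(group_key) if group_key else "all"
        groups[key].append(score)                 # track the group of each score
    # Reduce within each group, then average across groups; with no group
    # key there is a single group, so this collapses to the plain mean.
    group_scores = [mean(scores) for scores in groups.values()]
    return mean(group_scores)
```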
…add its own confidence interval method. Add instance with F1 score
After more consideration, I rewrote this as a sub-class of InstanceMetric, as suggested by @matanor. It can now take an InstanceMetric as well as a GlobalMetric (and possibly others, I'm not sure), since these have a process method. I added a confidence interval method that resamples the instances and extracts the scores, without having to recalculate the instance scores each time, as GlobalMetric would have required. Please take a look and let me know what you think.
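A simplified sketch of the resampling idea described here: because the per-instance scores are already computed, the bootstrap only resamples the score list and never re-runs the metric. All names are illustrative, not the PR's actual implementation:

```python
import random
from statistics import mean

def bootstrap_ci(instance_scores, n_resamples=1000, alpha=0.05, seed=42):
    rng = random.Random(seed)
    n = len(instance_scores)
    # Each resample draws n scores with replacement and records the mean;
    # no per-instance metric computation is repeated.
    estimates = sorted(
        mean(rng.choices(instance_scores, k=n)) for _ in range(n_resamples)
    )
    low = estimates[int((alpha / 2) * n_resamples)]
    high = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high
```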
@matanor @elronbandel - Can we proceed with this PR and see if there are any additional changes needed (besides the fix to the conflict)?
@sam-data-guy-iam is this PR replaced by your new PR #452?
Yes
@sam-data-guy-iam OK, so I am closing this PR.
Addition of GroupedInstanceMetric and derived metrics @matanor @elronbandel