
Aggregation metrics #430

Closed
wants to merge 61 commits into from
Conversation

sam-data-guy-iam
Collaborator

Addition of GroupedInstanceMetric and derived metrics @matanor @elronbandel

Samuel Ackerman added 30 commits November 7, 2023 19:37
Samuel Ackerman added 4 commits December 25, 2023 21:32
…ean and PDR for both exact and string_containment instance accuracies
add tests for global and confidence interval for string containment mean and pdr

codecov bot commented Dec 25, 2023

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (a576b32) 93.37% compared to head (cb35a36) 93.46%.

Files                  Patch %   Lines
src/unitxt/metrics.py  96.73%    3 Missing ⚠️
tests/test_metrics.py  95.12%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #430      +/-   ##
==========================================
+ Coverage   93.37%   93.46%   +0.08%     
==========================================
  Files         177      178       +1     
  Lines        7340     7512     +172     
==========================================
+ Hits         6854     7021     +167     
- Misses        486      491       +5     


@@ -8,3 +8,5 @@ scikit-learn
dpath
jiwer
editdistance
statsmodels
Member

Sam. Are these requirements needed for the change? I don't see their use.


from statistics import mean

metric = Accuracy
Member

Probably best not to have a default to make sure people override it.

Member

Also, why not initialize with the class object? Meaning: metric = Accuracy()
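The distinction the review raises can be illustrated with a minimal sketch (the `Accuracy` class here is a hypothetical stand-in for illustration, not the unitxt implementation):

```python
# Hypothetical stand-in for a metric class, for illustration only.
class Accuracy:
    def compute(self, references, prediction):
        # Score 1.0 if the prediction matches any reference, else 0.0.
        return float(prediction in references)

# Assigning the class object means every caller must instantiate first:
metric_cls = Accuracy
score_a = metric_cls().compute(["yes"], "yes")

# Assigning an instance, as the comment suggests, is ready to use directly:
metric = Accuracy()
score_b = metric.compute(["yes"], "no")
```

Assigning an instance also lets the metric carry configured state (e.g. constructor arguments) instead of forcing every consumer to know how to construct it.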

references, predictions, additional_inputs
):
# allow any number of columns to be used as the identifier
keyname = tuple(inputs_dict.values())
Member

I think the user needs to explicitly specify which are fields from additional_inputs to take. There could be many which are irrelevant.

Collaborator Author
@sam-data-guy-iam commented Dec 31, 2023

In cards, is there a way for the user to specify which fields are relevant to be included in additional_inputs? I assume there may be cases where additional_inputs fields are used to calculate the accuracy metric but NOT for the grouping (which the aggregation function operates on). I had assumed that all fields in additional_inputs would be used for grouping. What did you envision?

Here is the question: in other cases, metrics have class attributes like main_score, reduction_map, etc., which are integral to defining the metric and are fixed for each version of the metric. However, the grouping variables in additional_inputs depend on the dataset, not the metric. It therefore made sense to me that the user would specify these fields externally, such as in the dataset's card, and so I assumed they would just affect additional_inputs directly. Currently I can't see any metrics that actually use additional_inputs, so it is unclear to me what examples to follow.
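One way the reviewer's suggestion could look, sketched under the assumption of a user-supplied `grouping_fields` list (a hypothetical name, not an existing unitxt attribute), is to build the group key only from the chosen fields instead of all of additional_inputs:

```python
from collections import defaultdict
from statistics import mean

def group_scores(instance_scores, additional_inputs, grouping_fields):
    """Group per-instance scores by a key built only from grouping_fields,
    ignoring any other (irrelevant) fields in additional_inputs."""
    groups = defaultdict(list)
    for score, inputs in zip(instance_scores, additional_inputs):
        key = tuple(inputs[field] for field in grouping_fields)
        groups[key].append(score)
    return {key: mean(scores) for key, scores in groups.items()}

scores = [1.0, 0.0, 1.0, 1.0]
inputs = [{"group": "a", "noise": 1}, {"group": "a", "noise": 2},
          {"group": "b", "noise": 3}, {"group": "b", "noise": 4}]
group_scores(scores, inputs, ["group"])  # {('a',): 0.5, ('b',): 1.0}
```

Here the `"noise"` field is present in additional_inputs but never enters the key, which is the behavior the reviewer is asking for.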


metric = Accuracy
group_score_func = mean
group_score_name = "mean"
Member

Is 'group_score_name' used?

from statistics import mean

metric = Accuracy
group_score_func = mean
Member

Maybe call it 'group_score_aggregation_func'?

Member
@yoavkatz left a comment

Overall, looks good and useful. A few requested changes and questions.
I'd prefer that @elronbandel also reviews.

Member
@matanor left a comment

Hi @sam-data-guy-iam, looks good! Please see my comments.

@@ -1430,3 +1431,116 @@ def _compute(
for k in self.k_list:
result[self.score_name(measure_name, k)] = measure_array[min(k, max_k)]
return result


class GroupedInstanceMetric(GlobalMetric):
Member

I am not sure that deriving from GlobalMetric is suitable here, since the underlying metric is computed per instance. So it's a bit confusing: GlobalMetric's goal is to allow computations that depend on a set of instances, but here we are doing per-instance computations.

I think a better option could be modifying InstanceMetric.process(..) to support grouping. After the loop that calculates the scores per instance (this loop), you could group the scores by the group key, if one is provided. This requires keeping track of the group key per score. Then the code could loop over the reduction map and do the computation per group. If no group key is specified, there will be only one group containing all the instances. The final step would be to average over all the group scores (which is a single score if no groups are defined). What do you think?
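The flow suggested above could be sketched roughly like this (a simplification with hypothetical names; the real InstanceMetric.process also handles streams and reduction maps):

```python
from collections import defaultdict
from statistics import mean

def grouped_reduction(instances, score_fn, group_key=None, agg=mean):
    """Score each instance, aggregate per group, then average group scores.

    With no group_key there is a single group holding all instances, so
    the result degenerates to the plain mean over instance scores.
    """
    groups = defaultdict(list)
    for inst in instances:
        # Track the group key per score; "__all__" when grouping is off.
        key = inst.get(group_key) if group_key else "__all__"
        groups[key].append(score_fn(inst))
    # Aggregate within each group, then average across groups.
    return mean(agg(scores) for scores in groups.values())
```

For example, with groups `{a: [1.0, 0.0], b: [1.0]}` the per-group means are 0.5 and 1.0, so the final score is 0.75 rather than the ungrouped mean of 2/3.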


from statistics import mean

metric = Accuracy
Member

Also, why not initialize with the class object? Meaning: metric = Accuracy()

@sam-data-guy-iam
Collaborator Author

After more consideration, I rewrote this as a subclass of InstanceMetric, as suggested by @matanor. It can now take an InstanceMetric as well as a GlobalMetric (and possibly others, I'm not sure), since both have a process method. I added a confidence interval method that resamples the instances and extracts the scores, without having to recalculate the instance scores each time, as GlobalMetric would have required. Please take a look and let me know what you think.
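The resampling idea described above, bootstrapping over already-computed instance scores rather than re-running the metric, could look roughly like this (a sketch with hypothetical names, not the PR's actual implementation):

```python
import random
from statistics import mean

def bootstrap_ci(instance_scores, agg=mean, n_resamples=1000,
                 alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval over precomputed scores.

    Because the per-instance scores are fixed, each resample only re-runs
    the cheap aggregation, never the underlying metric computation.
    """
    rng = random.Random(seed)
    n = len(instance_scores)
    # Aggregate each resample (drawn with replacement), then sort.
    stats = sorted(
        agg([instance_scores[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    low = stats[int((alpha / 2) * n_resamples)]
    high = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high
```

This is the key saving over a GlobalMetric-based design, where each bootstrap resample would have had to recompute the metric on the resampled instances.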

@yoavkatz
Member

yoavkatz commented Jan 8, 2024

@matanor @elronbandel - Can we proceed with this PR and see if there are any additional needed changes (besides fixing the conflict)?

@matanor
Member

matanor commented Jan 9, 2024

@sam-data-guy-iam is this PR replaced by your new PR #452?

@sam-data-guy-iam
Collaborator Author

sam-data-guy-iam commented Jan 9, 2024 via email

Yes

@matanor
Member
matanor commented Jan 9, 2024

@sam-data-guy-iam OK, so I am closing this PR.

@matanor matanor closed this Jan 9, 2024