
Aggregation metrics #430

Closed
wants to merge 61 commits into from
Conversation

sam-data-guy-iam
Collaborator

Addition of GroupedInstanceMetric and derived metrics @matanor @elronbandel

Samuel Ackerman added 30 commits November 7, 2023 19:37
Samuel Ackerman added 4 commits December 25, 2023 21:32
…ean and PDR for both exact and string_containment instance accuracies
add tests for global and confidence interval for string containment mean and pdr

codecov bot commented Dec 25, 2023

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (a576b32) 93.37% compared to head (cb35a36) 93.46%.

Files                  Patch %   Lines
src/unitxt/metrics.py  96.73%    3 Missing ⚠️
tests/test_metrics.py  95.12%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #430      +/-   ##
==========================================
+ Coverage   93.37%   93.46%   +0.08%     
==========================================
  Files         177      178       +1     
  Lines        7340     7512     +172     
==========================================
+ Hits         6854     7021     +167     
- Misses        486      491       +5     


@@ -8,3 +8,5 @@ scikit-learn
dpath
jiwer
editdistance
statsmodels
Member

Sam. Are these requirements needed for the change? I don't see their use.


from statistics import mean

metric = Accuracy
Member

Probably best not to have a default to make sure people override it.

Member

Also, why not initialize with the class object? Meaning: metric = Accuracy()
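The distinction the review raises can be illustrated with a minimal sketch (the `Accuracy` class here is a hypothetical stand-in for illustration, not the unitxt implementation):

```python
# Hypothetical stand-in for a metric class, for illustration only.
class Accuracy:
    def compute(self, references, prediction):
        # Score 1.0 if the prediction matches any reference, else 0.0.
        return float(prediction in references)

# Assigning the class object means every caller must instantiate first:
metric_cls = Accuracy
score_a = metric_cls().compute(["yes"], "yes")

# Assigning an instance, as the comment suggests, is ready to use directly:
metric = Accuracy()
score_b = metric.compute(["yes"], "no")
```

Assigning an instance also lets the metric carry configured state (e.g. constructor arguments) instead of forcing every consumer to know how to construct it.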

references, predictions, additional_inputs
):
# allow any number of columns to be used as the identifier
keyname = tuple(inputs_dict.values())
Member

I think the user needs to explicitly specify which are fields from additional_inputs to take. There could be many which are irrelevant.

Collaborator Author
@sam-data-guy-iam commented Dec 31, 2023

In cards, is there a way for the user to specify which fields are relevant to be included in additional_inputs? I assume there may be cases where additional_inputs fields are used to calculate the accuracy metric but NOT for the grouping (which the aggregation function operates on). I had assumed that all fields in additional_inputs would be used for grouping. What did you envision?

Here is the question: in other cases, metrics have class attributes like main_score, reduction_map, etc., which are integral to defining the metric and are fixed for each version of the metric. However, the grouping variables in additional_inputs depend on the dataset, not the metric. It therefore made sense to me that the user would specify these fields externally, such as in the dataset's card, and so I assumed they would just affect additional_inputs directly. Currently I can't see any metrics that actually use additional_inputs, so it is unclear to me what examples to follow.
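One way the reviewer's suggestion could look, sketched under the assumption of a user-supplied `grouping_fields` list (a hypothetical name, not an existing unitxt attribute), is to build the group key only from the chosen fields instead of all of additional_inputs:

```python
from collections import defaultdict
from statistics import mean

def group_scores(instance_scores, additional_inputs, grouping_fields):
    """Group per-instance scores by a key built only from grouping_fields,
    ignoring any other (irrelevant) fields in additional_inputs."""
    groups = defaultdict(list)
    for score, inputs in zip(instance_scores, additional_inputs):
        key = tuple(inputs[field] for field in grouping_fields)
        groups[key].append(score)
    return {key: mean(scores) for key, scores in groups.items()}

scores = [1.0, 0.0, 1.0, 1.0]
inputs = [{"group": "a", "noise": 1}, {"group": "a", "noise": 2},
          {"group": "b", "noise": 3}, {"group": "b", "noise": 4}]
group_scores(scores, inputs, ["group"])  # {('a',): 0.5, ('b',): 1.0}
```

Here the `"noise"` field is present in additional_inputs but never enters the key, which is the behavior the reviewer is asking for.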


metric = Accuracy
group_score_func = mean
group_score_name = "mean"
Member

Is 'group_score_name' used?

from statistics import mean

metric = Accuracy
group_score_func = mean
Member

Maybe call it 'group_score_aggregation_func'?

Member
@yoavkatz left a comment

Overall, looks good and useful. A few requested changes and questions.
I'd prefer that @elronbandel also reviews.

Member
@matanor left a comment

Hi @sam-data-guy-iam, looks good! Please see my comments.

@@ -1430,3 +1431,116 @@ def _compute(
for k in self.k_list:
result[self.score_name(measure_name, k)] = measure_array[min(k, max_k)]
return result


class GroupedInstanceMetric(GlobalMetric):
Member

I am not sure that deriving from GlobalMetric is suitable here, since the underlying metric is computed per instance. So it's a bit confusing: GlobalMetric's goal is to allow computations that depend on a set of instances, but here we are doing per-instance computations.

I think a better option could be modifying InstanceMetric.process(..) to support grouping. After the loop that calculates the scores per instance (this loop), you could group the scores by the group key, if one is provided. This requires keeping track of the group key per score. Then the code could loop over the reduction map and do the computation per group. If no group key is specified, there will be only one group containing all the instances. The final step would be to average over all the group scores (which is a single score if no groups are defined). What do you think?
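The flow suggested above could be sketched roughly like this (a simplification with hypothetical names; the real InstanceMetric.process also handles streams and reduction maps):

```python
from collections import defaultdict
from statistics import mean

def grouped_reduction(instances, score_fn, group_key=None, agg=mean):
    """Score each instance, aggregate per group, then average group scores.

    With no group_key there is a single group holding all instances, so
    the result degenerates to the plain mean over instance scores.
    """
    groups = defaultdict(list)
    for inst in instances:
        # Track the group key per score; "__all__" when grouping is off.
        key = inst.get(group_key) if group_key else "__all__"
        groups[key].append(score_fn(inst))
    # Aggregate within each group, then average across groups.
    return mean(agg(scores) for scores in groups.values())
```

For example, with groups `{a: [1.0, 0.0], b: [1.0]}` the per-group means are 0.5 and 1.0, so the final score is 0.75 rather than the ungrouped mean of 2/3.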


from statistics import mean

metric = Accuracy
Member

Also, why not initialize with the class object? Meaning: metric = Accuracy()

@sam-data-guy-iam
Collaborator Author

After more consideration, I rewrote this as a subclass of InstanceMetric, as suggested by @matanor. It can now take an InstanceMetric as well as a GlobalMetric (and possibly others, I'm not sure), since both have a process method. I added a confidence interval method that resamples the instances and extracts the scores, without having to recalculate the instance scores each time, as GlobalMetric would have required. Please take a look and let me know what you think.
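The resampling idea described above, bootstrapping over already-computed instance scores rather than re-running the metric, could look roughly like this (a sketch with hypothetical names, not the PR's actual implementation):

```python
import random
from statistics import mean

def bootstrap_ci(instance_scores, agg=mean, n_resamples=1000,
                 alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval over precomputed scores.

    Because the per-instance scores are fixed, each resample only re-runs
    the cheap aggregation, never the underlying metric computation.
    """
    rng = random.Random(seed)
    n = len(instance_scores)
    # Aggregate each resample (drawn with replacement), then sort.
    stats = sorted(
        agg([instance_scores[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    low = stats[int((alpha / 2) * n_resamples)]
    high = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high
```

This is the key saving over a GlobalMetric-based design, where each bootstrap resample would have had to recompute the metric on the resampled instances.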

@yoavkatz
Member

yoavkatz commented Jan 8, 2024

@matanor @elronbandel - Can we proceed with this PR and see if there are any additional needed changes (besides fixing the conflict)?

@matanor
Member

matanor commented Jan 9, 2024

@sam-data-guy-iam is this PR replaced by your new PR #452?

@sam-data-guy-iam
Collaborator Author

sam-data-guy-iam commented Jan 9, 2024 via email

Yes

@matanor
Member
matanor commented Jan 9, 2024

@sam-data-guy-iam OK, so I am closing this PR.

@matanor matanor closed this Jan 9, 2024