Grouped instance metric inherit from InstanceMetrics #452

Merged 112 commits on Feb 22, 2024
Changes from 18 commits
b32d782
add tests for grouped instance metrics
Jan 8, 2024
a797cdc
modify InstanceMetric to accept grouped_mean reduction
Jan 8, 2024
0d63164
initial commit
Jan 8, 2024
0f3f828
apply ruff formatting
Jan 8, 2024
a316972
apply ruff formatting, reduce complexity
Jan 8, 2024
d21ab3a
merge with main
Jan 8, 2024
ffa4e1d
merge with main
Jan 8, 2024
01914ff
initial commit
Jan 8, 2024
7d98ec5
rename grouped instance metrics so artifact type and name correspond
Jan 8, 2024
b99694a
rename grouped instance metrics so artifact type and name correspond
Jan 8, 2024
735ce41
rename grouped instance metrics so artifact type and name correspond
Jan 8, 2024
1dca6aa
Merge branch 'main' into aggregation_inherit_instance
Jan 8, 2024
730ff45
remove newline formatting
Jan 8, 2024
eeada82
commits from merge
Jan 10, 2024
0aaa1da
remove (catalog from removed metric)
Jan 10, 2024
2d52c54
fix some variation in expected values
Jan 11, 2024
510d6e8
add catching of nanmean warning; fix InstanceMetric verification func…
Jan 11, 2024
bb6b46b
merge with main
Jan 11, 2024
8f5ce10
InstanceMetric need to specify ci_scores for fields that have calcula…
Jan 16, 2024
998bbdb
Merge branch 'main' into aggregation_inherit_instance
Jan 16, 2024
55e559d
add ci_scores to several InstanceMetrics
Jan 16, 2024
ac1fe8c
Merge branch 'main' into aggregation_inherit_instance
Jan 16, 2024
7047fd7
ruff formatting
Jan 16, 2024
67e05e4
add test_grouped_instance_metric_errors for code coverage
Jan 17, 2024
ea82024
Merge branch 'main' into aggregation_inherit_instance
Jan 17, 2024
4cc38cb
add grouped instance metrics with normalized Cohen's h aggregation fu…
Jan 17, 2024
3631171
add normalized Cohen's h
Jan 17, 2024
f738202
merge with main
Jan 17, 2024
1a570de
Merge branch 'main' into aggregation_inherit_instance
Jan 17, 2024
dd6bcfe
change description of group_instance_metrics test since is no longer …
Jan 18, 2024
2ce1067
checkout from main
Jan 18, 2024
2664de2
slight difference in results for confidence interval between Travis a…
Jan 21, 2024
c3ddd8f
Merge branch 'main' into aggregation_inherit_instance
Jan 21, 2024
b6ed90e
add note for grouped instance CI for Cohen + StringContainment
Jan 21, 2024
16a17cc
add documentation to InstanceMetric group_mean reduction validation
Jan 21, 2024
83104db
rename field as group_aggregation_func;
Jan 23, 2024
4b5281d
rename field as group_aggregation_func;
Jan 23, 2024
dfa08d4
Merge branch 'main' into aggregation_inherit_instance
Jan 23, 2024
f7eca81
use same predictions and references for tokenoverlap as the other met…
Jan 23, 2024
e25dedd
add additional comments to resample_from_non_nan from original version
Jan 23, 2024
438aee2
use same references and predictions for tokenoverlap as for other gro…
Jan 23, 2024
3c3a04b
Merge branch 'main' into aggregation_inherit_instance
Jan 23, 2024
9cb48dc
return global result to CI test for grouped instance because of token…
Jan 23, 2024
288e29c
add interpretation option and comment to cohen's h
Jan 24, 2024
1535e85
add group_mean_subgroup_comparison reduction to InstanceMetric; updat…
Jan 28, 2024
fc2f6c2
Merge branch 'main' into aggregation_inherit_instance
Jan 28, 2024
8650e9b
modify test_grouped_instance_metric_errors to take into account boole…
Jan 29, 2024
2db6550
class InstanceMetric can have group reductions done either taking the…
Jan 29, 2024
e718694
add FixedGroupMeanAccuracy. Modify expected global results to take i…
Jan 29, 2024
456848b
add notes to cohen's h
Jan 30, 2024
47c0760
Merge branch 'main' into aggregation_inherit_instance
Jan 30, 2024
9d907a8
Merge branch 'main' into aggregation_inherit_instance
matanor Jan 31, 2024
ed66ed7
add other_mean and baseline_mean functions. Combine the subgroup_com…
Jan 31, 2024
75c9072
Merge branch 'main' into aggregation_inherit_instance
Jan 31, 2024
afeee78
Merge branch 'aggregation_inherit_instance' of https://github.com/IBM…
Jan 31, 2024
84332dd
import statistics.mean at the top
Jan 31, 2024
3ae446c
remove __name__
Jan 31, 2024
bf72218
Delete src/unitxt/catalog/metrics/group_mean_accuracy.json
sam-data-guy-iam Feb 1, 2024
d443846
remove from catalog
Feb 1, 2024
bd65681
move to own directory
Feb 1, 2024
5933f33
return class name
Feb 1, 2024
ccced31
write metrics to robustness directory in catalog
Feb 1, 2024
f008190
Merge branch 'main' into aggregation_inherit_instance
Feb 1, 2024
430c1a5
rename others to paraphrase; use variant_score_dict rather than is_ba…
Feb 5, 2024
94d0528
initial commit
Feb 5, 2024
c057009
Merge branch 'main' into aggregation_inherit_instance
Feb 5, 2024
5747630
fix type hint in validate_variant_types
Feb 5, 2024
1ca5317
fix type hint in validate_variant_types
Feb 5, 2024
bdb8b04
Delete src/unitxt/catalog/metrics/robustness/fixed_group_mean_others_…
sam-data-guy-iam Feb 5, 2024
101c08d
Delete src/unitxt/catalog/metrics/robustness/fixed_group_mean_others_…
sam-data-guy-iam Feb 5, 2024
92ae38e
initial commit
Feb 11, 2024
0278f87
implement PR changes; rename variant to subgroup; add Cohen's d metric
Feb 11, 2024
d960af0
merge with main
Feb 11, 2024
1c6a252
correct condition on cohen's d sample sizes
Feb 11, 2024
c7236f9
adapt PDR, Cohens' D and H to accept a list of list of labels (so tha…
Feb 12, 2024
0d50b66
Merge branch 'main' into aggregation_inherit_instance
Feb 12, 2024
9ad6aa7
Delete src/unitxt/catalog/metrics/robustness/fixed_group_cohens_d_acc…
sam-data-guy-iam Feb 12, 2024
a74ab93
Delete src/unitxt/catalog/metrics/robustness/fixed_group_cohens_d_str…
sam-data-guy-iam Feb 12, 2024
b0c3f5e
Delete src/unitxt/catalog/metrics/robustness/fixed_group_norm_cohens_…
sam-data-guy-iam Feb 12, 2024
e7b0e00
Delete src/unitxt/catalog/metrics/robustness/fixed_group_norm_cohens_…
sam-data-guy-iam Feb 12, 2024
f061b1f
Delete src/unitxt/catalog/metrics/robustness/fixed_group_pdr_accuracy…
sam-data-guy-iam Feb 12, 2024
ca38302
Delete src/unitxt/catalog/metrics/robustness/fixed_group_pdr_string_c…
sam-data-guy-iam Feb 12, 2024
8634264
rename to include string 'paraphrase' to distinguish from 'all variants'
Feb 12, 2024
13155c7
Merge branch 'aggregation_inherit_instance' of https://github.com/IBM…
Feb 12, 2024
7b16dd5
Delete src/unitxt/catalog/metrics/robustness/fixed_group_cohens_d_par…
sam-data-guy-iam Feb 13, 2024
e5c71cc
Delete src/unitxt/catalog/metrics/robustness/fixed_group_cohens_d_par…
sam-data-guy-iam Feb 13, 2024
273c389
redefine Cohen's d as Hedge's g, with correction.
Feb 13, 2024
5f1205e
merge with main
Feb 13, 2024
a480683
rename Cohen's d
Feb 13, 2024
fe1bde6
add ZeroDivisionError in Hedge's g
Feb 13, 2024
18ec140
rename Hedges g to Norm Hedges g, and divide by maximum to rescale to…
Feb 14, 2024
7552e59
initial commit, rename from hedges_g
Feb 14, 2024
4dbd997
Delete src/unitxt/catalog/metrics/robustness/fixed_group_hedges_g_par…
sam-data-guy-iam Feb 14, 2024
f014bc6
Delete src/unitxt/catalog/metrics/robustness/fixed_group_hedges_g_par…
sam-data-guy-iam Feb 14, 2024
95b1587
Merge branch 'aggregation_inherit_instance' of https://github.com/IBM…
Feb 14, 2024
04cec38
fix PDR so if both means are 0, return 0 rather than NaN
Feb 14, 2024
be7410e
final PR changes, remove agg_func definition
Feb 14, 2024
3a98a47
remove checks on instances in get_group_scores that were already vali…
Feb 14, 2024
81dbff4
remove deepcopy
Feb 15, 2024
bcc1ae0
fix some comments and parameter names. Make TokenOverlap do conversi…
Feb 20, 2024
1afb458
Merge branch 'main' into aggregation_inherit_instance
Feb 20, 2024
ccf447f
initial commit
Feb 20, 2024
7472623
add absolute value version of Hedges G / Cohens H
Feb 20, 2024
3593542
add absolute value version of Hedges G / Cohens H to tests
Feb 20, 2024
ce207b7
merge with main
Feb 20, 2024
6249538
changes to global metric confidence interval now resample non-NaN val…
Feb 20, 2024
4c28f1a
Merge branch 'main' into aggregation_inherit_instance
Feb 20, 2024
651dd84
Revert "changes to global metric confidence interval now resample non…
Feb 20, 2024
34af53b
changes to global metric confidence interval now resample non-NaN val…
Feb 20, 2024
36edb77
Merge branch 'main' into aggregation_inherit_instance
matanor Feb 21, 2024
45ee96c
Merge branch 'main' into aggregation_inherit_instance
matanor Feb 22, 2024
3d4c712
Merge branch 'main' into aggregation_inherit_instance
matanor Feb 22, 2024
312 changes: 312 additions & 0 deletions prepare/metrics/grouped_instance_metrics.py
@@ -0,0 +1,312 @@
from src.unitxt import add_to_catalog
from src.unitxt.metrics import (
    GroupMeanAccuracy,
    GroupMeanStringContainment,
    GroupMeanTokenOverlap,
    GroupPDRAccuracy,
    GroupPDRStringContainment,
)
from src.unitxt.test_utils.metrics import test_metric

predictions = [
"A B",
"BC D",
"C",
"123",
"BCD",
10,
" BD",
"AB",
"I am a dog",
"AB C",
"AB 1",
"GMA",
0.123,
"BD",
"abc",
]

references = [
["B", "AB", "A"],
["A", "BC D", "BC DF"],
["c", " C"],
[13, 23, 234],
[" ", " BD", " BDA"],
[1, 10, 100],
["A", "B", "BD"],
["ABC", "ab", "BC"],
["I am a person", "I AM A DOG", "ABC"],
["AB CD", "AB", "ab"],
["AB 1", "AB1"],
[" GMA 123", "GMA"],
["123", 0.12],
["BDE", "BCE", "bdefs"],
[" abcdefg", "AB", "abcd"],
]

# possibly multi-column group identifier
additional_inputs = (
    [{"group": "grp1", "id": 0, "ignore": 1}] * 5
    + [{"group": "grp1", "id": 1, "ignore": 1}] * 5
    + [{"group": "grp2", "id": 0, "ignore": 1}] * 4
    + [{"group": "grp2", "id": 1, "ignore": 0}] * 1
)

group_by_fields = ["group", "id"]
# construct grouping_field by combining two other fields (and ignoring one); mimics what you would do in cards
for ai in additional_inputs:
    ai.update({"group_id": "_".join([str(ai[ff]) for ff in group_by_fields])})


instance_targets_string_containment = [
{"score": 1.0},
{"score": 1.0},
{
"score": 0.0,
},
{
"score": 1.0,
},
{
"score": 0.0,
},
{
"score": 1.0,
},
{
"score": 1.0,
},
{
"score": 0.0,
},
{
"score": 0.0,
},
{
"score": 1.0,
},
{
"score": 1.0,
},
{
"score": 1.0,
},
{
"score": 1.0,
},
{
"score": 0.0,
},
{
"score": 0.0,
},
]

for instance in instance_targets_string_containment:
    instance.update(
        {"string_containment": instance["score"], "score_name": "string_containment"}
    )

instance_targets_accuracy = [
{"score": 0.0},
{"score": 1.0},
{"score": 0.0},
{"score": 0.0},
{"score": 0.0},
{"score": 1.0},
{"score": 0.0},
{"score": 0.0},
{"score": 0.0},
{"score": 0.0},
{"score": 1.0},
{"score": 1.0},
{"score": 0.0},
{"score": 0.0},
{"score": 0.0},
]

for instance in instance_targets_accuracy:
    instance.update({"accuracy": instance["score"], "score_name": "accuracy"})

metric = GroupMeanAccuracy()
global_target = {
"group_mean_accuracy": 0.22,
"score": 0.22,
"score_name": "group_mean_accuracy",
"score_ci_low": 0.02,
"score_ci_high": 0.44,
"group_mean_accuracy_ci_low": 0.02,
"group_mean_accuracy_ci_high": 0.44,
}


outputs = test_metric(
metric=metric,
predictions=predictions,
references=references,
instance_targets=instance_targets_accuracy,
global_target=global_target,
additional_inputs=additional_inputs,
)

add_to_catalog(metric, "metrics.group_mean_accuracy", overwrite=True)


metric = GroupMeanStringContainment()
global_target = {
"group_mean_string_containment": 0.49,
"score": 0.49,
"score_name": "group_mean_string_containment",
"score_ci_low": 0.16,
"score_ci_high": 0.71,
"group_mean_string_containment_ci_low": 0.16,
"group_mean_string_containment_ci_high": 0.71,
}


outputs = test_metric(
metric=metric,
predictions=predictions,
references=references,
instance_targets=instance_targets_string_containment,
global_target=global_target,
additional_inputs=additional_inputs,
)

add_to_catalog(metric, "metrics.group_mean_string_containment", overwrite=True)


# PDR
metric = GroupPDRAccuracy()
global_target = {
"group_pdr_accuracy": 0.83,
"score": 0.83,
"score_name": "group_pdr_accuracy",
"score_ci_low": 0.38,
"score_ci_high": 1.0,
"group_pdr_accuracy_ci_low": 0.38,
"group_pdr_accuracy_ci_high": 1.0,
}


outputs = test_metric(
metric=metric,
predictions=predictions,
references=references,
instance_targets=instance_targets_accuracy,
global_target=global_target,
additional_inputs=additional_inputs,
)

add_to_catalog(metric, "metrics.group_pdr_accuracy", overwrite=True)


metric = GroupPDRStringContainment()
global_target = {
"group_pdr_string_containment": 0.44,
"score": 0.44,
"score_name": "group_pdr_string_containment",
"score_ci_low": 0.14,
"score_ci_high": 1.0,
"group_pdr_string_containment_ci_low": 0.14,
"group_pdr_string_containment_ci_high": 1.0,
}


outputs = test_metric(
metric=metric,
predictions=predictions,
references=references,
instance_targets=instance_targets_string_containment,
global_target=global_target,
additional_inputs=additional_inputs,
)

add_to_catalog(metric, "metrics.group_pdr_string_containment", overwrite=True)


# create references and predictions with only 3 unique values
short_predictions = [
"A",
"B",
"B",
"A",
"B",
"B",
"A",
"A",
"B",
"B",
"A",
"B",
"A",
"A",
"B",
]

short_references = [
["A", "B"],
["A", "C"],
["B", "C", "A"],
["A"],
["B", "A"],
["C", "B"],
["A"],
["B", "C"],
["A", "B", "C"],
["A", "B"],
["B", "C"],
["C"],
["C", "B"],
["B", "A"],
["B"],
]


global_target = {
"group_mean_f1": 0.5,
"score": 0.5,
"score_name": "group_mean_f1",
"group_mean_f1_ci_low": 0.32,
"group_mean_f1_ci_high": 0.79,
"score_ci_low": 0.32,
"score_ci_high": 0.79,
"group_mean_precision": 0.5,
"group_mean_precision_ci_low": 0.32,
"group_mean_precision_ci_high": 0.79,
"group_mean_recall": 0.5,
"group_mean_recall_ci_low": 0.32,
"group_mean_recall_ci_high": 0.79,
}

instance_targets_token_overlap = [
{"precision": 0, "recall": 0, "f1": 0, "score": 0, "score_name": "f1"},
{"precision": 0, "recall": 0, "f1": 0, "score": 0, "score_name": "f1"},
{"precision": 1.0, "recall": 1.0, "f1": 1.0, "score": 1.0, "score_name": "f1"},
{"precision": 0, "recall": 0, "f1": 0, "score": 0, "score_name": "f1"},
{"precision": 1.0, "recall": 1.0, "f1": 1.0, "score": 1.0, "score_name": "f1"},
{"precision": 1.0, "recall": 1.0, "f1": 1.0, "score": 1.0, "score_name": "f1"},
{"precision": 0, "recall": 0, "f1": 0, "score": 0, "score_name": "f1"},
{"precision": 0, "recall": 0, "f1": 0, "score": 0, "score_name": "f1"},
{"precision": 1.0, "recall": 1.0, "f1": 1.0, "score": 1.0, "score_name": "f1"},
{"precision": 1.0, "recall": 1.0, "f1": 1.0, "score": 1.0, "score_name": "f1"},
{"precision": 0, "recall": 0, "f1": 0, "score": 0, "score_name": "f1"},
{"precision": 0, "recall": 0, "f1": 0, "score": 0, "score_name": "f1"},
{"precision": 0, "recall": 0, "f1": 0, "score": 0, "score_name": "f1"},
{"precision": 0, "recall": 0, "f1": 0, "score": 0, "score_name": "f1"},
{"precision": 1.0, "recall": 1.0, "f1": 1.0, "score": 1.0, "score_name": "f1"},
]


metric = GroupMeanTokenOverlap()

outputs = test_metric(
metric=metric,
predictions=short_predictions,
references=short_references,
instance_targets=instance_targets_token_overlap,
global_target=global_target,
additional_inputs=additional_inputs,
)

add_to_catalog(metric, "metrics.group_mean_token_overlap", overwrite=True)
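
Reviewer note: the expected global targets in the file above can be checked by hand. The sketch below is not the code from `src/unitxt/metrics.py` (which this view does not show); it is a minimal standalone reading of the two reductions. The group mean is taken as the mean of per-group means. The PDR formula is an assumption inferred from the targets: the first instance of each group is treated as the baseline, the drop is `1 - mean(rest) / baseline`, and groups where that is undefined are skipped. The helper names `split_by_group`, `group_mean`, `pdr`, and `group_pdr` are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean

# Instance scores copied from instance_targets_accuracy and
# instance_targets_string_containment in the diff above.
accuracy = [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0]
containment = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]

# Same grouping as additional_inputs: "group" and "id" joined with "_".
group_ids = ["grp1_0"] * 5 + ["grp1_1"] * 5 + ["grp2_0"] * 4 + ["grp2_1"] * 1


def split_by_group(scores, groups):
    """Collect scores per group, preserving instance order."""
    by_group = defaultdict(list)
    for score, group in zip(scores, groups):
        by_group[group].append(score)
    return by_group


def group_mean(scores, groups):
    """Average within each group, then average the group means."""
    per_group = [mean(vals) for vals in split_by_group(scores, groups).values()]
    return mean(per_group)


def pdr(vals):
    """ASSUMPTION: first instance is the baseline; drop = 1 - mean(rest) / baseline."""
    baseline, rest = vals[0], vals[1:]
    if not rest or baseline == 0:
        return math.nan  # undefined for this group
    return 1 - mean(rest) / baseline


def group_pdr(scores, groups):
    """Average the per-group PDR values, skipping undefined (NaN) groups."""
    values = [pdr(vals) for vals in split_by_group(scores, groups).values()]
    return mean(v for v in values if not math.isnan(v))


for name, value in [
    ("group_mean_accuracy", group_mean(accuracy, group_ids)),
    ("group_mean_string_containment", group_mean(containment, group_ids)),
    ("group_pdr_accuracy", group_pdr(accuracy, group_ids)),
    ("group_pdr_string_containment", group_pdr(containment, group_ids)),
]:
    print(f"{name}: {value:.4f}")
```

Under these assumptions the four values come out at roughly 0.225, 0.4875, 0.8333, and 0.4444, which sit next to the recorded targets of 0.22, 0.49, 0.83, and 0.44. The rounding step applied by `test_metric` is not visible in this diff, and later commits in the PR change how PDR handles zero means, so this only describes the 18-commit snapshot shown here.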
3 changes: 3 additions & 0 deletions src/unitxt/catalog/metrics/group_mean_accuracy.json
@@ -0,0 +1,3 @@
{
"type": "group_mean_accuracy"
}
3 changes: 3 additions & 0 deletions src/unitxt/catalog/metrics/group_mean_string_containment.json
@@ -0,0 +1,3 @@
{
"type": "group_mean_string_containment"
}
3 changes: 3 additions & 0 deletions src/unitxt/catalog/metrics/group_mean_token_overlap.json
@@ -0,0 +1,3 @@
{
"type": "group_mean_token_overlap"
}
3 changes: 3 additions & 0 deletions src/unitxt/catalog/metrics/group_pdr_accuracy.json
@@ -0,0 +1,3 @@
{
"type": "group_pdr_accuracy"
}
3 changes: 3 additions & 0 deletions src/unitxt/catalog/metrics/group_pdr_string_containment.json
@@ -0,0 +1,3 @@
{
"type": "group_pdr_string_containment"
}
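
Reviewer note: each catalog entry above stores only a `"type"` string, and in all five files that string is the snake_case form of the metric class name (compare commit 7d98ec5, "rename grouped instance metrics so artifact type and name correspond"). The sketch below shows one way such a mapping could be computed. `class_name_to_type` is a hypothetical helper written for illustration, not unitxt's own conversion routine, and it assumes acronyms such as `PDR` should stay together as one word.

```python
import re

# Hypothetical helper, not taken from unitxt.
# Insert "_" where a lowercase letter or digit meets an uppercase letter,
# and where an acronym ends and a capitalized word begins.
_WORD_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def class_name_to_type(name: str) -> str:
    """Convert a CamelCase class name to a snake_case type string."""
    return _WORD_BOUNDARY.sub("_", name).lower()


for cls_name in [
    "GroupMeanAccuracy",
    "GroupMeanStringContainment",
    "GroupMeanTokenOverlap",
    "GroupPDRAccuracy",
    "GroupPDRStringContainment",
]:
    print(cls_name, "->", class_name_to_type(cls_name))
```

With this rule the five class names imported in `prepare/metrics/grouped_instance_metrics.py` map onto the five `"type"` values in the JSON files above.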