Metrics with different analysis levels have the same name #530

Closed
qjiang002 opened this issue Oct 4, 2022 · 5 comments · Fixed by #534
Comments

@qjiang002 (Collaborator)

Some tasks may use the same metric at different analysis levels. Although the metric functions at the different levels are different, they share the same name, which leads to duplicate metric names in the report's overall performance. Examples of such tasks are NER and argument pair extraction (APE).

This may cause problems when sorting systems by metric score. When changing list[(name, thing)] to dict[name]=thing (issue #491), the analysis level should sit one level above the metric name.

NER default metrics

defaults: dict[str, dict[str, MetricConfig]] = {
    # example-level metrics
    "example": {
        "F1": SeqF1ScoreConfig(
            source_language=source_language,
            target_language=target_language,
            tag_schema="bio",
        )
    },
    # span-level metrics: registered under the same name "F1" as above
    "span": {
        "F1": F1ScoreConfig(
            source_language=source_language,
            target_language=target_language,
            ignore_classes=[cls._DEFAULT_TAG],
        )
    },
}

NER analysis report

  "results": {
    "overall": [
      {
        "F1": {
          "value": 0.9221652220060144,
          "confidence_score_low": null,
          "confidence_score_high": null,
          "auxiliary_result": null
        }
      },
      {
        "F1": {
          "value": 0.9221652220060145,
          "confidence_score_low": null,
          "confidence_score_high": null,
          "auxiliary_result": null
        }
      }
    ]
}

APE default metrics

defaults: dict[str, dict[str, MetricConfig]] = {
    # example-level metrics
    'example': {
        "F1": APEF1ScoreConfig(
            source_language=source_language,
            target_language=target_language,
        )
    },
    # block-level metrics: registered under the same name "F1" as above
    'block': {
        "F1": F1ScoreConfig(
            source_language=source_language,
            target_language=target_language,
            ignore_classes=[cls._DEFAULT_TAG],
        )
    },
}

APE analysis report

  "results": {
    "overall": [
      {
        "F1": {
          "value": 0.25625192960790366,
          "confidence_score_low": null,
          "confidence_score_high": null,
          "auxiliary_result": null
        }
      },
      {
        "F1": {
          "value": 0.25625192960790366,
          "confidence_score_low": null,
          "confidence_score_high": null,
          "auxiliary_result": null
        }
      }
    ]
}
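
In both reports the two analysis levels produce an indistinguishable "F1" entry. A minimal sketch of the collision, using the NER values above and plain floats in place of the full performance objects:

overall = [
    {"F1": 0.9221652220060144},  # example-level F1
    {"F1": 0.9221652220060145},  # span-level F1
]

# Flattening the per-level results into a single dict keyed only by metric
# name silently keeps just one of the two scores:
flat = {name: value for level in overall for name, value in level.items()}
print(flat)  # {'F1': 0.9221652220060145}

Sorting systems by flat["F1"] would then silently use whichever analysis level happens to come last.
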
@odashi (Contributor) commented Oct 6, 2022

This is actually a latent problem revealed by the recent changes; it is caused by the fact that the final report does not contain explicit information about the mapping between analysis levels and performances.

I think it would essentially be better to give every metric a unique name:

default_metric_configs: dict[str, MetricConfig] = {
    "example_foo": FooConfig(...),
    "block_foo": FooConfig(...),
}

A specific analysis-level name is then used to choose the corresponding set of metrics:

level_to_metrics: dict[str, list[str]] = {
    "example": ["example_foo", ...],
    "block": ["block_foo", ...],
}

return {k: default_metric_configs[k] for k in level_to_metrics[level]}
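
Applied to the NER defaults above, this would look something like the following (the names "example_F1" and "span_F1" are only illustrative):

default_metric_configs: dict[str, MetricConfig] = {
    "example_F1": SeqF1ScoreConfig(
        source_language=source_language,
        target_language=target_language,
        tag_schema="bio",
    ),
    "span_F1": F1ScoreConfig(
        source_language=source_language,
        target_language=target_language,
        ignore_classes=[cls._DEFAULT_TAG],
    ),
}

level_to_metrics: dict[str, list[str]] = {
    "example": ["example_F1"],
    "span": ["span_F1"],
}

# The overall report can then never contain two entries with the same name.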

@odashi (Contributor) commented Oct 6, 2022

Anyway, I will fix this by changing Result.overall to a dict.
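
A hypothetical sketch of the resulting shape, keyed by analysis level (the exact structure is decided in #534; plain floats stand in for the full performance objects):

# Result.overall would map analysis level -> metric name -> performance:
overall = {
    "example": {"F1": 0.9221652220060144},
    "span": {"F1": 0.9221652220060145},
}

# Each score is addressed unambiguously as overall["span"]["F1"], even when
# two levels share a metric name.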

@odashi (Contributor) commented Oct 6, 2022

I found that some of the meta-analysis code cannot be fixed quickly, since it heavily relies on the order of the original list.
I think the meta_analyses directory is not tested appropriately at all, and it does not work for now.
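
For instance (a hypothetical illustration of that order dependence), code written against the old list structure looks performances up by position:

example_level_f1 = result.overall[0]["F1"]  # assumes index 0 is the first analysis level

# Once overall becomes a dict keyed by level name, there is no positional
# index left to rely on, so such code needs rewriting rather than a quick patch.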

@odashi (Contributor) commented Oct 6, 2022

@neubig

@odashi (Contributor) commented Oct 6, 2022

I proposed #534, which doesn't include fixes for meta_analysis.
