Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

list[(name, Thing)] to dict[name, Thing], and remove MetricConfig and directly store configs in Metric #491

Open
odashi opened this issue Sep 14, 2022 · 9 comments

Comments

@odashi
Copy link
Contributor

odashi commented Sep 14, 2022

EDIT: This issue is currently represents two separate topics:

  • Change list[(name, Thing)] data structure to dict[name, Thing] if the name is the primary key.
  • Discussion to directly represent all configs by Metric itself.

Original proposal:

Metric is instantiated by appropriate MetricConfig. This strategy is used only in Metric and any other classes are instantiated by directly calling __init__ with specific arguments. This special architecture around Metric actually causes some confusion (e.g., #455).
This separation also requires explicit circular dependency: Metric holds MetricConfig, while MetricConfig knows corresponding Metric (in to_metric). This is somewhat annoying because it always lets us to refactor both concepts at the same time.

As long as I checked the all subclasses of Metric, there is no heavy implementation: they simply hold configs without any special initialization processes. It looks we have no any reason to separate MetricConfig from Metric at interface level. Integrating them into the Metric could simplify the architecture.

RFC: @neubig @pfliu-nlp @tetsuok

@odashi
Copy link
Contributor Author

odashi commented Sep 14, 2022

MetricConfig is stored in MetricResult too, but it seems this information is not necessarily required. There is also Performance that has almost the same information with MetricResult, and the only difference is that it stores metric's name instead.
I think using Performance rather than MetricResult is sufficient. We can pass the original Metric with the corresponding Performance if the both should be referred.

@odashi
Copy link
Contributor Author

odashi commented Sep 24, 2022

Potential change would be:

  • Remove MetricConfig, store every information necessary to perform metrics to Metric.
  • Integrate MetricResult and Performance into a single class:
    class MetricResult:
      metric_name: str
      values: dict[str, MetricValue]

I'm still waiting for comments @neubig @pfliu-nlp @tetsuok

@tetsuok
Copy link
Contributor

tetsuok commented Sep 24, 2022

@odashi I agree with the proposal and the potential changes. Removing the circular dependency seems a good idea to make things simpler.

@pfliu-nlp
Copy link
Collaborator

@odashi

"Remove MetricConfig, store every information necessary to perform metrics to Metric."

In general, I think it looks good. I think the reason why we have two separate classes originally (Metric and MetricConfig) is to handle the following case (ref:

HitsConfig(name='Hits1', hits_k=1),
)

            HitsConfig(name='Hits1', hits_k=1),
            HitsConfig(name='Hits2', hits_k=2),
            HitsConfig(name='Hits3', hits_k=3),
            HitsConfig(name='Hits5', hits_k=5),
            HitsConfig(name='Hits10', hits_k=10),

When we merge MetricConfig into Metric, I think we will have two names for each Metric class: metric_name and metric_class_name.

(Again, I think this refactoring will potentially affect explainaboard web and I will follow PRs on this discussion.)

@odashi
Copy link
Contributor Author

odashi commented Sep 27, 2022

When we merge MetricConfig into Metric, I think we will have two names for each Metric class: metric_name and metric_class_name.

It looks natural to me, though it also looks the name field needs to be separated from Metric because it behaves a key of the objects.
Currently, metric configss are stored as list[MetricConfig] with internal key name, but it is more natural to reform it to dict[str, Metric] with explicit dictinary key so that metric_name in other classes point to the key of this dict, not the member of Metric.

@pfliu-nlp
Copy link
Collaborator

pfliu-nlp commented Sep 27, 2022

Hi, @odashi

"Currently, metric configss are stored as list[MetricConfig] with internal key name, but it is more natural to reform it to dict[str, Metric]"

Yes, this is another good point: do we need to shift the way of storing MetricConfig (or Metric if we have merged them) from List[Metric] to dict[str, Metric]? In the explainaboard web, sometimes we need to manually get dict[str, MetricConfig] from List[MetricConfig] for some convenience.

A similar thing happens in the AnalysisLevels.

@odashi
Copy link
Contributor Author

odashi commented Sep 27, 2022

@pfliu-nlp I think we can first implement dict[str, MetricConfig] and then come back to this issue.
(Generally I recommend dict[key, something] over list[(key, something)])

@odashi
Copy link
Contributor Author

odashi commented Sep 27, 2022

I found that EaaSMetric has some abuse of the name field: it is used to specify external metric.

if self.config.name in {'bleu', 'chrf'}:

EaaSMetricConfig should have a field to specify this value (it may have the same name name)

@odashi odashi changed the title Remove MetricConfig and directly store configs in Metric list[(name, Thing)] to dict[name, Thing], and remove MetricConfig and directly store configs in Metric Sep 30, 2022
@neubig
Copy link
Contributor

neubig commented Oct 13, 2022

Sorry about the late comment on this, but I agree with getting rid of MetricConfig. Sounds good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants