-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v cool
parlai/core/metrics.py
Outdated
@@ -34,6 +34,100 @@ | |||
} | |||
ALL_METRICS = DEFAULT_METRICS | ROUGE_METRICS | BLEU_METRICS | DISTINCT_METRICS | |||
|
|||
MetricDisplayData = namedtuple('MetricDisplayData', ('title', 'description')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about a data class instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, what's the advantage you see? I was using a namedtuple because this isn't really data that should be mutable.
If it's the pretty syntax you're looking for, though, I just found out there's a nicer syntax for namedtuples:
class MetricDisplayData(NamedTuple):
title: str
description: str
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We used NamedTuples before and it gave me headaches and now i stay away with them. That syntax is nicer and fine with me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious what headaches you encountered! I haven't used them a ton (just for small things like this here and there) so maybe there're headaches incoming I'm ignorant of.
} | ||
|
||
|
||
def get_metric_display_data(metric: str) -> MetricDisplayData: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe as a utility of MetricsDisplayData
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's kind of nice to keep this functional, though, since there isn't any state we should be keeping around. Also, it needs access to METRICS_DISPLAY_DATA which I think makes more sense scoped to the namespace than a class. What's the advantage you see from putting it in MetricDisplayData?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of a classmethod (and maybe the global too), just to keep everything in a tight namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol prolly the global can't be in there so long as it's self-typed...
anyway saul goodman
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think if we went that path, the metrics/titles/descriptions would live in a separate json/yaml file. And we'd have a separate function that loads them up as MetricDisplayData
s into a global. But then there'd be a disconnect between the source of truth and the global which is a little weird. I guess we could make MetricDisplayData
a singleton and load them up on instantiation, but then we have to instantiate an object just to get this static list of strings which feels heavy.
Another way to keep them in a tight namespace would be to just create a metrics_list.py
module.
Idk let me know if any of those options sound better, I see plenty of advantages and disadvantages to each so not super opinionated lol
Patch description
The abbreviated metric titles are hard to get used to, but using longer metric identifiers makes the stdout print hard to parse.
This PR adds a separation between the metric identifier and the display name of the metric. It also adds a description. The title and description are included in tensorboard, but the abbreviated identifiers are used in the logs.
It moves the source of truth for metric titles and descriptions from the docs to the code, and generates the metrics table from this.
There are a few things I don't really have time to tackle right now, but would be nice for a later PR:
Testing steps
Ran a basic test run locally:
Verified the metrics were short and fit on the screen:
Verified that they were long and had descriptions on Tensorboard:
Built website:
Verified the metrics list was rendered: