This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

When does AverageMetric reset? #3974

Closed
ethanjperez opened this issue Aug 24, 2021 · 4 comments

Comments

@ethanjperez

I see that the AverageMetric() data structure is often used to compute/log various training metrics (e.g., here). Is the averaging computed over a single SGD batch, a GPU batch (possibly smaller than an SGD batch), the training epoch thus far, or something else?

cc @stephenroller @hadasah

@stephenroller
Contributor

The precise answer is that it's averaged between metric resets, and that interval is variable.

During training, the averaging window for training metrics depends on your exact settings of -lstep, -ltim, and -lesps. During validation, metrics are averaged across the entire validation run.
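To make the reset timing concrete, here is a minimal sketch (not ParlAI's actual implementation; the loop structure and names are illustrative) of a training loop where metric averages accumulate until a log event, get reported, and are then cleared:

```python
# Hypothetical sketch: training metrics accumulate between resets.
# Each log event reports the average over only the batches seen since
# the previous reset, then clears the running sums.

def train_loop(batch_losses, log_every_n_steps=3):
    """batch_losses: list of (loss_sum, n_examples) per batch."""
    reports = []
    numer, denom = 0.0, 0
    for step, (loss_sum, n_examples) in enumerate(batch_losses, start=1):
        numer += loss_sum
        denom += n_examples
        if step % log_every_n_steps == 0:
            reports.append(numer / denom)  # average since the last reset
            numer, denom = 0.0, 0          # reset the metric
    return reports

# Six batches, logging every 3 steps: each report averages only its
# own window, not the whole epoch.
print(train_loop([(2.0, 2), (4.0, 2), (6.0, 2),
                  (1.0, 1), (3.0, 1), (5.0, 1)]))  # → [2.0, 3.0]
```

In this sketch, a smaller logging interval means each reported average covers fewer batches, which is why the printed numbers can look noisier with frequent logging.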

@stephenroller
Contributor

I will say that the reason for this complex data structure is that it remains correct no matter where you look at it: per process, per GPU, per validation run, etc. That's the reason we keep the numerator and denominator separate: so we can always aggregate correctly.
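The point about keeping numerator and denominator separate can be shown with a small sketch (illustrative only, not ParlAI's actual class): summing both parts gives the exact average over all examples, whereas averaging two per-GPU averages would be wrong when the GPUs saw different numbers of examples.

```python
# Minimal sketch of a numerator/denominator metric. Combining two
# metrics sums both parts, so the result is the true average over all
# underlying examples regardless of where each part was computed.

class AverageMetric:
    def __init__(self, numer: float = 0.0, denom: float = 0.0):
        self.numer = numer
        self.denom = denom

    def __add__(self, other: "AverageMetric") -> "AverageMetric":
        return AverageMetric(self.numer + other.numer,
                             self.denom + other.denom)

    def value(self) -> float:
        return self.numer / self.denom if self.denom else 0.0

# Two "GPUs" report losses over different numbers of examples.
gpu0 = AverageMetric(numer=12.0, denom=4)  # mean 3.0 over 4 examples
gpu1 = AverageMetric(numer=4.0, denom=8)   # mean 0.5 over 8 examples

combined = gpu0 + gpu1
# Naively averaging the averages gives (3.0 + 0.5) / 2 = 1.75, but the
# true mean over all 12 examples is 16 / 12 ≈ 1.333.
print(combined.value())
```

This is why the aggregation is correct "no matter where you look at it": addition of (numer, denom) pairs is associative, so per-process, per-GPU, and per-run rollups all reduce to the same global sum.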

@stephenroller
Contributor

The last comment I'll add: the reason for "local metrics" is that they let us delay aggregating per-task metrics when training on multiple datasets. Essentially, metrics stay associated with the dataset they came from until printing time.
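A rough sketch of that idea (task names and structure are hypothetical, not ParlAI's API): each recorded value stays tagged with its source task, and both per-task and global averages are computed only at report time from the raw numerator/denominator pairs.

```python
# Hypothetical sketch of delayed per-task aggregation: metrics are
# bucketed by the dataset they came from, and rollups happen at
# printing time rather than when each example is recorded.
from collections import defaultdict

per_task = defaultdict(lambda: [0.0, 0])  # task -> [numer, denom]

def record(task: str, loss_sum: float, n_examples: int) -> None:
    per_task[task][0] += loss_sum
    per_task[task][1] += n_examples

def report() -> dict:
    # Per-task averages, plus a global average built from the raw
    # numerators/denominators (not from the per-task averages).
    out = {f"{task}/loss": n / d for task, (n, d) in per_task.items()}
    total_numer = sum(n for n, _ in per_task.values())
    total_denom = sum(d for _, d in per_task.values())
    out["loss"] = total_numer / total_denom
    return out

record("taskA", 6.0, 3)   # taskA average: 2.0
record("taskB", 4.0, 1)   # taskB average: 4.0
print(report())           # global: 10.0 / 4 = 2.5
```

Because aggregation is deferred, the global number weights each task by how many examples it actually contributed, rather than treating every task's average equally.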

@ethanjperez
Author

Got it, thanks for clarifying!
