When does AverageMetric reset? #3974
Comments
The precise answer is that it's averaged between metric resets, and that interval is variable. During training, the averaging window for training metrics depends on your exact settings of -lstep, -ltim, and -lesps. During validation, metrics are averaged across the entire validation run.
I will say that the reason for this complex data structure is that it remains correct no matter where you look at it: per process, per GPU, per validation run, etc. That's why we keep the numerator and denominator separate: so we can always aggregate correctly.
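To make the point above concrete, here is a minimal sketch (not ParlAI's actual implementation, just an illustration of the idea) of why storing the numerator and denominator separately keeps the metric correct under aggregation: sums of counts combine exactly, whereas averaging pre-computed averages does not when batch sizes differ.

```python
class AverageMetric:
    """Sketch of an average metric that stores raw counts, not the ratio."""

    def __init__(self, numer: float, denom: float = 1.0):
        self.numer = numer
        self.denom = denom

    def __add__(self, other: "AverageMetric") -> "AverageMetric":
        # Aggregation just adds the raw counts, so combining metrics from
        # different processes, GPUs, or batches stays exact.
        return AverageMetric(self.numer + other.numer, self.denom + other.denom)

    def value(self) -> float:
        return self.numer / self.denom


# Two GPUs report accuracy on different-sized batches:
gpu0 = AverageMetric(3, 4)  # 3 correct out of 4 examples
gpu1 = AverageMetric(1, 2)  # 1 correct out of 2 examples

combined = gpu0 + gpu1
print(combined.value())      # 4/6 ≈ 0.667, the correct pooled accuracy
print((3 / 4 + 1 / 2) / 2)   # 0.625, the naive average-of-averages, which is wrong
```

The same addition works at any granularity, which is why the numbers agree whether you aggregate per process, per GPU, or per validation run.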
The last comment I'll add: the reason for "local metrics" is that they let us delay aggregating per-task metrics when training on multiple datasets. Essentially, metrics are associated with the dataset they came from until printing time.
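That delayed-aggregation scheme can be sketched as follows. This is a hypothetical illustration (the names record and report are made up, not ParlAI's API): counts are accumulated per task, and the global number is only pooled at reporting time.

```python
from collections import defaultdict

# Accumulate (numerator, denominator) pairs keyed by the task they came from.
local = defaultdict(lambda: [0.0, 0.0])


def record(task: str, numer: float, denom: float = 1.0) -> None:
    """Attribute a metric observation to its source task."""
    local[task][0] += numer
    local[task][1] += denom


def report() -> dict:
    """At print time, compute per-task averages plus a correctly pooled total."""
    out = {task: n / d for task, (n, d) in local.items()}
    total_n = sum(n for n, _ in local.values())
    total_d = sum(d for _, d in local.values())
    out["all"] = total_n / total_d
    return out


record("squad", 8, 10)  # e.g. 8 correct out of 10 examples on one dataset
record("nli", 3, 5)     # 3 correct out of 5 on another
print(report())          # per-task averages 0.8 and 0.6, pooled 11/15
```

Because the raw counts survive until report() is called, the pooled "all" number weights each task by how many examples it actually contributed, rather than averaging the two task averages.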
Got it, thanks for clarifying!
Original question: I see that the AverageMetric() data structure is often used to compute/log various training metrics (e.g., here). Is the averaging computed over a single SGD batch, a per-GPU batch (possibly smaller than an SGD batch), the training epoch thus far, or something else? cc @stephenroller @hadasah