Log batch metrics #5362

Merged · 9 commits · Aug 19, 2021
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -24,6 +24,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
`self.ddp_accelerator` during distributed training. This is useful when, for example, instantiating submodules in your
model's `__init__()` method by wrapping them with `self.ddp_accelerator.wrap_module()`. See the `allennlp.modules.transformer.t5`
for an example.
+- We now log batch metrics to tensorboard and wandb.
- Added Tango components, to be explored in detail in a later post
- Added `ScaledDotProductMatrixAttention`, and converted the transformer toolkit to use it
- Added tests to ensure that all `Attention` and `MatrixAttention` implementations are interchangeable
@@ -46,7 +47,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
with a default value of `False`. `False` means gradients are not rescaled and the gradient
norm is never even calculated. `True` means the gradients are still not rescaled but the gradient
norm is calculated and passed on to callbacks. A `float` value means gradients are rescaled.
- `TensorCache` now supports more concurrent readers and writers.
+- We no longer log parameter statistics to tensorboard or wandb by default.


## [v2.6.0](https://github.com/allenai/allennlp/releases/tag/v2.6.0) - 2021-07-19
12 changes: 10 additions & 2 deletions allennlp/training/callbacks/log_writer.py
@@ -227,7 +227,8 @@ def log_batch(

# Now collect per-batch metrics to log.
metrics_to_log: Dict[str, float] = {}
-for key in ("batch_loss", "batch_reg_loss"):
+batch_loss_metrics = {"batch_loss", "batch_reg_loss"}
+for key in batch_loss_metrics:
if key not in metrics:
continue
value = metrics[key]
@@ -241,6 +242,13 @@
self._batch_loss_moving_items[key]
)

+for key, value in metrics.items():
+if key in batch_loss_metrics:
+continue
+key = "batch_" + key
+if key not in metrics_to_log:
+metrics_to_log[key] = value

self.log_scalars(
metrics_to_log,
log_prefix="train",
@@ -253,7 +261,7 @@

if self._batch_size_interval:
# We're assuming here that `log_batch` will get called every batch, and only every
# batch. This is true with our current usage of this code (version 1.0); if that
# assumption becomes wrong, this code will break.
batch_group_size = sum(get_batch_size(batch) for batch in batch_group) # type: ignore
self._cumulative_batch_group_size += batch_group_size
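For reference, here is a minimal standalone sketch (not the library's exact code) of what the new loop added to `log_batch` does: every scalar in the per-batch `metrics` dict that is not one of the loss keys is copied into `metrics_to_log` under a `batch_`-prefixed name, so it reaches tensorboard/wandb alongside the loss values. The helper function name below is hypothetical.

```python
from typing import Dict


def collect_batch_metrics(metrics: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical standalone version of the per-batch metric collection shown above."""
    batch_loss_metrics = {"batch_loss", "batch_reg_loss"}
    metrics_to_log: Dict[str, float] = {}
    # The loss keys are handled separately in the real callback (with moving averages),
    # so they are skipped here; every other metric gets a "batch_" prefix.
    for key, value in metrics.items():
        if key in batch_loss_metrics:
            continue
        prefixed_key = "batch_" + key
        if prefixed_key not in metrics_to_log:
            metrics_to_log[prefixed_key] = value
    return metrics_to_log


# e.g. {"accuracy": 0.75, "batch_loss": 1.2} -> {"batch_accuracy": 0.75}
print(collect_batch_metrics({"accuracy": 0.75, "batch_loss": 1.2}))
```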
2 changes: 1 addition & 1 deletion allennlp/training/callbacks/tensorboard.py
@@ -21,7 +21,7 @@ def __init__(
summary_interval: int = 100,
distribution_interval: Optional[int] = None,
batch_size_interval: Optional[int] = None,
-should_log_parameter_statistics: bool = True,
+should_log_parameter_statistics: bool = False,
should_log_learning_rate: bool = False,
) -> None:
super().__init__(
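The practical effect of this default change is that parameter statistics are no longer written to tensorboard unless you opt in explicitly. Below is a minimal sketch of opting back in, assuming the callback class is `TensorBoardCallback` and that it takes a `serialization_dir` argument (neither is visible in this hunk; the keyword arguments are).

```python
from allennlp.training.callbacks import TensorBoardCallback

# Sketch only: the class name and `serialization_dir` argument are assumptions.
callback = TensorBoardCallback(
    serialization_dir="runs/my_experiment",
    summary_interval=100,
    should_log_parameter_statistics=True,  # opt back in; the default is now False
    should_log_learning_rate=False,
)
```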