@@ -34,21 +34,20 @@ class DeviceStatsMonitor(Callback):
 r"""Automatically monitors and logs device stats during training, validation and testing stage.
 ``DeviceStatsMonitor`` is a special callback as it requires a ``logger`` to be passed as an argument to the ``Trainer``.

-Logged Metrics:
-Device statistics are logged with keys prefixed as
-``DeviceStatsMonitor.{hook_name}/{base_metric_name}`` (e.g.,
-``DeviceStatsMonitor.on_train_batch_start/cpu_percent``).
-The source of these metrics depends on the ``cpu_stats`` flag
-and the active accelerator.
+Device statistics are logged with keys prefixed as ``DeviceStatsMonitor.{hook_name}/{base_metric_name}`` (e.g.,
+``DeviceStatsMonitor.on_train_batch_start/cpu_percent``).
+The source of these metrics depends on the ``cpu_stats`` flag and the active accelerator.

 CPU (via ``psutil``): Logs ``cpu_percent``, ``cpu_vm_percent``, ``cpu_swap_percent``.
 All are percentages (%).
 CUDA GPU (via :func:`torch.cuda.memory_stats`): Logs detailed memory statistics from
 PyTorch's allocator (e.g., ``allocated_bytes.all.current``, ``num_ooms``; all in Bytes).
 GPU compute utilization is not logged by default.
-Other Accelerators (e.g., TPU, MPS): Logs device-specific stats.
+Other Accelerators (e.g., TPU, MPS): Logs device-specific stats:
+
 - TPU example: ``avg. free memory (MB)``.
 - MPS example: ``mps.current_allocated_bytes``.
+
 Observe logs or check accelerator documentation for details.

 Args:
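
The key layout described in the docstring can be sketched as below. This is a minimal illustration, not the callback's actual implementation: ``prefixed_keys`` is a hypothetical helper introduced here, and the sample values are made up rather than measured.

```python
from typing import Dict


def prefixed_keys(stats: Dict[str, float], hook_name: str) -> Dict[str, float]:
    """Hypothetical helper: reproduce the documented key layout
    ``DeviceStatsMonitor.{hook_name}/{base_metric_name}``."""
    prefix = f"DeviceStatsMonitor.{hook_name}"
    return {f"{prefix}/{name}": value for name, value in stats.items()}


# Illustrative CPU stats only (percentages), not real measurements.
sample_cpu_stats = {
    "cpu_percent": 12.5,
    "cpu_vm_percent": 48.0,
    "cpu_swap_percent": 0.0,
}

logged = prefixed_keys(sample_cpu_stats, "on_train_batch_start")
for key, value in logged.items():
    print(f"{key} = {value}")
```

In actual use the callback is attached to the ``Trainer`` together with a logger, and the keys that show up depend on the ``cpu_stats`` flag and the active accelerator, as the docstring states.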