🚀 Feature
Toggle switch to turn off EarlyStopping logging for processes other than rank 0
Motivation
EarlyStopping logging can be a bit spammy when viewing aggregate logs across all processes. For example, with my custom CloudWatch logger:
xnpww4j62d-algo-1-vr8o9 | 14:17:49 [INFO] Epoch 9: [ Training | 100% iter# 49/49 19.28 batches/s ] train/loss_step=0.764418, train/loss_epoch=0.773, train/acc=0.68356
xnpww4j62d-algo-1-vr8o9 | 14:17:55 [INFO] Epoch 9: [ Validation | 100% iter# 10/10 2.34 batches/s ] val/loss_step=1.253475, val/loss_epoch=1.278802, val/acc=0.6107
xnpww4j62d-algo-1-vr8o9 | 14:17:55 [INFO] [rank: 0] Metric val/acc improved by 0.195 >= min_delta = 0.0. New best score: 0.611
xnpww4j62d-algo-1-vr8o9 | 14:17:55 [INFO] [rank: 2] Metric val/acc improved by 0.195 >= min_delta = 0.0. New best score: 0.611
xnpww4j62d-algo-1-vr8o9 | 14:17:55 [INFO] [rank: 1] Metric val/acc improved by 0.195 >= min_delta = 0.0. New best score: 0.611
xnpww4j62d-algo-1-vr8o9 | 14:17:55 [INFO] [rank: 3] Metric val/acc improved by 0.195 >= min_delta = 0.0. New best score: 0.611
xnpww4j62d-algo-1-vr8o9 | 14:17:55 [INFO] [rank: 4] Metric val/acc improved by 0.195 >= min_delta = 0.0. New best score: 0.611
xnpww4j62d-algo-1-vr8o9 | 14:17:55 [INFO] [rank: 5] Metric val/acc improved by 0.195 >= min_delta = 0.0. New best score: 0.611
xnpww4j62d-algo-1-vr8o9 | 14:17:55 [INFO] [rank: 6] Metric val/acc improved by 0.195 >= min_delta = 0.0. New best score: 0.611
xnpww4j62d-algo-1-vr8o9 | 14:17:55 [INFO] [rank: 7] Metric val/acc improved by 0.195 >= min_delta = 0.0. New best score: 0.611
xnpww4j62d-algo-1-vr8o9 | 14:18:20 [INFO] Epoch 14: [ Training | 100% iter# 49/49 18.94 batches/s ] train/loss_step=0.611876, train/loss_epoch=0.55, train/acc=0.80096
xnpww4j62d-algo-1-vr8o9 | 14:18:26 [INFO] Epoch 14: [ Validation | 100% iter# 10/10 2.29 batches/s ] val/loss_step=0.748429, val/loss_epoch=0.828285, val/acc=0.726
Pitch
It would be nice to be able to turn off this message on every process except rank 0. I understand that per-rank output is useful to monitor in some cases, so the toggle could default to False, which keeps the current behavior of logging on all ranks.
Alternatives
Custom EarlyStopping callback?
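Another possible workaround, sketched below, is a standard-library `logging.Filter` that drops any record tagged with a non-zero rank. It relies only on the `[rank: N]` prefix visible in the log excerpt above. The names `RankZeroOnlyFilter`, `demo`, and the `early_stopping_demo` logger are hypothetical and used for illustration; which real logger or handler to attach the filter to in a Lightning job is an assumption to verify against your setup.

```python
import logging
import re

# Matches the "[rank: N]" prefix seen in the log excerpt above.
_RANK_PREFIX = re.compile(r"^\[rank: (\d+)\]")


class RankZeroOnlyFilter(logging.Filter):
    """Drop records whose message is tagged with a non-zero rank."""

    def filter(self, record: logging.LogRecord) -> bool:
        match = _RANK_PREFIX.match(record.getMessage())
        # Untagged messages and rank-0 messages pass through unchanged.
        return match is None or int(match.group(1)) == 0


def demo() -> list:
    """Emit the same message from four ranks and return what survives."""
    kept = []

    class ListHandler(logging.Handler):
        # Collects formatted messages in memory instead of printing them.
        def emit(self, record: logging.LogRecord) -> None:
            kept.append(record.getMessage())

    logger = logging.getLogger("early_stopping_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = ListHandler()
    handler.addFilter(RankZeroOnlyFilter())
    logger.addHandler(handler)

    for rank in range(4):
        logger.info("[rank: %d] Metric val/acc improved. New best score: 0.611", rank)
    logger.info("Epoch 9 finished")

    logger.removeHandler(handler)
    return kept


if __name__ == "__main__":
    for line in demo():
        print(line)
```

Because each rank normally runs in its own process, the filter would have to be installed in every process, for example on the handler of the custom CloudWatch logger. It hides the duplicates from the aggregate view but does not stop the other ranks from producing the messages, so it is a stopgap rather than a replacement for the requested toggle.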