Improve tqdm progress bar #765
Comments
Another nice addition would be a global progress bar to have an ETA for the end of the whole training. Maybe a more general way to address this issue is to abstract the use of the progress bar in |
@hadim sounds interesting, do you have any particular implementation in mind? |
I think the progress bar should not be hardcoded in the trainer but abstracted in a callback. Once #776 is merged I could have a look if it's possible with the current API. More generally the loggers should also be callbacks IMO. That being said it's easy to propose when you're not in charge :-) I'll try to make a PR once #776 is merged. |
@hadim are you still interested in implementing this progress bar? |
I've made a custom progress bar as a callback and it works well for my needs. Not sure it will fit everyone's needs.

from tqdm.auto import tqdm
import torch

from pytorch_lightning.callbacks import Callback


class ProgressBar(Callback):
    """Global progress bar.

    TODO: add progress bar for training, validation and testing loop.
    """

    def __init__(self, global_progress: bool = True, leave_global_progress: bool = True):
        super().__init__()

        self.global_progress = global_progress
        self.global_desc = "Epoch: {epoch}/{max_epoch}"
        self.leave_global_progress = leave_global_progress
        self.global_pb = None

    def on_fit_start(self, trainer, pl_module):
        desc = self.global_desc.format(epoch=trainer.current_epoch + 1, max_epoch=trainer.max_epochs)

        self.global_pb = tqdm(
            desc=desc,
            total=trainer.max_epochs,
            initial=trainer.current_epoch,
            leave=self.leave_global_progress,
            disable=not self.global_progress,
        )

    def on_fit_end(self, trainer, pl_module):
        self.global_pb.close()
        self.global_pb = None

    def on_epoch_end(self, trainer, pl_module):
        # Set description
        desc = self.global_desc.format(epoch=trainer.current_epoch + 1, max_epoch=trainer.max_epochs)
        self.global_pb.set_description(desc)

        # Set logs and metrics
        # NOTE: `logs` is an attribute this particular LightningModule defines itself;
        # it is not part of the Lightning API, so drop this block if your module has no `logs`.
        logs = pl_module.logs
        for k, v in logs.items():
            if isinstance(v, torch.Tensor):
                logs[k] = v.squeeze().item()
        self.global_pb.set_postfix(logs)

        # Update progress
        self.global_pb.update(1)

Only a global progress bar is implemented at the moment. I could make a PR but some people might prefer the original one so I don't know if it's worth it. |
Yeah, it looks like a much cleaner way: using the callback-driven progress bar rather than checking the for loop wrapped by |
May I also add that I find the tqdm progress bar starts weirdly, with a percentage equal to 6% just after a single batch. And the progress bar shows a final value of 790, but if I calculate it by hand, an epoch has either 528 or 1056 steps (either one pass, or one forward and one backward). |
the bar shows the sum of train + val |
Sorry, I do not follow; I was referring to the progress counter being off, like after a single batch it shows:
|
50/790 = 6%. You can change that argument from 50 to 1 (the bar refresh rate). |
@hadim I think abstracting the current progress bar into a callback would be cool. Then, as you said, the user can modify it however they want by overriding parts of the callback. |
Yes, but that jump to 50 happens after only 1 batch. Shouldn't it stay at 0 until batch no 50? |
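For anyone hunting for where that refresh-rate argument lives, here is a minimal sketch. It assumes an older Lightning API that still accepts progress_bar_refresh_rate on the Trainer; newer releases moved the setting onto the TQDMProgressBar callback, so treat the exact parameter names as version-dependent assumptions.

import pytorch_lightning as pl

# Update the bar every step instead of every 50 (older Trainer API; assumption).
trainer = pl.Trainer(progress_bar_refresh_rate=1)

# Roughly equivalent on newer releases (assumed API; check your installed version):
# from pytorch_lightning.callbacks import TQDMProgressBar
# trainer = pl.Trainer(callbacks=[TQDMProgressBar(refresh_rate=1)])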
@williamFalcon: I agree this should be done in a callback. Not sure I'll have time to do that in the short term but anyone is free to use my code above. |
A somewhat related question: should the progress bar look like the one below? It creates a "list of progress bars" when it switches to evaluation mode.
|
I was observing something similar in other projects and it is hard to pin down; sometimes it is caused by debug mode (e.g. in PyCharm)... but this is a tqdm-related thing, and I think we can't do anything about it... :[ |
@hadim still willing to implement #765 (comment) ? |
Sorry @Borda but this is not a good moment for me to do that. |
@awaelchli, may you also self-assign this one, as they are almost the same... |
@Borda yes, could you assign me (can't self-assign) :) |
@awaelchli I would assume this to be closed by #1450, and if we find we need something else we will add it later... anyway, feel free to reopen if we are (I am) missing something 🐰 |
Any suggestions on how to resolve this? |
In which terminal emulator are you running this? |
I ran it on zsh and bash. tqdm==4.48.2, pytorch-lightning==1.0.0 |
I am seeing this behavior in jupyterlab as well:
The progress bar seems to work well when testing, in |
It's because of the stacking. Progress bar stacking has never worked well in Jupyter and Google Colab. As far as we know, it's a tqdm issue. Try running a stacked tqdm progress bar (without Lightning) in a Jupyter notebook and you will see the same. |
In method Got the idea from here. |
If we set it to leave=True, it will stay and fill up the terminal. But we want it to go away once validation is over because it's only a temporary bar that runs in parallel with the main bar. The main bar should stay always because it shows the epoch counter for the whole training. Maybe I'm missing something. Appreciate you trying to look for the fix. |
I ran the following code to test if the setting
I then ran my model with the custom callback, and after a few steps (~50% epoch) the screen was packed again with multiple printed lines :( As a temporary fix I will disable the validation progress bar with a custom callback, at least when running with Jupyter. Thanks for the help! |
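For anyone who wants to try the same workaround, here is a rough sketch of silencing only the validation bar. It assumes a Lightning 1.x-era ProgressBar callback that builds its bars in an init_validation_tqdm hook; the hook name and class may differ in your version, so treat this as an assumption rather than the exact fix used above.

from tqdm import tqdm
from pytorch_lightning.callbacks import ProgressBar

class NoValBar(ProgressBar):  # hypothetical helper name
    def init_validation_tqdm(self):
        # Return a disabled bar so validation prints nothing;
        # the training bar from the parent class is kept as-is.
        return tqdm(disable=True)

# trainer = pl.Trainer(callbacks=[NoValBar()])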
I just have a problem with rewriting the tqdm progress bar: I want to keep both the train and val progress bars, so I set leave=True for both of them. But when I print some information about the result in val_epoch_end, it rewrites the progress bar like this: |
I don't understand exactly what you are trying to achieve.
|
Where can we call this class ProgressBar? Is it passed to pl.Trainer()? |
It's a callback, so you can add it to the callback list in the Trainer: |
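The snippet after the colon appears to have been lost in the export; a minimal sketch of what was presumably meant, using the ProgressBar callback defined earlier in this thread:

import pytorch_lightning as pl

# Pass the custom callback alongside any others when building the Trainer.
trainer = pl.Trainer(callbacks=[ProgressBar()])
trainer.fit(model)  # `model` is your LightningModule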
I'm new to PyTorch Lightning and still would like the global ETA functionality. I've read through this thread and this one and it's still unclear to me how to get an ETA for how long training will take. I've tried copying the code above for a global ETA, but right now I'm getting the error What do I need to do in order to get the global ETA functionality? EDIT: Nevermind, I just removed the |
Here's an updated version of the code that should work for the newer callback functions. It also includes a lower-level training progress bar in addition to the global progress bar:

from tqdm.auto import tqdm

from pytorch_lightning.callbacks import Callback


class GlobalProgressBar(Callback):
    """Global progress bar.

    Originally from: https://github.com/Lightning-AI/lightning/issues/765
    """

    def __init__(self, global_progress: bool = True, leave_global_progress: bool = True):
        super().__init__()

        self.global_progress = global_progress
        self.global_desc = "Epoch: {epoch}/{max_epoch}"
        self.leave_global_progress = leave_global_progress
        self.global_pb = None
        self.step_pb = None

    def on_fit_start(self, trainer, pl_module):
        desc = self.global_desc.format(epoch=trainer.current_epoch + 1, max_epoch=trainer.max_epochs)

        self.global_pb = tqdm(
            desc=desc,
            total=trainer.max_epochs,
            initial=trainer.current_epoch,
            leave=self.leave_global_progress,
            disable=not self.global_progress,
        )

    def on_train_epoch_start(self, trainer, pl_module):
        self.step_pb = tqdm(
            desc="Training",
            total=len(trainer.train_dataloader),
            leave=False,
        )

    def on_train_epoch_end(self, trainer, pl_module):
        self.step_pb.close()
        self.step_pb = None

        # Set description
        desc = self.global_desc.format(epoch=trainer.current_epoch + 1, max_epoch=trainer.max_epochs)
        self.global_pb.set_description(desc)

        # # Set logs and metrics
        # logs = pl_module.logs
        # for k, v in logs.items():
        #     if isinstance(v, torch.Tensor):
        #         logs[k] = v.squeeze().item()
        # self.global_pb.set_postfix(logs)

        # Update progress
        self.global_pb.update(1)

    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        self.step_pb.update(1)

    def on_fit_end(self, trainer, pl_module):
        self.global_pb.close()
        self.global_pb = None

To use, do
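The usage snippet seems to have been cut off here; a minimal sketch of what it likely showed (the flag for turning off the built-in bar varies by Lightning version, so treat it as an assumption):

import pytorch_lightning as pl

# Add the callback; optionally silence the default bar so the two don't collide.
trainer = pl.Trainer(
    callbacks=[GlobalProgressBar()],
    enable_progress_bar=False,  # older versions: progress_bar_refresh_rate=0 (assumption)
)
trainer.fit(model)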
|
At the moment the progress bar is initialized with the arg leave=False: https://github.com/PyTorchLightning/pytorch-lightning/blob/deffbaba7ffb16ff57b56fe65f62df761f25fbd6/pytorch_lightning/trainer/trainer.py#L861

Sometimes, it's nice to be able to see the previous progress bar to look at the evolution of the loss and metrics.
Would it be possible to add an arg to the trainer to be able to override the default tqdm parameters?
Also, another point: tqdm progress bars can be nested (https://github.com/tqdm/tqdm#nested-progress-bars). Could we imagine having a global progress bar and then a nested progress bar for each epoch loop?
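For reference, nesting plain tqdm bars (outside Lightning) looks roughly like the sketch below; the names and loop counts are illustrative only. This is also the kind of stacked setup that tends to misbehave in notebooks, as discussed above.

from tqdm.auto import tqdm

n_epochs, n_batches = 3, 100  # illustrative numbers

# Outer bar: one tick per epoch (the "global" bar carrying the overall ETA).
for epoch in tqdm(range(n_epochs), desc="Epochs", position=0):
    # Inner bar: one tick per batch; leave=False clears it after each epoch.
    for batch in tqdm(range(n_batches), desc=f"Epoch {epoch}", position=1, leave=False):
        pass  # training step would go here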