Add total training steps as a property to trainer #10760
Comments
I have definitely seen this question being asked over and over, so I believe we should do it! @awaelchli @carmocca @ananthsub ?
The last time this was discussed, I remember somebody mentioning that the problem is that it will not be correct in all circumstances and there's no way for us to know that: a problem of silent incorrectness. Also, this property cannot be called from just anywhere; the attributes it relies on need to be computed first. Just pointing out possible problems, not saying we shouldn't add it.
I raised the concern about correctness earlier. However, I do believe it is becoming more important to provide such a utility, because Lightning is getting more complex and it is harder for the regular user to understand what's happening under the hood. And if users ask for it, we should provide the most accurate estimation possible, and at the same time document what the unknowns are (e.g., accumulation scheduling via a callback). Also, which implementation are the users asking for? The number of training steps, i.e. the number of times training_step will be called / the size of the "dataloader", OR the number of optimization steps?
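To make the distinction concrete, here is a rough Python sketch of the two counts being discussed; all values are made up for illustration and none of this is from the thread:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# toy dataloader just to make the two counts concrete
dataset = TensorDataset(torch.randn(1000, 8))
train_dataloader = DataLoader(dataset, batch_size=32)

batches_per_epoch = len(train_dataloader)   # training_step calls per epoch ("dataloader" size)
accumulate_grad_batches = 4                 # assumed accumulation factor
optimization_steps_per_epoch = batches_per_epoch // accumulate_grad_batches
```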
Thanks for picking this up again @rohitgr7, I'm very much for this :) I've added both pieces of functionality manually into a class for Flash and Transformers, and it would be nice to have a single function available through the trainer. Regarding correctness, I detailed the cases where this function breaks down above. However, in my opinion the gains outweigh the correctness issue. It may be worth putting a disclaimer in the docstring that in certain cases we cannot estimate correctly!
I think except for the dataloader, every other argument will already be there when we initialize the trainer. But yeah, the dataloader might need a few more things that might not be there yet, e.g. the dataset, batch_size..., so we won't be able to access it inside.
Accumulation via callback will still be available, right? Since we will have it during Trainer init?
It's the number of optimization steps. We can rename the method to …
AFAIK for LR scheduling you only use the number of optimization steps, since this is the only thing that may influence the model and the optimizer :) So I think it's fine to go with that and not provide the size of the "dataloader".
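For context, a typical consumer of this count is a warmup/decay scheduler stepped every optimization step; a minimal sketch, with the scheduler choice, learning rate, and step numbers all assumed for illustration:

```python
import pytorch_lightning as pl
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup


class MyModule(pl.LightningModule):
    def configure_optimizers(self):
        optimizer = AdamW(self.parameters(), lr=2e-5)
        # placeholder value: this is the number the proposed trainer property would supply
        total_steps = 10_000
        scheduler = get_linear_schedule_with_warmup(
            optimizer, num_warmup_steps=100, num_training_steps=total_steps
        )
        # step the scheduler every optimization step, not every epoch
        return [optimizer], [{"scheduler": scheduler, "interval": "step"}]
```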
In that case, after one epoch, …
I think yes, it should be. Any edge case we missed?
I got …
@mariomeissner in which hook/method did you call this function?
@rohitgr7 I call it inside …
You probably need to change this line (the error …)
One thing I noticed with the provided solution: if passing …
Good catch! I think we should use …
@rohitgr7 can we add this as a property now? It would definitely be a great feature that I'm interested in :)
Yes! Just waiting for final approvals.
Sounds good to me.
For the name, I would suggest something like …
I believe …
🚀 Feature
See title:
Similar issues: #5449, #10430… I'll keep linking more.
Motivation
This is a highly requested feature from the Lightning community. The total number of training steps is used by some of the LR schedulers, especially when using transformer models, but since there are a lot of arguments/flags involved in computing it, it's not easy for a user to write an implementation that handles all the possible edge cases with no code changes. Also, we make updates to some of these flags and to the accessibility of some of their components (e.g. train_dataloaders in v1.5), so a custom implementation created by a user might get outdated soon, and they would have to write a new one, which is only possible if they are well aware of the codebase internals. We, as core contributors, can instead maintain it, with some tests of course.
Pitch
Would like to credit @SeanNaren for helping out :)
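A minimal sketch of the kind of estimation being pitched, assuming the train dataloader has already been attached, max_epochs is set, and gradient accumulation stays constant; the function name and exact attribute handling are assumptions, not the final API:

```python
import pytorch_lightning as pl


def estimate_total_training_steps(trainer: pl.Trainer) -> int:
    """Rough estimate of the number of optimization steps for a run."""
    if trainer.max_steps and trainer.max_steps > 0:
        # an explicit step budget takes precedence over the epoch-based estimate
        return trainer.max_steps

    # batches seen by training_step per epoch (respects limit_train_batches)
    batches_per_epoch = trainer.num_training_batches

    # optimizer updates per epoch after gradient accumulation
    steps_per_epoch = batches_per_epoch // trainer.accumulate_grad_batches

    return int(steps_per_epoch * trainer.max_epochs)
```

As noted in the comments above, this breaks down when accumulation is changed dynamically (e.g. via a callback) or when the dataloader is not yet available, which is exactly the correctness caveat raised in the thread.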
Alternatives
Additional context
If you enjoy Lightning, check out our other projects! ⚡
Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
Lite: Enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.
Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.
Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.
Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers, leveraging PyTorch Lightning, Transformers, and Hydra.
cc @Borda
cc @PyTorchLightning/core-contributors