Why do I have to write `.epoch_end` and all the `*_step` methods? #2619

Comments
Hi! Thanks for your contribution, and great first issue!
See PR #2615, which tries to simplify this.
@awaelchli Thank you for your reply. IMHO the new syntax proposed in #2615 seems even weirder and does not solve the issue. Why reinvent the wheel? Most libraries have something like:

`tr = Trainer(..., batch_metrics=[Accuracy()], epoch_metrics=[Accuracy(), ROCAUC()])`

This has three advantages: … Then one can also choose to manually return a metric from the … To save memory and compute, for the epoch metrics you can give each metric a default name and store the batch-level ones only if the epoch-level one exists. What do you think? Cheers, Francesco
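The `batch_metrics`/`epoch_metrics` Trainer arguments suggested above are not part of PyTorch Lightning's API; they are the commenter's proposal. A minimal, framework-free sketch of the suggested design (all names here, including `SimpleTrainer`, are hypothetical) might look like this:

```python
# Hypothetical sketch of the proposal: metric objects are passed once to a
# trainer, which applies them to every batch and averages them per epoch.
# Nothing here is PyTorch Lightning API; it only illustrates the idea.

class Accuracy:
    """Fraction of predictions equal to the targets."""
    name = "accuracy"

    def __call__(self, preds, targets):
        correct = sum(int(p == t) for p, t in zip(preds, targets))
        return correct / len(targets)

class SimpleTrainer:
    def __init__(self, epoch_metrics=()):
        self.epoch_metrics = list(epoch_metrics)

    def run_epoch(self, batches):
        # Accumulate each metric over the batches, then average at epoch end.
        sums = {m.name: 0.0 for m in self.epoch_metrics}
        for preds, targets in batches:
            for m in self.epoch_metrics:
                sums[m.name] += m(preds, targets)
        n = len(batches)
        return {name: total / n for name, total in sums.items()}
```

For example, with two batches whose per-batch accuracies are 0.5 and 1.0, `run_epoch` would report an epoch accuracy of 0.75.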
I disagree. PyTorch Lightning is not trying to be like other libraries. Which parts do you see as "reinvented"?
Not to me, by a large margin. Your example code actually highlights the design concept behind LightningModule perfectly: everything that is specific to training and evaluating a system goes into the LightningModule. The Trainer can be thought of as the encapsulation of all the "boilerplate" (the PyTorch training loop etc.) and does not contain any research-specific properties (like metrics or how losses are computed).
It looks like the docs will become simpler with the structured results feature, while remaining backward compatible. With structured results, you won't have to implement the *_end and *_epoch_end hooks unless you want to do crazy things, as I understand it.
Thank you @awaelchli for the fast reply!

It reminds me of web development some years ago, when everybody claimed they were doing something different... but this is off topic. Well, in my opinion, if I have to rewrite the same code three times … You can pass them to the … Anyway, thank you again for the fast reply. I really appreciate it! Cheers, Francesco
My same argument applies there. It's no different from the Trainer-args case: I want the metrics etc. in my LightningModule, not outside of it. If I wanted to pass them in from the outside, I would make them a hyperparameter. @PyTorchLightning/core-contributors any thoughts here?
@awaelchli Thank you for your reply. At this point, I don't see any point in continuing the discussion. I will just copy and paste the code over and over, hoping somebody implements a new library with better practices soon. Thank you. Be safe! Francesco
About the code repetition you mention, there is a way to get rid of it: you can define your default validation_step and test_step methods in a separate class and then inherit from it in your various LightningModules. By the way, I like the idea of a general-purpose step method.
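The inheritance approach described above can be sketched without Lightning itself. Here `BaseModule` and its toy return values are illustrative only (not Lightning API); the point is that shared step logic lives in one base class and concrete models override only `step`:

```python
# Framework-free sketch of the inheritance suggestion: default *_step methods
# are defined once in a base class; subclasses only implement the shared step.
# These class and key names are hypothetical, not PyTorch Lightning API.

class BaseModule:
    def step(self, batch):
        raise NotImplementedError

    # Shared defaults: every subclass gets these for free.
    def training_step(self, batch):
        return {"loss": self.step(batch)}

    def validation_step(self, batch):
        return {"val_loss": self.step(batch)}

    def test_step(self, batch):
        return {"test_loss": self.step(batch)}

class MyModel(BaseModule):
    def step(self, batch):
        # Toy "loss": mean of the batch values.
        return sum(batch) / len(batch)
```

With this layout, `MyModel` writes its loss computation once and inherits all three stage-specific steps.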
In the latest version on master you don't need step_end or epoch_end. To reuse the same step, you can just call validation_step from training_step. Docs will be updated in a few days.

Before (to be fair, only training_step was required):

```python
def training_step(...):
    return loss

def training_step_end(...):
    ...

def training_epoch_end(...):
    ...
```

Now:

```python
def training_step(...):
    result = TrainResult(minimize=loss)
    return result
```

To use the same loop for val, test and train:

```python
def step(...):
    return something

def training_step(...):
    something = self.step(...)
    result = TrainResult(minimize=something)
    return result

def validation_step(...):
    something = self.step(...)
    result = EvalResult(checkpoint_on=something)
    result.log('val_loss', something)
    return result

def test_step(...):
    # compute the shared step again so the logged value is defined here
    something = self.step(...)
    result = EvalResult()
    result.log('test_loss', something)
    return result
```

Maybe for your applications you use the same loop for all three... however, that's very unusual; usually what happens in the training step is different from the other loops, and the metrics need to be named according to the loop you are in. For instance, in the train loop you log or track things on every batch, while in the val/test loop you only care about full-epoch statistics. So I'm curious, @FrancescoSaverioZuppichini: in what application are you able to use the same loop for everything?
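The shared-step pattern above can also be made automatic, as the original request asks: look up a stage-specific method and fall back to a generic `.step` when it is missing. This is a hypothetical sketch of that dispatch, not anything Lightning implements; `resolve_step` and the module classes are made-up names:

```python
# Hypothetical fallback dispatch: prefer 'training_step', 'validation_step',
# etc. if the module defines them, otherwise use the generic .step method.
# None of these names are PyTorch Lightning API.

class GenericModule:
    """Defines only the generic step; no stage-specific methods."""
    def step(self, batch):
        return sum(batch)

class CustomTrainModule(GenericModule):
    """Overrides the training step only; other stages fall back to .step."""
    def training_step(self, batch):
        return -sum(batch)

def resolve_step(module, stage):
    # getattr with a default lets us detect a missing stage-specific method
    # and fall back to the generic one.
    specific = getattr(module, f"{stage}_step", None)
    return specific if specific is not None else module.step
```

A trainer loop could then call `resolve_step(model, "validation")(batch)` without requiring every module to spell out all three methods.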
Ok, docs have been updated!

Quick start guide: https://pytorch-lightning.readthedocs.io/en/latest/new-project.html
Or the walk-through: https://pytorch-lightning.readthedocs.io/en/latest/introduction_guide.html#validating

@FrancescoSaverioZuppichini is that more in line with what you were hoping for? :)
Hi @williamFalcon, I hope you are doing great. Thank you for the reply. I am doing image classification and I want to return the same metrics from all my dataset splits. I have found it very hard to understand what I have to return from each one. Let me know if I can help, or if you are fine with the current implementation. Thanks :) Francesco
Could you give me an example of how it might work? If it meets the following requirements we could consider it: …

I think your metrics-passed-to-Trainer suggestion is interesting, but I think it has a few pitfalls: …

Also, what about when I want accuracy calculated in a weird way, or for a different type of batch structure (i.e. metric learning or something)? Then my model is no longer self-contained. Thanks!
@williamFalcon sorry for the late reply, I was super busy and this topic went under the radar, my bad. I see your point but (with all due respect) I think you are wrong. Unfortunately, I don't have the time now to lay out my counter-arguments, and I suspect the discussion would go on forever. Very quickly: you can have default behavior and allow users to override specific methods, as you are doing now; this is classic OOP. I am up for a quick voice chat if you would like to hear my suggestion in detail, or I will comment here in the not-so-near future. In the meantime, I will drop PyTorch Lightning since it doesn't speed up my productivity and, sorry to say, bugs are everywhere! Thank you for the discussion ;) Bests
I hear you! This is why it's used to create new state-of-the-art methods and new research. You might be looking for something else that doesn't require as much PyTorch knowledge or is meant for simpler cases. That is unfortunately not the aim of Lightning.
Hi @williamFalcon, thank you for the reply! I have a lot of programming experience and I have been using PyTorch for years. But if I have to write tons of code just to do very basic image classification, then probably something is not so well designed in your library (plus bugs). Don't get me wrong, there are lots of good features. TensorFlow is also used to create SOTA methods, but it is a terrible tool. IMHO it should be easy to use for the easy tasks and hard to use for the hard tasks, not hard for both. I hope no offense is taken, but because you are the creator you are not so willing to listen :) Thank you :) Bests

P.S. You can close the issue now
🚀 Feature
Dear all,

Thank you for this amazing library. However, there are some features that lead to avoidable boilerplate code. Let's take an example.
Why do I have to write all the `*_step` methods?

Most of the time you want to use the same loss and return the same metrics you use in training_step. We should have a `.step` method that is used as a fallback whenever one of the `*_step` methods is not implemented; this would remove most of the boilerplate code.

Why do I have to manually compute the average metrics for each epoch?
I was expecting PyTorch Lightning to compute the average of each metric by itself, as all other libraries do. Is there a specific reason for not doing it?
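The per-epoch averaging requested here is a small amount of bookkeeping. A minimal sketch (the `EpochAverage` name is illustrative, not any library's API) of a running accumulator for batch-level metric values:

```python
# Minimal running-average accumulator for per-batch metric values.
# Illustrative only; not PyTorch Lightning (or any library's) API.

class EpochAverage:
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value):
        # Called once per batch with the batch-level metric value.
        self.total += value
        self.count += 1

    def compute(self):
        # Average over all values seen this epoch.
        return self.total / self.count

avg = EpochAverage()
for batch_loss in [0.9, 0.7, 0.5]:
    avg.update(batch_loss)
```

After the loop, `avg.compute()` gives the epoch average (0.7 here); resetting the accumulator at each epoch boundary is the only extra step a framework would need.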
Thank you,
Best Regards,
Francesco