Partially missing training_step outputs in training_epoch_end #2320
I dug into the PL training loop and may have found a possible issue. In the `run_training_epoch` method, it seems to me that only the last iteration's `batch_output` is passed to `training_epoch_end`.
@williamFalcon @Borda could you please take a look into it?
I debugged a bit, and I think the problem occurs with multiple optimizers: only the output of the last iteration (over split batches and optimizers) is passed on. If you can confirm this is the problem, I can fix it and open a pull request. Thanks
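A minimal pure-Python sketch of the overwrite pattern being described (hypothetical; this is not the actual Lightning source, and the function names are illustrative): if the inner loop over split batches and optimizers assigns each result to a single variable, only the final `(split, optimizer)` output survives, whereas collecting into a list preserves every optimizer's output.

```python
def run_training_batch_buggy(split_batches, optimizers, training_step):
    """Suspected pattern: each iteration overwrites the previous output."""
    batch_output = None
    for split_batch in split_batches:
        for opt_idx, _opt in enumerate(optimizers):
            # overwrites the previous (split, optimizer) result
            batch_output = training_step(split_batch, opt_idx)
    return batch_output  # only the last iteration's output is returned


def run_training_batch_fixed(split_batches, optimizers, training_step):
    """Sketch of a fix: accumulate every (split, optimizer) output."""
    batch_outputs = []
    for split_batch in split_batches:
        for opt_idx, _opt in enumerate(optimizers):
            batch_outputs.append(training_step(split_batch, opt_idx))
    return batch_outputs
```

With two optimizers, the buggy version always returns the `optimizer_idx=1` result, which matches the symptom reported below (teacher outputs at `optimizer_idx=0` missing).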
@mmiakashs can you check that this works for you now on master?
@williamFalcon Thanks for the update. Sorry, I missed that notification. I will check this after my deadline next week :)
Learning model: the model consists of two branches, teacher and student. Two losses are used to train the two branches, each with its own optimizer.
ISSUE: In `training_step`, I produce two sets of metric outputs, one for the teacher branch (`optimizer_idx=0`), `{loss, log: {acc, f1}}`, and one for the student branch (`optimizer_idx=1`), `{loss, log: {acc, f1, precision}}`. But in `training_epoch_end`, I only get the combined outputs for the student (`optimizer_idx=1`); all the teacher outputs (`optimizer_idx=0`) are missing. I also looked through the PL training loop and didn't observe any issue where it combines the `training_step` outputs. I am not sure what I am missing.