feat: allow validation after N training steps #461
Thanks for this, Sam! That was quick. I find the difference between `check_val_every_n_epoch` and `val_check_interval` a bit confusing. Looking at this documentation, I think we need to account for the scenario where the value of `val_check_interval` is greater than the number of steps in an epoch. In that scenario, `check_val_every_n_epoch` should be `None`. In fact, I think that's what our default should be: change `check_val_every_n_epoch` to `None` and then set `val_check_interval` to, say, `1000`. What do you think?
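To make the interaction between the two settings concrete, here is a small simulation of when validation would fire. It is a simplified model that assumes Lightning-style semantics: a float `val_check_interval` is a fraction of an epoch, an int is a number of training steps, and `check_val_every_n_epoch=None` makes the step count run across epoch boundaries. `validation_steps` is a hypothetical helper, not part of Lightning or EveryVoice, and it ignores the finer details of the real training loop.

```python
from typing import List, Optional, Union


def validation_steps(
    steps_per_epoch: int,
    max_steps: int,
    val_check_interval: Union[int, float],
    check_val_every_n_epoch: Optional[int] = 1,
) -> List[int]:
    """Return the 1-based global training steps after which validation would run.

    Hypothetical, simplified model (not Lightning's implementation):
    * float interval -> fraction of an epoch, counted within each epoch
    * int interval   -> number of training steps
    * check_val_every_n_epoch=None -> count steps globally, ignoring epochs
    """
    if isinstance(val_check_interval, float):
        if check_val_every_n_epoch is None:
            # This sketch only supports step counts when epochs are ignored.
            raise ValueError("use an int val_check_interval when check_val_every_n_epoch is None")
        every = max(1, int(steps_per_epoch * val_check_interval))
    else:
        every = val_check_interval
        if check_val_every_n_epoch is not None and every > steps_per_epoch:
            # The interval could never be reached inside a single epoch.
            raise ValueError(
                f"val_check_interval={every} > {steps_per_epoch} steps per epoch; "
                "set check_val_every_n_epoch=None"
            )

    steps = []
    for global_step in range(1, max_steps + 1):
        epoch, step_in_epoch = divmod(global_step - 1, steps_per_epoch)
        step_in_epoch += 1  # make the in-epoch counter 1-based
        if check_val_every_n_epoch is None:
            # Count across epoch boundaries.
            if global_step % every == 0:
                steps.append(global_step)
        elif (epoch + 1) % check_val_every_n_epoch == 0 and step_in_epoch % every == 0:
            # Count within the epoch, and only on every n-th epoch.
            steps.append(global_step)
    return steps


print(validation_steps(4, 10, 1.0))      # [4, 8]
print(validation_steps(4, 10, 3, None))  # [3, 6, 9]
```

With 4 steps per epoch, `validation_steps(4, 10, 6)` raises because an interval of 6 steps can never be reached within one epoch, while `validation_steps(4, 10, 6, None)` returns `[6]`. That is the scenario where `check_val_every_n_epoch` has to be `None`.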
@roedoejet, I also had to read We could also rename I think it is valuable to know what performance we are getting when we have trained on all examples exactly the same number of times, i.e. at the end of an epoch. But there is the danger that the user sets
I'll set But I think
Looks good! I just said 1000 because that's what HiFiGAN and FastSpeech2 use by default, but yes, I think 500 is fine. I updated the schemas, which was needed to pass CI.
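For illustration, the kind of rule a training-config schema might encode for these two fields can be sketched with a plain dataclass. Everything here is hypothetical: `ValidationSchedule` and `from_training_section` are not EveryVoice names, and the 1.0 fallback mirrors the PR description, whereas this thread suggests the real default may become a step count such as 500.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Union


@dataclass
class ValidationSchedule:
    """Hypothetical stand-in for the validation fields of a training config."""

    # int -> validate every N training steps; float -> fraction of an epoch
    val_check_interval: Union[int, float] = 1.0
    # None -> count steps across epochs instead of per epoch
    check_val_every_n_epoch: Optional[int] = 1

    def __post_init__(self) -> None:
        v = self.val_check_interval
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            raise TypeError("val_check_interval must be an int (steps) or a float (epoch fraction)")
        if isinstance(v, float) and not 0.0 < v <= 1.0:
            raise ValueError("a float val_check_interval is an epoch fraction in (0, 1]")
        if isinstance(v, int) and v < 1:
            raise ValueError("an int val_check_interval is a step count and must be >= 1")
        n = self.check_val_every_n_epoch
        if n is not None and n < 1:
            raise ValueError("check_val_every_n_epoch must be >= 1 or None")

    @classmethod
    def from_training_section(cls, training: Dict[str, Any]) -> "ValidationSchedule":
        # Keys missing from the config fall back to the dataclass defaults.
        keys = ("val_check_interval", "check_val_every_n_epoch")
        return cls(**{k: training[k] for k in keys if k in training})


print(ValidationSchedule.from_training_section({}))
# ValidationSchedule(val_check_interval=1.0, check_val_every_n_epoch=1)
```

Keeping the int/float distinction in the type, rather than in a separate "unit" field, matches how the thread talks about `val_check_interval` as either a step count or an epoch fraction.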
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #461 +/- ##
=======================================
Coverage 74.15% 74.15%
=======================================
Files 44 44
Lines 2766 2767 +1
Branches 428 428
=======================================
+ Hits 2051 2052 +1
Misses 630 630
Partials 85 85
TODO
Is `val_check_interval` available as of 2.0.0, the version that EveryVoice currently requires?
PR Goal?
Save checkpoints and run validation every N training steps instead of every N epochs.
Fixes?
Fixes: #204
Feedback sought?
Merge request.
Priority?
Part of the alpha, so medium/high.
Tests added?
No
How to test?
Command
We can see a checkpoint being saved every 9 steps.
Taking `pitch_loss` as an example, we can see that validation ran every 9 training steps.
Confidence?
Good
Version change?
I don't think so, because if `config/everyvoice-text-to-spec.yaml` is missing, then `training.val_check_interval` will default to 1.0, just like before.
Related PRs?