With cml_tts_dataset_french_v0.1, which has 110k sentences/audio files, the ValidateWavsStep takes a surprisingly long time to run and gives the user no feedback that it is doing anything. Noticed by @roedoejet while testing #464
Agreed with AP: we'll sample 100 wav files randomly to check.
The main use case for this feature is catching the case where you gave the parent or child directory of the intended directory, and that will be caught by the very first file checked.
A second use case is when you added some data to the file list and forgot to add it to your wavs dir. If you're missing >10% of the wav files, a 100-file sample will find at least one missing file with very high probability: the chance that 100 random picks all land on present files is at most about 0.9^100 ≈ 3×10^-5. If you're missing <1% of the wav files, we probably don't actually care; you'll just get a warning in the preprocessing logs, should you happen to look at them.
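To put numbers on the sampling argument above, here is a small sketch that computes the exact probability that a uniform random sample drawn without replacement contains at least one missing file (a hypergeometric calculation). The function name `detection_probability` is hypothetical and not part of EveryVoice; it only illustrates the arithmetic behind choosing a sample size.

```python
def detection_probability(n_files: int, n_missing: int, sample_size: int) -> float:
    """Probability that a uniform random sample, drawn without replacement,
    of `sample_size` out of `n_files` files includes at least one of the
    `n_missing` missing files.

    Hypothetical helper for illustration only.
    """
    k = min(sample_size, n_files)
    # P(no missing file in the sample) = prod_{i=0}^{k-1} (N - m - i) / (N - i)
    p_none = 1.0
    for i in range(k):
        p_none *= (n_files - n_missing - i) / (n_files - i)
        if p_none == 0.0:  # the sample is forced to hit a missing file
            break
    return 1.0 - p_none


if __name__ == "__main__":
    n = 110_000  # size of the French corpus mentioned in the issue
    for frac in (0.10, 0.01):
        for k in (100, 1000):
            p = detection_probability(n, int(n * frac), k)
            print(f"missing {frac:.0%}, sample {k}: P(detect) = {p:.5f}")
```

With 10% missing, a 100-file sample detects the problem with probability above 0.9999; with 1% missing it is only roughly 0.63, while a 1000-file sample pushes that case above 0.9999 as well.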
I'll run some tests and increase 100 to something bigger if it's fast enough: my goal is for the validation delay to be barely noticeable while still making it likely that we catch the problems we care about.
For a large corpus, e.g., our 110k French sentence corpus, checking for the
presence of all audio files takes a long time and is pointless. So check only a
sample of 1000 when there are more than 1000.
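The behaviour described in the commit message can be sketched as follows. This is not the actual ValidateWavsStep code; the function `find_missing_wavs`, its parameters, and the `<basename>.wav` naming convention are assumptions made for illustration. It checks every file for small corpora and a random sample of `sample_size` files otherwise.

```python
import random
from pathlib import Path
from typing import List, Optional, Sequence


def find_missing_wavs(
    wav_dir: Path,
    basenames: Sequence[str],
    sample_size: int = 1000,
    seed: Optional[int] = None,
) -> List[str]:
    """Return the checked basenames whose .wav file is absent from wav_dir.

    Hypothetical helper: when there are more than `sample_size` entries,
    only a uniform random sample of that many is checked.
    """
    if len(basenames) > sample_size:
        rng = random.Random(seed)  # seedable for reproducible tests
        to_check = rng.sample(list(basenames), sample_size)
    else:
        to_check = list(basenames)
    # Assumed convention: each entry maps to <wav_dir>/<basename>.wav
    return [name for name in to_check if not (wav_dir / f"{name}.wav").exists()]
```

If `wav_dir` points at the parent or child of the intended directory, every checked entry comes back missing, which covers the main use case; a partial result corresponds to the "forgot to copy some new data" case and could be reported as a warning.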
Fixes #466