Add recipe for the yes_no dataset. #16
The code for selecting the training set and test set is:
wave_files = list(corpus_dir.glob("*.wav"))
assert len(wave_files) == 60
wave_files.sort()
train_set = wave_files[::2]
test_set = wave_files[1::2]
assert len(train_set) == 30
assert len(test_set) == 30
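
For readers who want to try the split outside the recipe, here is a minimal self-contained sketch of the same even/odd selection. The function name `split_yesno` is hypothetical and not part of this PR; only the slicing logic comes from the snippet above.

```python
from pathlib import Path
from typing import List, Tuple


def split_yesno(corpus_dir: Path) -> Tuple[List[Path], List[Path]]:
    """Split the 60 yesno recordings into interleaved train and test halves.

    Hypothetical helper for illustration; it mirrors the slicing quoted above.
    """
    # Sorting makes the split deterministic regardless of filesystem order.
    wave_files = sorted(corpus_dir.glob("*.wav"))
    assert len(wave_files) == 60, f"expected 60 wave files, got {len(wave_files)}"

    train_set = wave_files[::2]   # files at even positions: 0, 2, 4, ...
    test_set = wave_files[1::2]   # files at odd positions: 1, 3, 5, ...
    return train_set, test_set
```

Because the two slices interleave, the sets are disjoint and each holds exactly 30 files.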
Cool!

first_token_disambig_id = lexicon.token_table["#0"]
first_word_disambig_id = lexicon.word_table["#0"]
Do we need to make the following k2 operations run on GPU if there are devices available?
For the yesno dataset, the graphs are tiny. It's ok to run them on CPU.
For the librispeech dataset, I think it's worthwhile to have some benchmarks. If GPU is faster, we can switch to it.

"""
This file computes fbank features of the yesno dataset.
Its looks for manifests in the directory data/manifests.
Its -> It ?

shuffle=self.args.shuffle,
num_buckets=self.args.num_buckets,
bucket_method="equal_duration",
drop_last=True,
Do we need to make these two arguments configurable?
Yes, will make it configurable.
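
One possible shape for this, as a rough sketch only: the two hard-coded values (assumed here to be `bucket_method` and `drop_last`) could become command-line options. The flag names `--bucket-method` and `--drop-last`, and the helpers `str2bool` and `add_sampler_arguments`, are hypothetical and may not match what the recipe ends up using.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse common true/false spellings given on the command line."""
    lowered = value.lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def add_sampler_arguments(parser: argparse.ArgumentParser) -> None:
    # Hypothetical flags; the defaults mirror the hard-coded values under review.
    parser.add_argument(
        "--bucket-method",
        type=str,
        default="equal_duration",
        help="How the bucketing sampler groups cuts.",
    )
    parser.add_argument(
        "--drop-last",
        type=str2bool,
        default=True,
        help="Whether to drop the last incomplete batch.",
    )


parser = argparse.ArgumentParser()
add_sampler_arguments(parser)
# Explicit argument list so the sketch runs without a real command line.
args = parser.parse_args(["--drop-last", "false"])
```

The sampler call could then read `bucket_method=self.args.bucket_method` and `drop_last=self.args.drop_last`, in the same way `shuffle` and `num_buckets` are already passed.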
I just wrote a Colab notebook that runs the yesno recipe on CPU. Training for 50 epochs takes less than 2 minutes. You will see the following in the above Colab notebook:
Part of the training log is given below:
The decoding log is:
Could you have a look at the above Colab notebook about the installation of lhotse? The [EDITED]: If I don't, it throws the following error while running
Yes, I’ll have a look tomorrow.
There are 60 sound files in the dataset. 30 sound files are used for training and the other 30 files are used for testing.
The decoding log is below:
You see there is only 1 deletion error.
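
To make the error categories concrete, the following is a minimal sketch of how insertions, deletions, and substitutions can be counted with a standard edit-distance alignment. It is not the scoring code used by the recipe; the function name `count_errors` and the example word sequences are made up for illustration.

```python
from typing import List, Tuple


def count_errors(ref: List[str], hyp: List[str]) -> Tuple[int, int, int]:
    """Return (insertions, deletions, substitutions) for a minimum-edit alignment."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # cost[i][j] = (total, ins, dels, subs) for aligning ref[:i] with hyp[:j].
    cost = [[(0, 0, 0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, cols):
        cost[0][j] = (j, j, 0, 0)  # every hypothesis word inserted

    for i in range(1, rows):
        for j in range(1, cols):
            total, ins, dels, subs = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (total, ins, dels, subs)  # match, no cost
            else:
                diagonal = (total + 1, ins, dels, subs + 1)
            total, ins, dels, subs = cost[i - 1][j]
            deletion = (total + 1, ins, dels + 1, subs)
            total, ins, dels, subs = cost[i][j - 1]
            insertion = (total + 1, ins + 1, dels, subs)
            # Tuples compare by total first, so min() picks a cheapest path.
            cost[i][j] = min(diagonal, deletion, insertion)

    _, ins, dels, subs = cost[-1][-1]
    return ins, dels, subs


# Made-up example: the hypothesis is missing one "NO" from the reference.
ref_words = "NO NO YES YES".split()
hyp_words = "NO YES YES".split()
ins, dels, subs = count_errors(ref_words, hyp_words)
wer = (ins + dels + subs) / len(ref_words)
```

A deletion means a reference word has no counterpart in the decoded output, which is the kind of error reported in the log above.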
The dataset is so small that it can run on the CPU.
It is useful for education and demonstration purposes as it involves almost all concepts used in the training and decoding, i.e.,
(It does not contain LM rescoring)
Requires lhotse-speech/lhotse#380
--
TODOs:
Use a Colab notebook to run it. See