running with public dataset #2

Open
matthijsvk opened this issue Jan 27, 2021 · 4 comments

Comments

@matthijsvk

Hi,
I saw you contributed the CTC loss function in [https://github.com/FluxML/Flux.jl/pull/1287]. Thanks for all that work :).
There you mentioned you had an example with a publicly available speech corpus.
Which one was that and would you be willing to upload the code for that?

@maetshju
Owner

It was a subset of the Massive Auditory Lexical Decision (MALD) database, which the lab I work in released in 2019. The full data set contains, I think, over 3 hours of isolated English words recorded by a single young male speaker, plus nearly 10,000 recorded fake English words. During testing I switched from it to TIMIT, because I had no a priori idea of what accuracy to expect on this data set, and so no way to tell from the accuracy alone whether the CTC loss function was working correctly.

The code is actually already here in this repo. The 00-data.jl script should download and extract the subset I used, which was 10,000 (about a third) of the real English words. The MFCCs have already been extracted, but the labels are given in onehot matrix form because I had originally used this subset for cross entropy. The model code in 01-model.jl is a bit outdated and expects the onehot matrix, whereas the final committed version of the CTC loss function expects a onecold vector.
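To make the onehot-versus-onecold distinction concrete, here is a minimal Python sketch; the repo itself is Julia/Flux, so this is only an illustration, not the repo's code. It assumes the label matrix is laid out classes × frames (one column per frame, as Flux lays out batches) and that each column is a per-frame target left over from the cross-entropy setup. Under that assumption, taking the onecold of each column gives a frame-level index sequence, and merging consecutive repeats gives a label sequence of the kind a CTC loss consumes. `onecold` below only mirrors the behavior of Flux's `onecold` (but 0-based); `collapse_repeats` and the example matrix are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence


def onecold(onehot: Sequence[Sequence[float]]) -> List[int]:
    """Return, for each column, the row index of its largest entry.

    Assumes a classes x frames layout (one column per frame). Indices are
    0-based here, whereas Flux's onecold returns 1-based indices by default.
    """
    num_classes = len(onehot)
    num_frames = len(onehot[0]) if num_classes else 0
    labels = []
    for t in range(num_frames):
        # argmax down column t; ties resolve to the lowest class index
        labels.append(max(range(num_classes), key=lambda c: onehot[c][t]))
    return labels


def collapse_repeats(frame_labels: Sequence[int]) -> List[int]:
    """Merge each run of identical frame labels into a single label.

    Assumption: the onehot columns are per-frame targets, so one unit that
    spans several frames shows up as a run of the same index.
    """
    return [label for label, _ in groupby(frame_labels)]


# Hypothetical 3-class, 5-frame example (rows = classes, columns = frames).
frames_onehot = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1],
]
frame_labels = onecold(frames_onehot)             # [0, 0, 1, 1, 2]
target_sequence = collapse_repeats(frame_labels)  # [0, 1, 2]
```

Note that merging repeats also merges a genuinely doubled unit (two identical adjacent labels), so this is a rough conversion; targets re-extracted specifically for CTC would not have that ambiguity.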

If you are interested in the full data set, it is available here. The transcriptions are given as TextGrid files for use with the Praat program. If you don't already have a library to process the files and extract the transcriptions, you may want to use the textgrid Python library (https://pypi.org/project/TextGrid/).
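The textgrid library linked above is the suggested route. For anyone who only needs the interval labels and would rather avoid a dependency, the following is a minimal, hypothetical sketch (not part of the textgrid library or this repo) that pulls intervals out of a TextGrid saved in Praat's long text format. It assumes the file has already been read into a string, and it does not handle Praat's short text or binary formats; `read_interval_tiers` and the sample tier are made up for illustration.

```python
import re
from typing import Dict, List, Tuple

# (start time in seconds, end time in seconds, label text)
Interval = Tuple[float, float, str]

_NUMBER = r"(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
_INTERVAL_RE = re.compile(
    r"intervals \[\d+\]:\s*"
    rf"xmin = {_NUMBER}\s*"
    rf"xmax = {_NUMBER}\s*"
    r'text = "((?:[^"]|"")*)"'  # Praat escapes a literal quote as ""
)
_NAME_RE = re.compile(r'name = "((?:[^"]|"")*)"')


def read_interval_tiers(source: str) -> Dict[str, List[Interval]]:
    """Collect the intervals of every IntervalTier in a long-format TextGrid."""
    tiers: Dict[str, List[Interval]] = {}
    # Each tier starts with "item [N]:"; chunk 0 is the file header.
    for chunk in re.split(r"item \[\d+\]:", source)[1:]:
        if 'class = "IntervalTier"' not in chunk:
            continue  # skip point tiers (TextTier)
        name_match = _NAME_RE.search(chunk)
        name = name_match.group(1).replace('""', '"') if name_match else ""
        tiers[name] = [
            (float(xmin), float(xmax), text.replace('""', '"'))
            for xmin, xmax, text in _INTERVAL_RE.findall(chunk)
        ]
    return tiers


# Hypothetical single-tier TextGrid in the long text format.
sample = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 1.5
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "phones"
        xmin = 0
        xmax = 1.5
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.7
            text = "K"
        intervals [2]:
            xmin = 0.7
            xmax = 1.5
            text = "AE1"
'''

tiers = read_interval_tiers(sample)
```

Regular expressions keep the sketch short but are brittle against format variations, which is a good reason to prefer a maintained parser such as textgrid for real use.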

At some point, I may try to update the code and possibly submit it to the Flux model zoo, but our semester started recently, so I'm low on spare time for a while. Let me know if you have any questions though!

@maetshju
Owner

maetshju commented Jan 27, 2021

Oh, the actual model file is missing. Let me see if I can track it down; I'll try to update the code now anyway.

@maetshju
Owner

@matthijsvk I have been able to re-create the code I was using for these demos and put it here. The data set is a bit funky for CTC because its labels are onehot encoded. I am planning to make something for the model zoo, for which I will re-extract the input and output values so they better suit a CTC-style recognition system. Hopefully the code in this repo is still somewhat helpful until then.

@matthijsvk
Author

Great, thanks!
I think it would be awesome to have an easy-to-reproduce example of a real-life application, like speech recognition with RNNs, in Julia. The model zoo contains mostly toy examples so far.
