Add Streaming Zipformer-Transducer recipe for KsponSpeech #1651
Conversation
Number of model parameters: 79,022,891, i.e., 79.02 M

##### Training on KsponSpeech (with MUSAN)
Could you upload the models to huggingface and put their links here?
Could you replace it with a symlink if it is copied from the librispeech recipe?
Fixed in ed9cc83.
Hi @whsqkaak, I am currently looking into Korean ASR datasets. Could you please share some information about commonly used Korean ASR training datasets and popular benchmarks such as CommonVoice and FLEURS? Thanks in advance.
Thank you for your contribution!
Hi, @yfyeung! Open-source Korean ASR datasets are very rare. KsponSpeech is publicly available on an open data hub site of the Korean government, but as far as I know, only Koreans can use it. The zeroth-korean dataset is free to use, but it is very small: 51.6 hours of transcribed Korean audio for training (22,263 utterances, 105 speakers, 3,000 sentences) and 1.2 hours for testing (457 utterances, 10 speakers). You can also find some Korean ASR data in Common Voice and FLEURS.
@whsqkaak Only a streaming model? Can we have a non-streaming model too?
@whsqkaak A Korean non-streaming zipformer2 model like this one (Thai) is what I want.
We all need this! Open-sourcing it would be very welcome.
Junguo, you never reply to me on QQ, yet you reply this fast on GitHub.
KsponSpeech is a large-scale spontaneous speech corpus of Korean.
This corpus contains 969 hours of open-domain dialog utterances,
spoken by about 2,000 native Korean speakers in a clean environment.
All data were constructed by recording the dialogue of two people
freely conversing on a variety of topics and manually transcribing the utterances.
The corpus provides dual transcriptions, orthographic and phonetic,
along with disfluency tags for spontaneous speech phenomena such as filler words, repeated words, and word fragments.
The original audio files are raw PCM (.pcm extension).
During preprocessing, they are converted to FLAC and saved as new files.
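As an illustration of this preprocessing step, here is a minimal sketch (a hypothetical helper, not the recipe's actual code) that decodes headerless 16-bit little-endian PCM into integer samples and shells out to ffmpeg for the FLAC encoding; the 16 kHz mono format is an assumption.

```python
import struct
import subprocess
from pathlib import Path


def read_pcm16le(path):
    """Decode headerless 16-bit little-endian PCM into a list of int samples."""
    data = Path(path).read_bytes()
    n = len(data) // 2  # two bytes per sample
    return list(struct.unpack("<%dh" % n, data[: n * 2]))


def pcm_to_flac(pcm_path, flac_path, sample_rate=16000):
    """Convert raw PCM to FLAC via ffmpeg (assumes ffmpeg is installed)."""
    subprocess.run(
        ["ffmpeg", "-y", "-f", "s16le", "-ar", str(sample_rate), "-ac", "1",
         "-i", str(pcm_path), str(flac_path)],
        check=True,
    )
```

In practice a recipe would batch this over the whole corpus; the sketch only shows the per-file conversion.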
KsponSpeech is publicly available on an open data hub site of the Korea government.
The dataset must be downloaded manually.
For more details, please visit:
Streaming Zipformer-Transducer (Pruned Stateless Transducer + Streaming Zipformer)
Number of model parameters: 79,022,891, i.e., 79.02 M
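Parameter counts like the 79.02 M above are typically obtained by summing the element counts of all parameter tensors. A framework-agnostic sketch (the layer shapes below are purely illustrative, not the actual Zipformer configuration):

```python
from math import prod


def count_parameters(shapes):
    """Sum the element counts over a collection of parameter-tensor shapes."""
    return sum(prod(s) for s in shapes)


# Hypothetical shapes: a 512x80 input projection with bias, and a
# 2048x512 feed-forward layer with bias.
shapes = [(512, 80), (512,), (2048, 512), (2048,)]
print(count_parameters(shapes))  # 40960 + 512 + 1048576 + 2048 = 1092096
```

In a framework like PyTorch the same idea is one line over `model.parameters()`.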
Training on KsponSpeech (with MUSAN)
The CERs are:
Note: `simulated streaming` indicates feeding the full utterance during decoding using `decode.py`, while `chunk-size` indicates feeding a certain number of frames at a time using `streaming_decode.py`.
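Concretely, the two decoding modes might be invoked as follows. This is only a sketch: the script paths, epoch/avg values, and flags are assumed to follow the librispeech streaming-zipformer recipe and may differ here; check each script's `--help` for the exact options.

```shell
# Simulated streaming: the full utterance is fed to decode.py.
./pruned_transducer_stateless7_streaming/decode.py \
  --epoch 30 --avg 9 \
  --exp-dir pruned_transducer_stateless7_streaming/exp \
  --decoding-method greedy_search

# True streaming: frames are fed chunk by chunk to streaming_decode.py.
./pruned_transducer_stateless7_streaming/streaming_decode.py \
  --epoch 30 --avg 9 \
  --exp-dir pruned_transducer_stateless7_streaming/exp \
  --decode-chunk-len 32 \
  --decoding-method greedy_search
```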