
Is GOPT designed to take only complete words and sentences? What about a phoneme? #5

TheoSeo93 opened this issue Aug 24, 2022 · 1 comment

TheoSeo93 commented Aug 24, 2022

Hi again :) Thanks to your response, I was able to replicate the whole pronunciation assessment process.
I became curious whether GOPT can score a single phoneme's accuracy on its own, without being given a complete word or sentence. I understand that it can take a sentence or a word and produce accuracy scores all the way down to the phoneme level, but can it still produce a phoneme-level accuracy score if I manually pass a text like "A" along with its correct phonetic transcription "AH"?
I wasn't sure whether it is designed to take only complete words or sentences, or whether it can also take a single phoneme on its own.

Thank you very much
Best regards
Theo Seo

TheoSeo93 changed the title from "Is GOPT designed to take only complete words or sentences?" to "Is GOPT designed to take only complete words and sentences? What about a phoneme?" on Aug 24, 2022

YuanGongND commented Aug 24, 2022

Yes - I think the main reason GOPT outperforms the baseline is that it takes context into consideration, i.e., it takes an input sequence longer than a single phone. The input sequence, however, doesn't have to be a full sentence; it can be a word or a phrase.
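For concreteness, a word-length input can be padded to the model's fixed sequence length in the same way a full sentence is. The snippet below is only a minimal sketch: the 84-dim per-phone GOP features, the maximum length of 50 phones, the -1 padding value, and the `model(x, phn)` call are assumptions to verify against this repo's dataloader and `models/gopt.py`, and all feature and phone-ID values are dummies.

```python
import torch

# Assumptions (double-check against this repo's dataloader and models/gopt.py):
#   - each phone is represented by an 84-dim Kaldi GOP feature vector
#   - sequences are padded to a fixed maximum length of 50 phones with -1
#   - `model` is an already-constructed/loaded GOPT instance called as model(x, phn)
feat_dim, max_len = 84, 50

# A single word instead of a full sentence, e.g. "apple" -> AE P AH L (4 phones).
num_phones = 4
word_feats = torch.randn(1, num_phones, feat_dim)               # dummy GOP features, (batch, phones, feat_dim)
canonical_phn = torch.tensor([[3, 28, 5, 20]], dtype=torch.float)  # canonical phone IDs (illustrative values only)

# Pad both tensors to the fixed input length, exactly as would be done for a full sentence.
x = torch.full((1, max_len, feat_dim), -1.0)
x[:, :num_phones, :] = word_feats
phn = torch.full((1, max_len), -1.0)
phn[:, :num_phones] = canonical_phn

# Hypothetical forward pass; the phone-level scores at the first `num_phones`
# positions would be the word's per-phone accuracy predictions:
# with torch.no_grad():
#     outputs = model(x, phn)
```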

If you are interested in context-independent single-phone classification, that is our baseline; it is implemented in the original Kaldi GOP recipe, which I believe is an implementation of the following paper.

Hu, W., Qian, Y., Soong, F. K., & Wang, Y. (2015). Improved mispronunciation detection with deep neural network trained acoustic models and transfer learning based logistic regression classifiers. Speech Communication, 67(January), 154-166.

-Yuan

YuanGongND added the "question" (Further information is requested) label on Aug 24, 2022