
About recognized text based on HCLr.fst, Gr.fst #1661

Open
donaldos opened this issue Nov 20, 2024 · 4 comments

donaldos commented Nov 20, 2024

Dear Nickolay V. Shmyrev,

I have been using the Vosk API to generate and recognize dynamic grammars successfully.
In particular, I have been testing extensively with a customized engine configuration, generating HCLr.fst and Gr.fst with compile-graph.sh. I originally generated those files from the final.mdl of vosk-model-small-en-us-0.15.
To upgrade the acoustic model, I then used the final.mdl of vosk-model-en-us-0.22 to generate new HCLr.fst and Gr.fst files.

INFO (2024-11-20 09:25:06,800:main): [EnumaeduSREngineFile.py:57] - proc_sr() - ['sit down please', '[unk]']
LOG (VoskAPI:UpdateGrammarFst():recognizer.cc:287) ['sit down please', '[unk]']
LOG (VoskAPI:Estimate():language_model.cc:142) Estimating language model with ngram-order=2, discount=0.5
LOG (VoskAPI:OutputToFst():language_model.cc:209) Created language model with 5 states and 10 arcs.
As you can see in the log, when the phrase 'sit down please' is uttered, only [{'start': 0.570133, 'end': 1.409742, 'word': 'down', 'conf': 1.0}] is recognized.
However, with the full example model vosk-model-en-us-0.22-lgraph, this problem does not occur.

What could be the cause of this, and is there a methodology to validate it?

Also, which directory should I copy the resources from: "exp>tdnn>lgraph" or "exp>tdnn>lgraph_orig"?
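For reference, here is a minimal sketch of how a dynamic grammar like the one in the log above is passed to the recognizer through the Python bindings. The model directory path is a placeholder assumption, and the snippet only constructs the recognizer when that directory exists; it is not the exact code used in this issue.

```python
import json
import os

# Grammar phrases from the log above; "[unk]" lets out-of-grammar audio
# decode to an unknown-word token instead of forcing a grammar phrase.
grammar = json.dumps(["sit down please", "[unk]"])

MODEL_DIR = "model"  # hypothetical path to the compiled model directory

if os.path.isdir(MODEL_DIR):
    from vosk import Model, KaldiRecognizer  # pip install vosk

    model = Model(MODEL_DIR)
    # Passing the grammar JSON as the third argument restricts decoding
    # to the listed phrases (this requires a model with HCLr.fst/Gr.fst
    # or an -lgraph model that supports runtime graph building).
    rec = KaldiRecognizer(model, 16000.0, grammar)
    # Recent vosk versions can also swap the grammar at runtime, which is
    # what produces the UpdateGrammarFst log line:
    # rec.SetGrammar(grammar)
    # Then feed 16 kHz 16-bit mono PCM via rec.AcceptWaveform(data)
    # and read rec.Result() / rec.FinalResult().
```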

@donaldos (Author)

Directory structure

.
├── am
│   ├── final.mdl
│   └── tree
├── conf
│   ├── mfcc.conf
│   └── model.conf
├── graph
│   ├── Gr.fst
│   ├── HCLr.fst
│   ├── disambig_tid.int
│   ├── phones
│   │   ├── align_lexicon.int
│   │   ├── align_lexicon.txt
│   │   ├── disambig.int
│   │   ├── disambig.txt
│   │   ├── optional_silence.csl
│   │   ├── optional_silence.int
│   │   ├── optional_silence.txt
│   │   ├── silence.csl
│   │   ├── word_boundary.int
│   │   └── word_boundary.txt
│   ├── phones.txt
│   └── words.txt
└── ivector
    ├── final.dubm
    ├── final.ie
    ├── final.mat
    ├── global_cmvn.stats
    ├── online_cmvn.conf
    └── splice.conf
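One quick way to validate a hand-assembled model directory is to check that the files in the listing above are actually present before loading the model. The list below is assembled from that tree; it is a sanity-check sketch, not an official Vosk manifest.

```python
import os

# Files taken from the directory listing above (a compiled-graph Vosk model).
EXPECTED = [
    "am/final.mdl",
    "am/tree",
    "conf/mfcc.conf",
    "conf/model.conf",
    "graph/Gr.fst",
    "graph/HCLr.fst",
    "graph/disambig_tid.int",
    "graph/phones.txt",
    "graph/words.txt",
    "graph/phones/align_lexicon.int",
    "graph/phones/word_boundary.int",
    "ivector/final.dubm",
    "ivector/final.ie",
    "ivector/final.mat",
    "ivector/global_cmvn.stats",
    "ivector/online_cmvn.conf",
    "ivector/splice.conf",
]

def missing_files(model_dir):
    """Return the expected files that are absent under model_dir."""
    return [p for p in EXPECTED
            if not os.path.isfile(os.path.join(model_dir, p))]

# Usage: print(missing_files("/path/to/model")) before loading the model;
# an empty list means the layout matches the tree above.
```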

@nshmyrev
Collaborator

Probably a pronunciation issue. Please provide an audio sample.

@donaldos
Author

donaldos commented Nov 21, 2024

Can I forward speech data and models via email?
nshmyrev@gmail.com

@nshmyrev
Collaborator

Sure
