
About recognized text based on HCLr.fst, Gr.fst #1661

Open
donaldos opened this issue Nov 20, 2024 · 4 comments

donaldos commented Nov 20, 2024

Dear Nickolay V. Shmyrev,

I have been using the Vosk API to generate and recognize dynamic grammars successfully.
In particular, I have been testing extensively with a customized engine configuration, generating HCLr.fst and Gr.fst with compile-graph.sh. I originally generated those files from the final.mdl of vosk-model-small-en-us-0.15.
To upgrade the acoustic model, I then used the final.mdl of vosk-model-en-us-0.22 to generate new HCLr.fst and Gr.fst files.

INFO (2024-11-20 09:25:06,800:main): [EnumaeduSREngineFile.py:57] - proc_sr() - ['sit down please', '[unk]']
LOG (VoskAPI:UpdateGrammarFst():recognizer.cc:287) ['sit down please', '[unk]']
LOG (VoskAPI:Estimate():language_model.cc:142) Estimating language model with ngram-order=2, discount=0.5
LOG (VoskAPI:OutputToFst():language_model.cc:209) Created language model with 5 states and 10 arcs.
As you can see in the log, when the phrase 'sit down please' is uttered, only [{'start': 0.570133, 'end': 1.409742, 'word': 'down', 'conf': 1.0}] is recognized.
However, with the full example model vosk-model-en-us-0.22-lgraph, this problem does not occur.

What could be the cause of this, and is there a methodology to validate it?

Also, which directory should I copy the resources from: "exp>tdnn>lgraph" or "exp>tdnn>lgraph_orig"?
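For reference, here is a minimal sketch of how a dynamic grammar like the one in the log above is passed to the recognizer through the Python bindings. The model directory path is a placeholder assumption, and the snippet only constructs the recognizer when that directory exists; it is not the exact code used in this issue.

```python
import json
import os

# Grammar phrases from the log above; "[unk]" lets out-of-grammar audio
# decode to an unknown-word token instead of forcing a grammar phrase.
grammar = json.dumps(["sit down please", "[unk]"])

MODEL_DIR = "model"  # hypothetical path to the compiled model directory

if os.path.isdir(MODEL_DIR):
    from vosk import Model, KaldiRecognizer  # pip install vosk

    model = Model(MODEL_DIR)
    # Passing the grammar JSON as the third argument restricts decoding
    # to the listed phrases (this requires a model with HCLr.fst/Gr.fst
    # or an -lgraph model that supports runtime graph building).
    rec = KaldiRecognizer(model, 16000.0, grammar)
    # Recent vosk versions can also swap the grammar at runtime, which is
    # what produces the UpdateGrammarFst log line:
    # rec.SetGrammar(grammar)
    # Then feed 16 kHz 16-bit mono PCM via rec.AcceptWaveform(data)
    # and read rec.Result() / rec.FinalResult().
```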

@donaldos (Author)

Directory structure

.
├── am
│   ├── final.mdl
│   └── tree
├── conf
│   ├── mfcc.conf
│   └── model.conf
├── graph
│   ├── Gr.fst
│   ├── HCLr.fst
│   ├── disambig_tid.int
│   ├── phones
│   │   ├── align_lexicon.int
│   │   ├── align_lexicon.txt
│   │   ├── disambig.int
│   │   ├── disambig.txt
│   │   ├── optional_silence.csl
│   │   ├── optional_silence.int
│   │   ├── optional_silence.txt
│   │   ├── silence.csl
│   │   ├── word_boundary.int
│   │   └── word_boundary.txt
│   ├── phones.txt
│   └── words.txt
└── ivector
    ├── final.dubm
    ├── final.ie
    ├── final.mat
    ├── global_cmvn.stats
    ├── online_cmvn.conf
    └── splice.conf
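One quick way to validate a hand-assembled model directory is to check that the files in the listing above are actually present before loading the model. The list below is assembled from that tree; it is a sanity-check sketch, not an official Vosk manifest.

```python
import os

# Files taken from the directory listing above (a compiled-graph Vosk model).
EXPECTED = [
    "am/final.mdl",
    "am/tree",
    "conf/mfcc.conf",
    "conf/model.conf",
    "graph/Gr.fst",
    "graph/HCLr.fst",
    "graph/disambig_tid.int",
    "graph/phones.txt",
    "graph/words.txt",
    "graph/phones/align_lexicon.int",
    "graph/phones/word_boundary.int",
    "ivector/final.dubm",
    "ivector/final.ie",
    "ivector/final.mat",
    "ivector/global_cmvn.stats",
    "ivector/online_cmvn.conf",
    "ivector/splice.conf",
]

def missing_files(model_dir):
    """Return the expected files that are absent under model_dir."""
    return [p for p in EXPECTED
            if not os.path.isfile(os.path.join(model_dir, p))]

# Usage: print(missing_files("/path/to/model")) before loading the model;
# an empty list means the layout matches the tree above.
```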

@nshmyrev
Collaborator

Probably a pronunciation issue. Please provide an audio sample.

@donaldos
Author

donaldos commented Nov 21, 2024

Can I forward speech data and models via email?
nshmyrev@gmail.com

@nshmyrev
Collaborator

Sure
