[recipe] LibriSpeech zipformer_ctc #941
Conversation
@desh2608 Not sure, but it looks like your G is a token-level n-gram while rescore_with_whole_lattice expects a word-level n-gram. Could that be the issue?

Ahh, of course. I should pass words.txt for the symbol table. Thanks!

Update: Actually, looking at my command history, I see that I did use words.txt (not tokens.txt) to create G.fst.txt.
It turns out that I had the wrong G.pt in my lang directory, so the correct G_4_gram.fst.txt was not being used. Here are the steps in case someone is interested.
```bash
python -m kaldilm --read-symbol-table="data/lang_bpe_500/words.txt" --disambig-symbol="#0" --max-order=3 data/lm/3-gram.pruned.1e-7.arpa > data/lm/G_3_gram.fst.txt
python -m kaldilm --read-symbol-table="data/lang_bpe_500/words.txt" --disambig-symbol="#0" --max-order=4 data/lm/4-gram.arpa > data/lm/G_4_gram.fst.txt
```

Now run the decoding script again.
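For context, decode.py in icefall recipes typically converts these text FSTs into cached .pt files on first use, which is why a stale G.pt can silently pull in the wrong LM. Below is a minimal sketch of that pattern; the file names and caching location are assumptions, not necessarily what this recipe does.

```python
# Hedged sketch of the usual icefall caching pattern for G; paths and file
# names are assumptions and may differ from this recipe.
import os
import k2
import torch

lm_dir = "data/lm"
cache_path = os.path.join(lm_dir, "G_4_gram.pt")

if os.path.exists(cache_path):
    # A cached FSA is reused on subsequent runs; if you regenerate
    # G_4_gram.fst.txt, delete this file or the old LM keeps being used.
    G = k2.Fsa.from_dict(torch.load(cache_path, map_location="cpu"))
else:
    with open(os.path.join(lm_dir, "G_4_gram.fst.txt")) as f:
        # The text FST from kaldilm uses word IDs from words.txt,
        # i.e. it is a word-level LM.
        G = k2.Fsa.from_openfst(f.read(), acceptor=False)
    torch.save(G.as_dict(), cache_path)
```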
@csukuangfj please review when you have some time.
| decoding method | test-clean | test-other | comment            |
|-----------------|------------|------------|--------------------|
| ctc-decoding    | 2.50       | 5.86       | --epoch 30 --avg 9 |
Could you also post the result for HLG decoding, i.e., one-best decoding?
I am getting the following WERs for 1best:
| decoding method | test-clean | test-other | comment            |
|-----------------|------------|------------|--------------------|
| 1best           | 2.01       | 4.61       | --epoch 30 --avg 9 |
This seems much better than other decoding methods. Is it expected?
I think it is strange that 1best (HLG) is better than whole-lattice-rescoring (HLG + 4-gram G).
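As background for why this comparison is sensitive to tuning: whole-lattice rescoring intersects the HLG lattice with the word-level 4-gram G over a sweep of LM scales, so a poorly chosen scale (or a wrong G, as earlier in the thread) can make it lose to plain 1best. A rough sketch, assuming the usual icefall.decode helper and that the lattice and G come from HLG decoding and the 4-gram LM:

```python
# Hedged sketch of whole-lattice rescoring via the icefall.decode helper;
# the lattice and G inputs are assumed to exist from earlier decoding steps.
from typing import Dict, List

import k2
from icefall.decode import rescore_with_whole_lattice


def rescore(lattice: k2.Fsa, G: k2.Fsa, lm_scales: List[float]) -> Dict[str, k2.Fsa]:
    # `G` is expected to be the word-level 4-gram with epsilon self-loops added.
    return rescore_with_whole_lattice(
        lattice=lattice,
        G_with_epsilon_loops=G,
        lm_scale_list=lm_scales,
    )


# Example usage: sweep LM scales from 0.1 to 1.0 and pick the best on a dev set.
# best_paths = rescore(lattice, G, [0.1 * i for i in range(1, 11)])
```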
Yeah, I was thinking the same. I'll verify the numbers again.
@desh2608 It seems that you don't have a parameter to adjust the scale of the HLG decoding graph. Could you please add such a parameter, as is done here:

icefall/egs/librispeech/ASR/conformer_ctc3/decode.py (lines 250 to 254 in 05e7435):
```python
parser.add_argument(
    "--hlg-scale",
    type=float,
    default=0.8,
    help="""The scale to be applied to `hlg.scores`.""",
)
```
I tested your model and got 2.46/5.36 with hlg_scale=0.5 for 1best decoding.
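For completeness, the conformer_ctc3 recipe applies this flag roughly as in the sketch below; the path to HLG.pt is an assumption, and everything outside the `HLG.scores` line is illustrative.

```python
# Hedged sketch of applying --hlg-scale before 1best decoding, following the
# conformer_ctc3 snippet quoted above; the path to HLG.pt is an assumption.
import k2
import torch

hlg_scale = 0.5  # the value reported above to give 2.46/5.36

HLG = k2.Fsa.from_dict(torch.load("data/lang_bpe_500/HLG.pt", map_location="cpu"))
HLG.scores *= hlg_scale  # down-weight graph scores relative to the acoustic model
```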
> Yeah, I was thinking the same. I'll verify the numbers again.

Are you able to reproduce it, i.e., WER for test-clean = 2.01? @desh2608
Sorry, I did not find time to check it. Let me try to do it this week.
@MarcoYang thanks for the pointer. I'll add it.
BTW, something else that is different in this recipe compared to other LibriSpeech recipes is that I keep cuts shorter than 25 s (instead of 20 s), to avoid throwing away too much data. With the quadratic_duration option in DynamicBucketingSampler, this seems to be working fine (I could train on a V100 with batch size 800).
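For reference, a minimal sketch of what that setup might look like with lhotse; the function name and concrete parameter values are assumptions, and the recipe's actual asr_datamodule may differ.

```python
# Hedged sketch: keeping cuts up to 25 s and using quadratic_duration in
# lhotse's DynamicBucketingSampler; concrete values are illustrative.
from lhotse import CutSet
from lhotse.dataset import DynamicBucketingSampler


def build_train_sampler(cuts: CutSet) -> DynamicBucketingSampler:
    # Keep cuts up to 25 s instead of the usual 20 s so that less data is discarded.
    cuts = cuts.filter(lambda c: 1.0 <= c.duration <= 25.0)
    return DynamicBucketingSampler(
        cuts,
        max_duration=800.0,       # assumed to correspond to the "batch size 800" above (seconds of audio per batch)
        quadratic_duration=15.0,  # assumed value; penalizes long cuts quadratically
        num_buckets=30,
        shuffle=True,
        drop_last=True,
    )
```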
Address comments from @csukuangfj
Hi, looking at this conversation after the merge, were those numbers from 1best decoding then confirmed? Thanks.

Hi, you can have a look at its Hugging Face repo, where desh has the pre-trained model uploaded.
I trained a zipformer-based CTC model (with an auxiliary attention head) on LibriSpeech. Results on test-clean/test-other are given in the table earlier in this thread.
Tensorboard: https://tensorboard.dev/experiment/IjPSJjHOQFKPYA5Z0Vf8wg
Pretrained model: https://huggingface.co/desh2608/icefall-asr-librispeech-zipformer-ctc
SOLVED

I am having some trouble with the other decoding methods. I created G.fst.txt by first downloading the 4-gram.arpa.gz file, unzipping it, and then running kaldilm on it. The G.pt should get created inside decode.py, but during decoding I get an AssertionError. I am guessing I did something wrong in creating G.pt. I would appreciate it if someone could help with this.