[WIP]: Implement token level shallow fusion #609
Conversation
@csukuangfj Looks very promising. Ping me if you need an extra hand.
Thanks! I will draft a version without batch size support. If it gives promising results, we need your help to implement a version that supports batches.
@csukuangfj Do you have any update on this issue? I am very eager to try it out!
Yes. But the results are not good so far. I will post them tonight. |
Steps for reproducing the following results:

```shell
cd egs/librispeech/ASR
git lfs install
git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03
mkdir tmp3-3
cd tmp3-3
ln -s $PWD/../icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/pretrained-iter-468000-avg-16.pt epoch-99.pt
cd ..
./generate-lm.sh

for lm_scale in 0.01 0.2 0.4; do
  ./lstm_transducer_stateless2/decode.py \
    --epoch 99 \
    --avg 1 \
    --use-averaged-model 0 \
    --exp-dir ./tmp3-3 \
    --max-duration 600 \
    --num-encoder-layers 12 \
    --rnn-hidden-size 1024 \
    --decoding-method modified_beam_search2 \
    --beam 8 \
    --max-contexts 4 \
    --ngram-lm-scale $lm_scale
done
```

You will find the results inside the `--exp-dir` directory (`tmp3-3`).
I am using a …. I will recheck the code in case it contains some bugs.
@csukuangfj Thanks!
I expect that unless there is some kind of domain mismatch, we will not see much or any improvement. (Unless we try super-large LMs. I seem to remember Liyong had some experiment with a 5-gram or something like that?) |
I think Liyong was using fast_beam_search + (L, or LG) in #472. We have never tried to use a token-level G with modified beam search, I think.
My 2 cents is that we need a very large LM (like a 5-gram). I will try it tomorrow and let you know.
@glynpu Liyong did try using a token-level G with beam search; he did not make a PR, though. The results are in our weekly meeting notes (the 20th week). The results show that we cannot get an improvement from a pruned LM.
The results came from a word-level LM.
@csukuangfj Quick update: I am still doing some tests and a more thorough review of the code.
Ngram: 5
Thanks! Are you using …?
I am using ….
@csukuangfj
I think the main use-case of this is when there is a domain mismatch from the training corpus to the target domain. |
Sorry for the late reply. I thought I had replied last night. I think a 7-gram is more than enough. Thanks for your experiments. The results show that the code works with an n-gram LM, though.
I agree; I think a 5-gram is enough. I was thinking of using it for detecting OOV words. I will let you know once I have more results (unless you have something else in mind).
By the way, @marcoyang1998 is using the RNN-LM model that you provided for conformer CTC for shallow fusion.
Sounds interesting! If I am not mistaken, we can't add new words on the fly to an already trained RNN-LM, can we?
The RNN-LM is at the token level, so as long as the new word can be represented by the BPE tokens, it can be rescored by the RNN-LM, I think.
Indeed, but we can't "boost" specific words (or combinations of specific tokens).
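As a toy illustration of why a token-level model can cover OOV words (this is not code from the PR; `segment` and the toy `vocab` are hypothetical): as long as the subword vocabulary includes single characters as a fallback, any new word can be decomposed into known tokens.

```python
# Toy greedy BPE-style segmentation. Because the vocabulary contains
# every single character as a fallback unit, no word is truly OOV at
# the token level.
def segment(word, vocab):
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # longest match first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"cannot cover {word[i]!r}")
    return pieces

vocab = {"trans", "duc", "er"} | set("abcdefghijklmnopqrstuvwxyz")
print(segment("transducer", vocab))  # ['trans', 'duc', 'er']
print(segment("zyzzyva", vocab))     # falls back to single characters
```

A real BPE model (e.g. SentencePiece) uses learned merges rather than greedy longest-match, but the coverage argument is the same.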
Yes, you are right. That is why we are trying to integrate FST into decoding. |
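A minimal sketch of what token-level boosting could look like (hypothetical; `BiasTrie` is an illustrative stand-in, not the FST integration being discussed): boosted token sequences are stored in a trie, and a hypothesis earns a bonus for each decoding step that stays on a boosted path.

```python
# Hypothetical sketch: a trie over token sequences as a stand-in for a
# biasing FST. Each matched token earns a fixed bonus; falling off a
# path resets the state. (A real biasing FST would also use failure
# arcs to subtract the bonus accumulated by a partial match.)
class BiasTrie:
    def __init__(self, sequences, bonus=2.0):
        self.root = {}
        self.bonus = bonus
        for seq in sequences:
            node = self.root
            for tok in seq:
                node = node.setdefault(tok, {})

    def step(self, state, token):
        """Extend a hypothesis by one token; return (new_state, bonus)."""
        node = state if state is not None else self.root
        if token in node:
            return node[token], self.bonus  # still on a boosted path
        return None, 0.0                    # off the path: reset, no bonus

trie = BiasTrie([["ice", "fall"], ["open", "fst"]])
state, s1 = trie.step(None, "ice")    # on a boosted path
state, s2 = trie.step(state, "fall")  # completes "ice fall"
_, s3 = trie.step(None, "hello")      # not boosted
```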
@csukuangfj I have a batch version (à la modified_beam_search). I took your commits and added mine on top (with a rebase). I will create a new PR if that's OK.
Yes, thanks! I will close this PR once you create a new PR. |
See #630 |
We have been trying to use a word-level G and an LG for RNN-T decoding, but only with fast_beam_search. However, a word-level G or an LG cannot handle OOV words.
This PR tries to use a token-level G for shallow fusion with modified_beam_search. I am using OpenFst to manipulate the n-gram G on the CPU, as it is easier to implement.
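For readers unfamiliar with shallow fusion, the per-step scoring can be sketched as follows. This is an illustrative sketch only: the bigram table `BIGRAM_LOGP` and the function `fused_score` are hypothetical stand-ins for the OpenFst-backed token-level n-gram G used in this PR.

```python
import math

# Hypothetical stand-in for a token-level n-gram G: a tiny bigram LM
# over token ids, stored as log-probabilities.
BIGRAM_LOGP = {
    (0, 1): math.log(0.6), (0, 2): math.log(0.4),
    (1, 1): math.log(0.1), (1, 2): math.log(0.9),
    (2, 1): math.log(0.5), (2, 2): math.log(0.5),
}

def fused_score(am_logp, prev_token, next_token, lm_scale):
    """Shallow fusion for one expansion step: the transducer's log-prob
    for a candidate token plus the scaled LM log-prob of that token."""
    lm_logp = BIGRAM_LOGP.get((prev_token, next_token), math.log(1e-10))
    return am_logp + lm_scale * lm_logp

# In modified_beam_search, every candidate token of every hypothesis
# would be scored this way before pruning (cf. --ngram-lm-scale).
am_logps = {1: -0.5, 2: -1.2}  # per-token acoustic log-probs
scores = {t: fused_score(am_logps[t], 0, t, lm_scale=0.2) for t in am_logps}
best = max(scores, key=scores.get)
```

With a small `lm_scale`, the acoustic score dominates and the LM only nudges the ranking, which is why the reproduction script above sweeps several scale values.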