Add inverse text normalization for non-streaming ASR #1017

Merged
csukuangfj merged 3 commits into k2-fsa:master from the itn branch on Jun 17, 2024

csukuangfj
Collaborator

Usage example

Download the test model

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-paraformer-zh-2023-03-28.tar.bz2
tar xvf sherpa-onnx-paraformer-zh-2023-03-28.tar.bz2
rm sherpa-onnx-paraformer-zh-2023-03-28.tar.bz2

Download the rule FST

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/itn_zh_number.fst

Please refer to
https://github.com/k2-fsa/colab/blob/master/sherpa-onnx/itn_zh_number.ipynb
for how itn_zh_number.fst is generated.

Download the test wave file

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/itn-zh-number.wav

Run the example

python3 ./python-api-examples/inverse-text-normalization-offline-asr.py

The output is given below:

/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/offline-recognizer-impl.cc:ApplyInverseTextNormalization:428 After inverse text normalizing: 中文数字转阿拉伯数字的测试365天
./itn-zh-number.wav
{"text": "中文数字转阿拉伯数字的测试365天", "timestamps": [], "tokens":["中", "文", "数", "字", "转", "阿", "拉", "伯", "数", "字", "的", "测", "试", "三", "百", "六", "十", "五", "天"], "words": []}
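
Note that in the result above the "text" field contains 365 while "tokens" still holds the spoken-form characters 三, 百, 六, 十, 五. The following is a minimal pure-Python sketch of that rewriting step, included only to illustrate what inverse text normalization does to this sentence. It is not how sherpa-onnx implements it (the PR applies the compiled rule in itn_zh_number.fst); the names `chinese_to_int` and `toy_itn` are hypothetical, and the sketch only handles numerals below ten thousand written with 十/百/千.

```python
import re

# Toy lookup tables; the real rule FST covers far more cases.
DIGITS = {"零": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
UNITS = {"十": 10, "百": 100, "千": 1000}

# A maximal run of digit/unit characters is treated as one numeral.
NUMERAL_RUN = re.compile("[" + "".join(DIGITS) + "".join(UNITS) + "]+")


def chinese_to_int(numeral: str) -> int:
    """Convert a simple Chinese numeral such as 三百六十五 to an int."""
    total = 0
    pending = 0  # digit waiting for its unit
    for ch in numeral:
        if ch in DIGITS:
            pending = DIGITS[ch]
        else:
            # A bare unit (e.g. the leading 十 in 十五) counts as 1 * unit.
            total += (pending or 1) * UNITS[ch]
            pending = 0
    return total + pending


def toy_itn(text: str) -> str:
    """Replace every numeral run in text with Arabic digits."""
    return NUMERAL_RUN.sub(lambda m: str(chinese_to_int(m.group())), text)


# Tokens copied from the recognizer output shown above.
tokens = ["中", "文", "数", "字", "转", "阿", "拉", "伯", "数", "字",
          "的", "测", "试", "三", "百", "六", "十", "五", "天"]
print(toy_itn("".join(tokens)))
```

This toy version ignores 万/亿, digit-by-digit readings such as years, and context: it would also rewrite a lone 一 inside an ordinary word. Rule FSTs exist largely to handle such cases.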

@csukuangfj csukuangfj merged commit b0f7ed3 into k2-fsa:master Jun 17, 2024
154 of 207 checks passed
@csukuangfj csukuangfj deleted the itn branch June 17, 2024 06:28