This repository contains the code for our Interspeech 2022 publication:
Li, Xinjian, et al. "ASR2K: Speech Recognition for Around 2000 Languages without Audio" Interspeech 2022. 2022
We plan to release ASR models for 2k languages (currently 1909 languages). The architecture is as follows:
See the README in egs/commonvoice for a simple recipe example.
Once you have trained a model or downloaded one of our pretrained models (not available yet), you should be able to run it from Python as follows:
In [1]: from import read_app
In [2]: app = read_app('eng', './data')
In [3]: app.predict('utt.wav')
Or run inference from bash:
python -m --lang=eng --lang_dir=./data --input=./test --output=./test
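To transcribe several files from Python rather than bash, the `predict` call shown above can be wrapped in a loop. The following is a minimal sketch, not part of this package: `transcribe_dir` is a hypothetical helper, and it only assumes that `app` exposes a `predict(path)` method as in the interactive example. The return type of `predict` is not specified here, so results are stored as returned.

```python
from pathlib import Path


def transcribe_dir(app, input_dir, pattern="*.wav"):
    """Run app.predict on every file in input_dir that matches pattern.

    `app` is any object with a predict(path) method, such as the one
    returned by read_app. This helper is hypothetical and is not
    provided by the package.
    """
    results = {}
    # Sort so that the output order is deterministic across runs.
    for wav_path in sorted(Path(input_dir).glob(pattern)):
        # predict is called with a string path, as in the example above.
        results[wav_path.name] = app.predict(str(wav_path))
    return results
```

With a loaded model this could be called as `transcribe_dir(app, './test')`, which returns a dict mapping each file name to whatever `predict` produced for it.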
To train an ASR2K model, you need the following packages:
# k2
# in my env, it is the following
pip install k2==1.24.4.dev20240223+cpu.torch1.13.1 -f
# lhotse
pip install git+
# icefall
git clone
cd icefall
pip install -r requirements.txt
# srilm
# download from
# follow the instructions in the INSTALL file in the package
# in my env, they are
# - tar -xvzf srilm-1.7.3.tar.gz
# - set SRILM variable in Makefile
# - make
# - add bin/i686-m64 to your PATH
# make sure ngram-count is available in your env
# ASR2K
# install this package
pip install -e .
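After installation, a quick check can confirm that the dependencies listed above are visible in the current environment. This is a minimal sketch under stated assumptions: `check_training_env` is a hypothetical helper, and it assumes the packages are importable under the names `k2`, `lhotse` and `icefall` (icefall may need to be on your `PYTHONPATH`) and that SRILM's `ngram-count` should be on your `PATH`.

```python
import importlib.util
import shutil


def check_training_env(modules=("k2", "lhotse", "icefall"), tools=("ngram-count",)):
    """Return the names of required modules and command-line tools that are missing.

    The default module names are assumptions about how each package is imported.
    """
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            missing.append(tool)
    return missing


if __name__ == "__main__":
    missing = check_training_env()
    print("missing:", ", ".join(missing) if missing else "nothing")
```

An empty result only means the names were found. It does not verify that the installed versions are compatible with each other.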
author={Xinjian Li and Florian Metze and David R. Mortensen and Alan W Black and Shinji Watanabe},
title={{ASR2K: Speech Recognition for Around 2000 Languages without Audio}},
booktitle={Proc. Interspeech 2022},