# Speech Recognition Papers
A list of hot directions in industrial speech recognition, e.g., streaming ASR (RNA-based, RNN-T-based, attention-based, and unified streaming/non-streaming models), non-autoregressive ASR, error correction and rescoring, on-device ASR, and semi-/self-supervised training.

If you are interested in this repo, pull requests are welcome.
## Streaming ASR

### RNA based

- Standard RNA: Recurrent Neural Aligner: An Encoder-Decoder Neural Network Model for Sequence to Sequence Mapping (Interspeech 2017)
- Extended RNA: Extending Recurrent Neural Aligner for Streaming End-to-End Speech Recognition in Mandarin (Interspeech 2018)
- Transformer equipped RNA: Self-attention Aligner: A Latency-control End-to-end Model for ASR Using Self-attention Network and Chunk-hopping (ICASSP 2019)
- CIF: CIF: Continuous Integrate-And-Fire for End-To-End Speech Recognition (ICASSP 2020)
- CIF: A Comparison of Label-Synchronous and Frame-Synchronous End-to-End Models for Speech Recognition (Interspeech 2020)
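The CIF mechanism above is simple enough to sketch directly: a per-frame weight is accumulated, and a token boundary "fires" each time the accumulator crosses a threshold, carrying the remainder forward. The weight values below are made up for illustration; in the papers they are predicted by the encoder.

```python
def cif_fire(weights, threshold=1.0):
    """Continuous Integrate-and-Fire: accumulate per-frame weights and
    'fire' (emit a token boundary) each time the accumulator crosses the
    threshold. Returns the frame indices at which firing occurs."""
    acc = 0.0
    boundaries = []
    for t, w in enumerate(weights):
        acc += w
        if acc >= threshold:
            boundaries.append(t)
            acc -= threshold  # carry the remainder into the next token
    return boundaries

# toy example: three tokens' worth of weight spread over eight frames
print(cif_fire([0.2, 0.5, 0.4, 0.1, 0.6, 0.5, 0.3, 0.5]))  # → [2, 5, 7]
```

Because firing depends only on past frames, the mechanism is naturally streaming; the number of firings also gives the output length for free.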
### RNN-T based

- Standard RNN-T: Streaming E2E Speech Recognition For Mobile Devices (ICASSP 2019)
- Latency Controlled RNN-T: RNN-T For Latency Controlled ASR With Improved Beam Search (arXiv 2019)
- Transformer equipped RNN-T: Self-Attention Transducers for End-to-End Speech Recognition (Interspeech 2019)
- Transformer equipped RNN-T: Transformer Transducer: A Streamable Speech Recognition Model With Transformer Encoders And RNN-T Loss (ICASSP 2020)
- Transformer equipped RNN-T: A Streaming On-Device End-to-End Model Surpassing Server-Side Conventional Model Quality and Latency (ICASSP 2020)
- Tricks for RNN-T Training: Towards Fast And Accurate Streaming E2E ASR (ICASSP 2020)
- Knowledge Distillation for RNN-T: Knowledge Distillation from Offline to Streaming RNN Transducer for End-to-end Speech Recognition (Interspeech 2020)
- Transfer Learning for RNN-T: Transfer Learning Approaches for Streaming End-to-End Speech Recognition System (Interspeech 2020)
- Exploration on RNN-T: Analyzing the Quality and Stability of a Streaming End-to-End On-Device Speech Recognizer (Interspeech 2020)
- Sequence-level Emission Regularization for RNN-T: FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization (arXiv 2020, submitted to ICASSP 2021)
- Model Distillation for RNN-T: Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data (arXiv 2020, submitted to ICASSP 2021)
- LM Fusion for RNN-T: Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer (arXiv 2020, submitted to ICASSP 2021)
- Normalized jointer network: Improving RNN transducer with normalized jointer network (arXiv 2020)
- Benchmark on RNN-T CTC LF-MMI: Benchmarking LF-MMI, CTC and RNN-T Criteria for Streaming ASR (SLT 2021)
- Alignment Restricted RNN-T: Alignment Restricted Streaming Recurrent Neural Network Transducer (SLT 2021)
- Conformer equipped RNN-T (with Cascaded Encoder and 2nd-pass beam search): A Better and Faster End-to-End Model for Streaming ASR (arXiv 2020, submitted to ICASSP 2021)
- Multi-Speaker RNN-T: Streaming end-to-end multi-talker speech recognition
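All RNN-T variants above share the same frame-synchronous decoding rule, which a greedy sketch makes concrete: at each encoder frame the joint network is queried repeatedly, non-blank symbols are emitted (advancing the label state), and decoding moves to the next frame only when blank wins. The `toy_joint` function below is a stand-in for a real joint network.

```python
import numpy as np

BLANK = 0

def transducer_greedy_decode(joint, num_frames, max_symbols_per_frame=3):
    """Greedy transducer decoding: joint(t, u) must return a log-prob
    (or logit) vector over the vocabulary given encoder frame t and
    label state u. Emissions per frame are capped for safety."""
    hyp = []
    u = 0
    for t in range(num_frames):
        for _ in range(max_symbols_per_frame):
            k = int(np.argmax(joint(t, u)))
            if k == BLANK:
                break  # blank: advance to the next frame
            hyp.append(k)
            u += 1
    return hyp

# toy joint: emit symbol 2 at frame 0, symbol 1 at frame 2, blank elsewhere
def toy_joint(t, u):
    logits = np.full(4, -5.0)
    if (t, u) == (0, 0):
        logits[2] = 0.0
    elif (t, u) == (2, 1):
        logits[1] = 0.0
    else:
        logits[BLANK] = 0.0
    return logits

print(transducer_greedy_decode(toy_joint, num_frames=4))  # → [2, 1]
```

The per-frame emission cap is also where tricks like FastEmit and alignment restriction intervene, by shaping when the model prefers blank versus a label.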
### Attention based

- Monotonic Attention: Monotonic Chunkwise Attention (ICLR 2018)
- Enhanced Monotonic Attention: Enhancing Monotonic Multihead Attention for Streaming ASR (Interspeech 2020)
- Minimum Latency Training based on Monotonic Attention: Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR (ICASSP 2020)
- Triggered Attention: Triggered Attention for End-to-End Speech Recognition (ICASSP 2019)
- Triggered Attention for Transformer: Streaming Automatic Speech Recognition With The Transformer Model (ICASSP 2020)
- Block-synchronous: Streaming Transformer ASR with Blockwise Synchronous Inference (ASRU 2019)
- Block-synchronous with chunk reuse: Transformer Online CTC/Attention E2E Speech Recognition Architecture (ICASSP 2020)
- Block-synchronous with RNN-T like decoding rule: Synchronous Transformers For E2E Speech Recognition (ICASSP 2020)
- Scout-synchronous: Low Latency End-to-End Streaming Speech Recognition with a Scout Network (Interspeech 2020)
- CTC-synchronous: CTC-synchronous Training for Monotonic Attention Model (Interspeech 2020)
- Memory Augmented Attention: Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory (Interspeech 2020)
- Memory Augmented Attention: Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition (Interspeech 2020)
- Optimized Beam Search: High Performance Sequence-to-Sequence Model for Streaming Speech Recognition (Interspeech 2020)
- Memory Augmented Attention: Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition (arXiv 2020, submitted to ICASSP 2021)
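Several of the chunk-based streaming Transformers above (chunk-hopping, block-synchronous inference, Emformer-style models) boil down to a restricted self-attention mask. A minimal sketch of such a mask, assuming a fixed chunk size and a fixed number of history chunks (function name and parameters are illustrative, not from any single paper):

```python
import numpy as np

def chunk_attention_mask(num_frames, chunk_size, left_chunks=1):
    """Boolean self-attention mask for chunk-based streaming: each frame
    may attend to every frame in its own chunk plus `left_chunks` chunks
    of history, but never to future chunks. True = attention allowed."""
    mask = np.zeros((num_frames, num_frames), dtype=bool)
    for q in range(num_frames):
        chunk = q // chunk_size
        start = max(0, (chunk - left_chunks) * chunk_size)
        end = min((chunk + 1) * chunk_size, num_frames)
        mask[q, start:end] = True
    return mask

m = chunk_attention_mask(6, chunk_size=2, left_chunks=1)
```

Latency is then controlled by the chunk size, while memory-augmented variants replace the raw history chunks with compressed summary vectors.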
### Unified streaming/non-streaming

- Transformer Transducer: One Model Unifying Streaming And Non-Streaming Speech Recognition (arXiv 2020)
- Universal ASR: Unify And Improve Streaming ASR With Full-Context Modeling (ICLR 2021 under double-blind review)
- Cascaded encoders for unifying streaming and non-streaming ASR (arXiv 2020)
- Asynchronous Revision for non-streaming ASR: Dynamic latency speech recognition with asynchronous revision (arXiv 2020, submitted to ICASSP 2021)
- 2-pass unifying (1st Streaming CTC, 2nd Attention Rescore): Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition
- 2-pass unifying (1st Streaming CTC, 2nd Attention Rescore): One In A Hundred: Select The Best Predicted Sequence from Numerous Candidates for Streaming Speech Recognition (arXiv 2020)
## Non-autoregressive ASR

- MASK-Predict: Listen and Fill in the Missing Letters: Non-Autoregressive Transformer for Speech Recognition (arXiv 2019)
- Imputer: Imputer: Sequence modelling via imputation and dynamic programming (arXiv 2020)
- Insertion-based: Insertion-Based Modeling for End-to-End Automatic Speech Recognition (arXiv 2020)
- MASK-CTC: Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict (Interspeech 2020)
- Spike Triggered: Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition (Interspeech 2020)
- Similar to MASK-Predict: Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition (Interspeech 2020)
- Improved MASK-CTC: Improved Mask-CTC for Non-Autoregressive End-to-End ASR (arXiv 2020, submitted to ICASSP 2021)
- Refine CTC Alignments over Latent Space: Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment (arXiv 2020)
- Also Refine CTC Alignments over Latent Space: CASS-NAT: CTC Alignment-based Single Step Non-autoregressive Transformer for Speech Recognition (arXiv 2020, submitted to ICASSP 2021)
- Refine CTC Alignments over Output Space: Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input (arXiv 2020, submitted to ICASSP 2021)
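The Mask-CTC family above shares one decoding loop: take the CTC greedy output, mask its low-confidence tokens, and refill them with a conditional mask-predict decoder over a few iterations. A toy sketch, with `fill_fn` standing in for the real decoder:

```python
def mask_ctc_refine(tokens, confidences, fill_fn, threshold=0.9, iterations=2):
    """Mask-CTC style refinement: mask tokens whose CTC confidence is
    below `threshold`, then let fill_fn(tokens, position) predict each
    masked position over a few iterations. Everything here is a toy;
    real systems refill a fraction per iteration, easiest-first."""
    MASK = "<mask>"
    tokens = list(tokens)
    masked = [i for i, c in enumerate(confidences) if c < threshold]
    for i in masked:
        tokens[i] = MASK
    for _ in range(iterations):
        if MASK not in tokens:
            break
        for i in masked:
            if tokens[i] == MASK:
                tokens[i] = fill_fn(tokens, i)
    return tokens

# toy usage: a decoder that always predicts "a" for a masked slot
print(mask_ctc_refine(["h", "x", "t"], [0.95, 0.3, 0.99],
                      lambda toks, i: "a"))  # → ['h', 'a', 't']
```

The output length is fixed by the CTC pass, which is what makes the refinement non-autoregressive and fast.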
## Error Correction & Rescoring

- Review: Automatic Speech Recognition Errors Detection and Correction: A Review (N/A)
- LAS based: A Spelling Correction Model For E2E Speech Recognition (ICASSP 2019)
- Transformer based: An Empirical Study Of Efficient ASR Rescoring With Transformers (arXiv 2019)
- Transformer based: Automatic Spelling Correction with Transformer for CTC-based End-to-End Speech Recognition (Interspeech 2019)
- Transformer based: Correction of Automatic Speech Recognition with Transformer Sequence-To-Sequence Model (ICASSP 2020)
- BERT based: Effective Sentence Scoring Method Using BERT for Speech Recognition (ACML 2019)
- BERT based: Spelling Error Correction with Soft-Masked BERT (ACL 2020)
- Parallel Rescoring: Parallel Rescoring with Transformer for Streaming On-Device Speech Recognition (Interspeech 2020)
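Most of the rescoring papers above reduce to the same second-pass recipe: combine each first-pass hypothesis's acoustic score with an external LM score under an interpolation weight and re-rank. A minimal sketch (the weight and the scorer are placeholders; the scorer could be an n-gram LM, a Transformer LM, or BERT pseudo-log-likelihood):

```python
def rescore_nbest(hypotheses, lm_score_fn, lm_weight=0.3):
    """Second-pass n-best rescoring: `hypotheses` is a list of
    (text, am_log_score) pairs; lm_score_fn(text) returns an external
    LM log-score. Returns the text with the best combined score."""
    best_text, best_score = None, float("-inf")
    for text, am_score in hypotheses:
        score = am_score + lm_weight * lm_score_fn(text)
        if score > best_score:
            best_text, best_score = text, score
    return best_text

# toy usage: the LM prefers "ice cream" over "i scream"
hyps = [("i scream", -1.0), ("ice cream", -1.2)]
lm = lambda t: 0.0 if t == "ice cream" else -2.0
print(rescore_nbest(hyps, lm))  # → "ice cream"
```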
## On-device & Efficient ASR

- Review: A review of on-device fully neural end-to-end automatic speech recognition algorithms (arXiv 2020)
- Lightweight Low-Rank transformer: Lightweight and Efficient End-to-End Speech Recognition Using Low-Rank Transformer (ICASSP 2020)
- Attention replacement: How Much Self-Attention Do We Need? Trading Attention for Feed-Forward Layers (ICASSP 2020)
- Lightweight transducer with WFST based decoding: Tiny Transducer: A Highly-efficient Speech Recognition Model on Edge Devices (ICASSP 2021)
- Cascade transducer: Cascade RNN-Transducer: Syllable Based Streaming On-device Mandarin Speech Recognition with a Syllable-to-Character Converter (SLT 2021)
## Semi-/Self-supervised Training

- Self training with filtering and ensembles: Self-training for end-to-end speech recognition (ICASSP 2020)
- Improved Noisy Student Training by gradational filtering: Improved Noisy Student Training for Automatic Speech Recognition (Interspeech 2020)
- An Unsupervised Autoregressive Model for Speech Representation Learning (Interspeech 2019)
- Generative Pre-Training for Speech with Autoregressive Predictive Coding (ICASSP 2020)
- wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)
- vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations (Baevski et al., 2019)
- wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., 2020)