Publications

This page lists research articles that use the NeMo Toolkit. To add your paper to the list, please submit a PR updating this document.

Automatic Speech Recognition (ASR)

2023

Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-to-End Automatic Speech Recognition
Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition

2022

Multi-blank Transducers for Speech Recognition

2021

Citrinet: Closing the Gap between Non-Autoregressive and Autoregressive End-to-End Models for Automatic Speech Recognition
SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition
CarneliNet: Neural Mixture Model for Automatic Speech Recognition
CTC Variations Through New WFST Topologies
A Toolbox for Construction and Analysis of Speech Datasets

2020

Cross-Language Transfer Learning, Continuous Learning, and Domain Adaptation for End-to-End Automatic Speech Recognition
Correction of Automatic Speech Recognition with Transformer Sequence-To-Sequence Model
Improving Noise Robustness of an End-to-End Neural Model for Automatic Speech Recognition

2019

Jasper: An End-to-End Convolutional Neural Acoustic Model
QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions

Speaker Recognition (SpkR)

2022

TitaNet: Neural Model for Speaker Representation with 1D Depth-Wise Separable Convolutions and Global Context

2020

SpeakerNet: 1D Depth-wise Separable Convolutional Network for Text-Independent Speaker Recognition and Verification

Speech Classification

2022

AmberNet: A Compact End-to-End Model for Spoken Language Identification
Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models

2021

MarbleNet: Deep 1D Time-Channel Separable Convolutional Neural Network for Voice Activity Detection

2020

MatchboxNet - 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition

Speech Translation

2022

NVIDIA NeMo Offline Speech Translation Systems for IWSLT 2022

Natural Language Processing (NLP)

Language Modeling

2022

Evaluating Parameter Efficient Learning for Generation
Text Mining Drug/Chemical-Protein Interactions using an Ensemble of BERT and T5 Based Models

2021

BioMegatron: Larger Biomedical Domain Language Model

Neural Machine Translation

2022

Finding the Right Recipe for Low Resource Domain Adaptation in Neural Machine Translation

2021

NVIDIA NeMo Neural Machine Translation Systems for English-German and English-Russian News and Biomedical Tasks at WMT21

Dialogue State Tracking

2021

SGD-QA: Fast Schema-Guided Dialogue State Tracking for Unseen Services

2020

A Fast and Robust BERT-based Dialogue State Tracker for Schema-Guided Dialogue Dataset

--------

Text To Speech (TTS)

2022

Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers

2021

TalkNet: Fully-Convolutional Non-Autoregressive Speech Synthesis Model
TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction
Hi-Fi Multi-Speaker English TTS Dataset
Mixer-TTS: non-autoregressive, fast and compact text-to-speech model conditioned on language model embeddings

(Inverse) Text Normalization

2022

Shallow Fusion of Weighted Finite-State Transducer and Language Model for Text Normalization
Thutmose Tagger: Single-pass neural model for Inverse Text Normalization

2021

NeMo Inverse Text Normalization: From Development to Production
A Unified Transformer-based Framework for Duplex Text Normalization
