dqqcasia/awesome-speech-translation

 
 


A Paper List for Speech Translation

🤗 We are hiring interns and full-time employees to do research on speech translation. Please contact me at dongqianqian@bytedance.com.

This is a paper list for speech translation.

Keywords: Speech Translation, Spoken Language Processing, Natural Language Processing

Tutorials and Surveys

  • Jan Niehues. Spoken Language Translation, InterSpeech-2019, [video]
  • Matthias Sperber and Matthias Paulik. Speech Translation and the End-to-End Promise: Taking Stock of Where We Are, ACL-2020 theme track, [paper]
  • Umut Sulubacak, Ozan Caglayan, Stig-Arne Grönroos, Aku Rouhe, Desmond Elliott, Lucia Specia, and Jörg Tiedemann. Multimodal Machine Translation through Visuals and Speech, Machine Translation journal-2020 (Springer), [paper]
  • Jan Niehues, Elizabeth Salesky, Marco Turchi, Matteo Negri. Speech Translation Tutorial, EACL-2021, [link], [slides]

Codebase

  • ESPnet-ST: All-in-One Speech Translation Toolkit, ACL-2020 Demo, [paper], [code]
  • FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ, AACL-2020 demo, [paper], [code]
  • NeurST: Neural Speech Translation Toolkit, Arxiv-2020, [paper], [code]

Dataset

  • Construction and Utilization of Bilingual Speech Corpus for Simultaneous Machine Interpretation Research, InterSpeech-2005, [paper]
  • Approach to Corpus-based Interpreting Studies: Developing EPIC (European Parliament Interpreting Corpus), MuTra-2005, [paper]
  • Automatic Translation from Parallel Speech: Simultaneous Interpretation as MT Training Data, ASRU-2009, [paper]
  • The KIT Lecture Corpus for Speech Translation, LREC-2012, [paper]
  • Improved Speech-to-Text Translation with the Fisher and Callhome Spanish–English Speech Translation Corpus, IWSLT-2013, [paper]
  • Collection of a Simultaneous Translation Corpus for Comparative Analysis, LREC-2014, [paper]
  • Microsoft Speech Language Translation (MSLT) Corpus: The IWSLT 2016 release for English, French and German, IWSLT-2016, [paper]
  • The Microsoft Speech Language Translation (MSLT) Corpus for Chinese and Japanese: Conversational Test data for Machine Translation and Speech Recognition, Machine_Translation-2017, [paper]
  • Amharic-English Speech Translation in Tourism Domain, SCNLP-2017, [paper]
  • A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiment, LREC-2018, [paper]
  • Augmenting Librispeech with French Translations: A Multimodal Corpus for Direct Speech Translation Evaluation, LREC-2018, [paper]
  • A Small Griko-Italian Speech Translation Corpus, SLTU-2019, [paper]
  • MuST-C: a Multilingual Speech Translation Corpus, NAACL-2019, [paper]
  • MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible, Arxiv-2019, [paper]
  • How2: A Large-scale Dataset for Multimodal Language Understanding, NIPS-2018, [paper]
  • LibriVoxDeEn: A Corpus for German-to-English Speech Translation and Speech Recognition, LREC-2020, [paper]
  • Clotho: An Audio Captioning Dataset, Arxiv-2019, [paper]
  • Europarl-St: A Multilingual Corpus For Speech Translation Of Parliamentary Debates, ICASSP-2020, [paper]
  • CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus, Arxiv-2020, [paper]
  • MuST-Cinema: a Speech-to-Subtitles corpus, Arxiv-2020, [paper]
  • CoVoST 2: A Massively Multilingual Speech-to-Text Translation Corpus, Arxiv-2020, [paper], [code]
  • The Multilingual TEDx Corpus for Speech Recognition and Translation, Arxiv-2021, [paper]
  • mintzai-ST: Corpus and Baselines for Basque-Spanish Speech Translation, IberSPEECH-2021, [paper]
  • BSTC: A Large-Scale Chinese-English Speech Translation Dataset, Arxiv-2021, [paper]
  • MultiSubs: A Large-scale Multimodal and Multilingual Dataset, Arxiv-2021, [paper]
  • Kosp2e: Korean Speech to English Translation Corpus, InterSpeech-2021, [paper]

Paper List

Pipeline ST

  • Phonetically-Oriented Word Error Alignment for Speech Recognition Error Analysis in Speech Translation, ASRU-2015, [paper]
  • Learning a Translation Model from Word Lattices, InterSpeech-2016, [paper]
  • Learning a Lexicon and Translation Model from Phoneme Lattices, EMNLP-2016, [paper]
  • Neural Lattice-to-Sequence Models for Uncertain Inputs, EMNLP-2017, [paper]
  • Using Spoken Word Posterior Features in Neural Machine Translation, IWSLT-2018, [paper]
  • Towards robust neural machine translation, ACL-2018, [paper]
  • Assessing the Tolerance of Neural Machine Translation Systems Against Speech Recognition Errors, InterSpeech-2019, [paper]
  • Lattice Transformer for Speech Translation, ACL-2019, [paper]
  • Self-Attentional Models for Lattice Inputs, ACL-2019, [paper]
  • Breaking the Data Barrier: Towards Robust Speech Translation via Adversarial Stability Training, IWSLT-2019, [paper]
  • Neural machine translation with acoustic embedding, ASRU-2019
  • Machine Translation in Pronunciation Space, Arxiv-2020, [paper]
  • Diversity by Phonetics and its Application in Neural Machine Translation, AAAI-2020, [paper]
  • Robust Neural Machine Translation for Clean and Noisy Speech Transcripts, IWSLT-2019, [paper]
  • ELITR Non-Native Speech Translation at IWSLT 2020, IWSLT-2020, [paper]
  • Subtitles to Segmentation: Improving Low-Resource Speech-to-Text Translation Pipelines, CLSST@LREC 2020, [paper]
  • Cascaded Models With Cyclic Feedback For Direct Speech Translation, Arxiv-2020, [paper]
  • Sentence Boundary Augmentation For Neural Machine Translation Robustness, Arxiv-2020, [paper]
  • A Technical Report: But Speech Translation Systems, Arxiv-2020, [paper]
  • Direct Segmentation Models for Streaming Speech Translation, EMNLP-2020, [paper]
  • Lost in Interpreting: Speech Translation from Source or Interpreter?, InterSpeech-2021, [paper]
  • Is “moby dick” a Whale or a Bird? Named Entities and Terminology in Speech Translation, EMNLP-2021, [paper]

End-to-end ST

  • Towards Speech Translation of Non Written Languages, IEEE-2006, [paper]
  • Towards speech-to-text translation without speech recognition, EACL-2017, [paper]
  • Listen and Translate: A Proof of Concept for End-to-End Speech-to-Text Translation, NIPS-2016, [paper]
  • An Attentional Model for Speech Translation Without Transcription, NAACL-2016, [paper]
  • An Unsupervised Probability Model for Speech-to-Translation Alignment of Low-Resource Languages, EMNLP-2016, [paper]
  • A Case Study on Using Speech-to-translation Alignments for Language Documentation, ComputEL-2017, [paper]
  • Spoken Term Discovery for Language Documentation Using Translations, SCNLP-2017, [paper]
  • Sequence-to-sequence Models Can Directly Translate Foreign Speech, InterSpeech-2017, [paper]
  • Structured-based Curriculum Learning for End-to-end English-Japanese Speech Translation, InterSpeech-2017, [paper]
  • End-to-End Speech Translation with the Transformer, IberSPEECH-2018, [paper]
  • Towards Fluent Translations from Disfluent Speech, SLT-2018, [paper]
  • Low-resource Speech-to-text Translation, InterSpeech-2018, [paper]
  • End-to-End Automatic Speech Translation of Audiobooks, ICASSP-2018, [paper]
  • Tied Multitask Learning for Neural Speech Translation, NAACL-2018, [paper]
  • Towards Unsupervised Speech to Text Translation, ICASSP-2019, [paper]
  • Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation, ICASSP-2019, [paper]
  • Towards End-to-end Speech-to-text Translation with Two-pass Decoding, ICASSP-2019, [paper]
  • Attention-Passing Models for Robust and Data-Efficient End-to-End Speech Translation, TACL-2019, [paper]
  • End-to-End Speech Translation with Knowledge Distillation, InterSpeech-2019, [paper]
  • Fluent Translations from Disfluent Speech in End-to-End Speech Translation, NAACL-2019, [paper]
  • Pre-Training On High-Resource Speech Recognition Improves Low-Resource Speech-To-Text Translation, NAACL-2019, [paper]
  • Exploring Phoneme-Level Speech Representations for End-to-End Speech Translation, ACL-2019, [paper]
  • Leveraging Out-of-Task Data for End-to-End Automatic Speech Translation, Arxiv-2019, [paper]
  • Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End Speech Translation, AAAI-2020, [paper]
  • Adapting Transformer to End-to-end Spoken Language Translation, InterSpeech-2019, [paper]
  • Unsupervised phonetic and word level discovery for speech to speech translation for unwritten languages, InterSpeech-2019, [paper]
  • A comparative study on end-to-end speech to text translation, ASRU-2019, [paper]
  • Instance-Based Model Adaptation For Direct Speech Translation, ICASSP-2020, [paper]
  • Analyzing Asr Pretraining For Low-Resource Speech-To-Text Translation, ICASSP-2020, [paper]
  • ON-TRAC Consortium End-to-End Speech Translation Systems for the IWSLT 2019 Shared Task, IWSLT-2019, [paper]
  • Harnessing Indirect Training Data for End-to-End Automatic Speech Translation: Tricks of the Trade, IWSLT-2019, [paper]
  • Data Efficient Direct Speech-to-Text Translation with Modality Agnostic Meta-Learning, ICASSP-2020, [paper]
  • Enhancing Transformer for End-to-end Speech-to-Text Translation, EAMT-2019, [paper]
  • On Using SpecAugment for End-to-End Speech Translation, IWSLT-2019, [paper]
  • Synchronous Speech Recognition and Speech-to-Text Translation with Interactive Decoding, AAAI-2020, [paper]
  • From Speech-To-Speech Translation To Automatic Dubbing, Arxiv-2020, [paper]
  • Skinaugment: Auto-Encoding Speaker Conversions For Automatic Speech Translation, ICASSP-2020, [paper]
  • Curriculum Pre-training for End-to-End Speech Translation, ACL-2020, [paper]
  • Jointly Trained Transformers models for Spoken Language Translation, Arxiv-2020, [paper]
  • Relative Positional Encoding for Speech Recognition and Direct Translation, Arxiv-2020, [paper]
  • Worse WER, but Better BLEU? Leveraging Word Embedding as Intermediate in Multitask End-to-End Speech Translation, ACL-2020, [paper]
  • Phone Features Improve Speech Translation, ACL-2020, [paper]
  • Low-Latency Sequence-to-Sequence Speech Recognition and Translation by Partial Hypothesis Selection, Arxiv-2020, [paper]
  • End-to-End Speech-Translation with Knowledge Distillation: FBK@IWSLT2020, IWSLT2020, [paper]
  • Self-Training for End-to-End Speech Translation, INTERSPEECH2020 (submitted), [paper]
  • CSTNet: Contrastive Speech Translation Network for Self-Supervised Speech Representation Learning, INTERSPEECH2020 (submitted), [paper]
  • Is 42 the Answer to Everything in Subtitling-oriented Speech Translation?, IWSLT2020, [paper]
  • End-To-End Speech Translation With Self-Contained Vocabulary Manipulation, ICASSP2020
  • End-to-End Speech Translation With Transcoding by Multi-Task Learning for Distant Language Pairs, TASLP-2020, [paper]
  • UWSpeech: Speech to Speech Translation for Unwritten Languages, Arxiv-2020, [paper]
  • Gender in Danger? Evaluating Speech Translation Technology on the MuST-SHE Corpus, ACL-2020, [paper]
  • Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation, INTERSPEECH2020 (submitted), [paper]
  • Self-Supervised Representations Improve End-to-End Speech Translation, Arxiv-2020, [paper]
  • Consistent Transcription and Translation of Speech, TACL-2020, [paper]
  • Contextualized Translation of Automatically Segmented Speech, INTERSPEECH-2020, [paper]
  • On Target Segmentation for Direct Speech Translation, AMTA-2020, [paper]
  • End-to-End Speech Translation with Adversarial Training, WAST-2020, [paper]
  • SDST: Successive Decoding for Speech-to-text Translation, Arxiv-2020, [paper]
  • TED: Triple Supervision Decouples End-to-end Speech-to-text Translation, Arxiv-2020, [paper]
  • Investigating Self-supervised Pre-training for End-to-end Speech Translation, ICML-2020 workshop, [paper], [code]
  • Adaptive Feature Selection for End-to-End Speech Translation, EMNLP2020 Findings, [paper], [code]
  • A General Multi-Task Learning Framework To Leverage Text Data For Speech To Text Tasks, Arxiv-2020, [paper]
  • MAM: Masked Acoustic Modeling for End-to-End Speech-to-Text Translation, Arxiv-2020, [paper]
  • Evaluating Gender Bias In Speech Translation, ICASSP-2021 (submitted), [paper]
  • Bridging the Modality Gap for Speech-to-Text Translation, Arxiv-2020, [paper]
  • Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation, COLING-2020, [paper], [code]
  • Effectively pretraining a speech translation decoder with Machine Translation data, EMNLP-2020, [paper]
  • Tight Integrated End-to-End Training for Cascaded Speech Translation, SLT-2021, [paper]
  • Breeding Gender-aware Direct Speech Translation Systems, COLING-2020, [paper]
  • On Knowledge Distillation for Direct Speech Translation, CLiC-IT-2020, [paper]
  • Streaming Models for Joint Speech Recognition and Translation, EACL-2021, [paper]
  • CTC-based Compression for Direct Speech Translation, EACL-2021, [paper]
  • Fused Acoustic and Text Encoding for Multimodal Bilingual Pretraining and Speech Translation, ICML-2021, [paper]
  • Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation, NAACL-2021, [paper]
  • Large-Scale Self- and Semi-Supervised Learning for Speech Translation, Arxiv-2021, [paper]
  • End-to-end Speech Translation via Cross-modal Progressive Training, InterSpeech-2021, [paper]
  • Beyond Voice Activity Detection: Hybrid Audio Segmentation for Direct Speech Translation, Arxiv-2021, [paper]
  • AlloST: Low-resource Speech Translation without Source Transcription, InterSpeech-2021, [paper]
  • Learning Shared Semantic Space for Speech-to-Text Translation, ACL-2021 Findings, [paper]
  • Stacked Acoustic-and-Textual Encoding: Integrating the Pre-trained Models into Speech Translation Encoders, ACL-2021, [paper]
  • How to Split: the Effect of Word Segmentation on Gender Bias in Speech Translation, ACL-2021 Findings, [paper]
  • Cascade versus Direct Speech Translation: Do the Differences Still Make a Difference?, ACL-2021, [paper]
  • Efficient Transformer for Direct Speech Translation, Arxiv-2021, [paper]
  • Improving Speech Translation by Understanding and Learning from the Auxiliary Text Translation Task, ACL-2021, [paper]
  • Beyond Sentence-Level End-to-End Speech Translation: Context Helps, ACL-2021, [paper]
  • AdaST: Dynamically Adapting Encoder States in the Decoder for End-to-End Speech-to-Text Translation, ACL-2021, [paper]
  • Speechformer: Reducing Information Loss in Direct Speech Translation, EMNLP-2021, [paper]
  • Fast-Md: Fast Multi-Decoder End-To-End Speech Translation With Non-Autoregressive Hidden Intermediates, ASRU-2021, [paper]
  • Mutual-Learning Improves End-to-End Speech Translation, EMNLP-2021, [paper]

End-to-end Streaming ST

  • Simuls2s: End-to-end Simultaneous Speech To Speech Translation, ICLR-2019(under review), [paper]
  • ON-TRAC Consortium for End-to-End and Simultaneous Speech Translation Challenge Tasks at IWSLT 2020, IWSLT-2020, [paper]
  • SimulSpeech: End-to-End Simultaneous Speech to Text Translation, ACL-2020, [paper]
  • Streaming Simultaneous Speech Translation With Augmented Memory Transformer, ICASSP-2021(submitted), [paper]
  • SimulMT to SimulST: Adapting Simultaneous Text Translation to End-to-End Simultaneous Speech Translation, Arxiv-2020, [paper]
  • Simultaneous Speech-To-Speech Translation System With Neural Incremental Asr, Mt, And Tts, Arxiv-2020, [paper]
  • An Empirical Study Of End-To-End Simultaneous Speech Translation Decoding Strategies, ICASSP 2021, [paper]
  • RealTranS: End-to-End Simultaneous Speech Translation with Convolutional Weighted-Shrinking Transformer, ACL-2021 Findings, [paper]
  • Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR, ACL-2021 Findings, [paper]
  • Simultaneous Speech Translation for Live Subtitling: from Delay to Display, Arxiv-2021, [paper]
  • UniST: Unified End-to-end Model for Streaming and Non-streaming Speech Translation, Arxiv-2021, [paper]
  • Decision Attentive Regularization To Improve Simultaneous Speech Translation Systems, ICASSP-2022 submitted, [paper]

End-to-end NA ST

  • Orthros: Non-Autoregressive End-To-End Speech Translation With Dual-Decoder, Arxiv-2020, [paper]
  • Investigating the Reordering Capability in CTC-based Non-Autoregressive End-to-End Speech Translation, ACL-2021 Findings, [paper]
  • Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring, Arxiv-2021, [paper]

End-to-end Multilingual ST

  • Multilingual End-To-End Speech Translation, ASRU-2019, [paper]
  • One-To-Many Multilingual End-To-End Speech Translation, ASRU-2019, [paper]
  • Multilingual Speech Translation with Efficient Finetuning of Pretrained Models, ACL-2021, [paper]
  • Lightweight Adapter Tuning for Multilingual Speech Translation, [paper]

End-to-end S2ST

  • Direct speech-to-speech translation with a sequence-to-sequence model, InterSpeech-2019, [paper]
  • Speech-To-Speech Translation Between Untranscribed Unknown Languages, ASRU-2019, [paper]
  • Transformer-Based Direct Speech-To-Speech Translation With Transcoder, SLT-2021, [paper]
  • Direct Speech-To-Speech Translation With Discrete Units, Arxiv-2021, [paper]
  • Translatotron 2: Robust Direct Speech-To-Speech Translation, Arxiv-2021, [paper]
  • Direct Simultaneous Speech To Speech Translation, Arxiv-2021, [paper]

End-to-end Zero-shot ST

  • Zero-shot Speech Translation, Arxiv-2021, [paper]

Multimodal MT

  • Transformer-based Cascaded Multimodal Speech Translation, Arxiv-2019, [paper]
  • Towards Multimodal Simultaneous Neural Machine Translation, Arxiv-2020, [paper]
  • Towards Automatic Face-to-Face Translation, Arxiv-2020, [paper], [code]
  • Keyframe Segmentation and Positional Encoding for Video-guided Machine Translation Challenge 2020, ALVR-2020, [paper]
  • DeepFuse: HKU’s Multimodal Machine Translation System for VMT’20, ALVR-2020, [paper]
  • Team RUC AI·M3 Technical Report at VMT Challenge 2020: Enhancing Neural Machine Translation with Multimodal Rewards, ALVR-2020, [paper]
  • Exploiting Multimodal Reinforcement Learning for Simultaneous Machine Translation, EACL-2021, [paper]
  • Cross-lingual Visual Pre-training for Multimodal Machine Translation, EACL-2021, [paper]
  • Generative Imagination Elevates Machine Translation, NAACL-2021, [https://arxiv.org/abs/2009.09654]
  • Efficient Object-Level Visual Context Modeling for Multimodal Machine Translation: Masking Irrelevant Objects Helps Grounding, AAAI-2021, [paper]
  • Improving Translation Robustness with Visual Cues and Error Correction, Arxiv-2021, [paper]
  • Gumbel-Attention for Multi-modal Machine Translation, Arxiv-2021, [paper]
  • Good for Misconceived Reasons: An Empirical Revisiting on the Need for Visual Context in Multimodal Machine Translation, ACL-2021, [paper]

Streaming MT

  • Simultaneous translation of lectures and speeches, Machine Translation-2007, [paper]
  • Real-time incremental speech-to-speech translation of dialogs, NAACL-2012, [paper]
  • Incremental segmentation and decoding strategies for simultaneous translation, IJCNLP-2013, [paper]
  • Don't Until the Final Verb Wait: Reinforcement learning for simultaneous machine translation, EMNLP-2014, [paper]
  • Segmentation strategies for streaming speech translation, NAACL-2013, [paper]
  • Optimizing segmentation strategies for simultaneous speech translation, ACL-2014, [paper]
  • Syntax-based simultaneous translation through prediction of unseen syntactic constituents, ACL-IJCNLP-2015, [paper]
  • Simultaneous machine translation using deep reinforcement learning, ICML-2016, [paper]
  • Interpretese vs. translationese: The uniqueness of human strategies in simultaneous interpretation, NAACL-2016, [paper]
  • Can neural machine translation do simultaneous translation?, Arxiv-2016, [paper]
  • Learning to translate in real-time with neural machine translation, EACL-2017, [paper]
  • Incremental Decoding and Training Methods for Simultaneous Translation in Neural Machine Translation, NAACL-2018, [paper]
  • Prediction Improves Simultaneous Neural Machine Translation, EMNLP-2018, [paper]
  • STACL: Simultaneous Translation with Implicit Anticipation and Controllable Latency using Prefix-to-Prefix Framework, ACL-2019, [paper]
  • Simultaneous Translation with Flexible Policy via Restricted Imitation Learning, ACL-2019, [paper]
  • Monotonic Infinite Lookback Attention for Simultaneous Machine Translation, ACL-2019, [paper]
  • Thinking Slow about Latency Evaluation for Simultaneous Machine Translation, Arxiv-2019, [paper]
  • DuTongChuan: Context-aware Translation Model for Simultaneous Interpreting, Arxiv-2019, [paper]
  • Monotonic Multihead Attention, ICLR-2020(under review), [paper]
  • How To Do Simultaneous Translation Better With Consecutive Neural Machine Translation, Arxiv-2019, [paper]
  • Simultaneous Neural Machine Translation using Connectionist Temporal Classification, Arxiv-2019, [paper]
  • Re-Translation Strategies For Long Form, Simultaneous, Spoken Language Translation, ICASSP-2020, [paper]
  • Learning Coupled Policies for Simultaneous Machine Translation, Arxiv-2020, [paper]
  • Re-translation versus Streaming for Simultaneous Translation, Arxiv-2020, [paper]
  • Efficient Wait-k Models for Simultaneous Machine Translation, Arxiv-2020, [paper]
  • Opportunistic Decoding with Timely Correction for Simultaneous Translation, ACL-2020, [paper]
  • Neural Simultaneous Speech Translation Using Alignment-Based Chunking, IWSLT2020, [paper]
  • Dynamic Masking for Improved Stability in Spoken Language Translation, Arxiv-2020, [paper]
  • Learn to Use Future Information in Simultaneous Translation, Arxiv-2020, [paper]
  • Presenting Simultaneous Translation in Limited Space, ITAT WAFNL 2020, [paper]
  • Fluent and Low-latency Simultaneous Speech-to-Speech Translation with Self-adaptive Training, EMNLP2020 Findings, [paper]
  • Improving Simultaneous Translation with Pseudo References, Arxiv-2020, [paper]
  • Future-Guided Incremental Transformer for Simultaneous Translation, AAAI-2021, [paper]
  • Faster Re-translation Using Non-Autoregressive Model For Simultaneous Neural Machine Translation, Arxiv-2021, [paper]
  • Learning Coupled Policies for Simultaneous Machine Translation using Imitation Learning, EACL-2021, [paper]
  • Simultaneous Multi-Pivot Neural Machine Translation, Arxiv-2021, [paper]
  • Stream-level Latency Evaluation for Simultaneous Machine Translation, Arxiv-2021, [paper]
  • Impact of Encoding and Segmentation Strategies on End-to-End Simultaneous Speech Translation, Interspeech 2021, [paper]
  • Full-Sentence Models Perform Better in Simultaneous Translation Using the Information Enhanced Decoding Strategy, Arxiv-2021, [paper]
  • Universal Simultaneous Machine Translation with Mixture-of-Experts Wait-k Policy, EMNLP-2021, [paper]

Related Works

Automated Audio Captioning

  • Effects Of Word-Frequency Based Pre- And Post-Processings For Audio Captioning, DCASE-2020, [paper]

Named Entity Recognition

  • End-to-end Named Entity Recognition from English Speech, INTERSPEECH2020(submitted), [paper]

Text Normalization

  • A Hybrid Text Normalization System Using Multi-Head Self-Attention For Mandarin, ICASSP-2020, [paper]
  • A Unified Sequence-To-Sequence Front-End Model For Mandarin Text-To-Speech Synthesis, ICASSP-2020, [paper]
  • Naturalization of Text by the Insertion of Pauses and Filler Words, Arxiv-2020, [paper]

Disfluency Detection

  • Semi-Supervised Disfluency Detection, COLING-2018, [paper]
  • Adapting Translation Models for Transcript Disfluency Detection, AAAI-2019, [paper]
  • Giving Attention to the Unexpected: Using Prosody Innovations in Disfluency Detection, Arxiv-2019, [paper]
  • Multi-Task Self-Supervised Learning for Disfluency Detection, AAAI-2020, [paper]
  • Improving Disfluency Detection by Self-Training a Self-Attentive Model, Arxiv-2020, [paper]
  • Combining Self-Training and Self-Supervised Learning for Unsupervised Disfluency Detection, EMNLP-2020, [paper], [code]
  • Auxiliary Sequence Labeling Tasks For Disfluency Detection, Arxiv-2020, [paper]

Punctuation Prediction

  • Controllable Time-Delay Transformer for Real-Time Punctuation Prediction and Disfluency Detection, ICASSP-2020, [paper]
  • Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings, INTERSPEECH-2020 (submitted), [paper]
  • Multimodal Semi-supervised Learning Framework for Punctuation Prediction in Conversational Speech, INTERSPEECH-2020, [paper]

Workshop

Copyright

By volunteers from the Institute of Automation, Chinese Academy of Sciences & ByteDance AI Lab.

Feel free to open an issue or make a pull request!