This resource list is maintained by Ramon Sanabria, Edinburgh NLP and The Centre for Speech Technology Research, The University of Edinburgh. Formerly at the Language Technologies Institute and Robotics Institute, CMU.
This list is probably biased towards my current research directions, so if anything is missing, please let me know. Suggestions are super welcome :)
- Kamper H, Jansen A, Goldwater S. Fully unsupervised small-vocabulary speech recognition using a segmental Bayesian model. INTERSPEECH 2015
- Kamper H, Jansen A, King S, Goldwater S. Unsupervised lexical clustering of speech segments using fixed-dimensional acoustic embeddings. SLT 2014
- van den Oord A, Li Y, Vinyals O. Representation learning with contrastive predictive coding. arXiv 2018
- Chung YA, Hsu WN, Tang H, Glass J. An unsupervised autoregressive model for speech representation learning. INTERSPEECH 2019
- Pascual S, Ravanelli M, Serrà J, Bonafonte A, Bengio Y. Learning problem-agnostic speech representations from multiple self-supervised tasks. INTERSPEECH 2019
- Klejch O, Fainberg J, Bell P, Renals S. Speaker Adaptive Training using Model Agnostic Meta-Learning. ASRU 2019
- Klejch O, Fainberg J, Bell P. Learning to adapt: a meta-learning approach for speaker adaptation. INTERSPEECH 2018
- Wiesner M, Renduchintala A, Watanabe S, Liu C, Dehak N, Khudanpur S. Pretraining by Backtranslation for End-to-end ASR in Low-Resource Settings. INTERSPEECH 2019
- Park DS, Chan W, Zhang Y, Chiu CC, Zoph B, Cubuk ED, Le QV. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition. INTERSPEECH 2019
- Baskar MK, Burget L, Watanabe S, Karafiát M, Hori T, Černocký JH. Promising Accurate Prefix Boosting for sequence-to-sequence ASR. ICASSP 2019
- Collobert R, Hannun A, Synnaeve G. A fully differentiable beam search decoder. arXiv 2019
- Zeyer A, Bahar P, Irie K, Schlüter R, Ney H. A Comparison of Transformer and LSTM Encoder Decoder Models for ASR. ASRU 2019
- Dong L, Xu S, Xu B. Speech-Transformer: a no-recurrence sequence-to-sequence model for speech recognition. ICASSP 2018
- Sperber M, Niehues J, Neubig G, Stüker S, Waibel A. Self-Attentional Acoustic Models. INTERSPEECH 2018
- Zhou S, Dong L, Xu S, Xu B. Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese. INTERSPEECH 2018
- Prasad M, van Esch D, Ritchie S, Mortensen JF. Building Large-Vocabulary ASR Systems for Languages Without Any Audio Training Data. INTERSPEECH 2019
- Menon R, Kamper H, van der Westhuizen E, Quinn J, Niesler T. Feature exploration for almost zero-resource ASR-free keyword spotting using a multilingual bottleneck extractor and correspondence autoencoders. INTERSPEECH 2019
- Moriya Y, Jones GJ. Multimodal Speaker Adaptation of Acoustic Model and Language Model for ASR Using Speaker Face Embedding. ICASSP 2019
- Caglayan O, Sanabria R, Palaskar S, Barrault L, Metze F. Multimodal Grounding for Sequence-to-sequence Speech Recognition. ICASSP 2019
- Palaskar S, Sanabria R, Metze F. End-to-end multimodal speech recognition. ICASSP 2018
- Gupta A, Miao Y, Neves L, Metze F. Visual features for context-aware speech recognition. ICASSP 2017
- Sun F, Harwath D, Glass J. Look, Listen, and Decode: Multimodal Speech Recognition with Images. SLT 2016
- Pasad A, et al. On the Contributions of Visual and Textual Supervision in Low-Resource Semantic Speech Retrieval. INTERSPEECH 2019
- Owens A, Wu J, McDermott JH, Freeman WT, Torralba A. Learning sight from sound: Ambient sound provides supervision for visual learning. IJCV 2018
- Kamper H, Settle S, Shakhnarovich G, Livescu K. Visually grounded learning of keyword prediction from untranscribed speech. INTERSPEECH 2017
- Shi H, Mao J, Gimpel K, Livescu K. Visually Grounded Neural Syntax Acquisition. ACL 2019
- Inaguma H, Duh K, Kawahara T, Watanabe S. Multilingual End-to-End Speech Translation. arXiv 2019
- Bansal S, Kamper H, Livescu K, Lopez A, Goldwater S. Pre-training on high-resource speech recognition improves low-resource speech-to-text translation. NAACL 2019
- Chung YA, Weng WH, Tong S, Glass J. Towards unsupervised speech-to-text translation. ICASSP 2019
- Gu J, Wang Y, Chen Y, Cho K, Li VO. Meta-learning for low-resource neural machine translation. EMNLP 2018
- Caglayan O, Madhyastha P, Specia L, Barrault L. Probing the Need for Visual Context in Multimodal Machine Translation. NAACL 2019
- Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks. ICML 2017
- Kawakami K, Dyer C, Blunsom P. Learning to Discover, Ground and Use Words with Segmental Neural Language Models. ACL 2019
- Kawakami K, Dyer C, Blunsom P. Unsupervised Word Discovery with Segmental Neural Language Models. arXiv 2018
- Mahajan D, Girshick R, Ramanathan V, He K, Paluri M, Li Y, Bharambe A, van der Maaten L. Exploring the limits of weakly supervised pretraining. ECCV 2018
- Patterson G, Hays J. COCO Attributes: Attributes for People, Animals, and Objects. ECCV 2016
- Black AW. CMU Wilderness Multilingual Speech Dataset. ICASSP 2019
- Boito MZ, Havard WN, Garnerin M, Ferrand ÉL, Besacier L. MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible. arXiv 2019
- Godard P, Adda G, Adda-Decker M, Benjumea J, Besacier L, Cooper-Leavitt J, Kouarata GN, Lamel L, Maynard H, Müller M, Rialland A. A very low resource language speech corpus for computational language documentation experiments. LREC 2018
- Di Gangi MA, Cattoni R, Bentivogli L, Negri M, Turchi M. MuST-C: a Multilingual Speech Translation Corpus. NAACL 2019
- Salesky E, Burger S, Niehues J, Waibel A. Towards fluent translations from disfluent speech. SLT 2018
- Post M, Kumar G, Lopez A, Karakos D, Callison-Burch C, Khudanpur S. Improved speech-to-text translation with the Fisher and Callhome Spanish–English speech translation corpus. IWSLT 2013
- Guzmán F, Chen PJ, Ott M, Pino J, Lample G, Koehn P, Chaudhary V, Ranzato MA. Two new evaluation datasets for low-resource machine translation: Nepali-English and Sinhala-English. EMNLP 2019
- Sanabria R, Caglayan O, Palaskar S, Elliott D, Barrault L, Specia L, Metze F. How2: a large-scale dataset for multimodal language understanding. NeurIPS 2018 Workshop
- Great resource from @josh_meyer here
- Great collection of video datasets here