In speech processing, keyword spotting deals with the identification of keywords in utterances. This repo is a curated list of awesome Speech Keyword Spotting (Wake-Up Word Detection) papers.
- Deep Spoken Keyword Spotting: An Overview, John H.L. Hansen (Fellow, IEEE), 2021.11
- Efficient dynamic filter for robust and low computational feature extraction, Korea University & Drexel University, 2022.05
- Improving Feature Generalizability with Multitask Learning in Class Incremental Learning, University of Cambridge & Singapore Management University, 2022.04
- Understanding Audio Features via Trainable Basis Functions, Singapore University of Technology and Design & Agency for Science, Technology and Research, 2022.04
- Depth Pruning with Auxiliary Networks for TinyML, University of the Philippines & Samsung Research Philippines, 2022.04
- AB/BA analysis: A framework for estimating keyword spotting recall improvement while maintaining audio privacy, Amazon, 2022.04
- Production federated keyword spotting via distillation, filtering, and joint federated-centralized training, Google LLC & University of Washington, 2022.04
- Small Footprint Multi-channel ConvMixer for Keyword Spotting with Centroid Based Awareness, Alibaba Group & Nanyang Technological University, 2022.04
- Delta Keyword Transformer: Bringing Transformers to the Edge through Dynamically Pruned Multi-Head Self-Attention, Technical University of Denmark & KU Leuven, 2022.04
- On the Efficiency of Integrating Self-supervised Learning and Meta-learning for User-defined Few-shot Keyword Spotting, National Taiwan University & intelliGo Technology Inc., 2022.04
- Target-aware Neural Architecture Search and Deployment for Keyword Spotting, University of Cagliari & Concept BU, 2022.04
- Learning Decoupling Features Through Orthogonality Regularization, Peking University & Xiaomi Inc., 2022.03
- Rainbow Keywords: Efficient Incremental Learning for Online Spoken Keyword Spotting, Nanyang Technological University, 2022.03
- BiFSMN: Binary Neural Network for Keyword Spotting, Beihang University & ByteDance AI Lab, 2022.02
- A Fast Network Exploration Strategy to Profile Low Energy Consumption for Keyword Spotting, University of Maryland, Baltimore County, 2022.02
- Progressive Continual Learning for Spoken Keyword Spotting, A*STAR, Singapore & Nanyang Technological University, 2022.01
- ConvMixer: Feature Interactive Convolution with Curriculum Learning for Small Footprint and Noisy Far-field Keyword Spotting, Alibaba & Nanyang Technological University, 2022.01
- BBS-KWS: The Mandarin Keyword Spotting System Won the Video Keyword Wakeup Challenge, Netease, 2021.12
- Implicit Acoustic Echo Cancellation for Keyword Spotting and Device-Directed Speech Detection, Universita Politecnica delle Marche & Amazon, 2021.11
- WaveSense: Efficient Temporal Convolutions with Spiking Neural Networks for Keyword Spotting, SynSense AG, 2021.11
- End-to-end Keyword Spotting using Xception-1d, University of Valencia, 2021.10
- Multi-task Voice Activated Framework using Self-supervised Learning, University of California & Qualcomm, 2021.10
- Lightweight dynamic filter for keyword spotting, Korea University & Drexel University, 2021.09
- Audiomer: A Convolutional Transformer for Keyword Spotting, George Mason University, 2021.09
- Behavior of Keyword Spotting Networks Under Noisy Conditions, Indian Institute of Technology & University of Tübingen, 2021.09
- A Separable Temporal Convolution Neural Network with Attention for Small-Footprint Keyword Spotting, Beijing Institute of Technology & Xiaomi Inc., 2021.09
- Text Anchor Based Metric Learning for Small-footprint Keyword Spotting, ADSPLAB Peking University, 2021.08
- Multi-task Learning with Cross Attention for Keyword Spotting, Apple & The University of Hong Kong, 2021.07
- AUC Optimization for Robust Small-footprint Keyword Spotting with Limited Training Data, Northwestern Polytechnical University, 2021.07
- An Integrated Framework for Two-pass Personalized Voice Trigger, Xiamen University, 2021.06
- Zero-Shot Federated Learning with New Classes for Audio Classification, Global AI Accelerator, Ericsson, 2021.06
- Broadcasted Residual Learning for Efficient Keyword Spotting, Qualcomm AI Research, 2021.06
- Encoder-Decoder Neural Architecture Optimization for Keyword Spotting, University of Alberta & University of Montreal, 2021.06
- Teaching keyword spotters to spot new keywords with limited examples, Google Research, 2021.06
- Noisy student-teacher training for robust keyword spotting, Google Inc., 2021.06
- A Streaming End-to-End Framework For Spoken Language Understanding, University of Waterloo & Huawei & Tsinghua University, 2021.05
- Wav2KWS: Transfer Learning from Speech Representations for Keyword Spotting, Kumoh National Institute of Technology (KIT), 2021.05
- Streaming Transformer for Hardware Efficient Voice Trigger Detection and False Trigger Mitigation, Apple, 2021.05
- Efficient Keyword Spotting through long-range interactions with Temporal Lambda Networks, Universitat Politècnica de Catalunya, 2021.04
- End-to-end Keyword Spotting using Neural Architecture Search and Quantization, Graz University of Technology, 2021.04
- The DKU System Description for The Interspeech 2021 Auto-KWS Challenge, Duke Kunshan University, 2021.04
- Few-Shot Keyword Spotting in Any Language, Harvard University & Coqui & Google, 2021.04
- Keyword Transformer: A Self-Attention Model for Keyword Spotting, Arm ML Research Lab, 2021.04
- Learning Efficient Representations for Keyword Spotting with Triplet Loss, Tomsk State University & NTR Labs, 2021.01
- The 2020 Personalized Voice Trigger Challenge: Open Database, Evaluation Metrics and the Baseline Systems, Duke Kunshan University, 2021.01
- Optimize what matters: Training DNN-HMM Keyword Spotting Model Using End Metric, Apple, 2020.11
- Training Wake Word Detection with Synthesized Speech Data on Confusion Words, Duke Kunshan University, 2020.11
- IEEE SLT 2021 Alpha-mini Speech Challenge: Open Datasets, Tracks, Rules and Baselines, Northwestern Polytechnical University, 2020.11
- A depthwise separable convolutional neural network for keyword spotting on an embedded system, Technical University of Denmark, 2020.10
- Small-Footprint Keyword Spotting with Multi-Scale Temporal Convolution, University of Science and Technology of China, 2020.10
- Micronets: Neural network architectures for deploying tinyml applications on commodity microcontrollers, Arm ML Research & Harvard University, 2020.10
- Neural Architecture Search For Keyword Spotting, University of Alberta & Huawei Technologies, 2020.09
- Seeing wake words: Audio-visual keyword spotting, University of Oxford, 2020.09
- AutoKWS: Keyword Spotting with Differentiable Architecture Search, Xiaomi AI Lab, 2020.09
- Hardware Aware Training for Efficient Keyword Spotting on General Purpose and Specialized Hardware, Applied Brain Research Inc., 2020.09
- Neural ODE with Temporal Convolution and Time Delay Neural Networks for Small-Footprint Keyword Spotting, AIST, 2020.08
- WSRNet: Joint Spotting and Recognition of Handwritten Words, National Technical University of Athens, 2020.08
- Domain Aware Training for Far-field Small-footprint Keyword Spotting, Duke Kunshan University, 2020.08
- Very Fast Keyword Spotting System with Real Time Factor Below 0.01, Technical University of Liberec, 2020.07
- Few-Shot Keyword Spotting With Prototypical Networks, The University of North Carolina at Charlotte, 2020.07
- Exploring Filterbank Learning for Keyword Spotting, Aalborg University, 2020.06
- Mining Effective Negative Training Samples for Keyword Spotting, Northwestern Polytechnical University & Mobvoi Inc., 2020.05
- Training Keyword Spotting Models on Non-IID Data with Federated Learning, Google LLC, 2020.05
- Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification using CTC-based Soft VAD and Global Query Attention, KAIST, 2020.05
- Streaming keyword spotting on mobile devices, Google Research, 2020.05
- Metric Learning for Keyword Spotting, Naver Corporation, 2020.05
- End-to-End Multi-Look Keyword Spotting, Tencent AI Lab, 2020.05
- Depthwise Separable Convolutional ResNet with Squeeze-and-Excitation Blocks for Small-footprint Keyword Spotting, Northwestern Polytechnical University, 2020.04
- Phoneme boundary detection using learnable segmental features, Bar-Ilan University & Facebook Inc., 2020.02
- Small-Footprint Open-Vocabulary Keyword Spotting with Quantized LSTM Networks, Sonos Inc., 2020.02
- Training Keyword Spotters with Limited and Synthesized Speech Data, Google Research, 2020.02
- Learning to detect keyword parts and whole by smoothed max pooling, Google Inc., 2020.01
- Multi-Task Learning for Speaker Verification and Voice Trigger Detection, Apple, 2020.01
- Performance-Oriented Neural Architecture Search, Trinity College Dublin, 2020.01
- Small-footprint keyword spotting with graph convolutional network, Tsinghua University, 2019.12
- Predicting detection filters for small footprint open-vocabulary keyword spotting, 2019.12
- Temporal feedback convolutional recurrent neural networks for keyword spotting, KAIST, 2019.11
- Small-footprint keyword spotting on raw audio data with sinc-convolutions, Technische Universität München, 2019.11
- Orthogonality constrained multi-head attention for keyword spotting, Qualcomm AI Research, 2019.10
- Query-by-example on-device keyword spotting, Qualcomm AI Research, 2019.10
- Adversarial example detection by classification for deep speech recognition, Aalborg University, 2019.10
- A Channel-Pruned and Weight-Binarized Convolutional Neural Network for Keyword Spotting, UC Irvine, 2019.09
- Sub-band Convolutional Neural Networks for Small-footprint Spoken Term Classification, Amazon, 2019.07
- Improving reverberant speech training using diffuse acoustic simulation, University of Maryland & Tencent, 2019.07
- Multi-layer Attention Mechanism for Speech Keyword Recognition, Sichuan University, 2019.07
- A Monaural Speech Enhancement Method for Robust Small-Footprint Keyword Spotting, Inner Mongolia University, 2019.06
- Keyword Spotting for Hearing Assistive Devices Robust to External Speakers, Aalborg University, 2019.06
- Temporal Convolution for Real-time Keyword Spotting on Mobile Devices, Hyperconnect, 2019.04
- SpeechYOLO: Detection and Localization of Speech Objects, Bar-Ilan University, 2019.04
- Ternary Hybrid Neural-Tree Networks for Highly Constrained IoT Applications, Arm ML Research Lab, 2019.03
- Stochastic Adaptive Neural Architecture Search for Keyword Spotting, Paris & Facebook AI Research, 2019.03
- Region Proposal Network Based Small-Footprint Keyword Spotting, Northwestern Polytechnical University & Mobvoi, 2019.08
- An In-Vehicle Keyword Spotting System with Multi-Source Fusion for Vehicle Applications, Beijing University of Posts and Telecommunications, 2019.02
- Efficient keyword spotting using dilated convolutions and gating, Snips, 2019.01
- End-to-end streaming keyword spotting, Google Inc., 2019.01
- Prototypical metric transfer learning for continuous speech keyword spotting with limited training data, ParallelDots, Inc., 2019.01
- Benchmarking Keyword Spotting Efficiency on Neuromorphic Hardware, Applied Brain Research, Inc., 2018.12
- Streaming Voice Query Recognition using Causal Convolutional Recurrent Neural Networks, University of Waterloo, 2018.12
- Efficient Voice Trigger Detection for Low Resource Hardware, Siri Speech, Apple, 2018.11
- Sequence-to-sequence models for small-footprint keyword spotting, Xiaomi Inc., 2018.11
- End-to-end Models with auditory attention in Multi-channel Keyword Spotting, Xiaomi Inc., 2018.11
- Hierarchical Neural Network Architecture In Keyword Spotting, NIO Co., Ltd, 2018.11
- Feature exploration for almost zero-resource ASR-free keyword spotting using a multilingual bottleneck extractor and correspondence autoencoders, Stellenbosch University, 2018.11
- DONUT: CTC-based Query-by-Example Keyword Spotting, Fluent.ai, 2018.11
- Federated learning for keyword spotting, Snips, 2018.10
- JavaScript Convolutional Neural Networks for Keyword Spotting in the Browser: An Experimental Analysis, University of Waterloo, 2018.10
- Data augmentation for robust keyword spotting under playback interference, Amazon Alexa & Google Inc & Purdue University, 2018.08
- Sequence discriminative training for deep learning based acoustic keyword spotting, Zhehuai Chen, 2018.08
- Weight-importance sparse training in keyword spotting, NIO Co., Ltd, 2018.07
- Efficient keyword spotting using time delay neural networks, Fluent.ai Inc., 2018.07
- Zero-shot keyword spotting for visual speech recognition in-the-wild, University of Nottingham, 2018.07
- ASR-free CNN-DTW keyword spotting using multilingual bottleneck features for almost zero-resource languages, Stellenbosch University, 2018.07
- Resource-Efficient Neural Architect, Baidu, 2018.06
- Visually grounded cross-lingual keyword spotting in speech, Stellenbosch University, 2018.06
- Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition, Google Brain, 2018.04
- Developing far-field speaker system via teacher-student learning, Microsoft AI & Research, 2018.04
- An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling, Carnegie Mellon University, 2018.03
- Speech recognition: keyword spotting through image recognition, UCSU, 2018.03
- Attention-based End-to-End Models for Small-Footprint Keyword Spotting, Xiaomi Inc., 2018.03
- A Cascade Architecture for Keyword Spotting on Mobile Devices, Google Inc., 2017.12
- Multiple-Instance, Cascaded Classification for Keyword Spotting in Narrow-Band Audio, Voicera, 2017.11
- An experimental analysis of the power consumption of convolutional neural networks for keyword spotting, University of Waterloo, 2017.11
- Hello Edge: Keyword Spotting on Microcontrollers, Arm & Stanford University, 2017.11
- Deep residual learning for small-footprint keyword spotting, University of Waterloo, 2017.10
- Streaming small-footprint keyword spotting using sequence-to-sequence models, 2017.10
- Small-footprint keyword spotting using deep neural network and connectionist temporal classifier, Ant Financial Group, 2017.09
- Compressed Time Delay Neural Network for Small-Footprint Keyword Spotting, Amazon, 2017.08
- Max-pooling loss training of long short-term memory networks for small-footprint keyword spotting, Amazon & Google Brain, 2017.05
- Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting, Baidu, 2017.03
- An End-to-End Architecture for Keyword Spotting and Voice Activity Detection, Mindori, 2016.11
- Trainable Frontend For Robust and Far-Field Keyword Spotting, Google, 2016.07
- Low Resource High Accuracy Keyword Spotting, Guoguo Chen, 2016
- Online keyword spotting with a character-level recurrent neural network, 2015.12
- Structured Transforms for Small-Footprint Deep Learning, Google, 2015.10
- Small-footprint keyword spotting using deep neural networks, Guoguo Chen, 2014
- Github: Wav2KWS: Transfer Learning from Speech Representations for Keyword Spotting (State-of-the-Art)
- Github: Mining Effective Negative Training Samples for Keyword Spotting
- Github: A depthwise separable convolutional neural network for keyword spotting on an embedded system
- Github: Hello Edge: Keyword spotting on Microcontrollers
- Github: Few-Shot Keyword Spotting in Any Language
- Github: Learning Efficient Representations for Keyword Spotting with Triplet Loss
- Github: The 2020 Personalized Voice Trigger Challenge: Open Database, Evaluation Metrics and the Baseline Systems
- Github: Micronets: Neural network architectures for deploying tinyml applications on commodity microcontrollers
- Github: Neural ODE with Temporal Convolution and Time Delay Neural Networks for Small-Footprint Keyword Spotting
- Github: Few-Shot Keyword Spotting With Prototypical Networks
- Region Proposal Network Based Small-Footprint Keyword Spotting
- Official code: Improving reverberant speech training using diffuse acoustic simulation
- Github: Temporal Convolution for Real-time Keyword Spotting on Mobile Devices
- Github: Benchmarking Keyword Spotting Efficiency on Neuromorphic Hardware
- WeKws (Production First and Production Ready End-to-End Keyword Spotting Toolkit)
Small-footprint keyword spotting (KWS), or more specifically wake-up word (WuW) detection, is a typical and important module in Internet of Things (IoT) devices. It gives users hands-free control of IoT devices. A WuW detection system usually runs locally and persistently on the device, so it requires low power consumption, few model parameters, and low computational complexity, and it must detect the predefined keyword in a streaming fashion, i.e., with low latency.
- Porcupine is a highly accurate and lightweight wake word engine. It enables building always-listening voice-enabled applications.
- Sonus lets you quickly and easily add a VUI (Voice User Interface) to any hardware or software project. Just like Alexa, Google Assistant, and Siri, Sonus is always listening offline for a customizable hotword. Once that hotword is detected, your speech is streamed to the cloud recognition service of your choice, and you get the results in real time.
- Picovoice is the end-to-end platform for building voice products on your terms. Unlike Alexa and Google services, Picovoice runs entirely on-device while being more accurate.
- A lightweight, simple-to-use, RNN wake word listener.
Precise is a wake word listener. The software monitors an audio stream (usually a microphone), and when it recognizes a specific phrase it triggers an event. For example, at Mycroft AI the team has trained Precise to recognize the phrase "Hey, Mycroft". When the software recognizes this phrase, it puts the rest of Mycroft's software into command mode and waits for a command from the person using the device. Mycroft Precise is fully open source and can be trained to recognize anything from a name to a cough.
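
The "monitor a stream, trigger an event" behaviour described for wake word listeners can be sketched in a few lines of Python. This is a minimal illustration and not the code of Precise or any other toolkit listed here: it assumes some model (not shown) already produces one keyword probability per audio frame, and the function name `detect_wake_word` and the parameters `threshold`, `window`, and `cooldown` are hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, List


def detect_wake_word(
    frame_scores: Iterable[float],
    on_trigger: Callable[[int], None],
    threshold: float = 0.5,
    window: int = 3,
    cooldown: int = 5,
) -> List[int]:
    """Call on_trigger(frame_index) when the smoothed score reaches threshold.

    frame_scores: per-frame keyword probabilities from a model (assumed input).
    window: number of recent frames averaged to smooth out single-frame spikes.
    cooldown: frames ignored after a trigger so one utterance fires only once.
    Returns the frame indices at which a trigger fired.
    """
    recent = deque(maxlen=window)  # holds only the last `window` scores
    triggers: List[int] = []
    frames_to_skip = 0  # remaining frames in the cooldown period

    for index, score in enumerate(frame_scores):
        recent.append(score)
        if frames_to_skip > 0:
            frames_to_skip -= 1
            continue
        # Decide only once a full window of scores is available.
        if len(recent) == window and sum(recent) / window >= threshold:
            triggers.append(index)
            on_trigger(index)
            frames_to_skip = cooldown
            recent.clear()  # start smoothing afresh after the event
    return triggers
```

Averaging over a short window trades a few frames of latency for fewer false triggers, which is the usual tension in always-listening systems.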
- Speech Commands
- Homepage: Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition
- Description: An audio dataset of spoken words designed to help train and evaluate keyword spotting systems. Its primary goal is to provide a way to build and test small models that detect when a single word is spoken, from a set of ten target words, with as few false positives as possible from background noise or unrelated speech. Note that in the train and validation sets, the label "unknown" is much more prevalent than the labels of the target words or background noise. One difference from the release version is the handling of silent segments. In the test set the silence segments are regular 1-second files, whereas in the training set they are provided as long segments under the "background_noise" folder. Here we split this background noise into 1-second clips and keep one of the files for the validation set.
- Download:
- Mobvoi Hotwords
- Homepage: Region Proposal Network Based Small-Footprint Keyword Spotting
- Description:
- The MobvoiHotwords is a corpus of wake-up words collected from a commercial smart speaker of Mobvoi. It consists of keyword and non-keyword utterances.
- For keyword data, utterances containing either 'Hi xiaowen' or 'Nihao Wenwen' were collected. For each keyword, there are about 36k utterances. All keyword data was collected from 788 subjects, ages 3-65, at different distances from the smart speaker (1, 3 and 5 meters). Different noises (typical home environment noises like music and TV) with varying sound pressure levels were played in the background during collection. The keyword data is identical to the keyword data used in the paper below:
- Download: MobvoiHotwords
- HI-MIA
- Description:
- The data is used in AISHELL Speaker Verification Challenge 2019. It is extracted from a larger database called AISHELL-WakeUp-1.
- The contents are the wake-up words "Hi, Mia" in both Chinese and English. The data was collected in real home environments using microphone arrays and a Hi-Fi microphone. The collection process and the development of a baseline system are described in the paper below. The data used in the challenge is extracted from one Hi-Fi microphone and 16-channel circular microphone arrays at distances of 1, 3 and 5 meters, and its contents are the Chinese wake-up words. The whole set is divided into train (254 people), dev (42 people) and test (44 people) subsets. The test subset is provided with paired target/non-target answers to evaluate verification results.
- Download: HI-MIA
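
The Speech Commands description above mentions splitting the long "background_noise" recordings into 1-second clips. A minimal sketch of that step follows, under stated assumptions: the audio is already loaded as a flat sequence of samples, the helper name `split_into_clips` is hypothetical, the 16 kHz default is an assumption about the input, and dropping the trailing partial clip is a choice made here that the description does not specify.

```python
from typing import List, Sequence


def split_into_clips(
    samples: Sequence[float],
    sample_rate: int = 16000,   # assumed sampling rate of the recording
    clip_seconds: float = 1.0,  # clip length named in the description
) -> List[Sequence[float]]:
    """Cut a long recording into consecutive fixed-length clips.

    Any trailing samples that do not fill a whole clip are dropped
    (an assumption of this sketch, not a documented rule).
    """
    clip_len = int(sample_rate * clip_seconds)
    if clip_len <= 0:
        raise ValueError("sample_rate * clip_seconds must be at least 1 sample")
    n_clips = len(samples) // clip_len
    return [samples[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]
```

For example, a 60-second noise file at 16 kHz would yield 60 clips of 16000 samples each, which can then be labelled as silence or background for training.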
- In this challenge, we further propose the Automated Speech (AutoSpeech) competition, which aims at proposing automated solutions for speech-related tasks. This challenge is restricted to multi-label classification problems, which come from different speech classification domains. The provided solutions are expected to discover various kinds of paralinguistic speech attribute information, such as speaker, language, emotion, etc., when only raw data (speech features) and meta information are provided. There are two kinds of datasets, which correspond to the public and private leaderboards respectively. Five public datasets (without labels in the testing part) are provided to the participants for developing AutoSpeech solutions. Afterward, solutions will be evaluated on private datasets without human intervention. The results on these private datasets determine the final ranking.
Official Code: AutoSpeech
- The 2020 Personalized Voice Trigger Challenge (PVTC2020)
Recently, personalized voice trigger or wake-up word detection has been gaining popularity among speech researchers and developers. Conventionally, wake-up word detection and speaker verification are carried out separately in a pipeline, where a wake-up word detection system generates a successful trigger and a speaker verification system then performs identity authentication. In such a case, the wake-up word detection system and the speaker verification system are optimized separately, not through an overall joint optimization with a unified goal. As a consequence, their respective network parameters and extracted information are not effectively shared and jointly utilized. Generally, the wake-up word detection system needs to run all the time, but the speaker verification network is relatively large and may not meet the computing resource constraints of embedded devices. The joint learning or multi-task learning network might be either very light at a small scale as a single always-on system, or much larger at the second stage after a successful wake-up by the first-stage voice trigger.
Paper: The DKU System Description for The Interspeech 2021 Auto-KWS Challenge
Official Code: PVTC2020
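
The conventional two-stage pipeline described in the PVTC2020 summary can be sketched as a simple cascade. This is an illustration only, not the challenge baseline: the scoring functions are placeholders supplied by the caller, and the names `personalized_trigger`, `wake_word_score`, `speaker_score`, and both thresholds are hypothetical.

```python
from typing import Callable, Sequence

Audio = Sequence[float]


def personalized_trigger(
    audio: Audio,
    wake_word_score: Callable[[Audio], float],  # small always-on model (placeholder)
    speaker_score: Callable[[Audio], float],    # larger verification model (placeholder)
    wake_threshold: float = 0.5,
    speaker_threshold: float = 0.5,
) -> bool:
    """Accept only if the wake word is detected AND the speaker is verified.

    The second stage runs only after the first stage fires, so the
    larger speaker model stays idle for most of the audio.
    """
    if wake_word_score(audio) < wake_threshold:
        return False  # no wake word: the speaker model is never called
    return speaker_score(audio) >= speaker_threshold
```

Because the two scorers are independent callables, nothing is shared between them, which is exactly the limitation the summary attributes to separately optimized systems.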