# speech-recognition.yaml
---
- name: DeepSpeech
  link: https://github.com/PaddlePaddle/DeepSpeech
  description: |
    DeepSpeech2 on PaddlePaddle is an open-source implementation of an end-to-end
    Automatic Speech Recognition (ASR) engine, based on Baidu's Deep Speech 2 paper
    and built on the PaddlePaddle platform. Our vision is to empower both industrial
    applications and academic research on speech recognition via an easy-to-use,
    efficient, and scalable implementation, including training, inference, and
    testing modules, as well as demo deployment. Several pre-trained models for both
    English and Mandarin are also released.
  references:
    - https://github.com/PaddlePaddle/Paddle
- name: wav2letter
  link: https://github.com/facebookresearch/wav2letter
  description: |
    wav2letter++ is a fast, open-source speech processing toolkit from the Speech
    team at Facebook AI Research, built to facilitate research in end-to-end models
    for speech recognition. It is written entirely in C++ and uses the ArrayFire
    tensor library and the flashlight machine learning library for maximum
    efficiency.
  references:
    - https://github.com/facebookresearch/wav2letter/wiki
- name: julius
  link: https://github.com/julius-speech/julius
  description: |
    "Julius" is a high-performance, small-footprint, large-vocabulary continuous
    speech recognition (LVCSR) decoder for speech-related researchers and
    developers. Based on word N-grams and context-dependent HMMs, it can perform
    real-time decoding on a wide range of computers and devices, from microcomputers
    to cloud servers. The algorithm is based on a two-pass tree-trellis search that
    incorporates major decoding techniques such as a tree-organized lexicon,
    1-best / word-pair context approximation, rank/score pruning, N-gram factoring,
    cross-word context dependency handling, enveloped beam search, Gaussian pruning,
    and Gaussian selection. Beyond search efficiency, it is modularized to be
    independent of model structures, and a wide variety of HMM structures is
    supported, such as shared-state triphones and tied-mixture models, with any
    number of mixtures, states, or phone sets. It can also run multi-instance
    recognition, performing dictation, grammar-based recognition, or isolated word
    recognition simultaneously in a single thread. Standard model formats are
    adopted for interoperability with other speech/language modeling toolkits such
    as HTK and SRILM. Recent versions also support Deep Neural Network (DNN) based
    real-time decoding.
  references:
    - https://github.com/julius-speech/dictation-kit
    - https://github.com/julius-speech/grammar-kit
    - https://github.com/julius-speech/segmentation-kit
    - https://github.com/julius-speech/prompter
- name: kaldi
  link: https://github.com/kaldi-asr/kaldi
  description: |
    This is the official repository of the Kaldi speech recognition project.
  references:
    - http://kaldi-asr.org/
- name: DeepSpeech
  link: https://github.com/mozilla/DeepSpeech
  description: |
    DeepSpeech is an open-source speech-to-text engine using a model trained with
    machine learning techniques based on Baidu's Deep Speech research paper. Project
    DeepSpeech uses Google's TensorFlow to simplify the implementation.
  references:
    - http://deepspeech.readthedocs.io/