add ctc beam search decoder #59
Conversation
The algorithm in the prefix beam search paper was found to be confusing and may have problems in its details. So here is a modification, on which the code is based:
inputs_t = [ops.convert_to_tensor(x) for x in inputs]
inputs_t = array_ops.stack(inputs_t)

# run CTC beam search decoder in tensorflow
Why is the unit test written with TensorFlow?

Just to compare the results with TensorFlow.
Validate the implementation: to affirm the correctness, the implementation is compared with the ctc_beam_search_decoder in TensorFlow under the same input probability matrix and beam size. An independent repo is provided to test the logic. Run the script.
More validation can be done by setting different parameters.
- Please make the interface of ctc_beam_search_decoder more general to allow any external custom scorer to be used.
- Please carefully clean and check the code before committing.
import random
import numpy as np

# vocab = blank + space + English characters
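The comment above describes the vocabulary layout; a minimal sketch of how such a vocabulary could be constructed (the blank and space symbols used here are assumptions for illustration, borrowed from the `vocab = ['-', '_', 'a']` test snippet elsewhere in this PR):

```python
import string

# hypothetical vocabulary layout: blank + space + English characters
BLANK = '-'   # assumed CTC blank symbol
SPACE = '_'   # assumed word separator
vocab = [BLANK, SPACE] + list(string.ascii_lowercase)

assert len(vocab) == 28         # 1 blank + 1 space + 26 letters
assert vocab.index(BLANK) == 0  # consistent with the decoder's blank_id=0 default
```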
Please remove unnecessary comment lines.
done
    return ids_str


def language_model(ids_list, vocabulary):
I think it's a "toy" language model just for testing. Please replace it with a "real" one built in pull request #71.
done
        beam_size,
        vocabulary,
        max_time_steps=None,
        lang_model=language_model,
lang_model --> external_scoring_function.
- Please use "language_model" instead of lang_model for clarity.
- Not only an LM but also other custom scoring functions are allowed; please rename it to make this clear.
done
        vocabulary,
        max_time_steps=None,
        lang_model=language_model,
        alpha=1.0,
If lang_model --> external_scoring_function, these parameters should be moved to the external_scoring_function creator.
done
        space_id=1,
        num_results_per_sample=None):
    '''
    Beam search decoder for CTC-trained network, adapted from Algorithm 1
"Adapted" means there is a difference? Could you please explain what the difference is?
done
vocab = ['-', '_', 'a']


def ids_list2str(ids_list):
Remove lines 13-20. Please clean the code before committing.
done
deep_speech_2/decoder.py
Outdated
        vocabulary,
        method,
        beam_size=None,
        num_results_per_sample=None):
    """
    CTC-like sequence decoding from a sequence of likelihood probabilities.
Since we now have more than one type of decoder, please add comments briefly explaining each one.
done
import numpy as np
import tensorflow as tf
from tensorflow.python.framework import ops
from tensorflow.python.ops import array_ops
It is not proper to include a TensorFlow dependency. It would be better to paste ground-truth results and just compare our results against them.
Done
@@ -0,0 +1,69 @@
from __future__ import absolute_import
Should we put it in a ./test folder? What is the best practice for a Python unit test file?
Removed the test code. Done
## This is a prototype of ctc beam search decoder

import copy
import random
Not used. Remove it.
done
deep_speech_2/decoder.py
Outdated
@@ -36,25 +38,164 @@ def ctc_best_path_decode(probs_seq, vocabulary):
    return ''.join([vocabulary[index] for index in index_list])


def ctc_decode(probs_seq, vocabulary, method):
class Scorer(object):
Maybe we should consider extensibility. KenLM is only one of the language model tools, and each tool has its own interface. We could define a unified base class and derive KenLMScorer from it.
If more language models are involved, the Scorer will be redesigned. For now we use a single class to avoid redundancy.
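The base-class design suggested above could be sketched as follows. The class names here are illustrative, not from the PR; the toy subclass only stands in for a KenLMScorer that would wrap kenlm.LanguageModel(model_path) behind the same interface:

```python
class BaseScorer(object):
    """Hypothetical unified interface for external scorers in beam search."""

    def __call__(self, sentence):
        raise NotImplementedError("subclasses must implement scoring")


class WordCountScorer(BaseScorer):
    """Toy scorer showing the pattern; a KenLMScorer subclass would score
    the sentence with a kenlm model instead of counting words."""

    def __init__(self, beta=1.0):
        self._beta = beta

    def __call__(self, sentence):
        # word-insertion term: word_count ** beta
        return len(sentence.split()) ** self._beta
```

A decoder that accepts any callable scorer then works with every subclass uniformly.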
deep_speech_2/decoder.py
Outdated
        self._beta = beta
        self._language_model = kenlm.LanguageModel(model_path)

    def language_model_score(self, sentence, bos=True, eos=False):
Special tokens should be replaced by KenLM's internal format, e.g. the end token and unknown token. The start token should be removed from the sentence.
The decoded prefix in the CTC decoder doesn't contain any special tokens, so the preprocessing is simplified.
deep_speech_2/decoder.py
Outdated
        return ctc_best_path_decode(probs_seq, vocabulary)
    else:
        raise ValueError("Decoding method [%s] is not supported." % method)
    max_time_steps = len(probs_seq)
Consider replacing max_time_steps with another name (like time_step_num)? It is somewhat confusing.
Done
deep_speech_2/decoder.py
Outdated
    ## initialize
    # the set containing selected prefixes
    prefix_set_prev = {'-1': 1.0}
    probs_b, probs_nb = {'-1': 1.0}, {'-1': 0.0}
Consider renaming probs_b and probs_nb to probs_b_prev and probs_nb_prev?
Done
Used grid search to find the optimal parameters alpha=0.26, beta=0.1, decreasing WER to ~0.17.
Compare: 5c4751e to 3d292d0
Passed CI. With a rebuilt, more powerful language model, the WER has decreased to 13%. #115
Great work!
deep_speech_2/decoder.py
Outdated
        cutoff_prob=1.0,
        ext_scoring_func=None,
        nproc=False):
    '''Beam search decoder for CTC-trained network, using beam search with width
Use """ instead of ''' for consistency. Please also check other places for this.
Done
deep_speech_2/decoder.py
Outdated
from itertools import groupby
import numpy as np
import multiprocessing


def ctc_best_path_decode(probs_seq, vocabulary):
ctc_best_path_decode --> ctc_best_path_decoder. Please also change "decoding" to "decoder" in the function comments.
Done
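For reference, best-path (greedy) CTC decoding as discussed in this thread can be sketched in a few lines with numpy and itertools.groupby. This is a sketch under stated assumptions (blank_id=0), not the PR's exact implementation:

```python
from itertools import groupby

import numpy as np


def ctc_best_path_decode_sketch(probs_seq, vocabulary, blank_id=0):
    """Greedy CTC decoding: argmax per time step, merge repeats, drop blanks."""
    best_path = np.argmax(probs_seq, axis=1)
    collapsed = [k for k, _ in groupby(best_path) if k != blank_id]
    return ''.join(vocabulary[i] for i in collapsed)


# toy check: two repeated 'a' frames merge; a blank frame separates 'b'
vocab = ['-', 'a', 'b']
probs = np.array([[0.1, 0.8, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.8, 0.1, 0.1],
                  [0.1, 0.1, 0.8]])
assert ctc_best_path_decode_sketch(probs, vocab) == 'ab'
```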
deep_speech_2/decoder.py
Outdated
ext_scoring_func=None, | ||
nproc=False): | ||
'''Beam search decoder for CTC-trained network, using beam search with width | ||
beam_size to find many paths to one label, return beam_size labels in |
", using beam search with width beam_size to find many paths to one label, return beam_size labels in the descending order" --> "It utilizes beam search to approximately select the top best decoding paths, returning results in descending order."
The original is not a complete sentence; pay particular attention to punctuation.
Done
deep_speech_2/decoder.py
Outdated
'''Beam search decoder for CTC-trained network, using beam search with width | ||
beam_size to find many paths to one label, return beam_size labels in | ||
the descending order of probabilities. The implementation is based on Prefix | ||
Beam Search(https://arxiv.org/abs/1408.2873), and the unclear part is |
Beam Search(
--> Beam Search (
Done
deep_speech_2/decoder.py
Outdated
beam_size to find many paths to one label, return beam_size labels in | ||
the descending order of probabilities. The implementation is based on Prefix | ||
Beam Search(https://arxiv.org/abs/1408.2873), and the unclear part is | ||
redesigned. |
What was redesigned, and why? Could you please add a detailed explanation?
Done
deep_speech_2/scorer.py
Outdated
        return np.power(10, log_cond_prob)

    # word insertion term
    def word_count(self, sentence):
Do not expose word_count.
Done
deep_speech_2/scorer.py
Outdated
        self._language_model = kenlm.LanguageModel(model_path)

    # n-gram language model scoring
    def language_model_score(self, sentence):
No need to expose this score.
Done
deep_speech_2/scorer.py
Outdated
        return len(words)

    # execute evaluation
    def __call__(self, sentence, log=False):
Rename it to get_score.
Preserved, because by using __call__ the scorer can be called as scorer_name(prefix), staying compatible with a plain function func_name(prefix).
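The compatibility point made here can be illustrated: a decoder that invokes its external scorer as func(prefix) accepts both a class instance defining __call__ and a plain function. The names below are illustrative only:

```python
def apply_scorer(prefix, ext_scoring_func):
    # the decoder only needs the scorer to be callable
    return ext_scoring_func(prefix)


class LengthScorer(object):
    """Toy scorer: scores a prefix by its length."""

    def __call__(self, sentence):
        return float(len(sentence))


def plain_length_scorer(sentence):
    return float(len(sentence))


# an instance and a plain function are interchangeable to the decoder
assert apply_scorer("abc", LengthScorer()) == apply_scorer("abc", plain_length_scorer)
```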
deep_speech_2/scorer.py
Outdated
    :param alpha: Parameter associated with language model.
    :type alpha: float
    :param beta: Parameter associated with word count.
    :type beta: float
Explain when word count is not used, e.g. "If beta = xxxx ...".
Done
deep_speech_2/tune.py
Outdated
from __future__ import division
from __future__ import print_function

import paddle.v2 as paddle
- Reorder the imports.
- Please modify everything below according to the suggestions in infer.py and evaluate.py.
- Add descriptions of the usage of tune.py and evaluate.py to README.md.
Done
Refined. Please review again.
deep_speech_2/decoder.py
Outdated
        blank_id=0,
        cutoff_prob=1.0,
        ext_scoring_func=None,
        nproc=False):
Preserved temporarily until the problem of how to pass ext_scoring_func to the multiple processes is fixed.
deep_speech_2/evaluate.py
Outdated
from model import deep_speech2
from decoder import *
from scorer import Scorer
from error_rate import wer
Done
deep_speech_2/evaluate.py
Outdated
    help="Manifest path for normalizer. (default: %(default)s)")
parser.add_argument(
    "--decode_manifest_path",
    default='data/manifest.libri.test-clean',
Done
deep_speech_2/scorer.py
Outdated


class Scorer(object):
    """External defined scorer to evaluate a sentence in beam search
Done
Almost LGTM.
    of probabilities, the assignment operation is changed to accumulation,
    since one prefix may come from different paths; 2) the if condition "if l^+
    not in A_prev then" after the probabilities' computation is dropped, since
    it is hard to understand and seems unnecessary.
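The accumulation change described in point 1 matters because distinct CTC paths can collapse to the same prefix, so their probabilities must be summed rather than overwritten. A toy illustration (the collapse rule is simplified here to just removing blanks):

```python
# paths '-a' and 'a-' both collapse to the prefix 'a';
# plain assignment would keep only the last path's probability
probs_prefix = {}
for path, p in [('-a', 0.2), ('a-', 0.3)]:
    prefix = path.replace('-', '')                             # simplified collapse
    probs_prefix[prefix] = probs_prefix.get(prefix, 0.0) + p   # accumulate, not assign
assert abs(probs_prefix['a'] - 0.5) < 1e-12
```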
Can we make sure that these modifications are correct?
deep_speech_2/decoder.py
Outdated
        blank_id=0,
        cutoff_prob=1.0,
        ext_scoring_func=None,
        nproc=False):
Could we fix it now ?
deep_speech_2/decoder.py
Outdated
    '\t': 1.0
}, {
    '\t': 0.0
}
No need to use so many lines. Maybe you can revert it back to just two lines.
deep_speech_2/decoder.py
Outdated
        beam_size,
        vocabulary,
        blank_id=0,
        blank_id,
        num_processes,
Can we set it to multiprocessing.cpu_count() as the default value?
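One way to realize this suggestion is a None sentinel, so the core count is looked up at call time rather than at import time. The function name below is a stand-in for the decoder's batch entry point, not the PR's actual signature:

```python
import multiprocessing


def decode_batch_sketch(num_processes=None):
    # default to all available CPU cores, as suggested in the review
    if num_processes is None:
        num_processes = multiprocessing.cpu_count()
    return num_processes


assert decode_batch_sketch() >= 1
assert decode_batch_sketch(num_processes=2) == 2
```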
deep_speech_2/evaluate.py
Outdated
    help="Manifest path for normalizer. (default: %(default)s)")
parser.add_argument(
    "--decode_manifest_path",
    default='data/manifest.libri.test-clean',
Still 'data/manifest.libri.test-clean' ?
deep_speech_2/infer.py
Outdated
    type=str,
    help="Manifest path for decoding. (default: %(default)s)")
parser.add_argument(
    "--model_filepath",
    default='checkpoints/params.latest.tar.gz',
    default='checkpoints/params.tar.gz.41',
Use 'latest' as the default.
deep_speech_2/lm/lm_scorer.py
Outdated

    :param alpha: Parameter associated with language model.
    :param alpha: Parameter associated with language model. Don't use
                  language model when alpha = 0.
--> "The language-model scorer is disabled when alpha = 0."
deep_speech_2/lm/lm_scorer.py
Outdated
    :type alpha: float
    :param beta: Parameter associated with word count.
    :param beta: Parameter associated with word count. Don't use word
                 count when beta = 0.
--> "The word-count scorer is disabled when beta = 0."
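Both suggested docstring sentences follow from the combined score lm ** alpha * word_count ** beta used in this scorer: setting an exponent to zero turns that factor into 1, effectively disabling it. A quick check:

```python
import numpy as np


def combined_score(lm, word_cnt, alpha, beta):
    # combined external score: language model ** alpha * word count ** beta
    return np.power(lm, alpha) * np.power(word_cnt, beta)


# alpha = 0 disables the language-model factor
assert combined_score(0.5, 4, alpha=0.0, beta=1.0) == 4.0
# beta = 0 disables the word-count factor
assert combined_score(0.5, 4, alpha=1.0, beta=0.0) == 0.5
```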
deep_speech_2/lm/lm_scorer.py
Outdated
        lm = self.language_model_score(sentence)
        word_cnt = self.word_count(sentence)
        lm = self._language_model_score(sentence)
        word_cnt = self._word_count(sentence)
        if log == False:
            score = np.power(lm, self._alpha) \
                * np.power(word_cnt, self._beta)
Is it possible to put L60 and L61 into a single line within 80 columns?
@@ -0,0 +1,3 @@
echo "Downloading language model."

wget -c ftp://xxx/xxx/en.00.UNKNOWN.klm -P ./data
Could you replace it with a real url?
resolve PaddlePaddle/Paddle#2230
In progress. Will add pseudo code and test information later.