Add python wrapper for CTC greedy decoder and edit distance evaluator #7655
I suggest dividing this operator into two: ctc_greedy_decoder and edit_distance_evaluator. Keep in mind that there is another decoding method, beam search. Users may want to output the decoding result directly, or use beam search decoding and then evaluate the result. Wrapping greedy decoding and error evaluation in a single operator would make that infeasible.
@kuke Thanks for the reminder.
2. Add edit distance evaluator to evaluator.py
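To illustrate why decoding works as a standalone step, here is a minimal pure-Python sketch of greedy (best-path) CTC decoding. The function name `ctc_greedy_decode`, its arguments, and the list-of-lists input are assumptions made for illustration; this is not the operator added in this PR.

```python
def ctc_greedy_decode(frame_scores, blank):
    """Greedy (best-path) CTC decoding of a single sequence.

    frame_scores: one list of per-class scores for each time step.
    blank: index of the CTC blank class.
    Both argument names are hypothetical.
    """
    # Pick the highest-scoring class at every time step.
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]

    decoded = []
    prev = None
    for token in best_path:
        # Collapse consecutive repeats, then drop blanks.
        if token != prev and token != blank:
            decoded.append(token)
        prev = token
    return decoded


# Best path is [1, 1, 0, 1, 2, 2, 0] with class 0 as the blank.
frames = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
    [0.6, 0.2, 0.2],
]
decoded_tokens = ctc_greedy_decode(frames, blank=0)
```

Because the decoder only returns token ids, its output could be fed to an edit distance computation or swapped for a beam search decoder without touching the evaluation step.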
… ctc_evaluator_py
… ctc_evaluator_py
python/paddle/v2/fluid/evaluator.py
Outdated
dtype='float32', shape=[1], suffix='total')
error = layers.edit_distance(input=input, label=label)
error = layers.cast(x=error, dtype='float32')
mean_error = layers.mean(x=error)
Better to stay consistent with the Accuracy evaluator: do not calculate the mean error of the current mini-batch. Just accumulate the batch size and the mini-batch error here.
Fixed.
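The accumulation pattern requested in this thread can be sketched in plain Python as follows. The class `EditDistanceAccumulator` and its method names are hypothetical and only mirror the idea of summing errors and sequence counts across mini-batches; they are not the evaluator code in evaluator.py.

```python
class EditDistanceAccumulator:
    """Accumulates summed edit distance and sequence count over mini-batches.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        self.reset()

    def reset(self):
        self.total_distance = 0.0
        self.seq_num = 0

    def update(self, batch_distances):
        # Add the summed error and the batch size; no per-batch mean is taken.
        self.total_distance += float(sum(batch_distances))
        self.seq_num += len(batch_distances)

    def eval(self):
        # Average edit distance per sequence over everything seen since reset().
        if self.seq_num == 0:
            raise ValueError("no sequences accumulated")
        return self.total_distance / self.seq_num


acc = EditDistanceAccumulator()
acc.update([0, 0, 0])  # mini-batch of three sequences
acc.update([4])        # mini-batch of one sequence
average = acc.eval()
```

With unequal batch sizes the two approaches differ: here the accumulated average is 4 / 4 = 1.0, while averaging the per-batch means (0.0 and 4.0) would give 2.0.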
python/paddle/v2/fluid/evaluator.py
Outdated

class EditDistance(Evaluator):
    """
    Average edit distance error for multiple mini-batches.
Needs more comments: how to use it, and what the value returned by eval means.
Thx. Done.
python/paddle/v2/fluid/layers/nn.py
Outdated
tokens(list): Tokens that should be removed before calculating edit distance.

Returns:
Variable: sequence-to-sequence edit distance loss in shape [batch_size, 1].
Remove the word "loss".
Thx. Done.
python/paddle/v2/fluid/layers/nn.py
Outdated
@@ -1863,6 +1864,140 @@ def matmul(x, y, transpose_x=False, transpose_y=False, name=None):
    return out


def edit_distance(input, label, normalized=False, tokens=None, name=None):
tokens -> ignored_tokens ?
Thx. Fixed.
python/paddle/v2/fluid/layers/nn.py
Outdated
normalized(bool): Indicated whether to normalize the edit distance by the length of reference string.

tokens(list): Tokens that should be removed before calculating edit distance.
tokens(list) -> tokens(list of int)
Fixed.
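For reference, the semantics described in the docstring under review (removing ignored tokens first, then optionally normalizing by the reference length) can be sketched in pure Python. The function below is a standalone Levenshtein implementation written for illustration; its name and signature are assumptions and it does not call or reproduce the `layers.edit_distance` operator.

```python
def edit_distance_reference(hyp, ref, normalized=False, ignored_tokens=None):
    """Levenshtein distance between two token-id sequences.

    hyp, ref: lists of int token ids (hypothesis and reference).
    normalized: divide the distance by the length of the filtered reference.
    ignored_tokens: list of int ids removed from both sequences beforehand.
    Hypothetical helper for illustration only.
    """
    ignored = set(ignored_tokens or [])
    hyp = [t for t in hyp if t not in ignored]
    ref = [t for t in ref if t not in ignored]

    # prev[j] is the distance between the processed prefix of hyp and ref[:j].
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,              # delete h
                           cur[j - 1] + 1,           # insert r
                           prev[j - 1] + (h != r)))  # substitute or match
        prev = cur

    distance = float(prev[-1])
    if normalized:
        if not ref:
            raise ValueError("cannot normalize by an empty reference")
        distance /= len(ref)
    return distance


plain = edit_distance_reference([1, 2, 3], [1, 3])
relative = edit_distance_reference([1, 2, 3], [1, 3], normalized=True)
filtered = edit_distance_reference([1, 0, 2], [1, 2], ignored_tokens=[0])
```

Passing the ignored ids as a list of int matches the `tokens(list of int)` wording suggested in the review.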
2. Fix evaluator using 'reduce_sum' op instead of 'mean' op
LGTM. Approved. Please add the unit test in the next PR.
fix #7596