Add python wrapper for CTC greedy decoder and edit distance evaluator #7655

Merged
wanghaoshuang merged 13 commits into PaddlePaddle:develop on Jan 22, 2018

Conversation

wanghaoshuang
Contributor

fix #7596

Contributor

@kuke kuke left a comment

I suggest dividing this operator into two: ctc_greedy_decoder and edit_distance_evaluator. Please keep in mind that there is another decoding method, beam search. Users may want to output the decoding result directly, or they may want to use beam search decoding and then evaluate the result. Neither is feasible if greedy decoding and error evaluation are wrapped in one operator.
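For readers unfamiliar with the decoding step being split out here, a minimal pure-Python sketch of CTC greedy (best-path) decoding follows. It is not the operator added in this PR; the function name `ctc_greedy_decode_sketch` and the list-of-lists input format are assumptions made for illustration only. It takes the most probable class at each time step, merges consecutive repeats, and then drops the blank label.

```python
def ctc_greedy_decode_sketch(probs, blank):
    """Best-path CTC decoding for a single sequence.

    probs: list of per-time-step score lists, shape [T][num_classes]
           (hypothetical input format, chosen for this sketch).
    blank: index of the CTC blank label.
    Returns the decoded label sequence as a list of ints.
    """
    decoded = []
    prev = None
    for step_scores in probs:
        # Pick the most probable class at this time step.
        idx = max(range(len(step_scores)), key=step_scores.__getitem__)
        # Keep a label only if it differs from the previous step
        # (merge repeats) and is not the blank.
        if idx != prev and idx != blank:
            decoded.append(idx)
        prev = idx
    return decoded


# Usage: a one-hot score matrix that follows the path 1 1 0 1 2 2 0.
path = [1, 1, 0, 1, 2, 2, 0]
one_hot = [[1.0 if c == p else 0.0 for c in range(3)] for p in path]
print(ctc_greedy_decode_sketch(one_hot, blank=0))
```

Because the decoder only produces a label sequence, its output could be passed to any evaluator, and a beam search decoder could be swapped in without touching the evaluation step, which is the separation the comment above asks for.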

@wanghaoshuang
Contributor Author

@kuke Thanks for your reminder.

@wanghaoshuang wanghaoshuang changed the title Add python wrapper for CTC evaluator Add python wrapper for CTC greedy decoder and edit distance evaluator Jan 19, 2018
dtype='float32', shape=[1], suffix='total')
error = layers.edit_distance(input=input, label=label)
error = layers.cast(x=error, dtype='float32')
mean_error = layers.mean(x=error)
Contributor

@qingqing01 qingqing01 Jan 22, 2018

Better to keep this consistent with the Accuracy evaluator: do not calculate the mean error of the current mini-batch. Just accumulate the batch size and the mini-batch error here.

Contributor Author

Fixed.
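
The accumulation pattern requested in this thread can be sketched in plain Python. This is only an illustration under stated assumptions: the class name `EditDistanceAccumulator` and its `update`/`eval`/`reset` methods are hypothetical and do not reproduce the Evaluator class in this PR. The point is that summing errors and sequence counts across mini-batches, and dividing once at the end, stays correct when batch sizes differ, whereas averaging per-batch means does not.

```python
class EditDistanceAccumulator:
    """Hypothetical accumulator: sums errors and sequence counts over batches."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Clear the running state, e.g. at the start of a pass.
        self.total_distance = 0.0
        self.seq_num = 0

    def update(self, batch_distances):
        # batch_distances: one edit distance per sequence in the mini-batch.
        self.total_distance += float(sum(batch_distances))
        self.seq_num += len(batch_distances)

    def eval(self):
        # Average edit distance over every sequence seen since reset().
        if self.seq_num == 0:
            raise ValueError("no sequences accumulated")
        return self.total_distance / self.seq_num


# Usage: two mini-batches of different sizes.
acc = EditDistanceAccumulator()
acc.update([0, 0, 0])   # batch of 3 sequences, all correct
acc.update([4])         # batch of 1 sequence with 4 errors
print(acc.eval())       # average over all 4 sequences

# Averaging the per-batch means instead would weight the small batch too heavily.
mean_of_means = (sum([0, 0, 0]) / 3 + sum([4]) / 1) / 2
print(mean_of_means)
```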


class EditDistance(Evaluator):
    """
    Average edit distance error for multiple mini-batches.
Contributor

This needs more comments: how to use it, and what the value returned by eval means.

Contributor Author

Thx. Done.

tokens(list): Tokens that should be removed before calculating edit distance.

Returns:
Variable: sequence-to-sequence edit distance loss in shape [batch_size, 1].
Contributor

Remove the word "loss".

Contributor Author

Thx. Done.

@@ -1863,6 +1864,140 @@ def matmul(x, y, transpose_x=False, transpose_y=False, name=None):
    return out


def edit_distance(input, label, normalized=False, tokens=None, name=None):
Contributor

tokens -> ignored_tokens ?

Contributor Author

Thx. Fixed.


normalized(bool): Indicates whether to normalize the edit distance by the length of the reference string.

tokens(list): Tokens that should be removed before calculating edit distance.
Contributor

@qingqing01 qingqing01 Jan 22, 2018

tokens(list) -> tokens(list of int)

Contributor Author

Fixed.
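
To make the `normalized` and ignored-token parameters discussed above concrete, here is a minimal pure-Python sketch of Levenshtein edit distance. It is a stand-in, not the `edit_distance` layer from this PR: the name `edit_distance_sketch`, the plain-list inputs, and the assumption that ignored tokens are dropped from both the hypothesis and the reference are all illustrative choices.

```python
def edit_distance_sketch(hyp, ref, normalized=False, ignored_tokens=None):
    """Levenshtein distance between two token sequences.

    hyp, ref: lists of tokens (hypothetical stand-ins for input and label).
    normalized: if True, divide by the length of the reference.
    ignored_tokens: tokens removed from both sequences before comparison
                    (an assumption of this sketch).
    """
    if ignored_tokens:
        ignored = set(ignored_tokens)
        hyp = [t for t in hyp if t not in ignored]
        ref = [t for t in ref if t not in ignored]

    # prev[j] holds the distance between the hyp prefix seen so far and ref[:j].
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            substitution = prev[j - 1] + (0 if h == r else 1)
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur.append(min(substitution, deletion, insertion))
        prev = cur

    distance = float(prev[-1])
    if normalized:
        if not ref:
            raise ValueError("cannot normalize by an empty reference")
        distance /= len(ref)
    return distance


# Usage: token 0 is ignored, so hyp becomes [1, 2] and ref stays [1, 2, 3].
print(edit_distance_sketch([1, 0, 2, 0], [1, 2, 3], ignored_tokens=[0]))
print(edit_distance_sketch(list("kitten"), list("sitting"), normalized=True))
```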

Contributor

@qingqing01 qingqing01 left a comment

LGTM. Approved. Please add the unit test in the next PR.

@wanghaoshuang wanghaoshuang merged commit 44561a2 into PaddlePaddle:develop Jan 22, 2018
@wanghaoshuang wanghaoshuang deleted the ctc_evaluator_py branch January 22, 2018 14:47
Successfully merging this pull request may close these issues.

Add greedy CTC evaluator python API
3 participants