Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CTC align op #7527

Merged
merged 10 commits into from
Jan 19, 2018
Merged

Conversation

wanghaoshuang
Copy link
Contributor

No description provided.

@wanghaoshuang wanghaoshuang changed the title Add ctc_greedy_decode_op Add CTC greedy decode op Jan 15, 2018
auto stream = ctx.cuda_device_context().stream();
ArgmaxCudaKernel<T, PADDLE_CUDA_NUM_THREADS><<<
num_tokens, PADDLE_CUDA_NUM_THREADS, 0, stream>>>(seq_width, logits,
tokens);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个Kernel是在计算top 1吗?如果是可以调用top_k_op的实现吧~

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I will remove argmax content from both CPU kernel and GPU kernel.

@qingqing01
Copy link
Contributor

Please create an issue and add it to https://github.com/PaddlePaddle/Paddle/projects/39

AddInput("Input",
"(LodTensor, default: LoDTensor<float>), the unscaled "
"probabilities of variable-length sequences, which is a 2-D "
"Tensor with LoD information. It's shape is "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's -> Its

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx. Done.

result = []
for token in np.argmax(softmax, axis=1):
if (token != blank) and not (merge_repeated and token == prev_token):
result.append(token)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be one line prev_token = token?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx. Fixed.

def test_check_output(self):
self.check_output()


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add another test case for merge_repeated = False

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx. Fixed.

CTCGreedyDecodeOpMaker(OpProto* proto, OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Input",
"(LodTensor, default: LoDTensor<float>), the unscaled "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unscaled -> unnormalized

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

"merge repeated elements between two blanks. ")
.SetDefault(true);
AddComment(R"DOC(
CTCGreedyDecoder is an implementation of the simple best path decoding
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need more detailed document here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx. Done.

1. Remove 'top 1'(or argmax) from CPU and GPU kernel
2. Add a new test case
3. Refine doc
@wanghaoshuang wanghaoshuang changed the title Add CTC greedy decode op Add CTC decode op Jan 16, 2018
@kuke
Copy link
Contributor

kuke commented Jan 17, 2018

Please keep the name CTCGreedyDecode, we would add beam search decoding soon.

auto stream = ctx.cuda_device_context().stream();
MergeAndDelCudaKernel<T><<<1, 1, 0, stream>>>(
num_tokens, tokens, num_seq, input_lod[level].data(), blank,
merge_repeated, dev_out_lod0_ptr, output_data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CUDA kernel is less efficient here. We can profile the speed when training the model. Then determine whether to delete the GPU kernel in this op and editing distance op.

1. Allocate memory for output before compute.
2. Rename 'ctc_decode' to 'ctc_align'
@wanghaoshuang wanghaoshuang changed the title Add CTC decode op Add CTC align op Jan 18, 2018
@wanghaoshuang
Copy link
Contributor Author

Have removed debug code.

Copy link
Contributor

@kuke kuke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wanghaoshuang wanghaoshuang merged commit 47753a9 into PaddlePaddle:develop Jan 19, 2018
@wanghaoshuang wanghaoshuang deleted the ctc_greedy_decode branch January 19, 2018 05:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants