Add CTC align op #7527
Conversation
auto stream = ctx.cuda_device_context().stream();
ArgmaxCudaKernel<T, PADDLE_CUDA_NUM_THREADS><<<
    num_tokens, PADDLE_CUDA_NUM_THREADS, 0, stream>>>(seq_width, logits,
                                                      tokens);
Is this kernel computing the top 1? If so, could it call the top_k_op implementation instead?
You are right. I will remove the argmax content from both the CPU kernel and the GPU kernel.
Please create an issue and add it to https://github.com/PaddlePaddle/Paddle/projects/39
AddInput("Input",
         "(LodTensor, default: LoDTensor<float>), the unscaled "
         "probabilities of variable-length sequences, which is a 2-D "
         "Tensor with LoD information. It's shape is "
It's -> Its
Thx. Done.
result = []
for token in np.argmax(softmax, axis=1):
    if (token != blank) and not (merge_repeated and token == prev_token):
        result.append(token)
Should there be one line prev_token = token?
Thx. Fixed.
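The fixed loop can be run end-to-end as a small reference sketch (the helper name ctc_greedy_decode is hypothetical; blank, merge_repeated, and the prev_token update follow the snippet discussed above):

```python
import numpy as np

def ctc_greedy_decode(softmax, blank=0, merge_repeated=True):
    """Best-path (greedy) CTC decoding over a (time, classes) score matrix."""
    result = []
    prev_token = -1
    for token in np.argmax(softmax, axis=1):
        if (token != blank) and not (merge_repeated and token == prev_token):
            result.append(int(token))
        prev_token = token  # the update the reviewer asked about
    return result
```

With merge_repeated=True, repeated argmax tokens that are not separated by a blank collapse into one output token; without the prev_token update, nothing would ever be merged.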
def test_check_output(self):
    self.check_output()
Please add another test case for merge_repeated = False
Thx. Fixed.
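A sketch of what that extra case might look like (class and helper names are hypothetical; a local greedy_decode stands in for the op under test, since the real test would call check_output against the C++ kernel):

```python
import unittest
import numpy as np

def greedy_decode(softmax, blank, merge_repeated):
    # Local stand-in for the op: argmax per step, then merge/blank rules.
    result, prev_token = [], -1
    for token in np.argmax(softmax, axis=1):
        if (token != blank) and not (merge_repeated and token == prev_token):
            result.append(int(token))
        prev_token = token
    return result

class TestCTCAlignOpNoMerge(unittest.TestCase):
    def test_check_output(self):
        probs = np.array([[0.1, 0.8, 0.1],
                          [0.1, 0.8, 0.1],
                          [0.8, 0.1, 0.1],
                          [0.1, 0.1, 0.8]])
        # With merging disabled, the repeated token 1 survives.
        self.assertEqual(greedy_decode(probs, blank=0, merge_repeated=False),
                         [1, 1, 2])
```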
CTCGreedyDecodeOpMaker(OpProto* proto, OpAttrChecker* op_checker)
    : OpProtoAndCheckerMaker(proto, op_checker) {
  AddInput("Input",
           "(LodTensor, default: LoDTensor<float>), the unscaled "
unscaled -> unnormalized
Done.
         "merge repeated elements between two blanks. ")
    .SetDefault(true);
AddComment(R"DOC(
CTCGreedyDecoder is an implementation of the simple best path decoding
More detailed documentation is needed here.
Thx. Done.
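The "simple best path decoding" named in the doc comment can be illustrated as two steps over an already-argmaxed token sequence (a sketch for illustration, not the op's actual implementation): first merge consecutive repeated tokens, then delete blanks.

```python
def best_path_collapse(tokens, blank=0):
    """Two-step view of CTC best-path decoding:
    1) merge consecutive repeated tokens, 2) remove blanks."""
    merged = [t for i, t in enumerate(tokens) if i == 0 or t != tokens[i - 1]]
    return [t for t in merged if t != blank]
```

For example, best_path_collapse([1, 1, 0, 1, 2, 2, 0]) yields [1, 1, 2]: the blank between the two runs of 1 keeps them from merging into one.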
1. Remove 'top 1' (or argmax) from CPU and GPU kernel
2. Add a new test case
3. Refine doc
Please keep the name
paddle/operators/ctc_decode_op.cu
auto stream = ctx.cuda_device_context().stream();
MergeAndDelCudaKernel<T><<<1, 1, 0, stream>>>(
    num_tokens, tokens, num_seq, input_lod[level].data(), blank,
    merge_repeated, dev_out_lod0_ptr, output_data);
The CUDA kernel is less efficient here. We can profile the speed when training the model, then decide whether to delete the GPU kernel in this op and in the edit distance op.
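A CPU sketch of what that sequential merge-and-delete step does (the function name is hypothetical; lod0 plays the role of the level-0 LoD offsets passed as input_lod[level] above):

```python
def merge_and_del(tokens, lod0, blank, merge_repeated):
    """Walk each sequence given by LoD offsets, optionally merging
    repeats and dropping blanks; emit flattened tokens plus a new LoD."""
    out, out_lod = [], [0]
    for start, end in zip(lod0[:-1], lod0[1:]):
        prev = -1
        for t in tokens[start:end]:
            if t != blank and not (merge_repeated and t == prev):
                out.append(t)
            prev = t
        out_lod.append(len(out))
    return out, out_lod
```

Because each sequence's output offset depends on how many tokens the previous sequences kept, the step is inherently sequential across the batch, which is why a <<<1, 1>>> launch is easy to write but slow on the GPU.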
1. Allocate memory for output before compute. 2. Rename 'ctc_decode' to 'ctc_align'
Force-pushed from a1cdeb0 to 6089b50
Have removed debug code.
LGTM