Add viterbi decode #35778
Conversation
✅ This PR's description meets the template requirements!
Thanks for your contribution!
Force-pushed from 2dadd39 to f137f6b
Force-pushed from bf5a586 to 0a8b21d
Force-pushed from 0a8b21d to dcbc972
Force-pushed from f031df6 to 6ddc7d4
Force-pushed from 54217f5 to 9525039
void Make() override {
  AddInput(
      "Input",
      "The unary emission tensor. The shape of Input MUST be ( batch_size,"
MUST->must
done
"The unary emission tensor. The shape of Input MUST be ( batch_size," | ||
"sequence_length, num_tags). "); | ||
AddInput("Transition", | ||
"The transition matrix. The shape of Transition MUST be ( " |
Same as above, please change these as well.
done
REGISTER_OP_CUDA_KERNEL(
    viterbi_decode,
    ops::ViterbiDecodeKernel<platform::CUDADeviceContext, float>,
    ops::ViterbiDecodeKernel<platform::CUDADeviceContext, double>);
Please investigate whether fp16 can be supported here, to prepare for later optimization. If the composed APIs do not support it, it is acceptable to leave it unsupported for now.
PADDLE_ENFORCE_EQ(
    in_dims[2], transition_dims[0],
    platform::errors::InvalidArgument(
        "The number of tags of Input and Transition should be equal."));
Could the error message include the actual values, e.g. how many tags each of them currently has?
done
python/paddle/text/ops.py (Outdated)
tensor with shape of [batch_size, sequence_length, num_tags]. The data type is float32 or float64.
transition_params (Tensor): The input tensor of transition matrix. This is a 2-D
tensor with shape of [num_tags, num_tags]. The data type is float32 or float64.
sequence_length (Tensor): The input tensor of real length of each sequence. This is a 1-D
The input tensor of real length -> The input tensor of length
done
python/paddle/text/ops.py (Outdated)
and the data type is float32 or float64.
paths(Tensor): The output tensor containing the highest scoring tag indices. The shape is [batch_size, sequence_length]
and the data type is int64.
Does this nn.Layer API follow the documentation convention? There is no related docstring in forward, and the class docstring should probably not contain a Returns section.
Please refer to paddle.nn.LSTM, which puts all related documentation above the __init__ function, so this should follow the convention.
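For clarity, a minimal sketch of the convention being referenced, with the class docstring written above __init__ in the paddle.nn.LSTM style (the class name matches this PR, but the docstring text and the method bodies below are illustrative, not the exact code):

```python
import paddle
from paddle.nn import Layer


class ViterbiDecoder(Layer):
    """
    Decode the highest scoring sequence of tags.

    Args:
        transitions (Tensor): The transition matrix with shape [num_tags, num_tags].
        include_bos_eos_tag (bool, optional): Whether the last two rows/columns of
            transitions are treated as start/stop tags. Defaults to True.
        name (str, optional): Name of the layer. Defaults to None.
    """

    def __init__(self, transitions, include_bos_eos_tag=True, name=None):
        super().__init__()
        self.transitions = transitions
        self.include_bos_eos_tag = include_bos_eos_tag
        self.name = name

    def forward(self, potentials, lengths):
        # No separate docstring here; all documentation lives above __init__,
        # mirroring paddle.nn.LSTM.
        return paddle.text.viterbi_decode(potentials, self.transitions, lengths,
                                          self.include_bos_eos_tag, self.name)
```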
// create int tensor buffer
int buffer_size = batch_size * seq_len + batch_size * n_labels * seq_len +
                  9 * batch_size + 10;
LoDTensor int_buffer;
Could you add a comment explaining how buffer_size is computed and what these constants mean? Also, please explain why a buffer is used here.
Explained in comments.
int_buffer.mutable_data<int64_t>(ctx.GetPlace());
TensorBuffer int_tensor_buffer(int_buffer);
// create float tensor buffer
buffer_size = seq_len * batch_size * n_labels + 5 * batch_size * n_labels +
Same as above.
The meaning of the constants has been explained.
    1, input.numel(), 1, input.data<int64_t>(), nullptr,
    out_data.data<int64_t>());
Tensor max_value_tensor;
framework::TensorCopy(out_data, platform::CPUPlace(), &max_value_tensor);
Just to confirm: doesn't max_value_tensor need memory allocated before TensorCopy?
No, TensorCopy calls mutable_data internally to allocate the device memory.
Tensor out_data;
out_data.Resize(framework::make_ddim({1}));
out_data.mutable_data<T>(platform::CUDAPlace());
ArgmaxCUDAKernel<T, T, 32><<<1, 32, 0, dev_ctx.stream()>>>(
Why are the grid and block sizes set to 1 and 32 here?
Now set using ComputeBlockSize.
const T* in_data = input.data<T>();
IndType* out_idx_data = out_idx->data<IndType>();
T* out_data = out->data<T>();
CUDA_ARGMAX(128);
Why is block_dim 128 here? Shouldn't block_dim be chosen according to the current device?
Now set using ComputeBlockSize.
}
SubInt(dev_ctx, left_length, one, &left_length);
Argmax<DeviceContext, T, int64_t> argmax;
for (int64_t i = 1; i < max_seq_len; ++i) {
Has the case where max_seq_len = 1 been considered?
auto alpha_argmax_temp = alpha_argmax_unbind[i - 1];
alpha_argmax_temp.Resize({batch_size, n_labels});
argmax(ctx, alpha_trn_sum, &alpha_argmax_temp, &alpha_max, 1);
historys.push_back(alpha_argmax_temp);
Try using emplace_back here.
Replaced.
    &batch_path[actual_len - last_ids_index]);
ARange<DeviceContext> arange;
arange(dev_ctx, batch_offset.data<int64_t>(), batch_size, n_labels);
Gather<DeviceContext, int64_t, int64_t> gather;
The logic here is relatively complex. Please copy over the link to the Python implementation in PaddleNLP so that later readers can understand this part of the code.
done
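In addition to the linked implementation, a plain-Python sketch of the core recursion may help later readers. This is a minimal single-sequence reference: the transition convention transitions[prev_tag, next_tag] is an assumption, and batching plus the include_bos_eos_tag handling of the real kernel are omitted.

```python
import numpy as np


def viterbi_decode_reference(emissions, transitions, length):
    """Single-sequence Viterbi decode.

    emissions:   [seq_len, num_tags] unary scores.
    transitions: [num_tags, num_tags] scores for prev_tag -> next_tag (assumed).
    length:      actual sequence length, 1 <= length <= seq_len.
    Returns (best_score, best_path).
    """
    alpha = emissions[0]            # best score ending at each tag after step 0
    history = []                    # back-pointers for steps 1 .. length-1
    for t in range(1, length):
        # scores[i, j] = best score of reaching tag j at step t via tag i at step t-1
        scores = alpha[:, None] + transitions + emissions[t][None, :]
        history.append(scores.argmax(axis=0))
        alpha = scores.max(axis=0)
    best_last_tag = int(alpha.argmax())
    best_score = float(alpha[best_last_tag])
    # walk the back-pointers from the last step to recover the best path
    path = [best_last_tag]
    for back_pointers in reversed(history):
        path.append(int(back_pointers[path[-1]]))
    path.reverse()
    return best_score, path
```

Note that when length == 1 the loop body is skipped and only the emission scores of step 0 decide the result, which is the seq_len=1 case raised earlier in the review.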
struct ARange<platform::CUDADeviceContext> {
  void operator()(const platform::CUDADeviceContext& dev_ctx, int64_t* data,
                  int end, int64_t scale) {
    ARangeKernel<<<1, 128, 0, dev_ctx.stream()>>>(data, end, scale);
As above, hard-coding 128 here does not look right.
Now set using ComputeBlockSize.
python/paddle/text/ops.py (Outdated)
Shape:
potentials (Tensor): The input tensor of unary emission. This is a 3-D
tensor with shape of [batch_size, sequence_length, num_tags]. The data type is float32 or float64.
length (Tensor): The input tensor of real length of each sequence. This is a 1-D
Please remove "real".
Removed.
python/paddle/text/ops.py (Outdated)
the last row and the last column of transitions will be considered as start tag, the the penultimate row and
the penultimate column of transitions will be considered as stop tag. Else, all the rows and columns will be
considered as the real tag. Defaults to ``True``.
name (str|None) – A name for this layer(optional). If set None, the layer will be named automatically.
name(str|None) -> name(str, optional) , default value is None
done
python/paddle/text/ops.py (Outdated)
    return scores, path


class ViterbiDecoder(Layer):
ViterbiDecoder and crf_decode seem to call the same underlying function; could a unified name be used?
crf_decode has been renamed to viterbi_decode.
python/paddle/text/ops.py (Outdated)
def crf_decode(potentials,
               transition_params,
               lengths,
               include_start_end_tag=True,
For parameter naming, start usually pairs with stop and begin pairs with end. When describing a range, [start, stop) is recommended, consistent with Python and NumPy naming. When referring to the beginning and end markers of a sentence, the conventional terms are begin of sentence and end of sentence.
include_start_end_tag has been renamed to include_bos_eos_tag.
python/paddle/text/viterbi_decode.py (Outdated)
lengths (Tensor): The input tensor of length of each sequence. This is a 1-D tensor with shape of [batch_size]. The data type is int64.
include_bos_eos_tag (`bool`, optional): If set to True, the last row and the last column of transitions will be considered
as start tag, the penultimate row and the penultimate column of transitions will be considered as stop tag. Defaults to ``True``.
name(str, optional): Default value is None.
The description of the name parameter needs to be complete.
Changed to:
name (str, optional): The default value is None. Normally there is no need for user to set this property. For more information, please
refer to :ref:`api_guide_Name`.
python/paddle/text/viterbi_decode.py (Outdated)
tensor with shape of [num_tags, num_tags]. The data type is float32 or float64.
lengths (Tensor): The input tensor of length of each sequence. This is a 1-D tensor with shape of [batch_size]. The data type is int64.
include_bos_eos_tag (`bool`, optional): If set to True, the last row and the last column of transitions will be considered
as start tag, the penultimate row and the penultimate column of transitions will be considered as stop tag. Defaults to ``True``.
The word "penultimate" is rather uncommon; wouldn't "second to last" be better?
Changed to "second to last".
python/paddle/text/viterbi_decode.py (Outdated)
Example:
    .. code-block:: python

        import numpy as np
numpy is not actually used in the example code, is it?
The numpy import has been removed.
python/paddle/text/viterbi_decode.py (Outdated)
transitions (`Tensor`): The transition matrix. Its dtype is float32 and has a shape of `[num_tags, num_tags]`.
include_bos_eos_tag (`bool`, optional): If set to True, the last row and the last column of transitions will be considered
as start tag, the penultimate row and the penultimate column of transitions will be considered as stop tag. Defaults to ``True``.
name(str, optional): Default value is None.
Same as above.
Changed to:
name (str, optional): The default value is None. Normally there is no need for user to set this property. For more information, please
refer to :ref:`api_guide_Name`.
python/paddle/text/viterbi_decode.py (Outdated)
Example:
    .. code-block:: python

        import numpy as np
Same as above.
The numpy import has been removed.
LGTM
LGTM
LGTM
* add viterbi decode cpu kernel
* add viterbi decoder api in paddle.text
* add a data buffer once to avoid create many small pieces of data buffer frequently
* fix viterbi max_seq_length bug
* fix seq_len=1 bug
* fix device context
* move split out of for loop
* remove INVERSE_SUB
* remove 2 GET_CAST_MASK
* remove 1 loop
* remove Functor
* add to_static deploy code
* use MAX_FUNC instead of ELE_MAX
* add MaxFunctor
* impl max_func
* remove MaxFunctor
* remove cast op
* use REGISTER_OP_WITHOUT_GRADIENT
* add viterbi cuda kernel
* add FIX_BLOCKDIM_CASE macro
* add MKL add, mul; add get data mask
* add arange mkl impl
* add CPU Argmax
* add cpu gather
* use EXECUTE_MKL_ELEMENT_BINARY_OP instead of some ADD, MUL
* use SameDimsBinaryOP instead of EXECUTE_MKL_ELEMENT_BINARY_OP
* use SAME_DIMS_ELEMENT_BINARY_OP
* add SimpleBroadcastBinaryOP
* use int instead of int64_t to accelerate
* optimize SimpleBroadcastBinaryOP
* optimize SimpleBroadcastBinaryOP
* optimize performance in both single thread and multithread situation
* remove useless line
* remove useless code
* add CREATE_TENSOR_BUFFER macro
* add INIT_REQUIRED_TENSOR macro
* add comment
* fix windows ci
* add viterbi unittest
* remove cuda add functor
* remove cuda equal
* remove a template function
* fix windows ci
* fix windows dtype
* remove some template instance
* remove useless header file
* remove some blockdim
* remove transpose impl
* accelerate cpu performance on single thread situation
* viterbi_decode->crf_decode
* rename crf params name
* add viterbi api test
* remove useless import
* add enable_static
* use viterbi decoder
* fix viterbi len=1
* fix viterbi unittest
* remove useless comments
* reconstruct viterbi decode
* remove ADD,SUB,MUL structure
* fix coverage
* remove CREATE_TENSOR
* add name args
* crf.py->ops.py; with_start_stop_tag->include_start_end_tag
* update crf_decode en docs
* fix viterbi decode en docs
* fix some review comments
* add FIXED_BLOCK_DIM_CASE in cuda
* push_back->emplace_back
* crf_decode->viterbi_decode; include_start_end_tag->include_bos_eos_tag
* paddle.text.ops.viterbi_decode->paddle.text.viterbi_decode
* fix viterbi_decode en docs
Quick question: is the returned score of shape (batch_size, seq_len, num_tags)?
No, the returned shape is (batch_size); it is the highest score at the last step for each sample.
PR types
New features
PR changes
OPs
Describe
Add viterbi decode op kernel and API.
API description
Example
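A minimal usage sketch of the new API, based on the names, shapes, and dtypes discussed in the review above (the random inputs, include_bos_eos_tag=False, and the printed shapes are illustrative assumptions, not the original example):

```python
import paddle

batch_size, seq_len, num_tags = 2, 4, 3

# unary emission scores, transition matrix, and the length of each sequence
emission = paddle.rand([batch_size, seq_len, num_tags], dtype='float32')
transition = paddle.rand([num_tags, num_tags], dtype='float32')
lengths = paddle.to_tensor([3, 4], dtype='int64')

# functional form
scores, paths = paddle.text.viterbi_decode(
    emission, transition, lengths, include_bos_eos_tag=False)

# layer form
decoder = paddle.text.ViterbiDecoder(transition, include_bos_eos_tag=False)
scores, paths = decoder(emission, lengths)

print(scores.shape)  # [batch_size]: the best-path score of each sample (see the Q&A above)
print(paths.shape)   # [batch_size, sequence_length] per the docstring: decoded tag indices
```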