【Hackathon 6th No.1】Add AdaptiveLogSoftmaxWithLoss API to Paddle -part #63302
Conversation
Your PR was submitted successfully. Thank you for contributing to the open source project!
@luotao1 @GGBond8488
from paddle.nn import functional as F

class TestNNAdaptiveLogSoftmaxWithLossAPI(unittest.TestCase):
Suggestion for the tests: add an application scenario for AdaptiveLogSoftmax. Build a small but complete network around it, optimizer included, then check both the output of AdaptiveLogSoftmax and whether its own weights get updated.
No problem. Is there anything else that needs to be added?
It has been added back. Please review.
What I am hoping for is a reasonably complete model-composition snippet whose forward pass uses AdaptiveLogSoftmax. Actually run that model, then verify that the resulting data is correct.
Let me make sure I understand.
> What I am hoping for is a reasonably complete model-composition snippet whose forward pass uses AdaptiveLogSoftmax. Actually run that model, then verify that the resulting data is correct.

Under that approach the result does not change: the input to AdaptiveLogSoftmaxWithLoss simply passes through other layers first, but the final result stays the same. I am confused about this part.
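To make the point of disagreement concrete, here is a minimal NumPy sketch of the adaptive log-softmax computation (my own reference implementation, not the PR's code; it assumes the internal cutoffs convention where `n_classes` is appended as the last element). A full-model test would feed the output of earlier layers in as `x` and assert the same invariants, regardless of which layers produced `x`:

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def adaptive_log_softmax(x, y, head_w, tail_ws, cutoffs):
    """Reference forward pass; `cutoffs` includes n_classes, e.g. [2, 5, 8]."""
    shortlist = cutoffs[0]
    head_lp = log_softmax(x @ head_w)          # [N, shortlist + n_clusters]
    out = np.empty(len(y))
    for i, t in enumerate(y):
        if t < shortlist:
            out[i] = head_lp[i, t]             # frequent class: straight from the head
        else:
            # locate the tail cluster containing class t
            j = next(k for k in range(len(cutoffs) - 1)
                     if cutoffs[k] <= t < cutoffs[k + 1])
            w1, w2 = tail_ws[j]                # two-layer projection for cluster j
            clus_lp = log_softmax(x[i] @ w1 @ w2)
            out[i] = head_lp[i, shortlist + j] + clus_lp[t - cutoffs[j]]
    return out, -out.mean()                    # per-sample log-probs, NLL loss
```

Summing `exp(log_prob)` over all classes must give 1; that is the kind of end-to-end invariant a full-model test can assert.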
the index `n_classes - 1`. To compute log-probabilities for all classes, the ``log_prob`` method can be used.
"""

def __init__(
If weight is not passed in here, how can the initialization be specified explicitly?
This has been fixed.
If this is changed, remember to update the RFC design document as well, so the parameters stay consistent on both sides.
@luotao1 It looks like the inference CI needs someone's approval? Is that what it means?
It needs an approve. Once the reviewer for this task has passed the review, everything will be handled together at the end.
So that CI mainly means an approve is required?
Overall there are no remaining issues; just polish a few details and it's OK.
y = paddle.to_tensor([0, 5, 10])
model(x, y)

def test_forwadr(self):
Please fix this typo.
Sure, I'll push the fix tomorrow.
for before, after in zip(
    tail_weights_before_training, tail_weights_after_training
):
    assert not np.any(before != after)
This check is not very convincing: it only verifies that an update happened, not that the update is correct. Please see whether you can add a scenario that verifies the gradients.
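One common pattern for such a check is to compare an analytic gradient against central finite differences. A plain-NumPy sketch, independent of Paddle (a real test would exercise the layer's backward pass rather than this stand-in loss):

```python
import numpy as np

def nll_log_softmax(w, x, t):
    # stand-in loss: linear layer + log-softmax + NLL for target class t
    z = x @ w
    z = z - z.max()
    lp = z - np.log(np.exp(z).sum())
    return -lp[t]

def numerical_grad(f, w, eps=1e-6):
    # central finite differences, perturbing one parameter at a time
    g = np.zeros_like(w)
    it = np.nditer(w, flags=["multi_index"])
    for _ in it:
        idx = it.multi_index
        orig = w[idx]
        w[idx] = orig + eps
        fp = f(w)
        w[idx] = orig - eps
        fm = f(w)
        w[idx] = orig
        g[idx] = (fp - fm) / (2 * eps)
    return g
```

For this loss the analytic gradient is `outer(x, softmax(z) - onehot(t))`; agreement to roughly 1e-5 between the two is the kind of evidence the review is asking for.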
Sure, I'll revise it tonight. I'll think about how to verify the gradients.
Quick question: do any of the other tests contain an example of gradient checking?
The RFC has been updated: PaddlePaddle/community#885
> This check is not very convincing: it only verifies that an update happened, not that the update is correct. Please see whether you can add a scenario that verifies the gradients.

Fixed.
All done.
None of these reported errors are in the part I wrote.
Is the environment broken? @luotao1
PR-CI-Codestyle-Check means there is a problem with the submitted code.
You can rerun the CI later, or merge develop to re-trigger it.
Doesn't Paddle support multiplying bool data with float32 data?
It's fixed now @jeff41404
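For reference, NumPy follows the promotion rule being discussed here (this is a NumPy check only, not a statement about Paddle's kernels): multiplying a bool mask by a float32 array promotes the bool and keeps the result in float32.

```python
import numpy as np

mask = np.array([True, False, True])               # bool mask
vals = np.array([1.5, 2.5, 3.5], dtype=np.float32)
out = mask * vals                                  # bool promotes; result stays float32
print(out.dtype, out)
```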
LGTM
python/paddle/nn/functional/loss.py
Outdated
):
r"""Compute adaptive logsoftmax result and negative log likelihood between ``input`` and ``label``.
Parameter ``head``, ``tail_weights``, ``cutoffs`` are inner members of AdaptiveLogSoftmaxWithLoss
Please refer to :ref:`_cn_api_paddle_nn_AdaptiveLogSoftmaxWithLoss`.
Suggested change:
- Please refer to :ref:`_cn_api_paddle_nn_AdaptiveLogSoftmaxWithLoss`.
+ Please refer to :ref:`api_paddle_nn_AdaptiveLogSoftmaxWithLoss`.
python/paddle/nn/functional/loss.py
Outdated
output (Tensor): The tensor sotring adaptive logsoftmax result, the shape of output is [N]
loss (Tensor): The tensor variable storing the adaptive_log_softmax_loss of input and label.
Suggested change:
- output (Tensor): The tensor sotring adaptive logsoftmax result, the shape of output is [N]
- loss (Tensor): The tensor variable storing the adaptive_log_softmax_loss of input and label.
+ - output (Tensor). The tensor storing adaptive logsoftmax result, the shape of output is [N]
+ - loss (Tensor). The tensor variable storing the adaptive_log_softmax_loss of input and label.
python/paddle/nn/functional/loss.py
Outdated
output (Tensor): The tensor sotring adaptive logsoftmax result, the shape of output is [N]
loss (Tensor): The tensor variable storing the adaptive_log_softmax_loss of input and label.

Examples::
Suggested change:
- Examples::
+ Examples:
python/paddle/nn/functional/loss.py
Outdated
Args:
input (Tensor): Input tensor, the data type should be float32 or float64.
label (Tensor): Label tensor, the data type should be float32 or float64.
head_weight (Tensor): weight tensor for linear computation, the data type should be float32 or float64, the shape should be [input.shape[1], shortlist_size + n_clusters], where shortlist_size is the first element in the cutoffs list, and n_clusters is the length of the cutoffs list minus 1.
Suggested change:
- head_weight (Tensor): weight tensor for linear computation, the data type should be float32 or float64, the shape should be [input.shape[1], shortlist_size + n_clusters], where shortlist_size is the first element in the cutoffs list, and n_clusters is the length of the cutoffs list minus 1.
+ head_weight (Tensor): weight tensor for linear computation, the data type should be float32 or float64, the shape should be ``[input.shape[1], shortlist_size + n_clusters]``, where ``shortlist_size`` is the first element in the cutoffs list, and ``n_clusters`` is the length of the cutoffs list minus 1.

Try to make this display a bit more nicely on the official site; right now everything runs together.
python/paddle/nn/functional/loss.py
Outdated
input (Tensor): Input tensor, the data type should be float32 or float64.
label (Tensor): Label tensor, the data type should be float32 or float64.
head_weight (Tensor): weight tensor for linear computation, the data type should be float32 or float64, the shape should be [input.shape[1], shortlist_size + n_clusters], where shortlist_size is the first element in the cutoffs list, and n_clusters is the length of the cutoffs list minus 1.
tail_weights (list[Tensor]): weight tensor list for linear computation, the data type should be float32 or float64. The number of elements in the tail_weights depends on the value of the n_clusters, and each element contains the weights of two linear layers, their dimensions are [input.shape[1], hsz] and [hsz, osz], where hsz is the number of input features in_features divided by div_value to the power (i + 1), where i is the cyclic variable, from 0 to n_clusters - 1, and osz is the (i + 1) The difference between the cutoff and the ith cutoff.
Suggested change:
- tail_weights (list[Tensor]): weight tensor list for linear computation, the data type should be float32 or float64. The number of elements in the tail_weights depends on the value of the n_clusters, and each element contains the weights of two linear layers, their dimensions are [input.shape[1], hsz] and [hsz, osz], where hsz is the number of input features in_features divided by div_value to the power (i + 1), where i is the cyclic variable, from 0 to n_clusters - 1, and osz is the (i + 1) The difference between the cutoff and the ith cutoff.
+ tail_weights (list[Tensor]): weight tensor list for linear computation, the data type should be float32 or float64. The number of elements in the tail_weights depends on the value of the n_clusters, and each element contains the weights of two linear layers, their dimensions are ``[input.shape[1], hsz]`` and ``[hsz, osz]``, where ``hsz`` is the number of input features in_features divided by div_value to the power (i + 1), where i is the cyclic variable, from 0 to n_clusters - 1, and ``osz`` is the difference between the (i + 1)th cutoff and the ith cutoff.
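The shape rules in the docstring above can be spelled out in a few lines. This is a hypothetical helper, not part of the PR; it assumes the convention that `cutoffs` is passed without `n_classes` (which is appended internally) and that the first tail cluster uses exponent 1 with the default `div_value=4.0`:

```python
def adaptive_shapes(in_features, cutoffs, n_classes, div_value=4.0):
    # full cutoff list with n_classes appended, e.g. [10, 100, 1000]
    full = list(cutoffs) + [n_classes]
    shortlist, n_clusters = full[0], len(full) - 1
    # head projects onto shortlist classes plus one logit per tail cluster
    head_shape = (in_features, shortlist + n_clusters)
    tail_shapes = []
    for i in range(n_clusters):
        hsz = int(in_features // (div_value ** (i + 1)))  # shrinking hidden size
        osz = full[i + 1] - full[i]                       # classes in cluster i
        tail_shapes.append(((in_features, hsz), (hsz, osz)))
    return head_shape, tail_shapes
```

For example, `adaptive_shapes(64, [10, 100], 1000)` gives a head of shape `(64, 12)` and tail projections `(64, 16)->(16, 90)` and `(64, 4)->(4, 900)`, matching the hsz/osz description above.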
python/paddle/nn/layer/loss.py
Outdated
class AdaptiveLogSoftmaxWithLoss(Layer):
r"""Adaptive softmax is an approximate strategy for training models with large output spaces. It is most effective when
the label distribution is highly imbalanced, for example in natural language modelling, where the word frequency
distribution approximately follows the ``Zipf's law``.
Since this follows the pytorch documentation, copy it in full; attach a link for Zipf's law.

Suggested change:
- distribution approximately follows the ``Zipf's law``.
+ distribution approximately follows the `Zipf's law <https://en.wikipedia.org/wiki/Zipf%27s_law>`_ .
python/paddle/nn/layer/loss.py
Outdated
weight_attr (ParamAttr, optional): The attribute for the learnable
weight of this layer. The default value is None. If the Initializer of the
param_attr is not set, the parameter is initialized with Xavier.
For detailed information, please refer to paddle.ParamAttr.
Suggested change:
- For detailed information, please refer to paddle.ParamAttr.
+ For detailed information, please refer to :ref:`api_paddle_ParamAttr`
python/paddle/nn/layer/loss.py
Outdated
of this layer. If it is set to False, no bias will be added to the output.
If it is set to None or one kind of ParamAttr, a bias parameter will
be created according to ParamAttr. For detailed information, please refer
to paddle.ParamAttr. The default value is None and the bias will be
Suggested change:
- to paddle.ParamAttr. The default value is None and the bias will be
+ to :ref:`api_paddle_ParamAttr`. The default value is None and the bias will be
python/paddle/nn/layer/loss.py
Outdated
- input (Tensor): The input tensor. The shapes is [N, in_features]. N is batch size.
- label (Tensor): target. The shapes is `[N]`
- output1 (Tensor): The shape is `[N]`
- output2 (Scalar):
Suggested change:
- - output2 (Scalar):
+ - output2 (Scalar).
python/paddle/nn/layer/loss.py
Outdated
Returns:
A callable object of AdaptiveLogSoftmaxWithLoss.

Examples::
Suggested change:
- Examples::
+ Examples:
All of them have been fixed.
The PR is already implemented. Could it not be closed for now? @luotao1
Close what? This PR has not been closed.
OK
LGTM
PaddlePaddle#63302) * Add AdaptiveLogSoftmaxWithLoss API * update codestyle * update loss * test * update test * add weight_attr * update forward * update forward * update * update * update * update test_gard * update * update information * update * update * codestyle * update * update * update * update
PR Category
Others
PR Types
New features
Description
Add AdaptiveLogSoftmaxWithLoss API
Link
Rfc PR: PaddlePaddle/community#856