Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

【Hackathon 6th No.1】Add AdaptiveLogSoftmaxWithLoss API to Paddle -part #63302

Merged
merged 30 commits into from
May 23, 2024
Merged

Conversation

ADream-ki
Copy link
Contributor

PR Category

Others

PR Types

New features

Description

添加AdaptiveLogSoftmaxWithLoss API

Link

Rfc PR: PaddlePaddle/community#856

Copy link

paddle-bot bot commented Apr 8, 2024

你的PR提交成功,感谢你对开源项目的贡献!
请关注后续CI自动化测试结果,详情请参考Paddle-CI手册
Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Apr 8, 2024
@ADream-ki
Copy link
Contributor Author

@luotao1 @GGBond8488
image
这个y也是要approve?两个问题是一样的

from paddle.nn import functional as F


class TestNNAdaptiveLogSoftmaxWithLossAPI(unittest.TestCase):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

测试建议添加一个AdaptiveLogSoftmax的应用场景,先用AdaptiveLogSoftmax组一个小但是完整的网络,包括优化器的那种,测试AdaptiveLogSoftmax的输出以及自身的权重是否有更新

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

没问题,请问还有其他需要添加的吗?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已经重新加上去了,请review

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个我希望的是先写一个比较完备的model组网代码,前向用到了AdaptiveLogSoftmax,然后实际运行这个model,再测试其中的数据是否正确

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我理解一下

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个我希望的是先写一个比较完备的model组网代码,前向用到了AdaptiveLogSoftmax,然后实际运行这个model,再测试其中的数据是否正确

按照这个说法的话,结果是没有变化的啊,只是AdaptiveLogSoftmaxwithLoss的输入先经过其他模型,但是结果是不变的。我这一块很疑惑。

the index `n_classes - 1`. To compute log-probabilities for all classes, the ``log_prob`` method can be used.
"""

def __init__(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里不传weight的话,怎么主动指定初始化呢

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里已经修改

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里修改了话,记得rfc设计文档里面也同步一下,保持两边的参数一致

from paddle.nn import functional as F


class TestNNAdaptiveLogSoftmaxWithLossAPI(unittest.TestCase):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个我希望的是先写一个比较完备的model组网代码,前向用到了AdaptiveLogSoftmax,然后实际运行这个model,再测试其中的数据是否正确

@luotao1
Copy link
Contributor

luotao1 commented Apr 24, 2024

image 单测行覆盖率需要到90%

@luotao1 luotao1 added the API label Apr 24, 2024
@ADream-ki
Copy link
Contributor Author

image 单测行覆盖率需要到90%

好的

@ADream-ki
Copy link
Contributor Author

image 单测行覆盖率需要到90%

好的

我的不是满足条件吗

@ADream-ki
Copy link
Contributor Author

@luotao1 inference这个ci好像需要什么人同意?是这个意思吗

@luotao1
Copy link
Contributor

luotao1 commented Apr 28, 2024

需要approve的,等本题研发 review 通过后,最后统一处理

@ADream-ki
Copy link
Contributor Author

需要approve的,等本题研发 review 通过后,最后统一处理

主要是那个ci的意思是要approve吗

Copy link
Contributor

@GGBond8488 GGBond8488 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

整体没有什么问题了,细节再修一修就ok

the index `n_classes - 1`. To compute log-probabilities for all classes, the ``log_prob`` method can be used.
"""

def __init__(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里修改了话,记得rfc设计文档里面也同步一下,保持两边的参数一致

y = paddle.to_tensor([0, 5, 10])
model(x, y)

def test_forwadr(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个拼写错误改下

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

好的,我明天就把修改的推上来

for before, after in zip(
tail_weights_before_training, tail_weights_after_training
):
assert not np.any(before != after)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个验证有点不具备说服力,这里只验证更新了,但是没验证更新是否正确,建议看看能不能加一个验证梯度的的场景

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

好的好的,我晚上改一下,验证梯度,我也想想

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

请问一下,其他的test中会不会有梯度检验的例子

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rfcs已经修改 PaddlePaddle/community#885

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个验证有点不具备说服力,这里只验证更新了,但是没验证更新是否正确,建议看看能不能加一个验证梯度的的场景
已经改好了

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个验证有点不具备说服力,这里只验证更新了,但是没验证更新是否正确,建议看看能不能加一个验证梯度的的场景
已经改好了

已经弄完了

@ADream-ki
Copy link
Contributor Author

需要过一下CI

这报的错全不是我写的那部分

@ADream-ki
Copy link
Contributor Author

是环境出问题了吧 @luotao1

@luotao1
Copy link
Contributor

luotao1 commented May 11, 2024

PR-CI-Codestyle-Check 是提交的代码有问题。

是环境出问题了吧

后续可以rerun CI,或merge develop重新触发

@ADream-ki
Copy link
Contributor Author

ADream-ki commented May 16, 2024

PR-CI-Codestyle-Check 是提交的代码有问题。

是环境出问题了吧

后续可以rerun CI,或merge develop重新触发

paddle不是支持bool数据和float32数据相乘吗
56a04d4075624815db98f810ab0ffe3
@luotao1

@ADream-ki
Copy link
Contributor Author

已经好了 @jeff41404

jeff41404
jeff41404 previously approved these changes May 20, 2024
Copy link
Contributor

@jeff41404 jeff41404 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

):
r"""Compute adaptive logsoftmax result and negative log likelihood between ``input`` and ``label``.
Parameter ``head``, ``tail_weights``, ``cutoffs`` are inner members of AdaptiveLogSoftmaxWithLoss
Please refer to :ref:`_cn_api_paddle_nn_AdaptiveLogSoftmaxWithLoss`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Please refer to :ref:`_cn_api_paddle_nn_AdaptiveLogSoftmaxWithLoss`.
Please refer to :ref:`api_paddle_nn_AdaptiveLogSoftmaxWithLoss`.

Comment on lines 4307 to 4308
output (Tensor): The tensor sotring adaptive logsoftmax result, the shape of output is [N]
loss (Tensor): The tensor variable storing the adaptive_log_softmax_loss of input and label.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
output (Tensor): The tensor sotring adaptive logsoftmax result, the shape of output is [N]
loss (Tensor): The tensor variable storing the adaptive_log_softmax_loss of input and label.
- output (Tensor). The tensor sotring adaptive logsoftmax result, the shape of output is [N]
- loss (Tensor). The tensor variable storing the adaptive_log_softmax_loss of input and label.

output (Tensor): The tensor sotring adaptive logsoftmax result, the shape of output is [N]
loss (Tensor): The tensor variable storing the adaptive_log_softmax_loss of input and label.

Examples::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Examples::
Examples:

Args:
input (Tensor): Input tensor, the data type should be float32 or float64.
label (Tensor): Label tensor, the data type should be float32 or float64.
head_weight (Tensor): weight tensor for linear computation, the data type should be float32 or float64, the shape should be [input.shape[1], shortlist_size + n_clusters], where shortlist_size is the first element in the cutoffs list, and n_clusters is the length of the cutoffs list minus 1.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
head_weight (Tensor): weight tensor for linear computation, the data type should be float32 or float64, the shape should be [input.shape[1], shortlist_size + n_clusters], where shortlist_size is the first element in the cutoffs list, and n_clusters is the length of the cutoffs list minus 1.
head_weight (Tensor): weight tensor for linear computation, the data type should be float32 or float64, the shape should be ``[input.shape[1], shortlist_size + n_clusters]``, where ``shortlist_size is`` the first element in the cutoffs list, and ``n_clusters`` is the length of the cutoffs list minus 1.

尽量在官网展示的美观一点吧,都揉在一起了

input (Tensor): Input tensor, the data type should be float32 or float64.
label (Tensor): Label tensor, the data type should be float32 or float64.
head_weight (Tensor): weight tensor for linear computation, the data type should be float32 or float64, the shape should be [input.shape[1], shortlist_size + n_clusters], where shortlist_size is the first element in the cutoffs list, and n_clusters is the length of the cutoffs list minus 1.
tail_weights (list[Tensor]): weight tensor list for linear computation, the data type should be float32 or float64. The number of elements in the tail_weights depends on the value of the n_clusters, and each element contains the weights of two linear layers, their dimensions are [input.shape[1], hsz] and [hsz, osz], where hsz is the number of input features in_features divided by div_value to the power (i + 1), where i is the cyclic variable, from 0 to n_clusters - 1, and osz is the (i + 1) The difference between the cutoff and the ith cutoff.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tail_weights (list[Tensor]): weight tensor list for linear computation, the data type should be float32 or float64. The number of elements in the tail_weights depends on the value of the n_clusters, and each element contains the weights of two linear layers, their dimensions are [input.shape[1], hsz] and [hsz, osz], where hsz is the number of input features in_features divided by div_value to the power (i + 1), where i is the cyclic variable, from 0 to n_clusters - 1, and osz is the (i + 1) The difference between the cutoff and the ith cutoff.
tail_weights (list[Tensor]): weight tensor list for linear computation, the data type should be float32 or float64. The number of elements in the tail_weights depends on the value of the n_clusters, and each element contains the weights of two linear layers, their dimensions are ``[input.shape[1], hsz]`` and ``[hsz, osz]``, where ``hsz`` is the number of input features in_features divided by div_value to the power (i + 1), where i is the cyclic variable, from 0 to n_clusters - 1, and ``osz`` is the (i + 1) The difference between the cutoff and the ith cutoff.

class AdaptiveLogSoftmaxWithLoss(Layer):
r"""Adaptive softmax is an approximate strategy for training models with large output spaces. It is most effective when
the label distribution is highly imbalanced, for example in natural language modelling, where the word frequency
distribution approximately follows the ``Zipf's law``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

既然是参考 pytorch 的文档,就抄全吧, Zipf's law 附上链接

Suggested change
distribution approximately follows the ``Zipf's law``.
distribution approximately follows the `Zipf's law <https://en.wikipedia.org/wiki/Zipf%27s_law>`_ .

weight_attr (ParamAttr, optional): The attribute for the learnable
weight of this layer. The default value is None. If the Initializer of the
param_attr is not set, the parameter is initialized with Xavier.
For detailed information, please refer to paddle.ParamAttr.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For detailed information, please refer to paddle.ParamAttr.
For detailed information, please refer to :ref:`api_paddle_ParamAttr`

of this layer. If it is set to False, no bias will be added to the output.
If it is set to None or one kind of ParamAttr, a bias parameter will
be created according to ParamAttr. For detailed information, please refer
to paddle.ParamAttr. The default value is None and the bias will be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
to paddle.ParamAttr. The default value is None and the bias will be
to :ref:`api_paddle_ParamAttr`. The default value is None and the bias will be

- input (Tensor): The input tensor. The shapes is [N, in_features]. N is batch size.
- label (Tensor): target. The shapes is `[N]`
- output1 (Tensor): The shape is `[N]`
- output2 (Scalar):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- output2 (Scalar):
- output2 (Scalar).

Returns:
A callable object of AdaptiveLogSoftmaxWithLoss.

Examples::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Examples::
Examples:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已經全部修改了

@ADream-ki
Copy link
Contributor Author

pr已经实现,能不能先不关闭? @luotao1

@luotao1
Copy link
Contributor

luotao1 commented May 22, 2024

pr已经实现,能不能先不关闭

关闭什么?这个PR没有关闭呀

@ADream-ki
Copy link
Contributor Author

pr已经实现,能不能先不关闭

关闭什么?这个PR没有关闭呀

好的

Copy link
Contributor

@sunzhongkai588 sunzhongkai588 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@luotao1 luotao1 changed the title 【Hackathon 6th No.1】Add AdaptiveLogSoftmaxWithLoss API to Paddle 【Hackathon 6th No.1】Add AdaptiveLogSoftmaxWithLoss API to Paddle -part May 23, 2024
@luotao1 luotao1 merged commit d0e08a8 into PaddlePaddle:develop May 23, 2024
31 checks passed
co63oc pushed a commit to co63oc/Paddle that referenced this pull request May 23, 2024
PaddlePaddle#63302)

* Add AdaptiveLogSoftmaxWithLoss API

* update codestyle

* update loss

* test

* update test

* add weight_attr

* update forward

* update forward

* update

* update

* update

* update test_gard

* update

* update information

* update

* update

* codestyle

* update

* update

* update

* update
chen2016013 pushed a commit to chen2016013/Paddle that referenced this pull request May 26, 2024
PaddlePaddle#63302)

* Add AdaptiveLogSoftmaxWithLoss API

* update codestyle

* update loss

* test

* update test

* add weight_attr

* update forward

* update forward

* update

* update

* update

* update test_gard

* update

* update information

* update

* update

* codestyle

* update

* update

* update

* update
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants