[Hackathon No.25] Add the nanquantile math API to Paddle #41343
Conversation
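Your PR was submitted successfully. Thank you for contributing to this open-source project!
A failing PR-CI-Coverage check means the code coverage is insufficient; this CI check does need to pass.
@yang131313 Hi, the log shows a timeout. How should I handle this? This API inherently has a lot of cases to cover; is it possible to set the timeout?
The PR format check passed. Your PR will now be reviewed by Paddle experts and the open-source community; please keep an eye on the PR for updates.
@Asthestarsfalll You could try making the input data smaller so that the unit tests pass.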
python/paddle/tensor/stat.py
Outdated
import numpy as np
import paddle
x = np.random.randn(2, 3)
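It would be better not to use random numbers in the example; use a fixed, hand-constructed array instead. The elements of a 2×3 matrix can be kept simple enough to compute by hand:
- readers can more easily see the difference between quantile and nanquantile
- the interpolation is also easier to follow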
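As a rough illustration of the suggestion above, here is a minimal pure-Python sketch on a fixed 2×3 input. `quantile_1d` is a hypothetical helper written for this example, not Paddle's implementation, and it assumes linear interpolation between neighbouring sorted values.

```python
import math


def quantile_1d(values, q, ignore_nan=False):
    """Hypothetical helper: linear-interpolation quantile of a 1-D list."""
    has_nan = any(math.isnan(v) for v in values)
    if has_nan and not ignore_nan:
        # quantile semantics: any NaN makes the result NaN
        return math.nan
    data = sorted(v for v in values if not math.isnan(v))
    if not data:
        # nanquantile semantics: nothing left after dropping NaN
        return math.nan
    pos = q * (len(data) - 1)  # fractional index into the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


# Fixed input that is easy to check by hand.
x = [[0.0, 1.0, 2.0],
     [3.0, math.nan, 5.0]]

for row in x:
    print(quantile_1d(row, 0.5), quantile_1d(row, 0.5, ignore_nan=True))
# 1.0 1.0
# nan 4.0
```

The second row shows the difference directly: with the NaN kept, the result is NaN; with it dropped, the median of 3 and 5 is interpolated to 4.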
python/paddle/tensor/stat.py
Outdated
def nanquantile(x, q, axis=None, keepdim=False):
"""
Compute the quantile of the input as if NaN values in input did not exist.
If all values in a reduced row are NaN then the quantiles for that reduction will be NaN.
Add a comma after "If all values in a reduced row are NaN".
An all-NaN example also needs to be added.
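For reference, the all-NaN case can be shown with NumPy, which the unit tests in this PR compare against. This is only a sketch of the expected semantics using `np.nanquantile`; the input values are made up for illustration.

```python
import warnings

import numpy as np

# Second row is entirely NaN.
x = np.array([[1.0, 2.0, 3.0],
              [np.nan, np.nan, np.nan]])

with warnings.catch_warnings():
    # NumPy emits a RuntimeWarning for an all-NaN slice; silence it here.
    warnings.simplefilter("ignore", RuntimeWarning)
    res = np.nanquantile(x, q=0.5, axis=1)

print(res)  # first entry is 2.0, second entry is NaN
```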
python/paddle/tensor/stat.py
Outdated
def quantile(x, q, axis=None, keepdim=False):
"""
Compute the quantile of the input along the specified axis.
If any values in a reduced row are NaN then the quantiles for that reduction will be NaN.
Add a comma after "If any values in a reduced row are NaN".
x = paddle.to_tensor(input_data)
paddle_res = paddle.nanquantile(x, q=0.35, axis=0)
np_res = np.nanquantile(x, q=0.35, axis=0)
self.assertTrue(np.allclose(paddle_res.numpy(), np_res, equal_nan=True))
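The unit tests are almost identical to those for quantile, but the nanquantile tests should focus more on NaN handling:
- You could modify the existing unit tests and, in each class, add tests with NaN at different positions, covering both a single NaN and multiple NaNs.
- A test with all-NaN input is missing.
Because the two sets of unit tests are very similar, consider how to reuse them better if possible (not mandatory), for example: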
Paddle/python/paddle/fluid/tests/unittests/test_nanmean_api.py
Lines 79 to 87 in 1d43e2d
test_case(self.x)
test_case(self.x, [])
test_case(self.x, -1)
test_case(self.x, keepdim=True)
test_case(self.x, 2, keepdim=True)
test_case(self.x, [0, 2])
test_case(self.x, (0, 2))
test_case(self.x, [0, 1, 2, 3])
paddle.enable_static()
Paddle/python/paddle/fluid/tests/unittests/test_max_min_amax_amin_op.py
Lines 105 to 108 in 1d43e2d
_test_static_graph('amax')
_test_static_graph('amin')
_test_static_graph('max')
_test_static_graph('min')
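One possible shape for such a reusable test is sketched below. It is not the test code from this PR: `nan_placements`, `reference_nanquantile_1d`, and `run_cases` are hypothetical names, and NumPy stands in for the API under test so that the sketch runs without Paddle.

```python
import warnings

import numpy as np


def reference_nanquantile_1d(row, q):
    """Hypothetical reference: drop NaN, then take the ordinary quantile."""
    kept = row[~np.isnan(row)]
    return np.nan if kept.size == 0 else np.quantile(kept, q)


base = np.arange(12, dtype=np.float64).reshape(3, 4)

# NaN placements to cover: none, one, several, a whole row, everything.
nan_placements = {
    "no_nan": [],
    "single_nan": [(0, 1)],
    "multiple_nan": [(0, 0), (1, 2), (2, 3)],
    "full_row_nan": [(1, j) for j in range(4)],
    "all_nan": [(i, j) for i in range(3) for j in range(4)],
}


def run_cases(api, q=0.35):
    """Run `api(x, q=..., axis=1)` on every placement and compare to the reference."""
    checked = []
    for name, positions in nan_placements.items():
        x = base.copy()
        for i, j in positions:
            x[i, j] = np.nan
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", RuntimeWarning)  # all-NaN slices warn
            got = api(x, q=q, axis=1)
        want = np.array([reference_nanquantile_1d(row, q) for row in x])
        assert np.allclose(got, want, equal_nan=True), name
        checked.append(name)
    return checked


print(run_cases(np.nanquantile))
```

In a Paddle test, `api` would presumably be a small wrapper that converts `x` with `paddle.to_tensor`, calls `paddle.nanquantile`, and returns `.numpy()`, so the same loop could serve several APIs.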
@luotao1 Hi, this has been revised.
x, q=[0.1, 0.2, 0.3], axis=[1, 2], keepdim=True)
np_res = np.quantile(
self.input_data, q=[0.1, 0.2, 0.3], axis=[1, 2], keepdims=True)
self.assertTrue(np.allclose(paddle_res.numpy(), np_res))
Why don't the unit tests on lines 99-144 use API_list?
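Numerical correctness is tested in TestQuantileAndNanquantile. I think TestMuitlpleQ is mainly meant to verify that the result is correct when multiple q values are passed in. quantile and nanquantile differ only slightly in how they compute (I believe TestQuantileAndNanquantile covers the vast majority of cases), and for multiple q values each single-q result is simply appended to a list and the list is stacked on output. This also helps keep the tests from timing out because there are too many of them (they still timed out). So I think using only quantile is sufficient (or alternating between quantile and nanquantile).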
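The stacking behaviour described above can be checked against NumPy, the reference used by these tests. The snippet is a sketch with made-up input data; it only shows that a multi-q result equals the single-q results stacked along a new leading axis.

```python
import numpy as np

x = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
qs = [0.1, 0.2, 0.3]

# One call with several q values.
multi = np.quantile(x, q=qs, axis=(1, 2), keepdims=True)

# One call per q value, collected in a list and stacked on output.
stacked = np.stack(
    [np.quantile(x, q=q, axis=(1, 2), keepdims=True) for q in qs]
)

print(multi.shape)  # (3, 2, 1, 1): the q axis comes first
print(np.allclose(multi, stacked))  # True
```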
OK
If it times out, 1) try making the test data smaller, or 2) split the tests into two unit test files.
LGTM
python/paddle/tensor/__init__.py
Outdated
@@ -445,6 +446,7 @@
'numel',
'median',
'quantile',
'nanquantile'
A comma is missing again..
You can spot it in the log: https://xly.bce.baidu.com/paddlepaddle/paddle/newipipe/detail/5392118/job/14041205
2022-04-14 01:51:31 There are 3 approved errors.
2022-04-14 01:51:31 ****************
2022-04-14 01:51:32 API Difference is:
2022-04-14 01:51:32 - paddle.Tensor.is_complex (ArgSpec(args=['x'], varargs=None, varkw=None, defaults=None, kwonlyargs=[], kwonlydefaults=None, annotations={}), ('document', '9d4dc47b098ce34e65cc23e14ad02281'))
2022-04-14 01:51:32 - paddle.Tensor.quantile (ArgSpec(args=['x', 'q', 'axis', 'keepdim'], varargs=None, varkw=None, defaults=(None, False), kwonlyargs=[], kwonlydefaults=None, annotations={}), ('document', 'd6e25fbeb7751f8e57ad215209f36e00'))
2022-04-14 01:51:32 ? ^ ^^^^^^^ ^^^^^ ^^^^^^^ ^^^^^^^^^
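Why a missing comma in such a list does not raise an error but silently changes the registered names can be seen in a small sketch. The list name `method_names` and the entry following `'nanquantile'` are assumptions for illustration, not copied from the file.

```python
# Python concatenates adjacent string literals, so a missing comma
# merges two list entries instead of raising a SyntaxError.
method_names = [
    'median',
    'quantile',
    'nanquantile'  # <- missing comma
    'is_complex',
]

print(method_names)
# ['median', 'quantile', 'nanquantileis_complex']
print(len(method_names))
# 3
```

Neither of the two intended names survives in the list, which would be consistent with an API diff that reports an existing method as removed.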
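Sorry, fixed. When I looked at it, I thought it was the one in __all__..
I'm really sorry, there are still some oversights. I didn't fix them right away when I noticed them, and later forgot because I had too much going on..
What is the problem? Is it the documentation, or something else?
The documentation.
Let's fix the documentation in the next PR. This PR needs approval from quite a few people, so let's merge the main functionality first.
Got it (。>ㅿ<。)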
PR types
Others
PR changes
APIs
Describe
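Resolves issue: #40308
Adds paddle.nanquantile, a variant of paddle.quantile that computes the quantiles of the non-NaN elements along the given axis.
Design document: PaddlePaddle/community#55
Modifies the paddle.quantile code so that it shares most of its code with paddle.nanquantile; all of the computation logic lives in the function _compute_quantile.
Fixes quantile not computing the correct result for input containing NaN, and fixes the static-graph failure in the quantile unit tests.
Fixes quantile not matching NumPy results to high precision, and aligns the output format with NumPy: for input of any data type, the result is float64.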