【Hackathon No.15】add RFC for Nanmedian #89

thunder95 · 2022-03-31T06:20:58Z

Add the design document for the new `nanmedian` API.

paddle-bot-old · 2022-03-31T06:23:23Z

Your PR has been submitted. Thanks for your contribution to this open-source project!
Please check that its format and content are complete. For this, you can refer to Template and Demo.

paddle-bot-old · 2022-04-01T02:01:15Z

The format inspection passed. Your PR will be reviewed by experts of Paddle and developers from the open-source community. Stay tuned.

jeff41404 · 2022-04-01T03:57:38Z

rfcs/APIs/20220331_api_design_for_nanmedian.md

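+## Naming and parameter design
+The API is designed as `paddle.nanmedian(x, axis=None, keepdim=False, name=None)`.
+Naming and parameter order: the parameter names change from `input` to `x` and from `dim` to `axis`, for consistency with other Paddle APIs; this does not affect actual functionality.
+Regarding parameter types, `axis` supports `int` input, and `keepdim` supports keeping the original shape in the returned result.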


Should `axis` also accept a tuple or list of ints, as in NumPy? Supporting only `int` is the simplest case.

Thanks for your reply, Sir. This API is indeed supposed to support multiple axes given as a tuple or list of ints. I have updated the design doc.
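A minimal sketch of how an `axis` argument that may be `None`, an `int`, or a tuple/list of ints could be normalized before reduction. The helper name `normalize_axes` is hypothetical and is not taken from the RFC or from Paddle; the `axis=None` behavior (reduce over all dimensions) is assumed to follow NumPy.

```python
def normalize_axes(axis, ndim):
    """Return a sorted list of non-negative axes to reduce over.

    Hypothetical helper for illustration; not a Paddle function.
    """
    if axis is None:
        # Assumption: axis=None reduces over every dimension, as in NumPy.
        return list(range(ndim))
    axes = [axis] if isinstance(axis, int) else list(axis)
    normalized = []
    for a in axes:
        if not isinstance(a, int):
            raise TypeError("axis entries must be int")
        if not -ndim <= a < ndim:
            raise ValueError(f"axis {a} is out of range for {ndim} dimensions")
        normalized.append(a % ndim)  # map negative axes to non-negative ones
    if len(set(normalized)) != len(normalized):
        raise ValueError("axis contains duplicate entries")
    return sorted(normalized)
```

For a 3-D input, `normalize_axes((2, 0), 3)` and `normalize_axes([-1, 0], 3)` both resolve to the same pair of axes.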

jeff41404 · 2022-04-01T04:02:21Z

rfcs/APIs/20220331_api_design_for_nanmedian.md

+
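+## API implementation plan
+The implementation is composed mainly of the following steps and lives in `paddle/tensor/math.py`, alongside methods such as `sum` and `nansum`:
+1. Use `paddle.take_along_axis` to fetch the elements along `axis`.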


If `axis` is an int, then according to the description in Chapter 2 we need to transpose. Why use `paddle.take_along_axis` here? Could you give general code similar to that in Chapter 2?

@jeff41404 That was my mistake; the transpose op is needed. But I am unsure how to efficiently access the data along the correct axis with the Paddle API, especially in the case of multiple axes. I would appreciate any further hints. My new idea is to work out which axes need to be transposed and then reshape them, which gives the same results as NumPy. Please take a look at my new design doc.
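The transpose-then-reshape idea described above can be sketched as a small planning step that only works on shapes. The helper `plan_reduction` and its return layout are assumptions made for illustration; the design doc's actual code is not shown in this thread.

```python
from math import prod

def plan_reduction(shape, axes):
    """Hypothetical helper: how to lay out a tensor so `axes` can be reduced row by row.

    Returns (perm, (rows, cols), out_shape):
      perm      - axis permutation that moves the reduced axes to the end
      rows/cols - 2-D shape after the transpose: one row per output element
      out_shape - shape of the result with keepdim=False
    """
    kept = [d for d in range(len(shape)) if d not in axes]
    perm = kept + sorted(axes)
    rows = prod(shape[d] for d in kept)  # prod of an empty sequence is 1
    cols = prod(shape[d] for d in axes)
    out_shape = [shape[d] for d in kept]
    return perm, (rows, cols), out_shape
```

With such a plan, an array library would transpose by `perm` and reshape to `(rows, cols)`; each row then holds exactly the values that feed one output element, regardless of how many axes were requested.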

`paddle.quantile` may give some hints.

@jeff41404 The quantile op does help when I want to extract data along one or more axes. However, I have not figured out how to sort the data yet: neither the sort op nor the argsort op worked when I tested with NaN values. What is your idea? Is it necessary to design dedicated CPU and CUDA kernels to solve this problem?

> paddle.quantile may give some hints

Following your suggestion, I have adopted the approach used in `paddle.quantile` and updated the API design doc; please take a look. The calculation process does not yet meet my expectations, because APIs such as `paddle.sort`, `paddle.min`, and `paddle.argsort` do not work when NaN values are present. What is worse, I have to copy the tensor (`sum_t`) to NumPy and then use a for-loop. `paddle.take_along_axis` did not help much either, because the number of non-NaN values differs along the axis and the odd and even cases must be handled separately.

The answer "paddle.quantile may give some hints" was for the question "how could I more efficiently access the correct axis data with the Paddle API, even in the case of multiple axes?"
But to handle NaN, a composition of `paddle.sort`, `paddle.argsort`, and so on may not work, because these APIs, like `paddle.median`, were not designed to handle NaN.
So I recommend implementing the main logic in C++ (I suggest studying the implementation of `torch.nanmedian` in detail), not only in Python.
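To make the per-row logic concrete, here is a plain-Python sketch of a NaN-aware median for one row. It is not the proposed C++ kernel; the function name `nanmedian_1d` is hypothetical, and averaging the two middle values for an even count is an assumption that follows the NumPy convention mentioned earlier in this thread.

```python
import math

def nanmedian_1d(values):
    """Median of `values` ignoring NaN; returns NaN when no valid value remains."""
    # Filter NaN first, so the sort never has to compare against NaN.
    valid = sorted(v for v in values if not math.isnan(v))
    n = len(valid)
    if n == 0:
        return math.nan  # all-NaN (or empty) row
    mid = n // 2
    if n % 2 == 1:
        return valid[mid]
    # Even count: average the two middle values (NumPy-style assumption).
    return (valid[mid - 1] + valid[mid]) / 2.0
```

Because the number of valid entries is computed per row, rows with different NaN counts, and the odd and even cases, are handled without any special layout.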

@jeff41404 The API design doc has been updated again. My rough idea is to first access the target axis and then calculate the nanmedian row by row in the kernel function. Does this API also need a backward gradient? I did not find one in the PyTorch source code. If it does, I will simply broadcast the median value to the original input shape; is that correct?

It does need a backward gradient, like other Paddle APIs.
Regarding "I will simply broadcast this median value to the original input shape, is that correct?": I suggest recalling the definitions of derivative and gradient, or referring to similar APIs (max/min/topk...); I believe you can find the answer.
By the way, the backward logic of some PyTorch APIs is generated; you need to compile the code and then find it in pytorch/torch/csrc/autograd/generated/Functions.cpp.
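By analogy with max/min, where the gradient flows only to the selected element, one plausible backward rule for a single row is sketched below. This is an assumption for illustration, not the backward logic of the design doc or of PyTorch: the function name `nanmedian_grad_1d`, the 0.5/0.5 split for an even count (which matches averaging the two middle values), and the all-zero gradient for an all-NaN row are all hypothetical choices.

```python
import math

def nanmedian_grad_1d(values, grad_out=1.0):
    """Hypothetical backward for a 1-D nanmedian: gradient w.r.t. each input entry."""
    # Indices of the non-NaN entries, ordered by their values (stable for ties).
    order = sorted(
        (i for i, v in enumerate(values) if not math.isnan(v)),
        key=lambda i: values[i],
    )
    grad = [0.0] * len(values)  # NaN positions keep a zero gradient
    n = len(order)
    if n == 0:
        return grad  # all NaN: no entry contributed to the output (assumption)
    if n % 2 == 1:
        grad[order[n // 2]] = grad_out
    else:
        # The output is the mean of two entries, so each receives half.
        grad[order[n // 2 - 1]] += 0.5 * grad_out
        grad[order[n // 2]] += 0.5 * grad_out
    return grad
```

Under this rule the gradient is nonzero only at the entries that actually determined the median, which differs from broadcasting the output gradient to every input position.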

@jeff41404 Hi, Sir. The doc has just been updated for your reference. Compiling the PyTorch source code was really tough, so I researched the ops you mentioned above and designed the backward logic. Please check it; more suggestions are welcome. Thank you so much for guiding me through this exciting task.

The key logic of nanmedian has been fully considered in the design, so I approve!
One more step: I suggest that the nanmedian kernel function also support median, so that `paddle.median` can switch to it and improve performance.
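One way a single routine could serve both median and nanmedian is a flag that controls whether NaN is skipped. The sketch below is plain Python rather than a kernel; the name `median_1d`, the `ignore_nan` flag, and the choice that the plain median returns NaN when any NaN is present (as NumPy's `median` does) are assumptions, not details confirmed in this thread.

```python
import math

def median_1d(values, ignore_nan):
    """Hypothetical shared routine: nanmedian if `ignore_nan` is True, else plain median."""
    has_nan = any(math.isnan(v) for v in values)
    if has_nan and not ignore_nan:
        return math.nan  # assumption: plain median propagates NaN
    valid = sorted(v for v in values if not math.isnan(v))
    if not valid:
        return math.nan
    mid = len(valid) // 2
    if len(valid) % 2 == 1:
        return valid[mid]
    return (valid[mid - 1] + valid[mid]) / 2.0
```

For input without NaN the two modes agree, so only the NaN check distinguishes them.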

jeff41404 · 2022-04-01T04:10:31Z

rfcs/APIs/20220331_api_design_for_nanmedian.md

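+- correctness of the `keepdim` parameter and of the output;
+- correctness of the result when the input contains `NaN`;
+- correctness of the result when the input is all `NaN` along `axis`;
+- correctness of the result during backward gradient computation (including the gradients at NaN and non-NaN positions);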


This test is very important. Besides normal data, we need to test some extreme special cases: for example, when the data contains a large number of NaNs and the NaN counts differ along the axis, or even when the data along the axes is all NaN.
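A small sketch of how such extreme inputs could be generated, together with an independent reference result to compare against. The helpers `make_row` and `reference_nanmedian` are hypothetical names; the reference drops NaN and then uses the standard-library `statistics.median`, which averages the two middle values for an even count.

```python
import math
import random
import statistics

def make_row(length, nan_count, rng):
    """Row of `length` floats with exactly `nan_count` NaNs at random positions."""
    row = [rng.uniform(-10.0, 10.0) for _ in range(length)]
    for i in rng.sample(range(length), nan_count):
        row[i] = math.nan
    return row

def reference_nanmedian(row):
    """Reference result: drop NaN, then take the standard-library median."""
    valid = [v for v in row if not math.isnan(v)]
    return statistics.median(valid) if valid else math.nan

rng = random.Random(0)  # fixed seed so the test data is reproducible
# NaN counts differ per row: none, a few, all but one, and all NaN.
rows = [make_row(8, k, rng) for k in (0, 3, 7, 8)]
expected = [reference_nanmedian(r) for r in rows]
```

The last row exercises the all-NaN case, and the third row leaves a single valid value, so its expected median is that value itself.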

@jeff41404 Hi, Sir. I have added some new test cases; if I come up with more, I will add them later.

jeff41404

LGTM

design doc for the nanmedian API

eea9c46

paddle-bot-old bot added contributor status: proposed labels Mar 31, 2022

dingjiaweiww mentioned this pull request Apr 1, 2022

【PaddlePaddle Hackathon Phase 2】Task Overview PaddlePaddle/Paddle#40234

Closed

dingjiaweiww assigned jeff41404 and DDDivano Apr 1, 2022

dingjiaweiww added status: open review and removed status: proposed labels Apr 1, 2022

jeff41404 reviewed Apr 1, 2022


thunder95 added 4 commits April 2, 2022 00:19

support multiple axises

7a922fb

support multiple axises

d310b01

add c++ design

e737fd9

support grad kernel

7a677d9

jeff41404 approved these changes Apr 13, 2022


jeff41404 merged commit 6432a59 into PaddlePaddle:master Apr 13, 2022

jeff41404 mentioned this pull request May 5, 2022

【PaddlePaddle Hackathon 2】15 Add API Nanmedian PaddlePaddle/Paddle#42385

Merged


jeff41404 Apr 1, 2022

thunder95 Apr 1, 2022

jeff41404 Apr 1, 2022

thunder95 Apr 1, 2022

jeff41404 Apr 6, 2022

thunder95 Apr 7, 2022

thunder95 Apr 8, 2022 (edited)

jeff41404 Apr 8, 2022

thunder95 Apr 8, 2022

jeff41404 Apr 11, 2022

thunder95 Apr 12, 2022

jeff41404 Apr 13, 2022 (edited)

jeff41404 Apr 1, 2022 (edited)

thunder95 Apr 1, 2022

jeff41404 left a comment
