-
Notifications
You must be signed in to change notification settings - Fork 263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
【Hackathon No.15】add RFC for Nanmedian #89
Conversation
PR格式检查通过,你的PR将接受Paddle专家以及开源社区的review,请及时关注PR动态。 |
## 命名与参数设计 | ||
API设计为`paddle.nanmedian(x, axis=None, keepdim=False, name=None)` | ||
命名与参数顺序为:形参名`input`->`x`和`dim`->`axis`, 与paddle其他API保持一致性,不影响实际功能使用。 | ||
参数类型中,`axis`支持`int`输入, keepdim支持返回保持原来的形状。 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we support tuple or list of int in axis like numpy? only support int is simplest case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your reply, Sir. This API is indeed supposed to support multiple axis with tuple or list of int. I have updated this design doc.
|
||
## API实现方案 | ||
主要按下列步骤进行组合实现,实现位置为`paddle/tensor/math.py`与`sum`,`nansum`等方法放在一起: | ||
1. 使用`paddle.take_along_axis`获取axis上的元素. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if axis is int, according to the description in Chapter 2, we need to transpose. Why use paddle.take_along_axis
here? Can you give us a general code similar to that in Chapter 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jeff41404 That was my mistake. The transpose op is needed. But I got confused there, how could I more efficiently access to the correct axis data with paddle API, and even in case of multiple axis? I would appreciate a lot if you can give me more hints. My new idea is to calculate this to be transposed axis and reshape them, which could give me the same result with outputs of numpy, please take a view at my new design doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
paddle.quantile may gave some hints
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jeff41404 This quantile op do help when I wanna extract data along one or more axises. However, I did not figure out how to sort them yet, neither sort nor argsort op can work it out when I tested with nan values. Whats your idea? is it nessecery to design a specific cpu and cuda kernel to solve this problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
paddle.quantile may gave some hints
w.r.t your suggestion, I have adopted this methods in paddle.quantile, and updated this api design doc, pls kindly take a view at it. The calculation process has not achieved my expectation, because apis like paddle.sort, paddle.min, paddle.argsort won`t work if nan values were in there. Whats the worse is, I have to transfer memory from tensor (sum_t) to numpy, and use a for-loop further. paddle.take_along_axis also did not help that much when the non-nan value amount differs in axis and also the odd and even cases should be taken into consideration respectively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the answer "paddle.quantile may gave some hints" is for "how could I more efficiently access to the correct axis data with paddle API, and even in case of multiple axis?"
but if we want to handle the 'nan', just composed of paddle.sort, paddle.argsort and so on may not work, because these APIs are
not considered to handle the 'nan' in design like paddle.median. So the main logic is recommended to be implemented in C++(suggest researching the implementation of torch.nanmedian in detail) , not only Python
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jeff41404 The api design doc was newly updated. My rough thought is to firstly access to target axis and then calculate the nanmedian row by row in kernel function. does this api need backward gradient also, which I did not find in pytorch source code? if it do need, I will simply broadcast this median value to the original input shape, is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do need backward gradient like other Paddle APIs.
"I will simply broadcast this median value to the original input shape, is that correct?" I suggest recalling the definition of derivation and gradient, or refer to other similar APIs(max/min/topk...), believe you can find the answer.
by the way, the backward logic of some API in pytorch is generated, you need compile the code and find it in pytorch/torch/csrc/autograd/generated/Functions.cpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jeff41404 Hi, Sir. The doc is updated just now for your reference. Compiling the source code of Pytorch was really tough. So I did some research about the ops you mentioned above, and designed the backward logic. Please check it and more suggestions are welcome. Thank you so much for guiding me to go through this exciting task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key logic of nanmedian has been fully considered in the design, so approve!
one more step, suggest the implementation of kernel function of nanmedian also support median, so paddle.median can switch to it and improve performance.
- `keepdim`参数的正确性,输出结果的正确性; | ||
- 输入含`NaN`结果的正确性; | ||
- 输入axis上全为`NaN`结果的正确性; | ||
- 测试在进行反向梯度计算时结果的正确性(包含nan值和非nan值位置的梯度); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is very important. Except test normal data, we need to test some extremely special data. For example, when there are a large number of nan in the data and the counts of nan in axis are different, and even when the data in axes are all nan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jeff41404 Hi, Sir, I added some new test cases, if I come up with more, I will add them later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
新增api nanmedian设计文档