Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

【Hackathon No.15】add RFC for Nanmedian #89

Merged
merged 5 commits into from
Apr 13, 2022

Conversation

thunder95
Copy link
Contributor

新增api nanmedian设计文档

@paddle-bot-old
Copy link

你的PR提交成功,感谢你对开源项目的贡献!
请检查PR提交格式和内容是否完备,具体请参考示例模版
Your PR has been submitted. Thanks for your contribution!
Please check its format and content. For this, you can refer to Template and Demo.

@paddle-bot-old
Copy link

paddle-bot-old bot commented Apr 1, 2022

PR格式检查通过,你的PR将接受Paddle专家以及开源社区的review,请及时关注PR动态。
The format inspection passed. Your PR will be reviewed by experts of Paddle and developers from the open-source community. Stay tuned.

## 命名与参数设计
API设计为`paddle.nanmedian(x, axis=None, keepdim=False, name=None)`
命名与参数顺序为:形参名`input`->`x`和`dim`->`axis`, 与paddle其他API保持一致性,不影响实际功能使用。
参数类型中,`axis`支持`int`输入, keepdim支持返回保持原来的形状。
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we support tuple or list of int in axis like numpy? only support int is simplest case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your reply, Sir. This API is indeed supposed to support multiple axis with tuple or list of int. I have updated this design doc.


## API实现方案
主要按下列步骤进行组合实现,实现位置为`paddle/tensor/math.py`与`sum`,`nansum`等方法放在一起:
1. 使用`paddle.take_along_axis`获取axis上的元素.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if axis is int, according to the description in Chapter 2, we need to transpose. Why use paddle.take_along_axis here? Can you give us a general code similar to that in Chapter 2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeff41404 That was my mistake. The transpose op is needed. But I got confused there, how could I more efficiently access to the correct axis data with paddle API, and even in case of multiple axis? I would appreciate a lot if you can give me more hints. My new idea is to calculate this to be transposed axis and reshape them, which could give me the same result with outputs of numpy, please take a view at my new design doc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

paddle.quantile may gave some hints

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeff41404 This quantile op do help when I wanna extract data along one or more axises. However, I did not figure out how to sort them yet, neither sort nor argsort op can work it out when I tested with nan values. Whats your idea? is it nessecery to design a specific cpu and cuda kernel to solve this problem?

Copy link
Contributor Author

@thunder95 thunder95 Apr 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

paddle.quantile may gave some hints

w.r.t your suggestion, I have adopted this methods in paddle.quantile, and updated this api design doc, pls kindly take a view at it. The calculation process has not achieved my expectation, because apis like paddle.sort, paddle.min, paddle.argsort won`t work if nan values were in there. Whats the worse is, I have to transfer memory from tensor (sum_t) to numpy, and use a for-loop further. paddle.take_along_axis also did not help that much when the non-nan value amount differs in axis and also the odd and even cases should be taken into consideration respectively.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the answer "paddle.quantile may gave some hints" is for "how could I more efficiently access to the correct axis data with paddle API, and even in case of multiple axis?"
but if we want to handle the 'nan', just composed of paddle.sort, paddle.argsort and so on may not work, because these APIs are
not considered to handle the 'nan' in design like paddle.median. So the main logic is recommended to be implemented in C++(suggest researching the implementation of torch.nanmedian in detail) , not only Python

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeff41404 The api design doc was newly updated. My rough thought is to firstly access to target axis and then calculate the nanmedian row by row in kernel function. does this api need backward gradient also, which I did not find in pytorch source code? if it do need, I will simply broadcast this median value to the original input shape, is that correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do need backward gradient like other Paddle APIs.
"I will simply broadcast this median value to the original input shape, is that correct?" I suggest recalling the definition of derivation and gradient, or refer to other similar APIs(max/min/topk...), believe you can find the answer.
by the way, the backward logic of some API in pytorch is generated, you need compile the code and find it in pytorch/torch/csrc/autograd/generated/Functions.cpp

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeff41404 Hi, Sir. The doc is updated just now for your reference. Compiling the source code of Pytorch was really tough. So I did some research about the ops you mentioned above, and designed the backward logic. Please check it and more suggestions are welcome. Thank you so much for guiding me to go through this exciting task.

Copy link
Contributor

@jeff41404 jeff41404 Apr 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key logic of nanmedian has been fully considered in the design, so approve!
one more step, suggest the implementation of kernel function of nanmedian also support median, so paddle.median can switch to it and improve performance.

- `keepdim`参数的正确性,输出结果的正确性;
- 输入含`NaN`结果的正确性;
- 输入axis上全为`NaN`结果的正确性;
- 测试在进行反向梯度计算时结果的正确性(包含nan值和非nan值位置的梯度);
Copy link
Contributor

@jeff41404 jeff41404 Apr 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is very important. Except test normal data, we need to test some extremely special data. For example, when there are a large number of nan in the data and the counts of nan in axis are different, and even when the data in axes are all nan

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeff41404 Hi, Sir, I added some new test cases, if I come up with more, I will add them later.

Copy link
Contributor

@jeff41404 jeff41404 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants