[PaddlePaddle Hackathon 2] No. 3: Add a corrcoef (Pearson product-moment correlation coefficient) API to Paddle #40690

Merged
merged 57 commits into from
May 9, 2022
57 commits
001a5f1
corrcoef commit
liqitong-a Mar 17, 2022
45a53eb
corrcoef commit
liqitong-a Mar 17, 2022
d2aa0fb
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Mar 22, 2022
bb5c04d
Update test_corr.py
liqitong-a Mar 22, 2022
6d88d9a
Update linalg.py
liqitong-a Mar 22, 2022
2c1cfe8
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Mar 25, 2022
1cf83e8
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Mar 25, 2022
2d654fd
Update test_corr.py
liqitong-a Mar 25, 2022
e5b66e9
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Mar 25, 2022
6efaca6
Update test_corr.py
liqitong-a Mar 25, 2022
53ee671
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Mar 26, 2022
9a74568
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Mar 26, 2022
fe98fcd
Update test_corr.py
liqitong-a Mar 26, 2022
f39ad07
Update test_corr.py
liqitong-a Mar 26, 2022
4ee155e
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Mar 26, 2022
794e368
Update test_corr.py
liqitong-a Mar 26, 2022
02c9ef9
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Mar 27, 2022
af6b514
Update test_corr.py
liqitong-a Mar 27, 2022
f90d599
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Mar 27, 2022
3d8e0b0
Update test_corr.py
liqitong-a Mar 27, 2022
1991f92
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Mar 28, 2022
760b858
Update test_corr.py
liqitong-a Mar 28, 2022
694b895
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Mar 28, 2022
510c6e7
Update test_corr.py
liqitong-a Mar 28, 2022
74969a4
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Mar 28, 2022
677ba6f
Update test_corr.py
liqitong-a Mar 28, 2022
6b8e3d3
Update linalg.py
liqitong-a Apr 7, 2022
87ba181
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 7, 2022
49227fb
Update linalg.py
liqitong-a Apr 7, 2022
84c65a7
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 7, 2022
a80e7a5
Update linalg.py
liqitong-a Apr 8, 2022
d63e2dc
Update test_corr.py
liqitong-a Apr 12, 2022
3c1dd13
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 12, 2022
9b66604
Update test_corr.py
liqitong-a Apr 12, 2022
46b9021
Update test_corr.py
liqitong-a Apr 13, 2022
299d4c0
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 13, 2022
937a4fe
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 13, 2022
f631db9
Update test_corr.py
liqitong-a Apr 14, 2022
dde2566
Update test_corr.py
liqitong-a Apr 14, 2022
f59b0ef
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 15, 2022
7eddd43
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 15, 2022
67be88e
Update test_corr.py
liqitong-a Apr 18, 2022
607eb71
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 18, 2022
552c07b
Update test_corr.py
liqitong-a Apr 18, 2022
ae572c4
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 18, 2022
3652bbd
Update test_corr.py
liqitong-a Apr 18, 2022
18339f5
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 18, 2022
decb986
Update test_corr.py
liqitong-a Apr 24, 2022
189a29f
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 24, 2022
69064d2
Update test_corr.py
liqitong-a Apr 27, 2022
7c3b09d
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 27, 2022
7c5efcf
Update test_corr.py
liqitong-a Apr 27, 2022
e97d91c
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 27, 2022
4fca073
Update test_corr.py
liqitong-a Apr 29, 2022
51da5d6
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a Apr 29, 2022
2255acf
Update test_corr.py
liqitong-a May 5, 2022
9cbac57
Merge branch 'PaddlePaddle:develop' into corrcoef
liqitong-a May 5, 2022
122 changes: 122 additions & 0 deletions python/paddle/fluid/tests/unittests/test_corr.py
@@ -0,0 +1,122 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import paddle.fluid as fluid
import unittest
import numpy as np
import six
import paddle
import warnings


def numpy_corr(np_arr, rowvar=True, dtype='float64'):
    return np.corrcoef(np_arr, rowvar=rowvar, dtype=dtype)


class Corr_Test(unittest.TestCase):
    def setUp(self):
        self.shape = [4, 5]

    def test_tensor_corr_default(self):
        typelist = ['float64', 'float32']

[Review comment] Contributor:
It still does not pass. float32 is less precise, so just test float64 precision here; find the smallest atol that works.

[Reply] Contributor Author:
OK, I'll adjust it.

        places = [fluid.CPUPlace()]
        if fluid.core.is_compiled_with_cuda():
            places.append(fluid.CUDAPlace(0))
        for idx, p in enumerate(places):
            if idx == 0:
                paddle.set_device('cpu')
            else:
                paddle.set_device('gpu')

            for dtype in typelist:
                np_arr = np.random.rand(*self.shape).astype(dtype)
                tensor = paddle.to_tensor(np_arr, place=p)
                corr = paddle.linalg.corrcoef(tensor)
                np_corr = numpy_corr(np_arr, rowvar=True, dtype=dtype)
                if dtype == 'float32':
                    self.assertTrue(
                        np.allclose(
                            np_corr, corr.numpy(), atol=1.e-5))
                else:
                    self.assertTrue(np.allclose(np_corr, corr.numpy()))

    def test_tensor_corr_rowvar(self):
        typelist = ['float64', 'float32']
        places = [fluid.CPUPlace()]
        if fluid.core.is_compiled_with_cuda():
            places.append(fluid.CUDAPlace(0))

        for idx, p in enumerate(places):
            if idx == 0:
                paddle.set_device('cpu')
            else:
                paddle.set_device('gpu')

            for dtype in typelist:
                np_arr = np.random.rand(*self.shape).astype(dtype)
                tensor = paddle.to_tensor(np_arr, place=p)
                corr = paddle.linalg.corrcoef(tensor, rowvar=False)
                np_corr = numpy_corr(np_arr, rowvar=False, dtype=dtype)
                if dtype == 'float32':
                    self.assertTrue(
                        np.allclose(
                            np_corr, corr.numpy(), atol=1.e-5))
                else:
                    self.assertTrue(np.allclose(np_corr, corr.numpy()))


# Input(x) only support N-D (1<=N<=2) tensor
class Corr_Test2(Corr_Test):
    def setUp(self):
        self.shape = [10]


class Corr_Test3(Corr_Test):
    def setUp(self):
        self.shape = [4, 5]


# Input(x) only support N-D (1<=N<=2) tensor
class Corr_Test4(unittest.TestCase):

[Review comment] Contributor:
Add a comment to each test class, and add test cases for unsupported data types.

[Reply] Contributor Author:
Done. Please take a look.

    def setUp(self):
        self.shape = [2, 5, 2]

    def test_errors(self):
        def test_err():
            np_arr = np.random.rand(*self.shape).astype('float64')
            tensor = paddle.to_tensor(np_arr)
            covrr = paddle.linalg.corrcoef(tensor)

        self.assertRaises(ValueError, test_err)


# test unsupported complex input
class Corr_Comeplex_Test(unittest.TestCase):
    def setUp(self):
        self.dtype = 'complex128'

    def test_errors(self):
        paddle.enable_static()

[Review comment] Reviewer:
Could the static-graph setup be moved into the base class?

[Reply] Contributor Author:
Yes, it can.

[Reply] Contributor Author:
As for fp32, though: I tried it with cov, and once fp32 is added the cov test no longer passes. corrcoef is written on top of cov, so this may not be convenient.

[Reply] Reviewer:
API tests need to cover every supported data type; you can adjust the allclose tolerance.

        x1 = fluid.data(name=self.dtype, shape=[2], dtype=self.dtype)
        self.assertRaises(TypeError, paddle.linalg.corrcoef, x=x1)
        paddle.disable_static()


class Corr_Test5(Corr_Comeplex_Test):
    def setUp(self):
        self.dtype = 'complex64'


if __name__ == '__main__':
    unittest.main()
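
The review discussion on this file turns on how loose the float32 tolerance needs to be. As a rough illustration (not part of the PR), the NumPy-only sketch below measures the gap between a correlation matrix computed in float32 and a float64 reference, which is one way to look for the smallest workable `atol`. The helper name `max_abs_error` and the seed are assumptions made for this example, and the `dtype=` argument of `np.corrcoef` assumes NumPy 1.20 or newer.

```python
import numpy as np


def max_abs_error(shape=(4, 5), seed=0):
    """Largest absolute gap between float32 and float64 correlation matrices.

    Hypothetical helper for illustration only; it is not part of the PR.
    """
    rng = np.random.default_rng(seed)
    data64 = rng.random(shape)            # float64 reference input
    data32 = data64.astype(np.float32)    # same values rounded to float32

    ref = np.corrcoef(data64)                     # computed in float64
    low = np.corrcoef(data32, dtype=np.float32)   # computed in float32
    return float(np.max(np.abs(ref - low.astype(np.float64))))


err = max_abs_error()
print(f"max abs error between float32 and float64: {err:.3e}")
```

Running this over several seeds and shapes gives an empirical upper bound on the error, and the test tolerance can then be set slightly above it rather than guessed.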
2 changes: 2 additions & 0 deletions python/paddle/linalg.py
@@ -16,6 +16,7 @@
from .tensor.linalg import norm # noqa: F401
from .tensor.linalg import eig # noqa: F401
from .tensor.linalg import cov # noqa: F401
from .tensor.linalg import corrcoef # noqa: F401
from .tensor.linalg import cond # noqa: F401
from .tensor.linalg import matrix_power # noqa: F401
from .tensor.linalg import solve # noqa: F401
@@ -41,6 +42,7 @@
    'norm',
    'cond',
    'cov',
    'corrcoef',
    'inv',
    'eig',
    'eigvals',
2 changes: 2 additions & 0 deletions python/paddle/tensor/__init__.py
@@ -40,6 +40,7 @@
from .linalg import matmul # noqa: F401
from .linalg import dot # noqa: F401
from .linalg import cov # noqa: F401
from .linalg import corrcoef # noqa: F401
from .linalg import norm # noqa: F401
from .linalg import cond # noqa: F401
from .linalg import transpose # noqa: F401
@@ -278,6 +279,7 @@
    'matmul',
    'dot',
    'cov',
    'corrcoef',
    'norm',
    'cond',
    'transpose',
70 changes: 70 additions & 0 deletions python/paddle/tensor/linalg.py
@@ -24,6 +24,7 @@
from .creation import full

import paddle
import warnings
from paddle.common_ops_import import core
from paddle.common_ops_import import VarDesc
from paddle import _C_ops
@@ -3181,3 +3182,72 @@ def lstsq(x, y, rcond=None, driver=None, name=None):
singular_values = paddle.static.data(name='singular_values', shape=[0])

return solution, residuals, rank, singular_values


def corrcoef(x, rowvar=True, name=None):

[Review comment] Contributor:
Compared with NumPy, this interface is missing the y parameter. Why is it missing? Will y be added later?

[Reply] Contributor Author:
Computing corrcoef first requires computing cov, and Paddle's cov was also written without a y parameter, unlike NumPy's. Whether y is added here depends on whether cov gains a y parameter later.

[Reply] Contributor (zhiboniu, May 7, 2022):
Source code: (image 3c06f698d853ffbf64e4431cfaf0b432)
Example: (image)

The reason is that y only supplements the elements of x. It ends up concatenated with x when used, so the result is the same as passing a single, already concatenated x.
For the sake of simplicity, y is also somewhat redundant, so it will probably not be added in the future.
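
The point made in this thread, that NumPy's `y` is just stacked onto `x`, can be checked directly. The following is a small NumPy sketch with made-up random data; it only illustrates NumPy's behaviour and says nothing about Paddle internals.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((2, 6))   # 2 variables, 6 observations each
y = rng.random((3, 6))   # 3 more variables over the same observations

# With rowvar=True (the default), NumPy appends y's rows to x's rows
# before computing the correlation matrix ...
with_y = np.corrcoef(x, y)

# ... so passing the pre-stacked array yields the same 5x5 matrix.
stacked = np.corrcoef(np.concatenate([x, y], axis=0))

print(with_y.shape, np.allclose(with_y, stacked))
```

A caller who needs the two-argument form can therefore concatenate the inputs first and pass the result as the single `x` argument.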

    """

    A correlation coefficient matrix indicates the correlation of each pair of variables in the input matrix.
    For example, for N-dimensional samples X=[x1,x2,…xN]T, the element Rij of the correlation coefficient
    matrix is the correlation of xi and xj. The element Rii is the correlation of xi with itself, which is 1.

[Review comment] Contributor:
Are the Chinese and English docs inconsistent? Their content must match, and it must not be copied verbatim from NumPy.


    The relationship between the correlation coefficient matrix `R` and the
    covariance matrix `C` is

    .. math:: R_{ij} = \\frac{ C_{ij} } { \\sqrt{ C_{ii} * C_{jj} } }

    The values of `R` are between -1 and 1.

    Parameters:

        x(Tensor): An N-D (1<=N<=2) Tensor containing multiple variables and observations. By default, each row of x represents a variable. Also see rowvar below.
        rowvar(Bool, optional): If rowvar is True (default), then each row represents a variable, with observations in the columns. If False, each column represents a variable, with observations in the rows. Default: True.
        name(str, optional): Name of the output. Default is None. It's used to print debug info for developers. Details: :ref:`api_guide_Name`.

    Returns:

        The correlation coefficient matrix of the variables.

    Examples:

[Review comment] Contributor:
(image)
This raises an error; please check it again.

        .. code-block:: python
            :name: code-example1

            import paddle

            xt = paddle.rand((3,4))
            print(paddle.linalg.corrcoef(xt))

            # Tensor(shape=[3, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
            # [[ 1. , -0.73702252, 0.66228950],
            # [-0.73702258, 1. , -0.77104872],
            # [ 0.66228974, -0.77104825, 1. ]])

    """
    if len(x.shape) > 2 or len(x.shape) < 1:
        raise ValueError(
            "Input(x) only support N-D (1<=N<=2) tensor in corrcoef, but received "
            "length of Input(input) is %s." % len(x.shape))
    check_variable_and_dtype(x, 'dtype', ['float32', 'float64'], 'corrcoef')

    c = cov(x, rowvar)
    if (c.ndim == 0):
        # scalar covariance
        # nan if incorrect value (nan, inf, 0), 1 otherwise
        return c / c

    d = paddle.diag(c)

    if paddle.is_complex(d):
        d = d.real()
    stddev = paddle.sqrt(d)
    c /= stddev[:, None]
    c /= stddev[None, :]

    # Clip to [-1, 1]. This does not guarantee abs(c[i, j]) <= 1 for complex values.
    if paddle.is_complex(c):
        return paddle.complex(
            paddle.clip(c.real(), -1, 1), paddle.clip(c.imag(), -1, 1))
    else:
        c = paddle.clip(c, -1, 1)

    return c
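
To make the relationship `R_ij = C_ij / sqrt(C_ii * C_jj)` concrete, the sketch below follows the same steps as the function above (covariance, diagonal, division by the standard deviations, clipping) using NumPy only. The name `corrcoef_from_cov` and the sample data are assumptions made for this example; this is not the Paddle implementation, and it assumes a real-valued 2-D input.

```python
import numpy as np


def corrcoef_from_cov(x, rowvar=True):
    """Correlation matrix via R_ij = C_ij / sqrt(C_ii * C_jj).

    Illustrative NumPy sketch only; assumes a real-valued 2-D input.
    """
    c = np.cov(x, rowvar=rowvar)           # covariance matrix C
    stddev = np.sqrt(np.diag(c))           # sqrt(C_ii) for each variable
    r = c / stddev[:, None] / stddev[None, :]
    return np.clip(r, -1.0, 1.0)           # guard against rounding past +/-1


data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [2.0, 4.0, 6.0, 8.5],
                 [4.0, 3.0, 2.0, 1.0]])

r = corrcoef_from_cov(data)                      # rows are variables
r_cols = corrcoef_from_cov(data, rowvar=False)   # columns are variables
print(r.shape, r_cols.shape)  # prints: (3, 3) (4, 4)
```

With `rowvar=True` the three rows are treated as variables, giving a 3x3 matrix; with `rowvar=False` the four columns are the variables, giving 4x4. The first and third rows are exact negative linear functions of each other, so their correlation is -1.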