Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

【Hackathon 4 No.9】Add pca_lowrank API to Paddle #53743

Merged
merged 26 commits into from
Jun 13, 2023

Conversation

Patrick-Star125
Copy link
Contributor

@Patrick-Star125 Patrick-Star125 commented May 12, 2023

PR types

New features

PR changes

APIs

Description

Add pca_lowrank API to Paddle

Rfc PR: PaddlePaddle/community#474

待完成

@paddle-bot
Copy link

paddle-bot bot commented May 12, 2023

你的PR提交成功,感谢你对开源项目的贡献!
请关注后续CI自动化测试结果,详情请参考Paddle-CI手册
Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot bot added contributor External developers status: proposed labels May 12, 2023
@paddle-bot
Copy link

paddle-bot bot commented May 12, 2023

❌ The PR is not created using PR's template. You can refer to this Demo.
Please use PR's template, it helps save our maintainers' time so that more developers get helped.

@Patrick-Star125
Copy link
Contributor Author

Patrick-Star125 commented May 14, 2023

我对于paddle的稀疏算子的使用似乎不规范,对于当前sparse::matmul的报错,在线上CI线下Linux平台测试中的报错不相同。

线上的报错为:

NotFoundError: The kernel `matmul_coo_dense` is not registered.

线下的报错为:

On entry to cusparseCreateCoo(): dimension mismatch, nnz (6) > matrix size (4)

线下python端的报错更详细一些

 ** On entry to cusparseCreateCoo() dimension mismatch: nnz > rows * cols
 ** On entry to cusparseSpMM_bufferSize() parameter number 5 (matA) had an illegal value: already destroyed
 ** On entry to cusparseSpMM() parameter number 5 (matA) had an illegal value: already destroyed
 ** On entry to cusparseDestroySpMat() parameter number 1 (spMatDescr) had an illegal value: already destroyed

我想知道问题在于sparse matmul算子使用还是sparse tensor的创建。nnz > rows * cols是什么含义

@paddle-ci-bot
Copy link

paddle-ci-bot bot commented May 21, 2023

Sorry to inform you that 485662d's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@zkh2016
Copy link
Contributor

zkh2016 commented May 23, 2023

nnz > rows * cols是什么含义

这个意思是说矩阵的非零元素的个数比 rows * cols大,rows是矩阵的行数,cols是矩阵的列数。

要不在创建sparse tensor和调用sparse matmul的前后加点信息看看哪里开始有异常?

@Patrick-Star125
Copy link
Contributor Author

Patrick-Star125 commented May 29, 2023

这个问题的来源是当前paddle.sparse.sum API在某些情况下出现会出现非零元素数量多于tensor元素数量的情况,我不太确定这是否正常,下面是复现代码

# 原始数据 Paddle
Tensor(shape=[17, 4], dtype=paddle.float64, place=Place(gpu:0), stop_gradient=True, 
       indices=[[0, 1, 2, 3, 4, 5],
                [1, 0, 0, 3, 1, 2]], 
       values=[-0.35408317,  0.34116612,  0.01369466, -0.04936034,  0.00005647,
               -0.00023232])
# 原始数据 Pytorch
tensor(indices=tensor([[0, 1, 2, 3, 4, 5],
                       [1, 0, 0, 3, 1, 2]]),
       values=tensor([-3.5408e-01,  3.4117e-01,  1.3695e-02, -4.9360e-02,
                       5.6474e-05, -2.3232e-04]),
       size=(17, 4), nnz=6, dtype=torch.float64, layout=torch.sparse_coo)
paddle.sparse.sum(x, axis=-2) # Paddle结果
Tensor(shape=[4], dtype=paddle.float64, place=Place(gpu:0), stop_gradient=True, 
       indices=[[1, 0, 0, 3, 1, 2]], 
       values=[-0.35402670,  0.35486078,  0.35486078, -0.04936034, -0.35402670,
               -0.00023232])

torch.sparse.sum(s_t_x, dim=-2) # Pytorch结果
tensor(indices=tensor([[0, 1, 2, 3]]),
       values=tensor([ 3.5486e-01, -3.5403e-01, -2.3232e-04, -4.9360e-02]),
       size=(4,), nnz=4, dtype=torch.float64, layout=torch.sparse_coo)

因此sum的结果在sparse.matmul计算时会得到nnz > rows * cols的警告,再深入打印可以发现

paddle.sparse.matmul(C_t, ones_m1_t)paddle.matmul(C_t.to_dense(), ones_m1_t)的结果也出现了不一致的问题

@zkh2016
Copy link
Contributor

zkh2016 commented May 29, 2023

这个问题的来源是当前paddle.sparse.sum API在某些情况下出现会出现非零元素数量多于tensor元素数量的情况,我不太确定这是否正常,下面是复现代码

# 原始数据 Paddle
Tensor(shape=[17, 4], dtype=paddle.float64, place=Place(gpu:0), stop_gradient=True, 
       indices=[[0, 1, 2, 3, 4, 5],
                [1, 0, 0, 3, 1, 2]], 
       values=[-0.35408317,  0.34116612,  0.01369466, -0.04936034,  0.00005647,
               -0.00023232])
# 原始数据 Pytorch
tensor(indices=tensor([[0, 1, 2, 3, 4, 5],
                       [1, 0, 0, 3, 1, 2]]),
       values=tensor([-3.5408e-01,  3.4117e-01,  1.3695e-02, -4.9360e-02,
                       5.6474e-05, -2.3232e-04]),
       size=(17, 4), nnz=6, dtype=torch.float64, layout=torch.sparse_coo)
paddle.sparse.sum(x, axis=-2) # Paddle结果
Tensor(shape=[4], dtype=paddle.float64, place=Place(gpu:0), stop_gradient=True, 
       indices=[[1, 0, 0, 3, 1, 2]], 
       values=[-0.35402670,  0.35486078,  0.35486078, -0.04936034, -0.35402670,
               -0.00023232])

torch.sparse.sum(s_t_x, dim=-2) # Pytorch结果
tensor(indices=tensor([[0, 1, 2, 3]]),
       values=tensor([ 3.5486e-01, -3.5403e-01, -2.3232e-04, -4.9360e-02]),
       size=(4,), nnz=4, dtype=torch.float64, layout=torch.sparse_coo)

因此sum的结果在sparse.matmul计算时会得到nnz > rows * cols的警告,再深入打印可以发现

paddle.sparse.matmul(C_t, ones_m1_t)paddle.matmul(C_t.to_dense(), ones_m1_t)的结果也出现了不一致的问题

@zrr1999 看起来是sparse.sum的问题,能否定位下?

@zrr1999
Copy link
Member

zrr1999 commented May 29, 2023

这个问题的来源是当前paddle.sparse.sum API在某些情况下出现会出现非零元素数量多于tensor元素数量的情况,我不太确定这是否正常,下面是复现代码

# 原始数据 Paddle
Tensor(shape=[17, 4], dtype=paddle.float64, place=Place(gpu:0), stop_gradient=True, 
       indices=[[0, 1, 2, 3, 4, 5],
                [1, 0, 0, 3, 1, 2]], 
       values=[-0.35408317,  0.34116612,  0.01369466, -0.04936034,  0.00005647,
               -0.00023232])
# 原始数据 Pytorch
tensor(indices=tensor([[0, 1, 2, 3, 4, 5],
                       [1, 0, 0, 3, 1, 2]]),
       values=tensor([-3.5408e-01,  3.4117e-01,  1.3695e-02, -4.9360e-02,
                       5.6474e-05, -2.3232e-04]),
       size=(17, 4), nnz=6, dtype=torch.float64, layout=torch.sparse_coo)
paddle.sparse.sum(x, axis=-2) # Paddle结果
Tensor(shape=[4], dtype=paddle.float64, place=Place(gpu:0), stop_gradient=True, 
       indices=[[1, 0, 0, 3, 1, 2]], 
       values=[-0.35402670,  0.35486078,  0.35486078, -0.04936034, -0.35402670,
               -0.00023232])

torch.sparse.sum(s_t_x, dim=-2) # Pytorch结果
tensor(indices=tensor([[0, 1, 2, 3]]),
       values=tensor([ 3.5486e-01, -3.5403e-01, -2.3232e-04, -4.9360e-02]),
       size=(4,), nnz=4, dtype=torch.float64, layout=torch.sparse_coo)

因此sum的结果在sparse.matmul计算时会得到nnz > rows * cols的警告,再深入打印可以发现
paddle.sparse.matmul(C_t, ones_m1_t)paddle.matmul(C_t.to_dense(), ones_m1_t)的结果也出现了不一致的问题

@zrr1999 看起来是sparse.sum的问题,能否定位下?

好的,我去查一下

@Patrick-Star125
Copy link
Contributor Author

Patrick-Star125 commented May 30, 2023

将paddle.sparse.matmul绕过后单测可以通过,能麻烦对代码要更改的点都review一下吗

@luotao1 luotao1 added the API label May 31, 2023
@zhengqiwen1997
Copy link
Contributor

请把PR-CI-Codestyle-Check的流水线修过了,其他的我看没问题了。
image

@luotao1
Copy link
Contributor

luotao1 commented May 31, 2023

同时需要补充单测过一下coverage流水线

@Patrick-Star125
Copy link
Contributor Author

是否需要在spare.pca_lowrank代码中添加CUDA11.0的限制

@zhengqiwen1997
Copy link
Contributor

是否需要在spare.pca_lowrank代码中添加CUDA11.0的限制

不需要。windows流水线重新run一下

positive number. The data type of x should be float32 or float64.
q (int, optional): a slightly overestimated rank of :math:`X`.
Default value is :math:`q=min(6,N,M)`.
center (bool, optional): if True, center the input tensor, otherwise,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if True, center the input tensor, otherwise,Default value is True. otherwise后面是不是少了一段话?不太通顺。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@zhengqiwen1997
Copy link
Contributor

static的CI没通过,看起来是因为sparse API的文档代码跑到了 return _C_ops.sparse_matmul(x, y),而这个必须cuda11.X支持。看起来需要在spare.pca_lowrank代码中添加CUDA11.0的限制来通过static的CI。

@Patrick-Star125
Copy link
Contributor Author

我在示例代码和计算代码中都添加了CUDA11限制,请问保留哪一处比较合适?

@zhengqiwen1997
Copy link
Contributor

两处都加限制吧;我看示例代码加了限制,但static的CI还是失败了。是因为你用 os.environ["CUDA_VISIBLE_DEVICES"] = "" 隐藏GPU了,但是此CI的cuda版本大于11.0,所以跑错分支了,可以删除os的限制吧。

@Patrick-Star125
Copy link
Contributor Author

这里OS的限制是自动加的,好像是通过这种方式限制gpu在cpu模式下运行,源码在tools\sampcd_processor.py

os.environ["CUDA_VISIBLE_DEVICES"] = ""可以让CI在有GPU的前提下使用CPU运行代码,但是方法paddle.version.cuda()依然可以看到cuda存在,所以在这里没法绕过sparse.pca_lowrank的调用

如果使用os.environ["CUDA_VISIBLE_DEVICES"]的值来判断cuda存在,在用户端会产生报错(我自己测的),这段example code就不可用了

我的想法是简单使用注释的方式解释,如下所示,这样CI能通过,用户也能看懂

    print("sparse.pca_lowrank API only support CUDA 11.x")
    U, S, V = None, None, None
    # U, S, V = pca_lowrank(sparse_x)

@zhengqiwen1997
Copy link
Contributor

好的,麻烦@sunzhongkai588 看看这样写文档代码可以吗?

@sunzhongkai588
Copy link
Contributor

这里OS的限制是自动加的,好像是通过这种方式限制gpu在cpu模式下运行,源码在tools\sampcd_processor.py

os.environ["CUDA_VISIBLE_DEVICES"] = ""可以让CI在有GPU的前提下使用CPU运行代码,但是方法paddle.version.cuda()依然可以看到cuda存在,所以在这里没法绕过sparse.pca_lowrank的调用

如果使用os.environ["CUDA_VISIBLE_DEVICES"]的值来判断cuda存在,在用户端会产生报错(我自己测的),这段example code就不可用了

我的想法是简单使用注释的方式解释,如下所示,这样CI能通过,用户也能看懂

    print("sparse.pca_lowrank API only support CUDA 11.x")
    U, S, V = None, None, None
    # U, S, V = pca_lowrank(sparse_x)
    print("sparse.pca_lowrank API only support CUDA 11.x")
    U, S, V = None, None, None
    # U, S, V = pca_lowrank(sparse_x)

@Patrick-Star125 注释这行,可以简单写清楚是什么情况下用。
另外,ci好像还是没有过

@Patrick-Star125
Copy link
Contributor Author

简单加了一行解释,static的CI过了

Copy link
Contributor

@sunzhongkai588 sunzhongkai588 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for docs

sunzhongkai588
sunzhongkai588 previously approved these changes Jun 7, 2023
Copy link
Contributor

@sunzhongkai588 sunzhongkai588 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for docs

Copy link
Contributor

@zhengqiwen1997 zhengqiwen1997 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coverage的CI可以豁免,因为sparse的功能需要cuda11.X。

Comment on lines +60 to +62
x = paddle.sparse.sparse_coo_tensor(
indices_tensor, values, (rows, columns)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we also need to test the sparse_csr_tensor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API don't support sparse_csr_tensor due to sparse.sum with axis(-2) has not been implemented.

(Unimplemented) axis of SumCsrKernel only support None or -1 now.More number will be supported in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a TODO to remind us to complement the test when sparse.sum with axis(-2) is implemented in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 999 to 1000
if x.is_sparse():
return paddle.sparse.matmul(x, B)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API requires sparse tensor as input(L1084) , so there is no need for this judgment and encapsulation anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some parts of code would use paddle.matmul such as L1057 matmul(M, Q_c)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if matmul(M, Q_c) use paddle.matmul then write it directly, I think it is easier to understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +1012 to +1013
if x.is_sparse():
return paddle.sparse.transpose(x, perm)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API requires sparse tensor as input(L1084) , so there is no need for this judgment anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some parts of code would use paddle.matmul such as L1127

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering saving calculation code of perm, this can be preserved

@jeff41404
Copy link
Contributor

RFC needs to be modified to match the code. e.g. API in RFC is paddle.pca_lowrank(A: Tensor, q: Optional[int] = None, center: bool = True, niter: int = 2), but code is paddle.pca_lowrank(**x**, q, center, niter, name); and paddle.sparse.pca_lowrank is not mentioned in RFC.

@Patrick-Star125
Copy link
Contributor Author

rfc modify pr has been submit PaddlePaddle/community#555

Copy link
Contributor

@jeff41404 jeff41404 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@luotao1 luotao1 merged commit 4ebb476 into PaddlePaddle:develop Jun 13, 2023
@Patrick-Star125 Patrick-Star125 deleted the pca branch June 18, 2023 07:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants