Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[linear-ridge-cholesky]使用 SPU 实现线性回归模型 #297

Merged
merged 9 commits into from
Aug 14, 2023

Conversation

magic-hya
Copy link
Contributor

@magic-hya magic-hya commented Aug 10, 2023

What problem does this PR solve?

I have read the CLA Document and I hereby sign the CLA
Issue Number: Fixed #274

使用cholesky分解法实现ridge,在spu镜像中完成emul和test单元测试

Possible side effects?

  • Performance:
    在diabetes数据集上,与sklearn的误差在0.01-0.02之间
  • Backward compatibility:

Signed-off-by: magic-hya <huangya@asiainfo.com>
@github-actions
Copy link

github-actions bot commented Aug 10, 2023

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

@Candicepan
Copy link
Contributor

辛苦按照上方提示签订 CLA 哈

Copy link
Collaborator

@tpppppub tpppppub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,除了 sml/linear_model 下的改动,.vscode 和 examples 下的请不要 commit

@magic-hya
Copy link
Contributor Author

Hi,除了 sml/linear_model 下的改动,.vscode 和 examples 下的请不要 commit

好的

@magic-hya
Copy link
Contributor Author

recheck

@magic-hya
Copy link
Contributor Author

magic-hya commented Aug 11, 2023

I have read the CLA Document and I hereby sign the CLA

@anakinxc
Copy link
Collaborator

recheck

@anakinxc
Copy link
Collaborator

anakinxc commented Aug 11, 2023

Hi @magic-hya

Please use buildifier to format bazel files.
Also format python files with black formatter

Thanks

Copy link
Contributor

@deadlywing deadlywing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

另外,sml对目录结构做了一次重构,麻烦您按照现在的目录结构有序放置文件,并修改对应的BUILD.bazel文件

感谢支持~

name = "ridge",
srcs = ["ridge.py"],
deps = [
"//sml/utils:fxp_approx",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fxp_approx似乎没有用到,可以去掉这个deps

dot(X.T, X)
"""

def __init__(self, alpha=1.0, solver="lsqr") -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

还需要支持是否拟合bias项的选项

if y.ndim == 1:
y = y.reshape(-1, 1)
alpha = jnp.asarray(self.alpha, dtype=x.dtype).ravel()
print(f"<<<solver: {self.solver}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

算法主逻辑里不要加print语句吧


x1, x2, y = dsutil.load_dataset_by_config(dataset_config)
result = spsim.sim_jax(sim, proc)(x1, x2, y)
print(result[:10])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

单元测试需要比较明文sklearn的结果~


# run
result = emulator.run(proc)(x1, x2, y)
print(result[:10])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

麻烦比较一下明文sklearn的结果,并注明一下误差大小~



class Solver(Enum):
SVD = 'svd' # not supported
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

当前utils的extmath.py中有svd的实现,但是在定点代数下发现比较容易溢出,如果您有兴趣可以调用里面的api实现

Xy = jnp.dot(x.T, y)

for i in range(n_features):
A = A.at[i, i].set(A[i][i] + alpha[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

直接加一个对角阵即可,无需循环赋值~

Signed-off-by: magic-hya <huangya@asiainfo.com>
Signed-off-by: magic-hya <huangya@asiainfo.com>
Signed-off-by: magic-hya <huangya@asiainfo.com>
@magic-hya
Copy link
Contributor Author

1.添加了fit_bias选项
2.优化了矩阵计算
3.格式化了bazel和py代码
4.增加了sklearn算法对比测试,误差在0.005以下
测试效果如下:

[sklearn_result]---------------------------------------------
[199.37653   73.73293  172.66196  161.55786  128.74493  103.354576
  83.321686 128.3768   160.04437  206.40292 ]
[spsim_result]-----------------------------------------------  
[199.3735    73.72925  172.65936  161.55386  128.74008  103.35206
  83.318665 128.37201  160.04198  206.40111 ]
[absolute_error]---------------------------------------------  
[0.00302124 0.003685   0.00259399 0.0039978  0.00485229 0.0025177
 0.00302124 0.00479126 0.00239563 0.0018158  0.00493622 0.00495148
 0.00419617 0.00210571 0.00326538 0.00457764 0.00273132 0.00369263
 0.00405884 0.00289154]

@magic-hya magic-hya requested a review from deadlywing August 13, 2023 07:39
Copy link
Contributor

@deadlywing deadlywing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

另外,麻烦确保commit的代码已经:
使用black格式化过python代码,以及isort重排import语句


from scipy import linalg
from enum import Enum
from scipy import linalg, sparse
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleanup一下import语句

"""
if y.ndim == 1:
y = y.reshape(-1, 1)
alpha = jnp.asarray(self.alpha, dtype=x.dtype).ravel()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

似乎不必要转化为array?或许可以直接转化为float类型



class UnitTests(unittest.TestCase):
def test_simple(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

测试函数名可以改为:test_ridge,明确一下测试函数的语意

Signed-off-by: magic-hya <huangya@asiainfo.com>
@magic-hya
Copy link
Contributor Author

1.代码已使用black+isort格式化,请检查
2.alpha直接使用float类型,后续需要做多变量回归再做调整
3.测试函数名已修正
4.重新执行了测试用例,修改结果精度显示
测试效果如下:

Executing tests from //sml/linear_model/tests:ridge_test
-----------------------------------------------------------------------------
No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
.
----------------------------------------------------------------------
Ran 1 test in 0.838s

OK
[sklearn_result]---------------------------------------------
[199.37653   73.73293  172.66196  161.55786  128.74493  103.354576
  83.321686 128.3768   160.04437  206.40292 ]
[spsim_result]-----------------------------------------------
[199.37305   73.728676 172.6597   161.5527   128.73994  103.349915
  83.31765  128.37166  160.04211  206.39943 ]
[absolute_error]---------------------------------------------
[0.00348 0.00426 0.00226 0.00516 0.00499 0.00466 0.00404 0.00514 0.00226
 0.00349 0.00556 0.00314 0.00468 0.00363 0.00404 0.0065  0.00435 0.00391
 0.0041  0.00439]

@magic-hya magic-hya requested a review from deadlywing August 14, 2023 08:21
Copy link
Contributor

@deadlywing deadlywing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@magic-hya magic-hya requested a review from tpppppub August 14, 2023 09:05
@deadlywing deadlywing merged commit 9857c82 into secretflow:main Aug 14, 2023
@github-actions github-actions bot locked and limited conversation to collaborators Aug 14, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[linear-ridge-svd]使用 SPU 实现线性回归模型
5 participants