[Kernel] Optimize p_norm gpu #69660

HydrogenSulfate · 2024-11-24T12:27:51Z

PR Category

Performance Optimization

PR Types

Improvements

Description

Pcard-75624

Optimize the p_norm GPU functor for the cases p=1, 2, and 3.
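To illustrate the idea of special-casing small integer orders, here is a minimal host-side C++ sketch. It is not the code from this PR: the functor names (`AbsPow1`, `AbsPow2`, `AbsPow3`, `AbsPowGeneric`) and the helpers `ReduceSum` and `SumAbsPow` are hypothetical, and a plain loop stands in for the GPU reduction. It assumes the saving comes from replacing a general power call with an absolute value or a few multiplications when p is 1, 2, or 3.

```cpp
#include <cmath>
#include <vector>

// Hypothetical element-wise functors (illustrative names, not Paddle's).
// Each maps x to |x|^p without calling std::pow.
struct AbsPow1 {  // |x|^1
  double operator()(double x) const { return std::fabs(x); }
};
struct AbsPow2 {  // |x|^2 == x * x
  double operator()(double x) const { return x * x; }
};
struct AbsPow3 {  // |x|^3
  double operator()(double x) const {
    const double a = std::fabs(x);
    return a * a * a;
  }
};
// Generic fallback for any other order p.
struct AbsPowGeneric {
  double p;
  double operator()(double x) const { return std::pow(std::fabs(x), p); }
};

// Sums f(x) over all elements; a stand-in for the GPU reduction.
template <typename F>
double ReduceSum(const std::vector<double>& xs, F f) {
  double acc = 0.0;
  for (double x : xs) acc += f(x);
  return acc;
}

// Computes sum(|x|^p), picking a specialized functor when p is 1, 2, or 3.
double SumAbsPow(const std::vector<double>& xs, double p) {
  if (p == 1.0) return ReduceSum(xs, AbsPow1{});
  if (p == 2.0) return ReduceSum(xs, AbsPow2{});
  if (p == 3.0) return ReduceSum(xs, AbsPow3{});
  return ReduceSum(xs, AbsPowGeneric{p});
}
```

The dispatch happens once per call rather than once per element, so the inner loop for the common orders contains no branch and no transcendental function.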

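Before optimization (elapsed time)

| Norm order | N=10 | N=100 | N=1000 | N=10000 |
|---|---|---|---|---|
| p=1 | 3.09E-02 | 4.19E-02 | 4.99E-02 | 1.21E+00 |
| p=2 | 3.11E-02 | 4.18E-02 | 4.96E-02 | 1.24E+00 |
| p=3 | 3.08E-02 | 4.16E-02 | 4.94E-02 | 1.21E+00 |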

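After optimization (elapsed time)

| Norm order | N=10 | N=100 | N=1000 | N=10000 |
|---|---|---|---|---|
| p=1 | 2.62E-02 | 3.24E-02 | 3.29E-02 | 5.12E-01 |
| p=2 | 3.11E-02 | 3.70E-02 | 3.73E-02 | 4.94E-01 |
| p=3 | 3.08E-02 | 3.64E-02 | 3.71E-02 | 4.95E-01 |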

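Reduction in elapsed time

| Norm order | N=10 | N=100 | N=1000 | N=10000 |
|---|---|---|---|---|
| p=1 | 15% | 23% | 34% | 58% |
| p=2 | 0% | 11% | 25% | 60% |
| p=3 | 0% | 13% | 25% | 59% |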
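The reduction figures appear to follow from the two timing tables as 1 - after/before, rounded to a whole percent. For p=1 at N=10000, for instance, 1 - 0.512/1.21 ≈ 0.577, which matches the reported 58%. A small sketch of that arithmetic, where the helper name `ReductionPercent` is hypothetical and the rounding rule is an assumption:

```cpp
#include <cmath>

// Percentage of elapsed time saved, rounded to the nearest integer.
// Assumes before > 0; both arguments must use the same time unit.
int ReductionPercent(double before, double after) {
  return static_cast<int>(std::lround((1.0 - after / before) * 100.0));
}
```

Cells where the timing did not change (p=2 and p=3 at N=10) come out as 0% under the same formula.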

paddle-bot · 2024-11-24T12:27:56Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

zyfncg · 2024-11-25T02:47:54Z

paddle/phi/kernels/gpu/p_norm_kernel.cu

-    phi::funcs::ElementwiseKernel<T>(
-        dev_ctx, ins, &outs, UnsignedPowFunctor<T>(1. / porder));
+    if (porder != 1.0) {
+      // save computation when porder is 1.0


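The comment "porder is 1.0" does not match the condition porder != 1.0.

HydrogenSulfate · 2024-11-25

The code itself should be fine: when porder is 1.0, the power operation below does not need to run. I will move the comment above the condition in a follow-up.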
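The point under discussion can be shown with a short host-side C++ sketch of the final step, with the comment placed above the condition as the reply proposes. The helper name `FinalizeNorm` and its signature are hypothetical; the actual kernel applies `UnsignedPowFunctor` through `phi::funcs::ElementwiseKernel`, as the diff above shows.

```cpp
#include <cmath>

// Hypothetical helper: turns the reduced sum of |x|^p into the p-norm.
double FinalizeNorm(double sum_abs_pow, double porder) {
  // Save computation when porder is 1.0: the 1/p power is the identity,
  // so the power step only runs for other orders.
  if (porder != 1.0) {
    return std::pow(sum_abs_pow, 1.0 / porder);
  }
  return sum_abs_pow;
}
```

With the comment above the branch, it describes why the guard exists rather than appearing to describe the body that runs when porder is not 1.0.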

Summary of this PR:

1. upload DPA-1 related code
2. merge a large amount of code from develop
3. add all eager composite operators except `softmax_grad`, `p_norm_grad`, `split_grad`, and `concat_grad` to the composite operator blacklist (<https://github.com/deepmodeling/deepmd-kit/pull/4414/files#diff-e678abb052b278f8a479f8d13b839a9ec0effd9923478a850bc13758f918e1e9R134-R148>) to significantly improve model execution speed (reducing the time taken from 100% more than PyTorch to about 10% to 15% more).

Related PR: lanpa/tensorboardX#728

### Training curve:

![training_curves_comparison_eager_opt](https://github.com/user-attachments/assets/3b71fc99-5abf-4353-a61a-38737d3c7f2c)

### Accuracy test (left: paddle, right: torch):

![image](https://github.com/user-attachments/assets/a42b4bfd-c0f8-4eb8-85eb-ff1adf981dbb)

Related optimization of Paddle framework:

- [x] PaddlePaddle/Paddle#69349
- [x] PaddlePaddle/Paddle#69333
- [x] PaddlePaddle/Paddle#69479
- [x] PaddlePaddle/Paddle#69515
- [x] PaddlePaddle/Paddle#69487
- [x] PaddlePaddle/Paddle#69661
- [x] PaddlePaddle/Paddle#69660
- [x] PaddlePaddle/Paddle#69596
- [x] PaddlePaddle/Paddle#69556

## Summary by CodeRabbit

## Release Notes

- **New Features**
  - Introduced several new classes for molecular descriptors, including `DescrptDPA1`, `DescrptBlockSeAtten`, and `LayerNorm`, enhancing the modeling capabilities for molecular simulations.
  - Added new JSON configuration files for model parameters and multitask models related to water simulations.
  - Implemented new test classes for validating the functionality of the `DPAtomicModel` and various descriptor classes.
  - Added new test classes for evaluating denoising models, including `TestDenoiseModelDPA1` and `TestDenoiseModelDPA2`.
  - Enhanced the `ModelWrapper` class to clarify the handling of model parameters and state management.
- **Bug Fixes**
  - Improved internal logic for handling model state saving and loading, ensuring consistency in outputs.
- **Documentation**
  - Enhanced type hints and return annotations across various classes and methods for better clarity.
- **Tests**
  - Expanded the testing framework with new test cases for denoising models and descriptor functionalities, ensuring robust validation of features.
  - Activated previously skipped tests for energy models, improving test coverage.
  - Enhanced multitask training tests with new configuration handling and test classes.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Support DPA-2 in paddle backend. This PR will be updated after #4414 is merged.

### Training curve:

![training_curves_comparison_dpa2](https://github.com/user-attachments/assets/29bdeffa-cf2d-4586-afcf-7df0569997c3)

### Accuracy test (left: paddle, right: torch):

![image](https://github.com/user-attachments/assets/5bff55f3-1c39-4b95-93f0-68783e794716)

Related optimization of Paddle framework:

- [x] PaddlePaddle/Paddle#69349
- [x] PaddlePaddle/Paddle#69333
- [x] PaddlePaddle/Paddle#69479
- [x] PaddlePaddle/Paddle#69515
- [x] PaddlePaddle/Paddle#69487
- [x] PaddlePaddle/Paddle#69661
- [x] PaddlePaddle/Paddle#69660
- [x] PaddlePaddle/Paddle#69596
- [x] PaddlePaddle/Paddle#69556

## Summary by CodeRabbit

- **New Features**
  - Introduced new classes for molecular descriptors: `DescrptDPA2`, `DescrptBlockRepformers`, `DescrptSeTTebd`, and `DescrptBlockSeTTebd`.
  - Added new functions for tensor operations and descriptor management, enhancing the capabilities of the module.
  - Updated JSON configurations for multitask models to refine selection criteria and data paths.
- **Bug Fixes**
  - Improved error handling and parameter validation across various descriptor classes.
- **Documentation**
  - Enhanced test coverage for new descriptor functionalities and configurations.
- **Tests**
  - Added new test classes to validate the functionality of `DescrptDPA2` and multitask training scenarios.
  - Expanded test capabilities for descriptor classes based on installed dependencies.
  - Updated existing tests to support new configurations and functionalities.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

HydrogenSulfate added 3 commits November 23, 2024 23:52

optimize p_norm gpu impl

8bbf48e

upload missing code

c41657a

Merge branch 'develop' into optimize_p_norm_gpu

6d52a00

zyfncg approved these changes Nov 25, 2024

HydrogenSulfate closed this Nov 25, 2024

HydrogenSulfate reopened this Nov 25, 2024

HydrogenSulfate merged commit a291887 into PaddlePaddle:develop Nov 25, 2024
27 of 28 checks passed

HydrogenSulfate deleted the optimize_p_norm_gpu branch November 25, 2024 03:06

This was referenced Nov 25, 2024

pd: support dpa1 deepmodeling/deepmd-kit#4414

Merged

pd: support dpa2 deepmodeling/deepmd-kit#4418

Merged

HydrogenSulfate mentioned this pull request Dec 4, 2024

[Win] Restore slow code in windows #69948

Merged

LiYuRio pushed a commit to LiYuRio/Paddle that referenced this pull request Dec 18, 2024

[Kernel] Optimize p_norm gpu (PaddlePaddle#69660)

9bd1aa5

* optimize p_norm gpu impl * upload missing code

LiYuRio mentioned this pull request Dec 18, 2024

[Kernel] Optimize p_norm gpu (#69660) #70312

Merged

ForFishes pushed a commit that referenced this pull request Dec 19, 2024

[Kernel] Optimize p_norm gpu (#69660) (#70312)

fc56f38

* optimize p_norm gpu impl * upload missing code Co-authored-by: HydrogenSulfate <490868991@qq.com>
