
Fix test_fused_dropout_act_bias failure on H100 #47285

Merged: zkh2016 merged 1 commit into PaddlePaddle:develop from Wong4j:fix_fused_dropout_act_bias on Oct 27, 2022

Conversation

@Wong4j (Collaborator) commented Oct 24, 2022

PR types

Bug fixes

PR changes

Others

Describe

The test_fused_dropout_act_bias unit test passes on A100 but fails on H100. Adding __launch_bounds__(THREADS_PER_CTA) to the FusedDropoutActBiasGrad kernel fixes the failure.
I have already reported this bug internally. My guess is that nvcc does not allocate per-thread resources properly when compiling for Hopper, so the kernel requests more resources than a THREADS_PER_CTA-thread block can get at launch. The same failure may also occur on H800, so I am filing this PR.
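For readers unfamiliar with `__launch_bounds__`, here is a minimal sketch of how the qualifier attaches to a templated kernel and how the launch has to respect it. Everything in it is a hypothetical illustration, not Paddle's actual `FusedDropoutActBiasGrad`: the kernel `DropoutActBiasGradSketch`, its launcher, the `gelu_grad` helper, the omitted dbias reduction, and the value 512 for `THREADS_PER_CTA` are all assumptions.

```cuda
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical block size for this sketch; the real THREADS_PER_CTA may differ.
constexpr int THREADS_PER_CTA = 512;

// Derivative of exact GELU: d/dx [x * Phi(x)] = Phi(x) + x * phi(x).
template <typename T>
__host__ __device__ T gelu_grad(T x) {
  const T inv_sqrt2 = static_cast<T>(0.70710678118654752440);
  const T inv_sqrt_2pi = static_cast<T>(0.39894228040143267794);
  const T cdf = static_cast<T>(0.5) * (static_cast<T>(1) + erf(x * inv_sqrt2));
  const T pdf = inv_sqrt_2pi * exp(static_cast<T>(-0.5) * x * x);
  return cdf + x * pdf;
}

// Hypothetical, simplified dropout + bias + GELU backward kernel (no dbias
// reduction). __launch_bounds__(THREADS_PER_CTA) tells nvcc the kernel is never
// launched with more than THREADS_PER_CTA threads per block, so it must keep
// per-thread register usage low enough for such a block to fit on one SM.
template <typename T>
__global__ void __launch_bounds__(THREADS_PER_CTA)
DropoutActBiasGradSketch(const T* dout, const uint8_t* mask, const T* src,
                         const T* bias, T factor, int64_t rows, int64_t cols,
                         T* dx) {
  const int64_t n = rows * cols;
  const int64_t stride = static_cast<int64_t>(gridDim.x) * blockDim.x;
  for (int64_t i = static_cast<int64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
       i < n; i += stride) {
    // Backprop through dropout (mask * factor), then through GELU(src + bias).
    const T g = dout[i] * static_cast<T>(mask[i]) * factor;
    dx[i] = g * gelu_grad(src[i] + bias[i % cols]);
  }
}

// Host launcher: the block size must not exceed the __launch_bounds__ value.
template <typename T>
cudaError_t LaunchDropoutActBiasGradSketch(const T* dout, const uint8_t* mask,
                                           const T* src, const T* bias,
                                           T factor, int64_t rows, int64_t cols,
                                           T* dx, cudaStream_t stream) {
  const int64_t n = rows * cols;
  if (n == 0) return cudaSuccess;  // a zero-block launch would be an error
  const int64_t blocks =
      std::min<int64_t>((n + THREADS_PER_CTA - 1) / THREADS_PER_CTA, 4096);
  DropoutActBiasGradSketch<T>
      <<<static_cast<unsigned int>(blocks), THREADS_PER_CTA, 0, stream>>>(
          dout, mask, src, bias, factor, rows, cols, dx);
  // Surfaces launch failures such as cudaErrorLaunchOutOfResources (701).
  return cudaGetLastError();
}
```

Without the qualifier, nvcc is free to use more registers per thread than a THREADS_PER_CTA-thread block can afford. That is consistent with the launch failure shown in the compute-sanitizer log. Compiling with `-Xptxas -v` prints the per-thread register count, so you can confirm it stays within budget.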

Compute-sanitizer error message:

========= COMPUTE-SANITIZER
========= Program hit cudaErrorLaunchOutOfResources (error 701) due to "too many resources requested for launch" on CUDA API call to cudaLaunchKernel.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame: [0x454676]
=========                in /lib/x86_64-linux-gnu/libcuda.so.1
=========     Host Frame:cudaLaunchKernel [0x6d4a8]
=========                in /test-hopper/./test.out
=========     Host Frame:cudaError cudaLaunchKernel<char>(char const*, dim3, dim3, void**, unsigned long, CUstream_st*) [0xb777]
=========                in /test-hopper/./test.out
=========     Host Frame:__device_stub__Z23FusedDropoutActBiasGradIdEv15GeluGradFunctorIT_EPKS1_PKhS4_S4_S1_llPS1_S7_(GeluGradFunctor<double>&, double const*, unsigned char const*, double const*, double const*, double, long, long, double*, double*) [0xb5ad]  
=========                in /test-hopper/./test.out
=========     Host Frame:void __wrapper__device_stub_FusedDropoutActBiasGrad<double>(GeluGradFunctor<double>&, double const*&, unsigned char const*&, double const*&, double const*&, double const&, long const&, long const&, double*&, double*&) [0xb64c]        
=========                in /test-hopper/./test.out
=========     Host Frame:void FusedDropoutActBiasGrad<double>(GeluGradFunctor<double>, double const*, unsigned char const*, double const*, double const*, double, long, long, double*, double*) [0xb849]
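Error 701 means the block requested more of a per-SM resource than the hardware provides, and for a kernel like this the usual culprit is registers. The arithmetic behind it can be sketched with a simplified register-budget model. The model assumes the compute capability 8.0 (A100) and 9.0 (H100) limits from NVIDIA's occupancy tables: 65,536 registers per block, at most 255 per thread, and per-warp allocation in units of 256. The helper names are hypothetical, and the model ignores shared memory and other resources.

```cuda
// Simplified register-budget model; limits below assume compute capability
// 8.0 (A100) and 9.0 (H100). Other architectures may use different values.
constexpr int kWarpSize = 32;
constexpr int kMaxThreadsPerBlock = 1024;
constexpr int kMaxRegsPerBlock = 65536;  // 32-bit registers available to one block
constexpr int kMaxRegsPerThread = 255;   // hardware cap per thread
constexpr int kRegAllocUnit = 256;       // per-warp register allocation granularity

inline int RoundUp(int x, int unit) { return (x + unit - 1) / unit * unit; }

// Returns true if one block of `threads` threads, each using `regs_per_thread`
// registers, fits in the register file. When it does not, the launch fails with
// cudaErrorLaunchOutOfResources ("too many resources requested for launch").
inline bool FitsInRegisterFile(int threads, int regs_per_thread) {
  if (threads <= 0 || threads > kMaxThreadsPerBlock) return false;
  if (regs_per_thread <= 0 || regs_per_thread > kMaxRegsPerThread) return false;
  const int warps = (threads + kWarpSize - 1) / kWarpSize;
  const int regs_per_warp = RoundUp(regs_per_thread * kWarpSize, kRegAllocUnit);
  return warps * regs_per_warp <= kMaxRegsPerBlock;
}

// Largest per-thread register count for which a `threads`-thread block still
// fits; roughly the cap nvcc has to respect when a kernel is annotated with
// __launch_bounds__(threads). Returns 0 if no register count fits.
inline int MaxRegsPerThread(int threads) {
  for (int regs = kMaxRegsPerThread; regs > 0; --regs) {
    if (FitsInRegisterFile(threads, regs)) return regs;
  }
  return 0;
}
```

Under this model, a 512-thread block allows at most 128 registers per thread, and a 1024-thread block allows only 64. You can read a compiled kernel's actual register count from `-Xptxas -v` output, or from `cudaFuncAttributes::numRegs` via `cudaFuncGetAttributes`. The same struct's `maxThreadsPerBlock` field reports the largest block size that kernel can be launched with on the current device.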

@paddle-bot commented Oct 24, 2022

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@zkh2016 (Contributor) left a comment:

LGTM

@zkh2016 merged commit 13181fd into PaddlePaddle:develop on Oct 27, 2022
zlsh80826 pushed a commit to zlsh80826/Paddle that referenced this pull request Nov 23, 2022
qingqing01 pushed a commit that referenced this pull request Nov 28, 2022
* Reduce squeeze2_matmul_fuse_pass, flatten tests time (#47098)

* Add missing fp32 config and reduce the testing combination

* Reduce trt matmul pass test max examples

* Loosen TRT fp16 tests tolerance (#47100)

* Loosen TRT half test tolerance to 1e-3 (#47101)

* Loosen TRT half test tolerance to 1e-3 (#47106)

* Update distributed_strategy.proto (#46531)

* Close popen pipe after used (#47053)

* Add launch_bounds (#47285)

* Fix TRT UT failures (#47488)

* Format cherry-picked commits

* CudnnNormConvolution is no longer supported on NVIDIA Hopper GPUs (#48203)

* Skip tests that use fused_ops on H100

* Add error message to FusedOps on H100

Co-authored-by: Shijie <505749828@qq.com>
Co-authored-by: Leo Chen <39020268+leo0519@users.noreply.github.com>
Co-authored-by: Tian Zheng <tizheng@nvidia.com>
@Wong4j deleted the fix_fused_dropout_act_bias branch on February 14, 2023 05:37
Labels
contributor External developers NVIDIA