Bugfix, read-write race at fast_ln_fwd_kernel #56435

Merged

jeng1220
Collaborator

PR types

Bug fixes

PR changes

OPs

Description

Fix #56100 and turn on fast_ln_fwd_kernel

paddle-bot bot commented Aug 18, 2023

Your PR has been submitted. Thanks for your contribution to this open-source project!
Please wait for the CI results first. See the Paddle CI Manual for details.

@paddle-bot paddle-bot bot added contributor External developers status: proposed labels Aug 18, 2023
@@ -295,6 +295,7 @@ __global__ __launch_bounds__(THREADS_PER_CTA) void fast_ln_fwd_kernel(
}

if (WARPS_N > 1) {
__syncthreads();
Contributor

Is this sync needed, or is the one at line 302 enough?

Collaborator Author

@jeng1220 jeng1220 Aug 18, 2023

It is necessary. Without this sync, some threads can still be reading smem at L274 while others are already writing smem at L300.

compute-sanitizer also reports this read-write race.
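
The hazard described above can be sketched with a small standalone kernel. This is only an illustration, not the Paddle `fast_ln_fwd_kernel`: the names `warp_sum`, `block_sum`, `mean_var_kernel`, and `run_check`, the block size, and the test data are all hypothetical. It assumes the same general pattern, where one shared-memory buffer is reused for two consecutive block-wide reductions (mean, then variance), so a barrier is needed between the reads of the first reduction and the writes of the second.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kWarps = 4;              // hypothetical: warps cooperating on one row
constexpr int kThreads = kWarps * 32;  // threads per block, one element each

// Butterfly reduction: every lane of the warp ends up with the warp total.
__device__ float warp_sum(float v) {
  for (int offset = 16; offset > 0; offset >>= 1) {
    v += __shfl_xor_sync(0xffffffffu, v, offset);
  }
  return v;
}

// Block-wide sum through a shared buffer that the caller reuses across calls.
__device__ float block_sum(float v, float* smem) {
  const int warp = threadIdx.x / 32;
  const int lane = threadIdx.x % 32;
  v = warp_sum(v);
  if (lane == 0) smem[warp] = v;  // write phase
  __syncthreads();                // publish the writes before anyone reads
  float total = 0.f;
  for (int w = 0; w < kWarps; ++w) total += smem[w];  // read phase
  return total;
}

__global__ void mean_var_kernel(const float* x, float* mean, float* var) {
  __shared__ float smem[kWarps];
  const float xi = x[blockIdx.x * kThreads + threadIdx.x];
  const float mu = block_sum(xi, smem) / kThreads;
  // Without this barrier, a fast warp could enter the write phase of the
  // second reduction while a slow warp is still in the read phase of the first.
  __syncthreads();
  const float d = xi - mu;
  const float v = block_sum(d * d, smem) / kThreads;
  if (threadIdx.x == 0) {
    mean[blockIdx.x] = mu;
    var[blockIdx.x] = v;
  }
}

// Runs the kernel on synthetic data and compares it with a CPU reference.
bool run_check() {
  const int rows = 8;
  std::vector<float> h_x(rows * kThreads);
  for (int i = 0; i < rows * kThreads; ++i) {
    h_x[i] = static_cast<float>((i * 7 + i / kThreads) % 5);
  }
  float *d_x = nullptr, *d_mean = nullptr, *d_var = nullptr;
  cudaMalloc(&d_x, h_x.size() * sizeof(float));
  cudaMalloc(&d_mean, rows * sizeof(float));
  cudaMalloc(&d_var, rows * sizeof(float));
  cudaMemcpy(d_x, h_x.data(), h_x.size() * sizeof(float), cudaMemcpyHostToDevice);

  mean_var_kernel<<<rows, kThreads>>>(d_x, d_mean, d_var);
  bool ok = cudaGetLastError() == cudaSuccess &&
            cudaDeviceSynchronize() == cudaSuccess;

  std::vector<float> h_mean(rows), h_var(rows);
  ok = ok && cudaMemcpy(h_mean.data(), d_mean, rows * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
  ok = ok && cudaMemcpy(h_var.data(), d_var, rows * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
  cudaFree(d_x);
  cudaFree(d_mean);
  cudaFree(d_var);

  for (int r = 0; r < rows; ++r) {
    double mu = 0.0, v = 0.0;
    for (int c = 0; c < kThreads; ++c) mu += h_x[r * kThreads + c];
    mu /= kThreads;
    for (int c = 0; c < kThreads; ++c) {
      const double d = h_x[r * kThreads + c] - mu;
      v += d * d;
    }
    v /= kThreads;
    ok = ok && std::fabs(h_mean[r] - mu) < 1e-4 && std::fabs(h_var[r] - v) < 1e-4;
  }
  return ok;
}

int main() {
  const bool ok = run_check();
  std::printf("%s\n", ok ? "match" : "MISMATCH");
  return ok ? 0 : 1;
}
```

If the barrier between the two `block_sum` calls is removed, running the binary under `compute-sanitizer --tool racecheck` should flag a shared-memory hazard, although the exact report depends on the toolkit version. Note that the wrong result may show up only occasionally without the barrier, which would be consistent with a non-deterministic symptom.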

Contributor

@zhaoyinglia zhaoyinglia left a comment

LGTM
The unit test from #56100 passes.

@jeng1220
Collaborator Author

PR-CI-Coverage failed, but the failure is unrelated to this PR.

Log:

2023-08-18 18:36:15 [ 81%] Linking CXX executable prim_op_test
2023-08-18 18:37:05 collect2: fatal error: ld terminated with signal 9 [Killed]
2023-08-18 18:37:05 compilation terminated.
2023-08-18 18:37:05 paddle/fluid/distributed/fleet_executor/test/CMakeFiles/compute_interceptor_run_op_test.dir/build.make:537: recipe for target 'paddle/fluid/distributed/fleet_executor/test/compute_interceptor_run_op_test' failed
2023-08-18 18:37:05 make[2]: *** [paddle/fluid/distributed/fleet_executor/test/compute_interceptor_run_op_test] Error 1

@jeng1220
Collaborator Author

jeng1220 commented Aug 21, 2023

@zhiqiu

All CI pipelines have passed. This PR is ready to be merged.

@zhaoyinglia zhaoyinglia merged commit 1f987a7 into PaddlePaddle:develop Aug 21, 2023
BeingGod pushed a commit to BeingGod/Paddle that referenced this pull request Sep 9, 2023
@jeng1220 jeng1220 deleted the bugfix_github_fast_ln_fwd_race branch September 12, 2023 09:19
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Jan 10, 2024
Labels
contributor External developers NVIDIA
Development

Successfully merging this pull request may close these issues.

fast layer norm has non-deterministic problem
5 participants