Fix allreduce_sum potential bugs on NPU. #34462

Merged 6 commits on Jul 29, 2021

@gongweibao commented Jul 28, 2021

PR types

Bug fixes

PR changes

Others

Describe

The NPU crashes if allreduce_sum encounters NaN in its input, so this PR changes NaN to INF before the reduction to avoid the crash.
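The idea can be illustrated with a minimal host-side sketch in plain C++. It is not the code from this PR and does not use Paddle or NPU APIs: `ContainsNan`, `ReplaceNanWithInf`, and `SimulatedAllReduceSum` are hypothetical names, and `std::vector<float>` stands in for a device tensor. It only shows the pattern of sanitizing each rank's buffer before summing.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not a Paddle API): true if any element is NaN.
inline bool ContainsNan(const std::vector<float>& buf) {
  for (float v : buf) {
    if (std::isnan(v)) return true;
  }
  return false;
}

// Hypothetical helper: replaces every NaN with +INF in place and
// returns how many elements were changed.
inline std::size_t ReplaceNanWithInf(std::vector<float>* buf) {
  std::size_t replaced = 0;
  for (float& v : *buf) {
    if (std::isnan(v)) {
      v = std::numeric_limits<float>::infinity();
      ++replaced;
    }
  }
  return replaced;
}

// Host-side stand-in for a sum allreduce: each inner vector plays the
// role of one rank's buffer. The buffers are taken by value, so the
// caller's data is left untouched.
inline std::vector<float> SimulatedAllReduceSum(
    std::vector<std::vector<float>> ranks) {
  assert(!ranks.empty());
  std::vector<float> out(ranks[0].size(), 0.0f);
  for (std::vector<float>& rank : ranks) {
    assert(rank.size() == out.size());
    ReplaceNanWithInf(&rank);  // sanitize before the reduction
    for (std::size_t i = 0; i < out.size(); ++i) {
      out[i] += rank[i];
    }
  }
  return out;
}
```

Note that INF still marks the result as invalid for any later overflow check, which is presumably why it is an acceptable substitute for NaN here. One caveat of this sketch: if another rank holds -INF at the same position, the sum is NaN again under IEEE 754 arithmetic.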

TODO:
Speed up checknumeric

@paddle-bot-old

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@gongweibao changed the title from "[WIP]add numeric test=develop" to "add numeric test=develop" Jul 29, 2021
@gongweibao changed the title from "add numeric test=develop" to "Fix allreduce_sum potential bugs on NPU." Jul 29, 2021
@gongweibao requested a review from zhiqiu July 29, 2021 06:29

@zhiqiu left a comment

LGTM for the const_cast.
I suggest using a tmp tensor instead; that can be refined in the next PR.
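The trade-off in this review comment can be sketched as follows, assuming the `const_cast` is used to modify an input that the operator receives as const. This is plain C++ with `std::vector<float>` standing in for a tensor; `SanitizeInPlace` and `SanitizedCopy` are hypothetical names, not functions from the PR.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Option A (const_cast): mutates the caller's buffer through a const
// reference. This avoids a copy, but the caller's data changes even
// though the signature promises it will not. It is only well-defined
// when the underlying object was not itself declared const.
inline void SanitizeInPlace(const std::vector<float>& input) {
  std::vector<float>& writable = const_cast<std::vector<float>&>(input);
  for (float& v : writable) {
    if (std::isnan(v)) v = std::numeric_limits<float>::infinity();
  }
}

// Option B (tmp tensor): copies the input, sanitizes the copy, and
// leaves the original untouched, at the cost of extra memory.
inline std::vector<float> SanitizedCopy(const std::vector<float>& input) {
  std::vector<float> tmp(input);
  for (float& v : tmp) {
    if (std::isnan(v)) v = std::numeric_limits<float>::infinity();
  }
  return tmp;
}
```

Option B keeps the const contract honest, which is likely the motivation for the suggestion; Option A is cheaper and was accepted for this PR.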

@gongweibao merged commit 02cc3c5 into PaddlePaddle:develop Jul 29, 2021
@gongweibao deleted the checknumeric branch July 29, 2021 13:24

4 participants