
[v5.0.x] btl/smcuda: Add atomic_wmb() before sm_fifo_write #12343

Merged: 1 commit merged into v5.0.x on Feb 19, 2024


@lrbison commented Feb 15, 2024

Cherry-pick of #12338 to fix #12270 on v5.0.x.

This change fixes open-mpi#12270

Testing on the c7g instance type (arm64) confirms this change eliminates
the hangs and crashes that were previously observed in 1 in 30 runs of
the IMB alltoall benchmark. Tested with over 300 runs and no failures.

The write memory barrier prevents other CPUs from observing the fifo
update before they observe the updated contents of the header itself.
Without the barrier, other CPUs could read uninitialized header
contents, which caused the crashes and invalid data.

Signed-off-by: Luke Robison <lrbison@amazon.com>
(cherry picked from commit 71f378d)
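The ordering described in the commit message (fill in the header, issue a write barrier, then publish through the fifo) can be sketched with portable C11 atomics. This is a minimal illustration under stated assumptions, not the smcuda code: `struct frag_hdr`, `fifo_slot`, and `run_demo()` are hypothetical names, a single-slot pointer stands in for `sm_fifo_write`, and `atomic_thread_fence(memory_order_release)` stands in for Open MPI's own `atomic_wmb()`. Portable C11 also requires a matching acquire fence on the reader side; whether the smcuda receive path already has an equivalent read barrier is not stated in this PR.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical fragment header; field names are illustrative and are
 * not the real btl/smcuda structures. */
struct frag_hdr {
    int tag;
    int len;
    int payload;
};

static struct frag_hdr hdr_storage;
/* Hypothetical single-slot "fifo": NULL means empty, non-NULL publishes a header. */
static _Atomic(struct frag_hdr *) fifo_slot = NULL;

static void *producer(void *arg)
{
    (void)arg;
    /* 1. Initialize the header with plain (non-atomic) stores. */
    hdr_storage.tag = 7;
    hdr_storage.len = 1;
    hdr_storage.payload = 42;
    /* 2. Write barrier (stand-in for atomic_wmb()): the header stores above
     *    cannot become visible after the fifo store below. */
    atomic_thread_fence(memory_order_release);
    /* 3. Publish the header through the fifo (stand-in for sm_fifo_write). */
    atomic_store_explicit(&fifo_slot, &hdr_storage, memory_order_relaxed);
    return NULL;
}

static void *consumer(void *arg)
{
    int *out = arg;
    struct frag_hdr *h;
    /* Spin until the producer publishes a header pointer. */
    while ((h = atomic_load_explicit(&fifo_slot, memory_order_relaxed)) == NULL)
        ;
    /* Read-side fence that pairs with the release fence in producer(). */
    atomic_thread_fence(memory_order_acquire);
    /* With both fences in place the header is fully initialized here. */
    assert(h->tag == 7 && h->len == 1);
    *out = h->payload;
    return NULL;
}

/* Runs one producer/consumer round; returns the payload the consumer saw,
 * or -1 if a thread could not be created. */
int run_demo(void)
{
    pthread_t p, c;
    int seen = -1;

    atomic_store(&fifo_slot, NULL);
    if (pthread_create(&p, NULL, producer, NULL) != 0)
        return -1;
    if (pthread_create(&c, NULL, consumer, &seen) != 0) {
        pthread_join(p, NULL);
        return -1;
    }
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

Compile with `-pthread`. The release fence orders every earlier store before the later relaxed store to `fifo_slot`, so once the consumer sees a non-NULL pointer and executes its acquire fence, it is guaranteed to see the initialized header. On a strongly ordered x86 CPU, dropping the release fence rarely changes observable behavior, while weakly ordered CPUs such as arm64 can expose the reordering, which is consistent with the failures reported above only appearing on c7g.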
@github-actions github-actions bot added this to the v5.0.3 milestone Feb 15, 2024
@wenduwan wenduwan merged commit 2f2ddaf into open-mpi:v5.0.x Feb 19, 2024
10 checks passed
bedroge added a commit to bedroge/easybuild-easyconfigs that referenced this pull request Feb 20, 2024