Fix a race condition in ICL eval #235

Merged
merged 3 commits into from
May 30, 2023

dakinggg
Collaborator

There was a race condition where rank N could delete the ICL file before other ranks managed to load it. This fixes that by only allowing local rank 0 to delete ICL files.
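A minimal sketch of the pattern described above, not the actual diff from this PR. It uses threads to stand in for the ranks on one node, and every name in it (`load_then_cleanup`, `run_simulation`, the file contents) is hypothetical. The description only says that deletion is restricted to local rank 0. The barrier before the delete is an added assumption, included so that rank 0 itself cannot remove the file while other ranks are still reading.

```python
import os
import tempfile
import threading


def load_then_cleanup(path, local_rank, barrier, results):
    """Read the shared file; only local rank 0 is allowed to delete it."""
    with open(path) as f:
        results[local_rank] = f.read()

    # Assumption (not stated in the PR): wait until every rank on the node
    # has finished reading before anyone deletes the file.
    barrier.wait()

    # The fix described in the PR: only local rank 0 deletes.
    if local_rank == 0:
        os.remove(path)


def run_simulation(num_ranks=4):
    """Simulate num_ranks local ranks sharing one downloaded file."""
    fd, path = tempfile.mkstemp(suffix=".jsonl")
    with os.fdopen(fd, "w") as f:
        f.write("icl-data")

    barrier = threading.Barrier(num_ranks)
    results = {}
    threads = [
        threading.Thread(
            target=load_then_cleanup, args=(path, rank, barrier, results)
        )
        for rank in range(num_ranks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Return what each rank read and whether the file still exists.
    return results, os.path.exists(path)
```

In a real multi-process job the thread barrier would be replaced by the distributed framework's own barrier and local-rank query; which calls the PR actually uses is not shown in this conversation.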

Contributor

@abhi-mosaic abhi-mosaic left a comment

LGTM, thanks!

@dakinggg
Collaborator Author

A bit hard to "prove" this works, but before this change ~3/5 runs were failing (crash before starting eval) on r1z1, and now 5/5 succeed (make it to the start of eval).

@dakinggg dakinggg merged commit f3f2d26 into mosaicml:main May 30, 2023
@dakinggg dakinggg deleted the race branch June 1, 2023 00:10
bmosaicml pushed a commit that referenced this pull request Jun 8, 2023