Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

report mode GPU is in and write out gpu UUID #699

Merged
merged 11 commits into from
Jun 2, 2023
Merged

Conversation

mikemhenry
Copy link
Contributor

@mikemhenry mikemhenry commented May 25, 2023

Description

Improve GPU logging information, inform user of potential issue with GPU compute mode, provide a better error message when openmm.OpenMMException: No compatible CUDA device is available is thrown.

Resolves #693
Resolves #697

Todos

  • Implement feature / fix bug
  • Add tests
  • Update documentation as needed
  • Update changelog to summarize changes in behavior, enhancements, and bugfixes implemented in this PR

Status

  • Ready to go

Changelog message

Fixes #693 #697 Improve GPU logging information, inform user of potential issue with GPU compute mode, provide a better error message when `openmm.OpenMMException: No compatible CUDA device is available` is thrown. Note, formatting of "CUDA devices available:" message has changed. 

@mikemhenry mikemhenry requested a review from ijpulidos May 25, 2023 21:03
@codecov
Copy link

codecov bot commented May 25, 2023

Codecov Report

Merging #699 (5fb719a) into main (3cf98eb) will decrease coverage by 0.77%.
The diff coverage is 50.00%.

Copy link
Contributor

@ijpulidos ijpulidos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, just a few suggestions.

openmmtools/multistate/multistatesampler.py Outdated Show resolved Hide resolved
openmmtools/multistate/multistatesampler.py Outdated Show resolved Hide resolved
@mikemhenry mikemhenry requested a review from ijpulidos May 30, 2023 21:29
@mikemhenry mikemhenry marked this pull request as ready for review May 30, 2023 21:29
Copy link
Contributor

@ijpulidos ijpulidos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there! :)

openmmtools/multistate/multistatesampler.py Outdated Show resolved Hide resolved
openmmtools/multistate/multistatesampler.py Outdated Show resolved Hide resolved
@mikemhenry mikemhenry requested a review from ijpulidos May 30, 2023 22:37
@mikemhenry
Copy link
Contributor Author

Do you calling the command twice is okay? cf286f5#diff-2329790a58bd1510dc3f9b49f4ba83c331d71627f62e70d01f57677bf6f6a1b2R1786-R1787

Or should I check the error code instead of the try/except? If I check the error code then I don't have to worry about calling it twice. Thoughts?

@ijpulidos
Copy link
Contributor

ijpulidos commented May 31, 2023

Or should I check the error code instead of the try/except? If I check the error code then I don't have to worry about calling it twice. Thoughts?

Couldn't you just get the error message from the previous cuda_query_output? As with cuda_query_output.stdout.read()?

Knowingly it already failed, since we are catching the exception.

@mikemhenry
Copy link
Contributor Author

No because it throws a subprocess.CalledProcessError, it doesn't return a completed process object if it throws an error.

Copy link
Contributor

@ijpulidos ijpulidos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested this on my local machine with two different GPUs, it works! LGTM!

@mikemhenry
Copy link
Contributor Author

I'm going to change it a bit, I'm worried that we could end up with an error on the first call, but no error on the second call to nvidia-smi or sommething

@mikemhenry mikemhenry requested a review from ijpulidos May 31, 2023 22:21
@mikemhenry
Copy link
Contributor Author

@ijpulidos sorry for changing my mind and the last second here, but I think this is better now

Copy link
Contributor

@ijpulidos ijpulidos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! LGTM.

@mikemhenry mikemhenry enabled auto-merge (squash) June 1, 2023 23:30
@mikemhenry
Copy link
Contributor Author

Few flaky tests it seems, re-running them

@mikemhenry mikemhenry merged commit abb2f61 into main Jun 2, 2023
@mikemhenry mikemhenry deleted the fix/issue_693_697 branch June 2, 2023 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants