
Don't output nvidia-smi failure in automated platform search #693

Closed
IAlibay opened this issue Apr 27, 2023 · 5 comments · Fixed by #699
@IAlibay
Contributor

IAlibay commented Apr 27, 2023

We've had folks a bit confused about this message showing up when they don't have a GPU device:

/bin/sh: nvidia-smi: command not found

Is there a way to avoid outputting this to users?

@ijpulidos
Contributor

Ok, I think I understand the confusion, even though this lives at the DEBUG level of the logger. It shouldn't be hard to filter that out or avoid printing it, unless we have deeper issues with the logging.

At the time this was implemented, we wanted to make sure we had some place in the log that reports the GPUs found in the system, for debugging purposes. We were having cases where the simulation was falling back to run on CPU because there was some problem accessing the GPU/CUDA devices, so we just wanted to make sure the devices were available/found. On the other hand, I think we could try getting the exit code of that call to nvidia-smi and, if the command isn't found, not output anything. Would that be better? (We would still output other errors/messages from it.)
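Something along those lines could look like the following sketch (not the actual implementation; the helper name _query_cuda_devices and the switch from os.popen to subprocess are assumptions):

import logging
import subprocess

logger = logging.getLogger(__name__)

def _query_cuda_devices():
    """Hypothetical helper: run nvidia-smi and return its stdout, or None on failure."""
    try:
        proc = subprocess.run(
            ["nvidia-smi", "--query-gpu=index,gpu_name", "--format=csv,noheader"],
            capture_output=True, text=True,
        )
    except FileNotFoundError:
        # nvidia-smi is not installed; stay silent instead of printing an error.
        return None
    if proc.returncode != 0:
        # nvidia-smi exists but failed (e.g. driver problem); keep that in the debug log.
        logger.debug("nvidia-smi failed with exit code %d: %s", proc.returncode, proc.stderr.strip())
        return None
    return proc.stdout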

@mikemhenry
Contributor

Yes, I think checking the error code is the way to go. For reference, here is the current implementation:

def _display_cuda_devices():
    """Query system nvidia-smi to get available GPU indices and names in debug log."""
    # Read nvidia-smi query, should return empty string if no GPU is found.
    cuda_query_output = os.popen("nvidia-smi --query-gpu=index,gpu_name --format=csv,noheader").read().strip()
    # Split by line jump and comma
    cuda_devices_list = [entry.split(',') for entry in cuda_query_output.split('\n')]
    logger.debug(f"CUDA devices available: {*cuda_devices_list,}")
If it fails, we can output something like "No GPU detected", since that is clearer about what is going on.
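A minimal sketch of what that could look like, assuming a shutil.which pre-check (the exact check is an assumption, not the eventual fix):

import logging
import os
import shutil

logger = logging.getLogger(__name__)

def _display_cuda_devices():
    """Sketch: report available CUDA devices, or a clear message when none are found."""
    # If nvidia-smi is not on PATH there is nothing to query, and calling it
    # through the shell would print "command not found" to the terminal.
    if shutil.which("nvidia-smi") is None:
        logger.debug("No GPU detected (nvidia-smi not available)")
        return
    cuda_query_output = os.popen(
        "nvidia-smi --query-gpu=index,gpu_name --format=csv,noheader"
    ).read().strip()
    if not cuda_query_output:
        logger.debug("No GPU detected")
        return
    cuda_devices_list = [entry.split(',') for entry in cuda_query_output.split('\n')]
    logger.debug(f"CUDA devices available: {*cuda_devices_list,}")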

@mikemhenry
Contributor

Actually, all we need to do is capture the output of the os.popen call and not have it dump to stderr.
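For context, os.popen only captures the command's stdout; the "command not found" message is written by the shell to stderr, which is why it still reaches the terminal. With subprocess both streams can be captured (illustrative sketch, not the eventual patch):

import subprocess

# capture_output=True redirects both stdout and stderr into the CompletedProcess,
# so nothing leaks to the user's terminal even when the command is missing.
proc = subprocess.run(
    "nvidia-smi --query-gpu=index,gpu_name --format=csv,noheader",
    shell=True, capture_output=True, text=True,
)
print(proc.returncode)    # e.g. 127 when the shell cannot find nvidia-smi
print(repr(proc.stderr))  # e.g. '/bin/sh: nvidia-smi: command not found\n'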

@mikemhenry mikemhenry self-assigned this Apr 27, 2023
@mikemhenry
Contributor

I can fix this

@mikemhenry
Contributor

@ijpulidos Why do we split by comma (# Split by line jump and comma)? I am using a slightly different method to invoke the subprocess call to nvidia-smi, and the captured output looks like '0, NVIDIA GeForce RTX 3060 Laptop GPU\n'. I was thinking we would want to save it as 0, NVIDIA GeForce RTX 3060 Laptop GPU (in a list) so that users know the GPU index. Thoughts?
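For example, with the output above, the two approaches give (illustrative only):

output = "0, NVIDIA GeForce RTX 3060 Laptop GPU\n"

# Current approach: split into lines, then split each line on the comma.
nested = [entry.split(',') for entry in output.strip().split('\n')]
# -> [['0', ' NVIDIA GeForce RTX 3060 Laptop GPU']]

# Alternative: keep each CSV line intact so users see "index, name" together.
whole_lines = [line.strip() for line in output.strip().splitlines()]
# -> ['0, NVIDIA GeForce RTX 3060 Laptop GPU']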
