This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Report the error more gracefully when nvidia-smi not exist #1418

Merged
merged 9 commits into from
Aug 9, 2019



@liuzhe-lz liuzhe-lz commented Aug 6, 2019

Currently nni_gpu_tool crashes when nvidia-smi is not available.
This PR makes it output an empty GPU status ("gpuCount": 0) instead.
In local mode, when gpuNum is set to a non-zero value and gpuCount is zero, the experiment fails.
In remote mode, when a server's gpuCount is zero, a warning message is written to the log file. (If you think it should be an "error", please leave a comment.)
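
For readers who want to see the behaviour in code, here is a rough sketch of the idea, not the actual nni_gpu_tool source. The function names, the "gpuInfos" key, the per-GPU field names, and the exact nvidia-smi query are assumptions made for illustration; only "gpuCount", gpuNum, and the local/remote behaviour come from the description above.

```python
import logging
import shutil
import subprocess


def collect_gpu_status(smi_command="nvidia-smi"):
    """Return a GPU status dict, or an empty status if nvidia-smi is unusable.

    Hypothetical helper: only the "gpuCount" key is taken from the PR text.
    """
    empty_status = {"gpuCount": 0, "gpuInfos": []}
    # The executable is not on PATH: report zero GPUs instead of crashing.
    if shutil.which(smi_command) is None:
        return empty_status
    try:
        output = subprocess.check_output(
            [smi_command,
             "--query-gpu=index,utilization.gpu",
             "--format=csv,noheader,nounits"],
            text=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # The tool exists but failed (for example, no driver loaded).
        return empty_status
    gpu_infos = []
    for line in output.splitlines():
        if not line.strip():
            continue
        index, utilization = [field.strip() for field in line.split(",")]
        # Values are kept as strings because nvidia-smi may print
        # placeholders such as "[N/A]" for unsupported fields.
        gpu_infos.append({"index": index, "gpuUtil": utilization})
    return {"gpuCount": len(gpu_infos), "gpuInfos": gpu_infos}


def check_gpu_availability(mode, gpu_num, status):
    """Hypothetical consumer: fail in local mode, only warn in remote mode."""
    if status["gpuCount"] > 0:
        return
    if mode == "local" and gpu_num > 0:
        raise RuntimeError("gpuNum is %d but no GPU was detected" % gpu_num)
    if mode == "remote":
        logging.warning("Remote server reports gpuCount = 0")
```

With this shape, a missing nvidia-smi and a machine without GPUs look the same to the caller, and the decision to fail or warn is left to the side that knows the training mode.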

Update:
Previously the script created the gpu_metrics file and then changed its permission to 777. The chmod fails when the file is owned by another user, even if its mode is already 777.
Now it creates gpu_metrics with umask 0.
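
A minimal sketch of the umask approach, assuming a POSIX system; the helper name `open_world_writable` is hypothetical and this is not the code from the PR. Because the permission bits are applied at creation time, no chmod call is needed, and a file that already exists and is writable by everyone is simply opened.

```python
import os


def open_world_writable(path):
    """Open `path` for writing, creating it with mode 777 if it is missing."""
    # Clear the umask so the mode passed to os.open() is not masked.
    old_umask = os.umask(0)
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o777)
    finally:
        # Restore the previous umask so later file creation is unaffected.
        os.umask(old_umask)
    return os.fdopen(fd, "w")
```

Usage would look like `with open_world_writable("gpu_metrics") as f: f.write(...)`. The mode argument only takes effect when the file is created, so an existing file owned by another user keeps its current permissions.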

Related to: issue #1375

@suiguoxin suiguoxin changed the base branch from master to v1.0 August 6, 2019 09:18
@liuzhe-lz liuzhe-lz changed the title Output empty GPU metrics when nvidia-smi not exists Report the error more gracefully when nvidia-smi not exist Aug 8, 2019
@liuzhe-lz liuzhe-lz merged commit ee9246a into microsoft:v1.0 Aug 9, 2019
@liuzhe-lz liuzhe-lz deleted the smi-not-exist branch October 9, 2019 04:02

3 participants