
Multiple designated GPU question #775

Closed
LeoLau94 opened this issue Feb 22, 2019 · 4 comments
Comments

@LeoLau94

LeoLau94 commented Feb 22, 2019

Short summary about the issue/question:
Hello! I share a server with other people, so I want to run 3 trials concurrently on 3 designated free GPUs. According to the tutorial, I only need to set trialConcurrency to 3 and trial->gpuNum to 1 in config.yaml. However, I found this impossible: no matter how I restrict the visible GPU devices, the automatically generated run.sh for each trial always contains an export CUDA_VISIBLE_DEVICES statement whose indices start from 0. I then tried another way: docker run -e NVIDIA_VISIBLE_DEVICES=1,2,3, which starts a container in which only GPUs 1, 2 and 3 are visible and are renumbered to 0, 1, 2. That failed as well, because the export CUDA_VISIBLE_DEVICES statement generated in run.sh still used the host indices starting from 1, which is simply wrong inside the container.
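For concreteness, here is roughly the setup described above. This is a minimal sketch: only trialConcurrency, gpuNum and the NVIDIA_VISIBLE_DEVICES flag come from this issue; the remaining config.yaml fields and the trial command are illustrative placeholders.

    # config.yaml (sketch; only trialConcurrency and gpuNum are taken from the issue)
    trialConcurrency: 3            # run 3 trials at the same time
    trainingServicePlatform: local
    trial:
      command: python3 train.py    # hypothetical trial command
      codeDir: .
      gpuNum: 1                    # each trial should get one GPU

    # container started so that only host GPUs 1, 2, 3 are visible (renumbered 0, 1, 2 inside)
    docker run -e NVIDIA_VISIBLE_DEVICES=1,2,3 ...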
Finally, I checked the source code (nni/src/nni_manager/training_service/local/gpuScheduler.ts). It seems that you use node-nvidia-smi to collect the GPU information, so the scheduler always sees the real host GPU indices rather than the renumbered ones. Indeed, running nvidia-smi -q inside the container mentioned above returns 1, 2, 3 instead of the renumbered 0, 1, 2 shown by plain nvidia-smi. This is quite inflexible, and there does not seem to be any way to designate which GPUs NNI should use.
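To illustrate the mechanism described above (a rough sketch, not the actual NNI code): collecting GPU indices through node-nvidia-smi looks roughly like the TypeScript below. node-nvidia-smi wraps nvidia-smi -q -x, and the minor_number field in that output is the host-level device index, which is why the scheduler keeps seeing 1, 2, 3 even inside the container.

    // Sketch only; gpuScheduler.ts in NNI may differ in structure and field handling.
    const nvidiaSmi = require('node-nvidia-smi');

    function collectGpuIndices(callback: (indices: number[]) => void): void {
      nvidiaSmi((err: Error, data: any) => {
        if (err) {
          callback([]);
          return;
        }
        // `nvidia-smi -q -x` emits one <gpu> element per device
        // (a single object when there is only one GPU).
        const gpus = data.nvidia_smi_log.gpu;
        const gpuList = Array.isArray(gpus) ? gpus : [gpus];
        // minor_number is the host device index; it is not remapped by
        // NVIDIA_VISIBLE_DEVICES, unlike the indices printed by plain nvidia-smi.
        callback(gpuList.map((g: any) => parseInt(g.minor_number, 10)));
      });
    }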
Well, I don't know how to deal with this. What else can I say? "Fxxk you, nVidia"? (Linus Torvalds approved.)
Please respond as soon as possible; this is urgent for me. Thank you for your open-source work and contributions!

nni Environment:

  • nni version: 0.5.1
  • nni mode (local|pai|remote): local
  • OS: Ubuntu 16.04
  • python version: 3.6.5
  • is conda or virtualenv used?: conda
  • is running in docker?: yes

Anything else we need to know:
Nope.

@leelaylay
Contributor

Thank you for your suggestion. It seems that NNI cannot do such a thing yet.
I think it would be more flexible if users could assign a GPU index to every trial.

@LeoLau94
Author

LeoLau94 commented Feb 24, 2019

Thank you for your suggestion. It seems that NNI cannot do such a thing yet.
I think it would be more flexible if users could assign a GPU index to every trial.

Sad 😭...
Actually, I'm not asking to assign a GPU index to each trial. All I want is to control which GPU devices are visible to NNI.
That seems feasible as long as you don't have to rely on the real (host) GPU indices.

@JohnAllen

I have what seems to be a similar problem: I have 3 GPUs on one machine and want to run separate but concurrent trials. Since I can't change the parameters passed to my training script, I cannot vary the GPU ID in any way. I would have to do something like maintain 3 separate configs and 3 experiments, which obviously doesn't really work, as I could end up repeating hyper-parameters in my search.
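For context, the run.sh mentioned throughout this thread is a small per-trial wrapper that NNI generates; a minimal sketch of what it amounts to, assuming a hypothetical trial command (the real script contains more bookkeeping):

    #!/bin/bash
    # The scheduler exports GPU visibility before the trial command runs,
    # so the training script does not need a GPU-id parameter:
    # CUDA-aware frameworks simply see the allowed device as device 0.
    export CUDA_VISIBLE_DEVICES=0
    cd /path/to/trial/code          # placeholder path
    python3 train.py                # hypothetical trial command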

@scarlett2018
Member

@LeoLau94 - we have released support for multiple designated GPUs in 0.7, try it out =).
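For readers landing here later: a sketch of how designating GPUs can be configured, assuming the option landed as the gpuIndices setting documented for NNI's local mode in later releases (the exact field name and format in 0.7 may differ; check the docs for your NNI version):

    # config.yaml (sketch; gpuIndices as in later NNI docs, verify for your version)
    trialConcurrency: 3
    trainingServicePlatform: local
    localConfig:
      gpuIndices: 1,2,3            # schedule trials only on host GPUs 1, 2 and 3
    trial:
      command: python3 train.py    # hypothetical trial command
      codeDir: .
      gpuNum: 1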
