Unable to install benchbot #13
Comments
Thanks for reporting this to us @gmuraleekrishna. This error seems to be coming from the NVIDIA Container Toolkit, which passes the GPU from your host OS to Docker containers. It may be caused by having two GPUs, or simply by a bad state. I'd recommend the following:
Let me know how you go.
@btalb Thanks for the reply. Steps 1-3 didn't work; we still get the error. The output for Step 4 is as follows.
Thanks for providing the logs @gmuraleekrishna. Sorry I can't provide an immediate solution; your configuration of 2x Quadros is not something we have available to test on in the lab. The next debugging step is to narrow down the scope as much as possible. The command below isolates whether it is an issue with BenchBot or with the NVIDIA Container Toolkit:
Can you show me the results of that command please? If it returns something like the output below, it is a BenchBot issue; otherwise we need to figure out what is going on with NVIDIA's Container Toolkit.
Hi @btalb, I am on his team. Thanks for your help. When running the command, I got:
By the way, the CUDA version on the machine is 11.2. Could that cause any problems?
Thanks @hagianga21. That information is good; it confirms that the error is being caused by the NVIDIA Container Toolkit (example bug). I'm trying to piece through this, but can't reproduce the errors on our side.
Otherwise, let's start digging into the NVIDIA Container Toolkit. Can you post the output of:
Then if that works:
And lastly paste the contents of the created
The contents of the nvidia.log file:
Sorry about the delay in sorting this. I'm having trouble finding a cause in any of the logs you've provided me. It definitely appears to be an issue with the NVIDIA Container Toolkit. I've been scouring their documentation trying to find some hints, but haven't had any luck. I've got a few possible things to try, but they are more hopeful guesses than anything based on intuition:
Sorry your first experiences with BenchBot have been this troublesome. Unfortunately, it seems to be caused by a dependency we have no control over. My next step is to file an issue over at NVIDIA/nvidia-docker, but I'll need those debugging logs from step 4 before I can.
I've opened the issue above to try and get to the bottom of this. Sorry it's been anything but straightforward. I've struggled to find clear, openly available descriptions of the NVIDIA Docker stack. Feel free to participate or add any extra info you think may be helpful directly to that issue.
Hi, thanks so much for your support, I appreciate it.
Any further progress on this @hagianga21? Someone in the lab had exactly the same error come up yesterday. They were using Docker for something completely unrelated to BenchBot. But they were able to fix it by installing
Can I just double-check which packages you have installed and where they were installed from? We're having real trouble reproducing this error:
Here's the output of those commands on one of our working machines for reference:
Hi, sorry for the delay. Here is the output of the double-check commands:
Thanks @hagianga21, I'm going to close this issue. But feel free to create another describing your setup if that simulator issue continues.
When I run
benchbot_run --robot carter --env miniroom:1 --task semantic_slam:active:ground_truth
I get the following error
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
The nvidia-smi output is as follows:
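For readers who hit the same message, here is a minimal shell sketch of the first triage step discussed in this thread: deciding whether a failure looks like the NVIDIA Container Toolkit error quoted above, and checking whether the toolkit's command-line tool is on the `PATH`. It is not taken from the maintainers' instructions. `classify_gpu_error` is a hypothetical helper name, and the binary name `nvidia-container-cli` is an assumption about how the toolkit is packaged on your system.

```shell
#!/bin/sh
# Sketch only: classify_gpu_error is a hypothetical helper,
# not part of BenchBot, Docker, or the NVIDIA Container Toolkit.

# Print "toolkit" when the message matches the error quoted in this issue
# (which the thread attributes to the NVIDIA Container Toolkit), else "other".
classify_gpu_error() {
    case "$1" in
        *'could not select device driver'*'[[gpu]]'*) echo "toolkit" ;;
        *) echo "other" ;;
    esac
}

# Report whether the toolkit's CLI is on PATH.
# Assumption: the binary is named nvidia-container-cli.
if command -v nvidia-container-cli >/dev/null 2>&1; then
    echo "nvidia-container-cli found"
else
    echo "nvidia-container-cli not found"
fi
```

If the message classifies as `toolkit` and the CLI is not found, the toolkit packages are a reasonable first thing to check. This is only a heuristic, not a diagnosis.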