Robot controller error due to Vulkan crash #18
Comments
Thanks for reporting @cheneeheng . We've seen this arbitrary Can we confirm how you are using the machine specified:
|
Hi @btalb, aiks that does not sound good.
As for your questions:
1. Are you directly on the machine, and does it have a physical screen attached?
2. Are you connecting via SSH with window forwarding?
3. Are you using other remote software like remote desktop or alternatives?
Thanks! |
The core of the issue is that Vulkan only seems to be happy when it is using a discrete GPU to render to a physical screen. I ask all of those questions because we have had these issues with configurations that tamper with that relationship. For example:
I can see Vulkan is the cause as I see the following extra lines in the failure when I diff the logs:
I can also see those lines in the log you provided me. We need to dig a little deeper though to try and figure out why Vulkan is throwing those errors for the simulator:
vkcube.mp4
Here's the command you need:
Thanks; I wish these things were simpler... |
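The log comparison mentioned above (diffing a working run against a failing one to find the extra lines) can be sketched in Python. This is only an illustration, not the maintainers' tooling: the function name `extra_lines_in_failure` and the sample log lines are hypothetical placeholders, and real logs would be read from files.

```python
import difflib


def extra_lines_in_failure(good_log: str, bad_log: str) -> list[str]:
    """Return the lines that a diff reports as added in bad_log."""
    diff = difflib.ndiff(good_log.splitlines(), bad_log.splitlines())
    # ndiff prefixes lines that only exist in the second input with "+ ".
    return [line[2:] for line in diff if line.startswith("+ ")]


if __name__ == "__main__":
    # Placeholder text only; these are not real simulator log lines.
    good = "starting simulator\nloading map\nready"
    bad = "starting simulator\nplaceholder error line\nloading map\ncrashed"
    for line in extra_lines_in_failure(good, bad):
        print(line)
```

Filtering the result for the word "vulkan" (case-insensitively) would be one way to narrow the output, assuming the failing run mentions it.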
I shall try them out once I get back to the lab on Friday. I regularly work with programs using CUDA, so this is not the worst I have seen 😃 |
Hi, I encountered the same error log as @cheneeheng today when I tried to run
Last week when I ran this command, there was no error, but no simulator window appeared after this:
I tried this command as suggested:
Unfortunately, I get this error:
I am using x2goclient via SSH to connect to a remote machine with an RTX 3080. I wonder if the remote desktop has anything to do with the simulator window not appearing? Thanks in advance! |
Thanks for the information @tyou1. The behaviour you experienced last week with the window showing up and disappearing was a bug which was hiding the error log. We weren't correctly bubbling the crash log up to What's important to understand with any remote access system is how it actually performs the rendering. I don't know much about
How does this tie in with BenchBot? NVIDIA's Isaac Simulator relies on hardware-accelerated rendering powered by Vulkan. If the system doing the rendering doesn't meet those requirements, then we get a crash from the simulator (a crash that should be much more verbose & explicit... but a crash nonetheless). So from those requirements, there are only a couple of solutions I would expect to work for cases where your GPU is on a remote machine:
Hope this helps. I know it's not an ideal solution, but hardware-accelerated rendering under Linux with Vulkan support is something that's traditionally caused enough challenges by itself. Crisp solutions for remote use on top of this unfortunately aren't quite there yet. We're always interested in better solutions though. If anyone knows of better ways to enable remote hardware-accelerated rendering, especially on headless machines, we'd love to hear them. Unfortunately, it's not something I have time to dig too far into at the moment. |
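As a rough aid for answering the "how are you connected?" questions raised in this thread, the session type can be guessed from environment variables. The sketch below is a heuristic under stated assumptions: `describe_session` is a hypothetical helper, it relies on SSH setting `SSH_CONNECTION` / `SSH_CLIENT` and on SSH X11 forwarding typically setting `DISPLAY` to something like `localhost:10.0`, and it cannot tell a physical screen from a virtual one (both usually show up as `:N`).

```python
import os
from typing import Mapping


def describe_session(env: Mapping[str, str]) -> str:
    """Guess how the current shell reaches an X display (heuristic only)."""
    display = env.get("DISPLAY", "")
    over_ssh = "SSH_CONNECTION" in env or "SSH_CLIENT" in env

    if not display:
        return "no X display set (headless shell)"
    # A host part before the colon (e.g. "localhost:10.0") is what SSH X11
    # forwarding typically sets; the X server is then on the SSH client's side.
    host = display.split(":", 1)[0]
    if host:
        return "forwarded or remote X display"
    if over_ssh:
        return "SSH session pointing at a display on the remote machine"
    return "local display"


if __name__ == "__main__":
    print(describe_session(os.environ))
```

A result of "local display" still does not guarantee that a discrete GPU is rendering to a physical screen; it only rules out the most obvious forwarding setups.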
@btalb the vulkaninfo command is returning this error:
No protocol specified
Update 1:
Update 2:
Update 2.1:
Update 2.2:
The issue can be closed if @btalb doesn't need anything more from my side. |
That's excellent @cheneeheng, great to hear! I'm not sure the relation of that series of errors (the first error I've never seen before, even before It's a little odd that line is causing issues with containers as it is running after all of the containers have started, so shouldn't affect them. But maybe there is some asynchronous behaviour causing race conditions. Thanks for pointing that out though, that's a really good find. I'll close this issue here, but feel free to open a new issue with any outdated commands you find in the documentation / tutorials. I'm always keen to fix those when they're found. Unfortunately, I'm a little documentation blind by this point. |
Reboot (x3) did not fix the error. Only the xhost command did. |
Hi, may I ask where this line is? Or does the simulator window only appear successfully with a VNC server & client? May I ask which specific VNC server & client @david2611 uses to run BenchBot smoothly? Thanks a lot! :) |
Here is the line: Line 347 in 783e9ca
You could try running |
Hi @tyou1, good question. Only VNC will work, as RDP generally creates a virtual X server which won't have hardware-accelerated rendering. @david2611 uses NoMachine; just make sure it's not using a virtual screen. There are also plenty of simple VNC options out there, like:
The crucial thing is just to make sure it is mirroring a physical screen, and not creating a virtual one. |
Remmina should also be fine as a VNC client to connect to a server. |
Hi there,
I have just successfully installed BenchBot on a machine with an RTX 2080 8GB, 32GB RAM, and an i7-9700K CPU.
But when I try to run
benchbot_run --robot carter --env miniroom:1 --task semantic_slam:passive:ground_truth
I keep getting a robot controller error. (small snippet below and the full log in the file attached.)
I'm wondering if you guys have ever encountered this.
Thanks!
Chen.
...
Supervisor is now available @ 'http://0.0.0.0:10000' ...
Waiting until a robot controller is found @ 'http://benchbot_robot:10000' ...
Found
Sending environment data & robot config to controller ...
Ready
################################################################################
####################### BENCHBOT ROBOT CONTROLLER ERROR ########################
################################################################################
ERROR: The BenchBot Robot Controller container has exited unexpectedly. This
should not happen under normal operating conditions. Please see the complete
log below for a dump of the crash output:
Robot controller is now available @ 'http://0.0.0.0:10000' ...
Waiting to receive valid config data...
172.20.0.102 - - [2021-03-22 15:04:04] "GET // HTTP/1.1" 200 152 0.000542
172.20.0.102 - - [2021-03-22 15:04:05] "POST //configure HTTP/1.1" 200 137 0.066839
Starting the requested real robot ROS stack ...
THE PROCESS STARTED BY THE FOLLOWING COMMAND HAS CRASHED:
sed -i "0,/"pose":/{s/("pose": )(.)/\1[0.7, 0, 0, -0.7, 1.2, 1.5, 0.3]/}" /benchbot/isaac_sdk/apps/carter/carter_sim/bridge_config/carter_full_config.json && perl -0777 -i -pe 's/"static_mesh".?]/"static_mesh":[{"name": "bottle"}, {"name": "cup"}, {"name": "knife"}, {"name": "bowl"}, {"name": "wine glass"}, {"name": "fork"}, {"name": "spoon"}, {"name": "banana"}, {"name": "apple"}, {"name": "orange"}, {"name": "cake"}, {"name": "potted plant"}, {"name": "mouse"}, {"name": "keyboard"}, {"name": "laptop"}, {"name": "cell phone"}, {"name": "book"}, {"name": "clock"}, {"name": "chair"}, {"name": "table"}, {"name": "couch"}, {"name": "bed"}, {"name": "toilet"}, {"name": "tv"}, {"name": "microwave"}, {"name": "toaster"}, {"name": "refrigerator"}, {"name": "oven"}, {"name": "sink"}, {"name": "person"}]/s' /benchbot/isaac_sdk/apps/carter/carter_sim/bridge_config/carter_full_config.json && cd "/benchbot/addons/benchbot_addons/benchbot-addons/envs_isaac_develop/environments" && .sim_package/IsaacSimProject.sh "/Game/AI_vol3_03_base/Maps/AI_vol3_scene_03" -isaac_sim_config_json= "/benchbot/isaac_sdk/apps/carter/carter_sim/bridge_config/carter_full.json" -windowed -ResX=960 -ResY=540 -vulkan -game
...
log.txt