
Alternative way of running CARLA off-screen choosing GPU #225

Closed
nsubiron opened this issue Feb 16, 2018 · 26 comments

@nsubiron
Collaborator

Supposedly, SDL2 already allows off-screen rendering on NVIDIA devices with CUDA enabled, selecting a specific device with:

SDL_VIDEODRIVER=offscreen SDL_HINT_CUDA_DEVICE=0 ./CarlaUE4.sh

Please check if this works.
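
To verify which GPU the process actually lands on, a quick check (a sketch, assuming nvidia-smi is available on the machine) is:

# launch off-screen on the requested device, give UE4 time to start, then look for the process
SDL_VIDEODRIVER=offscreen SDL_HINT_CUDA_DEVICE=0 ./CarlaUE4.sh &
sleep 30
nvidia-smi | grep CarlaUE4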

@seken

seken commented Mar 21, 2018

Yes, this works for me on Linux.

@felipecode
Contributor

This also works for me on Linux!

@felipecode
Contributor

In recent experiments I observed some instability with the SDL solution. In some situations CARLA runs much slower, sometimes taking more than the 10-second timeout to respond to the client.

@mahaoran1997

I can run this without a screen, but it seems I can only run on GPU 0, no matter what value I set for SDL_HINT_CUDA_DEVICE.

@rmst

rmst commented Aug 15, 2018

I can confirm @mahaoran1997's observation that SDL_HINT_CUDA_DEVICE has no effect. I also observed that SDL_VIDEODRIVER=offscreen isn't necessary when no X server is present; SDL seems to select that method automatically.

@nsubiron, do you have any additional information about SDL rendering on CUDA? I couldn't find any information, and the official SDL site doesn't even list it as an option: https://wiki.libsdl.org/FAQUsingSDL

@bhaprayan

I tried this, but it returns a segmentation fault.

@crizCraig

crizCraig commented Mar 14, 2019

Are you running with nvidia-docker, @bhaprayan? I.e.,
docker run --runtime=nvidia ...
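
For reference, a full invocation along those lines might look like this (a sketch only; the image tag, and whether the image's default command launches the simulator, are assumptions not confirmed in this thread):

# hypothetical example: run the published CARLA image under the NVIDIA runtime
docker run --runtime=nvidia -p 2000-2002:2000-2002 carlasim/carla:0.9.5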

@zlw21gxy

zlw21gxy commented Mar 22, 2019

Searching the SDL source code (https://hg.libsdl.org/SDL/log?rev=SDL_HINT_CUDA_DEVICE) turns up nothing about SDL_HINT_CUDA_DEVICE.

@zlw21gxy

zlw21gxy commented Mar 22, 2019

@felipecode @rmst @nsubiron @seken @crizCraig: which SDL version are you using?

@bhaprayan

@crizCraig Nope, I was running on a native machine. I got it to work, though. I'm now passing in -carla-server as a flag; I think that's what solved the issue, but I don't recall exactly.

@qhaas
Contributor

qhaas commented Jul 2, 2019

Like @rmst, running with SDL_HINT_CUDA_DEVICE=2 doesn't do anything on my system; neither does setting CUDA_VISIBLE_DEVICES=2 or NVIDIA_VISIBLE_DEVICES=2.

I also tried placing r.GraphicsAdapter=2 in 'CarlaUE4/Saved/Config/LinuxNoEditor/Engine.ini', as suggested on the UE4 forums; that didn't work either.

Finally, I tried using vglrun to specify the GPU, but CARLA still runs on GPU 0 instead of GPU 2 (zero-indexed) on the Ubuntu 16.04 system I'm testing on.

I'm using the version of SDL that ships with Ubuntu 16.04: libsdl2 2.0.4

UPDATE: Upon examining the SDL 2.0.x source code, I can't find evidence that SDL_HINT_CUDA_DEVICE is meaningful.

# SDL_VIDEODRIVER exists
grep -lIr SDL_VIDEODRIVER SDL2-2.0.9 | wc -l
6
# SDL_HINT_CUDA_DEVICE does not
grep -lIr SDL_HINT_CUDA_DEVICE SDL2-2.0.9 | wc -l
0

Unless this variable is somehow generated rather than hard-coded, or was removed in SDL 2.0+, it appears to be a myth (one also propagated in other projects' discussions, not just CARLA's). I have asked for clarification on the SDL web forums.

For those interested, here are some other combinations I've tried. The Singularity image is the CARLA-provided Ubuntu 16.04 Docker image converted into a Singularity image; the host is Ubuntu 18.04:

# host info, truncated some stuff
nvidia-smi -L | wc -l
16

awk -F '='  '/VERSION=/ {print $2}' /etc/os-release 
"18.04.2 LTS (Bionic Beaver)"

nvidia-smi | awk '/Driver Version/ {print $3}'
418.67

singularity --version
singularity version 3.0.2-87

# create sif from upstream carla image
singularity pull docker://carlasim/carla:0.9.5

# create a writeable home directory with the binary the way CARLA entryscript expects
CARLA_WORKSPACE=`pwd`/workspace/home/carla
install -d $CARLA_WORKSPACE
singularity exec -C -H $CARLA_WORKSPACE images/carla_0.9.5.sif /bin/bash -c 'cp -r /home/carla/* .'

# image has SDL installed
singularity exec images/carla_0.9.5.sif /bin/bash -c 'apt list --installed | grep sdl'
libsdl2-2.0-0/now 2.0.4+dfsg1-2ubuntu2 amd64 [installed,local]

# Run standard way, runs on GPU 0, as expected
singularity run --nv -C -H $CARLA_WORKSPACE images/carla_0.9.5.sif

# still runs on GPU 0
SINGULARITYENV_SDL_VIDEODRIVER=offscreen SINGULARITYENV_SDL_HINT_CUDA_DEVICE=5 SINGULARITYENV_NVIDIA_VISIBLE_DEVICES=5 SINGULARITYENV_CUDA_VISIBLE_DEVICES=5 singularity run --nv -C -H $CARLA_WORKSPACE images/carla_0.9.5.sif

# on the off chance the variables must be set both on the host and in the container, and the entry script is canceling them out... still runs on GPU 0
SINGULARITYENV_SDL_VIDEODRIVER=offscreen SDL_VIDEODRIVER=offscreen SINGULARITYENV_SDL_HINT_CUDA_DEVICE=5 SDL_HINT_CUDA_DEVICE=5 SINGULARITYENV_NVIDIA_VISIBLE_DEVICES=5 NVIDIA_VISIBLE_DEVICES=5 SINGULARITYENV_CUDA_VISIBLE_DEVICES=5 CUDA_VISIBLE_DEVICES=5 singularity exec --nv -C -H $CARLA_WORKSPACE images/carla_0.9.5.sif CarlaUE4/Binaries/Linux/CarlaUE4 CarlaUE4 -carla-server

# run without singularity, still on device 0 (not surprising, SDL isn't installed on the host)
SDL_VIDEODRIVER=offscreen SDL_HINT_CUDA_DEVICE=5 NVIDIA_VISIBLE_DEVICES=5 CUDA_VISIBLE_DEVICES=5 workspace/home/carla/CarlaUE4/Binaries/Linux/CarlaUE4 CarlaUE4 -carla-server

I asked over in the NVIDIA forums for a general solution to selecting which GPU an OpenGL process runs on.

@qhaas
Contributor

qhaas commented Sep 21, 2019

UPDATE: CARLA 0.9.6 now lets you select which GPU to run on; for some reason this wasn't working in CARLA 0.9.5 (possibly an artifact of the way UE4 was built).

Our contacts at NVIDIA found where the mysterious SDL_HINT_CUDA_DEVICE variable lives: in the UE4 source, NOT the upstream SDL or CARLA codebase. The UE4 vendor maintains their own flavor of SDL in their repository that contains this variable, which explains why I didn't find it while grepping the upstream SDL codebase.

$ SDL_VIDEODRIVER=offscreen SDL_HINT_CUDA_DEVICE=0 ./CarlaUE4.sh &> /tmp/carla0.txt &
$ SDL_VIDEODRIVER=offscreen SDL_HINT_CUDA_DEVICE=1 ./CarlaUE4.sh -carla-world-port=5010 &> /tmp/carla1.txt &
$ SDL_VIDEODRIVER=offscreen SDL_HINT_CUDA_DEVICE=2 ./CarlaUE4.sh -carla-world-port=5020 &> /tmp/carla2.txt &
$ SDL_VIDEODRIVER=offscreen SDL_HINT_CUDA_DEVICE=3 ./CarlaUE4.sh -carla-world-port=5030 &> /tmp/carla3.txt &
$ nvidia-smi
Sat Sep 21 08:48:41 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:06.0 Off |                    0 |
| N/A   35C    P0    48W / 300W |    928MiB / 16130MiB |     37%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:00:07.0 Off |                    0 |
| N/A   34C    P0    61W / 300W |    928MiB / 16130MiB |     17%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  On   | 00000000:00:08.0 Off |                    0 |
| N/A   33C    P0    61W / 300W |    928MiB / 16130MiB |     20%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  On   | 00000000:00:09.0 Off |                    0 |
| N/A   35C    P0    60W / 300W |    928MiB / 16130MiB |     18%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     56887    C+G   .../Binaries/Linux/CarlaUE4-Linux-Shipping   917MiB |
|    1     57023    C+G   .../Binaries/Linux/CarlaUE4-Linux-Shipping   917MiB |
|    2     57147    C+G   .../Binaries/Linux/CarlaUE4-Linux-Shipping   917MiB |
|    3     57273    C+G   .../Binaries/Linux/CarlaUE4-Linux-Shipping   917MiB |
+-----------------------------------------------------------------------------+

Docker is slightly less intuitive. The environment variable SDL_HINT_CUDA_DEVICE=1 is ignored inside the Docker container, for reasons I've yet to determine. Further, docker run --gpus 1 makes only one GPU visible, but it defaults to GPU index 0. You need to add the 'device=' prefix, e.g. --gpus 'device=1', to expose only GPU 1 to the container. By restricting GPU visibility, one can force CARLA to run on a specific GPU.
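
A minimal sketch of that workaround (the image tag is an assumption; --gpus requires Docker 19.03+ with the NVIDIA container toolkit):

# expose only GPU 1; inside the container it appears as device 0, so CARLA has no other choice
docker run --rm --gpus 'device=1' -p 2000-2002:2000-2002 carlasim/carla:0.9.6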

@atroccoli

Following up on @qhaas's comment, it is possible to get GPU selection working inside Docker by changing FROM nvidia/opengl:1.0-glvnd-runtime-ubuntu16.04 in Release.Dockerfile to FROM nvidia/cudagl:10.0-runtime-ubuntu16.04. With this new base image, SDL_HINT_CUDA_DEVICE=1 achieves the desired effect.
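
One hypothetical way to apply that change from the shell (the path is assumed to be relative to the CARLA source tree):

# swap the base image in Release.Dockerfile for the CUDA-enabled OpenGL variant
sed -i 's|FROM nvidia/opengl:1.0-glvnd-runtime-ubuntu16.04|FROM nvidia/cudagl:10.0-runtime-ubuntu16.04|' Release.Dockerfile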

@nachovizzo

> UPDATE: CARLA 0.9.6 will now let you select which GPU to run on […]
>
> By restricting GPU visibility, one can force CARLA to run on a specific GPU.

This doesn't work for me. I'm running CARLA 0.9.6-29 on Ubuntu 18.04, and SDL_HINT_CUDA_DEVICE has literally NO effect on my setup.

Any suggestions?

@pziecina

As of the CARLA 0.9.5 release, the GPU may also be selected with the CUDA_VISIBLE_DEVICES environment variable. This works for me when starting CARLA in Docker with multiple GPUs. Slightly older releases possibly support this as well.
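
Presumably something like this (a sketch; the runtime flag and image tag are assumptions, not something verified in this thread):

# mask all but GPU 1 inside the container via CUDA device visibility
docker run --runtime=nvidia -e CUDA_VISIBLE_DEVICES=1 -p 2000-2002:2000-2002 carlasim/carla:0.9.5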

@nachovizzo

Using CUDA_VISIBLE_DEVICES makes no difference in my setup 😢

Wed Nov 27 13:04:17 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P400         Off  | 00000000:17:00.0  On |                  N/A |
| 34%   39C    P0    N/A /  N/A |   1172MiB /  2000MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:B3:00.0 Off |                  N/A |
| 41%   30C    P8    21W / 250W |      0MiB / 10986MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.3 LTS
Release:	18.04
Codename:	bionic

@ess476

ess476 commented Dec 3, 2019

This appears to no longer work on Ubuntu. Any suggestions?

@pziecina

pziecina commented Dec 4, 2019

I've just downloaded the official CARLA 0.9.6 release and tested with:

DISPLAY= CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=1 ./CarlaUE4.sh -opengl

nvidia-smi shows CARLA running on the second card.

| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2085      G   /usr/lib/xorg/Xorg                           282MiB |
|    1     17473    C+G   .../Binaries/Linux/CarlaUE4-Linux-Shipping   669MiB |
  • Setting DISPLAY to an empty string forces the NVIDIA off-screen driver
  • In off-screen mode you have to force OpenGL instead of the default Vulkan
  • To get the same GPU indexing as nvidia-smi, set CUDA_DEVICE_ORDER=PCI_BUS_ID (a combined sketch follows below)
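
Putting those three points together, multiple instances can be pinned to different GPUs (a hedged sketch; the port numbers are arbitrary):

# pin two off-screen instances to different GPUs on separate port ranges
DISPLAY= CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=0 ./CarlaUE4.sh -opengl -carla-world-port=2000 &
DISPLAY= CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=1 ./CarlaUE4.sh -opengl -carla-world-port=2010 &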

@nachovizzo

Great! It still doesn't work on my machine, but now at least I see the process showing up on the right GPU:

DISPLAY= CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=1 ./CarlaUE4.sh -opengl
4.22.3-0+++UE4+Release-4.22 517 0
Disabling core dumps.
LowLevelFatalError [File:Unknown] [Line: 102] 
Exception thrown: bind: Address already in use
Signal 11 caught.
Malloc Size=65538 LargeMemoryPoolOffset=65554 
CommonUnixCrashHandler: Signal=11
Malloc Size=65535 LargeMemoryPoolOffset=131119 
Malloc Size=123824 LargeMemoryPoolOffset=254960 
Engine crash handling finished; re-raising signal 11 for the default handler. Good bye.
Segmentation fault (core dumped)
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1614      G   /usr/lib/xorg/Xorg                           278MiB |
|    0      8992      G   /usr/bin/compiz                              291MiB |
|    0     13095      G   ...uest-channel-token=10380395948321160125   245MiB |
|    1      3445      G   .../Binaries/Linux/CarlaUE4-Linux-Shipping   115MiB |
+-----------------------------------------------------------------------------+

Do you mind sharing your X11 config file? I guess that might be related.

@pziecina

pziecina commented Dec 4, 2019

With DISPLAY set to an empty string, the simulator runs in off-screen mode: it communicates with the NVIDIA GPU directly, skipping the X server. It runs on systems both with and without an X server.

This error occurs because you have already started a simulator that is bound to port 2000:

Exception thrown: bind: Address already in use
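
A quick way to find the offending process (a sketch; ss ships with iproute2 on modern Ubuntu):

# list whatever is currently listening on CARLA's default port
ss -ltnp | grep ':2000'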

@nachovizzo

@pawel-ziecina Thanks a lot! I had a service running in the background using that port. Now it's working like a charm!

@egorfolley

I got this error:

4.22.1-0+++UE4+Release-4.22 517 0
Disabling core dumps.
Signal 11 caught.
Malloc Size=65538 LargeMemoryPoolOffset=65554
CommonUnixCrashHandler: Signal=11
Malloc Size=65535 LargeMemoryPoolOffset=131119
Malloc Size=111328 LargeMemoryPoolOffset=242464
Engine crash handling finished; re-raising signal 11 for the default handler. Good bye.
Signal 11 caught.
Malloc Size=65538 LargeMemoryPoolOffset=65554
CommonUnixCrashHandler: Signal=11
Malloc Size=65535 LargeMemoryPoolOffset=131119
Malloc Size=111328 LargeMemoryPoolOffset=242464
Engine crash handling finished; re-raising signal 11 for the default handler. Good bye.
Signal 11 caught.
Malloc Size=65538 LargeMemoryPoolOffset=65554
CommonUnixCrashHandler: Signal=11
Malloc Size=65535 LargeMemoryPoolOffset=131119
Malloc Size=98832 LargeMemoryPoolOffset=229968
Engine crash handling finished; re-raising signal 11 for the default handler. Good bye.
Segmentation fault (core dumped)

@1469cgw

1469cgw commented Mar 6, 2020

How do I run the CARLA training model on a remote Linux server without root access?

@1469cgw

1469cgw commented Mar 6, 2020

What can I do when I face this? Please help me.

[2020.03.06-08.36.02:857][ 0]LogInit: Using OS detected language (en-US).
[2020.03.06-08.36.02:857][ 0]LogInit: Using OS detected locale (en-US).
[2020.03.06-08.36.02:859][ 0]LogTextLocalizationManager: No specific localization for 'en-US' exists, so the 'en' localization will be used.
Signal 11 caught.
Malloc Size=131076 LargeMemoryPoolOffset=131092
CommonLinuxCrashHandler: Signal=11
Malloc Size=65535 LargeMemoryPoolOffset=196655

@germanros1987
Member

This conversation is now obsolete. The current official way to run CARLA off-screen is using nvidia-docker. Please open a new issue if further discussion is needed.
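
For reference, that route looks roughly like this (a sketch; the image tag, the device-selection variable, and the image's default entrypoint are assumptions):

# run CARLA headless under the NVIDIA container runtime, pinned to GPU 0
docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 -p 2000-2002:2000-2002 carlasim/carla:0.9.6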

@dbersan

dbersan commented May 10, 2021

> This conversation is now obsolete. The current official way to run CARLA off-screen is using nvidia-docker. Please open a new issue if further discussion is needed.

Then this tutorial is also obsolete?
