some issues about verify_image_on_cuda.py #334

Closed
feriorior opened this issue Mar 14, 2023 · 26 comments

Comments

@feriorior

Thank you for this wonderful work!!!!
Unfortunately, when running this program (verify_image_on_cuda.py) on a headless remote server (Ubuntu 18.04), I got the errors below. I get similar errors when I define the environment myself with env = MetaDriveEnv(dict(environment_num=1000, start_seed=1010, image_on_cuda=True, traffic_density=0.05,)). How can I use the CUDA-image version of the env on a remote server without a screen?

"Known pipe types:
glxGraphicsPipe
(1 aux display modules not yet loaded.)
:display:x11display(error): Could not open display ":0.0".
:display(error): The 'textures_power_2' configuration is set to 'none', meaning
that non-power-of-two texture support is required, but the video
driver I'm trying to use does not support non-power-of-two textures.
:device(warning): /dev/input/event2 is not readable, some features will be unavailable.
Traceback (most recent call last):
  File "verify_image_on_cuda.py", line 57, in <module>
    _test_rgb_camera_as_obs(args.render, image_on_cuda=not args.native)
  File "verify_image_on_cuda.py", line 28, in _test_rgb_camera_as_obs
    env.reset()
  File "/home/vehicle/meta/metadrive/envs/base_env.py", line 372, in reset
    self.lazy_init() # it only works the first time when reset() is called to avoid the error when render
  File "/home/vehicle/meta/metadrive/envs/base_env.py", line 259, in lazy_init
    engine = initialize_engine(self.config)
  File "/home/vehicle/meta/metadrive/engine/engine_utils.py", line 12, in initialize_engine
    cls.singleton = cls(env_global_config)
  File "/home/vehicle/meta/metadrive/engine/base_engine.py", line 29, in __init__
    EngineCore.__init__(self, global_config)
  File "/home/siao/vehicle/meta/metadrive/engine/core/engine_core.py", line 243, in __init__
    use_occlusion_maps=False
  File "/home/vehicle/meta/metadrive/engine/core/our_pbr.py", line 51, in __init__
    use_occlusion_maps=use_occlusion_maps
  File "/home/miniconda3/envs/md/lib/python3.7/site-packages/simplepbr/__init__.py", line 136, in __init__
    self._setup_tonemapping()
  File "/home/miniconda3/envs/md/lib/python3.7/site-packages/simplepbr/__init__.py", line 264, in _setup_tonemapping
    self.tonemap_quad.set_shader(tonemap_shader)
AttributeError: 'NoneType' object has no attribute 'set_shader'"

@QuanyiLi
Member

QuanyiLi commented Mar 14, 2023

Hi,

:display:x11display(error): Could not open display ":0.0".

This error is caused by the Panda3D graphics pipeline, which does not support headless rendering in the official distribution. You have to compile Panda3D on your headless machine and install it into your conda env (for details, see https://metadrive-simulator.readthedocs.io/en/latest/install.html#install-metadrive-with-headless-rendering). Note that sudo is required in this case, since some libraries may need to be installed for compiling.

:display(error): The 'textures_power_2' configuration is set to 'none'

Generally, textures on the graphics card are scaled up or down to a power-of-two size, such as 256 or 512. In that case, you cannot choose an arbitrary image size for the camera observation; only power-of-two image observations are returned correctly. To remove this limit, we call loadPrcFileData("", "textures-power-2 none") in class EngineCore. For your problem, you can simply remove this line and use a power-of-two image size for the observation, e.g. rgb_camera=(512, 512) or window_size=(128, 128). Then everything should work and this error will no longer appear. I will consider adding this option to env_config so it can be turned on or off quickly.
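If you go the power-of-two route, a quick check of the camera size before creating the env can save a failed launch. A minimal sketch, assuming the size is given as a (width, height) tuple; the helpers `is_power_of_two`, `next_power_of_two` and `round_camera_size` are hypothetical and not part of MetaDrive:

```python
def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of two (1, 2, 4, ..., 256, 512, ...)."""
    # A power of two has exactly one bit set, so n & (n - 1) clears it to zero.
    return n > 0 and (n & (n - 1)) == 0


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (n must be >= 1)."""
    if n < 1:
        raise ValueError("size must be a positive integer")
    # (n - 1).bit_length() is the exponent of the next power of two.
    return 1 << (n - 1).bit_length()


def round_camera_size(size: tuple) -> tuple:
    """Round each dimension of a (width, height) camera size up to a power of two."""
    return tuple(next_power_of_two(d) for d in size)


if __name__ == "__main__":
    for size in [(512, 512), (128, 128), (200, 100)]:
        ok = all(is_power_of_two(d) for d in size)
        print(size, "OK" if ok else f"-> use {round_camera_size(size)}")
```

Rounding up rather than down is a choice made here to avoid losing resolution; rounding down works just as well if a smaller observation is preferred.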

However, we have not tested the headless pipeline for a long time, so you may run into other problems that the documentation does not cover. Let's stay in touch. Could you also provide more information about your platform, such as your OS, GPU, driver, and CUDA version? The output of nvidia-smi should be enough. If we have machines with a similar configuration, we can probably help you figure out the installation.

@feriorior
Author

Thanks for your reply.
I ran into JPEG-related build errors, but I am not sure where these paths are: "--jpeg-incdir /path/to/your/jpeg/include" and "--jpeg-libdir /path/to/your/jpeg/lib".
Here is my nvidia-smi output:

" NVIDIA-SMI 470.161.03 Driver Version: 470.161.03 CUDA Version: 11.4 "

@QuanyiLi
Member

According to our docs, you don't need to specify the JPEG path. You only need to specify your Python paths in the following command:

python ./makepanda/makepanda.py --everything --no-x11 --no-opencv --no-fmodex --use-egl --no-gtk3 \
  --python-incdir /path/to/your/conda_env/include/ \
  --python-libdir /path/to/your/conda_env/lib/ \
  --thread 8 --wheel
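To fill in the two placeholder paths, a small sketch can derive them from the active conda environment. It assumes the env root is `sys.prefix` (true when run with the env's own Python) and that the `include/` and `lib/` directories under that root are what the command above expects; it only builds and prints the command, it does not start the build. The helper name `makepanda_command` is hypothetical:

```python
import os
import shlex
import sys


def makepanda_command(env_prefix: str = sys.prefix, threads: int = 8) -> list:
    """Build the makepanda argument list with Python paths taken from a conda env root."""
    return [
        # Use the env's own interpreter so the wheel matches its Python version.
        sys.executable, "./makepanda/makepanda.py",
        "--everything", "--no-x11", "--no-opencv", "--no-fmodex",
        "--use-egl", "--no-gtk3",
        # Assumption: <env>/include/ and <env>/lib/, as in the placeholder paths.
        "--python-incdir", os.path.join(env_prefix, "include", ""),
        "--python-libdir", os.path.join(env_prefix, "lib", ""),
        "--thread", str(threads),
        "--wheel",
    ]


if __name__ == "__main__":
    # Print a shell-ready command; run it from the root of the Panda3D source tree.
    print(" ".join(shlex.quote(arg) for arg in makepanda_command()))
```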

@QuanyiLi
Member

Actually, I think recompiling Panda3D is no longer required. The error is raised by Graphics.renderIntoTexture(). I tested headless mode on a server with a 1080 Ti and, after skipping the broken function, got the following rendered image:
[Image: main_1678886366, frame rendered in headless mode]

I will fix the headless mode issue soon.

@QuanyiLi
Member

I also found several rendering-related problems (see #336) and will fix them ASAP.

@QuanyiLi QuanyiLi mentioned this issue Mar 15, 2023
@feriorior
Author

Thanks for your help. I recompiled Panda3D and got a new error; I believe it is the same problem you found. My output is below:

"Successfully registered the following environments: ['MetaDrive-validation-v0', 'MetaDrive-10env-v0', 'MetaDrive-100envs-v0', 'MetaDrive-1000envs-v0', 'SafeMetaDrive-validation-v0', 'SafeMetaDrive-10env-v0', 'SafeMetaDrive-100envs-v0', 'SafeMetaDrive-1000envs-v0', 'MARLTollgate-v0', 'MARLBottleneck-v0', 'MARLRoundabout-v0', 'MARLIntersection-v0', 'MARLParkingLot-v0', 'MARLMetaDrive-v0'].
:device(warning): /dev/input/event2 is not readable, some features will be unavailable.
WARNING:root:You may using too large buffer! The height is 256, and width is 256. It may lower the sample efficiency! Considering reduce buffer size or using cuda image by set [image_on_cuda=True].
Bullet physics world is launched successfully!
Known pipe types:
eglGraphicsPipe
(all display modules loaded.)
:display(error): Could not get requested FrameBufferProperties; abandoning window.
requested: depth_bits=24 float_color color_bits=48 red_bits=16 green_bits=16 blue_bits=16 alpha_bits=16 multisamples=16 force_hardware
got: depth_bits=24 float_color color_bits=48 red_bits=16 green_bits=16 blue_bits=16 alpha_bits=16 stencil_bits=8 multisamples=8 force_hardware
Error happens when drawing scene in offscreen mode!"

@QuanyiLi
Member

QuanyiLi commented Mar 15, 2023

Hi,

I created a new PR, #337, which runs successfully on my headless machine without compiling Panda3D or any other extra steps. Could you help me test it?

Just switch to the fix-offscreen-rendering branch, reinstall all dependencies (including panda3d) via pip install -e ., and run the following command:

python -m metadrive.examples.verify_headless_installation

Note: do not use the compiled panda3d; the officially distributed one works fine. I have already tested it.

The script will save three pairs of images to the examples directory; in each pair, one image comes from the agent observation and the other from Panda3D's internal rendering buffer. Please fetch these images from the cluster or server and check them to make sure MetaDrive can draw scenes and capture images correctly.
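Besides eyeballing each pair, a quick numerical check can flag a blank render. A minimal sketch, assuming both images are already loaded as NumPy arrays of the same shape (for example with Pillow via `np.asarray(Image.open(path))`); the helper `check_image_pair` and its threshold are hypothetical and not part of the MetaDrive script:

```python
import numpy as np


def check_image_pair(obs_img: np.ndarray, buffer_img: np.ndarray, max_mean_diff: float = 5.0) -> dict:
    """Compare an observation image with the matching render-buffer image.

    Reports whether either image is blank (one constant value everywhere, a
    common symptom of failed offscreen rendering) and the mean absolute pixel
    difference between the two.
    """
    if obs_img.shape != buffer_img.shape:
        raise ValueError(f"shape mismatch: {obs_img.shape} vs {buffer_img.shape}")
    # Cast to a signed type so the subtraction cannot wrap around for uint8 inputs.
    diff = np.abs(obs_img.astype(np.int16) - buffer_img.astype(np.int16))
    mean_diff = float(diff.mean())
    return {
        "obs_blank": bool(obs_img.min() == obs_img.max()),
        "buffer_blank": bool(buffer_img.min() == buffer_img.max()),
        "mean_abs_diff": mean_diff,
        "similar": mean_diff <= max_mean_diff,
    }
```

The two images may legitimately differ in channel order or orientation, so a large difference is only a hint to look closer; a blank image is the stronger signal.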

@feriorior
Author

Thanks a lot for the quick update. With this branch I can obtain complete visual images and depth images.
I noticed that this test script uses some fixed hyper-parameters, and it may not yet be able to generate visual observations with CUDA. Looking forward to your update.

Also, RLlib is still not very friendly to newcomers. Switching to Stable-Baselines3 (SB3) might be a more Pythonic solution. I hope the authors can provide more training examples.

@pengzhenghao
Member

pengzhenghao commented Mar 16, 2023 via email

@QuanyiLi
Member

@siaoliu I updated the script. You can now run python -m metadrive.examples.verify_headless_installation --cuda --camera ["main"/"rgb"/"depth"] to test each camera with or without CUDA. Unlike before, the cameras are no longer tested all together; testing them one at a time avoids a subtle problem.
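Since each run now tests a single camera, a small driver can loop over the three cameras and launch each in its own process. A sketch that relies only on the module path and the `--cuda`/`--camera` options quoted above; the helper names `verify_commands` and `run_all` are hypothetical, and treating exit code 0 as success is an assumption:

```python
import subprocess
import sys

CAMERAS = ("main", "rgb", "depth")


def verify_commands(use_cuda: bool, cameras=CAMERAS) -> list:
    """One command per camera; each runs in a fresh interpreter, so engines never share a process."""
    commands = []
    for camera in cameras:
        cmd = [sys.executable, "-m", "metadrive.examples.verify_headless_installation"]
        if use_cuda:
            cmd.append("--cuda")
        cmd += ["--camera", camera]
        commands.append(cmd)
    return commands


def run_all(use_cuda: bool) -> dict:
    """Run each camera test in turn and collect its exit code (0 is assumed to mean success)."""
    results = {}
    for camera, cmd in zip(CAMERAS, verify_commands(use_cuda)):
        results[camera] = subprocess.run(cmd).returncode
    return results
```

For example, run_all(use_cuda=True) on the server returns a dict such as {"main": ..., "rgb": ..., "depth": ...} with one exit code per camera.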

Also, I noticed that your machine's CUDA version is below 12.0, which may not support the CUDA pipeline. Consider updating it and enjoy!
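To check this condition on a server, the CUDA version can be read from the nvidia-smi header, like the one posted earlier in this thread. A minimal sketch; the regular expression assumes the "CUDA Version: X.Y" format shown in that output, the 12.0 threshold is simply the number mentioned here, and both helper names are hypothetical:

```python
import re
import subprocess


def parse_cuda_version(smi_text: str):
    """Extract (major, minor) from a 'CUDA Version: X.Y' field, or None if absent."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def driver_cuda_version():
    """Run nvidia-smi and parse its header; None if nvidia-smi is missing or fails."""
    try:
        out = subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_cuda_version(out)


example = "NVIDIA-SMI 470.161.03 Driver Version: 470.161.03 CUDA Version: 11.4"
print(parse_cuda_version(example) >= (12, 0))  # False: 11.4 is below 12.0
```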

@QuanyiLi
Copy link
Member

I just merged this PR into main. You can pull the latest main to test it!

@feriorior
Author

Thanks, I will update my CUDA version soon.
I'm not sure whether CUDA 12 is compatible with my torch version (I remember torch may ship with its own bundled toolkit).

@pengzhenghao
Member

pengzhenghao commented Mar 18, 2023 via email

@feriorior
Author

Perhaps offscreen environments still invoke the display device?
I am not sure why this happens: Failed to destroy EGL context: EGL_BAD_DISPLAY.

"Bullet physics world is launched successfully!
Known pipe types:
glxGraphicsPipe
(1 aux display modules not yet loaded.)
:display:x11display(error): Could not open display ":0.0".
The observation is a dict with numpy arrays as values: {'image': (512, 512, 3, 3), 'state': (19,)}
rgb_camera Test result:
Headless mode Offscreen render launched successfully!
images named 'rgb_camera_from_observation.png' and 'rgb_camera_from_buffer.png' are saved to /home/vehicle/metadrive/metadrive/examples. Open it to check if offscreen mode works well
:display:egldisplay(error): Failed to destroy EGL context: EGL_BAD_DISPLAY
:display:egldisplay(error): Failed to terminate EGL display: EGL_BAD_DISPLAY
"

@QuanyiLi
Member

If the rendered image is OK, just ignore this error. It is raised when the environment and the game engine are closed, so it won't affect your application. And yes, offscreen rendering still looks for a display device, but it stops sending rendered images to the screen if it cannot find one. This is what we call the headless situation. In this case, our game engine still renders through the regular OpenGL pipeline and produces the same content as it would with a screen. Besides, another rendering pipeline such as EGL might be launched (but not used), and this error is raised by EGL. EGL is a way to render without an X server on a headless machine.

As for the CUDA version: the 12.0 refers to the CUDA version reported by nvidia-smi, i.e. the highest CUDA version supported by the driver on your system, while the CUDA version for torch usually refers to the CUDA toolkit it was built with. The two can differ, and a lower-version toolkit can always be used. A simple conda install cudatoolkit==11.3 (or whichever version you need) is fine; personally, I prefer pip3 install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
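The rule of thumb above (a toolkit no newer than the driver's CUDA version can be used) amounts to a version comparison. A conservative sketch under that assumption; it ignores the finer compatibility rules NVIDIA documents. In PyTorch, the toolkit a build was compiled with is exposed as `torch.version.cuda` (a string such as "11.3", or None for CPU-only builds), while the driver side is the number nvidia-smi prints; the helper names are hypothetical:

```python
from typing import Optional


def _version_tuple(version: str) -> tuple:
    """Convert '11.3' to (11, 3) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def toolkit_fits_driver(toolkit: Optional[str], driver_cuda: str) -> bool:
    """Conservative check: the toolkit must not be newer than the driver's CUDA version.

    toolkit: the toolkit a framework was built with, e.g. torch.version.cuda ("11.3");
             None means a CPU-only build, which cannot use CUDA at all.
    driver_cuda: the "CUDA Version" field printed by nvidia-smi, e.g. "11.4".
    """
    if toolkit is None:
        return False
    return _version_tuple(toolkit) <= _version_tuple(driver_cuda)


# The cu113 wheels above on the 11.4 driver posted earlier in this thread:
print(toolkit_fits_driver("11.3", "11.4"))  # True
```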

@feriorior
Author

@pengzhenghao Could you provide a simple training example (e.g. SAC or PPO with SB3) as a baseline on the basic MetaDriveEnv? Perhaps because of wrong hyper-parameters, it is hard for me to reproduce the results reported in the paper. Looking forward to your help!
Also, I think RGB-D could be added as a basic visual observation format, which would be useful in realistic scenarios.

@leejiahe

leejiahe commented Apr 3, 2023

Hi,
Like siaoliu, I have run into the same issue. I am using the new version of MetaDrive, 0.3.0.1.
NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4

I am running the software on 4x NVIDIA Tesla V100 32GB GPUs, so one difference may be that I must run the software headless.

I also followed the steps you mentioned to build Panda3D, but the issue persists.

According to our doc, you don't need to specify the jpeg path. All you need is to specify your python path in the following command:

python ./makepanda/makepanda.py --everything --no-x11 --no-opencv --no-fmodex --use-egl --no-gtk3\
  --python-incdir /path/to/your/conda_env/include/ \
  --python-libdir /path/to/your/conda_env/lib/ \
  --thread 8 --wheel

Is there any known fix or precedent for running MetaDrive headless on V100 servers?

@QuanyiLi
Member

QuanyiLi commented Apr 3, 2023

Hi @leejiahe

I think no special treatment is required now. I believe the installation section of the docs has already been updated: it now says that a single pip install -e . makes everything work, including headless running. Could you follow the instructions in the docs and give us feedback? I tested headless mode on NVIDIA 1080 and A6000 GPUs, and I think the V100 should work as well.

Quanyi

@leejiahe

leejiahe commented Apr 3, 2023

Dear @QuanyiLi ,

Thank you for your fast reply.

I created a new conda environment and followed the installation instructions, which consist of three lines of code (including pip install -e ., as you mentioned). However, it still doesn't work, and my error message is different from siaoliu's. Below is the error message I get when running python -m metadrive.examples.verify_headless_installation:

Known pipe types:
glxGraphicsPipe
(1 aux display modules not yet loaded.)
:display:x11display(error): Could not open display ":0.0".
:display:egldisplay(warning): Couldn't initialize the default EGL display: EGL_NOT_INITIALIZED
:display(warning): FrameBufferProperties available less than requested.
requested: depth_bits=1 color_bits=3 red_bits=1 green_bits=1 blue_bits=1 alpha_bits=1 multisamples=8 back_buffers=1 force_hardware
got: depth_bits=32 color_bits=24 red_bits=8 green_bits=8 blue_bits=8 alpha_bits=8 back_buffers=1 force_hardware
:display(error): Could not get requested FrameBufferProperties; abandoning window.
requested: depth_bits=24 float_color color_bits=48 red_bits=16 green_bits=16 blue_bits=16 alpha_bits=16 multisamples=16 force_hardware
got: depth_bits=24 float_color color_bits=48 red_bits=16 green_bits=16 blue_bits=16 alpha_bits=16 stencil_bits=8 multisamples=1 force_hardware
Traceback (most recent call last):
  File "/home/stevenlee/miniconda3/envs/driving/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/stevenlee/miniconda3/envs/driving/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/stevenlee/metadriving/metadrive/metadrive/examples/verify_headless_installation.py", line 11, in <module>
    verify_installation(args.cuda, args.camera)
  File "/home/stevenlee/metadriving/metadrive/metadrive/tests/test_installation.py", line 80, in verify_installation
    capture_headless_image(cuda)
  File "/home/stevenlee/metadriving/metadrive/metadrive/tests/test_installation.py", line 29, in capture_headless_image
    env.reset()
  File "/home/stevenlee/metadriving/metadrive/metadrive/envs/base_env.py", line 375, in reset
    self.lazy_init() # it only works the first time when reset() is called to avoid the error when render
  File "/home/stevenlee/metadriving/metadrive/metadrive/envs/base_env.py", line 262, in lazy_init
    engine = initialize_engine(self.config)
  File "/home/stevenlee/metadriving/metadrive/metadrive/engine/engine_utils.py", line 12, in initialize_engine
    cls.singleton = cls(env_global_config)
  File "/home/stevenlee/metadriving/metadrive/metadrive/engine/base_engine.py", line 29, in __init__
    EngineCore.__init__(self, global_config)
  File "/home/stevenlee/metadriving/metadrive/metadrive/engine/core/engine_core.py", line 251, in __init__
    use_occlusion_maps=False
  File "/home/stevenlee/metadriving/metadrive/metadrive/engine/core/our_pbr.py", line 51, in __init__
    use_occlusion_maps=use_occlusion_maps
  File "/home/stevenlee/miniconda3/envs/driving/lib/python3.7/site-packages/simplepbr/__init__.py", line 136, in __init__
    self._setup_tonemapping()
  File "/home/stevenlee/metadriving/metadrive/metadrive/engine/core/our_pbr.py", line 87, in _setup_tonemapping
    self.tonemap_quad.set_shader(tonemap_shader)
AttributeError: 'NoneType' object has no attribute 'set_shader'

Thank you so much for your help

@QuanyiLi
Member

QuanyiLi commented Apr 3, 2023

It must be caused by the interaction between OpenGL and the GPU. I tried it again on a new A5000 GPU cluster, and everything works fine. My output after running the same script is:

Known pipe types:
  glxGraphicsPipe
(1 aux display modules not yet loaded.)
:display:x11display(error): Could not open display ":0.0".
WARNING:root:You may using too large buffer! The height is 512, and width is 512. It may lower the sample efficiency! Considering reduce buffer size or using cuda image by set [image_on_cuda=True].
main_camera Test result:
Headless mode Offscreen render launched successfully!
images named 'main_camera_from_observation.png' and 'main_camera_from_buffer.png' are saved to /home/quanyi/metadrive/metadrive/examples. Open it to check if offscreen mode works well
Aborted (core dumped)

I guess it is due to some wrong FrameBufferProperties settings, but I cannot help more with this. I created a topic about it on the Panda3D forum: https://discourse.panda3d.org/t/got-nothing-returned-when-calling-rendersceneinto-in-headless-mode/29253

Let's just wait for their feedback!

@QuanyiLi
Member

QuanyiLi commented Apr 4, 2023

Hi, Jiahe @leejiahe

According to the reply here: https://discourse.panda3d.org/t/got-nothing-returned-when-calling-rendersceneinto-in-headless-mode/29253, one possible reason is that a virtual framebuffer can't be created on your Linux machine. Could you try sudo apt-get install xvfb xserver-xephyr -y, which allows a virtual framebuffer to be created?
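Once xvfb is installed, a common way to give an OpenGL program a virtual X display is to launch it through `xvfb-run`. A sketch, assuming the `xvfb-run` wrapper from the xvfb package is on PATH (`-a` picks a free server number, `-s` passes screen settings to Xvfb) and that a missing `DISPLAY` variable is a good enough hint that no screen is available. Note that Xvfb typically renders in software, so GPU acceleration may not be used through it. The helper names are hypothetical:

```python
import os
import shutil
import subprocess
import sys

VERIFY = [sys.executable, "-m", "metadrive.examples.verify_headless_installation"]


def needs_virtual_display(environ=None) -> bool:
    """Heuristic: no X11 DISPLAY set means there is no screen to render to."""
    env = os.environ if environ is None else environ
    return not env.get("DISPLAY")


def wrap_with_xvfb(cmd: list, screen: str = "1024x768x24") -> list:
    """Prefix a command with xvfb-run so it gets a virtual X server."""
    return ["xvfb-run", "-a", "-s", f"-screen 0 {screen}"] + cmd


def run_verification() -> int:
    """Run the verification script, under xvfb-run when no display is set."""
    cmd = VERIFY
    if needs_virtual_display():
        if shutil.which("xvfb-run") is None:
            raise RuntimeError("xvfb-run not found; install the xvfb package first")
        cmd = wrap_with_xvfb(cmd)
    return subprocess.run(cmd).returncode
```

Whether this resolves the FrameBufferProperties error depends on the driver setup on the server, so treat it as one more thing to try rather than a fix.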

Also, have you tried other offscreen-rendering simulators on your machine, especially ones that render scenes via OpenGL? They would also fail if this capability is missing.

Quanyi

@leejiahe

leejiahe commented Apr 4, 2023

Dear @QuanyiLi ,

Thank you for your help.

I tried installing the libraries as you suggested, but the problem persists: it still does not create a virtual framebuffer.

Actually, I had also tried installing DonkeyCar and AirSim, and I faced the same issue there: I can't render offscreen.

I just came across this; is there something equivalent for MetaDrive?
CARLA off-screen GPU Selection

@QuanyiLi
Member

QuanyiLi commented Apr 4, 2023

@leejiahe Thanks for sharing. But I don't believe that Panda3D supports SDL, so the GPU selection has nothing to do with your problem.

I wish I could help more; sorry about that. Maybe you could search for topics with keywords such as "headless X11 server" or "OpenGL headless" to see if you can find some hints. Also, I guess some remote desktop services cannot be launched on your machine for the same reason, so related topics are worth reading too.

@leejiahe

leejiahe commented Apr 4, 2023

Dear @QuanyiLi ,

Perfectly understandable. Thank you for your help!

@AHPUymhd

AHPUymhd commented Apr 13, 2023

[Image]
What is this issue?

@QuanyiLi
Member

@AHPUymhd
Try pip install panda3d-gltf==0.13
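To confirm which panda3d-gltf version is active after pinning it, the installed package metadata can be queried. A sketch using `importlib.metadata` (standard library since Python 3.8; on the Python 3.7 environments seen in this thread, the `importlib_metadata` backport offers the same API). The helper name `installed_version` is hypothetical:

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("panda3d-gltf")
    if version is None:
        print("panda3d-gltf is not installed")
    elif not version.startswith("0.13"):
        print(f"panda3d-gltf {version} found; the suggestion above pins 0.13")
    else:
        print(f"panda3d-gltf {version} OK")
```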
