Segmentation Fault (Core Dumped) #5

Open
DhananjayAshok opened this issue Nov 26, 2020 · 32 comments

Comments

@DhananjayAshok

Hi there,

When I try to run SoftGym, the PyFleX compilation works fine, but when I run this line in the example Python file:
`env = normalize(SOFTGYM_ENVS[args.env_name](**env_kwargs))`

I get the error:
Unable to initialize SDL
Could not initialize GL extensions
Reshaping
Segmentation fault (core dumped)

Do you have any idea what could be causing this?

System specifications:
Ubuntu 18.04, CUDA 9.1

OpenGL applications work fine on my system, for example glxinfo and glxgears work as expected.

@Xingyu-Lin
Owner

We have not tested the compilation steps on Ubuntu 18. Are you using the Docker image?

@liduanken

I have encountered the same problem. BTW, I am using Docker.

@DhananjayAshok
Author

I am not using Docker; I used the other installation method. I am doing this whole process on a compute cluster and, for admin reasons, cannot use Docker.

@Xingyu-Lin
Owner

If you are using a cluster, then you probably do not have a display environment for GL applications. Can you try running SoftGym with the headless option on?
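A rough way to decide between the two modes on a given machine is sketched below. It is only a heuristic and assumes that an unset `DISPLAY` variable means no X server (or Xvfb) is reachable; a set `DISPLAY` does not prove that the server supports the GL features PyFlex needs. `suggest_headless_flag` is a hypothetical helper, not part of SoftGym.

```python
import os
from typing import Mapping, Optional


def suggest_headless_flag(environ: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical helper: return 1 (headless) when no X display is advertised, else 0.

    Heuristic only: DISPLAY being set does not prove that the X server
    (or Xvfb) actually provides the GL features PyFlex needs.
    """
    env = os.environ if environ is None else environ
    display = env.get("DISPLAY", "").strip()
    return 0 if display else 1


if __name__ == "__main__":
    flag = suggest_headless_flag()
    print(f"Suggested flag: --headless {flag}")
```

On a cluster node without X forwarding, this would suggest passing `--headless 1` to the example script.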

@FranBesq

FranBesq commented Nov 26, 2020

> I have encountered the same problem. BTW I am using a docker.

I had this problem when running the example from inside the Docker container (Ubuntu 18, CUDA 11).

I solved it by running the example outside the container, but you then need to set the environment variables again if they are not already in your .bashrc:

conda activate softgym
export PYFLEXROOT=${PWD}/PyFlex
export PYTHONPATH=${PYFLEXROOT}/bindings/build:$PYTHONPATH
export LD_LIBRARY_PATH=${PYFLEXROOT}/external/SDL2-2.0.4/lib/x64:$LD_LIBRARY_PATH

python examples/random_env.py --env_name PourWaterAmount
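The same setup can also be reproduced from Python when launching the example as a subprocess. This is a sketch that assumes it is run with the activated conda environment's Python (`sys.executable`) and that the SoftGym checkout contains `PyFlex/` with the layout used in the exports above; `build_softgym_env` and `launch_random_env` are hypothetical helpers. The variables are handed to a child process because the dynamic loader reads `LD_LIBRARY_PATH` (and Python reads `PYTHONPATH`) when a process starts, so changing `os.environ` inside an already running interpreter does not change library lookup for that interpreter.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import Dict, Mapping, Optional


def _prepend(entry: str, existing: str) -> str:
    """Put `entry` in front of a colon-separated path list without leaving a trailing ':'."""
    return f"{entry}:{existing}" if existing else entry


def build_softgym_env(softgym_root: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: reproduce the three exports above for a child process."""
    env = dict(os.environ if base is None else base)
    pyflex_root = str(Path(softgym_root).resolve() / "PyFlex")
    env["PYFLEXROOT"] = pyflex_root
    env["PYTHONPATH"] = _prepend(f"{pyflex_root}/bindings/build", env.get("PYTHONPATH", ""))
    env["LD_LIBRARY_PATH"] = _prepend(
        f"{pyflex_root}/external/SDL2-2.0.4/lib/x64", env.get("LD_LIBRARY_PATH", "")
    )
    return env


def launch_random_env(softgym_root: str, env_name: str = "PourWaterAmount") -> int:
    """Run examples/random_env.py in a fresh interpreter that starts with the new variables."""
    cmd = [sys.executable, "examples/random_env.py", "--env_name", env_name]
    result = subprocess.run(cmd, cwd=softgym_root, env=build_softgym_env(softgym_root), check=False)
    return result.returncode
```

From the softgym root, `launch_random_env(os.getcwd())` would play the role of the `${PWD}`-based commands above.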

@DhananjayAshok
Author

On the cluster I have been using Xvfb, so the display environment should be working properly (other GL applications such as glxgears work as expected). However, I just ran it again in headless mode and got a very similar error. I also tried the solution provided by FranBesq and got the same issue:

Waiting to generate environment variations. May take 1 minute for each variation...
eglInitialize() failed
eglChooseConfig() failed
failed to find suitable EGLConfig
eglCreateContext() failed
eglCreatePbufferSurface() failed
eglQueyContext(EGL_RENDER_BUFFER) failed
Could not initialize GL extensions
Segmentation fault (core dumped)

@yufeiwang63
Collaborator

This looks like an EGL error. Are you sure you have all the correct EGL libraries installed on the cluster? Maybe try this:
apt-get install libglfw3 libgles2-mesa-dev
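Installed packages are not always the libraries the loader actually picks up. One way to double-check is Python's `ctypes.util.find_library` followed by an actual load attempt, sketched below. The library names probed (`EGL`, `GLESv2`, `GL`) are an assumption about what PyFlex's renderer needs, and `find_library` mainly consults the `ldconfig` cache, so treat the result as an approximation of what the loader will do; `probe_gl_libraries` is a hypothetical helper.

```python
import ctypes
import ctypes.util
from typing import Dict, Iterable, Optional


def probe_gl_libraries(names: Iterable[str] = ("EGL", "GLESv2", "GL")) -> Dict[str, Optional[str]]:
    """Return {short name: resolved soname or None} for each library.

    A library counts as found only if find_library resolves it AND
    ctypes can actually load it into this process.
    """
    results: Dict[str, Optional[str]] = {}
    for name in names:
        soname = ctypes.util.find_library(name)
        if soname is not None:
            try:
                ctypes.CDLL(soname)
            except OSError:
                soname = None  # name resolved, but loading failed (e.g. a missing dependency)
        results[name] = soname
    return results


if __name__ == "__main__":
    for lib, soname in probe_gl_libraries().items():
        print(f"{lib:8s} -> {soname or 'NOT FOUND'}")
```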

@DhananjayAshok
Author

Yup, these libraries are all installed.

@liduanken

liduanken commented Nov 28, 2020

> (Quoting @FranBesq's comment above.)

I still get:
Could not initialize GL extensions

My setup is CUDA 11 and Ubuntu 18.04. Do you have any ideas? Thank you.

@FranBesq

> (Quoting @liduanken's comment above.)

Have you tried @yufeiwang63's answer? If it didn't work, here are some things I would try, although I don't want to send you on a wild goose chase:

* I described the steps I followed to install on my [fork](https://github.com/FranBesq/softgym/blob/master/docker/docker.md).

* @Xingyu-Lin links [this article](https://medium.com/@benjamin.botto/opengl-and-cuda-applications-in-docker-af0eece000f1) in his docker.md, which may be helpful.

* I call random_env.py through the Python interpreter directly, although you may have to make some minor changes to random_env.py.
  I don't see how this would help with OpenGL, but it helped me with imports (conda messed up some environment variables). Your installation may have a problem locating the GL libraries; again, check the article above for additional help with this.

python
import examples.random_env as rand_env
rand_env.main()
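If `main()` in random_env.py reads its options with `argparse` from `sys.argv` (an assumption; check the script), options such as `--env_name` and `--headless` can still be passed when calling it from the interpreter by temporarily replacing `sys.argv`. The helpers `patched_argv` and `call_with_argv` are hypothetical, and `demo_main` is a stand-in for the real `main()` so the sketch runs on its own.

```python
import argparse
import sys
from contextlib import contextmanager
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


@contextmanager
def patched_argv(argv: List[str]) -> Iterator[None]:
    """Temporarily replace sys.argv, restoring it even if the call raises."""
    saved = sys.argv
    sys.argv = argv
    try:
        yield
    finally:
        sys.argv = saved


def call_with_argv(func: Callable[[], T], args: List[str], prog: str = "random_env.py") -> T:
    """Hypothetical helper: call an argparse-based main() as if it were run from the shell."""
    with patched_argv([prog, *args]):
        return func()


def demo_main() -> argparse.Namespace:
    """Stand-in for examples.random_env.main(); it only parses and returns its options."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--env_name", type=str, default="PassWater")
    parser.add_argument("--headless", type=int, default=0)
    return parser.parse_args()


if __name__ == "__main__":
    opts = call_with_argv(demo_main, ["--env_name", "PourWaterAmount", "--headless", "1"])
    print(opts.env_name, opts.headless)  # PourWaterAmount 1
```

With the real module this would look like `call_with_argv(rand_env.main, ["--env_name", "PourWaterAmount", "--headless", "1"])` after `import examples.random_env as rand_env`.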

@liduanken

> (Quoting @FranBesq's comment above.)

Hello, thanks for your answer.
I tried your solution and compiled successfully outside the container, but it still did not work. I also tried Ubuntu 16.04 with NVIDIA driver 440.33.01 and CUDA 10.2, but still got 'Could not initialize GL extensions'. (I actually doubt whether the authors compiled successfully with CUDA 9.2: I tried it before, but some libraries apparently do not match, and you get the error 'undefined symbol: cudaSetupArgument'.) I really do not know how to compile the authors' SoftGym successfully; I have already spent at least 30 hours on it without any result. Could you suggest some alternative ideas? Thanks!

@yufeiwang63
Collaborator

Hi LiDuanAtGlasgow,
I'm sorry the compilation has caused you so much trouble. We also spent a lot of time getting the system to run correctly in the early stages of developing this project.
We were indeed able to compile the project with NVIDIA driver 440.33.01 and CUDA 9.1 or 9.2; see the screenshot below.
[screenshot]

Also, can you check your $LD_LIBRARY_PATH to make sure it looks similar to mine?
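When comparing `$LD_LIBRARY_PATH` against a working setup, it helps to list each entry on its own line and flag entries that are not existing directories, since stale or mistyped paths are easy to miss in one long colon-separated string. A minimal sketch; `audit_search_path` is a hypothetical helper.

```python
import os
from typing import List, Optional, Tuple


def audit_search_path(value: Optional[str] = None) -> List[Tuple[str, bool]]:
    """Split a colon-separated path list into (entry, is_existing_directory) pairs.

    Empty entries (from '::' or a trailing ':') are kept as '' so they stay visible.
    """
    raw = os.environ.get("LD_LIBRARY_PATH", "") if value is None else value
    if not raw:
        return []
    return [(entry, bool(entry) and os.path.isdir(entry)) for entry in raw.split(":")]


if __name__ == "__main__":
    for entry, ok in audit_search_path():
        print(f"{'ok     ' if ok else 'MISSING'} {entry or '<empty entry>'}")
```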

@DanielTakeshi

DanielTakeshi commented Jan 6, 2021

Hi @yufeiwang63 and @Xingyu-Lin,
I am also running into the same problem as @LiDuanAtGlasgow. I am using Ubuntu 18.04 and the provided Docker image.

I can produce a detailed issue report, but before doing that, I would like to know the workflow you two use to run SoftGym. To be clear: did you need to follow the instructions in the fork linked above? Is this the workflow you generally follow? And when you run your Python commands, are you using a normal command-line shell or are you inside the Docker environment?

@FranBesq

FranBesq commented Jan 6, 2021

As far as I understand, the purpose of the container is to compile PyFlex. I followed steps similar to the PyFlex docker.md when creating the fork and got it to work that way. Again, I obviously can't speak on behalf of the authors, but I think it is worth a try.

@Xingyu-Lin
Owner

We generally do not use Docker on our local desktops; we only use it to launch experiments on computing clusters. On our local desktops, we follow the instructions here: https://github.com/Xingyu-Lin/softgym/blob/master/README.md. The purpose of the Docker image is to make compilation easier for more people. What @FranBesq said is correct: Docker is only used for compiling FleX and PyFlex. Once compilation is done, SoftGym can be run in a normal Python environment.
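Since the compiled bindings are then used from a normal Python environment, a quick first check is whether that interpreter can locate the compiled module at all, before any GL or EGL context is created. This sketch assumes the module is importable as `pyflex` (as the `PYTHONPATH` export pointing at `PyFlex/bindings/build` suggests); `locate_module` is a hypothetical helper that only finds the module and does not execute it.

```python
import importlib.util
import sys
from typing import Optional


def locate_module(name: str = "pyflex") -> Optional[str]:
    """Return where a top-level module would be loaded from, or None if it cannot be found.

    importlib.util.find_spec searches sys.path without executing the module,
    so no rendering initialization happens here.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    origin = locate_module()
    if origin is None:
        print("pyflex not found; sys.path is:", sys.path)
    else:
        print("pyflex would be loaded from", origin)
```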


@DanielTakeshi

Hi @Xingyu-Lin @FranBesq, here is my more detailed minimal working example (in a separate issue report):
#9

@rehaanahmad2013

Hey @Xingyu-Lin, what do you have in your /usr/lib/nvidia-440 folder? I do not have a folder like that in /usr/lib, and I suspect that could be my issue.

@Xingyu-Lin
Owner

Hi @rehaanahmad2013, here is my ls result:

alternate-install-present
alt_ld.so.conf
bin
ld.so.conf
libEGL_nvidia.so.0
libEGL_nvidia.so.440.64.00
libEGL.so
libEGL.so.1
libEGL.so.1.1.0
libEGL.so.440.64.00
libGLdispatch.so.0
libGLESv1_CM_nvidia.so.1
libGLESv1_CM_nvidia.so.440.64.00
libGLESv1_CM.so
libGLESv1_CM.so.1
libGLESv1_CM.so.1.2.0
libGLESv2_nvidia.so.2
libGLESv2_nvidia.so.440.64.00
libGLESv2.so
libGLESv2.so.2
libGLESv2.so.2.1.0
libGL.so
libGL.so.1
libGL.so.1.7.0
libGLX_indirect.so.0
libGLX_nvidia.so.0
libGLX_nvidia.so.440.64.00
libGLX.so
libGLX.so.0
libnvcuvid.so
libnvcuvid.so.1
libnvcuvid.so.440.64.00
libnvidia-allocator.so
libnvidia-allocator.so.1
libnvidia-allocator.so.440.64.00
libnvidia-cbl.so.440.64.00
libnvidia-cfg.so
libnvidia-cfg.so.1
libnvidia-cfg.so.440.64.00
libnvidia-compiler.so
libnvidia-compiler.so.1
libnvidia-compiler.so.440.64.00
libnvidia-eglcore.so.440.64.00
libnvidia-egl-wayland.so.1
libnvidia-egl-wayland.so.1.1.4
libnvidia-encode.so
libnvidia-encode.so.1
libnvidia-encode.so.440.64.00
libnvidia-fatbinaryloader.so.440.64.00
libnvidia-fbc.so
libnvidia-fbc.so.1
libnvidia-fbc.so.440.64.00
libnvidia-glcore.so.440.64.00
libnvidia-glsi.so.440.64.00
libnvidia-glvkspirv.so.440.64.00
libnvidia-ifr.so
libnvidia-ifr.so.1
libnvidia-ifr.so.440.64.00
libnvidia-ml.so
libnvidia-ml.so.1
libnvidia-ml.so.440.64.00
libnvidia-opticalflow.so
libnvidia-opticalflow.so.1
libnvidia-opticalflow.so.440.64.00
libnvidia-ptxjitcompiler.so
libnvidia-ptxjitcompiler.so.1
libnvidia-ptxjitcompiler.so.440.64.00
libnvidia-rtcore.so.440.64.00
libnvidia-tls.so.440.64.00
libnvoptix.so.1
libnvoptix.so.440.64.00
libOpenGL.so
libOpenGL.so.0
tls
vdpau
xorg
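Rather than comparing a whole directory listing by eye, one can check for the handful of NVIDIA vendor libraries involved in EGL/GLX rendering. The patterns below are taken from the listing above; treating them as the required set is an assumption, and on many systems these files live in a directory such as /usr/lib/x86_64-linux-gnu rather than /usr/lib/nvidia-440. `missing_vendor_libs` is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable, List

# Patterns taken from the listing above; treating them as "required" is an assumption.
VENDOR_PATTERNS = (
    "libEGL_nvidia.so*",
    "libGLX_nvidia.so*",
    "libGLESv2_nvidia.so*",
    "libnvidia-eglcore.so*",
    "libnvidia-glcore.so*",
)


def missing_vendor_libs(directory: str, patterns: Iterable[str] = VENDOR_PATTERNS) -> List[str]:
    """Return the patterns that match no file in `directory` (all of them if it does not exist)."""
    root = Path(directory)
    if not root.is_dir():
        return list(patterns)
    return [p for p in patterns if not any(root.glob(p))]


if __name__ == "__main__":
    for candidate in ("/usr/lib/nvidia-440", "/usr/lib/x86_64-linux-gnu"):
        print(candidate, "missing:", missing_vendor_libs(candidate) or "none")
```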

@ShiguangSun

Hello, I have the same problem on an Ubuntu 16.04 server. My CUDA version is 9.2 and my NVIDIA driver version is 460.73.01. When I ran random_env.py with headless 0, it showed:
Could not initialize GL extensions
Reshaping
Segmentation fault (core dumped)
and with headless 1, it showed:
eglGetDisplay() failed
eglInitialize() failed
eglChooseConfig() failed
eglCreateContext() failed
eglCreatePbufferSurface() failed
eglMakeCurrent() failed
eglQueyContext(EGL_RENDER_BUFFER) failed
Could not initialize GL extensions
Segmentation fault (core dumped)
I tried the methods above, but they did not work.
Could the NVIDIA driver version be the cause?

@Xingyu-Lin
Owner

If you are on an Ubuntu server, it's very likely that you don't have a display environment. Does it work with headless set to 1?

@ShiguangSun

No, I tried both; it doesn't work with either 1 or 0.

@Xingyu-Lin
Owner

The driver version does make a difference. We got it working with NVIDIA driver 440.33.01 and CUDA 9.1 or 9.2, although others have also got it working with other driver versions.
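To compare a machine against the combination reported to work here (driver 440.33.01), for example across several cluster nodes, the driver version can be read programmatically. The sketch below runs `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and parses the dotted version; `parse_version` and `driver_version` are hypothetical helpers, and comparing only the major number is a simplification.

```python
import shutil
import subprocess
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '440.33.01' into (440, 33, 1), using the first non-empty line (one line per GPU)."""
    first = next((line.strip() for line in text.splitlines() if line.strip()), "")
    if not first:
        raise ValueError("no version string found")
    return tuple(int(part) for part in first.split("."))


def driver_version() -> Optional[Tuple[int, ...]]:
    """Query the NVIDIA driver version; None if nvidia-smi is missing, fails, or prints something unexpected."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    try:
        return parse_version(result.stdout)
    except ValueError:
        return None


if __name__ == "__main__":
    reported = parse_version("440.33.01")  # driver reported to work in this thread
    current = driver_version()
    if current is None:
        print("could not determine the NVIDIA driver version")
    else:
        print("driver", ".".join(map(str, current)), "| same major as reported:", current[0] == reported[0])
```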

@ShiguangSun

OK, thanks, I'll try.

@ShiguangSun

Hi, when I ran `. ./compile_1.0.sh`, there were some warnings:
/softgym/PyFlex/bindings/opengl/shadersGL.cpp:3386:25: warning: invalid conversion from ‘EGLConfig {aka void*}’ to ‘void**’ [-fpermissive]
g_eglConfig = configs[0]
/PyFlex/bindings/opengl/shadersGL.cpp:3390:33: warning : invalid conversion from ‘EGLContext {aka void*}’ to ‘void**’ [-fpermissive]
g_eglContext = eglCreateContext(
^
/PyFlex/bindings/opengl/shadersGL.cpp:3398:40: warning : invalid conversion from ‘EGLSurface {aka void*}’ to ‘void**’ [-fpermissive]
g_eglSurface = eglCreatePbufferSurface(g_eglDisplay, g_eglConfig,
Is this the reason I couldn't run SoftGym?

@ShiguangSun

> (Quoting @DhananjayAshok's earlier comment about the eglInitialize() failure under Xvfb and headless mode.)

Hi, have you solved this problem?

@rehaanahmad2013

rehaanahmad2013 commented Aug 31, 2021

> (Quoting @DhananjayAshok's earlier comment about the eglInitialize() failure under Xvfb and headless mode.)

@DhananjayAshok, have you been able to solve this problem? Others have had the segmentation fault, but I'm also seeing exactly the same "eglInitialize()..." error output. I also used a non-Docker approach for admin reasons.

@TriBall3

Have you solved the problem yet?

@TriBall3

> I have encountered the same problem. BTW I am using a docker.

Have you solved the problem yet?

@zcswdt

zcswdt commented Jun 14, 2023

> What is in the lib/nvidia-440 folder? I don't have such a folder in /usr/lib, and I suspect this might be my problem.

Hello, I also encountered this issue. I ran `echo $LD_LIBRARY_PATH`, and the output is as follows:
[screenshot]

@zcswdt

zcswdt commented Jun 19, 2023

> /compile_1.0.sh, there were some warnings

> (Quoting @ShiguangSun's comments above about the same errors with headless 0 and headless 1 on an Ubuntu 16.04 server with CUDA 9.2 and NVIDIA driver 460.73.01.)

Hello, I have encountered the same problem as you. Have you resolved it?

@bilkitty

bilkitty commented Feb 8, 2024

Late to the party, but if anyone is stuck on this, you can run the examples in headless mode, e.g.:
python examples/random_env.py --headless 1 --env_name PassWater

This worked in the prebuilt Docker image, which I set up on Ubuntu 20.04. The build script uses CUDA 9.2.
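The headless invocation above can also be scripted, for example to try several environments in a batch job on a node without a display. This is a sketch: the flags `--headless` and `--env_name` come from the command above, while `headless_command`, `run_all`, and the choice of environment names are illustrative assumptions. It assumes it is started from the softgym root with the environment variables already set.

```python
import subprocess
import sys
from typing import Dict, List, Sequence


def headless_command(env_name: str, script: str = "examples/random_env.py") -> List[str]:
    """Build the argv for one headless run, mirroring the command above."""
    return [sys.executable, script, "--headless", "1", "--env_name", env_name]


def run_all(env_names: Sequence[str]) -> Dict[str, int]:
    """Run each environment in its own process so that a crash in one does not stop the others."""
    codes: Dict[str, int] = {}
    for name in env_names:
        completed = subprocess.run(headless_command(name), check=False)
        # On POSIX, a negative return code -N means the child was killed by signal N (-11 is SIGSEGV).
        codes[name] = completed.returncode
    return codes
```

Running each environment in a separate process is the main design choice here: a segmentation fault like the one in this thread only ends that child, and it shows up as a return code of -11 instead of taking down the whole batch, e.g. with `run_all(["PassWater", "PourWaterAmount"])`.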

@karinoon

> (Quoting @ShiguangSun's comment above about the -fpermissive warnings from compile_1.0.sh.)

Hi, have you solved this problem?
