Fix segfault with protobuf, gym (atari), cv2, and ScenarIO #316

Merged: 1 commit merged into devel from fix/segfault_gym_atari on Mar 31, 2021


diegoferigo
Collaborator

Similar to the infamous problems we have with tensorflow / protobuf, a system with torch can also segfault 😰

[libprotobuf FATAL google/protobuf/stubs/common.cc:87] This program was compiled against version 3.5.1 of the Protocol Buffer runtime library, which is not compatible with the installed version (3.15.6). Contact the program author for an update. If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library. (Version verification failed in "../modules/dnn/misc/caffe/opencv-caffe.pb.cc".)
terminate called after throwing an instance of 'google::protobuf::FatalException'
what(): This program was compiled against version 3.5.1 of the Protocol Buffer runtime library, which is not compatible with the installed version (3.15.6). Contact the program author for an update. If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library. (Version verification failed in "../modules/dnn/misc/caffe/opencv-caffe.pb.cc".)

stack trace
Current thread 0x00007f7921033740 (most recent call first):
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1101 in create_module
  File "<frozen importlib._bootstrap>", line 556 in module_from_spec
  File "<frozen importlib._bootstrap>", line 657 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 975 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 991 in _find_and_load
  File "/conda/lib/python3.8/site-packages/gym/wrappers/atari_preprocessing.py", line 7 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 783 in exec_module
  File "<frozen importlib._bootstrap>", line 671 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 975 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 991 in _find_and_load
  File "/conda/lib/python3.8/site-packages/gym/wrappers/__init__.py", line 5 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 783 in exec_module
  File "<frozen importlib._bootstrap>", line 671 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 975 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 991 in _find_and_load
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1042 in _handle_fromlist
  File "/conda/lib/python3.8/site-packages/gym/__init__.py", line 14 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 783 in exec_module
  File "<frozen importlib._bootstrap>", line 671 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 975 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 991 in _find_and_load
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 961 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 991 in _find_and_load
  File "/home/dferigo/git/gym-ignition/python/gym_ignition/utils/typing.py", line 5 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 783 in exec_module
  File "<frozen importlib._bootstrap>", line 671 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 975 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 991 in _find_and_load
  File "/home/dferigo/git/gym-ignition/python/gym_ignition/utils/__init__.py", line 7 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 783 in exec_module
  File "<frozen importlib._bootstrap>", line 671 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 975 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 991 in _find_and_load
  File "/home/dferigo/git/gym-ignition/python/gym_ignition/__init__.py", line 12 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 783 in exec_module
  File "<frozen importlib._bootstrap>", line 671 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 975 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 991 in _find_and_load
  File "/home/dferigo/git/gym-ignition/examples/panda_pick_and_place.py", line 5 in <module>

This time, the culprit is gym/wrappers/atari_preprocessing.py. I found out that, as done for tensorflow, pre-importing gym fixes the problem.
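The workaround described above amounts to importing gym before the ScenarIO bindings are loaded. A minimal sketch of such a guard follows, assuming the pre-import should be skipped silently when gym is not installed; `preimport_if_available` is a hypothetical helper, not necessarily how the merged commit implements the workaround.

```python
import importlib
import importlib.util
from types import ModuleType
from typing import Optional


def preimport_if_available(name: str) -> Optional[ModuleType]:
    """Import a top-level module early if it is installed, otherwise do nothing.

    Hypothetical helper: `name` must be a top-level module such as "gym".
    """
    if importlib.util.find_spec(name) is None:
        # Not installed: skip silently instead of raising ImportError.
        return None
    return importlib.import_module(name)


# Hypothetical usage, placed before importing the ScenarIO bindings:
#
#     preimport_if_available("gym")
#     import scenario
```

Using `find_spec` first keeps gym an optional dependency: users without gym installed never see an ImportError caused by the workaround.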

@traversaro
Contributor

Just to understand, which pytorch binaries are you using?

@diegoferigo
Collaborator Author

diegoferigo commented Mar 31, 2021

❯ mamba list | grep torch
pytorch                   1.8.0           cuda110py38h65e529b_0    conda-forge
pytorch-gpu               1.8.0           cuda110py38h5b0ac8e_0    conda-forge
pytorch-lightning         1.2.5              pyhd8ed1ab_0    conda-forge
torchmetrics              0.2.0              pyhd8ed1ab_0    conda-forge
❯ mamba list | grep gym
gym                       0.18.0           py38h81c977d_0    conda-forge
gym-atari                 0.18.0           py38h578d9bd_0    conda-forge
gym-ignition              1.1.1.dev7+dirty           dev_0    <develop>
gym-ignition-models       1.0                      pypi_0    pypi
❯ mamba list | grep protobuf
libprotobuf               3.15.6               h780b84a_0    conda-forge
protobuf                  3.15.6           py38h709712a_0    conda-forge

diegoferigo merged commit d3e6003 into devel on Mar 31, 2021
diegoferigo deleted the fix/segfault_gym_atari branch on March 31, 2021 16:51
@traversaro
Contributor

> Version verification failed in "../modules/dnn/misc/caffe/opencv-caffe.pb.cc".

I am not sure what is going on, but this seems to come from OpenCV, in particular from https://github.com/opencv/opencv/tree/master/modules/dnn/misc/caffe . It looks like a C++ file generated with an old version of protobuf and then committed to the OpenCV repo. Do you know who could bring in opencv?
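The failing check can be illustrated with a small sketch that mirrors the comparison in Python. It assumes the scheme used by protobuf 3.x headers, where `GOOGLE_PROTOBUF_VERSION` is encoded as `major * 10^6 + minor * 10^3 + patch`, and it assumes the 3.15 runtime only accepts generated code from its own minor release or later. The function names are hypothetical; they are not part of protobuf's API.

```python
def encode_version(version: str) -> int:
    """Encode "major.minor.patch" as major * 10^6 + minor * 10^3 + patch."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major * 1_000_000 + minor * 1_000 + patch


def min_header_version_for_library(library_version: str) -> int:
    """Assumption: the runtime accepts generated code from its own major.minor.0 onwards."""
    major, minor, _ = (int(part) for part in library_version.split("."))
    return major * 1_000_000 + minor * 1_000


def headers_compatible(header_version: str, library_version: str) -> bool:
    """Return False when generated code (headers) is too old for the installed runtime."""
    return encode_version(header_version) >= min_header_version_for_library(library_version)


if __name__ == "__main__":
    # Versions from the log above: opencv-caffe.pb.cc was generated against 3.5.1,
    # while the installed runtime is 3.15.6.
    print(encode_version("3.5.1"), encode_version("3.15.6"))
    print(headers_compatible("3.5.1", "3.15.6"))
```

With these inputs the sketch prints `3005001 3015006` and then `False`, matching the direction of the fatal error: the generated file is older than what the installed runtime accepts, so regenerating it against the installed protobuf avoids the check failing.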

@traversaro
Contributor

I searched for the error message in https://www.google.com/search?channel=fs&client=ubuntu&q=Version+verification+failed+in+%22..%2Fmodules%2Fdnn%2Fmisc%2Fcaffe%2Fopencv-caffe.pb.cc%22. , and from one of the search results ( https://patchwork.ozlabs.org/project/buildroot/patch/20200113195757.905861-1-fontaine.fabrice@gmail.com/ ) it seems that the PROTOBUF_UPDATE_FILES OpenCV option should be enabled in https://github.com/conda-forge/opencv-feedstock . Are you able to reproduce this issue with a small snippet of Python code? Thanks!

@traversaro
Contributor

traversaro commented Mar 31, 2021

For example, I imagine that:

import gym_ignition
import cv2

or

import cv2
import gym_ignition

could reproduce the issue?
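Both import orders can be checked without crashing the current interpreter by running each in a fresh process. A sketch follows; `import_order_exit_code` is a hypothetical helper built on the standard `subprocess` module. A fresh process is needed both because an uncaught C++ exception aborts the whole interpreter and because modules that are already imported are cached in `sys.modules`.

```python
import subprocess
import sys
from typing import Sequence


def import_order_exit_code(modules: Sequence[str]) -> int:
    """Import the given modules, in order, in a fresh interpreter and return its exit code."""
    code = "; ".join(f"import {name}" for name in modules)
    result = subprocess.run([sys.executable, "-c", code], capture_output=True)
    # 0: all imports succeeded; 1: a Python exception such as ImportError;
    # negative (POSIX only): the child was killed by a signal, e.g. -6 for SIGABRT.
    return result.returncode


if __name__ == "__main__":
    for order in (["gym_ignition", "cv2"], ["cv2", "gym_ignition"]):
        print(order, import_order_exit_code(order))
```

A `terminate called after throwing ...` message like the one above normally ends in `abort()`, so on Linux a failing order would be expected to show up as a negative exit code rather than `1`.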

@traversaro
Contributor

> Do you know who could bring in opencv?

Here: https://github.com/openai/gym/blob/0.18.0/gym/wrappers/atari_preprocessing.py#L7 .
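When it is not obvious which dependency pulls in a module like cv2, one option is a meta-path finder that records where that module is first imported from. The sketch below is a hypothetical debugging helper (`ImportTracer` is not part of gym, OpenCV, or ScenarIO) built on the standard `importlib.abc.MetaPathFinder` interface. With gym 0.18.0 and gym-atari installed, it would be expected to report the `atari_preprocessing.py` line linked above, but that result is an assumption based on the stack trace.

```python
import importlib.abc
import sys
import traceback
from typing import List


class ImportTracer(importlib.abc.MetaPathFinder):
    """Record which file:line triggers the import of a given top-level module."""

    def __init__(self, target: str) -> None:
        self.target = target
        self.importers: List[str] = []

    def find_spec(self, fullname, path, target=None):
        if fullname == self.target:
            # Drop this frame and importlib's frozen bootstrap frames; the last
            # remaining frame is the code that executed the import.
            frames = [
                frame for frame in traceback.extract_stack()[:-1]
                if not frame.filename.startswith("<frozen")
            ]
            if frames:
                caller = frames[-1]
                self.importers.append(f"{caller.filename}:{caller.lineno}")
        # Returning None lets the regular finders perform the actual import.
        return None


if __name__ == "__main__":
    tracer = ImportTracer("cv2")
    sys.meta_path.insert(0, tracer)
    try:
        import gym  # noqa: F401  (assumes gym with the Atari extras is installed)
    except ImportError:
        pass
    print(tracer.importers)
```

The tracer must be installed before the first import of the target module, since later imports are served from `sys.modules` and never reach the finder.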

@diegoferigo
Collaborator Author

Yes, I confirm it's due to cv2, not pytorch. Reproducing it is quite easy, as you already described:

import scenario
import cv2

Naively, I hoped that switching to conda would have solved all these problems, but they keep appearing. Luckily, now when a segfault occurs, we know who to blame first :D

diegoferigo changed the title from "Fix segfault with protobuf, torch, gym (atari), and ScenarIO" to "Fix segfault with protobuf, gym (atari), cv2, and ScenarIO" on Mar 31, 2021
@traversaro
Contributor

> Naively, I hoped that switching to conda would have solved all these problems, but they keep appearing.

The main difference is that with pip, or when mixing apt and pip, this kind of error is somewhat intrinsic to how C++ dependencies are typically handled, while in conda-forge these should (hopefully) all be fixable problems. See conda-forge/opencv-feedstock#269 for the PR that should solve this problem.

@diegoferigo
Collaborator Author

I totally agree. I'm not yet that familiar with the conda infrastructure, but the advantages over PyPI are evident. Beyond these segfaults, effortless multiplatform binary distribution is just gold ❤️ If only the tensorflow situation could get better... xD

Thanks a lot for fixing the problem upstream in conda-forge! I'll keep the workaround in any case, since the problem could still occur in a PyPI-based setup.

2 participants