segfault or free(): invalid pointer when importing dgl with other libraries due to RTLD_GLOBAL #2255

skrsna · 2020-10-01T17:52:04Z

🐛 Bug

Importing dgl after importing a C++-based library with a pybind interface leads to a segfault or free(): invalid pointer. The C++ library in question is internal and not publicly available. I found some relevant issues in other repos: pytorch/pytorch#3059 and RobotLocomotion/drake#12073. I was able to work around the problem by deleting ctypes.RTLD_GLOBAL here in the dgl source code. PyTorch and TensorFlow seem to have moved away from RTLD_GLOBAL (ref: pytorch/pytorch#28536). Just wondering if something similar can be done in dgl.
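For readers unfamiliar with the flag, the difference between the two load modes can be sketched with plain ctypes. This is a minimal illustration and not DGL's actual loader: select_mode, open_shared_library, and the expose_symbols parameter are hypothetical names; the only real API relied on is ctypes.CDLL(name, mode=...).

```python
import ctypes


def select_mode(expose_symbols: bool) -> int:
    """Pick the dlopen visibility flag for a shared library (hypothetical helper)."""
    # RTLD_GLOBAL makes the library's symbols available when resolving symbols
    # of libraries loaded afterwards, so same-named symbols (for example from
    # an allocator) in two extensions can end up bound to each other.
    # RTLD_LOCAL keeps the symbols private to the returned handle.
    return ctypes.RTLD_GLOBAL if expose_symbols else ctypes.RTLD_LOCAL


def open_shared_library(path: str, expose_symbols: bool = False) -> ctypes.CDLL:
    """Load a shared library with the chosen symbol visibility."""
    return ctypes.CDLL(path, mode=select_mode(expose_symbols))
```

The workaround described above corresponds to calling the loader with expose_symbols=False, under the assumption that nothing else depends on the library's symbols being globally visible.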

To Reproduce

Sorry, the library that triggers this error is not publicly available. It uses the TBB allocator.
Steps to reproduce the behavior:

Expected behavior

import dgl should succeed without a segfault or a "free(): invalid pointer" abort.

Environment

DGL Version (e.g., 1.0): 0.5.2 cpu
Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): pytorch 1.7 nightly
OS (e.g., Linux): linux
How you installed DGL (conda, pip, source): conda
Build command you used (if compiling from source):
Python version: 3.8
CUDA/cuDNN version (if applicable): 10.2, 7.6
GPU models and configuration (e.g. V100):
Any other relevant information:

Additional context


VoVAllen · 2020-10-12T06:51:56Z

bump this

BarclayII · 2020-10-30T14:30:14Z

Does it immediately crash after importing DGL after importing the said library?

skrsna · 2020-10-30T14:35:52Z

Hi @BarclayII,

If I import dgl and then import the private library, it doesn't crash right away, but it crashes as soon as any dgl function or any of the library's functions is called. If I import the library first and then dgl, it crashes right away.

dgasmith · 2020-12-23T18:03:36Z

This issue was referenced in #2328, but then the line was crossed out. I didn't see an immediate reason why in that issue.

If this is a longer-term item, could we introduce an env variable to dynamically change the CDLL load mode in specific circumstances, perhaps DGL_RTLD_SETTING?
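A minimal sketch of what such a switch could look like. Everything here is an assumption for illustration: the accepted values "global" and "local", the default of "global" (matching the behavior discussed in this thread), and the helper name rtld_mode_from_env are hypothetical and not part of DGL.

```python
import ctypes
import os


def rtld_mode_from_env(environ=None, var="DGL_RTLD_SETTING"):
    """Map a (hypothetical) environment variable to a ctypes load mode."""
    environ = os.environ if environ is None else environ
    # Assumed default: keep RTLD_GLOBAL unless the user opts out.
    setting = environ.get(var, "global").strip().lower()
    modes = {"global": ctypes.RTLD_GLOBAL, "local": ctypes.RTLD_LOCAL}
    if setting not in modes:
        raise ValueError(f"{var} must be 'global' or 'local', got {setting!r}")
    return modes[setting]


# A loader could then do something like:
#     lib = ctypes.CDLL(lib_path, mode=rtld_mode_from_env())
```

Passing the environment as an argument keeps the helper easy to test without mutating os.environ.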

BarclayII · 2020-12-24T03:22:28Z

As I mentioned in the crossed-out text, directly changing RTLD_GLOBAL will make some examples (namely examples/pytorch/graphsage/train_sampling.py with num_workers=0) freeze.

I haven't been able to figure out the reason yet, so I had to work around it by ensuring that the PyTorch/MXNet/TensorFlow C library is loaded before libdgl.so. Obviously that is not a fix for this issue per se.
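From the user side, a similar ordering can be approximated by importing the backend before dgl. A minimal sketch: import_in_order is a hypothetical helper, and whether a given ordering avoids the crash in a particular environment is an assumption that would need testing.

```python
import importlib


def import_in_order(module_names):
    """Import modules one by one in the given order and return them by name."""
    loaded = {}
    for name in module_names:
        # import_module returns the cached module if it was imported earlier,
        # so the ordering only has an effect on first import in the process.
        loaded[name] = importlib.import_module(name)
    return loaded


# Intended use at the very top of a script, before anything else pulls in dgl:
#     modules = import_in_order(["torch", "dgl"])
```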

dgasmith · 2020-12-28T19:19:00Z

Ah, got it. In the meantime, would you take a PR that allows us to alter this setting via an env variable?

BarclayII · 2021-01-06T10:23:23Z

@dgasmith @skrsna I removed the flag in a recent PR. Please give the nightly builds a try. I tested the GraphSAGE examples and they currently run without any problems.

dgasmith · 2021-01-06T15:16:42Z

@BarclayII Thanks! I really appreciate that, we will evaluate the nightly builds ASAP.

BarclayII · 2021-03-03T11:00:17Z

So far no issues as per our experience. Please reopen the issue if the problem still exists in your case.

jermainewang self-assigned this Nov 3, 2020

BarclayII mentioned this issue Nov 9, 2020: [Performance] Use allocator from PyTorch if possible #2328 (Merged)

jermainewang added the help wanted label Dec 28, 2020

VoVAllen mentioned this issue Jan 1, 2021: Interacts with aws-data-wrangler to cause crashes #2477 (Closed)

BarclayII closed this as completed Mar 3, 2021

This was referenced Oct 25, 2021:

[Torch, CI] Upgrade to PyTorch 1.10 apache/tvm#9349 (Closed)

[Bug] PyTorch and TVM loading problem due to conflicting LLVM symbols apache/tvm#9362 (Closed)
