Importing mxnet causes a segfault on exit #1044

Closed
abylaw opened this issue Jul 16, 2020 · 7 comments

@abylaw

abylaw commented Jul 16, 2020

🐛 Bug Reports

🌍 Environment

  • Your operating system and version: Pop_OS 19.10
  • Your python version: 3.7.5
  • How did you install python (e.g. apt or pyenv)? Did you use a virtualenv?: Installed via apt. No virtualenv.
  • Your Rust version (rustc --version): 1.43.0
  • Your PyO3 version: 0.11.1
  • Have you tried using latest PyO3 master (replace version = "0.x.y" with git = "https://github.com/PyO3/pyo3")?: Yes

💥 Reproducing

MXnet version: 1.6.0, installed via pip: pip3 install mxnet

Minimal reproduction:

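use pyo3::prelude::*;

fn main() {
    // Acquire the GIL; this also initializes the embedded interpreter.
    let gil = Python::acquire_gil();
    let py = gil.python();

    // The import itself succeeds; the crash only shows up at process exit.
    let _mod = PyModule::import(py, "mxnet").unwrap();
}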

On exit:

$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running `target/debug/minimal-pyo3-segfault`
Segmentation fault (core dumped)

Debugging leads us to this line: https://github.com/apache/incubator-mxnet/blob/a0e67353fe81ed97fc7aef2d8429a93dc035a394/src/c_api/c_api.cc#L1318.

Running import mxnet directly in python (and exiting the interpreter) exits normally.
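
To compare the two cases side by side, one option is to launch each command as a child process and inspect how it ended. The following is only a sketch for Unix-like systems; the helper names `describe_exit` and `run_and_describe` are made up for illustration and are not part of PyO3 or mxnet.

```rust
use std::os::unix::process::ExitStatusExt;
use std::process::{Command, ExitStatus};

/// Summarise how a child process ended: by exit code or by signal (Unix only).
/// Hypothetical helper, for illustration.
pub fn describe_exit(status: ExitStatus) -> String {
    match (status.code(), status.signal()) {
        // A normal exit carries an exit code.
        (Some(code), _) => format!("exited with code {}", code),
        // A crash such as a segfault shows up as a terminating signal.
        (None, Some(sig)) => format!("killed by signal {}", sig),
        (None, None) => "ended in an unknown way".to_string(),
    }
}

/// Run `program` with `args`, wait for it, and describe how it exited.
/// Hypothetical helper, for illustration.
pub fn run_and_describe(program: &str, args: &[&str]) -> std::io::Result<String> {
    let status = Command::new(program).args(args).status()?;
    Ok(describe_exit(status))
}
```

For example, `run_and_describe("python3", &["-c", "import mxnet"])` could be compared with `run_and_describe("target/debug/minimal-pyo3-segfault", &[])`. On Linux a segmentation fault is reported as signal 11.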

Strangely, on Python 3.8 we get a corrupted double-linked list instead.

I realize that mxnet is absolutely huge, but thought you guys might be interested in knowing about a specific edge-case which causes problems for pyo3. Of course, if you guys are able to provide a bit of insight that'd be super helpful. :) Thank you!

@davidhewitt
Member

Thanks for reporting. I made a quick attempt to debug this and get the same crash. gdb and readelf suggest that the cause may be related to some corruption in mxnet.so (see issue mentioned just above).

I'll leave this open in case upstream resolving that issue does not fix the problem.

@kngwyu
Member

kngwyu commented Jul 18, 2020

So I understand it may be an upstream bug, but do we really need to call Py_FinalizeEx by default?
I confirmed that this segfault does not happen when libc::atexit(finalize) is commented out.
A Python C extension should not cause a SIGSEGV in Py_FinalizeEx, so this may well be a bug. But considering that there may be many other bugs of this kind in the community, I don't think we should call Py_FinalizeEx by default, and I'd like to make it optional if @m-ou-se is OK with that.

@m-ou-se
Contributor

m-ou-se commented Jul 18, 2020

Without Py_FinalizeEx in atexit, the standard output buffers will not be flushed on panics/exits, threads will not be joined properly, etc. See #943.
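
For readers unfamiliar with the mechanism, here is a rough sketch of registering an exit hook through the C library's `atexit`. It is not PyO3's actual code: `finalize` is a placeholder that only logs (going by the discussion above, the real hook is what ends up calling Py_FinalizeEx), `register_finalizer_once` is a hypothetical helper name, and `atexit` is declared by hand so the sketch does not depend on the libc crate. The `unsafe extern` form assumes a reasonably recent Rust toolchain.

```rust
use std::io::Write;
use std::os::raw::c_int;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Once;

unsafe extern "C" {
    // C standard library `atexit`, declared by hand (assumption: linking against the system libc).
    fn atexit(callback: extern "C" fn()) -> c_int;
}

static REGISTER: Once = Once::new();
static REGISTERED_OK: AtomicBool = AtomicBool::new(false);

// Placeholder for the real finalizer; it only writes a message to stderr.
// Write errors are ignored so the callback never panics across the C boundary.
extern "C" fn finalize() {
    let _ = writeln!(std::io::stderr(), "finalize: running at process exit");
}

/// Register `finalize` with `atexit` at most once.
/// Returns true if the registration succeeded.
pub fn register_finalizer_once() -> bool {
    REGISTER.call_once(|| {
        // `atexit` returns 0 on success.
        let rc = unsafe { atexit(finalize) };
        REGISTERED_OK.store(rc == 0, Ordering::SeqCst);
    });
    REGISTERED_OK.load(Ordering::SeqCst)
}
```

Hooks registered this way run when the process exits normally (return from `main` or a call to `exit`), which is why a crash inside a hook only shows up after the program's own code has finished.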

@m-ou-se
Contributor

m-ou-se commented Jul 18, 2020

Downgrading to a slightly older version of mxnet seems to work:

$ pip install mxnet==1.6.0b20200127

The libmxnet.so file in 1.6.0 is very broken. Not sure why the python3.8 interpreter itself doesn't crash, since it also calls Py_FinalizeEx.

@davidhewitt
Member

Yes, I agree that we're correct to be calling Py_FinalizeEx by default. If a separate extension is buggy then it's not our concern to work around it (unless it's a really major library like numpy).

@leezu

leezu commented Jul 22, 2020

This is probably fixed by apache/mxnet#18768

@davidhewitt
Member

@leezu I retried a from-source build with that patch merged, and can confirm I no longer see any issues with the above sample program. Many thanks!


5 participants