Closing a Jep instance/sub-interpreter breaks some numpy methods #28

cristipp · 2015-09-02T01:40:52Z

I'm using Jep 3.4.1 on Mac in a Scala environment, with scipy. If I create a second Jep instance after closing the first one and import scipy again, I get an error on the second import. I'm using Scala, but the Java code should be very similar:

    val jep1 = new Jep(false, ".semaphore-cache/jep/lib/python2.7/site-packages")
    try {
      jep1.eval("import scipy.optimize as opt")
    } finally {
      jep1.close()
    }
    val jep2 = new Jep(false, ".semaphore-cache/jep/lib/python2.7/site-packages")
    try {
      // This breaks.
      jep2.eval("import scipy.optimize as opt")
    } finally {
      jep2.close()
    }

The Python stack trace is:

Caused by: jep.JepException: <type 'exceptions.TypeError'>: 'NoneType' object is not callable
    at /usr/local/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/_methods._any(_methods.py:38)
    at /usr/local/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/fromnumeric.any(fromnumeric.py:1850)
    at /usr/local/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/machar._do_init(machar.py:127)
    at /usr/local/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/machar.__init__(machar.py:111)
    at /usr/local/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/getlimits._init(getlimits.py:151)
    at /usr/local/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/getlimits.__new__(getlimits.py:121)
    at /usr/local/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/optimize.<module>(optimize.py:142)
    at /usr/local/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/__init__.<module>(__init__.py:175)

As a workaround, I'm just going to reuse a singleton jep instance.

ndjensen · 2015-09-02T02:45:52Z

I'm glad to see the stacktraces are working well. :D This is probably a scipy issue and not a Jep issue. The project I work on uses scipy, though not extensively (it uses numpy much more). Come to think of it, we use thread pools and never close our Jep instances until JVM shutdown, so my work project probably dodges this.

This sounds quite similar to a problem I found with numpy: numpy/numpy#3961

Jep can't really do much about this kind of thing. If the CPython code is messing up internal function pointers, either at Python sub-interpreter shutdown or at the start of another sub-interpreter, that's in the specific extension's code, not Jep's. I documented the one I found in numpy with a bug report. Most open source projects are open to contributions of fixes, but there's not a lot of experience with embedded sub-interpreters, and most of the time it would take a significant amount of code changes to correct. (I documented it so it's a known issue, but I am not confident enough in numpy to contribute a fix.)

I should update the Jep docs to mention these kinds of things and how to work around them. Numpy has an "embedded" label, and this appears to be more a numpy problem than a scipy problem, so my recommendation is that you try to figure out what wiped out that method (it appears to be um.logical_or.reduce, see https://github.com/numpy/numpy/blob/master/numpy/core/_methods.py#L38) and write a ticket for numpy. My hope is that Jep grows in popularity enough that Python communities start giving more attention to issues with running in embedded sub-interpreters, and that more developers grow in embedded Python expertise.

cristipp · 2015-09-02T03:13:18Z

Stacktraces are awesome :) Thanks for the hard work.

IMHO going with one interpreter per thread, likely using some form of ThreadLocal, is the way to go. Updating C magic in Python libraries feels like quite an undertaking. I think this restriction warrants a mention in a visible place in the Jep docs.

behdad84 · 2015-09-03T23:27:07Z

Sharing an experience I had in a heavily multi-threaded environment using Jep in a Spark job:

A little background:
I was using cvxpy (written in Python) to solve a numerical approximation problem using ECOS (written in C) on small independent pieces of a large dataset. Using Spark, the dataset is partitioned (grouped by key) into small pieces, and every piece is a task to be solved by calling cvxpy and ECOS in a separate thread. Spark distributes the computation by instantiating a few JVMs on cluster worker nodes and forking a few threads in every JVM to perform the tasks. So in general a Spark job mixes multi-processing and multi-threading.

These are the steps I took to get around issues I observed:

1. To address Jep not allowing cross-thread computation (i.e., an instance of Jep created in one thread can only be used within that same thread), I created thread-local instances of Jep for every thread in every worker node JVM.
2. Despite making Jep instances thread-local as explained in step 1, I was seeing segmentation faults thrown from ECOS (due to a static variable definition somewhere in their code). This was against my expectation, as I thought that making independent instances of Jep per thread would protect the variables and namespaces of a script, but obviously this did not seem to be the case. I got around this problem by making synchronized calls to jep.invoke in my Java code (where the call to cvxpy and ECOS takes place).

The problem was definitely a static variable declaration in ECOS C code. I've reported the issue to both cvxpy and ECOS and I think they are going to solve it in next releases.

embotech/ecos#127
https://github.com/cvxgrp/cvxpy/issues/220

ndjensen · 2015-09-04T01:42:15Z

Thanks for sharing your insight. That's great that ecos and cvxpy are looking into solving the issues! I'll make sure to include these workarounds in the docs.

bsteffensmeier · 2015-10-21T21:24:54Z

I've traced it through the cpython and numpy code and I think I understand the problem.

When the Jep interpreter is closed we call Py_EndInterpreter, which calls PyImport_Cleanup, which calls _PyModule_Clear, which finally ends up setting all the module items to None.

The root of the problem is in the numpy C code, where it imports the _methods module into C and saves methods to static variables. Since the variable um is part of the module, it is set to None as part of _PyModule_Clear. The 'any' function itself is still held in the static variable, but when the function is called it looks up the value of the variable 'um', which is now None and has no logical_or attribute, causing the exception.
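The mechanism described above can be reproduced without numpy or sub-interpreters. The following is a minimal pure-Python sketch, not numpy's actual code: `fake_methods` is a hypothetical stand-in for numpy's _methods module, `operator.or_` stands in for um.logical_or.reduce, `cached_any` plays the role of the static C variable, and the loop that sets names to None only approximates what _PyModule_Clear does. It caches the reduce function in a module-level name (as newer numpy versions do), which yields the same TypeError reported in this ticket.

```python
import types

# Hypothetical stand-in for numpy/core/_methods.py (not the real source).
SOURCE = """
import operator as um
umr_any = um.or_          # cached at import time in a module-level name

def _any(a, b):
    return umr_any(a, b)  # the name is looked up in module globals per call
"""

module = types.ModuleType("fake_methods")
exec(SOURCE, module.__dict__)

# Plays the role of the static C variable that outlives the interpreter.
cached_any = module._any
result_before = cached_any(False, True)

# Rough approximation of _PyModule_Clear: module-level names become None.
for name in list(module.__dict__):
    if not name.startswith("__"):
        module.__dict__[name] = None

# The function object is still alive, but its globals have been wiped.
try:
    cached_any(False, True)
    error_after = None
except TypeError as exc:
    error_after = str(exc)

print(result_before)  # True
print(error_after)    # 'NoneType' object is not callable
```

The function object survives because something outside the module still references it, while the globals it depends on are gone.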

This is not something we can fix in Jep, since the problem is in a static variable created by numpy. A naive solution would be to modify _methods.py and move the import of the umath module into the any method, so that it does not need to reference any variables defined in the module. However, newer versions of numpy did the exact opposite, moving logical_or.reduce into a variable. It seems this was a performance enhancement, so if we undid that and moved the import into the method it would slow things down slightly. There is probably some other way to access or cache that method, but I'm not familiar enough with the numpy source to recommend a solution, and any further discussion should probably head over to the numpy page to get input from the devs there.
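The naive fix can be illustrated with a small pure-Python sketch. This is an assumption-laden toy, not a patch for numpy: `fake_methods_local_import` is a hypothetical module, `operator.or_` stands in for um.logical_or.reduce, and setting module names to None only approximates _PyModule_Clear. Because the import happens inside the function, the retained function no longer depends on any module-level name.

```python
import types

# Hypothetical stand-in for a patched _methods.py (not the real source).
SOURCE = """
def _any(a, b):
    import operator as um   # resolved on every call, not at module import
    return um.or_(a, b)
"""

module = types.ModuleType("fake_methods_local_import")
exec(SOURCE, module.__dict__)

# Plays the role of the static C variable holding the function.
cached_any = module._any

# Rough approximation of _PyModule_Clear: module-level names become None.
for name in list(module.__dict__):
    if not name.startswith("__"):
        module.__dict__[name] = None

# The call still works because nothing it needs lived in module globals.
result_after = cached_any(False, True)
print(result_after)  # True
```

The cost is an import lookup on every call, which is the slowdown mentioned above. A real interpreter shutdown tears down more state than this toy does, so this only illustrates the name-lookup part of the problem.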

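The minimum code required to replicate the issue from Java is:

    // First sub-interpreter: import numpy and call any(), then close.
    Jep jep = new Jep();
    jep.eval("import numpy");
    jep.eval("numpy.ndarray([1]).any()");
    jep.close();
    // Second sub-interpreter: the same calls now hit the error.
    jep = new Jep();
    jep.eval("import numpy");
    jep.eval("numpy.ndarray([1]).any()");
    jep.close();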

To demonstrate that this issue is not specific to Jep, but happens any time numpy is used from sub-interpreters, the following C code also shows the problem.

    #include <Python.h>

    int main( int argc, const char* argv[] ){
        PyThreadState *mainState, *subState;

        Py_Initialize();
        PyEval_InitThreads();
        mainState = PyThreadState_Get();
        PyEval_ReleaseThread(mainState);

        /* First sub-interpreter: import numpy and call any(). */
        subState = Py_NewInterpreter();
        PyRun_SimpleString("import numpy");
        PyRun_SimpleString("numpy.ndarray([1]).any()");
        Py_EndInterpreter(subState);

        /* Second sub-interpreter: the same calls show the problem. */
        subState = Py_NewInterpreter();
        PyRun_SimpleString("import numpy");
        PyRun_SimpleString("numpy.ndarray([1]).any()");
        Py_EndInterpreter(subState);

        PyEval_AcquireThread(mainState);
        Py_Finalize();
        return 0;
    }

ndjensen · 2015-10-22T17:08:10Z

On master branch I have added a new test with Ben's code to more easily illustrate the problem. https://github.com/mrj0/jep/blob/master/src/jep/test/numpy/TestNumpyAny.java

So it's not scipy or cvxpy, it's just numpy and how the references to variables are retained. I'm going to close #31 as a duplicate of this ticket, and then we'll have to work with the numpy developers to determine if there is an optimal way to overcome this without adversely affecting numpy.

wizbots · 2015-10-28T01:09:35Z

Hi. I also hit this problem, stack trace below. This occurred when shutting down Jep instances with the close command after multiple threads had finished their work...

jep.JepException: <type 'exceptions.TypeError'>: 'NoneType' object is not callable
at /lib/python2.7/site-packages/numpy/core/_methods._sum(_methods.py:32)
at /lib/python2.7/site-packages/scikit_learn-0.15.2-py2.7-macosx-10.11-intel.egg/sklearn/utils/validation._assert_all_finite(validation.py:40)
at /lib/python2.7/site-packages/scikit_learn-0.15.2-py2.7-macosx-10.11-intel.egg/sklearn/utils/validation._atleast2d_or_sparse(validation.py:138)
at /analytics/lib/python2.7/site-packages/scikit_learn-0.15.2-py2.7-macosx-10.11-intel.egg/sklearn/utils/validation.atleast2d_or_csr(validation.py:165)
at /analytics/lib/python2.7/site-packages/scikit_learn-0.15.2-py2.7-macosx-10.11-intel.egg/sklearn/linear_model/base.decision_function(base.py:191)
at /analytics/lib/python2.7/site-packages/scikit_learn-0.15.2-py2.7-macosx-10.11-intel.egg/sklearn/linear_model/base.predict(base.py:215)

ndjensen · 2016-08-30T04:01:58Z

This is alleviated by the new shared modules feature in Jep 3.6, which has a release candidate and will be released in the near future. It needs to be tested in more environments than the ones where Ben and I have tested it. See the 3.6 release notes: https://github.com/mrj0/jep/blob/dev_3.6/release_notes/3.6-notes.rst#python-shared-modules-beta

I will update the wiki after the official release and hopefully we will have some good feedback.

ndjensen · 2016-09-23T16:27:39Z

Since 3.6 is released with shared modules, I'm closing this ticket.

ndjensen added the extension label Sep 12, 2015

ndjensen mentioned this issue Oct 19, 2015

Jep throwing exception on import cvxpy after close #31

Closed

ndjensen changed the title ~~Multiple Jep instances break scipy initialization.~~ Closing a Jep instance/sub-interpreter breaks some numpy methods Oct 22, 2015

This was referenced Oct 22, 2015

numpy.set_string_function is unsafe when using multiple embedded sub interpreters numpy/numpy#3961

Closed

Multiple Jep instances can slow down throughput/concurrency waiting on CPython GIL #34

Closed

ndjensen closed this as completed Sep 23, 2016

SemanticBeeng mentioned this issue Apr 5, 2018

SIGSEGV using jep from specs2 in sbt - embedded both Python (jep) and R (rpy2) #126

Closed

pyNpy mentioned this issue May 11, 2022

jep with pytorch happens errors #399

Closed
