Closing a Jep instance/sub-interpreter breaks some numpy methods #28

Closed
cristipp opened this issue Sep 2, 2015 · 9 comments

@cristipp

cristipp commented Sep 2, 2015

I'm using Jep 3.4.1 on a Mac in a Scala environment, with scipy. If I create multiple Jep instances one after another and import scipy in each, I get an error the second time. I'm using Scala, but the Java code should be very similar:

    val jep1 = new Jep(false, ".semaphore-cache/jep/lib/python2.7/site-packages")
    try {
      jep1.eval("import scipy.optimize as opt")
    } finally {
      jep1.close()
    }
    val jep2 = new Jep(false, ".semaphore-cache/jep/lib/python2.7/site-packages")
    try {
      // This breaks.
      jep2.eval("import scipy.optimize as opt")
    } finally {
      jep2.close()
    }

The Python stack trace is:

    Caused by: jep.JepException: <type 'exceptions.TypeError'>: 'NoneType' object is not callable
    at /usr/local/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/_methods._any(_methods.py:38)
    at /usr/local/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/fromnumeric.any(fromnumeric.py:1850)
    at /usr/local/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/machar._do_init(machar.py:127)
    at /usr/local/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/machar.__init__(machar.py:111)
    at /usr/local/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/getlimits._init(getlimits.py:151)
    at /usr/local/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/getlimits.__new__(getlimits.py:121)
    at /usr/local/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/optimize.<module>(optimize.py:142)
    at /usr/local/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/__init__.<module>(__init__.py:175)

As a workaround, I'm just going to reuse a singleton jep instance.

@ndjensen
Member

ndjensen commented Sep 2, 2015

I'm glad to see the stack traces are working well. :D This is probably a scipy issue and not a Jep issue. The project I work on uses scipy only lightly (it uses numpy much more), and, come to think of it, we use thread pools and never close our Jep instances until JVM shutdown, so my work project probably dodges this.

This sounds quite similar to a problem I found with numpy: numpy/numpy#3961

Jep can't really do much about this kind of thing if the CPython extension code is messing up internal function pointers, either at Python sub-interpreter shutdown or at the start of another sub-interpreter; that's in the specific extension's code, not Jep. I documented the one I found in numpy with a bug report. Most open source projects are open to contributed fixes, but there's not a lot of experience with embedded sub-interpreters, and a correct fix often takes significant code changes. (I documented it so it's a known issue, but I'm not confident enough in numpy's internals to contribute a fix.)

I should update the Jep docs to mention these kinds of issues and how to work around them. NumPy has an "embedded" label, and this appears to be more a numpy problem than a scipy problem, so my recommendation is that you try to figure out what wiped out that method (it appears to be um.logical_or.reduce, https://github.com/numpy/numpy/blob/master/numpy/core/_methods.py#L38) and file a ticket with numpy. My hope is that Jep grows popular enough that Python communities start giving problems with embedded sub-interpreters more attention and more developers gain embedded Python expertise.

@cristipp
Author

cristipp commented Sep 2, 2015

Stacktraces are awesome :) Thanks for the hard work.

IMHO going with one interpreter per thread, likely using some form of ThreadLocal, is the way to go. Updating the C magic in Python libraries feels like quite an undertaking. I think this restriction warrants a mention in a visible place in the Jep docs.
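
The per-thread idea can be sketched as follows. In Jep this would live on the Java side (for example with `java.lang.ThreadLocal`); to keep the example self-contained, this Python sketch uses `threading.local` and a hypothetical `HypotheticalInterpreter` class standing in for a Jep instance. It is not Jep's API, only the lazy "create once per thread, then reuse" pattern.

```python
import threading


class HypotheticalInterpreter:
    """Stand-in for a Jep instance (hypothetical, not Jep's API)."""

    def __init__(self):
        # Remember the creating thread: Jep instances are thread-confined.
        self.owner = threading.get_ident()


_per_thread = threading.local()


def get_interpreter():
    """Return this thread's interpreter, creating it on first use."""
    interp = getattr(_per_thread, "interp", None)
    if interp is None:
        interp = HypotheticalInterpreter()
        _per_thread.interp = interp
    return interp


main_a = get_interpreter()
main_b = get_interpreter()

other = []
worker = threading.Thread(target=lambda: other.append(get_interpreter()))
worker.start()
worker.join()

print(main_a is main_b)    # True: reused within the same thread
print(main_a is other[0])  # False: each thread gets its own instance
```

With long-lived pool threads and instances that are only closed at JVM shutdown, no sub-interpreter is torn down mid-run, which is why this pattern sidesteps the numpy problem rather than fixing it.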

@behdad84

behdad84 commented Sep 3, 2015

Sharing an experience I had in a heavily multi-threaded environment using jep in a spark job:

A little background:
I was using cvxpy (written in Python) to solve a numerical approximation problem with ECOS (written in C) on small, independent pieces of a large dataset. Spark partitions the dataset into small pieces (grouped by a key), and each piece is a task solved by calling cvxpy and ECOS in a separate thread. Spark distributes the computation by starting a few JVMs on the cluster worker nodes and running a few threads in each JVM to perform the tasks, so in general a Spark job mixes multi-process and multi-threaded execution.

These are the steps I took to get around issues I observed:

  1. To address Jep's restriction on cross-thread use (i.e., a Jep instance created in one thread can only be used from that same thread), I created thread-local Jep instances for every thread in every worker-node JVM.
  2. Even with thread-local Jep instances as described in step 1, I was seeing segmentation faults thrown from ECOS (due to a static variable defined somewhere in its code). This was against my expectation: I thought independent Jep instances per thread would isolate a script's variables and namespaces, but that obviously did not hold for static state in the C code. I worked around it by making the calls to jep.invoke synchronized in my Java code (where the call into cvxpy/ECOS takes place).
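
The synchronization workaround in step 2 amounts to funnelling every call into non-reentrant native code through one shared lock. A minimal Python sketch, assuming a hypothetical `unsafe_solve` that keeps its working data in a process-wide variable (modelling ECOS's static variable); the single `threading.Lock` plays the role of the Java `synchronized` block around `jep.invoke`:

```python
import threading

# Hypothetical stand-in for C code with a static variable: every caller
# shares this one workspace, so concurrent calls could clobber each other.
_static_workspace = []


def unsafe_solve(values):
    _static_workspace.clear()
    for v in values:
        _static_workspace.append(v * v)
    return sum(_static_workspace)


# One lock shared by all threads, like a Java `synchronized` block.
_solver_lock = threading.Lock()


def solve_serialized(values):
    with _solver_lock:
        return unsafe_solve(values)


results = {}


def worker(n):
    results[n] = solve_serialized(range(n))


threads = [threading.Thread(target=worker, args=(n,)) for n in range(1, 9)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results[4])  # 14 (0 + 1 + 4 + 9)
```

In a real Jep setup the lock has to sit in Java, because separate sub-interpreters do not share Python objects while native static variables are shared by the whole process. The cost is that solver calls no longer run in parallel.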

The problem was definitely a static variable declaration in the ECOS C code. I've reported the issue to both cvxpy and ECOS, and I think they are going to solve it in upcoming releases.

embotech/ecos#127
https://github.com/cvxgrp/cvxpy/issues/220

@ndjensen
Member

ndjensen commented Sep 4, 2015

Thanks for sharing your insight. That's great that ecos and cvxpy are looking into solving the issues! I'll make sure to include these workarounds in the docs.

@bsteffensmeier
Member

I've traced it through the cpython and numpy code and I think I understand the problem.

When the Jep interpreter is closed, we call Py_EndInterpreter, which calls PyImport_Cleanup, which calls _PyModule_Clear, which finally sets all the module's items to None.
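
To make the clearing step concrete, here is a rough pure-Python model of what the Python 2.7 C function `_PyModule_Clear` does to a module's namespace: names with a single leading underscore are replaced with `None` first, then every remaining name except `__builtins__`. The helper `simulate_module_clear` is hypothetical and only approximates the C code (for example, it ignores non-string keys).

```python
import types


def simulate_module_clear(module):
    """Rough model of CPython 2.7's _PyModule_Clear (hypothetical helper).

    Names are replaced with None rather than deleted, in two passes.
    """
    d = module.__dict__
    # Pass 1: names starting with exactly one underscore, e.g. "_cache".
    for name in list(d):
        if name.startswith("_") and not name.startswith("__"):
            d[name] = None
    # Pass 2: everything else except __builtins__, which is left in place.
    for name in list(d):
        if name != "__builtins__":
            d[name] = None


mod = types.ModuleType("demo")
exec("import math\n_cache = {}\nanswer = 42", mod.__dict__)
simulate_module_clear(mod)

print(mod.math, mod._cache, mod.answer)  # None None None
```

The module object and any functions defined in it stay alive; only the values their names point to are replaced.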

The root of the problem is in the numpy C code, where it imports the _methods module into C and saves methods to static variables. Since the variable um is part of the module, it is set to None by _PyModule_Clear. The 'any' function itself is still held in the static variable, but when the function is called it looks up the module variable 'um', which is now None and has no logical_or attribute, causing the exception.
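
The failure mode can be reproduced in plain Python without numpy or sub-interpreters. This sketch assumes a hypothetical module `fake_methods` whose function reads a module-level alias at call time (similar to the logical_or.reduce variable described for newer numpy), keeps a separate long-lived reference to that function (like numpy's static C variable), and then simulates the shutdown clear:

```python
import types

# Hypothetical stand-in for a module like numpy/core/_methods.py.
SOURCE = """
helper = len

def _any_like(seq):
    # 'helper' is looked up in the module's globals at call time.
    return helper(seq) > 0
"""

mod = types.ModuleType("fake_methods")
exec(SOURCE, mod.__dict__)

# Like numpy's C code, keep our own long-lived ("static") reference.
cached_func = mod._any_like
before = cached_func([1])

# Simulated interpreter shutdown: every global except __builtins__ -> None.
for name in list(mod.__dict__):
    if name != "__builtins__":
        mod.__dict__[name] = None

# The function object survives, but its globals dict now holds None.
try:
    cached_func([1])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(before, error_message)  # True 'NoneType' object is not callable
```

The function's `__globals__` is the module's own dictionary, so clearing the module changes what every surviving reference to the function sees.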

This is not something we can fix in Jep, since the problem is in a static variable created by numpy. A naive solution would be to modify _methods.py and move the import of the umath module into the any method so that it does not need to reference any variables defined in the module. However, in newer versions of numpy they did the exact opposite, moving logical_or.reduce into a variable. It seems this was a performance enhancement, so if we undid that and moved the import within the method, it would slow things down slightly. There is probably some other way to access or cache that method, but I'm not familiar enough with the numpy source to recommend a solution, and any further discussion should probably head over to the numpy page to get input from the devs there.

The minimum code required to replicate the issue from java is:

    Jep jep = new Jep();
    jep.eval("import numpy");
    jep.eval("numpy.ndarray([1]).any()");
    jep.close();
    jep = new Jep();
    jep.eval("import numpy");
    jep.eval("numpy.ndarray([1]).any()");
    jep.close();

To demonstrate that this issue is not specific to Jep but happens any time numpy is used from sub-interpreters, the following C code also shows the problem:

    #include <Python.h>

    int main(int argc, const char *argv[]) {
        PyThreadState *mainState, *subState;

        Py_Initialize();
        PyEval_InitThreads();
        mainState = PyThreadState_Get();
        PyEval_ReleaseThread(mainState);

        /* First sub-interpreter: works. */
        subState = Py_NewInterpreter();
        PyRun_SimpleString("import numpy");
        PyRun_SimpleString("numpy.ndarray([1]).any()");
        Py_EndInterpreter(subState);

        /* Second sub-interpreter: the any() call now fails. */
        subState = Py_NewInterpreter();
        PyRun_SimpleString("import numpy");
        PyRun_SimpleString("numpy.ndarray([1]).any()");
        Py_EndInterpreter(subState);

        PyEval_AcquireThread(mainState);
        Py_Finalize();
        return 0;
    }

@ndjensen
Member

On master branch I have added a new test with Ben's code to more easily illustrate the problem. https://github.com/mrj0/jep/blob/master/src/jep/test/numpy/TestNumpyAny.java

So it's not scipy or cvxpy, it's just numpy and how the references to variables are retained. I'm going to close #31 as a duplicate of this ticket, and then we'll have to work with the numpy developers to determine if there is an optimal way to overcome this without adversely affecting numpy.

@ndjensen ndjensen changed the title from "Multiple Jep instances break scipy initialization." to "Closing a Jep instance/sub-interpreter breaks some numpy methods" on Oct 22, 2015
@wizbots

wizbots commented Oct 28, 2015

Hi. I also hit this problem; stack trace below. It occurred when shutting down Jep instances with close() after multiple threads had finished their work...

    jep.JepException: <type 'exceptions.TypeError'>: 'NoneType' object is not callable
    at /lib/python2.7/site-packages/numpy/core/_methods._sum(_methods.py:32)
    at /lib/python2.7/site-packages/scikit_learn-0.15.2-py2.7-macosx-10.11-intel.egg/sklearn/utils/validation._assert_all_finite(validation.py:40)
    at /lib/python2.7/site-packages/scikit_learn-0.15.2-py2.7-macosx-10.11-intel.egg/sklearn/utils/validation._atleast2d_or_sparse(validation.py:138)
    at /analytics/lib/python2.7/site-packages/scikit_learn-0.15.2-py2.7-macosx-10.11-intel.egg/sklearn/utils/validation.atleast2d_or_csr(validation.py:165)
    at /analytics/lib/python2.7/site-packages/scikit_learn-0.15.2-py2.7-macosx-10.11-intel.egg/sklearn/linear_model/base.decision_function(base.py:191)
    at /analytics/lib/python2.7/site-packages/scikit_learn-0.15.2-py2.7-macosx-10.11-intel.egg/sklearn/linear_model/base.predict(base.py:215)

@ndjensen
Member

This is alleviated by the new shared modules feature in Jep 3.6, which has a release candidate and will be released in the near future. It needs to be tested in more environments than the ones where Ben and I have tested it. See the 3.6 release notes: https://github.com/mrj0/jep/blob/dev_3.6/release_notes/3.6-notes.rst#python-shared-modules-beta

I will update the wiki after the official release and hopefully we will have some good feedback.

@ndjensen
Member

Since 3.6 is released with shared modules, I'm closing this ticket.
