Sometimes got error SIGSEGV when using numpy #341

Open
tank99tank opened this issue Jul 27, 2021 · 4 comments

@tank99tank

tank99tank commented Jul 27, 2021

@RequestMapping("/**") 
public synchronized Object exec(@RequestBody String content, HttpServletRequest request) {
	String requestUri = request.getRequestURI();
	String jsonParam = JSONObject.parseObject(content).toJSONString();
	PythonScript script = execution.getPythonScript(requestUri);
	if(script!=null) {
		try (Jep jep = new SharedInterpreter()) {
			String model = extractModel(script.getPath());
			String modelImportCmd = "from " + model + " import "+ script.getFunc();
			jep.exec(modelImportCmd);
			Object result = jep.invoke(script.getFunc(), jsonParam);
			return result;
		} catch (Exception e) {
			throw new RuntimeException(e);
		} 
	} else {
		throw new RuntimeException("The requestUri("+requestUri+") has not been config the Python script in the path: " + execution.getScriptRoot());
	}
}

Sometimes the code runs fine, but sometimes it fails with one of the following errors and the program stops:

  • ERROR 1:

*** Aborted at 1627296117 (unix time) try "date -d @1627296117" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGSEGV (@0x2b77d8087e00) received by PID 15991 (TID 0x2b78aa92c700) from PID 18446744073039019520; stack trace: ***
    @     0x2b77d87dd6d0 (unknown)
    @     0x2b77d9cf8b6b ciBytecodeStream::get_method()
    @     0x2b77d9c263e3 GraphBuilder::invoke()
    @     0x2b77d9c22ebf GraphBuilder::iterate_bytecodes_for_block()
    @     0x2b77d9c2487f GraphBuilder::iterate_all_blocks()
    @     0x2b77d9c24e0a GraphBuilder::GraphBuilder()
    @     0x2b77d9c2ab6a IRScope::IRScope()
    @     0x2b77d9c2b6f9 IR::IR()
    @     0x2b77d9c0d277 Compilation::build_hir()
    @     0x2b77d9c0f3f0 Compilation::compile_java_method()
    @     0x2b77d9c0f59a Compilation::compile_method()
    @     0x2b77d9c0fa10 Compilation::Compilation()
    @     0x2b77d9c10198 Compiler::compile_method()
    @     0x2b77d9d6cc9c CompileBroker::invoke_compiler_on_method()
    @     0x2b77d9d6e8d8 CompileBroker::compiler_thread_loop()
    @     0x2b77da366bbb JavaThread::thread_main_inner()
    @     0x2b77da366ec1 JavaThread::run()
    @     0x2b77da1f5132 java_start()
    @     0x2b77d87d5e25 start_thread
    @     0x2b77d8f04bad __clone
    @                0x0 (unknown)

  • ERROR 2:

*** Aborted at 1627350873 (unix time) try "date -d @1627350873" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGSEGV (@0x2ad282005080) received by PID 31929 (TID 0x2ad342335700) from PID 18446744071595643008; stack trace: ***
    @     0x2ad2826546d0 (unknown)
    @     0x2ad28402b333 Monitor::wait()
    @     0x2ad283bdcab2 CompileQueue::get()
    @     0x2ad283be538b CompileBroker::compiler_thread_loop()
    @     0x2ad2841ddbbb JavaThread::thread_main_inner()
    @     0x2ad2841ddec1 JavaThread::run()
    @     0x2ad28406c132 java_start()
    @     0x2ad28264ce25 start_thread
    @     0x2ad282d7bbad __clone
    @                0x0 (unknown)

@ndjensen
Member

Does the hs_err_pid list what library it crashed in? libjvm.so or something else? And what JRE are you using?

@tank99tank
Author

tank99tank commented Jul 28, 2021

The hs_err_pid file shows the following, which I think is what you want:


#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00002b5db2a93738, pid=20770, tid=0x00002b5d67f27700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_291-b10) (build 1.8.0_291-b10)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.291-b10 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [_multiarray_umath.cpython-36m-x86_64-linux-gnu.so+0x14c738]  PyArray_Item_INCREF+0xf8
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
 

@ndjensen
Member

The crash is in the multiarray library, which is part of numpy. I see you're already using SharedInterpreter which is the best advice I can give you for using numpy with Jep. Unfortunately numpy wasn't written with sub-interpreters in mind and may not work correctly even when using SharedInterpreter. There's not really much Jep can do about this.

If you're going to try and solve it, I would try to narrow down what lines of Python are causing the crash and see if you can crash it with a normal CPython interpreter (not Jep).
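
One way to attempt that reproduction outside Jep is a small stress harness run with a normal CPython interpreter. The following is only a sketch: the `stress` helper is hypothetical, and `"json"` / `"loads"` are placeholders for the module and function that the Java code passes to `jep.exec()` and `jep.invoke()`. It uses the standard `faulthandler` module so that a SIGSEGV also dumps the Python-level traceback, which may help narrow down the offending lines.

```python
import faulthandler
import importlib
import json


def stress(module_name, func_name, payload, iterations=1000):
    """Import module_name and call func_name with a JSON string repeatedly.

    Mirrors what the Java handler does: one string argument containing JSON.
    Returns the result of the last call.
    """
    module = importlib.import_module(module_name)
    func = getattr(module, func_name)
    json_param = json.dumps(payload)
    result = None
    for _ in range(iterations):
        result = func(json_param)
    return result


if __name__ == "__main__":
    # Print the Python traceback to stderr if the process gets SIGSEGV.
    faulthandler.enable()
    # Placeholders: swap in the real module and function under test.
    print(stress("json", "loads", {"x": [1, 2, 3]}, iterations=10))
```

If the crash never shows up here but does under Jep, that points toward the interpreter setup rather than the Python code itself.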

ndjensen changed the title from "Sometimes got error SIGSEGV" to "Sometimes got error SIGSEGV when using numpy" on Jul 31, 2021
@ndjensen
Member

ndjensen commented Aug 9, 2024

On a project I work on, we encountered a crash in the same place, PyArray_Item_INCREF, but in our case it was with SubInterpreters. We traced it to a SubInterpreter that did not declare numpy as a shared module, which should have been fine because that SubInterpreter wasn't using numpy. However, a change in Python 3.11.9 caused the threading module to be imported automatically (see python/cpython#117983). When the SubInterpreter was closed, the presence of the threading module caused the Jep.close() method to get an int/long value out of Python (see https://github.com/ninia/jep/blob/v4.1.1/src/main/java/jep/Jep.java#L428). getValue() leads to convert_p2j.c, which checks for a numpy scalar object before checking for a long (see https://github.com/ninia/jep/blob/v4.1.1/src/main/c/Jep/convert_p2j.c#L1078), and that check caused numpy to be initialized. Having just been initialized, numpy was immediately disposed of when the SubInterpreter finished closing. That left numpy in a bad state, and the next time it was used, it crashed in PyArray_Item_INCREF with SIGSEGV.

I don't know that this is directly related to the original problem since the original problem on this Issue is using SharedInterpreters. But if there was a mix of SubInterpreters also in the same process, then maybe something like this happened. Regardless, I felt I should document it in case others encounter something similar. The key to figuring it out on this particular software system was the numpy warning/disclaimer when importing numpy in a sub-interpreter:

sys:1: UserWarning: NumPy was imported from a Python sub-interpreter but NumPy does not properly support sub-interpreters. This will likely work for most users but might cause hard to track down issues or subtle bugs. A common user of the rare sub-interpreter feature is wsgi which also allows single-interpreter mode.
Improvements in the case of bugs are welcome, but is not on the NumPy roadmap, and full support may require significant effort to achieve.

If you're using shared modules with your Jep SubInterpreters, then that warning should not appear.
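
Since that warning is easy to miss in logs, one option is to promote it to an error so an accidental numpy initialization in a sub-interpreter fails loudly instead. This is a minimal sketch using the standard `warnings` module; the helper name is hypothetical, and it assumes NumPy emits the message as a `UserWarning` with the text quoted above (other NumPy versions may word it differently or raise an error directly).

```python
import warnings

# Start of the NumPy warning quoted above. The "message" argument of
# filterwarnings is a regex matched against the beginning of the text.
NUMPY_SUBINTERPRETER_WARNING = r"NumPy was imported from a Python sub-interpreter"


def fail_on_numpy_subinterpreter_warning():
    """Turn the NumPy sub-interpreter UserWarning into an exception."""
    warnings.filterwarnings(
        "error",
        message=NUMPY_SUBINTERPRETER_WARNING,
        category=UserWarning,
    )
```

The filter has to be installed before anything imports numpy in that interpreter, for example by running it through `jep.exec()` right after the interpreter is created.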

For Jep, we should somehow improve how we detect and initialize numpy to try and avoid this in the future.
