Elasticsearch 7.X fails to start with jna tmp dir configured (CentOS8 - Hardened) #73309

Closed
cyamal1b4 opened this issue May 22, 2021 · 3 comments · Fixed by #80651
Labels
>bug :Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team

Comments

@cyamal1b4

I am posting this as an open bug for the ES team to review. It is similar to other issues, but this was the ONLY actual fix I found for my particular air-gapped deployment.

Overview:

Elasticsearch 7.12,
Oracle JDK 1.8.0,
Noexec on /tmp,
Selinux in enforcing mode,
Explicitly defined path for jna.tmpdir without noexec,
CentOS 8

A fatal error has been detected by the Java Runtime Environment:

SIGSEGV (0xb) at pc=0x00007f9800b8d2ca, pid=21823, tid=0x00007f98f541f700

JRE version: Java(TM) SE Runtime Environment (8.0_112-b15) (build 1.8.0_112-b15)
Java VM: Java HotSpot(TM) 64-Bit Server VM (25.112-b15 mixed mode linux-amd64 compressed oops)
Problematic frame:
C [jna1247360738620499207.tmp+0x122ca] ffi_prep_closure_loc+0x1a

Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

If you would like to submit a bug report, please visit:
http://bugreport.java.com/bugreport/crash.jsp
The crash happened outside the Java Virtual Machine in native code.
See problematic frame for where to report the bug.

The cause of the failure was that the system user running ES did not have a home directory.

Once I created the home directory, ES started as expected.
In the home directory, a directory ".oracle_jre_usage" is created, which contains a fd53b05c83802a42.timestamp file.

More about this file - https://community.oracle.com/thread/3783686.

This solved a persistent problem for me. My research all pointed to tmpdir issues, and the logs seemed to suggest the same. On closer examination, though, I found that the install does not create a home directory for the elasticsearch user, and that is what causes this exact error with the JNA libraries.

@cyamal1b4 cyamal1b4 added >bug needs:triage Requires assignment of a team area label labels May 22, 2021
@pgomulka pgomulka added the :Core/Infra/Core Core issues without another label label May 24, 2021
@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label May 24, 2021
@elasticmachine

Pinging @elastic/es-core-infra (Team:Core/Infra)

@DaveCTurner

DaveCTurner commented Aug 28, 2021

I encountered a similar situation recently:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f2fa3bf9b85, pid=111900, tid=112101
#
# JRE version: OpenJDK Runtime Environment AdoptOpenJDK (14.0+36) (build 14+36)
# Java VM: OpenJDK 64-Bit Server VM AdoptOpenJDK (14+36, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C  [jna6252914664211337753.tmp+0x12b85]  ffi_prep_closure_loc+0x15

...

Stack: [0x00007f30af2bb000,0x00007f30af3bc000],  sp=0x00007f30af3b9390,  free space=1016k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [jna6252914664211337753.tmp+0x12b85]  ffi_prep_closure_loc+0x15
C  [jna6252914664211337753.tmp+0x9ebc]  Java_com_sun_jna_Native_registerMethod+0x51c
j  com.sun.jna.Native.registerMethod(Ljava/lang/Class;Ljava/lang/String;Ljava/lang/String;[I[J[JIJJLjava/lang/reflect/Method;JIZ[Lcom/sun/jna/ToNativeConverter;Lcom/sun/jna/FromNativeConverter;Ljava/lang/String;)J+0
j  com.sun.jna.Native.register(Ljava/lang/Class;Lcom/sun/jna/NativeLibrary;)V+1117
j  com.sun.jna.Native.register(Ljava/lang/Class;Ljava/lang/String;)V+17
j  com.sun.jna.Native.register(Ljava/lang/String;)V+7
j  org.elasticsearch.bootstrap.JNACLibrary.<clinit>()V+73
...

Registers:
RAX=0x00007f2fa3bfa186, RBX=0x00007f30a802b310, RCX=0x00007f30aaf82560, RDX=0x00007f2fa3bf3970
RSP=0x00007f30af3b9390, RBP=0x00007f30af3b9390, RSI=0x00007f30aaf82580, RDI=0x0000000000000000
R8 =0x00007f30a802b000, R9 =0x0000000000000000, R10=0x00007f30af3b8de0, R11=0x00007f30aeb78da0
R12=0x00007f30aaf82560, R13=0x00007f30aaf82580, R14=0x0000000000000000, R15=0x0000000000000002
RIP=0x00007f2fa3bf9b85, EFLAGS=0x0000000000010293, CSGSFS=0x0000000000000033, ERR=0x0000000000000006
  TRAPNO=0x000000000000000e

Instructions: (pc=0x00007f2fa3bf9b85)
...
0x00007f2fa3bf9b85:   66 c7 07 49 bb 4c 89 47 0c 66 c7 47 0a 49 ba 48
...

The opcodes for the current instruction (at the pc) are 66 c7 07 49 bb, which disassembles to mov WORD PTR [rdi],0xbb49, and RDI=0x0000000000000000, so the SIGSEGV is reporting a null dereference, which looks like a genuine bug. I'm not 100% sure what calling convention is in play here, but I believe RDI is typically the first argument to the function, which would be ffi_closure* closure. That could be NULL if the previous ffi_closure_alloc failed, since JNA doesn't check for a failure in that call:

https://github.com/java-native-access/jna/blob/030411b909d5dfd249b1df09a7f24c44babcae64/native/dispatch.c#L3468-L3469

There are a couple of other places in that function where allocation failures don't appear to be handled either. I'm not sure how this could end up with a reproducible crash in specific configs, but I've opened a discussion with the JNA folks anyway, as I think it would be better to throw a proper exception than to crash the whole JVM with a SEGV if an allocation fails.
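
For readers who want to check the disassembly above by hand, here is a minimal Java sketch. It decodes the five bytes at the pc as a `66 C7 /0` instruction (MOV r/m16, imm16) and reads the `ERR` value from the register dump as an x86 page-fault error code (`TRAPNO=0xe` is the page-fault vector). The class and method names are made up for illustration, and the decoder only handles the simple register-indirect form seen in this dump.

```java
// Illustrative only: hand-decoding of the faulting instruction and fault code.
final class FaultDecode {

    // Decodes "66 C7 /0 iw" (MOV r/m16, imm16) for the plain [reg] addressing form.
    static String decodeMovImm16(int[] b) {
        if (b.length < 5 || b[0] != 0x66 || b[1] != 0xC7) {
            throw new IllegalArgumentException("not a 66 C7 encoding");
        }
        int modrm = b[2];
        int mod = (modrm >> 6) & 0x3; // 00 = register-indirect, no displacement
        int reg = (modrm >> 3) & 0x7; // opcode extension, 0 selects MOV
        int rm = modrm & 0x7;         // 111 = RDI when no REX prefix is present
        if (mod != 0 || reg != 0 || rm == 4 || rm == 5) {
            // rm 4 (SIB byte) and rm 5 (RIP-relative) are out of scope for this sketch
            throw new IllegalArgumentException("unsupported ModRM byte");
        }
        String[] regs = {"rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"};
        int imm = b[3] | (b[4] << 8); // 16-bit immediate, little-endian
        return String.format("mov WORD PTR [%s],0x%x", regs[rm], imm);
    }

    // Interprets the low three bits of an x86 page-fault error code.
    static String describePageFault(long err) {
        boolean present = (err & 0x1) != 0; // 0 = page not present
        boolean write = (err & 0x2) != 0;   // 1 = fault on a write
        boolean user = (err & 0x4) != 0;    // 1 = fault in user mode
        return (write ? "write" : "read") + " to "
            + (present ? "protected" : "unmapped") + " page from "
            + (user ? "user" : "kernel") + " mode";
    }

    public static void main(String[] args) {
        System.out.println(decodeMovImm16(new int[] {0x66, 0xC7, 0x07, 0x49, 0xBB}));
        System.out.println(describePageFault(0x6)); // ERR from the register dump
    }
}
```

Run against the bytes in the dump, this gives `mov WORD PTR [rdi],0xbb49` and, for `ERR=0x6`, a user-mode write to an unmapped page, which is consistent with a store through a NULL `RDI`.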

DaveCTurner added a commit to DaveCTurner/jna that referenced this issue Aug 30, 2021
`ffi_closure_alloc` may fail and return `NULL` if, for instance, we're
running in a locked-down operating system that forbids FFI from
allocating executable pages of memory in any of the ways that it tries.
Today we pass this `NULL` on to `ffi_prep_closure_loc` which triggers a
segmentation fault that takes down the whole JVM. With this change we
check for a failure in this call and turn it into an
`UnsupportedOperationException` so that the caller can handle it more
gracefully.

Relates elastic/elasticsearch#73309
Relates elastic/elasticsearch#18272
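
The practical difference described in this commit message is that a `SIGSEGV` cannot be caught by Java code, while an `UnsupportedOperationException` can. A rough sketch of what "handle it more gracefully" could look like on the calling side follows. `NativeInitSketch` and `tryRegister` are hypothetical names, and the `Runnable` stands in for a real registration call such as the `com.sun.jna.Native.register` frames in the stack trace above; this is not how Elasticsearch or JNA actually structure the code.

```java
// Hypothetical caller-side handling; names are illustrative, not from JNA or Elasticsearch.
final class NativeInitSketch {

    // Runs a native registration step and reports failure instead of propagating it.
    static boolean tryRegister(Runnable registration) {
        try {
            registration.run();
            return true;
        } catch (UnsupportedOperationException e) {
            // With the patched JNA, a failed closure allocation would surface here
            // rather than as a segmentation fault that kills the JVM.
            System.err.println("native access unavailable: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Simulated failure standing in for a real native registration call.
        boolean ok = tryRegister(() -> {
            throw new UnsupportedOperationException("simulated closure allocation failure");
        });
        System.out.println("native access enabled: " + ok);
    }
}
```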
@DaveCTurner

> This solved a persistent problem for me. My research all pointed to tmpdir issues, and the logs seemed to suggest the same. On closer examination, though, I found that the install does not create a home directory for the elasticsearch user, and that is what causes this exact error with the JNA libraries.

I believe this failure is due to your temp directory forbidding executables, but notice that libffi will try $HOME if it can't find a more suitable temp directory. I think this explains why adding a home directory appears to fix the problem, but really it's just a workaround. The real problem is here in your OP:

> Noexec on /tmp,

The temp directory must not be mounted noexec.
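
One way to check whether a directory sits on a `noexec` mount is to look it up in `/proc/mounts`. The sketch below is a simplified illustration: `NoexecCheck` is a made-up name, it picks the longest mount point that is a prefix of the given path, and it does not resolve symlinks or unescape spaces in mount points.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Illustrative helper: is the mount that covers a directory flagged noexec?
final class NoexecCheck {

    // mountLines use the /proc/mounts layout: device mountpoint fstype options ...
    static boolean isNoexec(List<String> mountLines, String dir) {
        String bestMount = "";
        boolean noexec = false;
        for (String line : mountLines) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 4) {
                continue; // skip malformed lines
            }
            String mountPoint = f[1];
            boolean covers = mountPoint.equals("/")
                || dir.equals(mountPoint)
                || dir.startsWith(mountPoint + "/");
            // Longest matching mount point wins; on ties the later entry wins,
            // since a later mount shadows an earlier one at the same path.
            if (covers && mountPoint.length() >= bestMount.length()) {
                bestMount = mountPoint;
                noexec = Arrays.asList(f[3].split(",")).contains("noexec");
            }
        }
        return noexec;
    }

    public static void main(String[] args) throws IOException {
        String dir = args.length > 0 ? args[0] : "/tmp";
        Path mounts = Paths.get("/proc/mounts");
        if (!Files.isReadable(mounts)) {
            System.out.println("/proc/mounts is not readable on this system");
            return;
        }
        System.out.println(dir + " noexec: " + isNoexec(Files.readAllLines(mounts), dir));
    }
}
```

Pass the directory to check as the first argument; with no argument it checks `/tmp`.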

I opened java-native-access/jna#1378 to turn this into a more graceful failure, although we'll still fail under these circumstances.
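
To make the `$HOME` fallback mentioned above concrete, here is a small sketch that lists the directories `libffi` is understood to try, in order, when it needs a temporary file for executable code. Only `LIBFFI_TMPDIR` and `$HOME` are named in this thread; the rest of the list (`TMPDIR`, `/tmp`, `/var/tmp`, `/dev/shm`) is an assumption based on a reading of `libffi`'s fallback table and may differ between versions, so check it against the `libffi` bundled with your JNA. `LibffiTempCandidates` is a made-up name.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: approximate order of directories libffi may try.
final class LibffiTempCandidates {

    static List<String> candidates(Map<String, String> env) {
        List<String> out = new ArrayList<>();
        addIfSet(out, env.get("LIBFFI_TMPDIR")); // named in this thread
        addIfSet(out, env.get("TMPDIR"));        // assumed
        out.add("/tmp");                         // assumed
        out.add("/var/tmp");                     // assumed
        out.add("/dev/shm");                     // assumed
        addIfSet(out, env.get("HOME"));          // named in this thread
        return out;
    }

    private static void addIfSet(List<String> out, String value) {
        if (value != null && !value.isEmpty()) {
            out.add(value);
        }
    }

    public static void main(String[] args) {
        // Reports existence and writability only; it does not detect noexec mounts.
        for (String dir : candidates(System.getenv())) {
            boolean usable = Files.isDirectory(Paths.get(dir)) && Files.isWritable(Paths.get(dir));
            System.out.println(dir + (usable ? "  (exists, writable)" : "  (missing or not writable)"));
        }
    }
}
```

On a hardened host where every fixed location is mounted noexec and the service user has no home directory, none of these candidates would be usable, which matches the behaviour reported in the original post.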

DaveCTurner added a commit to DaveCTurner/jna that referenced this issue Sep 13, 2021
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Nov 11, 2021
Today if `libffi` cannot allocate pages of memory which are both
writeable and executable then it will attempt to write code to a
temporary file. Elasticsearch configures itself a suitable temporary
directory for use by JNA but by default `libffi` won't find this
directory and will try various other places. In certain configurations,
none of the other places that `libffi` tries are suitable. With older
versions of JNA this would result in a `SIGSEGV`; since elastic#80617 the JVM
will exit with an exception.

With this commit we use the `LIBFFI_TMPDIR` environment variable to
configure `libffi` to use the same directory as JNA for its temporary
files if they are needed.

Closes elastic#18272
Closes elastic#73309
Closes elastic#74545
Closes elastic#77014
Closes elastic#77053
Relates elastic#77285

Co-authored-by: Rory Hunter <roryhunter2@gmail.com>
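
As an illustration of the idea in this commit message (not the actual change in #80651), the sketch below launches a child JVM with `LIBFFI_TMPDIR` pointing at the same directory as the `jna.tmpdir` system property. `LaunchWithLibffiTmpdir`, `configure`, and the path used in `main` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: point JNA and libffi at the same temporary directory.
final class LaunchWithLibffiTmpdir {

    // javaCommand is the java executable followed by its arguments.
    static ProcessBuilder configure(String tmpDir, List<String> javaCommand) {
        if (javaCommand.isEmpty()) {
            throw new IllegalArgumentException("command must start with the java executable");
        }
        List<String> cmd = new ArrayList<>(javaCommand);
        cmd.add(1, "-Djna.tmpdir=" + tmpDir); // JNA unpacks its native library here
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.environment().put("LIBFFI_TMPDIR", tmpDir); // libffi reads this variable
        return pb.inheritIO();
    }

    public static void main(String[] args) {
        // Hypothetical directory; it must exist and must not be mounted noexec.
        ProcessBuilder pb = configure("/var/lib/myapp/tmp", Arrays.asList("java", "-version"));
        System.out.println("command: " + pb.command());
        System.out.println("LIBFFI_TMPDIR=" + pb.environment().get("LIBFFI_TMPDIR"));
        // Call pb.start() to launch the child process.
    }
}
```

The environment variable has to be set before the JVM starts, which is why the sketch configures it on the child process rather than trying to change it from inside a running JVM.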
DaveCTurner added a commit that referenced this issue Nov 15, 2021
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Nov 15, 2021
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Nov 15, 2021
elasticsearchmachine pushed a commit that referenced this issue Nov 15, 2021
elasticsearchmachine pushed a commit that referenced this issue Nov 15, 2021
* Set LIBFFI_TMPDIR at startup (#80651)
* Fix incorrect SSL usage

Co-authored-by: Rory Hunter <roryhunter2@gmail.com>