crash doing batched write, JNI #1036
Comments
I got a different stack trace the second time around, but in a similar scenario (~16 million entries in): #1 0x00000034bf834085 in abort () from /lib64/libc.so.6 |
@ehamilto How did you build RocksJava? Do you have a recent C++ compiler and GDB? I noticed the string:
in your output. Googling it suggests that the version of gdb you have does not match your version of gcc, so not all of the information can be provided. Also, is this a production version or a version with debug symbols? If it's production, is there any chance of reproducing it on a version with debug symbols so we can maybe get some more info in the trace? |
Thank you for your quick reply. The version of gcc that I am using for building RocksDB is different from the one that comes by default with my OS (RHEL 6.4, 2.6.32-358.el6.x86_64). The default is 4.4.7. For building, I am using export GCC_HOME=/opt/centos/devtoolset-2/root/usr
This is a production version, but I can definitely reproduce it with a debug build (it happens consistently). I will submit what I find later. |
I created a debug version and reran the test. It happened a couple of times with different stack traces, see below. While I cannot share my code, I can execute gdb commands on the core files to get info if that might help you. #0 0x00000034bf8328a5 in raise () from /lib64/libc.so.6 #0 0x00000034bf8328a5 in raise () from /lib64/libc.so.6 |
@ehamilto I think you are still using the wrong version of GDB however. Also perhaps it is easier to learn from the hs_err files from the JVM crash? |
OK, here is what the hs_err files give with the debug builds. I can upload them if you want them. Stack: [0x00007fc6002d1000,0x00007fc6003d2000], sp=0x00007fc6003d0240, free space=1020k Stack: [0x00007f977219c000,0x00007f977229d000], sp=0x00007f977229b018, free space=1020k |
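The `Stack:` lines above can be sanity-checked by hand: HotSpot's free-space figure is the distance from the stack pointer down to the lowest stack address, in KiB (truncated). A minimal Java sketch of that arithmetic follows; the class and method names are illustrative only and not part of any tool.

```java
final class HsErrStack {
    /** Free stack in KiB: (sp - lowest stack address) / 1024, truncated. */
    static long freeStackKiB(String stackLowHex, String spHex) {
        long low = Long.parseUnsignedLong(stackLowHex.replaceFirst("^0x", ""), 16);
        long sp = Long.parseUnsignedLong(spHex.replaceFirst("^0x", ""), 16);
        return (sp - low) / 1024;
    }

    public static void main(String[] args) {
        // Values copied from the two hs_err excerpts in this thread.
        System.out.println(freeStackKiB("0x00007fc6002d1000", "0x00007fc6003d0240") + "k"); // prints 1020k
        System.out.println(freeStackKiB("0x00007f977219c000", "0x00007f977229b018") + "k"); // prints 1020k
    }
}
```

Both crashing threads had about 1020 KiB free out of a 0x101000-byte (1028 KiB) stack, which suggests that stack overflow is not the cause.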
@ehamilto Is there a way you can give us a reproducible test-case with a test-code that you share with us? Then I can run the same code on my machine and see if I can debug/fix it for you. |
Let me try to reproduce the issue by running YCSB in embedded mode (ie, without the client/server code). If I am able to do that, I'll be happy to share the code. |
I was getting the crash with the standalone code, but now I am not. So let me make sure that I post code that reproduces it consistently. I'll get back later in the day. |
Hi, Unfortunately, I am unable to produce a standalone test that reproduces the crash. (I attach below the code I have been using for this purpose, a slightly modified version of the one that I posted yesterday, in which each thread uses a dedicated WriteBatch object to maximize write activity.) I cannot share the code from the setup that produces the crash for confidentiality reasons. I can only say that it is a client and a server. The client implements the DB class required for the YCSB tests and sends requests to a server that processes those requests as RocksDB operations. I have tried to run the same YCSB test suite that produces the crash using the standalone code, without much success. As an alternative, I can inspect the crash, which I get consistently, or run a sanitized build (using https://github.com/google/sanitizers/wiki/AddressSanitizer) if you provide instructions on how to create one. In the past we have used it successfully to identify crashes caused by memory corruption between threads (I don't know whether that is the case here). Thanks again for your help. import com.yahoo.ycsb.ByteArrayByteIterator; import java.nio.ByteBuffer; public class RocksDBYCSBBatchBinding extends DB {
} |
After careful analysis, it seems that a different library we are using for the project when running in client/server mode was corrupting RocksDB's native memory. Sorry, I cannot give more details. The issue can be closed now. |
Hi guys,
Below is the stack trace. The use case is one YCSB client running 4 threads during the loading phase, with this workload:
fieldcount=1
recordcount=1000000000
operationcount=1000000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.5
updateproportion=0.5
scanproportion=0
insertproportion=0
requestdistribution=zipfian
There is a separate server process that serves requests from the YCSB client and writes to RocksDB using a single writer thread, in batches of 100, with the WriteOptions sync flag set to true. After writing ~20 million entries, I get the crash.
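For readers unfamiliar with this write pattern, here is a minimal sketch of a single-writer batcher that flushes every 100 entries. It is not the reporter's code: `BatchSink` and `BatchingWriter` are hypothetical names, and the sink stands in for the storage call. With RocksJava the sink would presumably fill a `WriteBatch` and call `db.write(writeOptions, batch)` with `WriteOptions.setSync(true)`, handling `RocksDBException` there.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the storage call (e.g. a wrapper around RocksDB's write). */
interface BatchSink {
    void write(List<byte[][]> entries);
}

/** Accumulates key/value pairs and hands them to the sink in fixed-size batches. */
final class BatchingWriter {
    private final BatchSink sink;
    private final int batchSize;
    private final List<byte[][]> pending = new ArrayList<>();

    BatchingWriter(BatchSink sink, int batchSize) {
        this.sink = sink;
        this.batchSize = batchSize;
    }

    /** Intended to be called from the single writer thread only; not thread-safe. */
    void put(byte[] key, byte[] value) {
        // Copy the arrays so later changes by the caller cannot alter queued data.
        pending.add(new byte[][] {key.clone(), value.clone()});
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Writes any queued entries as one batch; does nothing when the queue is empty. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.write(new ArrayList<>(pending));
        pending.clear();
    }
}
```

At a batch size of 100, roughly 20 million entries correspond to about 200,000 synchronous batch writes before the crash.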
#0 0x00000034bf8328a5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.5-7.el6_0.x86_64 glibc-2.12-1.107.el6.x86_64 libgcc-4.4.7-3.el6.x86_64 libstdc++-4.4.7-3.el6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 0x00000034bf8328a5 in raise () from /lib64/libc.so.6
#1 0x00000034bf834085 in abort () from /lib64/libc.so.6
#2 0x00007fb18fcabe85 in os::abort(bool) () from /home/fgomez/java/jdk1.7.0/jre/lib/amd64/server/libjvm.so
#3 0x00007fb18fe00907 in VMError::report_and_die() () from /home/fgomez/java/jdk1.7.0/jre/lib/amd64/server/libjvm.so
#4 0x00007fb18fcafab0 in JVM_handle_linux_signal () from /home/fgomez/java/jdk1.7.0/jre/lib/amd64/server/libjvm.so
#5 <signal handler called>
#6 _mm_crc32_u64 (crc=Unhandled dwarf expression opcode 0xf3
) at /opt/centos/devtoolset-2/root/usr/lib/gcc/x86_64-redhat-linux/4.8.2/include/smmintrin.h:822
#7 Fast_CRC32 (crc=Unhandled dwarf expression opcode 0xf3
) at util/crc32c.cc:321
#8 rocksdb::crc32c::ExtendImpl<rocksdb::crc32c::Fast_CRC32> (crc=Unhandled dwarf expression opcode 0xf3
) at util/crc32c.cc:361
#9 0x00007fb1668052a4 in rocksdb::log::Writer::EmitPhysicalRecord (this=0x7fb0c4b6c700, t=Unhandled dwarf expression opcode 0xf3
) at db/log_writer.cc:121
#10 0x00007fb16680551d in rocksdb::log::Writer::AddRecord (this=0x7fb0c4b6c700, slice=Unhandled dwarf expression opcode 0xf3
) at db/log_writer.cc:82
#11 0x00007fb1667d6332 in rocksdb::DBImpl::WriteImpl (this=0x7fb188265a30, write_options=Unhandled dwarf expression opcode 0xf3
) at db/db_impl.cc:4381
#12 0x00007fb1667d6c94 in rocksdb::DBImpl::Write (this=Unhandled dwarf expression opcode 0xf3
) at db/db_impl.cc:4101
#13 0x00007fb16676818c in Java_org_rocksdb_RocksDB_write0 (env=0x7fb1888869d0, jdb=0x7fb15e33f790, jwrite_options_handle=140400476573760, jwb_handle=140397742812128)
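Frames #6 to #9 place the fault inside the checksum that the log writer computes over each record before appending it to the write-ahead log. A rough Java sketch of that step follows, assuming the LevelDB-style record format (CRC32C over the record type byte plus the payload, then masked). The class and method names are hypothetical, and `java.util.zip.CRC32C` requires Java 9 or later, which is newer than the JDK 1.7 shown in the trace.

```java
import java.util.zip.CRC32C;

final class LogRecordChecksum {
    // Mask constant used by the LevelDB/RocksDB CRC masking scheme.
    private static final int MASK_DELTA = 0xa282ead8;

    /** CRC32C over the record type byte followed by payload[offset, offset + length). */
    static int crc32c(byte type, byte[] payload, int offset, int length) {
        CRC32C crc = new CRC32C();
        crc.update(type);                    // the type byte is checksummed first
        crc.update(payload, offset, length); // then every byte of the payload is read
        return (int) crc.getValue();
    }

    /** Rotate right by 15 bits and add a constant, so stored CRCs differ from raw ones. */
    static int mask(int crc) {
        return ((crc >>> 15) | (crc << 17)) + MASK_DELTA;
    }
}
```

The checksum loop touches every byte of the payload, so in native code a pointer or length clobbered elsewhere tends to fault here rather than where the corruption happened. That would be consistent with the final diagnosis in this thread, that another library was corrupting RocksDB's native memory. In Java, the same out-of-range request raises an exception instead of crashing the process.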