some problems with 5.6.1 #2733

Closed
framlog opened this issue Aug 14, 2017 · 5 comments

Comments

@framlog

framlog commented Aug 14, 2017

Hi,

Which version is the stable release package?

I recently upgraded RocksDB from 4.6 to 5.6.1, but it seems to have many problems. For example, we hit a write stall after tcpcopy had been running for a while with pipelined write enabled (specifically, 9 of 10 write threads stall waiting on a condition variable, and only one thread works normally). I also got a core dump in the Get method (it looks like a use-after-free).
I should say that we have modified some of the code, but these problems occurred without the customized logic enabled.
Besides, I am confused by the versions of the release packages and their corresponding dates.

@framlog
Author

framlog commented Aug 14, 2017

#0  0x0000003568889705 in memcpy () from /lib64/libc.so.6
#1  0x00000000010cab4f in std::char_traits<char>::copy (__n=20, __s2=<optimized out>, __s1=0x12f456c0 "\200\326\367\023")
    at /opt/soft/gcc-5.4.0/gcc-build-5.4.0/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/char_traits.h:290
#2  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_S_copy (__n=20, __s=<optimized out>, __d=0x12f456c0 "\200\326\367\023")
    at /opt/soft/gcc-5.4.0/gcc-build-5.4.0/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.h:299
#3  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_mutate (this=this@entry=0x7fe409ee92f0, __pos=0, __len1=__len1@entry=0, __s=<optimized out>, __len2=20)
    at /opt/soft/gcc-5.4.0/gcc-build-5.4.0/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:326
#4  0x00000000010cb65b in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_replace (this=this@entry=0x7fe409ee92f0, __pos=__pos@entry=0, __len1=0, __s=<optimized out>, 
    __len2=<optimized out>) at /opt/soft/gcc-5.4.0/gcc-build-5.4.0/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/basic_string.tcc:470
#5  0x0000000000a06b24 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::assign (__n=<optimized out>, __s=<optimized out>, this=0x7fe409ee92f0)
    at /opt/gcc-5.4/include/c++/5.4.0/bits/basic_string.h:1150
#6  rocksdb::DB::Get (this=<optimized out>, options=..., column_family=0x3209140, key=..., value=0x7fe409ee92f0, this=<optimized out>) at ./include/rocksdb/db.h:319
#7  0x0000000000937a1f in singleGetValueByKey (db=0x304e400, key=..., value=..., error=..., handle=0x3209140) at /home/xiaoju/bigdata-storage/fusion.r2/src/inner_hmap_helper.cpp:305
#8  0x0000000000993b32 in rocksHmget (db=0x304e400, handle=0x3209140, hmgetQuery=..., hmgetResult=..., ishmgets=false, error=...) at /home/xiaoju/bigdata-storage/fusion.r2/src/rocks_hmap.cpp:80
#9  0x00000000008d0cbe in hmget_cmd (req=0x29857240) at /home/xiaoju/bigdata-storage/fusion.r2/src/cmds.cpp:2337
#10 0x00000000008eaeea in FusionCommand::process (this=0x3203c28, req=0x29857240) at /home/xiaoju/bigdata-storage/fusion.r2/src/cmds.h:42
#11 0x00000000008c169f in cmd_proc (req=0x29857240) at /home/xiaoju/bigdata-storage/fusion.r2/src/cmds.cpp:168
#12 0x00000000008942b2 in task_process (data=0x29857240, user_data=0x0) at /home/xiaoju/bigdata-storage/fusion.r2/src/resp_server.cpp:799
#13 0x0000000000894975 in server_read (bev=0x1430af40, ctx=0x3451ea20) at /home/xiaoju/bigdata-storage/fusion.r2/src/resp_server.cpp:886
#14 0x0000000000c6b9be in bufferevent_run_deferred_callbacks_unlocked (cb=<optimized out>, arg=0x1430af40) at bufferevent.c:189
#15 0x0000000000c74ad0 in event_process_active_single_queue (base=base@entry=0x6596840, activeq=0x2e3e8f0, max_to_process=max_to_process@entry=2147483647, endtime=endtime@entry=0x0) at event.c:1675
#16 0x0000000000c756ef in event_process_active (base=0x6596840) at event.c:1738
#17 event_base_loop (base=0x6596840, flags=0) at event.c:1961
#18 0x00000000008968e9 in parser_loop (data=0x2e761a0) at /home/xiaoju/bigdata-storage/fusion.r2/src/resp_server.cpp:1164
#19 0x0000000000c4be15 in g_thread_proxy (data=0x2e7b0a0) at gthread.c:778
#20 0x0000003568c07a51 in start_thread () from /lib64/libpthread.so.0
#21 0x00000035688e896d in clone () from /lib64/libc.so.6

@framlog changed the title from "which version is stable" to "some problems with 5.6.1" on Aug 14, 2017
@yiwu-arbug
Contributor

Hi @framlog. The latest release is 5.7.1. All the versions we tag on https://github.com/facebook/rocksdb/releases are considered stable at the time of release, though issues can be found later on, in which case we bump the minor version with hot fixes.

Regarding pipelined write, are you seeing the threads deadlocked while waiting on the condition variable, or are they able to advance? If it is not a deadlock, what you see is expected. In our current write implementation, one thread becomes the leader and does the actual write on behalf of the other concurrent writers (the followers), and the followers wait on the condition variable until the leader finishes. We'll add more documentation describing this behavior later.
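The leader/follower pattern described above can be sketched in a few lines of standard C++. This is a toy model under stated assumptions, not RocksDB's actual write path: `GroupWriter`, `PendingWrite`, and the in-memory `log_` are hypothetical names invented for illustration, and the real implementation is considerably more involved.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical toy, not RocksDB code: one mutex, one condition variable,
// and a FIFO queue of pending writers.
struct PendingWrite {
  std::string payload;
  bool done = false;
};

class GroupWriter {
 public:
  void Write(const std::string& payload) {
    PendingWrite w;
    w.payload = payload;

    std::unique_lock<std::mutex> lock(mu_);
    queue_.push_back(&w);
    // Followers park here until a leader has written for them, or until
    // they reach the front of the queue and must lead the next group.
    cv_.wait(lock, [&] { return w.done || queue_.front() == &w; });
    if (w.done) return;

    // Leader: snapshot everyone queued so far, then write without the lock.
    std::vector<PendingWrite*> group(queue_.begin(), queue_.end());
    lock.unlock();
    for (PendingWrite* m : group) log_.push_back(m->payload);
    lock.lock();

    // Publish completion, drop the group from the queue, wake the rest.
    for (PendingWrite* m : group) m->done = true;
    queue_.erase(queue_.begin(),
                 queue_.begin() + static_cast<std::ptrdiff_t>(group.size()));
    ++groups_;
    cv_.notify_all();
  }

  // Call these only after all writer threads have been joined.
  std::size_t written() const { return log_.size(); }
  int groups() const { return groups_; }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<PendingWrite*> queue_;
  std::vector<std::string> log_;  // stands in for the WAL/memtable write
  int groups_ = 0;
};
```

With ten threads calling `Write` concurrently, a debugger snapshot of this toy can show most of them blocked in `cv_.wait` while a single leader runs. That alone does not indicate a deadlock; the distinguishing signal is whether the count of completed writes keeps advancing.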

Regarding the issue with Get(), can you try compiling with ASAN and provide the error logs it prints?

@framlog
Author

framlog commented Aug 15, 2017

Thanks for the reply.

For the first problem, the threads did stall waiting on the condition variable, but after a review today I'm not sure it's a RocksDB bug, since the server may have restarted with a large number of WAL logs pending. I've also read the logic of the write path and didn't find any bugs in it.
For the second problem, I'll let you know once we encounter it again.

@framlog
Author

framlog commented Aug 17, 2017

ASAN didn't produce any useful information; it just repeated the core stack. Is that because I didn't enable ASAN properly? (I compiled the library with COMPILE_WITH_ASAN=1 but statically linked it into the program.)

@framlog
Author

framlog commented Aug 24, 2017

I fixed this problem by changing PinSlice to PinSelf, although I think that works against the purpose of PinnableSlice. I don't know the exact cause yet.
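To make the trade-off in that workaround concrete, here is a minimal sketch of the two pinning styles. It is a toy model, not `rocksdb::PinnableSlice`: `ToyPinnable`, `ToyBlock`, `PinFromBlock`, and `CopyFromBlock` are hypothetical names, and the signatures are simplified assumptions. The idea it illustrates is that a borrowed (PinSlice-style) value is only valid while whatever owns the bytes keeps them alive, whereas a copied (PinSelf-style) value pays for a memcpy but no longer depends on the source buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for a cached block that owns the value bytes.
struct ToyBlock {
  std::string bytes;
  int refs = 0;  // number of outstanding borrowed views
};

// Toy model only; the real PinnableSlice API differs.
class ToyPinnable {
 public:
  ToyPinnable() = default;
  ToyPinnable(const ToyPinnable&) = delete;
  ToyPinnable& operator=(const ToyPinnable&) = delete;
  ~ToyPinnable() { Reset(); }

  // Borrow: point at external bytes and remember how to release them.
  void PinSlice(const char* data, std::size_t size,
                std::function<void()> release) {
    Reset();
    data_ = data;
    size_ = size;
    release_ = std::move(release);
  }

  // Copy: duplicate the bytes into a buffer this object owns.
  void PinSelf(const char* data, std::size_t size) {
    Reset();
    self_.assign(data, size);
    data_ = self_.data();
    size_ = self_.size();
  }

  // Run the release callback (if any) and forget the current view.
  void Reset() {
    if (release_) release_();
    release_ = nullptr;
    data_ = nullptr;
    size_ = 0;
  }

  bool IsPinned() const { return static_cast<bool>(release_); }
  std::string ToString() const {
    return data_ ? std::string(data_, size_) : std::string();
  }

 private:
  const char* data_ = nullptr;
  std::size_t size_ = 0;
  std::string self_;
  std::function<void()> release_;
};

// Zero-copy path: valid only while block.refs keeps the block alive.
inline void PinFromBlock(ToyBlock& block, ToyPinnable* out) {
  ++block.refs;
  out->PinSlice(block.bytes.data(), block.bytes.size(),
                [&block] { --block.refs; });
}

// Copy path: the result survives even if the block is reused or freed.
inline void CopyFromBlock(const ToyBlock& block, ToyPinnable* out) {
  out->PinSelf(block.bytes.data(), block.bytes.size());
}
```

In this model, a crash while copying out of a borrowed view would point at the source bytes having been released too early, for example a missing or mismatched reference on the owner. Switching to the copy path hides that lifetime bug rather than explaining it, which is consistent with the exact cause remaining unknown here.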

@framlog closed this as completed on Nov 10, 2017