[Help] Process aborted going infinite loop at src/fq.cpp:40 #2881

Closed
ItsNayabSD opened this issue Jan 2, 2018 · 15 comments

@ItsNayabSD

Process aborted, crashed and core dump created

Environment

  • libzmq 4.2.1
  • OS: Custom Linux-based OS with kernel 3.14

My app had been running continuously for 3 days. A script runs in the background to restart the app if it crashes for any reason. Over the last few days, I observed that the app crashed three times.

When I debugged the core dump with gdb, it showed the following backtrace:

#0  0xb6bea424 in __GI_raise (sig=sig@entry=6) at libpthread/nptl/sysdeps/unix/sysv/linux/raise.c:67
#1  0xb6be47f0 in __GI_abort () at libc/stdlib/abort.c:89
#2  0xb6ec12b8 in _Alloc_hider (__a=..., 
    __dat=0xb6f16944 <std::basic_string<unsigned char, std::char_traits<unsigned char>, std::allocator<unsigned char> >::_Rep::_S_empty_rep_storage+12> "", this=0x1c)
    at /home/nayab/toolchain/arm-openwrt-linux-uclibcgnueabi/include/c++/4.8.3/bits/basic_string.h:275
#3  basic_string (this=0x1c) at /home/nayab/toolchain/arm-openwrt-linux-uclibcgnueabi/include/c++/4.8.3/bits/basic_string.h:439
#4  zmq::fq_t::fq_t (this=0x0) at src/fq.cpp:40
#5  0xb6ec12b8 in _Alloc_hider (__a=..., 
    __dat=0xb6f16944 <std::basic_string<unsigned char, std::char_traits<unsigned char>, std::allocator<unsigned char> >::_Rep::_S_empty_rep_storage+12> "", this=0x1c)
    at /home/nayab/toolchain/arm-openwrt-linux-uclibcgnueabi/include/c++/4.8.3/bits/basic_string.h:275
#6  basic_string (this=0x1c) at /home/nayab/toolchain/arm-openwrt-linux-uclibcgnueabi/include/c++/4.8.3/bits/basic_string.h:439
#7  zmq::fq_t::fq_t (this=0x0) at src/fq.cpp:40
#8  0xb6ec12b8 in _Alloc_hider (__a=..., 
    __dat=0xb6f16944 <std::basic_string<unsigned char, std::char_traits<unsigned char>, std::allocator<unsigned char> >::_Rep::_S_empty_rep_storage+12> "", this=0x1c)
    at /home/nayab/toolchain/arm-openwrt-linux-uclibcgnueabi/include/c++/4.8.3/bits/basic_string.h:275
#9  basic_string (this=0x1c) at /home/nayab/toolchain/arm-openwrt-linux-uclibcgnueabi/include/c++/4.8.3/bits/basic_string.h:439
#10 zmq::fq_t::fq_t (this=0x0) at src/fq.cpp:40
#11 0xb6ec12b8 in _Alloc_hider (__a=..., 
    __dat=0xb6f16944 <std::basic_string<unsigned char, std::char_traits<unsigned char>, std::allocator<unsigned char> >::_Rep::_S_empty_rep_storage+12> "", this=0x1c)
    at /home/nayab/toolchain/arm-openwrt-linux-uclibcgnueabi/include/c++/4.8.3/bits/basic_string.h:275
#12 basic_string (this=0x1c) at /home/nayab/toolchain/arm-openwrt-linux-uclibcgnueabi/include/c++/4.8.3/bits/basic_string.h:439
#13 zmq::fq_t::fq_t (this=0x0) at src/fq.cpp:40
... (repeats indefinitely; stack crashed)

Am I missing any useful information here? Or could anybody tell me how to avoid this crash?

@bluca
Member

bluca commented Jan 2, 2018

Are you using a socket from multiple threads? That includes creating, closing, or modifying options.
Same question for messages.

@ItsNayabSD
Author

No. We have only one thread that uses the ZMQ socket. The other threads in the process don't use ZeroMQ at all.

@bluca
Member

bluca commented Jan 2, 2018

That core dump doesn't make much sense either - are you sure it's valid/not corrupted?

@ItsNayabSD
Author

It's not corrupted. I don't have a clue how the app crashed either. :(

@bluca
Member

bluca commented Jan 2, 2018

According to that backtrace, it's doing an infinite recursion when instantiating a fq object:

https://github.com/zeromq/libzmq/blob/v4.2.1/src/fq.cpp#L40

As you can see from the code, that just doesn't make any sense. Can you share the code that causes it?

@ItsNayabSD
Author

ItsNayabSD commented Jan 3, 2018

Hi Bluca,

Sorry, the above core dump seems to be corrupted.

Here is the backtrace from the latest core dump. I hope it is useful. I am not very familiar with C++, so I am unable to debug it further.

I also couldn't trace back which part of my code caused the crash.

(gdb) bt
#0  0xb6c10424 in __GI_raise (sig=sig@entry=6) at libpthread/nptl/sysdeps/unix/sysv/linux/raise.c:67
#1  0xb6c0a7f0 in __GI_abort () at libc/stdlib/abort.c:89
#2  0xb6ee62b0 in zmq::zmq_abort (errmsg_=errmsg_@entry=0xb6c1f210 <mylock> "") at src/err.cpp:87
#3  0xb6f1b914 in zmq::udp_engine_t::out_event (this=<optimized out>) at src/udp_engine.cpp:285
#4  0xb6f1ae70 in zmq::udp_engine_t::restart_output (this=0x2e85a8) at src/udp_engine.cpp:302
#5  0xb6f02eb4 in zmq::session_base_t::read_activated (this=0x2d7bc8, pipe_=0xb6707ed8) at src/session_base.cpp:286
#6  0xb6ee7510 in zmq::io_thread_t::in_event (this=0x2d5150) at src/io_thread.cpp:85
#7  0xb6ee5e50 in zmq::epoll_t::loop (this=0x2d5670) at src/epoll.cpp:188
#8  0xb6f1898c in thread_routine (arg_=0x2d56bc) at src/thread.cpp:100
#9  0xb6f46b04 in start_thread (arg=0xb6709520) at libpthread/nptl/pthread_create.c:297
#10 0xb6c0fb44 in clone () at libpthread/nptl/sysdeps/unix/sysv/linux/arm/../../../../../../../libc/sysdeps/linux/arm/clone.S:126
#11 0xb6c0fb44 in clone () at libpthread/nptl/sysdeps/unix/sysv/linux/arm/../../../../../../../libc/sysdeps/linux/arm/clone.S:126
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

@bluca
Member

bluca commented Jan 3, 2018

Are you sure you are on v4.2.1? Because that line doesn't make much sense in that version, but it does in the latest master or v4.2.3:

https://github.com/zeromq/libzmq/blob/master/src/udp_engine.cpp#L285

So the UDP send system call is failing - in gdb, check what errno says:

up 3
p strerror(errno)
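
For context, the frame at src/udp_engine.cpp:285 is an assertion on the return value of the send call, so any send failure ends in an abort. Below is a minimal sketch of that assert-on-errno pattern. It is not libzmq's actual macro, which may differ in detail; `describe_errno_failure` and `ERRNO_ASSERT_SKETCH` are hypothetical names used only for illustration. If the real macro behaves similarly, the error text may also have been written to the process's stderr just before the abort, which is worth checking when gdb cannot read errno.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper: builds the diagnostic text for a failed call,
// in the form "<strerror text> (<file>:<line>)".
std::string describe_errno_failure (int err, const char *file, int line)
{
    return std::string (std::strerror (err)) + " (" + file + ":"
           + std::to_string (line) + ")";
}

// Hypothetical macro (illustration only): if the condition is false,
// print the errno text with the source location, then abort the process.
#define ERRNO_ASSERT_SKETCH(cond) \
    do { \
        if (!(cond)) { \
            const int saved_errno = errno; \
            const std::string msg = \
              describe_errno_failure (saved_errno, __FILE__, __LINE__); \
            std::fprintf (stderr, "%s\n", msg.c_str ()); \
            std::fflush (stderr); \
            std::abort (); \
        } \
    } while (false)
```

Used as `ERRNO_ASSERT_SKETCH (rc != -1);` right after a system call, this does nothing on success and terminates with SIGABRT on failure, which matches the `raise`/`abort` frames at the top of the backtrace.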

@ItsNayabSD
Author

I am sure it is 4.2.1.

$ cat include/zmq.h | grep ZMQ_VERSION
#define ZMQ_VERSION_MAJOR 4
#define ZMQ_VERSION_MINOR 2
#define ZMQ_VERSION_PATCH 1
#define ZMQ_VERSION \
    ZMQ_MAKE_VERSION(ZMQ_VERSION_MAJOR, ZMQ_VERSION_MINOR, ZMQ_VERSION_PATCH)

And,

(gdb) up 3
#3  0xb6f1b914 in zmq::udp_engine_t::out_event (this=<optimized out>) at src/udp_engine.cpp:285
285	        errno_assert (rc != -1);
(gdb) p strerror(errno)
Cannot find thread-local variables on this target
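
One caveat on the version check above: grepping `zmq.h` shows the version the application was compiled against, which is not necessarily the version of the shared library loaded at run time. libzmq provides `zmq_version (int *major, int *minor, int *patch)` for the runtime value. The sketch below only illustrates comparing the two. `make_version` mirrors how `ZMQ_MAKE_VERSION` packs the three numbers as far as I know (major * 10000 + minor * 100 + patch), and `versions_match` is a hypothetical helper. The call to `zmq_version` is left as a comment so the snippet stands alone without linking libzmq.

```cpp
// Packs a version triple into one integer, e.g. 4.2.1 -> 40201.
constexpr int make_version (int major, int minor, int patch)
{
    return major * 10000 + minor * 100 + patch;
}

// Hypothetical helper: true when the compile-time header version and the
// runtime library version are the same.
constexpr bool versions_match (int compiled, int runtime)
{
    return compiled == runtime;
}

// In a real application (requires <zmq.h> and linking against libzmq):
//   int major, minor, patch;
//   zmq_version (&major, &minor, &patch);
//   bool ok = versions_match (ZMQ_VERSION,
//                             make_version (major, minor, patch));
```

A mismatch here would be one possible explanation for a backtrace whose line numbers do not fit the expected release, though it is only a guess in this case.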

@ItsNayabSD
Author

Hi Bluca,

I would like to upgrade the library to 4.2.3, but I see that the page at http://zeromq.org/intro:get-the-software does not list the latest release.
Would you please update it or post the link?

@bluca
Member

bluca commented Jan 4, 2018

Thanks, I had forgotten to update that page. Note that it's just a link; you can always get the latest tarballs from https://github.com/zeromq/libzmq/releases

As for gdb, try printing just errno.

@ItsNayabSD
Author

ItsNayabSD commented Jan 4, 2018

Thanks for the link. I've upgraded to 4.2.3 (downloaded from the v4.2.3 tag on GitHub). We are unable to reproduce the crash now, so it seems to be working fine. I'll keep testing this for some time.

@bluca
Member

bluca commented Jan 4, 2018

Ok, I'll close for now then - feel free to reopen if you reproduce again.

@bluca bluca closed this as completed Jan 4, 2018
@ItsNayabSD
Author

ItsNayabSD commented Jan 5, 2018

Hi,
I am able to reproduce the issue with v4.2.3 as well.
This time gdb prints:

(gdb) bt
#0  0xb6bf8424 in __GI_raise (sig=sig@entry=6) at libpthread/nptl/sysdeps/unix/sysv/linux/raise.c:67
#1  0xb6bf27f0 in __GI_abort () at libc/stdlib/abort.c:89
#2  0xb6ed0e14 in zmq::zmq_abort (errmsg_=errmsg_@entry=0xb6c07210 <mylock> "") at src/err.cpp:87
#3  0xb6f07744 in zmq::udp_engine_t::out_event (this=<optimized out>) at src/udp_engine.cpp:285
#4  0xb6f06ca4 in zmq::udp_engine_t::restart_output (this=0x2061d0) at src/udp_engine.cpp:307
#5  0xb6eeea08 in zmq::session_base_t::read_activated (this=0x1fddd8, pipe_=0xb6edd454 <zmq::object_t::process_command(zmq::command_t&)+220>) at src/session_base.cpp:288
#6  0xb6ed1ea4 in zmq::io_thread_t::in_event (this=0x1fb2e8) at src/io_thread.cpp:85
#7  0xb6ed05e8 in zmq::epoll_t::loop (this=0x1fb808) at src/epoll.cpp:188
#8  0xb6f049a8 in thread_routine (arg_=0x1fb854) at src/thread.cpp:109
#9  0xb6f33b04 in start_thread (arg=0xb66f1520) at libpthread/nptl/pthread_create.c:297
#10 0xb6bf7b44 in clone () at libpthread/nptl/sysdeps/unix/sysv/linux/arm/../../../../../../../libc/sysdeps/linux/arm/clone.S:126
#11 0xb6bf7b44 in clone () at libpthread/nptl/sysdeps/unix/sysv/linux/arm/../../../../../../../libc/sysdeps/linux/arm/clone.S:126
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) up 3
#3  0xb6f07744 in zmq::udp_engine_t::out_event (this=<optimized out>) at src/udp_engine.cpp:285
289	        errno_assert (rc != -1);
(gdb) p errno
Cannot find thread-local variables on this target
(gdb) p strerror(errno)
Cannot find thread-local variables on this target

The root cause of the crash is that there was no gateway entry in the routing table. We had to add a dummy gateway entry to the routing table as a workaround.
Also, I can't reopen this issue since it was closed by you. ;)
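
For anyone hitting the same thing: with no matching route, a UDP send on Linux can fail with an error such as ENETUNREACH, and the assertion in the backtrace turns that failure into an abort. Below is a rough sketch of how an application doing its own UDP sends might treat such errors as non-fatal. This is an assumption about reasonable handling, not what libzmq does, and `is_transient_send_error`, `send_result`, and `send_or_drop` are hypothetical names.

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/types.h>
#include <sys/socket.h>

// Hypothetical classification: errors that usually reflect the current
// network state (no route, interface down, would block, interrupted)
// rather than a programming bug.
bool is_transient_send_error (int err)
{
    switch (err) {
        case ENETUNREACH:
        case EHOSTUNREACH:
        case ENETDOWN:
        case EAGAIN:
        case EINTR:
            return true;
        default:
            return false;
    }
}

enum class send_result { sent, dropped, fatal };

// Hypothetical wrapper around sendto(): drop the datagram on a transient
// error instead of aborting, and report anything else as fatal so the
// caller can decide what to do.
send_result send_or_drop (int fd, const void *buf, std::size_t len,
                          const sockaddr *addr, socklen_t addr_len)
{
    const ssize_t rc = sendto (fd, buf, len, 0, addr, addr_len);
    if (rc != -1)
        return send_result::sent;
    return is_transient_send_error (errno) ? send_result::dropped
                                           : send_result::fatal;
}
```

Dropping is arguably acceptable here because UDP gives no delivery guarantee anyway; whether it suits a given application is a separate design decision.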

@bluca
Member

bluca commented Jan 5, 2018

Right, then it's another instance of #2862, so there's no need to reopen. Please add the backtrace and workaround there.

@ItsNayabSD
Author

Done.
