Call to zmq_init with >0 thread count causes core dump #159

Closed
mbhatia opened this issue Feb 1, 2011 · 7 comments

Comments


mbhatia commented Feb 1, 2011

ZMQ Version : 2.1.0
OS : AIX 5.3
Compiler : IBM XLC/C++ v10.1

Calling zmq_init() with an argument greater than 0 makes the program dump core. It does not dump core when the argument is 0.

Here is the stack trace from dbx:

dbx ./HelloWorldServer
Type 'help' for help.
[using memory image in core]
reading symbolic information ...

IOT/Abort trap in pthread_kill at 0xd005f734 ($t1)
0xd005f734 (pthread_kill+0x88) 80410014 lwz r2,0x14(r1)
(dbx) where
pthread_kill(??, ??) at 0xd005f734
_p_raise(??) at 0xd005f1a4
raise.raise(??) at 0xd02a38f0
abort() at 0xd0307778
myabort()() at 0xd0244aac
terminate()() at 0xd0242df0
terminate()() at 0xd024424c
__DoThrowV6() at 0xd02468f0
std::vector<zmq::poll_t::fd_entry_t,std::allocator<zmq::poll_t::fd_entry_t> >::_Xlen() const(this = 0x200091d0), line 317 in "vector"
std::vector<zmq::poll_t::fd_entry_t,std::allocator<zmq::poll_t::fd_entry_t> >::insert(std::_Ptrit<zmq::poll_t::fd_entry_t,long,zmq::poll_t::fd_entry_t*,zmq::poll_t::fd_entry_t&,zmq::poll_t::fd_entry_t*,zmq::poll_t::fd_entry_t&>,unsigned long,const zmq::poll_t::fd_entry_t&)(this = 0x200091d0, _P = &(...), _M = 2147483647, _X = &(...)), line 63 in "vector.t"
poll.std::vector<zmq::poll_t::fd_entry_t,std::allocator<zmq::poll_t::fd_entry_t> >::resize(unsigned long,zmq::poll_t::fd_entry_t)(this = 0x200091d0, _N = 2147483647, _X = (...)), line 193 in "vector"
poll.std::vector<zmq::poll_t::fd_entry_t,std::allocator<zmq::poll_t::fd_entry_t> >::resize(unsigned long)(this = 0x200091d0, _N = 2147483647), line 190 in "vector"
poll.std::vector<zmq::poll_t::fd_entry_t,std::allocator<zmq::poll_t::fd_entry_t> >::resize(unsigned long,zmq::poll_t::fd_entry_t)(this = 0x2ff21f80, _N = 4048429592, _X = (...)), line 48 in "poll.cpp"
zmq::io_thread_t::in_event()(this = 0xd68568c0, 0x20008650, 0x0), line 32 in "io_thread.cpp"
zmq::ctx_t::~ctx_t()(this = 0x2ff22080, _dtorFlags = -246531096), line 59 in "ctx.cpp"
zmq.zmq_init(io_threads = 1), line 243 in "zmq.cpp"
main(), line 18 in "HelloWorldServer.c"

After some digging around, I found that in poll.cpp the call to getrlimit() for the RLIMIT_NOFILE resource returns 2147483647 when the ulimit for the number of files per process is set to 'unlimited'. The next statement then tries to resize fd_table to that size; as the trace shows, the resize throws from _Xlen(), nothing catches the exception, and the program aborts and dumps core.

As a workaround, I have forced the ulimit to a reasonable number (256 for now) in my environment. However, I think the code should handle this condition better (probably by using a sane default value, or by returning with an exit code/message).
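To illustrate the "sane default" idea, here is a minimal sketch, not the libzmq code. The helper names (nofile_is_unlimited, initial_fd_table_size, query_initial_fd_table_size) and the fallback parameter are hypothetical; only getrlimit(), RLIMIT_NOFILE and RLIM_INFINITY are real POSIX names. It assumes that comparing the soft limit against RLIM_INFINITY is enough to detect 'unlimited' on the platform in question.

```cpp
#include <sys/resource.h>  // getrlimit, RLIMIT_NOFILE, RLIM_INFINITY
#include <cassert>
#include <cstddef>

// Hypothetical helper: true when the soft limit means "no limit".
inline bool nofile_is_unlimited(rlim_t soft_limit)
{
    return soft_limit == RLIM_INFINITY;
}

// Hypothetical helper: choose an initial table size from the soft limit,
// capping it at `fallback` so an "unlimited" (or merely huge) limit never
// turns into a huge up-front allocation.
inline std::size_t initial_fd_table_size(rlim_t soft_limit, std::size_t fallback)
{
    assert(fallback > 0);
    if (nofile_is_unlimited(soft_limit) || soft_limit > fallback)
        return fallback;
    return static_cast<std::size_t>(soft_limit);
}

// Query the real limit of the current process. If getrlimit() fails,
// fall back to the supplied default.
inline std::size_t query_initial_fd_table_size(std::size_t fallback)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return fallback;
    return initial_fd_table_size(rl.rlim_cur, fallback);
}
```

A caller could size its table with query_initial_fd_table_size(1024). Note that a cap like this only bounds the initial allocation; by itself it does not guarantee that every descriptor handed out by the OS stays below the cap.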

Member

sustrik commented Feb 2, 2011

POSIX defines no constant for "unlimited" value of RLIMIT_NOFILE.
Is there an AIX-specific constant that would allow us to do something like:

if (max_fds == AIX_UNLIMITED_NOFILE)
    max_fds = 1000; // default

Member

sustrik commented Feb 2, 2011

Hm, RLIM_INFINITY maybe.

But still, if I choose a default of, say, 1000, how can I be sure that the OS won't assign an fd of 1001 to a socket...
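One way to sidestep that concern is to not fix the table size up front and instead grow it whenever a descriptor falls outside it. The following is a minimal sketch of that idea only; fd_entry and fd_table are hypothetical stand-ins and are not claimed to match zmq::poll_t or any actual libzmq code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-descriptor table entry.
struct fd_entry {
    bool in_use;
    fd_entry() : in_use(false) {}
};

// Hypothetical table indexed by file descriptor that grows on demand,
// so no limit has to be known in advance.
class fd_table {
public:
    // Register fd, enlarging the table if fd lies beyond its current end.
    void add(int fd)
    {
        assert(fd >= 0);
        const std::size_t index = static_cast<std::size_t>(fd);
        if (index >= entries_.size())
            entries_.resize(index + 1);
        entries_[index].in_use = true;
    }

    // True if fd was registered with add().
    bool contains(int fd) const
    {
        if (fd < 0)
            return false;
        const std::size_t index = static_cast<std::size_t>(fd);
        return index < entries_.size() && entries_[index].in_use;
    }

    std::size_t size() const { return entries_.size(); }

private:
    std::vector<fd_entry> entries_;
};
```

With this shape, add(1001) on a table that currently holds 1000 entries simply extends it to 1002 entries. The memory used then tracks the highest descriptor actually seen rather than the configured limit.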

Author

mbhatia commented Feb 2, 2011

sustrik,
That is a good point. Is this behavior of 'unlimited' no. of files/process specific to AIX? What do other platforms return in getrlimit() for 'unlimited'?
I am not much of a C++ programmer, but maybe there could be a try-catch around the fd_table.resize() call that catches the allocation error from resize() and throws a custom exception telling the user to set the ulimit for the number of files to a reasonable value before running the program?
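A rough sketch of that try-catch suggestion, under stated assumptions: fd_entry, fd_limit_error and resize_fd_table are hypothetical names invented for illustration, and the message text is only an example. std::length_error and std::bad_alloc are the standard exceptions that std::vector::resize() can raise when the requested size is too large or the allocation fails.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the table entry type.
struct fd_entry {
    int index;
};

// Hypothetical custom exception carrying a message the user can act on.
class fd_limit_error : public std::runtime_error {
public:
    fd_limit_error()
        : std::runtime_error(
              "fd table could not be sized to the file descriptor limit; "
              "set a finite limit (for example: ulimit -n 256) and retry")
    {
    }
};

// Resize the table, translating the low-level failures of resize()
// into one descriptive error instead of an uncaught exception.
inline void resize_fd_table(std::vector<fd_entry> &table, std::size_t count)
{
    try {
        table.resize(count);
    } catch (const std::length_error &) {
        throw fd_limit_error();  // count exceeds what the vector can hold
    } catch (const std::bad_alloc &) {
        throw fd_limit_error();  // allocation of the storage failed
    }
}
```

A caller that catches fd_limit_error can print what() and exit cleanly, which turns the abort seen in the trace into a readable diagnostic.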

Member

sustrik commented Feb 2, 2011

Yes. That's doable. Would that work for you?

Btw, could you check the value of RLIM_INFINITY on AIX in the meantime? (It should be defined in sys/resource.h according to POSIX.)

Author

mbhatia commented Feb 2, 2011

Yeah. I have the program working by setting:

ulimit -n 256

before running my program.

Ideally, the problem should be fixed so that the code works without any issues under 'unlimited' resource limits, but I am assuming that would require some rewriting of the code.
For now, imo, a simple exception stating that the resource limit is set too high should be enough, so the user knows what is causing the crash and how to correct the problem.

Here is how RLIM_INFINITY is defined on AIX:

#if defined(__64BIT__) && !defined(__64BIT_KERNEL)
#define RLIM_INFINITY   0x7fffffffffffffffL
#else
#define RLIM_INFINITY   0x7FFFFFFF
#endif /* __64BIT__ */

Member

sustrik commented Feb 3, 2011

Ok, I've rewritten the code not to use RLIMIT_NOFILE at all. Please check the master branch and let me know whether it works for you.

Member

sustrik commented Feb 17, 2011

No reply. Assuming the problem is fixed. Closing the issue.

benjdero pushed a commit to benjdero/libzmq that referenced this issue Feb 20, 2023
Updated documentation with latest API
bluca pushed a commit that referenced this issue Oct 31, 2023
Problem: manpage mentions options not available in 4.0.x
bluca pushed a commit that referenced this issue Oct 31, 2023
Problem: socket_type_string off-by-one error
This issue was closed.