Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zmq_disconnect doesn't release resources associated with terminated endpoints #3186

Closed
WallStProg opened this issue Jul 17, 2018 · 18 comments
Closed

Comments

@WallStProg
Copy link
Contributor

Issue description

A long-running process which is connected to and disconnected from many times consumes ever-greater amounts of memory.

This does not show up as a leak (e.g., w/valgrind), since the memory is apparently released when the long-running process shuts down.

Environment

  • libzmq version (commit hash if unreleased): d175819
  • OS: CentOS 6.9
@WallStProg
Copy link
Contributor Author

I've been trying to track down this issue for a while, since the memory is not recognized as a leak by memcheck et al.

Nevertheless, when running a process for a long time, while other processes are repeatedly starting and stopping (and connecting and disconnecting from the long-running process), I notice memory growing from a few hundred MB to several GB.

Using massif I was able to generate statistics on memory usage over the life of the program, and from those reports it appears that the culprit is zmq_poller_poll. After taking a quick look at the code, I'm guessing that the issue has to do with socket_poller_t::items being added to but never deleted from.

I'm happy to try to create a reproduction, but before I go down that rabbit-hole, I thought I would ask here to see if this sounds reasonable, or if I'm missing something obvious.

Thanks!

@sigiesec
Copy link
Member

sigiesec commented Jul 18, 2018

Can you provide a program that reproduces the issue?

zmq_poller_poll is only called when you call zmq_poll from your code. I can't imagine that this happens here, since zmq_poller_poll exits and cleans up after every poll event.

However, in the title you refer to zmq_poller_add, which you would use together with zmq_poller_wait(_all). In that case, if you forget to remove the obsolete sockets again with zmq_poller_remove, you will exhibit unconstrained memory growth.

Or are you referring to some of the other polling mechanisms that are used internally?

@WallStProg
Copy link
Contributor Author

Thanks Simon!

Unfortunately creating a simple repro is not going to be ... simple. We've implemented a discovery mechanism using a separate channel using zmq_proxy to support point-to-point connections. This creates a fully-connected network where every node's SUB socket connects to every node's PUB socket.

When run in steady state, memory use is constant. But when other processes connect and disconnect rapidly, any long-running processes' memory utilization grows and grows. The memory is not leaking per se, since valgrind reports no leaks at shutdown.

I was initially suspicious of the push_back in zmq::socket_poller_t::add, but looking closer it appears that the zmq::socket_poller_t gets created each time zmq_poll is called, so that looks like a red herring.

I'll post back here when I have more information. Thanks again!

@sigiesec
Copy link
Member

Unconstrained memory growth is somewhat harder to analyze than memory leaks unfortunately. I am not familiar with Linux tooling for this unfortunately. Under Windows, I would use Intel Inspector e.g. to analyze Memory Growth. You can checkpoint the current memory during execution and then show the differences between checkpoints, along with the allocation callstacks, just as valgrind would do for a leak.

WallStProg added a commit to WallStProg/zmqtests that referenced this issue Jul 31, 2018
WallStProg added a commit to WallStProg/zmqtests that referenced this issue Jul 31, 2018
@WallStProg
Copy link
Contributor Author

Hi @sigiesec (et al):

I finally was able to put together a repro using plain old ZeroMQ, without any of the "special" bits from my application. (I guess that's what vacations are for ;-)

The code is at https://github.com/WallStProg/zmqtests/tree/master/3186

Anyway, the pure ZeroMQ code exhibits the same unconstrained memory growth as my application, and running valgrind memcheck/massif on it gives similar results.

Not sure how helpful this will be for you, as it's all pretty Linux-specific, but it's the best I can do. (FWIW, I'm running CentOS 6.9).

As before, it may just be that I'm doing something stupid, but if so I don't see it.

Any suggestions, hints, tips are greatly appreciated!

@WallStProg
Copy link
Contributor Author

I was able to capture memory use with valgrind by terminating the program with SIGTERM, which skipped the shutdown code (zmq_close, etc.).

This shows a number of "still-reachable" blocks that I suspect may point at the culprit.

Specifically, the leaks with block count of 141 are interesting, since that is exactly equal to the number of connects:

$ grep -c 'Connecting data' peer.out
141
$ grep ' 141 blocks' peer.out 
==14702== 4,794 bytes in 141 blocks are possibly lost in loss record 126 of 158
==14702== 13,536 bytes in 141 blocks are still reachable in loss record 140 of 158
==14702== 13,536 bytes in 141 blocks are still reachable in loss record 141 of 158
==14702== 27,072 bytes in 141 blocks are still reachable in loss record 148 of 158
==14702== 27,072 bytes in 141 blocks are still reachable in loss record 149 of 158
==14702== 2,312,400 bytes in 141 blocks are still reachable in loss record 157 of 158
==14702== 2,312,400 bytes in 141 blocks are still reachable in loss record 158 of 158
$ 

I'm guessing that ZeroMQ is hanging on to some data about the endpoints even after they are disconnected. That memory gets cleaned up when the socket is closed (so doesn't normally show up as a "leak"), but until then that memory can grow without bound.

FWIW, error-checking has been omitted in most cases for clarity, but the zmq_disconnect call is checked, and the program will assert if it fails.

To reproduce:

$ export PREFIX="valgrind --tool=memcheck --leak-check=full --show-reachable=yes"
$ ./runme.sh
$ ^C
$ sleep 300
$ killall peer proxy repeat.sh memcheck-amd64-linux
$ grep -c 'Connecting data' peer.out
$ CONN=$(grep -c 'Connecting data' peer.out); grep " $CONN blocks" peer.out

@WallStProg
Copy link
Contributor Author

valgrind output referenced above:

peer.out.txt

@teterycz
Copy link

teterycz commented Aug 2, 2018

Hi @sigiesec

I am seeing the leak as described by @WallStProg by attaching to the process using Intel VTune Amplifier which is part of Intel System Studio. You can get a free 3 month license and download a copy from https://software.intel.com/en-us/system-studio/choose-download#technical

Select analysis type as Memory Consumption and Attach to Process using PID after the process is already started. I've attached some preliminary results below. Please look at attached leak.xlsx file for additional details.

Top Memory-Consuming Functions        
Function Memory Consumption Allocation/Deallocation Delta Allocations Module
zmq::shared_message_memory_allocator::allocate 43 MB 195 KB 2514 libzmq.so.5.1.3
zmq::yqueue_t<zmq::msg_t, (int)256>::allocate_chunk 23 MB 11 MB 1510 libzmq.so.5.1.3
zmq::yqueue_t<zmq::msg_t, (int)256>::allocate_chunk 23 MB 11 MB 1510 libzmq.so.5.1.3
encoder_base_t 11 MB 80 KB 1510 libzmq.so.5.1.3
zmq::stream_engine_t::handshake 1 MB 11 KB 4530 libzmq.so.5.1.3
[Others] 13 MB 1 MB 123529  

Additional details from Intel VTune Amplifier
leak.xlsx

@WallStProg
Copy link
Contributor Author

Looks like both VTune and valgrind agree on the top consumers of memory:

==14702== 2,312,400 bytes in 141 blocks are still reachable in loss record 157 of 158
==14702==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==14702==    by 0x4CD023D: zmq::yqueue_t<zmq::msg_t, 256>::allocate_chunk() (yqueue.hpp:198)
==14702==    by 0x4CCFEC0: zmq::yqueue_t<zmq::msg_t, 256>::yqueue_t() (yqueue.hpp:70)
==14702==    by 0x4CCFBE6: zmq::ypipe_t<zmq::msg_t, 256>::ypipe_t() (ypipe.hpp:53)
==14702==    by 0x4CCD7E0: zmq::pipepair(zmq::object_t**, zmq::pipe_t**, int*, bool*) (pipe.cpp:54)
==14702==    by 0x4CE05BC: zmq::session_base_t::process_attach(zmq::i_engine*) (session_base.cpp:392)
==14702==    by 0x4CC571C: zmq::object_t::process_command(zmq::command_t&) (object.cpp:100)
==14702==    by 0x4CB75B3: zmq::io_thread_t::in_event() (io_thread.cpp:85)
==14702==    by 0x4CB5248: zmq::epoll_t::loop() (epoll.cpp:188)
==14702==    by 0x4CB5351: zmq::epoll_t::worker_routine(void*) (epoll.cpp:203)
==14702==    by 0x4CFAC79: thread_routine (thread.cpp:106)
==14702==    by 0x39DC807AA0: start_thread (in /lib64/libpthread-2.12.so)
==14702== 
==14702== 2,312,400 bytes in 141 blocks are still reachable in loss record 158 of 158
==14702==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==14702==    by 0x4CD023D: zmq::yqueue_t<zmq::msg_t, 256>::allocate_chunk() (yqueue.hpp:198)
==14702==    by 0x4CCFEC0: zmq::yqueue_t<zmq::msg_t, 256>::yqueue_t() (yqueue.hpp:70)
==14702==    by 0x4CCFBE6: zmq::ypipe_t<zmq::msg_t, 256>::ypipe_t() (ypipe.hpp:53)
==14702==    by 0x4CCD8A9: zmq::pipepair(zmq::object_t**, zmq::pipe_t**, int*, bool*) (pipe.cpp:61)
==14702==    by 0x4CE05BC: zmq::session_base_t::process_attach(zmq::i_engine*) (session_base.cpp:392)
==14702==    by 0x4CC571C: zmq::object_t::process_command(zmq::command_t&) (object.cpp:100)
==14702==    by 0x4CB75B3: zmq::io_thread_t::in_event() (io_thread.cpp:85)
==14702==    by 0x4CB5248: zmq::epoll_t::loop() (epoll.cpp:188)
==14702==    by 0x4CB5351: zmq::epoll_t::worker_routine(void*) (epoll.cpp:203)
==14702==    by 0x4CFAC79: thread_routine (thread.cpp:106)
==14702==    by 0x39DC807AA0: start_thread (in /lib64/libpthread-2.12.so)

@teterycz
Copy link

teterycz commented Aug 3, 2018

Hi @sigiesec,

This seems to be related to an open issue dating back to 2014: #1256

One of the comments mentions pipes not being destroyed but marked as inactive, eventually being destroyed when the context is destroyed:
#1256 (comment)

From the same Intel VTune run I posted above leaks in pipe.cpp, attached pipe.xlsx shows details. My results are based on the test program provided by @WallStProg.

pipe.xlsx

Any insight or guidance on this issue would be much appreciated.

@sigiesec
Copy link
Member

sigiesec commented Aug 6, 2018

A pipe is destroyed when a pipe_term_ack command is received, which is sent by the peer using send_pipe_term_ack, which is called in a number of situations.

Unfortunately, I am not very familiar with this part of the libzmq architecture, so I cannot really tell if there is a situation where send_pipe_term_ack should be called, but it isn't.

Just one thing to consider: I am not sure at which the test connects/disconnects. It might take some time to clear up resources. zmq_disconnect (and also zmq_close) do not work synchronously.

@sigiesec
Copy link
Member

sigiesec commented Aug 6, 2018

I don't think the description of this issue is accurate. This isn't related to zmq_poller_add at all, as far as I understand this. The function that doesn't behave as you expect it is zmq_disconnect, right?

@WallStProg
Copy link
Contributor Author

Unfortunately, I am not very familiar with this part of the libzmq architecture ...

Is there someone you would suggest might be better able to help? (FWIW, I've already posted on the dev list to solicit commercial support, which is very much on the table to help us "get to the finish line").

Just one thing to consider: I am not sure at which the test connects/disconnects. It might take some time to clear up resources. zmq_disconnect (and also zmq_close) do not work synchronously.

From the top, massif, VTune etc. results, as well as from the related issues, it appears that the resources don't get cleaned up no matter how long you wait (in our cases, minutes to hours). They are only getting cleaned up when the socket is closed and/or context terminated.

I don't think the description of this issue is accurate. This isn't related to zmq_poller_add at all, as far as I understand this. The function that doesn't behave as you expect it is zmq_disconnect, right?

True enough -- that was my best guess at the time. Perhaps a better description would be "zmq_disconnect doesn't release resources associated with terminated endpoints"?

@sigiesec
Copy link
Member

sigiesec commented Aug 6, 2018

I tried something out today: When just calling zmq_disconnect as in the test_reqrep_tcp test cases, the resources are not released until the zmq_ctx_term call. When calling something that involves calling process_commands, they are released earlier (which would be calling zmq_getsockopt for ZMQ_EVENTS, e.g.). I remember that there were some other issues raised in the past that were related to commands not being processed on an unused socket.

However, I think in your example there are periodically calls to zmq_* functions that also call process_commands, so this does not seem to be the issue in your case :(

@WallStProg
Copy link
Contributor Author

WallStProg commented Aug 6, 2018

Thanks @sigiesec

I've already tried the ZMQ_EVENTS hack as mentioned in #1256 -- no joy. I've got a couple other (ineffective) work-arounds as well, which I'll commit to the repo once I've cleared up some of the #ifdef-mess (prob. by making them command-line params).

I'm guessing some kind of race condition, perhaps associated with the endpoint being moved to inactive status, and/or the other side of the connection being dropped while the disconnect is "in flight".

At this point, I think my best option is to trace through the zmq_connect/zmq_disconnect code in the debugger, but any hints on what to look for would be welcome! (It's a lot to learn, and complicated by the fact that the "tricky bits" are happening on the IO thread, I think).

@WallStProg WallStProg changed the title zmq_poller_add leaks? zmq_disconnect doesn't release resources associated with terminated endpoints Aug 6, 2018
@teterycz
Copy link

teterycz commented Aug 6, 2018

I added a dummy send in the "else if (msg.command == 'D')" block which seems to force the cleanup.

`else if (msg.command == 'D')
{
// disconnect msg from peer (possibly us)

...

// send dummy message to PUB socket to force cleanup
rc = zmq_send(dataPub, NULL, 0, ZMQ_DONTWAIT);
assert(rc == 0);

}
`

Top Memory-Consuming Functions        
Function Memory Consumption Allocation/Deallocation Delta Allocations Module
zmq::shared_message_memory_allocator::allocate 22 MB 88 KB 1294 libzmq.so.5.1.3
zmq::yqueue_t<zmq::msg_t, (int)256>::allocate_chunk 22 MB 240 KB 1411 libzmq.so.5.1.3
zmq::encoder_base_tzmq::v2_encoder_t::encoder_base_t 5 MB 32 KB 678 libzmq.so.5.1.3
zmq::stream_engine_t::handshake 778 KB 4 KB 2034 libzmq.so.5.1.3
zmq::session_base_t::create 662 KB 4 KB 678 libzmq.so.5.1.3
[Others] 4 MB 14 KB 51045  

@WallStProg
Copy link
Contributor Author

Well, I think we've figured out the problem, and the solution is at the same time simple, but potentially pretty complicated.

The root cause of the leak appears to be that when we disconnect the dataSub socket from the dataPub socket, somebody needs to trigger process_commands to run on the dataPub socket.

The latest peer.cpp calls zmq_poll(..., ZMQ_POLLIN | ZMQ_POLLOUT) on the dataPub socket after dataSub disconnects from it. With that in place, there are no leaks, since the dataPub socket gets an opportunity to clean up the disconnected endpoint(s).

We've also implemented a similar hack in our internal test program that triggered this discussion -- that program simply subscribes to a single topic and throws away messages. That code only starts to leak when other subscribers repeatedly connect and disconnect, and this is because the program only subscribes, but never publishes data.

The hack that fixes the leak is to publish a dummy message with a dummy topic (with no subscribers) every "n" messages. That gives the dataPub socket an opportunity to clean up disconnected endpoints, and as with the peer.cpp test program that eliminates the "leak".

There are some problems with this work-around, however:

  • In the peer.cpp test program, both the dataSub and dataPub sockets are in the same process, so we know when we need to call zmq_poll. In the real world that is not the case -- the dataSub is disconnected from dataPub sockets in different processes, and the remote process has no way of knowing that another process has disconnected from it.

  • There appears to be some kind of race condition between the zmq_disconnect/zmq_poll calls and zmq_close that can cause a SEGV -- this was previously reported as crash in send_command following zmq_disconnect  #3152. We haven't pursued this isssue vigorously up to this point, since removing the zmq_disconnect/zmq_poll calls appeared to eliminate the crash, and they were superfluous because we were just going to call zmq_close on the socket anyway. However, if we need to reinstate the zmq_disconnect calls in order to release resources, then we're concerned about triggering this race/crash.

  • Last but not least, sending a dummy message to trigger process_commands can also backfire -- in other tests we've done using that approach, there appears to also be a race between the zmq_send and a subsequent zmq_close that can cause zmq_ctx_term to hang intermittently.

I would very much appreciate any suggestions that anyone has on whether there is an offical/canonical approach to dealing with the problem of triggering process_commands on sockets that have no activity other than connect/disconnect.

IMHO this would ideally entail the ability to return from a zmq_poll call whenever there is any connect/disconnect activity on a socket, and not just on arrival of a message. This would also provide a mechanism to solve other intractable problems, including #2267.

Thanks in advance for your ideas!

@stale
Copy link

stale bot commented Sep 13, 2019

This issue has been automatically marked as stale because it has not had activity for 365 days. It will be closed if no further activity occurs within 56 days. Thank you for your contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants