zmq_disconnect doesn't release resources associated with terminated endpoints #3186

WallStProg · 2018-07-17T17:58:51Z

Issue description

A long-running process which is connected to and disconnected from many times consumes ever-greater amounts of memory.

This does not show up as a leak (e.g., w/valgrind), since the memory is apparently released when the long-running process shuts down.

Environment

libzmq version (commit hash if unreleased): d175819
OS: CentOS 6.9

The text was updated successfully, but these errors were encountered:

WallStProg · 2018-07-17T18:19:56Z

I've been trying to track down this issue for a while, since the memory is not recognized as a leak by memcheck et al.

Nevertheless, when running a process for a long time, while other processes are repeatedly starting and stopping (and connecting and disconnecting from the long-running process), I notice memory growing from a few hundred MB to several GB.

Using massif I was able to generate statistics on memory usage over the life of the program, and from those reports it appears that the culprit is zmq_poller_poll. After taking a quick look at the code, I'm guessing that the issue has to do with socket_poller_t::items being added to but never deleted from.

I'm happy to try to create a reproduction, but before I go down that rabbit-hole, I thought I would ask here to see if this sounds reasonable, or if I'm missing something obvious.

Thanks!

sigiesec · 2018-07-18T09:04:01Z

Can you provide a program that reproduces the issue?

zmq_poller_poll is only called when you call zmq_poll from your code. I can't imagine that this happens here, since zmq_poller_poll exits and cleans up after every poll event.

However, in the title you refer to zmq_poller_add, which you would use together with zmq_poller_wait(_all). In that case, if you forget to remove the obsolete sockets again with zmq_poller_remove, you will exhibit unconstrained memory growth.

Or are you referring to some of the other polling mechanisms that are used internally?

WallStProg · 2018-07-18T21:02:14Z

Thanks Simon!

Unfortunately creating a simple repro is not going to be ... simple. We've implemented a discovery mechanism using a separate channel using zmq_proxy to support point-to-point connections. This creates a fully-connected network where every node's SUB socket connects to every node's PUB socket.

When run in steady state, memory use is constant. But when other processes connect and disconnect rapidly, any long-running processes' memory utilization grows and grows. The memory is not leaking per se, since valgrind reports no leaks at shutdown.

I was initially suspicious of the push_back in zmq::socket_poller_t::add, but looking closer it appears that the zmq::socket_poller_t gets created each time zmq_poll is called, so that looks like a red herring.

I'll post back here when I have more information. Thanks again!

sigiesec · 2018-07-19T08:13:47Z

Unconstrained memory growth is somewhat harder to analyze than memory leaks unfortunately. I am not familiar with Linux tooling for this unfortunately. Under Windows, I would use Intel Inspector e.g. to analyze Memory Growth. You can checkpoint the current memory during execution and then show the differences between checkpoints, along with the allocation callstacks, just as valgrind would do for a leak.

WallStProg · 2018-07-31T21:11:08Z

Hi @sigiesec (et al):

I finally was able to put together a repro using plain old ZeroMQ, without any of the "special" bits from my application. (I guess that's what vacations are for ;-)

The code is at https://github.com/WallStProg/zmqtests/tree/master/3186

Anyway, the pure ZeroMQ code exhibits the same unconstrained memory growth as my application, and running valgrind memcheck/massif on it gives similar results.

Not sure how helpful this will be for you, as it's all pretty Linux-specific, but it's the best I can do. (FWIW, I'm running CentOS 6.9).

As before, it may just be that I'm doing something stupid, but if so I don't see it.

Any suggestions, hints, tips are greatly appreciated!

WallStProg · 2018-08-01T19:36:48Z

I was able to capture memory use with valgrind by terminating the program with SIGTERM, which skipped the shutdown code (zmq_close, etc.).

This shows a number of "still-reachable" blocks that I suspect may point at the culprit.

Specifically, the leaks with block count of 141 are interesting, since that is exactly equal to the number of connects:

$ grep -c 'Connecting data' peer.out
141
$ grep ' 141 blocks' peer.out 
==14702== 4,794 bytes in 141 blocks are possibly lost in loss record 126 of 158
==14702== 13,536 bytes in 141 blocks are still reachable in loss record 140 of 158
==14702== 13,536 bytes in 141 blocks are still reachable in loss record 141 of 158
==14702== 27,072 bytes in 141 blocks are still reachable in loss record 148 of 158
==14702== 27,072 bytes in 141 blocks are still reachable in loss record 149 of 158
==14702== 2,312,400 bytes in 141 blocks are still reachable in loss record 157 of 158
==14702== 2,312,400 bytes in 141 blocks are still reachable in loss record 158 of 158
$

I'm guessing that ZeroMQ is hanging on to some data about the endpoints even after they are disconnected. That memory gets cleaned up when the socket is closed (so doesn't normally show up as a "leak"), but until then that memory can grow without bound.

FWIW, error-checking has been omitted in most cases for clarity, but the zmq_disconnect call is checked, and the program will assert if it fails.

To reproduce:

$ export PREFIX="valgrind --tool=memcheck --leak-check=full --show-reachable=yes"
$ ./runme.sh
$ ^C
$ sleep 300
$ killall peer proxy repeat.sh memcheck-amd64-linux
$ grep -c 'Connecting data' peer.out
$ CONN=$(grep -c 'Connecting data' peer.out); grep " $CONN blocks" peer.out

WallStProg · 2018-08-01T19:37:31Z

valgrind output referenced above:

peer.out.txt

teterycz · 2018-08-02T15:30:08Z

Hi @sigiesec

I am seeing the leak as described by @WallStProg by attaching to the process using Intel VTune Amplifier which is part of Intel System Studio. You can get a free 3 month license and download a copy from https://software.intel.com/en-us/system-studio/choose-download#technical

Select analysis type as Memory Consumption and Attach to Process using PID after the process is already started. I've attached some preliminary results below. Please look at attached leak.xlsx file for additional details.

Top Memory-Consuming Functions
Function	Memory Consumption	Allocation/Deallocation Delta	Allocations	Module
zmq::shared_message_memory_allocator::allocate	43 MB	195 KB	2514	libzmq.so.5.1.3
zmq::yqueue_t<zmq::msg_t, (int)256>::allocate_chunk	23 MB	11 MB	1510	libzmq.so.5.1.3
zmq::yqueue_t<zmq::msg_t, (int)256>::allocate_chunk	23 MB	11 MB	1510	libzmq.so.5.1.3
encoder_base_t	11 MB	80 KB	1510	libzmq.so.5.1.3
zmq::stream_engine_t::handshake	1 MB	11 KB	4530	libzmq.so.5.1.3
[Others]	13 MB	1 MB	123529

Additional details from Intel VTune Amplifier
leak.xlsx

WallStProg · 2018-08-02T17:33:52Z

Looks like both VTune and valgrind agree on the top consumers of memory:

==14702== 2,312,400 bytes in 141 blocks are still reachable in loss record 157 of 158
==14702==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==14702==    by 0x4CD023D: zmq::yqueue_t<zmq::msg_t, 256>::allocate_chunk() (yqueue.hpp:198)
==14702==    by 0x4CCFEC0: zmq::yqueue_t<zmq::msg_t, 256>::yqueue_t() (yqueue.hpp:70)
==14702==    by 0x4CCFBE6: zmq::ypipe_t<zmq::msg_t, 256>::ypipe_t() (ypipe.hpp:53)
==14702==    by 0x4CCD7E0: zmq::pipepair(zmq::object_t**, zmq::pipe_t**, int*, bool*) (pipe.cpp:54)
==14702==    by 0x4CE05BC: zmq::session_base_t::process_attach(zmq::i_engine*) (session_base.cpp:392)
==14702==    by 0x4CC571C: zmq::object_t::process_command(zmq::command_t&) (object.cpp:100)
==14702==    by 0x4CB75B3: zmq::io_thread_t::in_event() (io_thread.cpp:85)
==14702==    by 0x4CB5248: zmq::epoll_t::loop() (epoll.cpp:188)
==14702==    by 0x4CB5351: zmq::epoll_t::worker_routine(void*) (epoll.cpp:203)
==14702==    by 0x4CFAC79: thread_routine (thread.cpp:106)
==14702==    by 0x39DC807AA0: start_thread (in /lib64/libpthread-2.12.so)
==14702== 
==14702== 2,312,400 bytes in 141 blocks are still reachable in loss record 158 of 158
==14702==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==14702==    by 0x4CD023D: zmq::yqueue_t<zmq::msg_t, 256>::allocate_chunk() (yqueue.hpp:198)
==14702==    by 0x4CCFEC0: zmq::yqueue_t<zmq::msg_t, 256>::yqueue_t() (yqueue.hpp:70)
==14702==    by 0x4CCFBE6: zmq::ypipe_t<zmq::msg_t, 256>::ypipe_t() (ypipe.hpp:53)
==14702==    by 0x4CCD8A9: zmq::pipepair(zmq::object_t**, zmq::pipe_t**, int*, bool*) (pipe.cpp:61)
==14702==    by 0x4CE05BC: zmq::session_base_t::process_attach(zmq::i_engine*) (session_base.cpp:392)
==14702==    by 0x4CC571C: zmq::object_t::process_command(zmq::command_t&) (object.cpp:100)
==14702==    by 0x4CB75B3: zmq::io_thread_t::in_event() (io_thread.cpp:85)
==14702==    by 0x4CB5248: zmq::epoll_t::loop() (epoll.cpp:188)
==14702==    by 0x4CB5351: zmq::epoll_t::worker_routine(void*) (epoll.cpp:203)
==14702==    by 0x4CFAC79: thread_routine (thread.cpp:106)
==14702==    by 0x39DC807AA0: start_thread (in /lib64/libpthread-2.12.so)

teterycz · 2018-08-03T15:30:57Z

Hi @sigiesec,

This seems to be related to an open issue dating back to 2014: #1256

One of the comments mentions pipes not being destroyed but marked as inactive, eventually being destroyed when the context is destroyed:
#1256 (comment)

From the same Intel VTune run I posted above leaks in pipe.cpp, attached pipe.xlsx shows details. My results are based on the test program provided by @WallStProg.

pipe.xlsx

Any insight or guidance on this issue would be much appreciated.

sigiesec · 2018-08-06T11:10:31Z

A pipe is destroyed when a pipe_term_ack command is received, which is sent by the peer using send_pipe_term_ack, which is called in a number of situations.

Unfortunately, I am not very familiar with this part of the libzmq architecture, so I cannot really tell if there is a situation where send_pipe_term_ack should be called, but it isn't.

Just one thing to consider: I am not sure at which the test connects/disconnects. It might take some time to clear up resources. zmq_disconnect (and also zmq_close) do not work synchronously.

sigiesec · 2018-08-06T11:12:19Z

I don't think the description of this issue is accurate. This isn't related to zmq_poller_add at all, as far as I understand this. The function that doesn't behave as you expect it is zmq_disconnect, right?

WallStProg · 2018-08-06T14:03:07Z

Unfortunately, I am not very familiar with this part of the libzmq architecture ...

Is there someone you would suggest might be better able to help? (FWIW, I've already posted on the dev list to solicit commercial support, which is very much on the table to help us "get to the finish line").

Just one thing to consider: I am not sure at which the test connects/disconnects. It might take some time to clear up resources. zmq_disconnect (and also zmq_close) do not work synchronously.

From the top, massif, VTune etc. results, as well as from the related issues, it appears that the resources don't get cleaned up no matter how long you wait (in our cases, minutes to hours). They are only getting cleaned up when the socket is closed and/or context terminated.

I don't think the description of this issue is accurate. This isn't related to zmq_poller_add at all, as far as I understand this. The function that doesn't behave as you expect it is zmq_disconnect, right?

True enough -- that was my best guess at the time. Perhaps a better description would be "zmq_disconnect doesn't release resources associated with terminated endpoints"?

sigiesec · 2018-08-06T15:03:12Z

I tried something out today: When just calling zmq_disconnect as in the test_reqrep_tcp test cases, the resources are not released until the zmq_ctx_term call. When calling something that involves calling process_commands, they are released earlier (which would be calling zmq_getsockopt for ZMQ_EVENTS, e.g.). I remember that there were some other issues raised in the past that were related to commands not being processed on an unused socket.

However, I think in your example there are periodically calls to zmq_* functions that also call process_commands, so this does not seem to be the issue in your case :(

WallStProg · 2018-08-06T15:33:24Z

Thanks @sigiesec

I've already tried the ZMQ_EVENTS hack as mentioned in #1256 -- no joy. I've got a couple other (ineffective) work-arounds as well, which I'll commit to the repo once I've cleared up some of the #ifdef-mess (prob. by making them command-line params).

I'm guessing some kind of race condition, perhaps associated with the endpoint being moved to inactive status, and/or the other side of the connection being dropped while the disconnect is "in flight".

At this point, I think my best option is to trace through the zmq_connect/zmq_disconnect code in the debugger, but any hints on what to look for would be welcome! (It's a lot to learn, and complicated by the fact that the "tricky bits" are happening on the IO thread, I think).

teterycz · 2018-08-06T17:31:42Z

I added a dummy send in the "else if (msg.command == 'D')" block which seems to force the cleanup.

`else if (msg.command == 'D')
{
// disconnect msg from peer (possibly us)

...

// send dummy message to PUB socket to force cleanup
rc = zmq_send(dataPub, NULL, 0, ZMQ_DONTWAIT);
assert(rc == 0);
}
`

Top Memory-Consuming Functions
Function	Memory Consumption	Allocation/Deallocation Delta	Allocations	Module
zmq::shared_message_memory_allocator::allocate	22 MB	88 KB	1294	libzmq.so.5.1.3
zmq::yqueue_t<zmq::msg_t, (int)256>::allocate_chunk	22 MB	240 KB	1411	libzmq.so.5.1.3
zmq::encoder_base_tzmq::v2_encoder_t::encoder_base_t	5 MB	32 KB	678	libzmq.so.5.1.3
zmq::stream_engine_t::handshake	778 KB	4 KB	2034	libzmq.so.5.1.3
zmq::session_base_t::create	662 KB	4 KB	678	libzmq.so.5.1.3
[Others]	4 MB	14 KB	51045

WallStProg · 2018-08-23T14:20:22Z

Well, I think we've figured out the problem, and the solution is at the same time simple, but potentially pretty complicated.

The root cause of the leak appears to be that when we disconnect the dataSub socket from the dataPub socket, somebody needs to trigger process_commands to run on the dataPub socket.

The latest peer.cpp calls zmq_poll(..., ZMQ_POLLIN | ZMQ_POLLOUT) on the dataPub socket after dataSub disconnects from it. With that in place, there are no leaks, since the dataPub socket gets an opportunity to clean up the disconnected endpoint(s).

We've also implemented a similar hack in our internal test program that triggered this discussion -- that program simply subscribes to a single topic and throws away messages. That code only starts to leak when other subscribers repeatedly connect and disconnect, and this is because the program only subscribes, but never publishes data.

The hack that fixes the leak is to publish a dummy message with a dummy topic (with no subscribers) every "n" messages. That gives the dataPub socket an opportunity to clean up disconnected endpoints, and as with the peer.cpp test program that eliminates the "leak".

There are some problems with this work-around, however:

In the peer.cpp test program, both the dataSub and dataPub sockets are in the same process, so we know when we need to call zmq_poll. In the real world that is not the case -- the dataSub is disconnected from dataPub sockets in different processes, and the remote process has no way of knowing that another process has disconnected from it.
There appears to be some kind of race condition between the zmq_disconnect/zmq_poll calls and zmq_close that can cause a SEGV -- this was previously reported as crash in send_command following zmq_disconnect #3152. We haven't pursued this isssue vigorously up to this point, since removing the zmq_disconnect/zmq_poll calls appeared to eliminate the crash, and they were superfluous because we were just going to call zmq_close on the socket anyway. However, if we need to reinstate the zmq_disconnect calls in order to release resources, then we're concerned about triggering this race/crash.
Last but not least, sending a dummy message to trigger process_commands can also backfire -- in other tests we've done using that approach, there appears to also be a race between the zmq_send and a subsequent zmq_close that can cause zmq_ctx_term to hang intermittently.

I would very much appreciate any suggestions that anyone has on whether there is an offical/canonical approach to dealing with the problem of triggering process_commands on sockets that have no activity other than connect/disconnect.

IMHO this would ideally entail the ability to return from a zmq_poll call whenever there is any connect/disconnect activity on a socket, and not just on arrival of a message. This would also provide a mechanism to solve other intractable problems, including #2267.

Thanks in advance for your ideas!

stale · 2019-09-13T06:49:38Z

This issue has been automatically marked as stale because it has not had activity for 365 days. It will be closed if no further activity occurs within 56 days. Thank you for your contributions.

sigiesec added the Symptom (Leak) label Jul 18, 2018

WallStProg added a commit to WallStProg/zmqtests that referenced this issue Jul 31, 2018

- test programs for zeromq/libzmq#3186

a73c9f3

WallStProg added a commit to WallStProg/zmqtests that referenced this issue Jul 31, 2018

- test programs for zeromq/libzmq#3186

ce9faa9

WallStProg changed the title ~~zmq_poller_add leaks?~~ zmq_disconnect doesn't release resources associated with terminated endpoints Aug 6, 2018

sigiesec added the Area (Runtime / Usage) label Sep 13, 2018

WallStProg mentioned this issue Sep 15, 2018

use-after-free reported in zmq::pipe_t::terminate #3245

Closed

stale bot added the stale label Sep 13, 2019

stale bot closed this as completed Nov 8, 2019

bill-torpey mentioned this issue Mar 2, 2022

When pub/sub establishes a link, the memory on the bind side grows #4352

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

zmq_disconnect doesn't release resources associated with terminated endpoints #3186

zmq_disconnect doesn't release resources associated with terminated endpoints #3186

WallStProg commented Jul 17, 2018

WallStProg commented Jul 17, 2018

sigiesec commented Jul 18, 2018 •

edited

Loading

WallStProg commented Jul 18, 2018

sigiesec commented Jul 19, 2018

WallStProg commented Jul 31, 2018

WallStProg commented Aug 1, 2018

WallStProg commented Aug 1, 2018

teterycz commented Aug 2, 2018

WallStProg commented Aug 2, 2018

teterycz commented Aug 3, 2018

sigiesec commented Aug 6, 2018

sigiesec commented Aug 6, 2018

WallStProg commented Aug 6, 2018

sigiesec commented Aug 6, 2018

WallStProg commented Aug 6, 2018 •

edited

Loading

teterycz commented Aug 6, 2018

WallStProg commented Aug 23, 2018

stale bot commented Sep 13, 2019

zmq_disconnect doesn't release resources associated with terminated endpoints #3186

zmq_disconnect doesn't release resources associated with terminated endpoints #3186

Comments

WallStProg commented Jul 17, 2018

Issue description

Environment

WallStProg commented Jul 17, 2018

sigiesec commented Jul 18, 2018 • edited Loading

WallStProg commented Jul 18, 2018

sigiesec commented Jul 19, 2018

WallStProg commented Jul 31, 2018

WallStProg commented Aug 1, 2018

WallStProg commented Aug 1, 2018

teterycz commented Aug 2, 2018

WallStProg commented Aug 2, 2018

teterycz commented Aug 3, 2018

sigiesec commented Aug 6, 2018

sigiesec commented Aug 6, 2018

WallStProg commented Aug 6, 2018

sigiesec commented Aug 6, 2018

WallStProg commented Aug 6, 2018 • edited Loading

teterycz commented Aug 6, 2018

WallStProg commented Aug 23, 2018

stale bot commented Sep 13, 2019

sigiesec commented Jul 18, 2018 •

edited

Loading

WallStProg commented Aug 6, 2018 •

edited

Loading