Kapacitor leaks sockets when using barrier nodes with delete #2144

pdvyas · 2019-01-07T23:09:34Z

Hi,

kapacitor version: 1.5.2
distribution: Ubuntu xenial 16.04.3
installed from official prebuilt deb downloaded from influxdata website.

We upgraded to kapacitor 1.5.2 to take advantage of the delete feature of barrier nodes as our stream kapacitor tasks (mostly rollups) deal with ephemeral series for which memory was previously not released.

We have two sets of kapacitor nodes:

type1: few rollup tick scripts (high cardinality data)
type2: Around 228 tick scripts with a mix of alerts and rollups

After upgrading to version 1.5.2, we added barrier nodes with delete before all window and union nodes in our tick scripts.

With type1, we have had only two instances of kapacitor being unresponsive (tasks don't proceed and kapacitor api is unresponsive; only way to resolve is to restart kapacitor). We've been running with this change for ~2 weeks and have had good uptime and it does release memory as expected.

With type2, we run into an "unresponsive" kapacitor quite regularly (~3 hrs). When kapacitor becomes unresponsive, we observed that the machine had 64K TCP connections to our influxdb cluster in CLOSE_WAIT and had exhausted all file descriptors (the systemd unit raises the limit to 64K). This does not happen when barrier nodes are added without the delete option.
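For anyone who wants to watch for the same symptom on their own hosts, here is a minimal sketch of how the number of sockets stuck in CLOSE_WAIT could be sampled on Linux by reading `/proc/net/tcp`. It is not the tooling behind the attached graphs. The function names are made up for illustration, and the remote port 8086 is only an assumption (the InfluxDB default HTTP port); adjust it to whatever your cluster listens on.

```python
# Hex state code for CLOSE_WAIT in the "st" column of /proc/net/tcp.
CLOSE_WAIT = "08"


def count_close_wait(lines, remote_port=None):
    """Count CLOSE_WAIT sockets, optionally only those to one remote port."""
    count = 0
    for line in lines:
        fields = line.split()
        # Skip the header row and anything too short to parse.
        if len(fields) < 4 or fields[0] == "sl":
            continue
        remote, state = fields[2], fields[3]
        if state != CLOSE_WAIT:
            continue
        # Addresses look like HEXIP:HEXPORT; the port is the part after the last colon.
        port = int(remote.rsplit(":", 1)[1], 16)
        if remote_port is None or port == remote_port:
            count += 1
    return count


def read_proc_tcp(paths=("/proc/net/tcp", "/proc/net/tcp6")):
    """Read the kernel's TCP socket tables; missing files are ignored."""
    lines = []
    for path in paths:
        try:
            with open(path) as fh:
                lines.extend(fh.readlines())
        except OSError:
            pass
    return lines


if __name__ == "__main__":
    # 8086 is an assumed InfluxDB port, not something stated in this report.
    print(count_close_wait(read_proc_tcp(), remote_port=8086))
```

Running this from cron or a metrics agent every minute should be enough to see whether the count grows steadily, as it did for us, or stays flat.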

Attached is a stack trace obtained with SIGQUIT when the kapacitor process was hung. Also attached are graphs of the growth in CLOSE_WAIT connections in comparison to a sibling kapacitor machine without barrier nodes, and a sample tick script.

stacktrace.txt
rollup-infra-cassandra.tick.txt


jsternberg mentioned this issue Mar 27, 2019

fix: eliminate deadlocks in the barrier node when delete is used #2181

Merged

jsternberg closed this as completed in #2181 Mar 28, 2019
