Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

receive: close DBReadOnly after flushing #1856

Merged
merged 1 commit into from
Dec 10, 2019
Merged

Conversation

blockloop
Copy link
Contributor

@blockloop blockloop commented Dec 9, 2019

We are running thanos receive in production with 1w retention. When the receiver starts up it consumes about 20% memory on each of our 32 nodes. After the initial WAL flush the memory footprint progressively grows over time. I originally reported this in CNCF Slack. I initially tried to delete the WAL and restart the nodes but that didn’t solve the issue. The only solution was to clear out some TSDB blocks

Potentially fixes #1855

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Close the DBReadOnly connection after FlushableStorage.Flush()

Verification

I’ve not been able to verify this just yet. I’ll try to build the image with this patch and see if I see an improvement.

@blockloop blockloop force-pushed the master branch 2 times, most recently from ee73b96 to 1543aa2 Compare December 9, 2019 23:16
Signed-off-by: Brett Jones <blockloop@users.noreply.github.com>
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! While this makes total sense, I doubt this is the potential source of the memory leak, unless you change hashring dynamically and very often.

cc @squat tomorrow (:

@bwplotka bwplotka merged commit 352cc30 into thanos-io:master Dec 10, 2019
@blockloop
Copy link
Contributor Author

Interesting you mention that. We have a watcher that monitors our nodes and updates the hashring.json file. I noticed some changes it picked up a few times. Is it bad for the hashring to update frequently? I have it do a list | sort on the available nodes and write to the hashring.json file.

@squat
Copy link
Member

squat commented Dec 10, 2019

nice :)

No, it should absolutely not be a problem to update the file frequently. The thanos receive config watcher checks the hash of the file to check it the content has actually changed; if not, then no update is triggered and no cost is incurred.

On the other hand, if the file is actually changing very frequently, then you will be triggering lots of TSDB flushes, which could be costly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Receive: Possible memory leak with TSDB growth
3 participants