Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shared interner: fix leak of interned strings from scrape cache #21

Merged
merged 2 commits into from
Jun 8, 2021

Conversation

rfratto
Copy link
Member

@rfratto rfratto commented Jun 5, 2021

Ensures that interning and releasing strings are done in sync with the cache being manipulated to prevent duplicate interned references from being created.

Adds a metric to track how many interned strings exist to detect leaks. Also adds tests and a benchmark to keep track of this more easily.

name              old time/op    new time/op    delta
StringInterner-8     4.23s ± 0%     2.39s ± 0%  -43.51%

name              old alloc/op   new alloc/op   delta
StringInterner-8     421MB ± 0%     418MB ± 0%   -0.73%

name              old allocs/op  new allocs/op  delta
StringInterner-8     4.41M ± 0%     4.41M ± 0%   -0.00%

@rfratto rfratto requested review from mapno and mattdurham and removed request for mapno June 5, 2021 02:07
scrape/scrape.go Outdated
@@ -883,7 +883,8 @@ func (c *scrapeCache) iterDone(flushCache bool) {
c.seriesPrev, c.seriesCur = c.seriesCur, c.seriesPrev

// We have to delete every single key in the map.
for k := range c.seriesCur {
for k, lset := range c.seriesCur {
intern.ReleaseLabels(intern.Global, lset)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is right. This is releasing labels for series even before they go stale.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote some tests (which I'll push soon). This isn't "wrong," since there's three refs added to the interned string:

  1. The series cache ref for a new series
  2. The staleness ref for a new series
  3. The staleness ref for an existing series

This will release the staleness ref correctly, though it spends a lot of unnecessary CPU time. Instead of doing this, the staleness ref should only be added for a new series and it should be removed when that series goes stale.

Copy link
Member Author

@rfratto rfratto Jun 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Result of my benchmarks, scraping a target with 1,000,000 series 50 times:

name \ time/op    none.txt    orig.txt    old.txt     one-ref.txt
StringInterner-8  2.13s ± 0%  4.23s ± 0%  6.03s ± 0%   2.39s ± 0%

name \ alloc/op   none.txt    orig.txt    old.txt     one-ref.txt
StringInterner-8  396MB ± 0%  421MB ± 0%  397MB ± 0%   418MB ± 0%

name \ allocs/op  none.txt    orig.txt    old.txt     one-ref.txt
StringInterner-8  4.18M ± 0%  4.41M ± 0%  4.19M ± 0%   4.41M ± 0%

where:

  • none.txt: no interning
  • orig.txt: behavior of current release
  • old.txt: behavior of the original status of this PR
  • one-ref.txt: fixed PR that only tracks one reference per series and only interns when the series is added.

The newest approach is the best tradeoff between cpu/memory usage. I'll push a set of commits now.

Copy link

@mattdurham mattdurham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rfratto rfratto merged commit 7b78de4 into release-2.27.0-grafana Jun 8, 2021
@rfratto rfratto deleted the fix-interner-leak branch June 8, 2021 19:36
rfratto added a commit to rfratto/agent that referenced this pull request Jun 8, 2021
rfratto added a commit to grafana/agent that referenced this pull request Jun 8, 2021
* update prometheus dependency to include grafana/prometheus#21

* grant operator test more time to run in CI
mattdurham pushed a commit to grafana/agent that referenced this pull request Nov 11, 2021
* update prometheus dependency to include grafana/prometheus#21

* grant operator test more time to run in CI
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants