-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Receiver: Use intern
package when reallocating label strings
#5926
Conversation
Signed-off-by: Matej Gera <matejgera@gmail.com>
Signed-off-by: Matej Gera <matejgera@gmail.com>
Signed-off-by: Matej Gera <matejgera@gmail.com>
This looks quite cool and also simpler than the first implementation. It's interesting that benchmarks show no real difference :) |
This is awesome, @matej-g! 😮 🙇 🚀 @fpetkovski the benchmarking code manually calls the GC in between calls to the benchmarked function. I suspect this triggers a cleanup of the interned strings as their only refs go away. |
@fpetkovski @douglascamata Honestly I'm not sure, this is one thing I still did not manage to figure fully out. My assumption is that we're still allocating each label string but eventually we're keeping one copy of the string around for long time and garbage collecting the others. |
@matej-g your statement seems correct to me. We keep a single copy of the interned strings. But the actual strings aren't kept forever in memory forever and will be garbage collected if they have no more references (see https://github.com/go4org/intern/blob/main/intern.go#L141). So the GC run in between benchmark iterations is likely cleaning things up. You could confirm it by running the code being benchmarked twice for each benchmark iteration. Comparing to running twice without interning, we should be able to see the memory difference. |
This looks really cool! 🌟 For the benchmarks, I don't think we manually call GC in the benchmark, but only manually call it after all the benchmark test cases here. But you'd have auto GC cycles running during benchmark anyway! You can try running the benchmark with |
@saswatamcode internal code in the benchmark package is calling the GC, not our code: https://github.com/golang/go/blob/master/src/testing/benchmark.go#L187 |
Hello 👋 Looks like there was no activity on this amazing PR for the last 30 days. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I took a deeper look and would like to try getting this in. Perhaps this could be behind a feature flag in Thanos Receive?
@GiedriusS I'm fine with that, sorry for the delays and leaving this unfinished, if this looks good I can sanitise this PR and un-draft it |
Yeah, I think let's put this under a feature flag and merge this so others could try it out as well. Thanks for your work 🙇 |
Signed-off-by: Matej Gera <matej.gera@coralogix.com>
Signed-off-by: Matej Gera <matej.gera@coralogix.com>
Signed-off-by: Matej Gera <38492574+matej-g@users.noreply.github.com>
Looks like l forgot to undraft, this should now be ready for final approvals, thanks cc @GiedriusS @douglascamata @saswatamcode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks a lot, @matej-g 🙇
Can't wait to try this one out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
- Fix wrong condition - Adjust benchmarks Signed-off-by: Matej Gera <matej.gera@coralogix.com>
f63fba3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎖️
* Update Thanos engine to latest version (thanos-io#6069) This commit updates the Thanos PromQL engine to the latest version. Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Receive: Tenants' external labels proposal (thanos-io#5720) * Receive external labels proposal Signed-off-by: haanhvu <haanh6594@gmail.com> * Restructure and edit proposal's content Signed-off-by: haanhvu <haanh6594@gmail.com> * Update proposal Signed-off-by: haanhvu <haanh6594@gmail.com> * Fix doc error Signed-off-by: haanhvu <haanh6594@gmail.com> Signed-off-by: haanhvu <haanh6594@gmail.com> * fixing doc CI (thanos-io#6072) Signed-off-by: Ben Ye <benye@amazon.com> Signed-off-by: Ben Ye <benye@amazon.com> * Fix stores filtering resets on reload (thanos-io#6063) * Fix stores filtering resets on reload `g0.store_matches` parameter appears in the url but doesn't applies in the frontend. Looks like it has been done on purpose and by removing a small piece of code fixes this issue. variable named `debugMode` is used for the store filtering checkbox which is an unappropriate name. Using `enableStoreFiltering` variable to represent the state of checkbox. Signed-off-by: Pradyumna Krishna <git@onpy.in> * Regenerate bindata.go Signed-off-by: Pradyumna Krishna <git@onpy.in> Signed-off-by: Pradyumna Krishna <git@onpy.in> * Store: Make initial sync more robust Added re-try mechanism for store inital sync, where if the initial sync fails, it tries to do the initial sync again for given timeout duration. Signed-off-by: Kartik-Garg <kartik.garg@infracloud.io> * Recover from panics in Series calls (thanos-io#6077) * Recover from panics in Series calls This commit adds panic recovery for Series calls in all Store servers. Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Apply error suggestion Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> --------- Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * query: reuse our own gate (thanos-io#6079) Do not call promgate directly but rather use our own wrapper that does everything we want - duration histogram, current in-flight calls, total calls. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> * Store: Support disable cache index header file. (thanos-io#5773) * Store: Support disable cache index header file. Signed-off-by: wanjunlei <wanjunlei@kubesphere.io> * Store: add a seprate flag to disable caching index header file Signed-off-by: wanjunlei <wanjunlei@kubesphere.io> * Tools: add cleanup API for bucket web Signed-off-by: wanjunlei <wanjunlei@kubesphere.io> * resolve conversation Signed-off-by: wanjunlei <wanjunlei@kubesphere.io> * resolve confilcts Signed-off-by: wanjunlei <wanjunlei@kubesphere.io> * change the flag to `--cache-index-header` Signed-off-by: wanjunlei <wanjunlei@kubesphere.io> * Wrap mem writer in file writer Signed-off-by: wanjunlei <wanjunlei@kubesphere.io> * update CHANGELOG Signed-off-by: wanjunlei <wanjunlei@kubesphere.io> * update CHANGELOG Signed-off-by: wanjunlei <wanjunlei@kubesphere.io> * fix bug Signed-off-by: wanjunlei <wanjunlei@kubesphere.io> --------- Signed-off-by: wanjunlei <wanjunlei@kubesphere.io> Co-authored-by: wanjunlei <wanjunlei@yujnify.com> * CVE: Fix Receiver malicious tenant (thanos-io#5969) If running as root or with enough privileges, receiver can create a directory outside of the configured TenantHeader. This commit fixes it up by sanitizing the user input and explicity not allowing such behavior. Signed-off-by: Daniel Mellado <dmellado@redhat.com> * Add adopter Grupo MasMovil (thanos-io#6084) Signed-off-by: Pablo Moncada Isla <pablo.moncada@masmovil.com> * fix typo (thanos-io#6087) Signed-off-by: cyip <cyip@jackhenry.com> Co-authored-by: cyip <cyip@jackhenry.com> * optimize selector to string (thanos-io#6076) Signed-off-by: Kama Huang <kamatogo13@gmail.com> * Fix: Failure to close BlockSeriesClient cause store-gateway deadlock (thanos-io#6086) * Fix: Failure to close BlockSeriesClient cause store-gateway deadlock Signed-off-by: Alan Protasio <alanprot@gmail.com> * Adding tests Signed-off-by: Alan Protasio <alanprot@gmail.com> * reverting the change on get series Signed-off-by: Alan Protasio <alanprot@gmail.com> * fix lint Signed-off-by: Alan Protasio <alanprot@gmail.com> --------- Signed-off-by: Alan Protasio <alanprot@gmail.com> * Cut 0.30.2 (thanos-io#6081) * tracing: fixed panic because of nil sampler (thanos-io#6066) * fixed panic because of nil sampler Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com> * added CHANGELOG entry Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com> Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com> * bump version to 0.30.2 Signed-off-by: Ben Ye <benye@amazon.com> * Updates busybox SHA (thanos-io#6046) Signed-off-by: GitHub <noreply@github.com> Signed-off-by: GitHub <noreply@github.com> Co-authored-by: yeya24 <yeya24@users.noreply.github.com> * Use `e2edb.NewMinio` to disable SSE-S3 in e2e tests (thanos-io#6055) * Use e2edb.NewMinio to disable SSE Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Use temp fork for TLS Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Fix broken rules api fanout test Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Fix broken query compatibility test Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Remove fork Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> --------- Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com> Signed-off-by: Ben Ye <benye@amazon.com> Signed-off-by: GitHub <noreply@github.com> Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> Co-authored-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: yeya24 <yeya24@users.noreply.github.com> Co-authored-by: Saswata Mukherjee <saswataminsta@yahoo.com> * cherry pick store gateway fix to release 0.30 (thanos-io#6089) * Fix: Failure to close BlockSeriesClient cause store-gateway deadlock (thanos-io#6086) * Fix: Failure to close BlockSeriesClient cause store-gateway deadlock Signed-off-by: Alan Protasio <alanprot@gmail.com> * Adding tests Signed-off-by: Alan Protasio <alanprot@gmail.com> * reverting the change on get series Signed-off-by: Alan Protasio <alanprot@gmail.com> * fix lint Signed-off-by: Alan Protasio <alanprot@gmail.com> --------- Signed-off-by: Alan Protasio <alanprot@gmail.com> * update changelog Signed-off-by: Ben Ye <benye@amazon.com> --------- Signed-off-by: Alan Protasio <alanprot@gmail.com> Signed-off-by: Ben Ye <benye@amazon.com> Co-authored-by: Alan Protasio <alanprot@gmail.com> * fix changelog entries Signed-off-by: Ben Ye <benye@amazon.com> * docs: improving the description for tsdb.retention on the receiver Signed-off-by: Victor Fernandes <victorhbfernandes@gmail.com> * Receiver: Use `intern` package when reallocating label strings (thanos-io#5926) * Cleanup go mod Signed-off-by: Matej Gera <matejgera@gmail.com> * Use string interning for labels realloc method Signed-off-by: Matej Gera <matejgera@gmail.com> * Enhance label realloc benchmarks Signed-off-by: Matej Gera <matejgera@gmail.com> * Make interning optional; put behind hiddend flag Signed-off-by: Matej Gera <matej.gera@coralogix.com> * Update CHANGELOG Signed-off-by: Matej Gera <matej.gera@coralogix.com> * Address feedback - Fix wrong condition - Adjust benchmarks Signed-off-by: Matej Gera <matej.gera@coralogix.com> --------- Signed-off-by: Matej Gera <matejgera@gmail.com> Signed-off-by: Matej Gera <matej.gera@coralogix.com> Signed-off-by: Matej Gera <38492574+matej-g@users.noreply.github.com> * Updaing README with drawing fixes and minor wording clarification (thanos-io#6078) * New drawing and wording for Thanos other deployment models Signed-off-by: Jonah Kowall <jkowall@kowall.net> * New drawing and wording for Thanos other deployment models Signed-off-by: Jonah Kowall <jkowall@kowall.net> * Added comments to README.md and updated the quick-tutorial.md with the same diagram updates and text to match Signed-off-by: Jonah Kowall <jkowall@kowall.net> * Ran make docs Signed-off-by: Jonah Kowall <jkowall@kowall.net> --------- Signed-off-by: Jonah Kowall <jkowall@kowall.net> * Compact: Remove spam of replica label removed log (thanos-io#6088) * Remove spam of replica label removed log Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Reduce amount of removed replica label instead of removing it Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Reformat code Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> --------- Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Store: Don't error when no stores are matched (thanos-io#6082) It's normal and not an error if a query does not match due to no downstream stores. This is common when querying with external labels and tiered query servers. This bug was introduced in thanos-io#5296 Fixes: thanos-io#5862 Signed-off-by: SuperQ <superq@gmail.com> * docs: Fix must have Ruler alerts definition (thanos-io#6058) * Fix must have Ruler alerts definition ThanosRuler missing rule intervals metric used the wrong comparator sign, confusing users trying to create the rule. Signed-off-by: Maxim Muzafarov <m.muzafarov@gmail.com> * Update docs/components/rule.md Co-authored-by: Saswata Mukherjee <saswataminsta@yahoo.com> Signed-off-by: Maxim Muzafarov <m.muzafarov@gmail.com> --------- Signed-off-by: Maxim Muzafarov <m.muzafarov@gmail.com> Co-authored-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Fix conflicts Signed-off-by: haanhvu <haanh6594@gmail.com> * Specify overwriting behavior in flag and add validation Signed-off-by: haanhvu <haanh6594@gmail.com> * Add log and doc Signed-off-by: haanhvu <haanh6594@gmail.com> * Mixins(Rule): Fix query for long rule evaluations (thanos-io#6103) * mixin(Rule): Fix query for long rule evaluations Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Update changelog Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> --------- Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> --------- Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> Signed-off-by: haanhvu <haanh6594@gmail.com> Signed-off-by: Ben Ye <benye@amazon.com> Signed-off-by: Pradyumna Krishna <git@onpy.in> Signed-off-by: Kartik-Garg <kartik.garg@infracloud.io> Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> Signed-off-by: wanjunlei <wanjunlei@kubesphere.io> Signed-off-by: Daniel Mellado <dmellado@redhat.com> Signed-off-by: Pablo Moncada Isla <pablo.moncada@masmovil.com> Signed-off-by: cyip <cyip@jackhenry.com> Signed-off-by: Kama Huang <kamatogo13@gmail.com> Signed-off-by: Alan Protasio <alanprot@gmail.com> Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com> Signed-off-by: GitHub <noreply@github.com> Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> Signed-off-by: Victor Fernandes <victorhbfernandes@gmail.com> Signed-off-by: Matej Gera <matejgera@gmail.com> Signed-off-by: Matej Gera <matej.gera@coralogix.com> Signed-off-by: Matej Gera <38492574+matej-g@users.noreply.github.com> Signed-off-by: Jonah Kowall <jkowall@kowall.net> Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> Signed-off-by: SuperQ <superq@gmail.com> Signed-off-by: Maxim Muzafarov <m.muzafarov@gmail.com> Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com> Co-authored-by: Filip Petkovski <filip.petkovsky@gmail.com> Co-authored-by: Ha Anh Vu <75315486+haanhvu@users.noreply.github.com> Co-authored-by: Ben Ye <benye@amazon.com> Co-authored-by: Pradyumna Krishna <git@onpy.in> Co-authored-by: Kartik-Garg <kartik.garg@infracloud.io> Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> Co-authored-by: wanjunlei <53003665+wanjunlei@users.noreply.github.com> Co-authored-by: wanjunlei <wanjunlei@yujnify.com> Co-authored-by: Daniel Mellado <1313475+danielmellado@users.noreply.github.com> Co-authored-by: Pablo Moncada <pmoncadaisla@gmail.com> Co-authored-by: Chantel Yip <52993239+sshantel@users.noreply.github.com> Co-authored-by: cyip <cyip@jackhenry.com> Co-authored-by: Kama Huang <121007071+kama910@users.noreply.github.com> Co-authored-by: Alan Protasio <alanprot@gmail.com> Co-authored-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: yeya24 <yeya24@users.noreply.github.com> Co-authored-by: Saswata Mukherjee <saswataminsta@yahoo.com> Co-authored-by: Victor Fernandes <victorhbfernandes@gmail.com> Co-authored-by: Matej Gera <38492574+matej-g@users.noreply.github.com> Co-authored-by: Jonah Kowall <jkowall@kowall.net> Co-authored-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> Co-authored-by: Ben Kochie <superq@gmail.com> Co-authored-by: Maxim Muzafarov <m.muzafarov@gmail.com> Co-authored-by: haanhvu <haanh6594@gmail.com>
…s-io#5926) * Cleanup go mod Signed-off-by: Matej Gera <matejgera@gmail.com> * Use string interning for labels realloc method Signed-off-by: Matej Gera <matejgera@gmail.com> * Enhance label realloc benchmarks Signed-off-by: Matej Gera <matejgera@gmail.com> * Make interning optional; put behind hiddend flag Signed-off-by: Matej Gera <matej.gera@coralogix.com> * Update CHANGELOG Signed-off-by: Matej Gera <matej.gera@coralogix.com> * Address feedback - Fix wrong condition - Adjust benchmarks Signed-off-by: Matej Gera <matej.gera@coralogix.com> --------- Signed-off-by: Matej Gera <matejgera@gmail.com> Signed-off-by: Matej Gera <matej.gera@coralogix.com> Signed-off-by: Matej Gera <38492574+matej-g@users.noreply.github.com>
…s-io#5926) * Cleanup go mod Signed-off-by: Matej Gera <matejgera@gmail.com> * Use string interning for labels realloc method Signed-off-by: Matej Gera <matejgera@gmail.com> * Enhance label realloc benchmarks Signed-off-by: Matej Gera <matejgera@gmail.com> * Make interning optional; put behind hiddend flag Signed-off-by: Matej Gera <matej.gera@coralogix.com> * Update CHANGELOG Signed-off-by: Matej Gera <matej.gera@coralogix.com> * Address feedback - Fix wrong condition - Adjust benchmarks Signed-off-by: Matej Gera <matej.gera@coralogix.com> --------- Signed-off-by: Matej Gera <matejgera@gmail.com> Signed-off-by: Matej Gera <matej.gera@coralogix.com> Signed-off-by: Matej Gera <38492574+matej-g@users.noreply.github.com>
Changes
This is an alternative to #5899 (please see that PR for rational for this change) but uses the go4org/intern instead of our homebrewed method. The advantage of this is that the
go4org/intern
package is capable of dropping references to unused string automatically, so we don't have to worry about cleaning up references manually (by e.g. counting usage, dropping references periodically etc.) or worry about keeping stale references, causing a memory build-up over longer time.The downside is that the package violates safe practices when it comes to operating with pointers. However, the package takes precautions to prevent problems with the program (for full details see: https://mdlayher.com/blog/unsafe-string-interning-in-go/). Moreover, the package is written and maintained by some of the developers that are close to the upstream Go community, who ensure the quality of the code is up to Go's standards. The package is considered mature and stable as per project's description. Considering all of this, I believe it is safe to use the package in Thanos as well.
If people deem this still too risky we can consider our own safe implementation via #5899.
Verification
We (cc @philipgough )have run this in a synthetic load test with
k6
load testing tools on a cluster of 6 receivers with replication factor 3. Overall, we could confirm some gains with this change. We clearly keep fewer objects in heap as expected (average by pod for each respective version, graph on the right). On the actual heap size and working set metrics, we could confirm improvement of at least 1-2 GB average per receiver replica (values are in GB, green is latestmain
, yellow ismain
+ PR changes):On local benchmark levels, both label and handler benchmarks seem to remain on tolerable levels (with the exception of extremely large label number i.e. from 100000). This is expected as the gain should come from keeping fewer objects in memory for long term rather then fewer allocations.