Fix segfault in LabelValues during head compaction #6271

Merged: saswatamcode merged 3 commits into thanos-io:main from fpetkovski:fix-label-values-segfault on Apr 12, 2023.
Conversation
saswatamcode approved these changes on Apr 12, 2023:
Thanks for fixing!
GiedriusS approved these changes on Apr 12, 2023:
"Docs check fails due to some broken links, fixed above."
"Thanks, should I rebase or will we fix them on main?"
"Let's rebase!"
Head compaction causes blocks outside the retention period to get deleted. If there is an in-flight LabelValues request at the same time, deleting the block can cause the store proxy to panic since it loses access to the data.

This commit fixes the issue by copying label values from TSDB stores before returning them to the store proxy. I thought about exposing a Close method on the TSDB store which the Proxy can call, but this will not eliminate cases where gRPC defers sending data over a channel using its queueing mechanism.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
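The pattern behind the fix, shown as a minimal sketch: detach the returned strings from TSDB-owned memory before they leave the store. The helper below is illustrative only (copyLabelValues is a hypothetical name, not the actual Thanos diff), assuming label values arrive as a []string backed by head-block memory.

```go
package store

import "strings"

// copyLabelValues is a hypothetical helper sketching the copy-before-return
// idea from the commit message: clone each string so the result no longer
// references TSDB-owned buffers, and stays valid even if head compaction
// deletes the block that backed the originals.
func copyLabelValues(values []string) []string {
	out := make([]string, len(values))
	for i, v := range values {
		out[i] = strings.Clone(v) // allocates a fresh backing array
	}
	return out
}
```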
fpetkovski force-pushed the fix-label-values-segfault branch from b46fdc9 to 6236c53 on April 12, 2023 at 10:02.
rabenhorst added a commit to rabenhorst/thanos that referenced this pull request on May 4, 2023:
This branch-sync commit squashed in the following upstream changes:

* mixins: Add code/grpc-code dimension to error widgets, with changelog updates and merge-conflict cleanup. (Douglas Camata)
* mixin(Rule): Add rule evaluation failures to the Rule dashboard (thanos-io#6244): improved legends, added evaluation failures, refactored the dashboard. (Douglas Camata)
* Added Thanos logo in the react app (thanos-io#6264). (Amit Kumar)
* Add an experimental flag to block samples with timestamps too far in the future (thanos-io#6195). (Yi Jin)
* store/bucket: snappy-encoded postings reading improvements (thanos-io#6245): pool the input to snappy.Decode to avoid allocations, use the faster s2 package for decoding, and always call close functions before returning an error. (Giedrius Statkevičius)
* truncateExtLabels: support Unicode cut (thanos-io#6267), with TestTruncateExtLabels updated to match. (MickeyZZC)
* Update mentorship links. (Saswata Mukherjee)
* Fix segfault in LabelValues during head compaction (thanos-io#6271) (this pull request). (Filip Petkovski)
* Mixin: Allow specifying an instance name filter (thanos-io#6273), used to filter the datasources shown on the dashboards. For example, when generating the dashboards one can do the following (i.e. in config.libsonnet):

```
dashboard+:: {
  prefix: 'Thanos / ',
  ...
  instance_name_filter: '/EU.*/'
}
```

(Jacob Baungard Hansen)
* Adds Deno to adopters.yml (thanos-io#6275). (Will (Newby) Atlas)
* Bump `make test` timeout (thanos-io#6276). (Douglas Camata)
* Fix the 0.31 changelog (thanos-io#6278). (junot)
* Query: Switch Multiple Engines (thanos-io#6234): Thanos Query ships two PromQL engines, prometheus (the default) and thanos, and previously the engine was fixed at startup by a command-line flag. This change runs multiple engines at once and switches between them via an `engine` query param on the Query API (an EngineType enum on the gRPC API), introduces a QueryEngineFactory that creates each engine on first use, replaces the promql-engine flag with a default-engine flag, and adds an engine selector dropdown to the Query UI. To avoid duplicate metrics registration, the already-created prometheus engine is passed to the thanos engine as its fallback via engine Opts. (Pradyumna Krishna)
* docs: Fix a mismatch in the changelog. (Etienne Martel)
* Updates busybox SHA (thanos-io#6283).
* Upgrade prometheus to 7309ac272195cb856b879306d6a27af7641d3346 (thanos-io#6287). (Alex Le)
* Add CarTrade Tech as a new adopter. (naveadkazi)
* tests: Remove custom Between test matcher (thanos-io#6310); the upstream PR to efficientgo/e2e has been merged, so the matcher can be used from there. (Douglas Camata)
* query frontend, query UI: Native histogram support (thanos-io#6071): native histogram marshalling and query-range support in the query frontend, histogram rendering in the query UI (graph and DataTable), and histogram support in ApplyCounterResetsSeriesIterator, with end-to-end tests. (Sebastian Rabenhorst)
* store: add streamed snappy encoding for postings list (thanos-io#6303): profiles showed that decoding snappy-compressed postings lists dominated RAM usage (s2.Decode alone accounted for roughly 57% of a 2.1TB allocation profile) because a new []byte slice was allocated for the decoded data each time. The Snappy block format does not support streamed decoding, so the codec switches to the Snappy stream format, which does; postings only need to be read forward for intersection, so the lack of backward Seek() is not a problem. In benchmarks, streamed decoding keeps allocations constant at about 148kB regardless of postings-list size, where the block format grows with the list (about 1.05MB for one million postings); encoding is somewhat slower, and decoding is slower for small lists but much faster for large ones (15.5µs vs 139µs at one million postings). Compression ratios are unchanged (about 0.012 on the TestDiffVarintCodec corpus). (Giedrius Statkevičius)
* compact: atomically replace no-compact-marked map (thanos-io#6319): with lots of blocks it can take some time to fill the no-compact-marked map, so replace it atomically; otherwise the compaction planner can pick up no-compact-marked blocks because the meta syncer synchronizes concurrently. (Giedrius Statkevičius)
* Fixed modules and the logicalplan flag, made assets, removed an unused test function, and sorted labels.
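As a concrete illustration of the stream-format point above, here is a minimal, self-contained sketch using github.com/golang/snappy, whose NewBufferedWriter/NewReader implement the Snappy stream (framing) format. Thanos wires this through its own postings codec and the klauspost/compress s2 package, so the package choice and data here are illustrative, not the actual code.

```go
package main

import (
	"bytes"
	"fmt"
	"io"

	"github.com/golang/snappy"
)

func main() {
	var buf bytes.Buffer

	// Encode with the stream (framing) format rather than the block format.
	w := snappy.NewBufferedWriter(&buf)
	_, _ = w.Write([]byte("diff-varint encoded postings would go here"))
	_ = w.Close() // flush the final frame

	// Decode incrementally through an io.Reader: bytes are produced as they
	// are consumed, so no single large []byte allocation is needed up front.
	r := snappy.NewReader(&buf)
	decoded, err := io.ReadAll(r)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(decoded))
}
```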
hczhu pushed a commit to databricks/thanos that referenced this pull request on Jun 27, 2023:
* Fix segfault in LabelValues during head compaction

Head compaction causes blocks outside the retention period to get deleted. If there is an in-flight LabelValues request at the same time, deleting the block can cause the store proxy to panic since it loses access to the data. This commit fixes the issue by copying label values from TSDB stores before returning them to the store proxy. I thought about exposing a Close method on the TSDB store which the Proxy can call, but this will not eliminate cases where gRPC defers sending data over a channel using its queueing mechanism.

* Add changelog entry

* Assert no error when querying labels

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Head compaction causes blocks outside the retention period to get deleted. If there is an in-flight LabelValues request at the same time, deleting the block can cause the store proxy to panic since it loses access to the data.
This commit fixes the issue by copying label values from TSDB stores before returning them to the store proxy. I thought about exposing a Close method on the TSDB store which the Proxy can call, but this will not eliminate cases where gRPC defers sending data over a channel using its queueing mechanism.
Fixes #6190.
Changes

- Copy label values from TSDB stores before returning them to the store proxy, so LabelValues results stay valid during head compaction.

Verification
I managed to reproduce the issue with a unit test. After the fix I no longer see a panic in the store proxy.
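A sketch of what that reproduction can look like, with hypothetical helpers (newTestTSDBStore, triggerHeadCompaction) standing in for the real test setup in this PR, and the usual testing/context/require imports assumed. The essential shape is one goroutine issuing LabelValues in a loop while another forces head compaction to delete blocks.

```go
// Hypothetical reproduction sketch; the helper names are stand-ins, not the
// actual test added in this PR.
func TestLabelValuesDuringHeadCompaction(t *testing.T) {
	store := newTestTSDBStore(t) // hypothetical: a TSDB store with a short retention period

	done := make(chan struct{})
	go func() {
		defer close(done)
		for i := 0; i < 1000; i++ {
			vals, err := store.LabelValues(context.Background(), "foo")
			require.NoError(t, err) // the PR also asserts no error when querying labels
			_ = vals
		}
	}()

	// Deleting blocks outside retention while requests are in flight is what
	// triggered the store proxy panic before the fix.
	triggerHeadCompaction(t, store) // hypothetical helper
	<-done
}
```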