Fix segfault in LabelValues during head compaction #6271

Merged: 3 commits, Apr 12, 2023


fpetkovski
Contributor

@fpetkovski fpetkovski commented Apr 12, 2023

Head compaction causes blocks outside the retention period to get deleted. If there is an in-flight LabelValues request at the same time, deleting the block can cause the store proxy to panic since it loses access to the data.

This commit fixes the issue by copying label values from TSDB stores before returning them to the store proxy. I thought about exposing a Close method on the TSDB store which the Proxy can call, but this will not eliminate cases where gRPC defers sending data over a channel using its queueing mechanism.
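
The copy-before-return idea can be sketched in Go as follows. This is a minimal illustration, not the code from this PR: `copyLabelValues` is a hypothetical helper, and it assumes the strings returned by the TSDB may share memory with a block that can disappear later. `strings.Clone` (Go 1.18+) allocates fresh backing storage for each string.

```go
package main

import (
	"fmt"
	"strings"
)

// copyLabelValues returns a new slice whose strings own their memory.
// Hypothetical helper: in the scenario described above, the input strings
// may reference block data that is deleted during head compaction.
func copyLabelValues(vals []string) []string {
	out := make([]string, len(vals))
	for i, v := range vals {
		// strings.Clone guarantees a fresh allocation for the copy.
		out[i] = strings.Clone(v)
	}
	return out
}

func main() {
	vals := []string{"eu-west", "us-east"}
	copied := copyLabelValues(vals)
	fmt.Println(copied)
}
```

Because the copy happens before the response leaves the store, it does not matter how long gRPC queues the message afterwards.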

Fixes #6190.

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

  • Fix segfault in LabelValues during head compaction.

Verification

I managed to reproduce the issue with a unit test. After the fix I no longer see a panic in the store proxy.

Member

@saswatamcode saswatamcode left a comment


Thanks for fixing!

@GiedriusS GiedriusS enabled auto-merge (squash) April 12, 2023 08:51
@saswatamcode saswatamcode mentioned this pull request Apr 12, 2023
2 tasks
@saswatamcode
Member

Docs check fails due to some broken links, fixed above

@fpetkovski
Contributor Author

Thanks, should I rebase, or will we fix them on main?

@saswatamcode
Member

Let's rebase!

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
@saswatamcode saswatamcode enabled auto-merge (squash) April 12, 2023 10:02
@saswatamcode saswatamcode merged commit daf72a2 into thanos-io:main Apr 12, 2023
rabenhorst added a commit to rabenhorst/thanos that referenced this pull request May 4, 2023
* mixins: Add code/grpc-code dimension to error widgets

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Update changelog

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Fix messed up merge conflict resolution

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Readd empty line at the end of changelog

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Rerun CI

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* mixin(Rule): Add rule evaluation failures to the Rule dashboard (thanos-io#6244)

* Improve Thanos Rule dashboard legends

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Add evaluations failed to Rule dashboard

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Refactor rule dashboard

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Add changelog entry

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Rerun CI

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

---------

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* added thanos logo in react app (thanos-io#6264)

Signed-off-by: hackeramitkumar <amit9116260192@gmail.com>

* Add an experimental flag to block samples with timestamp too far in the future (thanos-io#6195)

* Add an experimental flag to block samples with timestamp too far in the future

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* fix bug

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* address comments

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* fix docs CI errors

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* resolve merge conflicts

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* resolve merge conflicts

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* retrigger checks

Signed-off-by: Yi Jin <yi.jin@databricks.com>

---------

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* store/bucket: snappy-encoded postings reading improvements (thanos-io#6245)

* store: pool input to snappy.Decode

Pool input to snappy.Decode to avoid allocations.
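
A minimal sketch of the pooling pattern, using only the standard library. The names `inputPool`, `sumBytes`, and `processPooled` are hypothetical, and `sumBytes` merely stands in for a decoder such as `snappy.Decode`; the copy into the pooled buffer stands in for whatever fills the input in practice.

```go
package main

import (
	"fmt"
	"sync"
)

// inputPool holds reusable byte buffers. Pointers to slices are stored so
// that Put does not allocate when boxing the value into an interface.
var inputPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 0, 4096)
		return &b
	},
}

// sumBytes is a stand-in for a decoder; it only reads its input while the
// call is in progress, so the buffer can be reused afterwards.
func sumBytes(in []byte) int {
	total := 0
	for _, c := range in {
		total += int(c)
	}
	return total
}

// processPooled copies src into a pooled buffer, runs the decoder stand-in
// on it, and returns the buffer to the pool for reuse.
func processPooled(src []byte) int {
	bp := inputPool.Get().(*[]byte)
	buf := append((*bp)[:0], src...)
	res := sumBytes(buf)
	*bp = buf[:0] // keep any capacity gained by append
	inputPool.Put(bp)
	return res
}

func main() {
	fmt.Println(processPooled([]byte{1, 2, 3}))
}
```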

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* store: use s2 for decoding snappy

It's faster, so use it.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* store: small code style adjustment

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* store: call closefns before returning err

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* store/postings_codec: return both if possible

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* store/bucket: always call close fns

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

---------

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* truncateExtLabels support Unicode cut (thanos-io#6267)

* truncateExtLabels support Unicode cut

Signed-off-by: mickeyzzc <mickey_zzc@163.com>

* update TestTruncateExtLabels and pass test

Signed-off-by: mickeyzzc <mickey_zzc@163.com>

---------

Signed-off-by: mickeyzzc <mickey_zzc@163.com>

* Update mentorship links

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Fix segfault in LabelValues during head compaction (thanos-io#6271)

* Fix segfault in LabelValues during head compaction

Head compaction causes blocks outside the retention period to get deleted.
If there is an in-flight LabelValues request at the same time, deleting
the block can cause the store proxy to panic since it loses access to
the data.

This commit fixes the issue by copying label values from TSDB stores
before returning them to the store proxy. I thought about exposing
a Close method on the TSDB store which the Proxy can call, but this will
not eliminate cases where gRPC defers sending data over a channel using its
queueing mechanism.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Add changelog entry

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Assert no error when querying labels

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

---------

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Mixin: Allow specifying an instance name filter (thanos-io#6273)

This commit allows specifying an instance name filter, in order to
filter the datasources shown on the dashboards.

For example, when generating the dashboards one can do the following
(i.e. in config.libsonnet):

```
  dashboard+:: {
    prefix: 'Thanos / ',
    ...
    instance_name_filter: '/EU.*/'
```

Signed-off-by: Jacob Baungard Hansen <jacobbaungard@redhat.com>

* Adds Deno to adopters.yml (thanos-io#6275)

Signed-off-by: Will (Newby) Atlas <will@deno.com>

* Bump `make test` timeout (thanos-io#6276)

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* fix 0.31 changelog (thanos-io#6278)

Signed-off-by: junot <junotxiang@kubesphere.io>

* Query: Switch Multiple Engines (thanos-io#6234)

* Query: Switch engines using `engine` param

Thanos query has two engines, prometheus (default) and thanos.
Only a single engine runs per thanos query command, and the
command has to be re-run to switch between them.

This commit adds functionality to run multiple engines at once
and switch between them using the `engine` query param in the query API.

To avoid duplicate metrics registration, the thanos engine is
provided with a different registerer having the prefix `tpe_` (not
finalized yet).

The promql-engine command line flag, which specified the query
engine to run, has been removed.

Currently this functionality is not implemented on GRPCAPI.

Signed-off-by: Pradyumna Krishna <git@onpy.in>

* Add multiple engine support to GRPCAPI

Fixes the build failure for thanos and adds support for multiple
engines in GRPCAPI.

Signed-off-by: Pradyumna Krishna <git@onpy.in>

* Create QueryEngineFactory to create engines

QueryEngineFactory holds the collection of promql engines used
by thanos. Any engine can be created and returned
using the corresponding `GetXEngine` method.

It is currently limited to 2 engines, prometheus and thanos,
which get created on the first call.
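
A rough sketch of a factory that creates each engine on first use and selects one by the `engine` query param. Everything here is illustrative: the `Engine` type, the method names `GetPrometheusEngine` and `GetThanosEngine`, and `Select` are assumptions and not the actual Thanos API.

```go
package main

import (
	"fmt"
	"sync"
)

// Engine is a stand-in for a PromQL engine (hypothetical type).
type Engine struct{ Name string }

// EngineFactory lazily creates each engine at most once.
type EngineFactory struct {
	promOnce, thanosOnce sync.Once
	prom, thanos         *Engine
}

// GetPrometheusEngine creates the prometheus engine on the first call.
func (f *EngineFactory) GetPrometheusEngine() *Engine {
	f.promOnce.Do(func() { f.prom = &Engine{Name: "prometheus"} })
	return f.prom
}

// GetThanosEngine creates the thanos engine on the first call.
func (f *EngineFactory) GetThanosEngine() *Engine {
	f.thanosOnce.Do(func() { f.thanos = &Engine{Name: "thanos"} })
	return f.thanos
}

// Select picks an engine from the value of the `engine` query param,
// falling back to def when the param is empty.
func (f *EngineFactory) Select(param, def string) (*Engine, error) {
	if param == "" {
		param = def
	}
	switch param {
	case "prometheus":
		return f.GetPrometheusEngine(), nil
	case "thanos":
		return f.GetThanosEngine(), nil
	}
	return nil, fmt.Errorf("unknown engine %q", param)
}

func main() {
	f := &EngineFactory{}
	e, err := f.Select("thanos", "prometheus")
	fmt.Println(e.Name, err)
}
```

Using `sync.Once` keeps creation safe when several requests hit the factory concurrently.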

Signed-off-by: Pradyumna Krishna <git@onpy.in>

* Use QueryEngineFactory in query API

The thanos query command passes `QueryEngineFactory` to the query APIs,
which select the engine based on query params. This provides more
flexibility to create multiple engines in thanos.

Adds the `defaultEngine` CLI flag: the default engine to use if none is
specified with query params.

Signed-off-by: Pradyumna Krishna <git@onpy.in>

* Update Query API tests

Fixes breaking tests

Signed-off-by: Pradyumna Krishna <git@onpy.in>

* Minor changes and Docs fixes

* Move defaultEngine argument to reduce diff.
* Generated Docs.

Signed-off-by: Pradyumna Krishna <git@onpy.in>

* Add Engine Selector/ Dropdown to Query UI

Engine Selector is a dropdown that sets the engine used to
run the query. Currently there are two engines, `thanos` and `prometheus`.

This dropdown sends an `engine` query param to the query API, which
runs the query using the engine provided. This makes it possible to run
queries with multiple query engines from the Query UI.

Signed-off-by: Pradyumna Krishna <git@onpy.in>

* Move Engine Selector to Panel

Removes Dropdown component, and renders Engine Selector directly.
Receives defaultEngine from `flags` API.
Updates parseOptions to parse engine query param and updates test
for Panel and utils.

Signed-off-by: Pradyumna Krishna <git@onpy.in>

* Upgrade promql-engine dependency

Updates promql-engine, which brings the ability to provide a
fallback engine using engine Opts.

Signed-off-by: Pradyumna Krishna <git@onpy.in>

* Add MinT to remote client

The MinT method was missing from Client due to the updated promql-engine.
This commit adds MinT to the remote client.

Signed-off-by: Pradyumna Krishna <git@onpy.in>

* Use prometheus fallback engine in thanos engine

Thanos engine creates a fallback prometheus engine that conflicts
with another prometheus engine created by thanos while
registering metrics. To fix this, the prometheus engine already created
by thanos is provided as the fallback engine to the thanos engine in engine Opts.

Signed-off-by: Pradyumna Krishna <git@onpy.in>

* Use enum for EngineType in GRPC

GRPC is used for communication between thanos components, and
defaultEngine was a string before. An enum makes more sense,
hence the request.Engine type has been changed to
querypb.EngineType.
The default case is handled with another default value provided in
the enum.

Signed-off-by: Pradyumna Krishna <git@onpy.in>

* Update query UI bindata.go

Compile react app using `make assets`.

Signed-off-by: Pradyumna Krishna <git@onpy.in>

---------

Signed-off-by: Pradyumna Krishna <git@onpy.in>

* docs: mismatch in changelog

Signed-off-by: Etienne Martel <etienne.martel.7@gmail.com>

* Updates busybox SHA (thanos-io#6283)

Signed-off-by: GitHub <noreply@github.com>
Co-authored-by: fpetkovski <fpetkovski@users.noreply.github.com>

* Upgrade prometheus to 7309ac272195cb856b879306d6a27af7641d3346 (thanos-io#6287)

* Upgrade prometheus to 7309ac272195cb856b879306d6a27af7641d3346

Signed-off-by: Alex Le <leqiyue@amazon.com>

* Reverted test code

Signed-off-by: Alex Le <leqiyue@amazon.com>

* Updated comment

Signed-off-by: Alex Le <leqiyue@amazon.com>

* docs: mismatch in changelog

Signed-off-by: Etienne Martel <etienne.martel.7@gmail.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* Updates busybox SHA (thanos-io#6283)

Signed-off-by: GitHub <noreply@github.com>
Co-authored-by: fpetkovski <fpetkovski@users.noreply.github.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>

* trigger workflow

Signed-off-by: Alex Le <leqiyue@amazon.com>

* trigger workflow

Signed-off-by: Alex Le <leqiyue@amazon.com>

---------

Signed-off-by: Alex Le <leqiyue@amazon.com>
Signed-off-by: Etienne Martel <etienne.martel.7@gmail.com>
Signed-off-by: GitHub <noreply@github.com>
Co-authored-by: Etienne Martel <etienne.martel.7@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: fpetkovski <fpetkovski@users.noreply.github.com>

* Add CarTrade Tech as new adopter

Signed-off-by: naveadkazi <navead@carwale.com>

* tests: Remove custom Between test matcher (thanos-io#6310)

* Remove custom Between test matcher

The upstream PR to efficientgo/e2e has been merged, so we can use it from there.

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Run go mod tidy

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

---------

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* query frontend, query UI: Native histogram support (thanos-io#6071)

* Implemented native histogram support for qfe and query UI

Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>

Fixed marshalling for histograms in qfe

Started working on native histogram query ui

Copied histogram implementation for graph

Added query range support for native histograms in qfe

Use prom model (un-)marshal for native histograms in qfe

Use prom model (un-)marshal for native histograms in qfe

Fixed sample and sample stream marshal fn

Extended qfe native histogram e2e tests

Added copyright to qfe queryrange compat

Added query range test for histograms and try to fix ui tests

Fixed DataTable test

Review feedback

Fixed native histogram e2e test

Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>

Add histogram support for ApplyCounterResetsSeriesIterator

Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>

Made assets

Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>

Add changelog

Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>

Fixed changelog

Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>

Fixed qfe

Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>

Fixed PrometheusResponse minTime for histograms in qfe

Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>

Updated prometheus common to v0.40.0 and queryrange.Sample fixes

Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>

Updated Readme

Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>

Addressed PR comments

Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>

trigger tests

Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>

Made assets

Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>

* Made assets

Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>

* fixed tsdbutil references

Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>

* fixed imports

Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>

* Enabled pushdown for query native hist test and removed ToDo

Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>

* Refactored native histogram query UI

Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>

---------

Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>

* store: add streamed snappy encoding for postings list (thanos-io#6303)

* store: add streamed snappy encoding for postings list

We've noticed that decoding Snappy-compressed postings lists
takes a lot of RAM:

```
(pprof) top
Showing nodes accounting for 1427.30GB, 67.55% of 2112.82GB total
Dropped 1069 nodes (cum <= 10.56GB)
Showing top 10 nodes out of 82
      flat  flat%   sum%        cum   cum%
         0     0%     0%  1905.67GB 90.20%  golang.org/x/sync/errgroup.(*Group).Go.func1
    2.08GB 0.098% 0.098%  1456.94GB 68.96%  github.com/thanos-io/thanos/pkg/store.(*blockSeriesClient).ExpandPostings
    1.64GB 0.078%  0.18%  1454.87GB 68.86%  github.com/thanos-io/thanos/pkg/store.(*bucketIndexReader).ExpandedPostings
    2.31GB  0.11%  0.29%  1258.15GB 59.55%  github.com/thanos-io/thanos/pkg/store.(*bucketIndexReader).fetchPostings
    1.48GB  0.07%  0.36%  1219.67GB 57.73%  github.com/thanos-io/thanos/pkg/store.diffVarintSnappyDecode
 1215.21GB 57.52% 57.87%  1215.21GB 57.52%  github.com/klauspost/compress/s2.Decode
```

This is because we are creating a new []byte slice for the decoded data
each time. To avoid this RAM usage problem, let's stream the decoding
from a given buffer. Since the Snappy block format doesn't support streamed
decoding, let's switch to the Snappy stream format, which is made for exactly
that.

Notice that our current `index.Postings` list does not
support going back through Seek(), even if theoretically one could want
something like that. Fortunately, to search for posting intersections, we
only need to go forward.
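
To illustrate forward-only streamed decoding, here is a small standard-library sketch. It is not the PR's implementation: `encodeDiffVarint`, `streamedPostings`, and `newStreamedPostings` are hypothetical names, and a plain `bytes.Reader` stands in for a Snappy stream decoder. Postings are stored as uvarint deltas and decoded one value at a time, so the full decoded list is never held in memory.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// encodeDiffVarint writes sorted postings as uvarint-encoded deltas.
func encodeDiffVarint(postings []uint64) []byte {
	var buf bytes.Buffer
	tmp := make([]byte, binary.MaxVarintLen64)
	prev := uint64(0)
	for _, p := range postings {
		n := binary.PutUvarint(tmp, p-prev)
		buf.Write(tmp[:n])
		prev = p
	}
	return buf.Bytes()
}

// streamedPostings decodes one value at a time from r. It only moves forward.
type streamedPostings struct {
	r       io.ByteReader
	cur     uint64
	started bool
	err     error
}

// newStreamedPostings wraps encoded data; in a real setup the reader could
// be a decompressing stream instead of a bytes.Reader.
func newStreamedPostings(data []byte) *streamedPostings {
	return &streamedPostings{r: bytes.NewReader(data)}
}

// Next advances to the next posting; it returns false at end of input or on error.
func (s *streamedPostings) Next() bool {
	s.started = true
	delta, err := binary.ReadUvarint(s.r)
	if err != nil {
		if err != io.EOF {
			s.err = err
		}
		return false
	}
	s.cur += delta
	return true
}

// Seek advances until the current value is >= v. Going backwards is not supported.
func (s *streamedPostings) Seek(v uint64) bool {
	if !s.started && !s.Next() {
		return false
	}
	for s.cur < v {
		if !s.Next() {
			return false
		}
	}
	return true
}

func (s *streamedPostings) At() uint64 { return s.cur }
func (s *streamedPostings) Err() error { return s.err }

func main() {
	it := newStreamedPostings(encodeDiffVarint([]uint64{2, 5, 9, 20}))
	if it.Seek(6) {
		fmt.Println("first posting >= 6:", it.At())
	}
	for it.Next() {
		fmt.Println("next:", it.At())
	}
	if err := it.Err(); err != nil {
		fmt.Println("decode error:", err)
	}
}
```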

Benchmark data:

```
name                                                          time/op
PostingsEncodingDecoding/10000/raw/encode-16                  71.6µs ± 3%
PostingsEncodingDecoding/10000/raw/decode-16                  76.3ns ± 4%
PostingsEncodingDecoding/10000#01/snappy/encode-16            73.3µs ± 1%
PostingsEncodingDecoding/10000#01/snappy/decode-16            1.63µs ± 6%
PostingsEncodingDecoding/10000#02/snappyStreamed/encode-16     111µs ± 2%
PostingsEncodingDecoding/10000#02/snappyStreamed/decode-16    14.5µs ± 7%
PostingsEncodingDecoding/100000/snappyStreamed/encode-16      1.09ms ± 2%
PostingsEncodingDecoding/100000/snappyStreamed/decode-16      14.4µs ± 4%
PostingsEncodingDecoding/100000#01/raw/encode-16               710µs ± 1%
PostingsEncodingDecoding/100000#01/raw/decode-16              79.3ns ±13%
PostingsEncodingDecoding/100000#02/snappy/encode-16            719µs ± 1%
PostingsEncodingDecoding/100000#02/snappy/decode-16           13.5µs ± 4%
PostingsEncodingDecoding/1000000/raw/encode-16                7.14ms ± 1%
PostingsEncodingDecoding/1000000/raw/decode-16                81.7ns ± 9%
PostingsEncodingDecoding/1000000#01/snappy/encode-16          7.52ms ± 3%
PostingsEncodingDecoding/1000000#01/snappy/decode-16           139µs ± 4%
PostingsEncodingDecoding/1000000#02/snappyStreamed/encode-16  11.4ms ± 4%
PostingsEncodingDecoding/1000000#02/snappyStreamed/decode-16  15.5µs ± 4%

name                                                          alloc/op
PostingsEncodingDecoding/10000/raw/encode-16                  13.6kB ± 0%
PostingsEncodingDecoding/10000/raw/decode-16                   96.0B ± 0%
PostingsEncodingDecoding/10000#01/snappy/encode-16            25.9kB ± 0%
PostingsEncodingDecoding/10000#01/snappy/decode-16            11.0kB ± 0%
PostingsEncodingDecoding/10000#02/snappyStreamed/encode-16    16.6kB ± 0%
PostingsEncodingDecoding/10000#02/snappyStreamed/decode-16     148kB ± 0%
PostingsEncodingDecoding/100000/snappyStreamed/encode-16       148kB ± 0%
PostingsEncodingDecoding/100000/snappyStreamed/decode-16       148kB ± 0%
PostingsEncodingDecoding/100000#01/raw/encode-16               131kB ± 0%
PostingsEncodingDecoding/100000#01/raw/decode-16               96.0B ± 0%
PostingsEncodingDecoding/100000#02/snappy/encode-16            254kB ± 0%
PostingsEncodingDecoding/100000#02/snappy/decode-16            107kB ± 0%
PostingsEncodingDecoding/1000000/raw/encode-16                1.25MB ± 0%
PostingsEncodingDecoding/1000000/raw/decode-16                 96.0B ± 0%
PostingsEncodingDecoding/1000000#01/snappy/encode-16          2.48MB ± 0%
PostingsEncodingDecoding/1000000#01/snappy/decode-16          1.05MB ± 0%
PostingsEncodingDecoding/1000000#02/snappyStreamed/encode-16  1.47MB ± 0%
PostingsEncodingDecoding/1000000#02/snappyStreamed/decode-16   148kB ± 0%

name                                                          allocs/op
PostingsEncodingDecoding/10000/raw/encode-16                    2.00 ± 0%
PostingsEncodingDecoding/10000/raw/decode-16                    2.00 ± 0%
PostingsEncodingDecoding/10000#01/snappy/encode-16              3.00 ± 0%
PostingsEncodingDecoding/10000#01/snappy/decode-16              4.00 ± 0%
PostingsEncodingDecoding/10000#02/snappyStreamed/encode-16      4.00 ± 0%
PostingsEncodingDecoding/10000#02/snappyStreamed/decode-16      5.00 ± 0%
PostingsEncodingDecoding/100000/snappyStreamed/encode-16        4.00 ± 0%
PostingsEncodingDecoding/100000/snappyStreamed/decode-16        5.00 ± 0%
PostingsEncodingDecoding/100000#01/raw/encode-16                2.00 ± 0%
PostingsEncodingDecoding/100000#01/raw/decode-16                2.00 ± 0%
PostingsEncodingDecoding/100000#02/snappy/encode-16             3.00 ± 0%
PostingsEncodingDecoding/100000#02/snappy/decode-16             4.00 ± 0%
PostingsEncodingDecoding/1000000/raw/encode-16                  2.00 ± 0%
PostingsEncodingDecoding/1000000/raw/decode-16                  2.00 ± 0%
PostingsEncodingDecoding/1000000#01/snappy/encode-16            3.00 ± 0%
PostingsEncodingDecoding/1000000#01/snappy/decode-16            4.00 ± 0%
PostingsEncodingDecoding/1000000#02/snappyStreamed/encode-16    4.00 ± 0%
PostingsEncodingDecoding/1000000#02/snappyStreamed/decode-16    5.00 ± 0%
```

Compression ratios remain the same as before:

```
$ /bin/go test -v -timeout 10m -run ^TestDiffVarintCodec$ github.com/thanos-io/thanos/pkg/store
[snip]
=== RUN   TestDiffVarintCodec/snappy/i!~"2.*"
    postings_codec_test.go:73: postings entries: 944450
    postings_codec_test.go:74: original size (4*entries): 3777800 bytes
    postings_codec_test.go:80: encoded size 44498 bytes
    postings_codec_test.go:81: ratio: 0.012
=== RUN   TestDiffVarintCodec/snappyStreamed/i!~"2.*"
    postings_codec_test.go:73: postings entries: 944450
    postings_codec_test.go:74: original size (4*entries): 3777800 bytes
    postings_codec_test.go:80: encoded size 44670 bytes
    postings_codec_test.go:81: ratio: 0.012
```

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* store: clean up postings code

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* store: fix estimation

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* store: use buffer.Bytes()

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* store/postings_codec: reuse extgrpc compressors/decompressors

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* CHANGELOG: add item

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* CHANGELOG: clean up whitespace

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

---------

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* compact: atomically replace no compact marked map (thanos-io#6319)

With lots of blocks it could take some time to fill the no-compact
marked map, hence replace it atomically. I believe the slow fill leads to problems
in the compaction planner, where it picks up no-compact marked blocks
because the meta syncer does synchronizations concurrently.
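
The atomic-replace idea can be sketched like this, under the assumption that readers and the syncer share one map guarded by a mutex. `noCompactFilter`, `Sync`, and `IsMarked` are hypothetical names, and block IDs are plain strings for brevity.

```go
package main

import (
	"fmt"
	"sync"
)

// noCompactFilter tracks IDs of blocks marked as "no compact" (hypothetical type).
type noCompactFilter struct {
	mtx    sync.RWMutex
	marked map[string]struct{}
}

// Sync builds the new set off to the side and publishes it in one step,
// so readers never observe a partially filled map.
func (f *noCompactFilter) Sync(ids []string) {
	next := make(map[string]struct{}, len(ids))
	for _, id := range ids {
		next[id] = struct{}{}
	}
	f.mtx.Lock()
	f.marked = next
	f.mtx.Unlock()
}

// IsMarked reports whether id was marked in the last completed Sync.
func (f *noCompactFilter) IsMarked(id string) bool {
	f.mtx.RLock()
	defer f.mtx.RUnlock()
	_, ok := f.marked[id]
	return ok
}

func main() {
	f := &noCompactFilter{}
	f.Sync([]string{"block-a", "block-b"})
	fmt.Println(f.IsMarked("block-a"), f.IsMarked("block-c"))
}
```

The lock is held only for the pointer swap, so a slow fill over many blocks does not expose an empty or half-built map to the planner.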

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Fixed modules, logicalplan flag and more

* Made assets

* Removed unused test function

* Sorted labels

---------

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>
Signed-off-by: hackeramitkumar <amit9116260192@gmail.com>
Signed-off-by: Yi Jin <yi.jin@databricks.com>
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: mickeyzzc <mickey_zzc@163.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Jacob Baungard Hansen <jacobbaungard@redhat.com>
Signed-off-by: Will (Newby) Atlas <will@deno.com>
Signed-off-by: junot <junotxiang@kubesphere.io>
Signed-off-by: Pradyumna Krishna <git@onpy.in>
Signed-off-by: Etienne Martel <etienne.martel.7@gmail.com>
Signed-off-by: GitHub <noreply@github.com>
Signed-off-by: Alex Le <leqiyue@amazon.com>
Signed-off-by: naveadkazi <navead@carwale.com>
Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>
Co-authored-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>
Co-authored-by: Filip Petkovski <filip.petkovsky@gmail.com>
Co-authored-by: Amit kumar <amit9116260192@gmail.com>
Co-authored-by: Yi Jin <96499497+jnyi@users.noreply.github.com>
Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Co-authored-by: MickeyZZC <mickeyzzc@gmail.com>
Co-authored-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Co-authored-by: Jacob Baungård Hansen <jacobbaungard@redhat.com>
Co-authored-by: Will (Newby) Atlas <willnewby@gmail.com>
Co-authored-by: junot <49136171+junotx@users.noreply.github.com>
Co-authored-by: Pradyumna Krishna <git@onpy.in>
Co-authored-by: Etienne Martel <etienne.martel.7@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: fpetkovski <fpetkovski@users.noreply.github.com>
Co-authored-by: Alex Le <emoc1989@gmail.com>
Co-authored-by: naveadkazi <navead@carwale.com>
hczhu pushed a commit to databricks/thanos that referenced this pull request Jun 27, 2023
* Fix segfault in LabelValues during head compaction

Head compaction causes blocks outside the retention period to get deleted.
If there is an in-flight LabelValues request at the same time, deleting
the block can cause the store proxy to panic since it loses access to
the data.

This commit fixes the issue by copying label values from TSDB stores
before returning them to the store proxy. I thought about exposing
a Close method on the TSDB store which the Proxy can call, but this will
not eliminate cases where gRPC defers sending data over a channel using its
queueing mechanism.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Add changelog entry

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Assert no error when querying labels

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

---------

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

Successfully merging this pull request may close these issues.

LabelValues leads to panic in Receiver during compaction
3 participants