Merge upstream 1839858 (thanos-io#69)
* Update Thanos engine to latest version (thanos-io#6069)

This commit updates the Thanos PromQL engine to the latest version.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Receive: Tenants' external labels proposal (thanos-io#5720)

* Receive external labels proposal

Signed-off-by: haanhvu <haanh6594@gmail.com>

* Restructure and edit proposal's content

Signed-off-by: haanhvu <haanh6594@gmail.com>

* Update proposal

Signed-off-by: haanhvu <haanh6594@gmail.com>

* Fix doc error

Signed-off-by: haanhvu <haanh6594@gmail.com>

Signed-off-by: haanhvu <haanh6594@gmail.com>

* fixing doc CI (thanos-io#6072)

Signed-off-by: Ben Ye <benye@amazon.com>

Signed-off-by: Ben Ye <benye@amazon.com>

* Fix stores filtering resets on reload (thanos-io#6063)

* Fix stores filtering resets on reload

The `g0.store_matches` parameter appears in the URL but is not applied
in the frontend. This looks like it was done on purpose, and removing
a small piece of code fixes the issue.

The variable `debugMode` was used for the store filtering checkbox,
which is an inappropriate name. The `enableStoreFiltering` variable
now represents the state of the checkbox.

Signed-off-by: Pradyumna Krishna <git@onpy.in>

* Regenerate bindata.go

Signed-off-by: Pradyumna Krishna <git@onpy.in>

Signed-off-by: Pradyumna Krishna <git@onpy.in>

* Store: Make initial sync more robust

Added a retry mechanism for the store's initial sync: if the initial sync fails, it is retried until the given timeout duration elapses.

Signed-off-by: Kartik-Garg <kartik.garg@infracloud.io>

* Recover from panics in Series calls (thanos-io#6077)

* Recover from panics in Series calls

This commit adds panic recovery for Series calls in all Store servers.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Apply error suggestion

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

---------

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* query: reuse our own gate (thanos-io#6079)

Do not call promgate directly; instead, use our own wrapper, which does
everything we want: a duration histogram, current in-flight calls, and
total calls.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Store: Support disabling the index header file cache. (thanos-io#5773)

* Store: Support disabling the index header file cache.

Signed-off-by: wanjunlei <wanjunlei@kubesphere.io>

* Store: add a separate flag to disable caching index header file

Signed-off-by: wanjunlei <wanjunlei@kubesphere.io>

* Tools: add cleanup API for bucket web

Signed-off-by: wanjunlei <wanjunlei@kubesphere.io>

* resolve conversation

Signed-off-by: wanjunlei <wanjunlei@kubesphere.io>

* resolve conflicts

Signed-off-by: wanjunlei <wanjunlei@kubesphere.io>

* change the flag to `--cache-index-header`

Signed-off-by: wanjunlei <wanjunlei@kubesphere.io>

* Wrap mem writer in file writer

Signed-off-by: wanjunlei <wanjunlei@kubesphere.io>

* update CHANGELOG

Signed-off-by: wanjunlei <wanjunlei@kubesphere.io>

* update CHANGELOG

Signed-off-by: wanjunlei <wanjunlei@kubesphere.io>

* fix bug

Signed-off-by: wanjunlei <wanjunlei@kubesphere.io>

---------

Signed-off-by: wanjunlei <wanjunlei@kubesphere.io>
Co-authored-by: wanjunlei <wanjunlei@yujnify.com>

* CVE: Fix Receiver malicious tenant (thanos-io#5969)

If running as root or with enough privileges, Receive could be made to
create a directory outside of its data directory through the value of
the configured tenant header.

This commit fixes this by sanitizing the user input and explicitly
disallowing such behavior.

Signed-off-by: Daniel Mellado <dmellado@redhat.com>

* Add adopter Grupo MasMovil (thanos-io#6084)

Signed-off-by: Pablo Moncada Isla <pablo.moncada@masmovil.com>

* fix typo (thanos-io#6087)

Signed-off-by: cyip <cyip@jackhenry.com>
Co-authored-by: cyip <cyip@jackhenry.com>

* optimize selector to string (thanos-io#6076)

Signed-off-by: Kama Huang <kamatogo13@gmail.com>

* Fix: Failure to close BlockSeriesClient causes store-gateway deadlock (thanos-io#6086)

* Fix: Failure to close BlockSeriesClient causes store-gateway deadlock

Signed-off-by: Alan Protasio <alanprot@gmail.com>

* Adding tests

Signed-off-by: Alan Protasio <alanprot@gmail.com>

* reverting the change on get series

Signed-off-by: Alan Protasio <alanprot@gmail.com>

* fix lint

Signed-off-by: Alan Protasio <alanprot@gmail.com>

---------

Signed-off-by: Alan Protasio <alanprot@gmail.com>

* Cut 0.30.2 (thanos-io#6081)

* tracing: fixed panic because of nil sampler (thanos-io#6066)

* fixed panic because of nil sampler

Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>

* added CHANGELOG entry

Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>

Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>

* bump version to 0.30.2

Signed-off-by: Ben Ye <benye@amazon.com>

* Updates busybox SHA (thanos-io#6046)

Signed-off-by: GitHub <noreply@github.com>

Signed-off-by: GitHub <noreply@github.com>
Co-authored-by: yeya24 <yeya24@users.noreply.github.com>

* Use `e2edb.NewMinio` to disable SSE-S3 in e2e tests (thanos-io#6055)

* Use e2edb.NewMinio to disable SSE

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Use temp fork for TLS

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Fix broken rules api fanout test

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Fix broken query compatibility test

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Remove fork

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: GitHub <noreply@github.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Co-authored-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: yeya24 <yeya24@users.noreply.github.com>
Co-authored-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* cherry pick store gateway fix to release 0.30 (thanos-io#6089)

* Fix: Failure to close BlockSeriesClient causes store-gateway deadlock (thanos-io#6086)

* Fix: Failure to close BlockSeriesClient causes store-gateway deadlock

Signed-off-by: Alan Protasio <alanprot@gmail.com>

* Adding tests

Signed-off-by: Alan Protasio <alanprot@gmail.com>

* reverting the change on get series

Signed-off-by: Alan Protasio <alanprot@gmail.com>

* fix lint

Signed-off-by: Alan Protasio <alanprot@gmail.com>

---------

Signed-off-by: Alan Protasio <alanprot@gmail.com>

* update changelog

Signed-off-by: Ben Ye <benye@amazon.com>

---------

Signed-off-by: Alan Protasio <alanprot@gmail.com>
Signed-off-by: Ben Ye <benye@amazon.com>
Co-authored-by: Alan Protasio <alanprot@gmail.com>

* fix changelog entries

Signed-off-by: Ben Ye <benye@amazon.com>

* docs: improving the description for tsdb.retention on the receiver

Signed-off-by: Victor Fernandes <victorhbfernandes@gmail.com>

* Receiver: Use `intern` package when reallocating label strings (thanos-io#5926)

* Cleanup go mod

Signed-off-by: Matej Gera <matejgera@gmail.com>

* Use string interning for labels realloc method

Signed-off-by: Matej Gera <matejgera@gmail.com>

* Enhance label realloc benchmarks

Signed-off-by: Matej Gera <matejgera@gmail.com>

* Make interning optional; put behind hidden flag

Signed-off-by: Matej Gera <matej.gera@coralogix.com>

* Update CHANGELOG

Signed-off-by: Matej Gera <matej.gera@coralogix.com>

* Address feedback

- Fix wrong condition
- Adjust benchmarks

Signed-off-by: Matej Gera <matej.gera@coralogix.com>

---------

Signed-off-by: Matej Gera <matejgera@gmail.com>
Signed-off-by: Matej Gera <matej.gera@coralogix.com>
Signed-off-by: Matej Gera <38492574+matej-g@users.noreply.github.com>
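String interning keeps one canonical copy of each distinct string, so label names and values repeated across many series share memory. The sketch below is a simplified stand-in: `interner`, `label`, and `reallocLabels` are hypothetical, and a plain map replaces the `go4.org/intern` package added to `go.mod`, which, unlike this map, allows values that are no longer referenced to be garbage collected. In the actual change the behavior is opt-in via the hidden, experimental `--writer.intern` flag (default `false`).

```go
package main

import (
	"fmt"
	"sync"
)

// interner hands out one canonical copy of each distinct string.
// Simplified, hypothetical stand-in: entries are never evicted.
type interner struct {
	mu   sync.Mutex
	pool map[string]string
}

func newInterner() *interner {
	return &interner{pool: make(map[string]string)}
}

// Intern returns the canonical copy of s, storing s if it is new.
func (in *interner) Intern(s string) string {
	in.mu.Lock()
	defer in.mu.Unlock()
	if canonical, ok := in.pool[s]; ok {
		return canonical
	}
	in.pool[s] = s
	return s
}

// label is a hypothetical name/value pair, like a Prometheus label.
type label struct {
	Name, Value string
}

// reallocLabels copies ls, replacing every name and value with its
// interned copy (hypothetical helper).
func reallocLabels(in *interner, ls []label) []label {
	out := make([]label, len(ls))
	for i, l := range ls {
		out[i] = label{Name: in.Intern(l.Name), Value: in.Intern(l.Value)}
	}
	return out
}

func main() {
	in := newInterner()
	reallocLabels(in, []label{{"job", "api"}, {"instance", "a:9090"}})
	reallocLabels(in, []label{{"job", "api"}, {"instance", "b:9090"}})
	// "job", "api" and "instance" are stored once even though two series use them.
	fmt.Println("distinct strings stored:", len(in.pool))
}
```

The trade-off is a lock and a map lookup per string in exchange for lower memory use, which fits keeping it behind an experimental flag.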

* Updating README with drawing fixes and minor wording clarification (thanos-io#6078)

* New drawing and wording for Thanos other deployment models

Signed-off-by: Jonah Kowall <jkowall@kowall.net>

* New drawing and wording for Thanos other deployment models

Signed-off-by: Jonah Kowall <jkowall@kowall.net>

* Added comments to README.md and updated the quick-tutorial.md with the same diagram updates and text to match

Signed-off-by: Jonah Kowall <jkowall@kowall.net>

* Ran make docs

Signed-off-by: Jonah Kowall <jkowall@kowall.net>

---------

Signed-off-by: Jonah Kowall <jkowall@kowall.net>

* Compact: Remove spam of replica label removed log (thanos-io#6088)

* Remove spam of replica label removed log

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Reduce the number of "removed replica label" log lines instead of removing the log

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Reformat code

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

---------

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Store: Don't error when no stores are matched (thanos-io#6082)

It's normal, and not an error, for a query to match no downstream
stores. This is common when querying with external labels and tiered
query servers.

This bug was introduced in thanos-io#5296

Fixes: thanos-io#5862

Signed-off-by: SuperQ <superq@gmail.com>
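A hedged sketch of the intended behavior: selecting stores by external labels may legitimately yield nothing, and that should produce an empty response rather than an error. `store`, `matches`, and `selectStores` are hypothetical and, for brevity, only handle equality matchers.

```go
package main

import "fmt"

// store is a hypothetical stand-in for a downstream StoreAPI endpoint,
// described only by its external labels.
type store struct {
	name      string
	extLabels map[string]string
}

// matches reports whether the store could hold series for the query's
// equality matchers. In this sketch only a conflicting external label
// excludes a store; labels it does not advertise may still match its series.
func (s store) matches(matchers map[string]string) bool {
	for name, want := range matchers {
		if got, ok := s.extLabels[name]; ok && got != want {
			return false
		}
	}
	return true
}

// selectStores returns the stores to fan out to. An empty result is a
// normal outcome (no data here for this query), not an error.
func selectStores(stores []store, matchers map[string]string) []store {
	var selected []store
	for _, s := range stores {
		if s.matches(matchers) {
			selected = append(selected, s)
		}
	}
	return selected
}

func main() {
	stores := []store{
		{name: "eu-store", extLabels: map[string]string{"region": "eu"}},
		{name: "us-store", extLabels: map[string]string{"region": "us"}},
	}
	// No store advertises region="ap": answer with an empty result, not an error.
	fmt.Println(len(selectStores(stores, map[string]string{"region": "ap"})))
}
```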

* docs: Fix must have Ruler alerts definition (thanos-io#6058)

* Fix must have Ruler alerts definition

The documented expression for the Thanos Ruler missing rule intervals alert used the wrong comparison operator, confusing users trying to create the rule.

Signed-off-by: Maxim Muzafarov <m.muzafarov@gmail.com>

* Update docs/components/rule.md

Co-authored-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Maxim Muzafarov <m.muzafarov@gmail.com>

---------

Signed-off-by: Maxim Muzafarov <m.muzafarov@gmail.com>
Co-authored-by: Saswata Mukherjee <saswataminsta@yahoo.com>
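The corrected condition fires when the last evaluation of a rule group took longer than its interval. Expressed as a small Go function for illustration only (`slowRuleGroups` and its map inputs are hypothetical; in practice this is a PromQL comparison between the two metrics):

```go
package main

import (
	"fmt"
	"sort"
)

// slowRuleGroups returns the rule groups whose last evaluation took longer
// than their evaluation interval, mirroring the corrected condition
//   prometheus_rule_group_last_duration_seconds > prometheus_rule_group_interval_seconds
// Both maps are keyed by rule group name (hypothetical input shape).
func slowRuleGroups(lastDuration, interval map[string]float64) []string {
	var slow []string
	for group, took := range lastDuration {
		if every, ok := interval[group]; ok && took > every {
			slow = append(slow, group)
		}
	}
	sort.Strings(slow) // map iteration order is random; sort for stable output
	return slow
}

func main() {
	last := map[string]float64{"fast": 2, "slow": 75}
	interval := map[string]float64{"fast": 30, "slow": 60}
	fmt.Println(slowRuleGroups(last, interval)) // [slow]
}
```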

* Fix conflicts

Signed-off-by: haanhvu <haanh6594@gmail.com>

* Specify overwriting behavior in flag and add validation

Signed-off-by: haanhvu <haanh6594@gmail.com>

* Add log and doc

Signed-off-by: haanhvu <haanh6594@gmail.com>
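The flag help text now states that the tenant-specific algorithm in the hashring config overrides `--receive.hashrings-algorithm`. A sketch of that precedence plus a validity check, assuming a simplified config struct; `hashring`, `effectiveAlgorithm`, and the exact validation are hypothetical, since this commit does not show what the added validation checks.

```go
package main

import "fmt"

// Algorithm names accepted by --receive.hashrings-algorithm.
var knownAlgorithms = map[string]bool{"hashmod": true, "ketama": true}

// hashring is a hypothetical, trimmed-down hashring config entry; only the
// optional per-tenant algorithm matters here.
type hashring struct {
	Tenants   []string
	Algorithm string // empty: fall back to the flag default
}

// effectiveAlgorithm applies the documented precedence: an algorithm set in
// the hashring config overrides the flag default. Unknown names are rejected.
func effectiveAlgorithm(flagDefault string, h hashring) (string, error) {
	alg := flagDefault
	if h.Algorithm != "" {
		alg = h.Algorithm
	}
	if !knownAlgorithms[alg] {
		return "", fmt.Errorf("unknown hashring algorithm %q", alg)
	}
	return alg, nil
}

func main() {
	for _, h := range []hashring{
		{Tenants: []string{"tenant-a"}},
		{Tenants: []string{"tenant-b"}, Algorithm: "ketama"},
		{Tenants: []string{"tenant-c"}, Algorithm: "unknown-alg"},
	} {
		alg, err := effectiveAlgorithm("hashmod", h)
		fmt.Println(h.Tenants, alg, err)
	}
}
```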

* Mixins(Rule): Fix query for long rule evaluations (thanos-io#6103)

* mixin(Rule): Fix query for long rule evaluations

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

* Update changelog

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

---------

Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>

---------

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: haanhvu <haanh6594@gmail.com>
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Pradyumna Krishna <git@onpy.in>
Signed-off-by: Kartik-Garg <kartik.garg@infracloud.io>
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: wanjunlei <wanjunlei@kubesphere.io>
Signed-off-by: Daniel Mellado <dmellado@redhat.com>
Signed-off-by: Pablo Moncada Isla <pablo.moncada@masmovil.com>
Signed-off-by: cyip <cyip@jackhenry.com>
Signed-off-by: Kama Huang <kamatogo13@gmail.com>
Signed-off-by: Alan Protasio <alanprot@gmail.com>
Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>
Signed-off-by: GitHub <noreply@github.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Victor Fernandes <victorhbfernandes@gmail.com>
Signed-off-by: Matej Gera <matejgera@gmail.com>
Signed-off-by: Matej Gera <matej.gera@coralogix.com>
Signed-off-by: Matej Gera <38492574+matej-g@users.noreply.github.com>
Signed-off-by: Jonah Kowall <jkowall@kowall.net>
Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>
Signed-off-by: SuperQ <superq@gmail.com>
Signed-off-by: Maxim Muzafarov <m.muzafarov@gmail.com>
Signed-off-by: Sebastian Rabenhorst <sebastian.rabenhorst@shopify.com>
Co-authored-by: Filip Petkovski <filip.petkovsky@gmail.com>
Co-authored-by: Ha Anh Vu <75315486+haanhvu@users.noreply.github.com>
Co-authored-by: Ben Ye <benye@amazon.com>
Co-authored-by: Pradyumna Krishna <git@onpy.in>
Co-authored-by: Kartik-Garg <kartik.garg@infracloud.io>
Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Co-authored-by: wanjunlei <53003665+wanjunlei@users.noreply.github.com>
Co-authored-by: wanjunlei <wanjunlei@yujnify.com>
Co-authored-by: Daniel Mellado <1313475+danielmellado@users.noreply.github.com>
Co-authored-by: Pablo Moncada <pmoncadaisla@gmail.com>
Co-authored-by: Chantel Yip <52993239+sshantel@users.noreply.github.com>
Co-authored-by: cyip <cyip@jackhenry.com>
Co-authored-by: Kama Huang <121007071+kama910@users.noreply.github.com>
Co-authored-by: Alan Protasio <alanprot@gmail.com>
Co-authored-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: yeya24 <yeya24@users.noreply.github.com>
Co-authored-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Co-authored-by: Victor Fernandes <victorhbfernandes@gmail.com>
Co-authored-by: Matej Gera <38492574+matej-g@users.noreply.github.com>
Co-authored-by: Jonah Kowall <jkowall@kowall.net>
Co-authored-by: Douglas Camata <159076+douglascamata@users.noreply.github.com>
Co-authored-by: Ben Kochie <superq@gmail.com>
Co-authored-by: Maxim Muzafarov <m.muzafarov@gmail.com>
Co-authored-by: haanhvu <haanh6594@gmail.com>
1 parent 456bfa2 commit af686b1
Showing 33 changed files with 815 additions and 267 deletions.
7 changes: 4 additions & 3 deletions CHANGELOG.md
@@ -25,12 +25,15 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re

- [#5995](https://github.com/thanos-io/thanos/pull/5995) Sidecar: Loads the TLS certificate during startup.
- [#6044](https://github.com/thanos-io/thanos/pull/6044) Receive: mark ouf of window errors as conflict, if out-of-window samples ingestion is activated
- [#6066](https://github.com/thanos-io/thanos/pull/6066) Tracing: fixed panic because of nil sampler
- [#6050](https://github.com/thanos-io/thanos/pull/6050) Store: Re-try bucket store initial sync upon failure.
- [#6067](https://github.com/thanos-io/thanos/pull/6067) Receive: fixed panic when querying uninitialized TSDBs.
- [#6082](https://github.com/thanos-io/thanos/pull/6082) Store: Don't error when no stores are matched.
- [#6103](https://github.com/thanos-io/thanos/pull/6103) Mixins(Rule): Fix query for long rule evaluations.

### Changed

- [#6010](https://github.com/thanos-io/thanos/pull/6010) *: Upgrade Prometheus to v0.41.0.
- [#5999](https://github.com/thanos-io/thanos/pull/5999) *: Upgrade Alertmanager dependency to v0.25.0.
- [#5887](https://github.com/thanos-io/thanos/pull/5887) Tracing: Make sure rate limiting sampler is the default, as was the case in version pre-0.29.0.
- [#6071](https://github.com/thanos-io/thanos/pull/6071) Query Frontend: *breaking :warning:* Aligned with Prometheus common model (cache reset required)

@@ -55,7 +58,6 @@ NOTE: Querier's `query.promql-engine` flag enabling new PromQL engine is now unh
- [#5880](https://github.com/thanos-io/thanos/pull/5880) Query Frontend: Fixes some edge cases of query sharding analysis.
- [#5893](https://github.com/thanos-io/thanos/pull/5893) Cache: Fixed redis client not respecting `SetMultiBatchSize` config value.
- [#5966](https://github.com/thanos-io/thanos/pull/5966) Query: Fixed mint and maxt when selecting series for the `api/v1/series` HTTP endpoint.
- [#5997](https://github.com/thanos-io/thanos/pull/5997) Rule: switch to miekgdns DNS resolver as the default one.
- [#5948](https://github.com/thanos-io/thanos/pull/5948) Store: `chunks_fetched_duration` wrong calculation.
- [#5910](https://github.com/thanos-io/thanos/pull/5910) Receive: Fixed ketama quorum bug that was could cause success response for failed replication. This also optimize heavily receiver CPU use.

@@ -81,7 +83,6 @@ NOTE: Querier's `query.promql-engine` flag enabling new PromQL engine is now unh
- [#5846](https://github.com/thanos-io/thanos/pull/5846) Query Frontend: vertical query sharding supports subqueries.
- [#5593](https://github.com/thanos-io/thanos/pull/5593) Cache: switch Redis client to [Rueidis](https://github.com/rueian/rueidis). Rueidis is [faster](https://github.com/rueian/rueidis#benchmark-comparison-with-go-redis-v9) and provides [client-side caching](https://redis.io/docs/manual/client-side-caching/). It is highly recommended to use it so that repeated requests for the same key would not be needed.
- [#5896](https://github.com/thanos-io/thanos/pull/5896) *: Upgrade Prometheus to v0.40.7 without implementing native histogram support. *Querying native histograms will fail with `Error executing query: invalid chunk encoding "<unknown>"` and native histograms in write requests are ignored.*
- [#5999](https://github.com/thanos-io/thanos/pull/5999) *: Upgrade Alertmanager dependency to v0.25.0.
- [#5909](https://github.com/thanos-io/thanos/pull/5909) Receive: Compact tenant head after no appends have happened for 1.5 `tsdb.max-block-size`.
- [#5838](https://github.com/thanos-io/thanos/pull/5838) Mixin: Added data touched type to Store dashboard.
- [#5922](https://github.com/thanos-io/thanos/pull/5922) Compact: Retry on clean, partial marked errors when possible.
15 changes: 10 additions & 5 deletions cmd/thanos/receive.go
@@ -207,7 +207,7 @@ func runReceive(
conf.allowOutOfOrderUpload,
hashFunc,
)
writer := receive.NewWriter(log.With(logger, "component", "receive-writer"), dbs)
writer := receive.NewWriter(log.With(logger, "component", "receive-writer"), dbs, conf.writerInterning)

var limitsConfig *receive.RootLimitsConfig
if conf.writeLimitsConfig != nil {
@@ -788,8 +788,9 @@ type receiveConfig struct {
tsdbMemorySnapshotOnShutdown bool
tsdbEnableNativeHistograms bool

walCompression bool
noLockFile bool
walCompression bool
noLockFile bool
writerInterning bool

hashFunc string

@@ -832,14 +833,14 @@ func (rc *receiveConfig) registerFlag(cmd extkingpin.FlagClause) {

rc.objStoreConfig = extkingpin.RegisterCommonObjStoreFlags(cmd, "", false)

rc.retention = extkingpin.ModelDuration(cmd.Flag("tsdb.retention", "How long to retain raw samples on local storage. 0d - disables this retention. For more details on how retention is enforced for individual tenants, please refer to the Tenant lifecycle management section in the Receive documentation: https://thanos.io/tip/components/receive.md/#tenant-lifecycle-management").Default("15d"))
rc.retention = extkingpin.ModelDuration(cmd.Flag("tsdb.retention", "How long to retain raw samples on local storage. 0d - disables the retention policy (i.e. infinite retention). For more details on how retention is enforced for individual tenants, please refer to the Tenant lifecycle management section in the Receive documentation: https://thanos.io/tip/components/receive.md/#tenant-lifecycle-management").Default("15d"))

cmd.Flag("receive.hashrings-file", "Path to file that contains the hashring configuration. A watcher is initialized to watch changes and update the hashring dynamically.").PlaceHolder("<path>").StringVar(&rc.hashringsFilePath)

cmd.Flag("receive.hashrings", "Alternative to 'receive.hashrings-file' flag (lower priority). Content of file that contains the hashring configuration.").PlaceHolder("<content>").StringVar(&rc.hashringsFileContent)

hashringAlgorithmsHelptext := strings.Join([]string{string(receive.AlgorithmHashmod), string(receive.AlgorithmKetama)}, ", ")
cmd.Flag("receive.hashrings-algorithm", "The algorithm used when distributing series in the hashrings. Must be one of "+hashringAlgorithmsHelptext).
cmd.Flag("receive.hashrings-algorithm", "The algorithm used when distributing series in the hashrings. Must be one of "+hashringAlgorithmsHelptext+". Will be overwritten by the tenant-specific algorithm in the hashring config.").
Default(string(receive.AlgorithmHashmod)).
EnumVar(&rc.hashringsAlgorithm, string(receive.AlgorithmHashmod), string(receive.AlgorithmKetama))

@@ -905,6 +906,10 @@ func (rc *receiveConfig) registerFlag(cmd extkingpin.FlagClause) {
"[EXPERIMENTAL] Enables the ingestion of native histograms.").
Default("false").Hidden().BoolVar(&rc.tsdbEnableNativeHistograms)

cmd.Flag("writer.intern",
"[EXPERIMENTAL] Enables string interning in receive writer, for more optimized memory usage.").
Default("false").Hidden().BoolVar(&rc.writerInterning)

cmd.Flag("hash-func", "Specify which hash function to use when calculating the hashes of produced files. If no function has been specified, it does not happen. This permits avoiding downloading some files twice albeit at some performance cost. Possible values are: \"\", \"SHA256\".").
Default("").EnumVar(&rc.hashFunc, "SHA256", "")

35 changes: 30 additions & 5 deletions cmd/thanos/store.go
@@ -49,10 +49,16 @@ import (
"github.com/thanos-io/thanos/pkg/ui"
)

const (
retryTimeoutDuration = 30
retryIntervalDuration = 10
)

type storeConfig struct {
indexCacheConfigs extflag.PathOrContent
objStoreConfig extflag.PathOrContent
dataDir string
cacheIndexHeader bool
grpcConfig grpcConfig
httpConfig httpConfig
indexCacheSizeBytes units.Base2Bytes
@@ -87,9 +93,12 @@ func (sc *storeConfig) registerFlag(cmd extkingpin.FlagClause) {
sc.grpcConfig = *sc.grpcConfig.registerFlag(cmd)
sc.storeRateLimits.RegisterFlags(cmd)

cmd.Flag("data-dir", "Local data directory used for caching purposes (index-header, in-mem cache items and meta.jsons). If removed, no data will be lost, just store will have to rebuild the cache. NOTE: Putting raw blocks here will not cause the store to read them. For such use cases use Prometheus + sidecar.").
cmd.Flag("data-dir", "Local data directory used for caching purposes (index-header, in-mem cache items and meta.jsons). If removed, no data will be lost, just store will have to rebuild the cache. NOTE: Putting raw blocks here will not cause the store to read them. For such use cases use Prometheus + sidecar. Ignored if -no-cache-index-header option is specified.").
Default("./data").StringVar(&sc.dataDir)

cmd.Flag("cache-index-header", "Cache TSDB index-headers on disk to reduce startup time. When set to true, Thanos Store will download index headers from remote object storage on startup and create a header file on disk. Use --data-dir to set the directory in which index headers will be downloaded.").
Default("true").BoolVar(&sc.cacheIndexHeader)

cmd.Flag("index-cache-size", "Maximum size of items held in the in-memory index cache. Ignored if --index-cache.config or --index-cache.config-file option is specified.").
Default("250MB").BytesVar(&sc.indexCacheSizeBytes)

@@ -234,6 +243,11 @@ func runStore(
conf storeConfig,
flagsMap map[string]string,
) error {
dataDir := conf.dataDir
if !conf.cacheIndexHeader {
dataDir = ""
}

grpcProbe := prober.NewGRPC()
httpProbe := prober.NewHTTP()
statusProber := prober.Combine(
@@ -315,7 +329,7 @@ func runStore(
}

ignoreDeletionMarkFilter := block.NewIgnoreDeletionMarkFilter(logger, bkt, time.Duration(conf.ignoreDeletionMarksDelay), conf.blockMetaFetchConcurrency)
metaFetcher, err := block.NewMetaFetcher(logger, conf.blockMetaFetchConcurrency, bkt, conf.dataDir, extprom.WrapRegistererWithPrefix("thanos_", reg),
metaFetcher, err := block.NewMetaFetcher(logger, conf.blockMetaFetchConcurrency, bkt, dataDir, extprom.WrapRegistererWithPrefix("thanos_", reg),
[]block.MetadataFilter{
block.NewTimePartitionMetaFilter(conf.filterConf.MinTime, conf.filterConf.MaxTime),
block.NewLabelShardedMetaFilter(relabelConfig),
@@ -357,7 +371,7 @@ func runStore(
bs, err := store.NewBucketStore(
bkt,
metaFetcher,
conf.dataDir,
dataDir,
store.NewChunksLimiterFactory(conf.maxSampleCount/store.MaxSamplesPerChunk), // The samples limit is an approximation based on the max number of samples per chunk.
store.NewSeriesLimiterFactory(conf.maxTouchedSeriesCount),
store.NewBytesLimiterFactory(conf.maxDownloadedBytes),
@@ -383,14 +397,25 @@ func runStore(

level.Info(logger).Log("msg", "initializing bucket store")
begin := time.Now()
if err := bs.InitialSync(ctx); err != nil {

// This will stop retrying after set timeout duration.
initialSyncCtx, cancel := context.WithTimeout(ctx, retryTimeoutDuration*time.Second)
defer cancel()

// Retry in case of error.
err := runutil.Retry(retryIntervalDuration*time.Second, initialSyncCtx.Done(), func() error {
return bs.InitialSync(ctx)
})

if err != nil {
close(bucketStoreReady)
return errors.Wrap(err, "bucket store initial sync")
}

level.Info(logger).Log("msg", "bucket store ready", "init_duration", time.Since(begin).String())
close(bucketStoreReady)

err := runutil.Repeat(conf.syncInterval, ctx.Done(), func() error {
err = runutil.Repeat(conf.syncInterval, ctx.Done(), func() error {
if err := bs.SyncBlocks(ctx); err != nil {
level.Warn(logger).Log("msg", "syncing blocks failed", "err", err)
}
7 changes: 5 additions & 2 deletions docs/components/receive.md
@@ -259,7 +259,9 @@ Flags:
the hashring configuration.
--receive.hashrings-algorithm=hashmod
The algorithm used when distributing series in
the hashrings. Must be one of hashmod, ketama
the hashrings. Must be one of hashmod, ketama.
Will be overwritten by the tenant-specific
algorithm in the hashring config.
--receive.hashrings-file=<path>
Path to file that contains the hashring
configuration. A watcher is initialized
@@ -366,7 +368,8 @@ Flags:
next startup.
--tsdb.path="./data" Data directory of TSDB.
--tsdb.retention=15d How long to retain raw samples on local
storage. 0d - disables this retention.
storage. 0d - disables the retention
policy (i.e. infinite retention).
For more details on how retention is
enforced for individual tenants, please
refer to the Tenant lifecycle management
2 changes: 1 addition & 1 deletion docs/components/rule.md
@@ -158,7 +158,7 @@ The most important metrics to alert on are:

* `prometheus_rule_evaluation_failures_total`. If greater than 0, it means that that rule failed to be evaluated, which results in either gap in rule or potentially ignored alert. This metric might indicate problems on the queryAPI endpoint you use. Alert heavily on this if this happens for longer than your alert thresholds. `strategy` label will tell you if failures comes from rules that tolerate [partial response](#partial-response) or not.

* `prometheus_rule_group_last_duration_seconds < prometheus_rule_group_interval_seconds` If the difference is large, it means that rule evaluation took more time than the scheduled interval. It can indicate that your query backend (e.g Querier) takes too much time to evaluate the query, i.e. that it is not fast enough to fill the rule. This might indicate other problems like slow StoreAPis or too complex query expression in rule.
* `prometheus_rule_group_last_duration_seconds > prometheus_rule_group_interval_seconds` If the difference is positive, it means that rule evaluation took more time than the scheduled interval, and data for some intervals could be missing. It can indicate that your query backend (e.g Querier) takes too much time to evaluate the query, i.e. that it is not fast enough to fill the rule. This might indicate other problems like slow StoreAPis or too complex query expression in rule.

* `thanos_rule_evaluation_with_warnings_total`. If you choose to use Rules and Alerts with [partial response strategy's](#partial-response) value as "warn", this metric will tell you how many evaluation ended up with some kind of warning. To see the actual warnings see WARN log level. This might suggest that those evaluations return partial response and might not be accurate.

13 changes: 10 additions & 3 deletions docs/components/store.md
@@ -36,6 +36,12 @@ Flags:
Number of goroutines to use when constructing
index-cache.json blocks from object storage.
Must be equal or greater than 1.
--cache-index-header Cache TSDB index-headers on disk to reduce
startup time. When set to true, Thanos Store
will download index headers from remote object
storage on startup and create a header file on
disk. Use --data-dir to set the directory in
which index headers will be downloaded.
--chunk-pool-size=2GB Maximum size of concurrently allocatable
bytes reserved strictly to reuse for chunks in
memory.
@@ -47,9 +53,10 @@ Flags:
purposes (index-header, in-mem cache items and
meta.jsons). If removed, no data will be lost,
just store will have to rebuild the cache.
NOTE: Putting raw blocks here will not cause
the store to read them. For such use cases use
Prometheus + sidecar.
NOTE: Putting raw blocks here will not
cause the store to read them. For such use
cases use Prometheus + sidecar. Ignored if
-no-cache-index-header option is specified.
--grpc-address="0.0.0.0:10901"
Listen ip:port address for gRPC endpoints
(StoreAPI). Make sure this address is routable
16 changes: 12 additions & 4 deletions docs/quick-tutorial.md
@@ -34,13 +34,21 @@ Following the [KISS](https://en.wikipedia.org/wiki/KISS_principle) and Unix phil
* Querier/Query: implements Prometheus's v1 API to aggregate data from the underlying components.
* Query Frontend: implements Prometheus's v1 API proxies it to Query while caching the response and optional splitting by queries day.

Deployment with Sidecar:
Deployment with Sidecar for Kubernetes:

![Sidecar](https://docs.google.com/drawings/d/e/2PACX-1vTBFKKgf8YDInJyRakPE8eZZg9phTlOsBB2ogNkFvhNGbZ8YDvz_cGMbxWZBG1G6hpsQfSX145FpYcv/pub?w=960&h=720)
<!---
Source file to copy and edit: https://docs.google.com/drawings/d/1AiMc1qAjASMbtqL6PNs0r9-ynGoZ9LIAtf0b9PjILxw/edit?usp=sharing
-->

Deployment with Receive:
![Sidecar](https://docs.google.com/drawings/d/e/2PACX-1vSJd32gPh8-MC5Ko0-P-v1KQ0Xnxa0qmsVXowtkwVGlczGfVW-Vd415Y6F129zvh3y0vHLBZcJeZEoz/pub?w=960&h=720)

![Receive](https://docs.google.com/drawings/d/e/2PACX-1vTfko27YB_3ab7ZL8ODNG5uCcrpqKxhmqaz3lW-yhGN3_oNxkTrqXmwwlcZjaWf3cGgAJIM4CMwwkEV/pub?w=960&h=720)
Deployment with Receive in order to scale out or implement with other remote write compatible sources:

<!---
Source file to copy and edit: https://docs.google.com/drawings/d/1iimTbcicKXqz0FYtSfz04JmmVFLVO9BjAjEzBm5538w/edit?usp=sharing
-->

![Receive](https://docs.google.com/drawings/d/e/2PACX-1vRdYP__uDuygGR5ym1dxBzU6LEx5v7Rs1cAUKPsl5BZrRGVl5YIj5lsD_FOljeIVOGWatdAI9pazbCP/pub?w=960&h=720)

### Sidecar

2 changes: 1 addition & 1 deletion docs/storage.md
@@ -644,7 +644,7 @@ total 2209344
drwxr-xr-x 2 bwplotka bwplotka 4096 Dec 10 2019 chunks
-rw-r--r-- 1 bwplotka bwplotka 1962383742 Dec 10 2019 index
-rw-r--r-- 1 bwplotka bwplotka 6761 Dec 10 2019 meta.json
-rw-r--r-- 1 bwplotka bwplotka 111 Dec 10 2019 delete-mark.json # <-- Optional marker.
-rw-r--r-- 1 bwplotka bwplotka 111 Dec 10 2019 deletion-mark.json # <-- Optional marker.
-rw-r--r-- 1 bwplotka bwplotka 124 Dec 10 2019 no-compact-mark.json # <-- Optional marker.
01DN3SK96XDAEKRB1AN30AAW6E/chunks:
2 changes: 1 addition & 1 deletion examples/dashboards/rule.json
@@ -197,7 +197,7 @@
"steppedLine": false,
"targets": [
{
"expr": "(\n max by(job, rule_group) (prometheus_rule_group_last_duration_seconds{job=~\"$job\"})\n >\n sum by(job, rule_group) (prometheus_rule_group_interval_seconds{job=~\"$job\"})\n)\n",
"expr": "(\n sum by(job, rule_group) (prometheus_rule_group_last_duration_seconds{job=~\"$job\"})\n >\n sum by(job, rule_group) (prometheus_rule_group_interval_seconds{job=~\"$job\"})\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ rule_group }}",
3 changes: 3 additions & 0 deletions go.mod
@@ -122,9 +122,12 @@ require (

require (
go.opentelemetry.io/contrib/propagators/autoprop v0.34.0
go4.org/intern v0.0.0-20220617035311-6925f38cc365
golang.org/x/exp v0.0.0-20221212164502-fae10dda9338
)

require go4.org/unsafe/assume-no-moving-gc v0.0.0-20220617031537-928513b29760 // indirect

require (
cloud.google.com/go/compute/metadata v0.2.2 // indirect
github.com/GoogleCloudPlatform/opentelemetry-operations-go/internal/resourcemapping v0.32.3 // indirect