From a6845ad75f132edc2e557d4f4d18f4cde62dd5fb Mon Sep 17 00:00:00 2001
From: Bartlomiej Plotka
Date: Tue, 14 Apr 2020 17:42:14 +0100
Subject: [PATCH 1/6] proposal: Added proposal for new Thanos component: Thanos Frontend.

Signed-off-by: Bartlomiej Plotka
---
 .../202004_embedd_cortex_frontend.md | 119 ++++++++++++++++++
 docs/proposals/_index.md | 10 ++
 2 files changed, 129 insertions(+)
 create mode 100644 docs/proposals/202004_embedd_cortex_frontend.md

diff --git a/docs/proposals/202004_embedd_cortex_frontend.md b/docs/proposals/202004_embedd_cortex_frontend.md
new file mode 100644
index 0000000000..fc31aec74f
--- /dev/null
+++ b/docs/proposals/202004_embedd_cortex_frontend.md
@@ -0,0 +1,119 @@
+---
+title: Adding a New Thanos Component that Embeds Cortex Query Frontend
+type: proposal
+menu: proposals
+status: in-review
+owner: bwplotka
+---
+
+### Related Tickets
+
+* Response caching: https://github.com/thanos-io/thanos/issues/1651
+* Moving query frontend to separate repo: https://github.com/cortexproject/cortex/issues/1672
+
+## Summary
+
+This proposal describes addition of a new Thanos command (component) into `cmd/thanos` called `frontend`.
+This component will literally import a certain version of Cortex [frontend package](https://github.com/cortexproject/cortex/tree/4410bed704e7d8f63418b02b328ddb93d99fad0b/pkg/querier/frontend).
+
+We will go through rationales, and potential alternatives.
+
+## Motivation
+
+[Cortex Frontend](https://www.youtube.com/watch?v=eyBbImSDOrI&t=2s) was introduced by Tom in August 2019. It was designed
+to be deployed in front of Prometheus Query API in order to ensure:
+
+* Query split by time.
+* Query step alignment.
+* Query retry logic
+* Query limit logic
+* Query response cache in memory, Memcached or Redis.
+
+Since the nature of Cortex backend is really similar to Thanos, with exactly the same PromQL API, and long term capabilities, the caching
+work done for Cortex fits to Thanos. Given also our good collaboration in the past, it feels natural to reuse Cortex's code.
+We even started discussion to move it to separate repo, but there was no motivation towards this, for a good reason.
+At the end we were advertising to use cortex query frontend on production on top of Thanos and this works considerably well, with some
+problems on edge cases and for downsampled data as mentioned [here]()
+
+However, we realized recently that asking users to install suddenly Cortex component on top of Thanos system is extremely confusing:
+
+* Cortex has totally different way of configuring services. It requires to decide what module you have in single YAML file. Thanos in opposite
+have flags and different subcommand for each component.
+* Cortex has bit different way of configuring memcached, which is inconsistent with what we have in Thanos Stote Gateway.
+* There are many Cortex specific configuration items which can confuse Thanos user and increase complexity overall (e.g. )
+* We have many ideas how to improve Cortex Query Frontend on top of Thanos, but adding Thanos specific configuration options will increase
+complexity on Cortex side as well.
+* Cortex has no good example or tutorial on how to use frontend either. We have only [Observatorium example](https://github.com/observatorium/configuration/blob/master/environments/openshift/manifests/observatorium-template.yaml#L515).
+
+All of this were causing confusion and questions like [this](https://cloud-native.slack.com/archives/CK5RSSC10/p1586504362400300?thread_ts=1586492170.387900&cid=CK5RSSC10).
+
+At the end we decided with Thanos and Cortex maintainers that the ultimately it would be awesome to create a new Thanos subcommand called `frontend`.
+
+## Use Cases
+
+* User can cache responses for query range
+* User can use the same configuration patterns as rest of Thanos components.
+* Thanos Developer does not need to write and maintain custom response cache logic.
+
+## Goals of this design
+
+* Enable response caching that will easy to use for Thanos users.
+* Keep it extensible and scalable for future improvements like advanced query planning, queuing, rate limiting etc.
+* Reuse as much as possible between projects, contribute.
+
+## No Goals
+
+* Create Thanos specific response caching from scratch.
+
+## Proposal
+
+The idea is to create `thanos frontend` component that allows specifying following options:
+
+* `--query-range.split-interval`, `time.Duration`
+* `--query-range.max-retries-per-request`, `int`, default = `5`
+* `--query-range.disable-step-align`, `bool`
+* `--query-range.response-cache-ttl` `time.Duration` default = `1m`
+* `--query-range.response-cache-config(-file)` `pathorcontent` + [CacheConfig](https://github.com/thanos-io/thanos/blob/55cb8ca38b3539381dc6a781e637df15c694e50a/pkg/store/cache/factory.go#L32)
+
+We plan to have in-mem, fifo and memcached support for now. Cache config will be exactly the same as the one used for Store Gateway.
+
+### Open Questions
+
+* Is `thanos frontend` a valid name? I feel it's short and verbose enough.
+
+### Alternatives
+
+#### Don't add anything, document Cortex query frontend and add examples of usage
+
+Unfortunately we tried this path already without success. Reasons were mentioned in [Motivation](202004_embedd_cortex_frontend.md#Motivation)
+
+#### Add response caching to Querier itself, in the same binary.
+
+This will definitely simplify deployment if Querier would allow caching directly. However, this way is not really scalable.
+
+Furthermore, eventually frontend will be responsible for more than just caching. It is meant to do query planning like splitting or even
+advanced query paralization (query sharding). This might mean future improvements in terms of query scheduling, queuing and retrying.
+This means that at some point we would need an ability to scale query part and caching/query planner totally separately.
+
+NOTE: We can still consider just simple response caching inside Querier if user will request so.
+
+#### Write response caching from scratch.
+
+I think this does not need to be explained. Response caching has proven to be not trivial. It's really amazing that we
+have opportunity to work towards something that works with experts in the field like @tomwilkie and others from Loki and Cortex Team.
+
+Overall, [Reusing is caring](https://www.bwplotka.dev/2020/how-to-became-oss-maintainer/#5-want-more-help-give-back-help-others).
+
+## Work Plan
+
+1. Refactor [IndexCacheConfig](https://github.com/thanos-io/thanos/blob/55cb8ca38b3539381dc6a781e637df15c694e50a/pkg/store/cache/factory.go#L32) to generic cache config so we can reuse.
+1. Add necessary changes to Cortex frontend
+ * Metric generalization (they are globals now).
+1. Add `thanos frontend` subcommand.
+1. Add proper e2e test using cache.
+1. Document new subcommand
+1. Add to [kube-thanos](https://github.com/thanos-io/kube-thanos0)
+
+## Future Work
+
+Improvements to Cortex query frontend, so Thanos `frontend` as described [here](https://github.com/thanos-io/thanos/issues/1651)
diff --git a/docs/proposals/_index.md b/docs/proposals/_index.md
index 906e867f16..7a77eb8108 100644
--- a/docs/proposals/_index.md
+++ b/docs/proposals/_index.md
@@ -1,3 +1,13 @@
 ---
 title: "Proposals:"
 ---
+
+List of current proposals.
+
+Proposals can have 5 Statuses (`.Params.Status`):
+
+* accepted
+* complete
+* rejected
+* in-review
+* draft
\ No newline at end of file

From c7d16c027326c9880220034209268a1ddcecc859 Mon Sep 17 00:00:00 2001
From: Bartlomiej Plotka
Date: Tue, 14 Apr 2020 18:20:24 +0100
Subject: [PATCH 2/6] Added more rationales for separate binary.
Signed-off-by: Bartlomiej Plotka
---
 docs/proposals/202004_embedd_cortex_frontend.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/proposals/202004_embedd_cortex_frontend.md b/docs/proposals/202004_embedd_cortex_frontend.md
index fc31aec74f..23b6b5d23b 100644
--- a/docs/proposals/202004_embedd_cortex_frontend.md
+++ b/docs/proposals/202004_embedd_cortex_frontend.md
@@ -95,6 +95,8 @@ Furthermore, eventually frontend will be responsible for more than just caching.
 advanced query paralization (query sharding). This might mean future improvements in terms of query scheduling, queuing and retrying.
 This means that at some point we would need an ability to scale query part and caching/query planner totally separately.

+Last but not least splitting queries allows to perform request in parallel. Only if used in single binary we can achieve load balancing of those requests.
+
 NOTE: We can still consider just simple response caching inside Querier if user will request so.

 #### Write response caching from scratch.

From 589d5d9c9da738915b9c3980bc69c364c32924d8 Mon Sep 17 00:00:00 2001
From: Bartlomiej Plotka
Date: Wed, 15 Apr 2020 09:25:16 +0100
Subject: [PATCH 3/6] Addressed Marco comments.

Signed-off-by: Bartlomiej Plotka
---
 .../202004_embedd_cortex_frontend.md | 29 ++++++++++++++-----
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/docs/proposals/202004_embedd_cortex_frontend.md b/docs/proposals/202004_embedd_cortex_frontend.md
index 23b6b5d23b..8d78a9ff83 100644
--- a/docs/proposals/202004_embedd_cortex_frontend.md
+++ b/docs/proposals/202004_embedd_cortex_frontend.md
@@ -33,14 +33,14 @@ Since the nature of Cortex backend is really similar to Thanos, with exactly the
 work done for Cortex fits to Thanos. Given also our good collaboration in the past, it feels natural to reuse Cortex's code.
 We even started discussion to move it to separate repo, but there was no motivation towards this, for a good reason.
 At the end we were advertising to use cortex query frontend on production on top of Thanos and this works considerably well, with some
-problems on edge cases and for downsampled data as mentioned [here]()
+problems on edge cases and for downsampled data as mentioned [here](https://github.com/thanos-io/thanos/issues/1651).

 However, we realized recently that asking users to install suddenly Cortex component on top of Thanos system is extremely confusing:

-* Cortex has totally different way of configuring services. It requires to decide what module you have in single YAML file. Thanos in opposite
+* Cortex has totally different way of configuring services. It requires deciding what module you have in single YAML file. Thanos in opposite
 have flags and different subcommand for each component.
-* Cortex has bit different way of configuring memcached, which is inconsistent with what we have in Thanos Stote Gateway.
-* There are many Cortex specific configuration items which can confuse Thanos user and increase complexity overall (e.g. )
+* Cortex has bit different way of configuring memcached, which is inconsistent with what we have in Thanos Store Gateway.
+* There are many Cortex specific configuration items which can confuse Thanos user and increase complexity overall.
 * We have many ideas how to improve Cortex Query Frontend on top of Thanos, but adding Thanos specific configuration options will increase
 complexity on Cortex side as well.
 * Cortex has no good example or tutorial on how to use frontend either. We have only [Observatorium example](https://github.com/observatorium/configuration/blob/master/environments/openshift/manifests/observatorium-template.yaml#L515).
@@ -72,15 +72,26 @@ The idea is to create `thanos frontend` component that allows specifying followi
 * `--query-range.split-interval`, `time.Duration`
 * `--query-range.max-retries-per-request`, `int`, default = `5`
 * `--query-range.disable-step-align`, `bool`
-* `--query-range.response-cache-ttl` `time.Duration` default = `1m`
+* `--query-range.response-cache-ttl` `time.Duration`
+* `--query-range.response-cache-max-freshness` `time.Duration` default = `1m`
 * `--query-range.response-cache-config(-file)` `pathorcontent` + [CacheConfig](https://github.com/thanos-io/thanos/blob/55cb8ca38b3539381dc6a781e637df15c694e50a/pkg/store/cache/factory.go#L32)

 We plan to have in-mem, fifo and memcached support for now. Cache config will be exactly the same as the one used for Store Gateway.

+This command will be placeholder for any query planning or queueing logic that we might want to add at some point. It will be not part of any gRPC API.
+
+To make this happen we will propose a small refactor in Cortex code to avoid unnecessary package dependencies.
+
 ### Open Questions

 * Is `thanos frontend` a valid name? I feel it's short and verbose enough.

+Other, considered options:
+* `loadbalance`, `lb`
+* `ingress`
+* `edge`
+* `planner`
+
 ### Alternatives

 #### Don't add anything, document Cortex query frontend and add examples of usage
@@ -92,12 +103,12 @@ Unfortunately we tried this path already without success. Reasons were mentioned
 This will definitely simplify deployment if Querier would allow caching directly. However, this way is not really scalable.

 Furthermore, eventually frontend will be responsible for more than just caching. It is meant to do query planning like splitting or even
-advanced query paralization (query sharding). This might mean future improvements in terms of query scheduling, queuing and retrying.
+advanced query parallelization (query sharding). This might mean future improvements in terms of query scheduling, queuing and retrying.
 This means that at some point we would need an ability to scale query part and caching/query planner totally separately.

 Last but not least splitting queries allows to perform request in parallel. Only if used in single binary we can achieve load balancing of those requests.

-NOTE: We can still consider just simple response caching inside Querier if user will request so.
+NOTE: We can still consider just simple response caching inside the Querier if user will request so.

 #### Write response caching from scratch.

@@ -109,12 +120,14 @@ Overall, [Reusing is caring](https://www.bwplotka.dev/2020/how-to-became-oss-mai
 ## Work Plan

 1. Refactor [IndexCacheConfig](https://github.com/thanos-io/thanos/blob/55cb8ca38b3539381dc6a781e637df15c694e50a/pkg/store/cache/factory.go#L32) to generic cache config so we can reuse.
+Make it implement cortex cache.Cache interface.
 1. Add necessary changes to Cortex frontend
  * Metric generalization (they are globals now).
+ * Avoid unnecessary dependencies.
 1. Add `thanos frontend` subcommand.
 1. Add proper e2e test using cache.
 1. Document new subcommand
-1. Add to [kube-thanos](https://github.com/thanos-io/kube-thanos0)
+1. Add to [kube-thanos](https://github.com/thanos-io/kube-thanos)

 ## Future Work

From 8ba9e6e69e676391a43183bfe725ab0581dff21c Mon Sep 17 00:00:00 2001
From: Bartlomiej Plotka
Date: Wed, 15 Apr 2020 18:42:55 +0100
Subject: [PATCH 4/6] Addressed lucas comments.
Signed-off-by: Bartlomiej Plotka
---
 .../202004_embedd_cortex_frontend.md | 42 ++++++++-----------
 1 file changed, 18 insertions(+), 24 deletions(-)

diff --git a/docs/proposals/202004_embedd_cortex_frontend.md b/docs/proposals/202004_embedd_cortex_frontend.md
index 8d78a9ff83..181b7e39c3 100644
--- a/docs/proposals/202004_embedd_cortex_frontend.md
+++ b/docs/proposals/202004_embedd_cortex_frontend.md
@@ -9,12 +9,13 @@ owner: bwplotka
 ### Related Tickets

 * Response caching: https://github.com/thanos-io/thanos/issues/1651
-* Moving query frontend to separate repo: https://github.com/cortexproject/cortex/issues/1672
+* Moving query frontend to separate repo: https://github.com/cortexproject/Cortex/issues/1672
+* Discussion about naming: https://cloud-native.slack.com/archives/CK5RSSC10/p1586939369171300

 ## Summary

-This proposal describes addition of a new Thanos command (component) into `cmd/thanos` called `frontend`.
-This component will literally import a certain version of Cortex [frontend package](https://github.com/cortexproject/cortex/tree/4410bed704e7d8f63418b02b328ddb93d99fad0b/pkg/querier/frontend).
+This proposal describes addition of a new Thanos command (component) into `cmd/thanos` called `query-frontend`
+This component will literally import a certain version of Cortex [frontend package](https://github.com/cortexproject/Cortex/tree/4410bed704e7d8f63418b02b328ddb93d99fad0b/pkg/querier/frontend).

 We will go through rationales, and potential alternatives.

@@ -31,8 +32,10 @@ to be deployed in front of Prometheus Query API in order to ensure:

 Since the nature of Cortex backend is really similar to Thanos, with exactly the same PromQL API, and long term capabilities, the caching
 work done for Cortex fits to Thanos. Given also our good collaboration in the past, it feels natural to reuse Cortex's code.
-We even started discussion to move it to separate repo, but there was no motivation towards this, for a good reason.
-At the end we were advertising to use cortex query frontend on production on top of Thanos and this works considerably well, with some
+We even started discussion to move it to separate repo, but there was no motivation towards this, since there is no issue on using
+the Cortex one, as Cortex is happy to take generalized contributions.
+
+At the end we were advertising to use Cortex query frontend on production on top of Thanos and this works considerably well, with some
 problems on edge cases and for downsampled data as mentioned [here](https://github.com/thanos-io/thanos/issues/1651).

 However, we realized recently that asking users to install suddenly Cortex component on top of Thanos system is extremely confusing:
@@ -43,25 +46,26 @@ have flags and different subcommand for each component.
 * There are many Cortex specific configuration items which can confuse Thanos user and increase complexity overall.
 * We have many ideas how to improve Cortex Query Frontend on top of Thanos, but adding Thanos specific configuration options will increase
 complexity on Cortex side as well.
-* Cortex has no good example or tutorial on how to use frontend either. We have only [Observatorium example](https://github.com/observatorium/configuration/blob/master/environments/openshift/manifests/observatorium-template.yaml#L515).
+* Cortex has no good example or tutorial on how to use frontend either. We have only [Observatorium example](https://github.com/observatorium/configuration/blob/5129a8beb9507f29aec05566ca9a0f2ad82bbf76/environments/openshift/manifests/observatorium-template.yaml#L515).

 All of this were causing confusion and questions like [this](https://cloud-native.slack.com/archives/CK5RSSC10/p1586504362400300?thread_ts=1586492170.387900&cid=CK5RSSC10).

-At the end we decided with Thanos and Cortex maintainers that the ultimately it would be awesome to create a new Thanos subcommand called `frontend`.
+At the end we decided with Thanos and Cortex maintainers that, ultimately, it would be awesome to create a new Thanos service called `query-frontend`.

 ## Use Cases

-* User can cache responses for query range
-* User can use the same configuration patterns as rest of Thanos components.
-* Thanos Developer does not need to write and maintain custom response cache logic.
+* User can cache responses for query range.
+* User can split query range queries.
+* User can rate limit and retry range queries.

 ## Goals of this design

 * Enable response caching that will easy to use for Thanos users.
 * Keep it extensible and scalable for future improvements like advanced query planning, queuing, rate limiting etc.
 * Reuse as much as possible between projects, contribute.
+* Use the same configuration patterns as rest of Thanos components.

-## No Goals
+## Non Goals

 * Create Thanos specific response caching from scratch.

@@ -82,23 +86,13 @@ This command will be placeholder for any query planning or queueing logic that w

 To make this happen we will propose a small refactor in Cortex code to avoid unnecessary package dependencies.

-### Open Questions
-
-* Is `thanos frontend` a valid name? I feel it's short and verbose enough.
-
-Other, considered options:
-* `loadbalance`, `lb`
-* `ingress`
-* `edge`
-* `planner`
-
 ### Alternatives

 #### Don't add anything, document Cortex query frontend and add examples of usage

 Unfortunately we tried this path already without success. Reasons were mentioned in [Motivation](202004_embedd_cortex_frontend.md#Motivation)

-#### Add response caching to Querier itself, in the same binary.
+#### Add response caching to Querier itself, in the same process.

 This will definitely simplify deployment if Querier would allow caching directly. However, this way is not really scalable.

@@ -114,11 +108,11 @@ Overall, [Reusing is caring](https://www.bwplotka.dev/2020/how-to-became-oss-mai
 ## Work Plan

 1. Refactor [IndexCacheConfig](https://github.com/thanos-io/thanos/blob/55cb8ca38b3539381dc6a781e637df15c694e50a/pkg/store/cache/factory.go#L32) to generic cache config so we can reuse.
-Make it implement cortex cache.Cache interface.
+Make it implement Cortex cache.Cache interface.
 1. Add necessary changes to Cortex frontend
  * Metric generalization (they are globals now).
  * Avoid unnecessary dependencies.
-1. Add `thanos frontend` subcommand.
+1. Add `thanos frontend` subcommand. Call it `query-frontend` in docs and in the communication.
 1. Add proper e2e test using cache.
 1. Document new subcommand
 1. Add to [kube-thanos](https://github.com/thanos-io/kube-thanos)

From adb66003af914a0e2c76f36a4a3ba353595909d1 Mon Sep 17 00:00:00 2001
From: Bartlomiej Plotka
Date: Wed, 15 Apr 2020 18:55:32 +0100
Subject: [PATCH 5/6] Changed to approved.

Signed-off-by: Bartlomiej Plotka
---
 docs/proposals/202004_embedd_cortex_frontend.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/proposals/202004_embedd_cortex_frontend.md b/docs/proposals/202004_embedd_cortex_frontend.md
index 181b7e39c3..4a0b76ed63 100644
--- a/docs/proposals/202004_embedd_cortex_frontend.md
+++ b/docs/proposals/202004_embedd_cortex_frontend.md
@@ -2,7 +2,7 @@
 title: Adding a New Thanos Component that Embeds Cortex Query Frontend
 type: proposal
 menu: proposals
-status: in-review
+status: approved
 owner: bwplotka
 ---

From 2eb0e6646f60665f9ea57a00550f9fc3b84e8611 Mon Sep 17 00:00:00 2001
From: Bartlomiej Plotka
Date: Thu, 16 Apr 2020 09:57:29 +0100
Subject: [PATCH 6/6] Moved to query-frontend command.
Signed-off-by: Bartlomiej Plotka
---
 docs/proposals/202004_embedd_cortex_frontend.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/proposals/202004_embedd_cortex_frontend.md b/docs/proposals/202004_embedd_cortex_frontend.md
index 4a0b76ed63..da07c0e222 100644
--- a/docs/proposals/202004_embedd_cortex_frontend.md
+++ b/docs/proposals/202004_embedd_cortex_frontend.md
@@ -71,7 +71,7 @@ At the end we decided with Thanos and Cortex maintainers that, ultimately, it wo

 ## Proposal

-The idea is to create `thanos frontend` component that allows specifying following options:
+The idea is to create `thanos query-frontend` component that allows specifying following options:

 * `--query-range.split-interval`, `time.Duration`
 * `--query-range.max-retries-per-request`, `int`, default = `5`
@@ -118,11 +118,11 @@ Make it implement Cortex cache.Cache interface.
 1. Add necessary changes to Cortex frontend
  * Metric generalization (they are globals now).
  * Avoid unnecessary dependencies.
-1. Add `thanos frontend` subcommand. Call it `query-frontend` in docs and in the communication.
+1. Add `thanos query-frontend` subcommand.
 1. Add proper e2e test using cache.
 1. Document new subcommand
 1. Add to [kube-thanos](https://github.com/thanos-io/kube-thanos)

 ## Future Work

-Improvements to Cortex query frontend, so Thanos `query-frontend` as described [here](https://github.com/thanos-io/thanos/issues/1651)
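
Illustrative sketch, not part of the patches above: the Motivation section lists "Query step alignment" and "Query split by time" among the things the frontend does, and the proposal exposes them through `--query-range.disable-step-align` and `--query-range.split-interval`. The following is a minimal sketch of what those two transformations could look like, written in Python rather than the Go used by Cortex and Thanos. The function names `align_to_step` and `split_by_interval` are hypothetical, timestamps are assumed to be integer milliseconds, and sub-ranges are treated as half-open for simplicity. It is not the Cortex implementation.

```python
from typing import List, Tuple


def align_to_step(start_ms: int, end_ms: int, step_ms: int) -> Tuple[int, int]:
    """Round start and end down to a multiple of the step.

    Assumption: aligning makes repeated dashboard queries hit the same
    timestamps, so cached responses can be reused.
    """
    if step_ms <= 0:
        raise ValueError("step_ms must be positive")
    return start_ms - start_ms % step_ms, end_ms - end_ms % step_ms


def split_by_interval(start_ms: int, end_ms: int, interval_ms: int) -> List[Tuple[int, int]]:
    """Cut [start_ms, end_ms) into sub-ranges that end on interval boundaries.

    Each sub-range can then be sent downstream (and cached) independently.
    """
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    sub_ranges = []
    cursor = start_ms
    while cursor < end_ms:
        # Next multiple of interval_ms strictly greater than cursor.
        boundary = (cursor // interval_ms + 1) * interval_ms
        sub_end = min(boundary, end_ms)
        sub_ranges.append((cursor, sub_end))
        cursor = sub_end
    return sub_ranges
```

For example, under these assumptions a range of `0..250` with an interval of `100` becomes three sub-queries, which is what lets the requests run in parallel and be load balanced as the "Alternatives" section argues. A real implementation would also have to account for Prometheus range queries being inclusive at both ends, so that a boundary step is not evaluated twice.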