query: Built-in cache for Thanos Querier backed up by Memcached #1006

bwplotka · 2019-04-03T10:37:47Z

We are missing a caching layer for the Querier.

There are multiple design choices we need to make:

Should it be built into the Querier or be a separate proxy?
Should we use a Memcached backend? Should we support any others?
How should cache items be structured?
Should we cache QueryAPI results or actually StoreAPI results?
Should we do it near the QueryAPI, or on the federated Querier as well?
Should we just NOT do it and leave it fully to Trickster: https://github.com/Comcast/trickster

AC (acceptance criteria):

Produce a design proposal
Produce a PoC with rough benchmarks

My plan is to start a yolo PoC for this while reusing the awesome code that @tomwilkie created for Cortex: https://sourcegraph.com/github.com/cortexproject/cortex@1d0ff216199e43b7b221774b5cd56936e7d22440/-/blob/pkg/querier/frontend/frontend.go#L103 I hope I can just import it and "run" it =D, but I will probably bump into import issues; we will see.

Initial thoughts? Feedback? This issue is mostly for tracking; a proper proposal will come after a short spike.

bwplotka · 2019-04-03T10:45:13Z

Initial thoughts:

Should it be built into the Querier or be a separate proxy?

Built-in as a first step, as we might want to make it more complex in the future (e.g. mixed caching of results for QueryAPI and StoreAPI), plus it already alters the query a bit (chops it, aligns it). We can always produce a proxy-like component in the future.
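To illustrate what "chop it, align" means for a range query, here is a minimal Go sketch in the spirit of the Cortex query-frontend middlewares, not their actual code. It assumes non-negative millisecond timestamps, an interval (e.g. one day) that is a multiple of the step, and uses hypothetical names (Range, alignToStep, splitByInterval). Alignment makes near-identical queries hit the same cache entry; splitting lets each sub-range be cached and reused independently.

```go
package main

import "fmt"

// Range is a hypothetical inclusive [Start, End] time range in milliseconds.
type Range struct{ Start, End int64 }

// alignToStep rounds start and end down to multiples of step, so queries that
// differ only by a few milliseconds map to the same cacheable range.
func alignToStep(start, end, step int64) (int64, int64) {
	return start - start%step, end - end%step
}

// splitByInterval chops a step-aligned range into sub-ranges that never cross
// an interval boundary. It assumes interval is a multiple of step.
func splitByInterval(start, end, step, interval int64) []Range {
	var out []Range
	for s := start; s <= end; {
		boundary := (s/interval + 1) * interval
		e := boundary - step // last step-aligned timestamp before the boundary
		if e > end {
			e = end
		}
		out = append(out, Range{Start: s, End: e})
		s = e + step
	}
	return out
}

func main() {
	start, end := alignToStep(1005, 2517, 10)
	fmt.Println(start, end)                            // 1000 2510
	fmt.Println(splitByInterval(start, end, 10, 1000)) // [{1000 1990} {2000 2510}]
}
```

Only the last sub-range touches "now", so older sub-ranges can stay in the cache while the most recent one is re-evaluated.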

Should we use a Memcached backend? Should we support any others?

I have had a very good experience with Memcached so far; we have used it everywhere. Again, there will be dependency hell and code scope creep if we allow ANY backend, so we need to be careful.

How should cache items be structured?

🤷‍♂️ Need to dive into the Cortex and Trickster caches.

Should we cache QueryAPI results or actually StoreAPI results?

Results are an easy win for now, but I feel like something in the middle (caching PromQL evaluations) might be better. I think we should start with QueryAPI results, benchmark, and iterate. It is also worth syncing with the Cortex guys on this, since they are solving the same problem.
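One possible way to key cached QueryAPI results is sketched below. This is a hypothetical scheme, not a decided design: it assumes each cache entry holds the result of one step-aligned sub-range, identified by the PromQL string, the step, and the sub-range start. The key is hashed because Memcached keys are limited to 250 bytes and may not contain whitespace or control characters, while PromQL text can be arbitrarily long and contain spaces. An in-memory map stands in for Memcached.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// resultsCacheKey builds a hypothetical key for one cached QueryAPI sub-range.
// Hashing keeps the key short and free of characters Memcached rejects.
func resultsCacheKey(query string, step, rangeStart int64) string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%s:%d:%d", query, step, rangeStart)))
	return "qr:" + hex.EncodeToString(sum[:])
}

func main() {
	// An in-memory map stands in for Memcached in this sketch.
	cache := map[string][]byte{}
	key := resultsCacheKey(`sum(rate(http_requests_total[5m]))`, 30000, 1554249600000)
	if _, ok := cache[key]; !ok {
		// On a miss, the real query would run here and its result be stored.
		cache[key] = []byte(`{"status":"success","data":{}}`)
	}
	fmt.Println(len(key), len(cache)) // 67 1
}
```

Including the step in the key matters: the same query over the same range at a different resolution returns different samples.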

Should we do it near the QueryAPI, or on the federated Querier as well?

QueryAPI for now, as we care about caching results as a first step.

Should we just NOT do it and leave it fully to Trickster: https://github.com/Comcast/trickster

IMO, no, as Trickster is not working well for users, mostly because it does not understand the PartialResponse strategies the Querier allows. Also, we would be forced to use results caching only.
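A cache that understands partial responses could, for example, refuse to store them. The Go sketch below assumes that partial results surface as entries in the "warnings" field of the Prometheus-style HTTP API response, and that only complete, successful responses should be cached; apiResponse and cacheable are hypothetical names. A cache unaware of this would keep serving an incomplete result even after the missing StoreAPIs are back.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiResponse mirrors the top-level fields of a Prometheus-style API response.
type apiResponse struct {
	Status   string          `json:"status"`
	Data     json.RawMessage `json:"data"`
	Warnings []string        `json:"warnings,omitempty"`
}

// cacheable reports whether a response body is safe to cache: it must be a
// successful response with no warnings (i.e. not a partial response).
func cacheable(body []byte) (bool, error) {
	var r apiResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return false, err
	}
	return r.Status == "success" && len(r.Warnings) == 0, nil
}

func main() {
	full := []byte(`{"status":"success","data":{"resultType":"matrix","result":[]}}`)
	partial := []byte(`{"status":"success","data":{"resultType":"matrix","result":[]},"warnings":["store unavailable"]}`)
	for _, b := range [][]byte{full, partial} {
		ok, err := cacheable(b)
		fmt.Println(ok, err) // true <nil>, then false <nil>
	}
}
```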

SuperQ · 2019-04-03T11:05:32Z

From my observation the two most popular external network/cluster cache protocols right now are memcached and Redis. Both have support for self-hosting and cloud providers offer them as a service.

I would suggest sticking to external caching, so that cached data can be shared between multiple Querier instances.

Trickster works reasonably OK. There is a "next" branch that should in theory improve a number of things. But, I agree, it doesn't really understand the data model, plus, as a "dumb" cache, it can't do cache eviction.

bwplotka · 2019-04-06T10:53:25Z

Not bad in terms of deps I guess.

bwplotka · 2019-09-17T13:48:28Z

Update

The work to embed caching in the Querier is potentially no longer needed, as you can run the Cortex query-frontend on top of any Prometheus query range API (: You can learn more about this in the meetup video here

It's definitely the way to go; we have already started running things in production with Thanos (: We are now discussing the possibility of moving query-frontend to a separate, neutral project: here

This will have many benefits, e.g. it will allow us to properly document and recommend it. We can also definitely discuss embedding this logic inside the Querier, but that will be much easier to do if query-frontend is a separate project (dependencies).

stale · 2020-01-11T05:42:41Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

bwplotka added the labels feature request/improvement, difficulty: hard, priority: P1, component: query on Apr 3, 2019

bwplotka mentioned this issue Apr 3, 2019

[feature request] store: distributed queries against objstore backends #992

Closed

bwplotka mentioned this issue Apr 16, 2019

frontend: [Refactor] Isolated middlewares from frontend to allow usage from external projects. cortexproject/cortex#1332

Closed

bwplotka mentioned this issue Apr 29, 2019

store: Store gateway consuming lots of memory / OOMing #448

Closed

This was referenced Sep 16, 2019

querier: Moved query range API to middlewares reusing Cortex cache. #1039

Closed

proposal: Moving Caching part of query-frontend to separate project. cortexproject/cortex#1672

Closed

ppanyukov mentioned this issue Oct 14, 2019

Discussion: Improve memory use for queries #1649

Closed

bwplotka mentioned this issue Oct 15, 2019

Response caching for Thanos #1651

Closed


stale bot added the stale label Jan 11, 2020

stale bot closed this as completed Jan 18, 2020
