Thanos (sidecar) returns no results with promxy used as Prometheus remote_read endpoint #355

frebib · 2020-10-07T13:31:31Z

Related to #350 and #351

For our use case, we are trying to build an asymmetric Prometheus/Thanos setup using Promxy as a datacentre aggregator.
Here is a simplified view of the setup we're aiming for. The problem part is the blue line from Thanos query to Promxy. The remote_read API call works as of #351 but returns no data. Cutting out Promxy and targeting a sidecar attached directly to one of the Prometheus instances works as expected and returns the relevant rows.

To replicate this, you'll need Promxy and Prometheus running, plus two Thanos processes: one running the sidecar and one running the query web UI.

thanos query --store=<grpc-addr-of-sidecar> --http-address=<bind-address-for-webui> --log.level=debug

thanos sidecar --grpc-address=<grpc-bind-address> --prometheus.url=<promxy-http-url> --log.level=debug

While debugging this, I noticed that when the sidecar points directly at Prometheus, it logs:

level=debug ts=2020-10-07T12:06:30.476199814Z caller=prometheus.go:259 msg="started handling ReadRequest_STREAMED_XOR_CHUNKS streamed read response."
level=debug ts=2020-10-07T12:06:30.585904858Z caller=prometheus.go:335 msg="handled ReadRequest_STREAMED_XOR_CHUNKS request." frames=5816 series=5816

but when the sidecar points at Promxy, it instead logs:

level=debug ts=2020-10-07T13:22:18.354275818Z caller=prometheus.go:214 msg="started handling ReadRequest_SAMPLED response type."
level=debug ts=2020-10-07T13:22:18.397907404Z caller=prometheus.go:254 msg="handled ReadRequest_SAMPLED request." series=5812

After reading your comment yesterday (#352 (comment)) and finding this change, prometheus/prometheus@48b2c9c, I'm wondering whether Thanos expects the remote_read reply in STREAMED_XOR_CHUNKS format instead of SAMPLED. (Edit: it appears Thanos accepts both, although the STREAMED code path is certainly better tested now that Prometheus uses it by default: https://github.com/thanos-io/thanos/blob/a7b2a449ce9aa77cc225a699c1f399a3528d97b3/pkg/store/prometheus.go#L206-L216.) It may not be that, but it is one difference I observed. Possibly this bug is also fixed by #352.
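To my understanding, the linked Thanos code picks its decoding path from the response Content-Type header. A SAMPLED reply is a single snappy-compressed ReadResponse sent as application/x-protobuf. A STREAMED_XOR_CHUNKS reply is sent as application/x-streamed-protobuf with proto=prometheus.ChunkedReadResponse. The Go sketch below assumes that mapping and compares exact strings. classify is a hypothetical helper, not the Thanos function.

```go
package main

import "fmt"

// Content-Type values a remote_read server is assumed to send for each response type.
const (
	sampledContentType  = "application/x-protobuf"
	streamedContentType = "application/x-streamed-protobuf; proto=prometheus.ChunkedReadResponse"
)

// classify maps a response Content-Type to the decoding path a client would take.
func classify(contentType string) (string, error) {
	switch contentType {
	case sampledContentType:
		return "SAMPLED", nil
	case streamedContentType:
		return "STREAMED_XOR_CHUNKS", nil
	default:
		return "", fmt.Errorf("unsupported remote read content type: %q", contentType)
	}
}

func main() {
	for _, ct := range []string{sampledContentType, streamedContentType, "text/plain"} {
		kind, err := classify(ct)
		fmt.Println(kind, err)
	}
}
```

Because both paths exist on the client side, a SAMPLED reply from Promxy should still be decoded. That points at the content of the reply rather than its format.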

Before I start digging deep into the Prometheus/Thanos/Promxy code again, is there anything that jumps to mind that could cause this behaviour?

Thanks


jacksontj · 2020-10-08T15:30:35Z

This definitely looks like a reasonable objective (Prometheus locally with recent data, remote Thanos with more data). Based on the diagram above I'd expect it to work (although, as mentioned in #350, I'm not aware of anyone using remote_read into promxy), though it would have been broken until that PR yesterday.

One thing I'd suggest looking into as an efficiency improvement is putting promxy in front of the Thanos querier. Promxy can fan a query out to many different nodes and needs significantly fewer resources to get the answer. I added an example explaining this a bit here, but the TL;DR is that remote_read is an inefficient interface for queries. So if promxy sits in front of the stack, some queries (those over "recent" data) can be served through promxy's regular query interface, which is significantly cheaper. This would make alerting dramatically cheaper, since alerts act on recent data.

I did see #352, but that issue seems to be a go.mod problem; in reality promxy is currently based on a Prometheus 2.10 fork, so that should be a non-issue. It does mean we aren't new enough to have the STREAMED_XOR_CHUNKS option, and it's also possible that Prometheus 2.10 had a bug in the SAMPLED interface (that wouldn't surprise me: all the remote_read/write code is "unsupported" or "experimental", so bugs turn up there with some regularity). So I'd suggest trying your setup with Prometheus 2.10. If you see the same problem there, it's likely an issue in the Prometheus dependency, which would mean it's time to update again.
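The sidecar logs are consistent with response-type negotiation. The client lists the response types it accepts, and the server answers with the first one it supports, falling back to SAMPLED when the list is empty. A server built on a fork that predates STREAMED_XOR_CHUNKS effectively supports only SAMPLED. The Go sketch below models that behaviour under two assumptions: it mirrors my understanding of Prometheus' negotiation logic, and the sidecar lists STREAMED_XOR_CHUNKS before SAMPLED. negotiate and ResponseType are hypothetical names, not the prompb types.

```go
package main

import "fmt"

type ResponseType int

const (
	Sampled ResponseType = iota
	StreamedXORChunks
)

func (t ResponseType) String() string {
	switch t {
	case Sampled:
		return "SAMPLED"
	case StreamedXORChunks:
		return "STREAMED_XOR_CHUNKS"
	}
	return fmt.Sprintf("ResponseType(%d)", int(t))
}

// negotiate returns the first client-accepted type the server supports,
// defaulting to Sampled when the client lists none.
func negotiate(accepted []ResponseType, supported map[ResponseType]bool) (ResponseType, error) {
	if len(accepted) == 0 {
		accepted = []ResponseType{Sampled}
	}
	for _, t := range accepted {
		if supported[t] {
			return t, nil
		}
	}
	return 0, fmt.Errorf("server supports none of the accepted response types %v", accepted)
}

func main() {
	accepted := []ResponseType{StreamedXORChunks, Sampled} // assumed sidecar preference order
	modern := map[ResponseType]bool{Sampled: true, StreamedXORChunks: true}
	old := map[ResponseType]bool{Sampled: true} // e.g. a pre-streaming fork
	t1, _ := negotiate(accepted, modern)
	t2, _ := negotiate(accepted, old)
	fmt.Println(t1, t2) // STREAMED_XOR_CHUNKS SAMPLED
}
```

This explains the format difference in the logs, but not the missing data. That is why testing against plain Prometheus 2.10 is a useful way to isolate the problem.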

jacksontj · 2021-02-01T23:05:14Z

Seems that there are no updates to this issue; so I'm going to close it out. If there is more to discuss or additional questions feel free to re-open!

jacksontj added the question label Oct 8, 2020

ottoyiu mentioned this issue Nov 6, 2020

Promxy returning empty results intermittently #362

Closed

jacksontj closed this as completed Feb 1, 2021
