promxy remote_read api endpoint hangs forever #350
Previously this value was unset, defaulting to zero. A zero value initialised a channel with no buffer, causing all writes to the channel to hang, and therefore all remote_read queries to hang until they timed out. This contradicts the Prometheus documentation, which claims that a zero value allows unlimited concurrent reads. Fixes: jacksontj#350 Signed-off-by: Joe Groocock <jgroocock@cloudflare.com>
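To make the failure mode concrete, here is a minimal, self-contained Go sketch of the gate pattern the commit message describes (a simplified stand-in, not the actual Prometheus `pkg/gate` code): with a limit of 0 the underlying channel is unbuffered, so acquiring a slot blocks until the request's context times out.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Gate is a simplified stand-in for the concurrency gate described above: a
// buffered channel in which every in-flight remote_read request holds a slot.
type Gate struct{ ch chan struct{} }

// NewGate builds a gate with the given number of slots.
func NewGate(limit int) *Gate { return &Gate{ch: make(chan struct{}, limit)} }

// Start acquires a slot, or fails once the caller's context expires.
func (g *Gate) Start(ctx context.Context) error {
	select {
	case g.ch <- struct{}{}: // with limit == 0 this send can never proceed
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// Done releases a previously acquired slot.
func (g *Gate) Done() { <-g.ch }

func main() {
	g := NewGate(0) // the limit promxy was (implicitly) configured with
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	// Blocks for the full second, then prints "context deadline exceeded".
	fmt.Println(g.Start(ctx))
}
```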
remote_read has worked in the past, but this read-concurrency limit is newer than the remote_read support in promxy -- so likely it stopped working then? I am a bit confused, as we have remote_read tests in CI which seem to be passing (https://github.com/jacksontj/promxy/blob/master/test/promql_test.go#L44). Either way, your linked PR seems reasonable; I'll get that merged in!
That statement enables remote_read from promxy to the downstream Prometheus servers, so it only exercises promxy reading via remote_read, not clients reading from promxy's own remote_read endpoint.
OIC; with that being the case, we need a test case added for reading remote_read from promxy (which is a supported use-case; it's just a lot less efficient than a promql query :) -- which is likely why no one has noticed!). To expand a bit on that, the reason it's less efficient is that the remote_read interface requires sending all the datapoints for a given timeseries, not the result, so any aggregation has to be computed on the reading side from the raw samples. Created #356 to add that test.
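As a rough illustration of that point (purely conceptual Go, not promxy or Prometheus code; the metric name and numbers are made up):

```go
package main

import "fmt"

// Conceptual illustration only: with a promql query the server evaluates the
// expression and returns a single result, whereas remote_read ships every raw
// sample of every matching series and leaves any aggregation to the reader.

// promql-style: ask the "server" to evaluate the expression, get one value back.
func sumViaQuery(serverEval func(expr string) float64) float64 {
	return serverEval("sum(http_requests_total)")
}

// remote_read-style: all raw samples cross the wire; the reader aggregates.
func sumViaRemoteRead(rawSeries map[string][]float64) float64 {
	var total float64
	for _, samples := range rawSeries {
		total += samples[len(samples)-1] // latest sample of each series
	}
	return total
}

func main() {
	raw := map[string][]float64{
		`http_requests_total{instance="a"}`: {1, 2, 3},
		`http_requests_total{instance="b"}`: {4, 5, 6},
	}
	fmt.Println(sumViaRemoteRead(raw))                           // 9, after shipping six samples
	fmt.Println(sumViaQuery(func(string) float64 { return 9 })) // 9, one value over the wire
}
```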
I discovered this when trying to attach thanos-sidecar to promxy, leveraging the remote_read API to serve the most recent 2 hours of metrics for a Thanos query.
It seems the hang starts here https://github.com/prometheus/prometheus/blob/8dbfa14607a5cb138df43638f06c1a938c73df84/web/api/v1/api.go#L1218 where the `remoteReadGate` blocks as there are no slots available. The gate is configured with `remoteReadConcurrencyLimit` slots, as provided by the `web.Options` initialised in promxy's `main.go`. The field is never set, so it defaults to `0` (which makes sense). When trying to obtain a slot with the gate, it hangs until the context expires, because it can never write the struct (as there is no buffer, and no readers): https://github.com/prometheus/prometheus/blob/8dbfa14607a5cb138df43638f06c1a938c73df84/pkg/gate/gate.go#L33.

The puzzling part about this to me is the contradictory documentation from `prometheus --help`:
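The relevant flag appears to be `--storage.remote.read-concurrent-limit`; its help text reads roughly as follows (quoted from memory, so the exact wording and default may differ between Prometheus versions):

```
--storage.remote.read-concurrent-limit=10
    Maximum number of concurrent remote read calls. 0 means no limit.
```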
This implies that `0` would never block, but that can't be true, because a channel with a buffer of `0` always blocks. Either way, setting this value to >0 allows remote_read to work again.
Have you experienced this? Has remote_read ever worked through promxy? Did something change on the Prometheus side?