Async support for HttpSender #3717
I'm not sure we need an async abstraction to deal with the problem you mentioned. We just need to separate the steps of preparing the data to send and sending it, no?
Hmm. I believe batches are there for a reason, and one of them is to avoid overloading a single HTTP request with a huge payload. But if you are referring to constructing the payload all at once and just splitting it into multiple requests, I think that will lead to duplicate implementations in most of the registries, whereas a client that can return a CompletableFuture immediately can help in such scenarios. You also don't want to queue the HTTP requests, especially with a high number of meters (some of the registries export histograms as separate meters, which can increase the publishing time to a huge extent and hence leave a huge gap between the values recorded for the step).
#2818 is not a completely related issue, but this comment of mine again points to registry implementations taking a long time to complete publishing.
The batching feature was implemented because some backends have a limit on the number of metrics that can be in a single request.
I'm not sure I see the point. Each registry already has to implement the logic of preparing the payload which will be unique in each registry implementation. That's already there. The only difference is collecting a list of payloads and then sending them outside of the batch loop.
I assume you mean that you don't want to send the requests serially. Therein lies the complexity of what's being proposed here compared to the current arrangement. We currently publish on a single thread in most cases (some SDKs for metrics backends may use more threads under the hood, but they don't use HttpSender). If you're talking about making requests concurrently on multiple threads, there is now a lot more complexity to manage. We had (but never used) a configuration method called

If you're wanting to minimize the time between a payload being prepared (metrics read) and its metrics being stored in the backend, then the current arrangement should be as good as or better than a parallel async sender. How long is publishing taking, with how many meters, and with which backend(s)? Is it the time taken to send the HTTP request that is costly? Sending more metrics payloads at once (what I think you're proposing) will exacerbate the issue in #2818.
This is the problem I am trying to bring up. Basically, the reading of meters in the registry should not be slowed down by a blocking network call, because this will skew the measurement window.
We have tried sending around 5-10 batches per publish, with batches of around 10k meters. This is a problem we started to see especially when we introduced a backend that exports histograms as individual metrics (unlike OTLP, where everything is part of a single metric), and we have seen publishing last more than 10 seconds (step=60 seconds).
Agreed.
I think what Tommy is trying to say is that we can change this in a way that blocking network calls will not slow down the reading of meters. If you "read-and-send" in a loop, reads will be slowed down, but if you read everything first (which usually does not involve I/O) and then send everything in batches, it should solve the issue, right? At least that's how I interpret the suggestion.
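A minimal sketch of that read-first-then-send arrangement (hypothetical names; `TwoPhasePublisher`, the batch readers, and the sender are stand-ins for whatever a registry implementation actually uses, not Micrometer API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch: snapshot every batch payload first (no I/O),
// then send the prepared payloads.
class TwoPhasePublisher {
    // Phase 1: read/serialize all batches up front so meter reads are not
    // interleaved with blocking network calls.
    static List<String> prepareBatches(List<Supplier<String>> batchReaders) {
        List<String> payloads = new ArrayList<>();
        for (Supplier<String> reader : batchReaders) {
            payloads.add(reader.get()); // pure in-memory serialization
        }
        return payloads;
    }

    // Phase 2: send the already-prepared payloads; network latency here
    // no longer delays the reading of later batches.
    static void sendAll(List<String> payloads, Consumer<String> sender) {
        payloads.forEach(sender);
    }
}
```

The reads all happen close together in time, so every batch reflects (nearly) the same measurement window, even though the sends are still serial and blocking.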
What I was looking for is more like a "fire and forget" HTTP call, which will reduce the publish time.
Have we managed to agree on something here?
If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days, this issue will be closed.
I think resource consumption is a valid concern to not use the async sender. For now, I will let this one go. Let's see if we need something in the future.
Please describe the feature request.
HttpUrlConnectionSender currently sends data synchronously, i.e., the metrics are sent and we wait for the response, blocking the thread. Since publishing happens on a dedicated thread, this might not seem problematic when combined with timeouts for long-pending requests. But when applications have thousands of metrics that need to be batched and sent, this can result in metrics being measured over different intervals rather than for the same step.
Scenario:
Assume an application has 15,000 meters in the registry and the registry has a default batch size of 10,000. The scheduler calls the publishing method at 00:00:00. Reading meters is a fast task and we would complete reading 10,000 meters in x milliseconds, but most of the registries then call HttpUrlConnectionSender.send() for this first batch. Since this is blocking I/O, it depends on network latency and other external factors, and the reading of the next 5,000 meters is delayed by the response time of the first HTTP request. This means the first and second batches of meters don't measure over the same period of time.
This is especially true for registries that export each histogram bucket as a gauge: 1,000 timers with 25 buckets each can easily add up to 25,000 meters, and reading these in batches while sending them synchronously undermines the step concept, since the data sent is no longer normalized to a fixed window.
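As a back-of-the-envelope check of the numbers above (a minimal sketch; `BatchMath` is a hypothetical helper, not part of Micrometer):

```java
// Hypothetical helper illustrating the batch arithmetic from the scenario above.
class BatchMath {
    // Number of sequential HTTP requests needed per publish.
    static int batches(int meters, int batchSize) {
        return (meters + batchSize - 1) / batchSize; // ceiling division
    }

    public static void main(String[] args) {
        // 1,000 timers x 25 buckets, exported as individual gauges.
        int histogramMeters = 1000 * 25;
        System.out.println(BatchMath.batches(histogramMeters, 10_000)); // 3
        // The 15,000-meter example with the default batch size of 10,000.
        System.out.println(BatchMath.batches(15_000, 10_000)); // 2
    }
}
```

With blocking sends, each additional batch adds a full round-trip of network latency between reads, so the gap between the first and last read grows linearly with the batch count.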
Rationale
To make sure that we are reporting measurements for the same step period (or, more realistically, to minimize the gap between reads of the data).
Probably something like an AsyncHttpSender that returns a CompletableFuture instead of blocking the thread. All of these futures can then be awaited once all of them have been submitted.
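One possible shape for such a sender, sketched on top of the JDK 11 `java.net.http.HttpClient` (`AsyncSenderSketch` is hypothetical and not Micrometer API; the URL and payload handling are placeholders):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of the proposed AsyncHttpSender idea.
class AsyncSenderSketch {
    private final HttpClient client = HttpClient.newHttpClient();

    // Returns immediately; the response is handled off the publishing thread.
    CompletableFuture<Integer> sendAsync(String url, String payload) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();
        return client.sendAsync(request, HttpResponse.BodyHandlers.discarding())
                .thenApply(HttpResponse::statusCode);
    }

    // Submit every batch first, then await them all, as described above.
    void publish(String url, List<String> batches) {
        CompletableFuture<?>[] futures = batches.stream()
                .map(b -> sendAsync(url, b))
                .toArray(CompletableFuture[]::new);
        CompletableFuture.allOf(futures).join();
    }
}
```

Submitting all batches before joining means the publishing thread waits roughly for the slowest request rather than for the sum of all request latencies, at the cost of the extra threads and connections the concurrent requests consume.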