Vector logs an error on packets that are too large #13175
Comments
So, this error is actually emitted when it tries to split an input batch but fails to do so successfully, i.e. the new requests from the split inputs are still too big. It's definitely a little weird that it would fail after splitting, since the limit is 3MB/62MB compressed/uncompressed... which can typically hold a lot of metrics. What do the
Looks like, in the past hour on Datadog's graphs, this has averaged 52K
I am also facing a similar issue while sending metrics from datadog-agent -> vector-agent -> aggregator -> Datadog.

In datadog-agent:

In vector-agent:

In aggregator:

Datadog agent version: 7.40.1-jmx
## Context

When support was added for encoding/sending sketches in #9178, logic was added to handle "splitting" payloads if a metric exceeded the (un)compressed payload limits. As we lacked (at the time) the ability to encode sketch metrics one-by-one, we were forced to collect all of them and then attempt to encode them all at once, which had a tendency to grow the response size past the (un)compressed payload limits. This "splitting" mechanism allowed us to compensate for that.

However, in order to avoid getting stuck in pathological loops where payloads were too big and thus required multiple splits (after already attempting at least one split), the logic was configured such that a batch of metrics would only be split once; if the two subsequent slices couldn't be encoded without also exceeding the limits, they would be dropped and we would give up trying to split further.

Despite the gut feeling during that work that it should be exceedingly rare to ever need to split further, real life has shown otherwise: #13175

## Solution

This PR introduces proper incremental encoding of sketches, which doesn't eliminate the possibility of needing to split (more below) but reduces the likelihood of splitting to a purely theoretical level. We take advantage of hidden-from-docs methods in `prost` to encode each `SketchPayload` object and append the bytes into a single buffer. This is possible due to how Protocol Buffers functions.

Additionally, we now generate "file descriptors" for our compiled Protocol Buffers definitions. We use these to programmatically query the field number of the "sketches" field in the `SketchPayload` message, which is slightly more robust than hardcoding it and hoping it never changes in the future.

In Protocol Buffers, each field in a message is written out such that the field data is preceded by the field number. This is part and parcel of its ability to allow backwards-compatible changes to a definition. Further, for repeated fields -- i.e. `Vec<Sketch>` -- the repetition is expressed simply by writing the same field multiple times rather than needing to write everything all together. Practically speaking, this means that we can encode a vector of two messages, or encode those two messages individually, and end up with the same encoded output of `[field N][field data][field N][field data]`.

### Ancillary changes

We've additionally fixed a bug with the "bytes sent" metric reported for the `datadog_metrics` sink, caused by some very tangled and miswired code around how compressed/uncompressed/event byte sizes were being shuttled from the request builder logic down to `Driver`. We've also reworked some of the encoder error types to clean them up and simplify things a bit.

## Reviewer notes

### Still needing to handle splits

The encoder still needs to care about splits, in a theoretical sense: while we can accurately track and avoid ever exceeding the uncompressed payload limit, we can't know the final compressed payload size until we finalize the builder/payload. Currently, the encoder checks whether adding the current metric would cause us to exceed the compressed payload limit, assuming the compressor couldn't actually compress the encoded metric at all. This is a fairly robust check since it tries to optimally account for the overhead of an entirely incompressible payload, and so on... but we really want to avoid dropping events if possible, obviously, and that's why the splitting code is still in place.
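The concatenation property described above can be demonstrated with a minimal, hand-rolled sketch of the Protocol Buffers wire format. This is not Vector's or `prost`'s code; the field number and the `encode_payload` helper are illustrative stand-ins for the repeated "sketches" field of `SketchPayload`.

```rust
// A length-delimited protobuf field is written as [tag][length][bytes],
// and a repeated field is just the same tag emitted once per element.

fn encode_field(field_number: u8, data: &[u8]) -> Vec<u8> {
    // Keep the tag and length to a single byte each, for brevity.
    assert!(field_number < 16 && data.len() < 128);
    let mut out = Vec::with_capacity(2 + data.len());
    out.push((field_number << 3) | 2); // wire type 2 = length-delimited
    out.push(data.len() as u8);
    out.extend_from_slice(data);
    out
}

// Stand-in for encoding a payload whose repeated field is field number 1.
fn encode_payload(sketches: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    for sketch in sketches {
        out.extend(encode_field(1, sketch));
    }
    out
}

fn main() {
    let a: &[u8] = b"sketch-a";
    let b: &[u8] = b"sketch-b";

    // One payload holding both sketches...
    let combined = encode_payload(&[a, b]);

    // ...versus two single-sketch payloads appended into one buffer.
    let mut incremental = encode_payload(&[a]);
    incremental.extend(encode_payload(&[b]));

    // The byte streams are identical: [field 1][data][field 1][data].
    assert_eq!(combined, incremental);
    println!("identical: {}", combined == incremental);
}
```

Because the two byte streams are identical, a decoder can't tell whether the sketches were encoded in one batch or appended one at a time, which is exactly what makes incremental encoding safe here.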
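The "assume the metric is incompressible" guard mentioned in the reviewer notes can be sketched as a simple worst-case bound. This is a hypothetical illustration, not Vector's actual implementation; the limit and overhead constants below are made up for the example.

```rust
// Illustrative compressed-payload limit; not Vector's real constant.
const COMPRESSED_LIMIT_BYTES: usize = 3_000_000;

/// Returns true if the payload could still fit under the compressed limit
/// even if the compressor stored the new metric's bytes completely
/// uncompressed, plus some framing overhead.
fn fits_worst_case(
    compressed_so_far: usize,
    encoded_metric_len: usize,
    framing_overhead: usize,
) -> bool {
    compressed_so_far + encoded_metric_len + framing_overhead <= COMPRESSED_LIMIT_BYTES
}

fn main() {
    // Plenty of headroom: accept the metric into the current payload.
    assert!(fits_worst_case(1_000_000, 4_096, 64));
    // Nearly full: reject, finalize this payload, and start a new one.
    assert!(!fits_worst_case(2_999_000, 4_096, 64));
    println!("guard behaves as expected");
}
```

Since the check assumes zero compression, it can only over-estimate the final size; the residual splitting code remains for the purely theoretical case where even this conservative bound is beaten.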
Closed by #17764
A note for the community
Problem
When processing entries, Vector encounters some input that is larger than it can handle:
It appears that the maximum payload size is limited.
When Vector encounters packets that would need to be split, it logs an error instead.
Configuration
Version
vector-0.22.0-1
Debug Output
No response
Example Data
No response
Additional Context
No response
References
No response