An error occurred that Vector couldn’t handle: failed to encode record: BufferTooSmall when using http server source causing the vector to stop ingesting data #18346
Thanks for this report @raghu999! Did you share the right logs though? The error logs there seem to be showing an error with the disk buffers. Do you have logs showing the panic in the ...
@jszwedko Updated the issue. We are only seeing the buffer error, but it is the same dataset for which we were seeing an error when using the Splunk HEC source; when we switched to the HTTP server source, we started seeing a new error related to buffers. We confirmed that the disk buffer is not full when this issue happens.
Gotcha, thanks for clarifying @raghu999! It sounds more like a bug in the disk buffer implementation than the ...
This error comes from our limitations around the maximum record size allowed in a disk buffer, which is set to 128MB. A record is essentially a chunk of events written all together, which, based on the sources used in the given configuration, would be any events decoded from a single request. This would imply that, if the error above is being hit, a single request sent to either source was close to that size.

Based on your knowledge of the clients sending to your Vector instance, is it possible for requests to be that large (90-100MB)?
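For context, here is a minimal sketch of the kind of topology being discussed: an `http_server` source feeding a sink with a disk buffer, where all events decoded from one HTTP request are written to the buffer as a single record. The component names, address, and sizes are placeholders, not the reporter's actual configuration:

```toml
# Illustrative sketch only -- not the configuration from this issue.
[sources.ingest]
type = "http_server"           # all events decoded from one request form one buffer record
address = "0.0.0.0:8080"
decoding.codec = "json"

[sinks.out]
type = "blackhole"             # stand-in sink; the 128MB record limit applies to any sink with a disk buffer
inputs = ["ingest"]

[sinks.out.buffer]
type = "disk"
max_size = 268435488           # total buffer size in bytes; distinct from the per-record limit
when_full = "block"
```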
Hi everyone! Facing the same issue here:
This is the transform:
In my case it is unlikely for requests to be even close to 90-100MB.
We're seeing this error again with a different application. Some relevant log lines:
The source is:
Version we used: v0.34.0
A few questions:
I am seeing similar issues
The containers exit under a certain load, around 10k events/sec.
@awangc have you managed to find the root cause or a workaround? I've seen similar issues while sending logs from upstream Vector pods that are part of OpenShift, where the default batch size is 10MB, I believe. I've been testing with just generic nginx logs, so really small events. Under a certain load, Vector would exit with the same buffer-related error. I have observed that the issue does not happen when the batch size on the upstream Vector is set to 1MB (ten times less), as in the sketch below. @jszwedko is there any documentation on how transforms relate to disk buffers?
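For anyone trying the same workaround, a hedged sketch of lowering the batch size on an upstream `vector` sink follows; the input name, address, and exact value are assumptions, not the commenter's configuration:

```toml
# Upstream Vector instance -- illustrative only.
[sinks.to_downstream]
type = "vector"
inputs = ["kubernetes_logs"]           # assumed input component name
address = "http://downstream-vector:6000"
batch.max_bytes = 1000000              # ~1MB instead of the ~10MB default mentioned above
```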
Transforms emit data to an output channel which is then consumed by buffers. Is that the sort of detail you were looking for? Or do you have a different sort of question? For this issue, I know it is a lot to ask, but if anyone is able to create a standalone reproduction, that would aid with debugging.
I have managed to create a test case to recreate this issue. It seems to be a combination of large events, a transform, and disk buffers for a sink. I created a large JSON test event with a "message" field also containing JSON, approx 10MB. I then replicated that same event multiple times into 3 different files, each about 500MB. I can provide the example event if required. Below is the config to replicate it along with the output of a run.
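The exact reproduction config is not reproduced above; as a rough sketch of a setup along those lines (the file paths, VRL program, and sizes are assumptions), something like this combines large events, a transform, and a disk-buffered sink:

```toml
# Illustrative reproduction sketch -- not the commenter's exact config.
[sources.big_json]
type = "file"
include = ["/tmp/repro/*.json"]        # files built by repeating a ~10MB JSON event
read_from = "beginning"
max_line_bytes = 20971520              # raise the default line limit so ~10MB events are read whole

[transforms.parse]
type = "remap"
inputs = ["big_json"]
source = '''
# Assumed transform: parse the JSON carried in the "message" field
. = parse_json!(string!(.message))
'''

[sinks.out]
type = "blackhole"
inputs = ["parse"]

[sinks.out.buffer]
type = "disk"
max_size = 1073741824                  # 1GiB disk buffer
when_full = "block"
```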
A note for the community
Problem
We had a similar issue when using the Splunk HEC source and raised bug report #17670. We started using the HTTP source and are now seeing a buffer error with it that causes Vector to stop ingesting any new data. The containers are entering a restart loop with an OOM error on Kubernetes, and in Vector we see the error below.
Kubernetes container:
Vector Error:
Configuration
Version
0.31.x
Debug Output
No response
Example Data
No response
Additional Context
Vector is running in Kubernetes and this specific client has large payloads with close to 6000-8000 fields in their entire dataset.
References
#17670: Faced a similar issue with the Splunk HEC source; we moved to the HTTP source and are seeing a new error.