Vector batch bytes limits are based on in-memory sizing of events #10020
I guess I ran into this issue this week. We had Vector 0.10 working since its release and wanted to upgrade to the latest version. During testing nothing seemed odd, but in production our Firehose delivery stream got throttled: we were making 8000 requests per second to Kinesis instead of 16. We rolled the changes back and I had a look through the different source-code versions of the Kinesis Firehose sink. The last one with working batching was before the great overhaul in v0.18.
@jszwedko, any updates on this issue? Is it part of the roadmap by any chance?
Unfortunately not yet. This'll be a difficult thing to fix.
@tobz pointed out that our current batching mechanism uses the in-memory representation of events to determine their size, which will not match their serialized size. This can result in Vector sending batches that are either above or below the configured batch size. Typically we expect it to be below: the in-memory size of an event has generally been observed to be much greater than its serialized size, so Vector will send suboptimal (undersized) batches. However, if a batch does end up greater than the configured batch size, this could cause failed requests when the batch size was configured to match a sink's API limit.
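To illustrate the gap, here is a minimal Rust sketch (not Vector's actual code; the event representation, size-estimation logic, and JSON encoding below are simplified assumptions). It compares an "allocated bytes" style estimate of an event held in memory against the length of a naive JSON serialization of the same event:

```rust
// Hypothetical sketch: a log event modeled as a list of (key, value) string fields.
// Neither function is Vector's real implementation; both are illustrative.

/// Rough in-memory footprint: the field structs themselves plus their heap buffers.
/// This mimics the kind of allocation-based estimate a batcher might key on.
fn in_memory_size(fields: &[(String, String)]) -> usize {
    std::mem::size_of_val(fields)
        + fields
            .iter()
            .map(|(k, v)| k.capacity() + v.capacity())
            .sum::<usize>()
}

/// Naive JSON-encoded size: {"k":"v",...}, assuming no characters need escaping.
/// Per field: 4 quotes + 1 colon + key + value, plus commas and braces.
fn serialized_size(fields: &[(String, String)]) -> usize {
    let body: usize = fields.iter().map(|(k, v)| k.len() + v.len() + 6).sum();
    if fields.is_empty() { 2 } else { body + 1 }
}

fn main() {
    let event = vec![
        ("message".to_string(), "hello".to_string()),
        ("host".to_string(), "web-1".to_string()),
    ];
    let mem = in_memory_size(&event);
    let wire = serialized_size(&event);
    println!("in-memory estimate: {mem} bytes, serialized: {wire} bytes");
    // A batcher that counts `mem` against batch.max_bytes fills batches to a
    // different point than one counting `wire`, so the configured byte limit
    // drifts away from what the sink's API actually receives.
    assert!(mem > wire);
}
```

Because pointer-heavy in-memory layouts (string headers, capacity slack, struct padding) usually dwarf the compact wire encoding, a limit enforced on the former tends to cut batches far below the configured serialized size, which matches the undersized-batch behavior reported above.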
References:
- RequestBuilder/RequestMetadata to streamline splitting/building #12857
- gcp_cloud_storage sink ignoring batch.max_bytes #14426