Create a buffered wrapper around BytesStream #1501

akoshelev · 2024-12-17T01:07:46Z

This is driven by behavior we're observing when the Report Collector (RC) sends bytes to individual shards. It writes data as it becomes available, and Hyper does not accumulate it before sending. On the receiver side we are seeing chunks of size 1, which causes thrashing on both the sender and the receiver.

This change paves the way for buffering on the RC side.

codecov · 2024-12-17T01:36:30Z

Codecov Report

Attention: Patch coverage is 98.03922% with 3 lines in your changes missing coverage. Please review.

Project coverage is 93.24%. Comparing base (69f0a5d) to head (0b8254f).
Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ipa-core/src/helpers/transport/stream/buffered.rs | 98.03% | 3 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1501      +/-   ##
==========================================
+ Coverage   93.02%   93.24%   +0.22%     
==========================================
  Files         237      238       +1     
  Lines       43535    43688     +153     
==========================================
+ Hits        40498    40739     +241     
+ Misses       3037     2949      -88

eriktaubeneck · 2024-12-17T01:35:48Z

ipa-core/src/helpers/transport/stream/buffered.rs

+            // verify_success(infallible_stream(12, 5), 12).await;
+            // verify_success(infallible_stream(12, 12), 12).await;
+            // verify_success(infallible_stream(24, 12), 12).await;
+            // verify_success(infallible_stream(24, 12), 1).await;


nit: clean these up

good catch, I want to get them back

cberkhoff · 2024-12-17T02:11:50Z

ipa-core/src/helpers/transport/stream/buffered.rs

+/// done. This may need to be used when writing into HTTP streams as Hyper
+/// does not provide any buffering functionality and we turn NODELAY on
+#[pin_project]
+pub struct BufferedBytesStream<S> {


a few nits

To me, BufferedBytesStream doesn't say why this stream is special. I think the key thing is that what each poll yields is "chunked". I would rather call this BufferedChunkedStream.

why not have the trait bound S: BytesStream here as well? Thinking more about documentation/readability

Bounds need to be repeated everywhere if put on the struct - generally we try to avoid that if that's not necessary

this does not attempt to chunk the inner stream, the whole purpose of it is to accumulate enough bytes (buffer) before sending them down for processing.

But doesn't the output get chunked? This adapter is buffering and chunking... another name would be paginating.

I still think BufferedBytesStream is vague. My 2 cents.
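To illustrate the point about trait bounds raised in this thread, here is a minimal sketch. It is not the code from this PR: `ByteSource`, `Buffered`, `describe`, and `Once` are hypothetical names, and `ByteSource` is only a synchronous stand-in for `BytesStream`. With the bound kept off the struct, it appears only on the `impl` block that actually calls into the inner stream.

```rust
/// Hypothetical stand-in for the `BytesStream` trait (an assumption, not the real trait).
pub trait ByteSource {
    fn next_chunk(&mut self) -> Option<Vec<u8>>;
}

/// No bound on the struct itself.
pub struct Buffered<S> {
    inner: S,
    chunk_size: usize,
}

// Code that only stores or inspects the wrapper needs no bound.
impl<S> Buffered<S> {
    pub fn new(inner: S, chunk_size: usize) -> Self {
        Self { inner, chunk_size }
    }

    pub fn chunk_size(&self) -> usize {
        self.chunk_size
    }
}

// The bound appears only where the trait's methods are called.
impl<S: ByteSource> Buffered<S> {
    pub fn pull(&mut self) -> Option<Vec<u8>> {
        self.inner.next_chunk()
    }
}

/// Generic helper that never touches the inner stream, so it repeats no bound.
pub fn describe<S>(b: &Buffered<S>) -> String {
    format!("buffered, {} bytes per chunk", b.chunk_size())
}

/// Toy source that yields a single chunk once.
pub struct Once(pub Option<Vec<u8>>);

impl ByteSource for Once {
    fn next_chunk(&mut self) -> Option<Vec<u8>> {
        self.0.take()
    }
}
```

Had the struct been declared as `Buffered<S: ByteSource>`, both the first `impl` block and `describe` would have to restate `S: ByteSource` even though neither uses it.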

cberkhoff · 2024-12-17T02:16:55Z

ipa-core/src/helpers/transport/stream/buffered.rs

+    /// Number of bytes released per single poll.
+    /// All items except the last one are guaranteed to have
+    /// exactly this number of bytes written to them.
+    sz: usize,


nit: chunk_size?
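For readers following the doc comment on `sz`, the guarantee it describes (every item except the last has exactly that many bytes) can be sketched with a plain synchronous iterator. This is only an analogy under stated assumptions: the adapter in this PR is an async stream built with `pin_project`, while `Rechunk` below is a hypothetical name and uses `Vec<u8>` items instead of the real byte type.

```rust
/// Synchronous sketch: re-chunks an iterator of byte chunks into items of
/// exactly `chunk_size` bytes. Only the last item may be shorter.
pub struct Rechunk<I> {
    inner: I,
    buffer: Vec<u8>,
    chunk_size: usize,
    done: bool,
}

impl<I> Rechunk<I> {
    pub fn new(inner: I, chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "chunk size must be positive");
        Self {
            inner,
            buffer: Vec::with_capacity(chunk_size),
            chunk_size,
            done: false,
        }
    }
}

impl<I: Iterator<Item = Vec<u8>>> Iterator for Rechunk<I> {
    type Item = Vec<u8>;

    fn next(&mut self) -> Option<Vec<u8>> {
        // Accumulate input until a full chunk is buffered or the input ends.
        while self.buffer.len() < self.chunk_size && !self.done {
            match self.inner.next() {
                Some(bytes) => self.buffer.extend_from_slice(&bytes),
                None => self.done = true,
            }
        }
        if self.buffer.is_empty() {
            return None;
        }
        // Release exactly `chunk_size` bytes, or whatever is left at the end.
        let n = self.chunk_size.min(self.buffer.len());
        let rest = self.buffer.split_off(n);
        Some(std::mem::replace(&mut self.buffer, rest))
    }
}
```

Feeding ten one-byte chunks through `Rechunk::new(input, 4)` yields items of 4, 4, and 2 bytes, which mirrors the "chunks of size 1" situation from the PR description being turned into fixed-size items.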

Continuation of private-attribution#1501: we want to avoid submitting reports one by one from RC to each individual shard. That likely leads to fragmentation, and we've been observing slow execution on the client side. This sets the buffer size to a multiple of the TCP MSS, but I don't have any real evidence that this will work well. We would need to experiment with it.

* Use stream buffering in report collector
* Use 8Kb buffers for streaming inside report collector
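The arithmetic behind a buffer size that is divisible by the TCP MSS can be sketched as follows. The MSS value here is an assumption (1500-byte Ethernet MTU minus 20-byte IPv4 and 20-byte TCP headers, no options), the real value depends on the network path, and `ASSUMED_MSS`, `round_down_to_mss`, and `segments_needed` are hypothetical helpers, not code from either PR.

```rust
/// Assumed MSS: 1500-byte MTU minus 20-byte IPv4 and 20-byte TCP headers.
/// The real value depends on the path and on TCP options in use.
pub const ASSUMED_MSS: usize = 1460;

/// Largest multiple of `mss` that does not exceed `target`,
/// but never less than a single segment.
pub fn round_down_to_mss(target: usize, mss: usize) -> usize {
    assert!(mss > 0, "MSS must be positive");
    (target / mss).max(1) * mss
}

/// Number of segments needed to carry `len` bytes (ceiling division).
pub fn segments_needed(len: usize, mss: usize) -> usize {
    assert!(mss > 0, "MSS must be positive");
    (len + mss - 1) / mss
}
```

Under this assumption, an 8192-byte buffer spans six segments with the last one only partly filled, while rounding down to 7300 bytes fills exactly five. Whether that difference matters in practice is the open question the description says still needs experimentation.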

eriktaubeneck approved these changes Dec 17, 2024

Uncomment the code

9a0cfc8

cberkhoff approved these changes Dec 17, 2024

Feedback

0b8254f

akoshelev merged commit f148eac into private-attribution:main Dec 17, 2024
12 checks passed

akoshelev mentioned this pull request Dec 17, 2024

Use stream buffering in report collector #1504

Merged
