Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use stream buffering in report collector #1504

Merged
merged 3 commits into from
Jan 9, 2025

Conversation

akoshelev
Copy link
Collaborator

Continuation of #1501, we want to avoid submitting reports one by one from RC to each individual shard. That likely leads to fragmentation and we've been observing slow execution on the client side.

This sets up the buffer size to be divisible by TCP MSS, but I don't have any real evidence that this is going to work well. We would need to experiment with it

Continuation of private-attribution#1501, we want to avoid submitting reports one by one from RC to each individual shard. That likely leads to fragmentation and we've been observing slow execution on the client side.

This sets up the buffer size to be divisible by TCP MSS, but I don't have any real evidence that this is going to work well. We would need to experiment with it
@@ -430,6 +431,9 @@ async fn hybrid(
count: usize,
set_fixed_polling_ms: Option<u64>,
) -> Result<(), Box<dyn Error>> {
// twice the size of TCP MSS. This may get messed up if TCP options are used which is not
// in our control, but hopefully fragmentation is not too bad
const BUF_SIZE: NonZeroUsize = NonZeroUsize::new(1072).unwrap();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andyleiserson do you have any suggestion on picking up the buffer size?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tokio and std buffered I/O helpers use 8 kB, so maybe use that? I don't think it's necessary to try and match this to TCP MSS, I might not even expose the buffer size from BufferedBytesStream until we identify a need to tune it for individual uses.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what's important is that the kernel has at least a full packet's worth of data, so that it can send full packets.

Copy link

codecov bot commented Dec 17, 2024

Codecov Report

Attention: Patch coverage is 96.15385% with 4 lines in your changes missing coverage. Please review.

Project coverage is 93.10%. Comparing base (c45ef9b) to head (8b953ab).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
ipa-core/src/cli/playbook/streaming.rs 97.02% 3 Missing ⚠️
ipa-core/src/bin/report_collector.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1504      +/-   ##
==========================================
+ Coverage   92.87%   93.10%   +0.22%     
==========================================
  Files         242      242              
  Lines       44177    44269      +92     
==========================================
+ Hits        41031    41217     +186     
+ Misses       3146     3052      -94     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -430,6 +431,9 @@ async fn hybrid(
count: usize,
set_fixed_polling_ms: Option<u64>,
) -> Result<(), Box<dyn Error>> {
// twice the size of TCP MSS. This may get messed up if TCP options are used which is not
// in our control, but hopefully fragmentation is not too bad
const BUF_SIZE: NonZeroUsize = NonZeroUsize::new(1072).unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tokio and std buffered I/O helpers use 8 kB, so maybe use that? I don't think it's necessary to try and match this to TCP MSS, I might not even expose the buffer size from BufferedBytesStream until we identify a need to tune it for individual uses.

@akoshelev
Copy link
Collaborator Author

I don't see any difference in performance on 1B records. I think the bottleneck is on the upload, so I'll hold off merging this

@akoshelev
Copy link
Collaborator Author

I intend to merge it - there is nothing bad about this change and buffered writes from RC can still be useful for submitting medium-sized inputs

@akoshelev akoshelev merged commit 837cdb4 into private-attribution:main Jan 9, 2025
12 checks passed
@akoshelev akoshelev deleted the buffered-rc branch January 9, 2025 00:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants