Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SO_ZEROCOPY and MSG_ZEROCOPY support for UDP and TCP #1720

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

davidBar-On
Copy link
Contributor

Add support for zero-copy for both UDP and TCP, using SO_ZEROCOPY/MSG_ZEROCOPY. Although it is not clear when and in what environments the added functionality improve performance, support was added to allow testing this functionality. The assumption is that different environments may have different levels of support for this functionality.
(Note that running bootstrap.sh; configure is required before make to support the new features.)

The added support is for MSG_ZEROCOPY. When used, socket option SO_ZEROCOPY is set and send(...., MSG_ZEROCOPY) is used instead of write(). MSG_ZEROCOPY is used in the following cases:

  • UDP: when -Z/--zerocopy option is set.
  • TCP: when --zerocopy=z is set. Otherwise, sendfile() continue to be used for TCP zero copy.

This PR is a rework of the original suggested support in #1690, as it was too naive and did not include support for notifications.

The main implementation is based on the msg_zerocopy.c example for supporting MSG_ZEROCOPY.

Note that per the Zero-copy networking discussion about the patch that added MSG_ZEROCOPY support, the expected performance improvement is expected to be less than 10% and only for large packets. Therefor, it would be helpful if this PR will be tested on different environments to see whether and when it improves performance.

@davidBar-On
Copy link
Contributor Author

Closing, as it is not clear that this functionality is valuable, as I wrote in the description. (Code is still available for testing or coming up with enhanced solution.)

@marcosfsch
Copy link
Contributor

marcosfsch commented Sep 18, 2024

MSG_ZEROCOPY with MTU 9K and optmem_max 1MB does improve performance a lot for large TCP flows, specially when used with pacing (up to 35%, from 37G to 50G). Brian Tiernery and I, made the tests and the results can be checker in [1, 2]. This results were run using an older patch [3], this current patch seems to have some big performance regressions.
I'll run new UDP tests to provide evidence if there are improvements and I hope that this patch can be upstreamed.

[1] https://github.com/marcosfsch/INDIS-2024/blob/main/AmLight/ubuntu22-6.8/summary.txt
[2] https://github.com/marcosfsch/INDIS-2024/blob/main/ESnet/ubuntu22-6.8/summary.txt
[3] #1690

@marcosfsch
Copy link
Contributor

marcosfsch commented Sep 18, 2024

For UDP, this patch is making performance worse, from 36G to 5.5G. But using a previous MSG_ZEROCOPY patch [1] I was able to get 18% better performance for UDP. From 36G to 42G for single UDP flow.

[1] #1690

@bltierney
Copy link
Contributor

Also, to get improvements with MSG_ZEROCOPY, you have to set this in /etc/sysctl.conf:

net.core.optmem_max = 1048576

But we are finding MSG_ZEROCOPY helps a lot on 100G hosts.

@davidBar-On
Copy link
Contributor Author

OK. Thanks for the input. @bmah888, can you please re-open this PR?

@bltierney
Copy link
Contributor

I should also mention that we were using your original patch 1690, not patch 1720. Seems like this should not matter, but thought I should mention that.

@davidBar-On
Copy link
Contributor Author

I should also mention that we were using your original patch 1690, not patch 1720. Seems like this should not matter, but thought I should mention that.

That is interesting to know. As far as I remember that should not matter. The reason I split #1960 to this PR and #1717 (SKIP-RX-COPY) is that per my internal tests I assumed there is more chance that SKIP-RX-COPY will be merged into the mainline. Per this discussion I now understand this may not be the case, but still I assume it is easier to merge less changes per each PT than merge all.

@davidBar-On davidBar-On reopened this Oct 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants