-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
virtio-net: switch to readv
+ implement MRG_RXBUF
#4813
virtio-net: switch to readv
+ implement MRG_RXBUF
#4813
Conversation
ec7a706
to
1b3acb1
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4813 +/- ##
==========================================
+ Coverage 84.32% 84.42% +0.09%
==========================================
Files 249 252 +3
Lines 27512 27794 +282
==========================================
+ Hits 23200 23465 +265
- Misses 4312 4329 +17
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
readv
+ implement MRG_RXBUF
1b3acb1
to
d9a884f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some usage of unsafe
is not sound :( I'd love to run miri over it, but it doesn't support memfd_create
😭
/// This saves 8 bytes compared to `VecDequeue` in the standard library. | ||
/// (24 bytes vs 32 bytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really care this much about these 8 bytes? If I'm reading the code correctly, we only use a single one of these over the entire lifetime of the Firecracker process. If that's the case, then I'd prefer using an established stdlib type over saving 8 bytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially RxBuffer
type was 88 bytes and I wanted to make is smaller. Now it is 56 bytes, so maybe using VecDeque
now (so adding 8 bytes back) will not be that bad. But an additional benefit of this type is that it is fixed in buffer size, so we will never reallocate (so it is like an additional guarantee we can have)
855f872
to
052631f
Compare
Add new `IovRingBuffer` ring buffer type that is tailored for holding `struct iovec` objects that point to guest memory for IO. The `struct iovec` objects represent the memory that the guest passed to us as `Descriptors` in a VirtIO queue for performing some I/O operation. We plan to use this type to describe the guest memory we have available for doing network RX. This should facilitate us in optimizing the reception of data from the TAP device using `readv`, thus avoiding a memory copy. Co-authored-by: Babis Chalios <bchalios@amazon.es> Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add generic fixed size ring buffer type. The main difference from a standard `VecDequeue` is the use of `u32` as indexes. This shrinks the size of the ring buffer from 32 bytes for `VecDequeue` to 24 bytes for `RingBuffer`. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Split `add_used` internals into 2 new functions: `write_used_element` and `advance_used_ring`. This will be used in next commits to optimize RX path of net device. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
This type will be used in the next commit where we will switch to use `readv` for RX path. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Right now, we are performing two copies for writing a frame from the TAP device into guest memory: - read the frame into an internal array held by the Net device - copy that array into a buffers of a DescriptorChain. In order to avoid this double copy we can use the readv system call to read directly from the TAP device into the buffers described by DescriptorChain. The main challenge with this is that DescriptorChain objects describe memory that is at least 65562 bytes long when guest TSO4, TSO6 or UFO are enabled or 1526 otherwise and parsing the chain includes overhead which we pay even if the frame we are receiving is much smaller than these sizes. PR firecracker-microvm#4748 reduced the overheads involved with parsing DescriptorChain objects. To further avoid this overhead, move the parsing of DescriptorChain objects out of the hot path of process_rx() where we are actually receiving a frame into process_rx_queue_event() where we get the notification that the guest added new buffers for network RX. Co-authored-by: Babis Chalios <bchalios@amazon.es> Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Before `readv` change, `rx_deferred_frame` was used to signal if there was present packet in the internal rx buffer, so we would try to send it first before attempting to read new one. With `readv` we don't use internal buffer for RX packets anymore, so the logic of `deferred` packet changed. Now any packet is `deferred` when it is written to the guest memory. After the write we try to notify guest about it, but only if rate limiter allows it. To track this we now use `deferred_rx_bytes` which holds the number of bytes we yet to notify guest about. We can conveniently use `Option<NonZeroUsize>` for this where `Some(...)` will mean there are some `deferred` bytes we yet to notify guest about and `None` will mean we are done and can read the next packet. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Now virtio-net device VIRTIO_NET_F_MRG_RXBUF feature which allows it to write single packet into multiple descriptor chains. The amount of descriptor chains (also known as heads) is written into the `virtio_net_hdr_v1` structure which is located at the very begging of the packet. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Fix retransmission issues introduced by enabling MRG_RXBUF. Do this by tracking total capacity of the `RxBuffer` struct and only return an iov slice if it contains at least 64K bytes. Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
052631f
to
251417f
Compare
Changes
Switch virtio-net device to use
readv
to read from a tap device and implementMRG_RXBUF
flag.Reason
Improve performance of the network device.
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md
.PR Checklist
PR.
CHANGELOG.md
.TODO
s link to an issue.contribution quality standards.
rust-vmm
.