Use readv for the RX path of the network device to avoid one memory copy per frame #4799
Conversation
Codecov Report
Attention: Patch coverage is

@@ Coverage Diff @@
## main #4799 +/- ##
==========================================
+ Coverage 84.35% 84.42% +0.06%
==========================================
Files 249 250 +1
Lines 27505 27749 +244
==========================================
+ Hits 23202 23427 +225
- Misses 4303 4322 +19
I only made it half-way through commit 2 before my lunch break, but dumping these comments already in case they are helpful
Add a ring buffer type that is tailored for holding `struct iovec` objects that point to guest memory for IO. The `struct iovec` objects represent the memory that the guest passed to us as `Descriptors` in a VirtIO queue for performing some I/O operation.

We plan to use this type to describe the guest memory we have available for doing network RX. This should facilitate us in optimizing the reception of data from the TAP device using `readv`, thus avoiding a memory copy.

Co-authored-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Signed-off-by: Babis Chalios <bchalios@amazon.es>
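For illustration, a minimal sketch of a ring of `struct iovec` entries is shown below. It only demonstrates the queue semantics; the `IovRing` name and the `VecDeque` backing are assumptions for this sketch, not the actual `IovDeque` implementation (which relies on a special memory layout described in a later commit).

```rust
use std::collections::VecDeque;

/// Illustrative ring of `libc::iovec` entries describing guest buffers.
struct IovRing {
    entries: VecDeque<libc::iovec>,
}

impl IovRing {
    fn new() -> Self {
        Self { entries: VecDeque::new() }
    }

    /// Queue one guest buffer (already translated to a host pointer + length).
    fn push_back(&mut self, base: *mut libc::c_void, len: usize) {
        self.entries.push_back(libc::iovec { iov_base: base, iov_len: len });
    }

    /// Drop the first `nr` entries once the corresponding buffers have been used.
    fn pop_front(&mut self, nr: usize) {
        for _ in 0..nr {
            self.entries.pop_front();
        }
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}
```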
Allow IoVecBufferMut objects to store multiple DescriptorChain objects, so that we can describe guest memory meant to be used for receiving data (for example, memory used for network RX) as a single (sparse) memory region.

This will allow us to always keep track of all the available memory we have for performing RX and to use `readv` for copying memory from the TAP device into guest memory, avoiding the extra copy. In the future, it will also facilitate the implementation of mergeable buffers for the RX path of the network device.

Co-authored-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Signed-off-by: Babis Chalios <bchalios@amazon.es>
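A rough sketch of the idea, building on the `IovRing` sketch above: every descriptor in a chain becomes one `iovec` appended to the same buffer, so one (sparse) buffer can span several chains. The `Desc` type here is a simplified stand-in, not Firecracker's actual `DescriptorChain`.

```rust
/// Simplified stand-in for a parsed VirtIO descriptor: a host pointer into guest
/// memory, a length, and an optional link to the next descriptor in the chain.
struct Desc {
    host_addr: *mut libc::c_void,
    len: u32,
    next: Option<Box<Desc>>,
}

/// Append every descriptor of one chain to the shared ring of iovecs.
fn append_chain(ring: &mut IovRing, head: Desc) {
    let mut cur = Some(Box::new(head));
    while let Some(d) = cur {
        // One iovec per descriptor, pointing straight into guest RAM.
        ring.push_back(d.host_addr, d.len as usize);
        cur = d.next;
    }
}
```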
Right now, we are performing two copies for writing a frame from the TAP device into guest memory. We first read the frame into an array held by the Net device and then copy that array into a DescriptorChain.

In order to avoid the double copy, use the readv system call to read directly from the TAP device into the buffers described by DescriptorChain. The main challenge with this is that DescriptorChain objects describe memory that is at least 65562 bytes long when guest TSO4, TSO6 or UFO are enabled, or 1526 bytes otherwise, and parsing the chain involves overhead which we pay even if the frame we are receiving is much smaller than these sizes.

PR firecracker-microvm#4748 reduced the overheads involved with parsing DescriptorChain objects. To further avoid this overhead, move the parsing of DescriptorChain objects out of the hot path of process_rx(), where we actually receive a frame, into process_rx_queue_event(), where we get the notification that the guest added new buffers for network RX.

Signed-off-by: Babis Chalios <bchalios@amazon.es>
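A hedged sketch of the hot-path read (the helper name and error handling are illustrative): with the guest buffers already parsed into a slice of `iovec`s, a single `readv()` moves the frame from the TAP file descriptor straight into guest memory, with no intermediate Firecracker buffer.

```rust
use std::io;
use std::os::unix::io::RawFd;

/// Read one frame from the TAP device directly into the pre-parsed guest buffers.
fn read_frame_into_guest(tap_fd: RawFd, iovs: &[libc::iovec]) -> io::Result<usize> {
    // SAFETY: every iovec must point to valid, writable guest memory for its full
    // length, and the slice must stay alive for the duration of the call.
    let ret = unsafe { libc::readv(tap_fd, iovs.as_ptr(), iovs.len() as libc::c_int) };
    if ret < 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(ret as usize)
}
```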
Now that we pre-process the buffers that the guest provides for performing RX, we need to save them in the VM state snapshot file for networking to work correctly after snapshot resume. Implement Persist for RxBuffers and plug them into the (de)serialization logic of the network device.

Co-authored-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Signed-off-by: Babis Chalios <bchalios@amazon.es>
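To illustrate the shape of this (with simplified, assumed field names and a plain `save`/`restore` pair rather than Firecracker's actual `Persist` trait): the raw `iovec`s cannot be persisted because they hold host pointers, so the saved state keeps just enough queue information to re-parse the buffers after restore.

```rust
/// Illustrative serializable state: which descriptor-chain heads had already been
/// parsed into RX buffers at snapshot time (field name is an assumption).
#[derive(Clone)]
struct RxBuffersState {
    parsed_chain_heads: Vec<u16>,
}

/// Illustrative runtime object holding the pre-parsed RX buffers.
struct RxBuffers {
    parsed_chain_heads: Vec<u16>,
    iovecs: Vec<libc::iovec>, // rebuilt after restore, never serialized
}

impl RxBuffers {
    fn save(&self) -> RxBuffersState {
        // Host pointers are meaningless across a snapshot/restore cycle, so only
        // the chain heads are recorded.
        RxBuffersState { parsed_chain_heads: self.parsed_chain_heads.clone() }
    }

    fn restore(state: &RxBuffersState) -> Self {
        // After restore, the device walks the saved chain heads again and
        // regenerates the iovecs from the re-mapped guest memory.
        RxBuffers {
            parsed_chain_heads: state.parsed_chain_heads.clone(),
            iovecs: Vec::new(),
        }
    }
}
```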
IoVecBufferMut now uses IovDeque as its backing memory. IovDeque performs a custom memory allocation, using memfd_create() and a combination of mmap() calls, in order to provide a memory layout where the iovec objects stored in the IovDeque are always in consecutive memory.

kani doesn't really get along with these system calls, which breaks our proof for IoVecBufferMut::write_volatile_at. Substitute memory allocation and deallocation with plain calls to std::alloc::(de)alloc when we run kani proofs. Also provide a stub for IovDeque::push_back to maintain the same memory layout invariants.

Signed-off-by: Babis Chalios <bchalios@amazon.es>
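A sketch of the two allocation paths (sizes, names, and error handling are assumptions; the real IovDeque code differs): outside of kani, the ring's backing memory is a memfd mapped twice at adjacent addresses so that wrap-around stays contiguous; under kani, a plain heap allocation stands in for it.

```rust
// One page of backing storage, purely for illustration.
const RING_BYTES: usize = 4096;

#[cfg(not(kani))]
fn allocate_ring() -> *mut u8 {
    unsafe {
        // Anonymous memory file that backs the ring.
        let fd = libc::memfd_create(b"iov_ring\0".as_ptr().cast(), 0);
        assert!(fd >= 0);
        assert_eq!(libc::ftruncate(fd, RING_BYTES as libc::off_t), 0);

        // Reserve twice the size in virtual address space...
        let base = libc::mmap(
            std::ptr::null_mut(),
            2 * RING_BYTES,
            libc::PROT_NONE,
            libc::MAP_PRIVATE | libc::MAP_ANONYMOUS,
            -1,
            0,
        );
        assert_ne!(base, libc::MAP_FAILED);
        let base = base.cast::<u8>();

        // ...and map the same file into both halves, so offset `i` and
        // `i + RING_BYTES` alias the same bytes and the ring never "wraps".
        for offset in [0, RING_BYTES] {
            let ret = libc::mmap(
                base.add(offset).cast(),
                RING_BYTES,
                libc::PROT_READ | libc::PROT_WRITE,
                libc::MAP_SHARED | libc::MAP_FIXED,
                fd,
                0,
            );
            assert_ne!(ret, libc::MAP_FAILED);
        }
        libc::close(fd);
        base
    }
}

#[cfg(kani)]
fn allocate_ring() -> *mut u8 {
    use std::alloc::{alloc, Layout};
    // kani cannot model memfd_create()/mmap(), but a plain allocation is enough
    // to reason about the buffer contents in the proofs.
    unsafe { alloc(Layout::from_size_align(RING_BYTES, 4096).unwrap()) }
}
```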
🎉
The changes introduced in IoVecBufferMut that allowed implementing readv support for virtio-net regressed the performance of the vsock device. This is because now, when we create IoVecBufferMut objects, we do a bunch of system calls (1 memfd_create and 3 mmap). The virtio-net device avoids the performance issue because it creates a single IoVecBufferMut and re-uses it, whereas vsock creates a new IoVecBufferMut object for every packet it receives.

We have a fix for this, which essentially creates a single IoVecBufferMut that it reuses for all the incoming vsock packets during a connection. The problem with the fix is that it makes unit tests really unhappy and we need a significant amount of work to fix them. So, revert the PR to have main in a clean state. We will fix the vsock issues out-of-band and re-open the PR.

This reverts commits:
- bc0ba43
- 667aba4
- 5d718d4
- 14e6e33
- 1e4c632

from PR firecracker-microvm#4799

Signed-off-by: Babis Chalios <bchalios@amazon.es>
Changes
Change the network device emulation to avoid the intermediate memory copy from the TAP device to a Firecracker memory buffer. To achieve this, we use the `readv` system call to read from the TAP device directly into guest memory.

Reason
Currently, on the RX path of our network device, we perform two memory copies to deliver a network frame from the TAP device to the guest driver. We first read the frame from the TAP device into a memory buffer maintained by Firecracker. Next, we copy the frame from that buffer into guest memory. We need the intermediate buffer because, typically, the guest-side buffer is scattered across guest memory.
We had tried to implement this optimization before, but we observed a performance regression due to the overhead of parsing guest memory buffers (i.e. `DescriptorChain` objects).

Recently, we merged a few improvements in the logic that parses `DescriptorChain` objects from VirtIO queues (#4723, #4748), which allow us to parse descriptor chains faster, so the overheads we experienced before are significantly reduced. Moreover, we apply a further optimization: we move the parsing of `DescriptorChain` objects outside the hot path, as sketched below.
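A hedged sketch of that last point (the struct layout and helper fields are illustrative, even though `process_rx_queue_event` and `process_rx` echo the function names mentioned in the commits): descriptor chains are parsed when the guest notifies us about new RX buffers, and the per-frame hot path only issues the `readv()`.

```rust
use std::os::unix::io::RawFd;

/// Minimal stand-in for the network device state relevant to this split.
struct NetSketch {
    tap_fd: RawFd,
    rx_iovecs: Vec<libc::iovec>,            // pre-parsed guest buffers
    pending_chains: Vec<Vec<libc::iovec>>,  // stand-in for newly available chains
}

impl NetSketch {
    /// Runs on the queue notification: parse chains once, outside the hot loop.
    fn process_rx_queue_event(&mut self) {
        for chain in self.pending_chains.drain(..) {
            self.rx_iovecs.extend(chain);
        }
    }

    /// Runs per incoming frame: no parsing, just one readv() into guest memory.
    fn process_rx(&mut self) -> std::io::Result<usize> {
        let ret = unsafe {
            libc::readv(
                self.tap_fd,
                self.rx_iovecs.as_ptr(),
                self.rx_iovecs.len() as libc::c_int,
            )
        };
        if ret < 0 {
            return Err(std::io::Error::last_os_error());
        }
        Ok(ret as usize)
    }
}
```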
License Acceptance

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check CONTRIBUTING.md.