
Use readv for the RX path of the network device to avoid one memory copy per frame #4799

Merged
bchalios merged 5 commits into firecracker-microvm:main from net_rx_readv on Oct 7, 2024

Conversation

@bchalios bchalios (Contributor) commented Sep 12, 2024

Changes

Change the network device emulation to avoid the intermediate memory copy from the TAP device to a Firecracker-owned memory buffer. To achieve this, we use the readv system call to read from the TAP device directly into guest memory.

Reason

Currently, on the RX path of our network device we perform two memory copies to deliver a network frame from the TAP device to the guest driver. We first read the frame from the TAP device into a memory buffer maintained by Firecracker. Next, we copy the frame from that buffer into guest memory. We need the intermediate buffer because, typically, the guest-side buffer is scattered across guest memory.

We had tried to implement this optimization before, but we observed a performance regression caused by the overhead of parsing guest memory buffers (aka DescriptorChain objects).

Recently, we merged a few improvements to the logic that parses DescriptorChain objects from VirtIO queues (#4723, #4748), which allow us to parse descriptor chains faster, so the overheads we experienced before are significantly reduced.
Moreover, we apply a further optimization: we move the parsing of DescriptorChain objects outside the hot path.
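
To illustrate the core idea, here is a minimal Rust sketch of reading a frame with readv directly into scattered guest buffers. This is not the code from this PR; it assumes the libc crate, and the function and parameter names are illustrative.

```rust
use std::io;
use std::os::unix::io::RawFd;

/// Read one frame from the TAP device directly into the scattered guest
/// buffers described by `iovecs`, skipping the intermediate Firecracker
/// buffer. `tap_fd` and `iovecs` are illustrative parameters.
fn read_frame_into_guest(tap_fd: RawFd, iovecs: &[libc::iovec]) -> io::Result<usize> {
    // SAFETY: every iovec entry must describe valid, writable guest memory
    // for the duration of the call.
    let ret = unsafe { libc::readv(tap_fd, iovecs.as_ptr(), iovecs.len() as libc::c_int) };
    if ret < 0 {
        Err(io::Error::last_os_error())
    } else {
        Ok(ret as usize)
    }
}
```

The guest buffers are typically scattered, so the kernel fills each iovec in order until the whole frame has been written, which is what removes the need for the bounce buffer.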

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • If a specific issue led to this PR, this PR closes the issue.
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this
    PR.
  • API changes follow the Runbook for Firecracker API changes.
  • User-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.
  • New TODOs link to an issue.
  • Commits meet
    contribution quality standards.

  • This functionality cannot be added in rust-vmm.

@bchalios bchalios force-pushed the net_rx_readv branch 3 times, most recently from 266551b to f98af8a on September 12, 2024 15:01

codecov bot commented Sep 12, 2024

Codecov Report

Attention: Patch coverage is 93.54005% with 25 lines in your changes missing coverage. Please review.

Project coverage is 84.42%. Comparing base (da68f07) to head (7ce0211).
Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
src/vmm/src/devices/virtio/net/persist.rs 50.00% 8 Missing ⚠️
src/vmm/src/devices/virtio/net/device.rs 94.85% 7 Missing ⚠️
src/vmm/src/devices/virtio/iovec.rs 91.52% 5 Missing ⚠️
src/vmm/src/devices/virtio/iov_deque.rs 97.40% 4 Missing ⚠️
src/vmm/src/devices/virtio/vsock/mod.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4799      +/-   ##
==========================================
+ Coverage   84.35%   84.42%   +0.06%     
==========================================
  Files         249      250       +1     
  Lines       27505    27749     +244     
==========================================
+ Hits        23202    23427     +225     
- Misses       4303     4322      +19     
Flag Coverage Δ
5.10-c5n.metal 84.66% <93.54%> (+0.07%) ⬆️
5.10-m5n.metal 84.64% <93.54%> (+0.06%) ⬆️
5.10-m6a.metal 83.94% <93.54%> (+0.07%) ⬆️
5.10-m6g.metal 81.03% <93.54%> (+0.11%) ⬆️
5.10-m6i.metal 84.64% <93.54%> (+0.06%) ⬆️
5.10-m7g.metal 81.03% <93.54%> (+0.11%) ⬆️
6.1-c5n.metal 84.66% <93.54%> (+0.07%) ⬆️
6.1-m5n.metal 84.64% <93.54%> (+0.06%) ⬆️
6.1-m6a.metal 83.94% <93.54%> (+0.07%) ⬆️
6.1-m6g.metal 81.03% <93.54%> (+0.11%) ⬆️
6.1-m6i.metal 84.64% <93.54%> (+0.07%) ⬆️
6.1-m7g.metal 81.03% <93.54%> (+0.11%) ⬆️

Flags with carried forward coverage won't be shown.


@bchalios bchalios force-pushed the net_rx_readv branch 4 times, most recently from 23551b8 to 424b001 on September 13, 2024 08:17
@bchalios bchalios force-pushed the net_rx_readv branch 4 times, most recently from 98b3278 to 40d7077 on September 30, 2024 14:02
@roypat roypat (Contributor) left a comment

I only made it half-way through commit 2 before my lunch break, but dumping these comments already in case they are helpful

@bchalios bchalios force-pushed the net_rx_readv branch 2 times, most recently from 17c443c to 65010cf on October 3, 2024 19:02
@bchalios bchalios changed the title WIP: Use readv for the RX path of the network device to avoid one memory copy per frame Use readv for the RX path of the network device to avoid one memory copy per frame Oct 3, 2024
@bchalios bchalios force-pushed the net_rx_readv branch 5 times, most recently from 2b61f2e to c46d8e1 on October 4, 2024 10:05
ShadowCurse previously approved these changes Oct 4, 2024
bchalios and others added 5 commits October 7, 2024 10:46
Add a ring buffer type that is tailored for holding `struct iovec`
objects that point to guest memory for IO. The `struct iovec` objects
represent the memory that the guest passed to us as `Descriptors` in a
VirtIO queue for performing some I/O operation.

We plan to use this type to describe the guest memory we have available
for network RX. This should help us optimize the reception of data from
the TAP device using `readv`, thus avoiding a memory copy.

Co-authored-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Signed-off-by: Babis Chalios <bchalios@amazon.es>
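
For illustration, here is a much simplified sketch of such a ring buffer of `struct iovec` entries. It is not the actual IovDeque (which additionally guarantees the live entries are always contiguous in memory) and it assumes the libc crate.

```rust
/// Simplified ring of `libc::iovec` entries describing guest buffers
/// available for RX. Unlike the real IovDeque, it does not guarantee that
/// the live entries occupy consecutive memory.
struct IovRing {
    iovs: Vec<libc::iovec>,
    start: usize,
    len: usize,
}

impl IovRing {
    fn with_capacity(capacity: usize) -> Self {
        let empty = libc::iovec {
            iov_base: std::ptr::null_mut(),
            iov_len: 0,
        };
        IovRing { iovs: vec![empty; capacity], start: 0, len: 0 }
    }

    /// Add one guest buffer at the back of the ring.
    fn push_back(&mut self, iov: libc::iovec) {
        assert!(self.len < self.iovs.len(), "ring is full");
        let idx = (self.start + self.len) % self.iovs.len();
        self.iovs[idx] = iov;
        self.len += 1;
    }

    /// Remove the oldest guest buffer from the front of the ring.
    fn pop_front(&mut self) -> Option<libc::iovec> {
        if self.len == 0 {
            return None;
        }
        let iov = self.iovs[self.start];
        self.start = (self.start + 1) % self.iovs.len();
        self.len -= 1;
        Some(iov)
    }
}
```
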
Allow IoVecBufferMut objects to store multiple DescriptorChain objects,
so that we can describe guest memory meant to be used for receiving data
(for example memory used for network RX) as a single (sparse) memory
region.

This will allow us to always keep track of all the memory we have
available for performing RX and to use `readv` for copying data from the
TAP device directly into guest memory, avoiding the extra copy. In the
future, it will also facilitate the implementation of mergeable buffers
for the RX path of the network device.

Co-authored-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Signed-off-by: Babis Chalios <bchalios@amazon.es>
Right now, we are performing two copies to write a frame from the TAP
device into guest memory. We first read the frame into an array held by
the Net device and then copy that array into a DescriptorChain.

In order to avoid the double copy, use the readv system call to read
directly from the TAP device into the buffers described by the
DescriptorChain.

The main challenge with this is that DescriptorChain objects describe
memory that is at least 65562 bytes long when guest TSO4, TSO6 or UFO
are enabled (1526 bytes otherwise), and parsing the chain incurs overhead
which we pay even if the frame we are receiving is much smaller than
these sizes.

PR firecracker-microvm#4748 reduced
the overheads involved in parsing DescriptorChain objects. To further
avoid this overhead, move the parsing of DescriptorChain objects out of
the hot path of process_rx(), where we are actually receiving a frame,
and into process_rx_queue_event(), where we get the notification that
the guest added new buffers for network RX.

Signed-off-by: Babis Chalios <bchalios@amazon.es>
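
A hypothetical sketch of that split, with illustrative names rather than the actual Firecracker methods; it assumes the libc crate and that descriptor chains have already been translated to host pointer/length pairs.

```rust
use std::os::unix::io::RawFd;

/// process_rx_queue_event() side: parse once, off the hot path, turning
/// each newly added chain into iovecs kept around for later RX.
fn prepare_rx_buffers(new_chains: &[Vec<(*mut u8, usize)>], rx_iovecs: &mut Vec<libc::iovec>) {
    for chain in new_chains {
        for &(host_addr, len) in chain {
            rx_iovecs.push(libc::iovec {
                iov_base: host_addr.cast(),
                iov_len: len,
            });
        }
    }
}

/// process_rx() side: the hot path does no descriptor parsing at all, only
/// a single readv straight into the pre-parsed guest buffers.
fn receive_frame(tap_fd: RawFd, rx_iovecs: &[libc::iovec]) -> isize {
    // SAFETY: the iovecs describe valid, writable guest memory.
    unsafe { libc::readv(tap_fd, rx_iovecs.as_ptr(), rx_iovecs.len() as libc::c_int) }
}
```

The design point is that the per-chain work is amortized over the queue notification, so each received frame pays only for the readv itself.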
Now that we pre-process the buffers that the guest provides for
performing RX, we need to save them in the VM state snapshot file for
networking to work correctly after snapshot resume.

Implement Persist for RxBuffers and plug it into the
(de)serialization logic of the network device.

Co-authored-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Signed-off-by: Babis Chalios <bchalios@amazon.es>
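
As an illustration only, the persisted state could look roughly like the following serde-serializable struct. This is a guess at the shape, not Firecracker's actual RxBuffers state: just enough information to rebuild the parsed buffers from the RX queue on restore.

```rust
use serde::{Deserialize, Serialize};

/// Hypothetical snapshot state for the pre-parsed RX buffers.
#[derive(Debug, Serialize, Deserialize)]
struct RxBufferState {
    /// Head descriptor index of each parsed-but-unused chain, in order,
    /// so the chains can be re-parsed from the queue on restore.
    parsed_chain_heads: Vec<u16>,
    /// Bytes of the front chain already filled when the snapshot was taken.
    used_bytes: u32,
}
```
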
The IoVecBufferMut type now uses IovDeque as its backing memory. IovDeque
performs a custom memory allocation, using memfd_create() and a
combination of mmap() calls, in order to provide a memory layout where
the iovec objects stored in the IovDeque are always in consecutive
memory.

kani doesn't really get along with these system calls, which breaks our
proof for IoVecBufferMut::write_volatile_at. Substitute memory
allocation and deallocation with plain calls to std::alloc::(de)alloc
when we run kani proofs. Also provide a stub for IovDeque::push_back that
preserves the same memory layout invariants.

Signed-off-by: Babis Chalios <bchalios@amazon.es>
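
For context, here is a minimal sketch of the double-mapping allocation such a layout typically relies on. It is an assumption about the approach, not the actual IovDeque code; it uses the libc crate and assumes `size` is a multiple of the page size.

```rust
use std::io;

/// Map a memfd of `size` bytes twice, back to back, so a ring buffer stored
/// in it can always be handed out as one contiguous run even when it wraps.
unsafe fn alloc_double_mapped(size: usize) -> io::Result<*mut u8> {
    // Anonymous in-memory file that will back both mappings.
    let fd = libc::memfd_create(b"iov_ring\0".as_ptr().cast(), 0);
    if fd < 0 {
        return Err(io::Error::last_os_error());
    }
    if libc::ftruncate(fd, size as libc::off_t) < 0 {
        libc::close(fd);
        return Err(io::Error::last_os_error());
    }
    // Reserve one contiguous virtual region of 2 * size bytes.
    let base = libc::mmap(
        std::ptr::null_mut(),
        2 * size,
        libc::PROT_NONE,
        libc::MAP_PRIVATE | libc::MAP_ANONYMOUS,
        -1,
        0,
    );
    if base == libc::MAP_FAILED {
        libc::close(fd);
        return Err(io::Error::last_os_error());
    }
    // Map the same memfd into both halves, so offsets `i` and `size + i`
    // alias the same physical page.
    for offset in [0, size] {
        let addr = libc::mmap(
            base.cast::<u8>().add(offset).cast(),
            size,
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_SHARED | libc::MAP_FIXED,
            fd,
            0,
        );
        if addr == libc::MAP_FAILED {
            libc::close(fd);
            return Err(io::Error::last_os_error());
        }
    }
    // The mappings keep the file alive; the fd itself is no longer needed.
    libc::close(fd);
    Ok(base.cast())
}
```

With this layout, a window of entries that wraps around the end of the ring can still be read as one contiguous slice starting in the first mapping, which is what lets the stored iovec objects always appear consecutive.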
@roypat roypat (Contributor) left a comment

🎉

@bchalios bchalios merged commit bc0ba43 into firecracker-microvm:main Oct 7, 2024
6 of 7 checks passed
bchalios added a commit to bchalios/firecracker that referenced this pull request Oct 8, 2024
The changes introduced in IoVecBufferMut that allowed implementing readv
support for virtio-net regressed the performance of the vsock device. This
is because creating an IoVecBufferMut object now performs a handful of
system calls (1 memfd_create and 3 mmap). The virtio-net device avoids the
performance issue because it creates a single IoVecBufferMut and re-uses
it, whereas vsock creates a new IoVecBufferMut object for every packet it
receives.

We have a fix for this, which essentially creates a single IoVecBufferMut
that is reused for all incoming vsock packets during a connection. The
problem with the fix is that it makes the unit tests really unhappy, and
fixing them requires a significant amount of work.

So, revert the PR to have main in a clean state. We will fix the vsock
issues out-of-band and re-open the PR.

This reverts commits:
- bc0ba43
- 667aba4
- 5d718d4
- 14e6e33
- 1e4c632

from PR firecracker-microvm#4799

Signed-off-by: Babis Chalios <bchalios@amazon.es>