Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

virtio-net: switch to readv + implement MRG_RXBUF #4813

Closed

Conversation

ShadowCurse
Copy link
Contributor

@ShadowCurse ShadowCurse commented Sep 19, 2024

Changes

Switch virtio-net device to use readv to read from a tap device and implement MRG_RXBUF flag.

Reason

Improve performance of the network device.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • If a specific issue led to this PR, this PR closes the issue.
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this
    PR.
  • API changes follow the Runbook for Firecracker API changes.
  • User-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.
  • New TODOs link to an issue.
  • Commits meet
    contribution quality standards.

  • This functionality cannot be added in rust-vmm.

@ShadowCurse ShadowCurse self-assigned this Sep 19, 2024
@ShadowCurse ShadowCurse force-pushed the readv_mrg_rx_buf branch 6 times, most recently from ec7a706 to 1b3acb1 Compare September 20, 2024 15:07
Copy link

codecov bot commented Sep 20, 2024

Codecov Report

Attention: Patch coverage is 95.90361% with 17 lines in your changes missing coverage. Please review.

Project coverage is 84.42%. Comparing base (3acf37d) to head (d9a884f).
Report is 16 commits behind head on main.

Current head d9a884f differs from pull request most recent head 251417f

Please upload reports for the commit 251417f to get more accurate results.

Files with missing lines Patch % Lines
src/vmm/src/devices/virtio/net/persist.rs 12.50% 7 Missing ⚠️
src/vmm/src/devices/virtio/net/device.rs 92.95% 5 Missing ⚠️
src/vmm/src/devices/virtio/net/rx_buffer.rs 97.58% 3 Missing ⚠️
src/vmm/src/devices/virtio/iov_ring_buffer.rs 98.52% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4813      +/-   ##
==========================================
+ Coverage   84.32%   84.42%   +0.09%     
==========================================
  Files         249      252       +3     
  Lines       27512    27794     +282     
==========================================
+ Hits        23200    23465     +265     
- Misses       4312     4329      +17     
Flag Coverage Δ
5.10-c5n.metal ?
5.10-m5n.metal ?
5.10-m6a.metal 83.97% <95.90%> (+0.15%) ⬆️
5.10-m6g.metal ?
5.10-m6i.metal ?
5.10-m7g.metal ?
6.1-c5n.metal ?
6.1-m5n.metal ?
6.1-m6a.metal 83.96% <95.90%> (+0.13%) ⬆️
6.1-m6g.metal ?
6.1-m6i.metal 84.65% <95.90%> (+0.12%) ⬆️
6.1-m7g.metal 81.08% <95.90%> (+0.18%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ShadowCurse ShadowCurse marked this pull request as ready for review September 20, 2024 15:38
@ShadowCurse ShadowCurse changed the title readv + MRG_RXBUF virtio-net: switch to readv + implement MRG_RXBUF Sep 20, 2024
@ShadowCurse ShadowCurse added Type: Enhancement Indicates new feature requests Type: Performance labels Sep 20, 2024
@ShadowCurse ShadowCurse added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Sep 23, 2024
Copy link
Contributor

@roypat roypat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some usage of unsafe is not sound :( I'd love to run miri over it, but it doesn't support memfd_create 😭

src/vmm/src/devices/virtio/net/tap.rs Show resolved Hide resolved
src/vmm/src/devices/virtio/net/test_utils.rs Outdated Show resolved Hide resolved
Comment on lines +9 to +10
/// This saves 8 bytes compared to `VecDequeue` in the standard library.
/// (24 bytes vs 32 bytes)
Copy link
Contributor

@roypat roypat Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really care this much about these 8 bytes? If I'm reading the code correctly, we only use a single one of these over the entire lifetime of the Firecracker process. If that's the case, then I'd prefer using an established stdlib type over saving 8 bytes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially RxBuffer type was 88 bytes and I wanted to make is smaller. Now it is 56 bytes, so maybe using VecDeque now (so adding 8 bytes back) will not be that bad. But an additional benefit of this type is that it is fixed in buffer size, so we will never reallocate (so it is like an additional guarantee we can have)

src/vmm/src/devices/virtio/net/rx_buffer.rs Outdated Show resolved Hide resolved
src/vmm/src/devices/virtio/net/rx_buffer.rs Outdated Show resolved Hide resolved
@ShadowCurse ShadowCurse force-pushed the readv_mrg_rx_buf branch 6 times, most recently from 855f872 to 052631f Compare September 25, 2024 16:19
ShadowCurse and others added 8 commits September 26, 2024 10:16
Add new `IovRingBuffer` ring buffer type that is tailored for holding
`struct iovec` objects that point to guest memory for IO. The
`struct iovec` objects represent the memory that the guest passed to
us as `Descriptors` in a VirtIO queue for performing some I/O operation.

We plan to use this type to describe the guest memory we have available
for doing network RX. This should facilitate us in optimizing the
reception of data from the TAP device using `readv`, thus avoiding a
memory copy.

Co-authored-by: Babis Chalios <bchalios@amazon.es>
Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add generic fixed size ring buffer type. The main
difference from a standard `VecDequeue` is the use of
`u32` as indexes. This shrinks the size of the ring buffer
from 32 bytes for `VecDequeue` to 24 bytes for `RingBuffer`.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Split `add_used` internals into 2 new functions: `write_used_element`
and `advance_used_ring`. This will be used in next commits to optimize
RX path of net device.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
This type will be used in the next commit
where we will switch to use `readv` for RX path.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Right now, we are performing two copies for writing a frame from the TAP
device into guest memory:
- read the frame into an internal array held by the Net device
- copy that array into a buffers of a DescriptorChain.

In order to avoid this double copy we can use the readv system call to
read directly from the TAP device into the buffers described by
DescriptorChain.

The main challenge with this is that DescriptorChain objects describe
memory that is at least 65562 bytes long when guest TSO4, TSO6 or UFO
are enabled or 1526 otherwise and parsing the chain includes overhead
which we pay even if the frame we are receiving is much smaller than
these sizes.

PR firecracker-microvm#4748 reduced
the overheads involved with parsing DescriptorChain objects. To further
avoid this overhead, move the parsing of DescriptorChain objects out of
the hot path of process_rx() where we are actually receiving a frame
into process_rx_queue_event() where we get the notification that the
guest added new buffers for network RX.

Co-authored-by: Babis Chalios <bchalios@amazon.es>
Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Before `readv` change, `rx_deferred_frame` was used
to signal if there was present packet in the internal
rx buffer, so we would try to send it first before
attempting to read new one. With `readv` we don't
use internal buffer for RX packets anymore, so the
logic of `deferred` packet changed. Now any packet
is `deferred` when it is written to the guest memory.
After the write we try to notify guest about it, but
only if rate limiter allows it. To track this we
now use `deferred_rx_bytes` which holds the number of
bytes we yet to notify guest about. We can conveniently
use `Option<NonZeroUsize>` for this where `Some(...)`
will mean there are some `deferred` bytes we yet to
notify guest about and `None` will mean we are done and
can read the next packet.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Now virtio-net device VIRTIO_NET_F_MRG_RXBUF feature which allows it
to write single packet into multiple descriptor chains.
The amount of descriptor chains (also known as heads) is written into
the `virtio_net_hdr_v1` structure which is located at the very begging
of the packet.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Fix retransmission issues introduced by enabling
MRG_RXBUF. Do this by tracking total capacity of the
`RxBuffer` struct and only return an iov slice if
it contains at least 64K bytes.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
@ShadowCurse
Copy link
Contributor Author

Closing in favor of #4799 and #4834

@ShadowCurse ShadowCurse closed this Oct 4, 2024
@ShadowCurse ShadowCurse deleted the readv_mrg_rx_buf branch October 29, 2024 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Awaiting review Indicates that a pull request is ready to be reviewed Type: Enhancement Indicates new feature requests Type: Performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants