feat: Faster UDP/IO on Apple platforms #1993

Merged
merged 1 commit into from
Oct 25, 2024

Conversation

larseggert
Contributor

@larseggert larseggert commented Sep 20, 2024

This uses Apple's private sendmsg_x and recvmsg_x system calls for multi-packet UDP I/O.

CC @mxinden

@Ralith
Collaborator

Ralith commented Sep 20, 2024

Is there interest in seeing TX support via sendmsg_x?

We found that sendmmsg-style batching offered little performance benefit and was considerably difficult to take advantage of. IIRC the _x functions on macOS have more to offer than that, though. Will this unblock segmentation offload or other incidental optimizations?

@larseggert
Contributor Author

Bench on main:

test large_data_10_streams  ... bench:  27,558,791 ns/iter (+/- 13,459,810) = 380 MB/s
test large_data_1_stream    ... bench:  24,324,266 ns/iter (+/- 19,219,937) = 43 MB/s
test small_data_100_streams ... bench:  19,437,900 ns/iter (+/- 20,065,941)
test small_data_1_stream    ... bench:  11,465,128 ns/iter (+/- 8,699,934)

Bench with this PR:

test large_data_10_streams  ... bench:  28,829,216 ns/iter (+/- 15,924,956) = 363 MB/s
test large_data_1_stream    ... bench:  14,354,999 ns/iter (+/- 20,039,122) = 73 MB/s
test small_data_100_streams ... bench:  14,061,741 ns/iter (+/- 17,311,517)
test small_data_1_stream    ... bench:  19,194,441 ns/iter (+/- 5,012,070)

Surprised that large_data_10_streams and small_data_1_stream are slower...

@Ralith
Collaborator

Ralith commented Sep 23, 2024

Those tests tend to be extremely noisy, as the huge variance suggests. A targeted quinn-udp benchmark might be more useful.

@larseggert
Contributor Author

We've also found on neqo that multi-packet RX without multi-packet TX has limited benefits, since the RX batch size will be very small.

@larseggert
Contributor Author

I added sendmsg_x support, mostly to see what the performance difference would be. But it seems that none of the benches or tests call send with a Transmit struct where segment_size is not None?
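
For context on the `segment_size` convention mentioned above: a transmit that carries a segment size describes several equally sized datagrams packed into one buffer. The sketch below shows how such a buffer could be split into per-datagram slices for one batched send. The `Transmit` struct is a simplified, hypothetical stand-in rather than the quinn-udp type, and `BATCH_SIZE = 32` is an assumed value chosen for illustration.

```rust
/// Simplified, hypothetical stand-in for a transmit descriptor; the real
/// type carries more fields (destination address, ECN, and so on).
struct Transmit<'a> {
    contents: &'a [u8],
    /// `Some(n)`: `contents` holds several datagrams of `n` bytes each
    /// (the last one may be shorter). `None`: a single datagram.
    segment_size: Option<usize>,
}

/// Assumed upper bound on datagrams per batched call (illustrative only).
const BATCH_SIZE: usize = 32;

/// Split a transmit into the datagrams that one batched send would carry.
fn datagrams<'a>(t: &Transmit<'a>) -> Vec<&'a [u8]> {
    match t.segment_size {
        // Guard against a zero segment size, which `chunks` would panic on.
        Some(n) if n > 0 => t.contents.chunks(n).take(BATCH_SIZE).collect(),
        _ => vec![t.contents],
    }
}

fn main() {
    let buf = [0u8; 3000];
    let t = Transmit { contents: &buf, segment_size: Some(1200) };
    let lens: Vec<usize> = datagrams(&t).iter().map(|d| d.len()).collect();
    println!("{lens:?}"); // prints "[1200, 1200, 600]"
}
```

With `segment_size` set to `None`, the whole buffer is treated as one datagram, so a batched send path has nothing to batch. That is why a benchmark that never sets `segment_size` would not exercise it.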

@larseggert larseggert marked this pull request as ready for review September 23, 2024 09:10
@mxinden
Contributor

mxinden commented Sep 23, 2024

A targeted quinn-udp benchmark might be more useful.

How about using the throughput.rs benchmark @larseggert?

https://github.com/quinn-rs/quinn/blob/main/quinn-udp/benches/throughput.rs

@larseggert
Contributor Author

larseggert commented Sep 23, 2024

With @mxinden's benchmark. Baseline:

gso_true/throughput     time:   [58.076 ms 58.230 ms 58.387 ms]
                        thrpt:  [171.27 MiB/s 171.73 MiB/s 172.19 MiB/s]

Only sendmsg_x:

gso_true/throughput     time:   [15.143 ms 15.189 ms 15.236 ms]
                        thrpt:  [656.35 MiB/s 658.36 MiB/s 660.37 MiB/s]
                 change:
                        time:   [-74.028% -73.915% -73.808%] (p = 0.00 < 0.05)
                        thrpt:  [+281.80% +283.36% +285.04%]
                        Performance has improved.

Both sendmsg_x and recvmsg_x:

gso_true/throughput     time:   [12.632 ms 12.682 ms 12.731 ms]
                        thrpt:  [785.46 MiB/s 788.53 MiB/s 791.61 MiB/s]
                 change:
                        time:   [-78.321% -78.221% -78.112%] (p = 0.00 < 0.05)
                        thrpt:  [+356.88% +359.16% +361.27%]
                        Performance has improved.

Both sendmsg_x and recvmsg_x with BATCH_SIZE of 64:

gso_true/throughput     time:   [11.640 ms 11.682 ms 11.725 ms]
                        thrpt:  [852.85 MiB/s 856.00 MiB/s 859.07 MiB/s]
                 change:
                        time:   [-80.030% -79.938% -79.844%] (p = 0.00 < 0.05)
                        thrpt:  [+396.13% +398.45% +400.75%]
                        Performance has improved.
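
The reported `thrpt` changes follow from the `time` changes, since throughput is inversely proportional to time for a fixed payload. A small sketch of that arithmetic, using the median time changes from the results above (the function name is hypothetical):

```rust
/// Relative throughput change implied by a relative time change, for a fixed
/// amount of data: throughput is proportional to 1/time, so
/// new/old = 1 / (1 + time_change).
fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}

fn main() {
    // Median time changes reported above, expressed as fractions.
    for (label, dt) in [
        ("sendmsg_x only", -0.73915),
        ("sendmsg_x + recvmsg_x", -0.78221),
        ("both, BATCH_SIZE 64", -0.79938),
    ] {
        println!("{label}: {:+.1}%", throughput_change(dt) * 100.0);
    }
}
```

The results agree with the reported throughput changes to within rounding, which also confirms that every "change" figure is relative to the baseline run rather than to the previous configuration.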

Member

@djc djc left a comment

LGTM.

Please squash all of the changes into a single commit?

Contributor

@mxinden mxinden left a comment

Impressive results. Great to see the macOS _x syscalls work for QUIC UDP I/O.

Ralith previously requested changes Sep 23, 2024
Collaborator

@Ralith Ralith left a comment

Can we enable real GSO/GRO using these interfaces?

@larseggert
Contributor Author

No. They are the equivalent of the mmsg Linux calls. AFAIK Apple doesn't have GSO/GRO via the socket interface.

@larseggert
Contributor Author

Are you waiting on anything from me on this?

Contributor

@mxinden mxinden left a comment

quinn-udp/benches/throughput.rs will need more changes to keep supporting non-Apple platforms. @larseggert I believe we will need to either run it multi-threaded or use some kind of executor, e.g. tokio. I can prepare a commit in the next couple of days. Sorry for missing this in earlier reviews.

The changes themselves look good to me.

@mxinden mxinden mentioned this pull request Oct 8, 2024
@djc
Member

djc commented Oct 9, 2024

@Ralith can you do another round on this one?

@larseggert
Contributor Author

larseggert commented Oct 10, 2024

Once @mxinden's fix to the bench is in, I will rebase and squash this PR.

@AndrewDryga

AndrewDryga commented Oct 10, 2024

Hey guys 👋, is there any chance Apple won't approve apps that are using those private syscalls in the App Store? They are notorious for doing so and even de-listing apps for using anything "undocumented". See 2.5.1 here: https://developer.apple.com/app-store/review/guidelines/

See one such case here: https://9to5mac.com/2019/11/04/electron-app-rejections/

How will they find out? Apple employs automated tools to scan apps for the usage of private APIs. If sendmsg_x and recvmsg_x are detected, the app is at risk of being flagged.

@larseggert
Contributor Author

The use of the private syscalls is now behind a non-default feature.

@AndrewDryga

AndrewDryga commented Oct 10, 2024

@larseggert should we add a big fat warning saying that enabling this flag violates Apple's ToS, so it should only be enabled if the app is not distributed via the App Store (or notarized for the EU)?

@djc
Member

djc commented Oct 11, 2024

The cfg flag is set at build time.

I don't understand why this is relevant. Both cargo features and rustc cfgs happen at build time, i.e. not at runtime.

Because cfg flags are set from the command/environment where the end artifact is built, whereas Cargo features can be set by other dependencies which often aren't under your control.

Cargo features are propagated through the dependency graph so that if you add a dependency to your binary that has a feature-enabled dependency on one of your dependencies, that feature now also gets enabled for your project.

Correct. My argument above is that this isn't a problem here. I argue that if it is safe for one consumer of quinn-udp to use sendmsg_x and recvmsg_x, it is safe for all consumers of quinn-udp within the same binary to use sendmsg_x and recvmsg_x.

Can you think of a scenario where that is not the case?

This can "silently" happen without you noticing, so it's a bit of a footgun.

In this particular case, if one doesn't trust one's dependency to make wise choices on their quinn-udp feature selection, I don't think one should use that dependency in the first place.

I don't think we're going to solve the crates.io supply chain issues here, and I think not everyone is as careful as Mozilla at vetting their dependencies (and any updates happening to them over time). As such, IMO it is a best practice that additional functionality enabled via features should additionally be toggled by API choices. On the other hand, (a) I'm not sure how ergonomic that would be in this case, (b) I don't like the ergonomics of cfg as an alternative.

I am happy for it to be opt-out.

I suggest for it to be opt-in for now, given that we have little to no experience with these APIs, nor are we aware of any other applications using these APIs.

This seems reasonable to me.
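
To make the feature-versus-cfg distinction above concrete, here is a minimal sketch of the two gating styles. The feature name `fast-apple-datapath` is the one that appears later in this thread. The cfg name `apple_fast` and the function itself are hypothetical.

```rust
/// Report which datapath this build would select.
#[allow(unexpected_cfgs)] // silence the cfg-name lint for these illustrative names
fn datapath() -> &'static str {
    // Cargo feature: any crate in the dependency graph can turn this on,
    // because Cargo unifies features across all dependents.
    if cfg!(feature = "fast-apple-datapath") {
        return "feature-gated fast path";
    }
    // Custom cfg (hypothetical name): only set by whoever builds the final
    // artifact, for example via RUSTFLAGS="--cfg apple_fast".
    if cfg!(apple_fast) {
        return "cfg-gated fast path";
    }
    "portable path"
}

fn main() {
    println!("{}", datapath());
}
```

Both checks are resolved at compile time. The difference is only in who gets to flip the switch: a feature can be enabled by any dependent crate, while a custom cfg has to come from the build environment of the final artifact.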

@mxinden
Contributor

mxinden commented Oct 11, 2024

The cfg flag is set at build time.

I don't understand why this is relevant. Both cargo features and rustc cfgs happen at build time, i.e. not at runtime.

Because cfg flags are set from the command/environment where the end artifact is built, whereas Cargo features can be set by other dependencies which often aren't under your control.

Thank you for expanding on this. That makes sense.

@larseggert
Contributor Author

larseggert commented Oct 11, 2024

See 4f86c50. Let me know if I misunderstood something.

With this change, how will CI cover this code? AFAIK cargo hack only iterates over feature permutations.

@djc
Member

djc commented Oct 14, 2024

See 4f86c50. Let me know if I misunderstood something.

Discussed this with @Ralith privately, we feel that the use of cfg has a negative impact on the ergonomics and the chance of issues arising from silent feature enabling is pretty limited in this case, so would prefer to stick with a feature for this for now.

@larseggert larseggert force-pushed the feat-apple-datapath branch 3 times, most recently from a11fa38 to de8f280 Compare October 14, 2024 09:48
@djc
Member

djc commented Oct 24, 2024

Sorry it took a while to get the benchmark PR merged, can you rebase on top of main?

This uses Apple's private sendmsg_x and recvmsg_x system calls for
multi-packet UDP I/O.
@larseggert
Contributor Author

Shouldn't the features CI workflow have included fast-apple-datapath? I don't see it in the logs. (Or I am missing something.)

@mxinden
Contributor

mxinden commented Oct 24, 2024

Should be fine I believe:

running `cargo check --no-default-features --features direct-log,fast-apple-datapath,log,tracing` on quinn-udp (1754/1772)
running `cargo check --no-default-features --features default` on quinn-udp (1755/1772)
running `cargo check --no-default-features --features direct-log` on quinn-udp (1756/1772)
running `cargo check --no-default-features --features default,direct-log` on quinn-udp (1757/1772)
running `cargo check --no-default-features --features fast-apple-datapath` on quinn-udp (1758/1772)
running `cargo check --no-default-features --features default,fast-apple-datapath` on quinn-udp (1759/1772)
running `cargo check --no-default-features --features direct-log,fast-apple-datapath` on quinn-udp (1760/1772)
running `cargo check --no-default-features --features default,direct-log,fast-apple-datapath` on quinn-udp (1761/1772)
running `cargo check --no-default-features --features log` on quinn-udp (1762/1772)
running `cargo check --no-default-features --features direct-log,log` on quinn-udp (1763/1772)
running `cargo check --no-default-features --features fast-apple-datapath,log` on quinn-udp (1764/1772)
running `cargo check --no-default-features --features direct-log,fast-apple-datapath,log` on quinn-udp (1765/1772)
running `cargo check --no-default-features --features tracing` on quinn-udp (1766/1772)
running `cargo check --no-default-features --features direct-log,tracing` on quinn-udp (1767/1772)
running `cargo check --no-default-features --features fast-apple-datapath,tracing` on quinn-udp (1768/1772)
running `cargo check --no-default-features --features direct-log,fast-apple-datapath,tracing` on quinn-udp (1769/1772)
running `cargo check --no-default-features --features log,tracing` on quinn-udp (1770/1772)
running `cargo check --no-default-features --features direct-log,log,tracing` on quinn-udp (1771/1772)
running `cargo check --no-default-features --features fast-apple-datapath,log,tracing` on quinn-udp (1772/1772)
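
The log above is consistent with a tool enumerating combinations of the crate's features. As a rough illustration of why the number of checks grows quickly, the sketch below lists every non-empty subset of a feature list. It does not reproduce cargo hack's actual enumeration order or deduplication rules; it only shows the underlying powerset idea, and the function name is hypothetical.

```rust
/// All non-empty subsets of `features`, each rendered as a comma-separated list.
fn feature_powerset(features: &[&str]) -> Vec<String> {
    let n = features.len();
    // Each bit of `mask` decides whether the feature at that index is included.
    (1u32..(1u32 << n))
        .map(|mask| {
            features
                .iter()
                .enumerate()
                .filter(|&(i, _)| (mask >> i) & 1 == 1)
                .map(|(_, f)| *f)
                .collect::<Vec<_>>()
                .join(",")
        })
        .collect()
}

fn main() {
    // Feature names taken from the log above.
    let features = ["direct-log", "fast-apple-datapath", "log", "tracing"];
    for combo in feature_powerset(&features) {
        println!("cargo check --no-default-features --features {combo}");
    }
}
```

Four features yield 15 non-empty combinations, roughly half of which include `fast-apple-datapath`, so the feature-gated code is compiled in many of the CI checks.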

@djc djc added this pull request to the merge queue Oct 25, 2024
Merged via the queue into quinn-rs:main with commit adc4a06 Oct 25, 2024
14 checks passed
@larseggert larseggert deleted the feat-apple-datapath branch October 25, 2024 07:29
mxinden added a commit to mxinden/neqo that referenced this pull request Oct 30, 2024
Currently we use `quinn-udp` `v0.5.4`.

`quinn-udp` `v0.5.5` fixes [`recvmmsg` calls on Android x86](quinn-rs/quinn#1966).

`quinn-udp` `v0.5.6` adds [experimental multi-message support on Apple
platforms](quinn-rs/quinn#1993) and [fixes an
unnecessary `windows-sys` version
restriction](quinn-rs/quinn#2021).

While not strictly necessary, given that our current version specification (i.e.
`version = "0.5.4"`) already allows users to use Neqo with `quinn-udp` `v0.5.6`,
this commit updates to `quinn-udp` `v0.5.6` anyways, thus making sure CI tests
with latest version.

In case mozilla#2208 lands, future compatible
version updates would touch the `Cargo.lock` file, not `Cargo.toml`.
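
The statement that `version = "0.5.4"` already admits `0.5.6` follows from Cargo's default (caret) requirement semantics. Below is a simplified sketch of that rule for full `major.minor.patch` triples, ignoring pre-release and build metadata. The type and function names are hypothetical.

```rust
/// (major, minor, patch); pre-release and build metadata are ignored here.
type Version = (u64, u64, u64);

/// Simplified caret rule: `candidate` satisfies a bare requirement `req`
/// if it is at least `req` and does not change the left-most non-zero
/// component of `req`.
fn caret_matches(req: Version, candidate: Version) -> bool {
    // Tuples compare lexicographically, which matches version ordering here.
    if candidate < req {
        return false;
    }
    match req {
        (0, 0, patch) => candidate == (0, 0, patch),
        (0, minor, _) => candidate.0 == 0 && candidate.1 == minor,
        (major, _, _) => candidate.0 == major,
    }
}

fn main() {
    let req = (0, 5, 4);
    println!("0.5.6 ok: {}", caret_matches(req, (0, 5, 6))); // true
    println!("0.6.0 ok: {}", caret_matches(req, (0, 6, 0))); // false
}
```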
psumbera added a commit to psumbera/quinn that referenced this pull request Nov 5, 2024
djc pushed a commit that referenced this pull request Nov 5, 2024
6 participants