refactor(server): remove ServerConnectionIdGenerator #1981

Merged
merged 4 commits into mozilla:main from the remove-connection-id-generator branch on Jul 17, 2024

Conversation

mxinden (Collaborator) commented Jul 12, 2024

A `neqo_transport::Server` manages multiple `neqo_transport::Connection`s. The `Server` keeps a `HashMap` of all `Connection`s, indexed by `ConnectionId`.

Indexing by `ConnectionId` is difficult because:

- The `ConnectionId` is not known before constructing the connection.
- The `ConnectionId` might change throughout the lifetime of a connection.

Previously, the index was kept up to date through a wrapper around the `ConnectionIdGenerator` that the `Server` provides to each `Connection`. This wrapper held a shared reference to the `Server`'s `HashMap` of `Connection`s. Whenever the `Server` would `process` a `Connection` and that `Connection` generated a new `ConnectionId` via `ConnectionIdGenerator::generate_cid`, the wrapped `ConnectionIdGenerator` also inserted the `Connection`, keyed by the new `ConnectionId`, into the `Server`'s `HashMap` of `Connection`s. A minimal sketch of this pattern follows below.
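For illustration only, here is a minimal sketch of the shape of that wrapper, with simplified stand-in types; the real `ConnectionIdGenerator` trait and `ServerConnectionIdGenerator` in neqo have different signatures:

```rust
use std::{cell::RefCell, collections::HashMap, rc::Rc};

// Simplified stand-ins for the real neqo types.
struct Connection;
type ConnectionId = Vec<u8>;

trait ConnectionIdGenerator {
    fn generate_cid(&mut self) -> ConnectionId;
}

// The wrapper holds a shared reference to the Server's connection map and
// registers every freshly generated ConnectionId in it, keeping the index
// up to date as a side effect of CID generation.
struct ServerConnectionIdGenerator<G: ConnectionIdGenerator> {
    inner: G,
    connection: Rc<RefCell<Connection>>,
    connections: Rc<RefCell<HashMap<ConnectionId, Rc<RefCell<Connection>>>>>,
}

impl<G: ConnectionIdGenerator> ConnectionIdGenerator for ServerConnectionIdGenerator<G> {
    fn generate_cid(&mut self) -> ConnectionId {
        let cid = self.inner.generate_cid();
        // A Connection asking for a new CID indirectly mutates the Server's
        // connection map here.
        self.connections
            .borrow_mut()
            .insert(cid.clone(), Rc::clone(&self.connection));
        cid
    }
}
```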

This has two problems:

- Complexity through indirection: a `Connection` owned by a `Server` can indirectly mutate the `Server`, because the wrapped `ConnectionIdGenerator` it was given has access to the `Server`'s set of `Connection`s.

- A double `RefCell` borrow, e.g. when a `Server` iterates over its `Connection`s and processes each one: a `Connection` might generate a new `ConnectionId` via the provided `ConnectionIdGenerator`, which in turn inserts the `Connection` into the very set of `Connection`s the `Server` is currently iterating (see the sketch after this list).
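The second problem, reduced to plain `RefCell` (this is a standalone demonstration, not neqo code): iterating a `RefCell`-guarded collection while a step of the iteration tries to mutate that same collection panics at runtime with a `BorrowMutError`:

```rust
use std::cell::RefCell;

fn main() {
    let connections = RefCell::new(vec!["conn-a", "conn-b"]);

    // First borrow: the server iterates its connections. The shared borrow
    // is held for the entire duration of the loop.
    for _conn in connections.borrow().iter() {
        // Conflicting second borrow: "processing" a connection tries to
        // insert into the same collection while it is still borrowed.
        connections.borrow_mut().push("conn-c"); // panics: BorrowMutError
    }
}
```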

This commit replaces the `Server`'s `HashMap<ConnectionId, Rc<RefCell<Connection>>>` with a simple `Vec<Connection>`. With the index gone, the `ConnectionIdGenerator` wrapper (i.e. `ServerConnectionIdGenerator`) is no longer needed and is removed. That in turn removes the shared reference to the `Server`'s `Connection` `HashMap`, making the double borrow described above impossible.

The downside of removing the index by `ConnectionId` is a potential performance hit. When the `Server` manages a large set of `Connection`s, finding the `Connection` corresponding to a `ConnectionId` (e.g. from an incoming `Datagram`) is `O(n)`, where `n` is the number of connections. A sketch of the resulting lookup is below.
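A sketch of what the replacement lookup amounts to, again with simplified stand-in types (the `local_cids` field is hypothetical; the real `Connection` has its own way of checking whether it currently owns a `ConnectionId`):

```rust
type ConnectionId = Vec<u8>;

struct Connection {
    // Simplified: the ConnectionIds this connection currently answers to.
    local_cids: Vec<ConnectionId>,
}

// O(n) lookup over the Vec: scan until a connection recognizes the
// ConnectionId carried by an incoming Datagram.
fn find_connection<'a>(
    connections: &'a mut [Connection],
    cid: &ConnectionId,
) -> Option<&'a mut Connection> {
    connections
        .iter_mut()
        .find(|c| c.local_cids.iter().any(|local| local == cid))
}
```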


Fixes #1975.

Simplifies `neqo_transport::Server` once more.

Draft for now to test for performance regressions first.


github-actions bot commented Jul 12, 2024

Failed Interop Tests: QUIC Interop Runner, client vs. server
Succeeded Interop Tests: QUIC Interop Runner, client vs. server
Unsupported Interop Tests: QUIC Interop Runner, client vs. server


Firefox builds for this PR

The following builds are available for testing. Crossed-out builds did not succeed.

mxinden force-pushed the remove-connection-id-generator branch from 3d7e8fd to a30fb14 on July 12, 2024 18:14

codecov bot commented Jul 12, 2024

Codecov Report

Attention: Patch coverage is 96.15385% with 2 lines in your changes missing coverage. Please review.

Project coverage is 94.96%. Comparing base (9f0a86d) to head (05442f5).
Report is 10 commits behind head on main.

| Files | Patch % | Lines |
| --- | --- | --- |
| neqo-transport/src/server.rs | 95.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1981      +/-   ##
==========================================
- Coverage   94.97%   94.96%   -0.01%     
==========================================
  Files         112      112              
  Lines       36509    36411      -98     
==========================================
- Hits        34673    34578      -95     
+ Misses       1836     1833       -3     



github-actions bot commented Jul 12, 2024

Benchmark results

Performance differences relative to 80c7969.

coalesce_acked_from_zero 1+1 entries: No change in performance detected.
       time:   [193.49 ns 194.05 ns 194.65 ns]
       change: [-0.3556% +0.0926% +0.5540%] (p = 0.70 > 0.05)

Found 9 outliers among 100 measurements (9.00%)
4 (4.00%) high mild
5 (5.00%) high severe

coalesce_acked_from_zero 3+1 entries: No change in performance detected.
       time:   [235.87 ns 236.39 ns 236.96 ns]
       change: [-0.3982% -0.0041% +0.3582%] (p = 0.98 > 0.05)

Found 10 outliers among 100 measurements (10.00%)
6 (6.00%) high mild
4 (4.00%) high severe

coalesce_acked_from_zero 10+1 entries: No change in performance detected.
       time:   [236.33 ns 237.24 ns 238.26 ns]
       change: [-0.0234% +0.4406% +0.8854%] (p = 0.07 > 0.05)

Found 9 outliers among 100 measurements (9.00%)
9 (9.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [214.51 ns 217.22 ns 223.54 ns]
       change: [-1.4275% +4.5299% +15.875%] (p = 0.65 > 0.05)

Found 14 outliers among 100 measurements (14.00%)
1 (1.00%) low mild
4 (4.00%) high mild
9 (9.00%) high severe

RxStreamOrderer::inbound_frame(): No change in performance detected.
       time:   [120.61 ms 120.78 ms 121.02 ms]
       change: [-0.0592% +0.1758% +0.4131%] (p = 0.17 > 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe

transfer/Run multiple transfers with varying seeds: No change in performance detected.
       time:   [55.298 ms 58.551 ms 61.808 ms]
       thrpt:  [64.717 MiB/s 68.317 MiB/s 72.335 MiB/s]
change:
       time:   [-1.6575% +5.8133% +14.737%] (p = 0.16 > 0.05)
       thrpt:  [-12.844% -5.4939% +1.6855%]
transfer/Run multiple transfers with the same seed: No change in performance detected.
       time:   [70.347 ms 76.581 ms 82.778 ms]
       thrpt:  [48.322 MiB/s 52.232 MiB/s 56.861 MiB/s]
change:
       time:   [-5.2024% +8.0126% +22.304%] (p = 0.23 > 0.05)
       thrpt:  [-18.236% -7.4182% +5.4879%]

Found 6 outliers among 100 measurements (6.00%)
6 (6.00%) low mild

1-conn/1-100mb-resp (aka. Download)/client: No change in performance detected.
       time:   [274.92 ms 290.67 ms 301.62 ms]
       thrpt:  [331.54 MiB/s 344.03 MiB/s 363.74 MiB/s]
change:
       time:   [-4.1268% +1.4939% +5.6874%] (p = 0.63 > 0.05)
       thrpt:  [-5.3813% -1.4719% +4.3044%]

Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) low severe

1-conn/10_000-parallel-1b-resp (aka. RPS)/client: 💚 Performance has improved.
       time:   [410.18 ms 413.27 ms 416.34 ms]
       thrpt:  [24.019 Kelem/s 24.197 Kelem/s 24.380 Kelem/s]
change:
       time:   [-7.5798% -6.5718% -5.5734%] (p = 0.00 < 0.05)
       thrpt:  [+5.9024% +7.0340% +8.2014%]
1-conn/1-1b-resp (aka. HPS)/client: No change in performance detected.
       time:   [68.157 ms 68.496 ms 68.882 ms]
       thrpt:  [14.518  elem/s 14.599  elem/s 14.672  elem/s]
change:
       time:   [-1.2897% -0.5202% +0.2671%] (p = 0.21 > 0.05)
       thrpt:  [-0.2664% +0.5229% +1.3065%]

Found 10 outliers among 100 measurements (10.00%)
3 (3.00%) low mild
1 (1.00%) high mild
6 (6.00%) high severe

Client/server transfer results

Transfer of 33554432 bytes over loopback.

| Client | Server | CC | Pacing | Mean [ms] | Min [ms] | Max [ms] | Relative |
| --- | --- | --- | --- | --- | --- | --- | --- |
| msquic | msquic | | | 204.7 ± 126.4 | 108.3 | 529.6 | 1.00 |
| neqo | msquic | reno | on | 290.6 ± 64.1 | 257.1 | 469.6 | 1.00 |
| neqo | msquic | reno | | 313.6 ± 82.9 | 254.2 | 475.7 | 1.00 |
| neqo | msquic | cubic | on | 264.8 ± 11.6 | 249.3 | 286.7 | 1.00 |
| neqo | msquic | cubic | | 348.6 ± 92.6 | 254.6 | 482.2 | 1.00 |
| msquic | neqo | reno | on | 202.4 ± 68.1 | 157.2 | 371.2 | 1.00 |
| msquic | neqo | reno | | 202.4 ± 80.5 | 120.3 | 352.7 | 1.00 |
| msquic | neqo | cubic | on | 234.2 ± 95.0 | 148.2 | 396.1 | 1.00 |
| msquic | neqo | cubic | | 217.5 ± 82.4 | 119.9 | 397.6 | 1.00 |
| neqo | neqo | reno | on | 215.7 ± 61.4 | 168.9 | 394.0 | 1.00 |
| neqo | neqo | reno | | 219.5 ± 70.6 | 165.0 | 377.2 | 1.00 |
| neqo | neqo | cubic | on | 222.5 ± 71.2 | 161.3 | 388.3 | 1.00 |
| neqo | neqo | cubic | | 229.4 ± 90.2 | 170.3 | 466.1 | 1.00 |


mxinden marked this pull request as ready for review July 13, 2024 10:22
larseggert (Collaborator) commented

You might be able to mitigate some of the performance hit by using an LRU queue for this, under the assumption that the hot set of connections is a subset of the total.

Or you could swap an element in the Vec with its left neighbor on every access, bubbling frequently-accessed elements to the front over time.
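A minimal sketch of that transpose heuristic, assuming the plain `Vec` from this pull request; the generic `matches` predicate stands in for whatever CID check the server performs:

```rust
// On every successful lookup, swap the hit with its left neighbor, so
// frequently accessed elements gradually bubble toward the front and
// later linear scans find them sooner.
fn find_and_bubble<T>(
    items: &mut Vec<T>,
    matches: impl Fn(&T) -> bool,
) -> Option<&mut T> {
    let mut i = items.iter().position(|item| matches(item))?;
    if i > 0 {
        items.swap(i, i - 1);
        i -= 1;
    }
    Some(&mut items[i])
}
```

Compared with a full move-to-front, the single swap is cheaper per access and more resistant to one-off lookups evicting the hot set from the front.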

mxinden (Collaborator, Author) commented Jul 15, 2024

> You might be able to mitigate some of the performance hit by using an LRU queue for this, under the assumption that the hot set of connections is a subset of the total.
>
> Or you could swap an element in the Vec with its left neighbor on every access, bubbling frequently-accessed elements to the front over time.

According to the latest benchmark run, this pull request does not have a performance impact. Given that the Neqo server implementation is for testing and benchmarking only, do we need any of the above optimizations? Given that they come with (minimal) complexity, I would prefer introducing them only once needed, i.e. when we have a benchmark showing their impact. Thoughts @larseggert?

larseggert (Collaborator) commented

> Given that the Neqo server implementation is for testing and benchmarking only, do we need any of the above optimizations?

No, we don't. I was just putting this here since you asked :-)

larseggert (Collaborator) left a comment

LGTM with nits.

Two packets with the same connection ID but different source addresses are assigned to the same connection, so there is no need to use the remote address to disambiguate packets, and thus no need for `AttemptKey`.
larseggert (Collaborator) commented

@KershawChang or @martinthomson, could you take a look? This would unblock vendoring in a new neqo release into Gecko.

mxinden mentioned this pull request Jul 17, 2024
larseggert added the needs-review (Needs a review by a CODEOWNER) label Jul 17, 2024
KershawChang (Collaborator) left a comment

mxinden (Collaborator, Author) commented Jul 17, 2024

> I've also pushed this to try server.
> https://treeherder.mozilla.org/jobs?repo=try&revision=1ef0d1a461cb9a381b8279da79b8e08dea64ab08&selectedTaskRun=VOdG5FRbQ-28ReByKNjk4w.0

Thank you! I see at least one potentially related test failure:

[task 2024-07-17T14:36:20.766Z] 14:36:20  WARNING -  TEST-UNEXPECTED-TIMEOUT | netwerk/test/unit/test_altsvc_http3.js | Test timed out

https://treeherder.mozilla.org/logviewer?job_id=466715649&repo=try&lineNumber=4167

larseggert added this pull request to the merge queue Jul 17, 2024
larseggert removed the needs-review (Needs a review by a CODEOWNER) label Jul 17, 2024
Merged via the queue into mozilla:main with commit 19ec1a9 Jul 17, 2024
57 checks passed
Successfully merging this pull request may close these issues:

BorrowMutError in neqo-transport/src/server.rs