
Conversation

@dignifiedquire (Contributor)

No description provided.


github-actions bot commented Nov 7, 2025

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3624/docs/iroh/

Last updated: 2025-11-13T16:35:39Z

@n0bot n0bot bot added this to iroh Nov 7, 2025
@github-project-automation github-project-automation bot moved this to 🏗 In progress in iroh Nov 7, 2025
// relies on quinn::EndpointConfig::grease_quic_bit being set to `false`,
// which we do in Endpoint::bind.
if let Some((sender, sealed_box)) = disco::source_and_box(datagram) {
trace!(src = ?source_addr, len = datagram.len(), "UDP recv: DISCO packet");
Member

Do these show up in perf?

But yeah - perhaps it's fair to remove these.

Contributor Author


they do 😭

Contributor


they're removed in the QNT branch too :)

But really, you should be benching with the tracing static verbosity levels off, no? i.e. use max_level_off and/or release_max_level_off: https://docs.rs/tracing/latest/tracing/level_filters/index.html#compile-time-filters
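For reference, the compile-time filters mentioned above are Cargo features on the `tracing` crate (see the linked docs). A hedged sketch of how they would be enabled:

```toml
# Cargo.toml -- statically compile out tracing macros.
# `release_max_level_off` removes trace!/debug!/etc. in release builds only;
# `max_level_off` would remove them in all builds.
[dependencies]
tracing = { version = "0.1", features = ["release_max_level_off"] }
```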

Contributor Author


cool, I didn't know about this. But in practice this will show up for folks.

@dignifiedquire dignifiedquire force-pushed the feat-multipath-perf-dig branch from 10f7e56 to 877cdcf Compare November 11, 2025 13:48
@dignifiedquire dignifiedquire changed the title [WIP] perf: various improvements perf: various improvements Nov 13, 2025
@dignifiedquire dignifiedquire marked this pull request as ready for review November 13, 2025 09:19
dignifiedquire and others added 3 commits November 13, 2025 10:20
Bumps the `n0-watcher` dependency to the released version.

Depends on `net-tools` also bumping the n0-watcher version:
- [x] n0-computer/net-tools#64

- [x] Self-review.
@dignifiedquire dignifiedquire force-pushed the feat-multipath-perf-dig branch from 2b4da13 to 722ae84 Compare November 13, 2025 09:24

poll_recv_counter: AtomicUsize,
poll_recv_counter: usize,
source_addrs: tinyvec::TinyVec<[Addr; 4]>, // cache for source addrs
Member


4 might be a little conservative, depending on whether your system has GSO enabled. E.g. I consistently get 32 as the value for metas.len().

Member


Actually, that's a pub const in quinn-udp: BATCH_SIZE. Perhaps this should just refer to that constant?

Contributor Author


I don't want to fill the stack that much; we already fill it quite a bit. So I would only want to increase this if we have benchmarks showing it's clearly worth it.

Member


32 * Addr = 1280 bytes. The Transport struct is put on the stack in the async fn Handle::new. Those cases can cause ballooning stack sizes when the struct has to be held across an await point, which currently isn't the case.
In fact, it's moved into MagicTransport::new, which immediately gets `Box::new`'d.

I don't think that 1.3kB on the stack is too bad.

But also, given it's moved into a Box, doesn't that mean it'll live on the heap anyways?
Is there a performance difference to using a Vec?

Contributor Author


Is there a performance difference to using a Vec?

yeah, at least in my tests

Contributor


I think it would be good to leave a comment why this was put here. Otherwise I can imagine it being removed easily in a future refactor.

Contributor Author


added a comment and switched to an array for now

@matheus23 (Member) left a comment


LGTM apart from that one constant



let mut source_addrs = vec![Addr::default(); metas.len()];
match self.inner_poll_recv(cx, bufs, metas, &mut source_addrs)? {
self.source_addrs.resize_with(metas.len(), Addr::default);
Contributor


Would making this an array also have been an option? You know the exact size... Though if it's too big you don't want it to be on the stack?

Contributor Author


the exact size is runtime dependent, so I cannot make it an array?

Member


Well, you know the maximum size will always be quinn_udp::BATCH_SIZE.

Member


I don't think that matters though because Addr isn't/cannot be Copy

Comment on lines +555 to +556
// zip is slow :(
for i in 0..metas.len() {
Contributor


I wonder if it is because of the double zip. The QNT branch only has a single zip here I think.

Member


It does seem like zip generally optimizes badly: rust-lang/rust#143966 (comment)

@github-actions (bot)

Netsim report & logs for this PR have been generated and are available at: LOGS
This report will remain available for 3 days.

Last updated for commit: c609be3

@dignifiedquire dignifiedquire merged commit 8d819f0 into feat-multipath Nov 13, 2025
17 of 28 checks passed
@github-project-automation github-project-automation bot moved this from 🏗 In progress to ✅ Done in iroh Nov 13, 2025
@dignifiedquire dignifiedquire deleted the feat-multipath-perf-dig branch November 13, 2025 18:15

Labels

None yet

Projects

Status: ✅ Done


4 participants