perf: various improvements #3624
Conversation
Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3624/docs/iroh/ (last updated: 2025-11-13T16:35:39Z)
    // relies on quinn::EndpointConfig::grease_quic_bit being set to `false`,
    // which we do in Endpoint::bind.
    if let Some((sender, sealed_box)) = disco::source_and_box(datagram) {
        trace!(src = ?source_addr, len = datagram.len(), "UDP recv: DISCO packet");
Do these show up in perf?
But yeah - perhaps it's fair to remove these.
they do 😭
they're removed in the QNT branch too :)
But really, you should be benchmarking with tracing's static verbosity levels off, no? I.e. use max_level_off and/or release_max_level_off: https://docs.rs/tracing/latest/tracing/level_filters/index.html#compile-time-filters
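For context, those compile-time filters are opt-in Cargo features on the tracing crate itself; for example, enabling release_max_level_off compiles all tracing macros down to no-ops in release builds:

```toml
# Cargo.toml — strip trace!/debug!/info!/... from release builds at compile time
[dependencies]
tracing = { version = "0.1", features = ["release_max_level_off"] }
```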
Cool, I didn't know about this. In practice these calls will still show up for folks who don't enable those features, though.
Bumps the `n0-watcher` dependency to the released version. Depends on `net-tools` also bumping the n0-watcher version:
- [x] n0-computer/net-tools#64
- [x] Self-review.
iroh/src/magicsock/transports.rs
Outdated
-    poll_recv_counter: AtomicUsize,
+    poll_recv_counter: usize,
+    source_addrs: tinyvec::TinyVec<[Addr; 4]>, // cache for source addrs
4 might be a little conservative, depending on whether your system has GSO enabled or not. E.g. I consistently get 32 as the value for metas.len().
Actually, that's a pub const in quinn-udp: BATCH_SIZE. Perhaps this should just refer to that constant?
I don't want to fill the stack that much; we already fill it quite a bit. So I would only want to increase this if we have benchmarks showing it is clearly worth it.
32 * Addr = 1280 bytes. The Transport struct is put on the stack in the async fn Handle::new. Such cases can cause ballooning stack sizes when the struct has to be held across an await point, which currently isn't the case here.
In fact, it's moved into MagicTransport::new, which immediately wraps it in Box::new.
I don't think that 1.3kB on the stack is too bad.
But also, given it's moved into a Box, doesn't that mean it'll live on the heap anyways?
Is there a performance difference to using a Vec?
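The arithmetic above can be sanity-checked with size_of. Note that Addr below is a hypothetical stand-in (assumed to be an enum over a UDP socket address and a relay variant), not iroh's real definition, so the exact byte count may differ:

```rust
use std::mem::size_of;

// Hypothetical stand-in for iroh's `Addr` (assumption: an enum over a
// UDP socket address and a relay/identifier variant; the real type differs).
#[allow(dead_code)]
enum Addr {
    Udp(std::net::SocketAddr),
    Relay(u64),
}

/// Stack bytes consumed by an inline buffer of `slots` Addr values.
fn inline_bytes(slots: usize) -> usize {
    slots * size_of::<Addr>()
}

fn main() {
    // With this layout, 32 slots land in the ~1–1.3 kB ballpark discussed.
    println!("32 slots = {} bytes", inline_bytes(32));
}
```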
> Is there a performance difference to using a Vec?
yeah, at least in my tests
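A minimal sketch of the pattern under discussion, with illustrative names rather than iroh's real types: keep one buffer on the struct and resize it in place, so steady-state poll_recv calls stop allocating once the backing capacity has reached its high-water mark:

```rust
// Illustrative stand-in for the real Addr type.
#[derive(Default, Clone, Debug)]
struct Addr(Option<std::net::SocketAddr>);

struct Transports {
    // Cache for source addrs, reused across poll_recv calls.
    source_addrs: Vec<Addr>,
}

impl Transports {
    fn prepare(&mut self, batch: usize) -> &mut [Addr] {
        // After the first call at the high-water mark, this no longer
        // allocates: resize_with reuses the existing capacity.
        self.source_addrs.resize_with(batch, Addr::default);
        &mut self.source_addrs
    }
}

fn main() {
    let mut t = Transports { source_addrs: Vec::new() };
    t.prepare(32);
    let cap = t.source_addrs.capacity();
    t.prepare(4);
    t.prepare(32);
    // Capacity is retained even after shrinking the length.
    assert!(t.source_addrs.capacity() >= cap);
}
```

Whether this beats a fresh `vec![...]` per call depends on allocator behavior and call frequency, which is why it was benchmarked.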
I think it would be good to leave a comment why this was put here. Otherwise I can imagine it being removed easily in a future refactor.
added a comment and switched to an array for now
matheus23 left a comment:
LGTM apart from that one constant
iroh/src/magicsock/transports.rs
Outdated
-    let mut source_addrs = vec![Addr::default(); metas.len()];
+    self.source_addrs.resize_with(metas.len(), Addr::default);
     match self.inner_poll_recv(cx, bufs, metas, &mut source_addrs)? {
Would making this an array also have been an option? You know the exact size... Though if it's too big you don't want it to be on the stack?
The exact size is runtime dependent, so I can't make it an array?
Well, you know the maximum size will always be quinn_udp::BATCH_SIZE.
I don't think that matters though, because Addr isn't (and cannot be) Copy.
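To illustrate the Copy constraint: the `[expr; N]` array syntax requires the element type to be Copy (or the expression to be a const), while std::array::from_fn can build the array element by element for any type. Addr here is an illustrative stand-in, not iroh's real type:

```rust
// Illustrative stand-in for the real Addr type; note it is Clone but not Copy.
#[derive(Default, Clone, Debug, PartialEq)]
struct Addr(Option<std::net::SocketAddr>);

fn main() {
    // let a = [Addr::default(); 4]; // error[E0277]: `Addr: Copy` is not satisfied
    // Workaround: construct each element individually.
    let a: [Addr; 4] = std::array::from_fn(|_| Addr::default());
    assert_eq!(a.len(), 4);
    assert_eq!(a[0], Addr::default());
}
```

Even with from_fn, a `[Addr; BATCH_SIZE]` array would still occupy stack space proportional to the batch size, which is the tradeoff discussed above.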
     // zip is slow :(
     for i in 0..metas.len() {
I wonder if it is because of the double zip. The QNT branch only has a single zip here I think.
It does seem like zip generally optimizes badly: rust-lang/rust#143966 (comment)
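For reference, the two lockstep-iteration styles being compared, with toy types in place of the real metas/addrs. Both compute the same result; which one optimizes better is workload and compiler dependent, per the linked rust-lang issue:

```rust
// Indexed loop, as used in the PR after zip benchmarked worse.
fn fill_indexed(metas: &[usize], addrs: &mut Vec<usize>) {
    addrs.clear();
    // zip is slow :( — loop over the shared length instead.
    for i in 0..metas.len() {
        addrs.push(metas[i] * 2);
    }
}

// Equivalent iterator-zip formulation.
fn fill_zip(metas: &[usize], out: &mut [usize]) {
    for (m, o) in metas.iter().zip(out.iter_mut()) {
        *o = m * 2;
    }
}

fn main() {
    let metas = [1, 2, 3];
    let mut a = Vec::new();
    fill_indexed(&metas, &mut a);
    let mut b = [0usize; 3];
    fill_zip(&metas, &mut b);
    // Both styles produce identical results.
    assert_eq!(a.as_slice(), &b);
}
```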