-
Notifications
You must be signed in to change notification settings - Fork 213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Relay send path can overwrite wakers and might report Poll::Pending
more than necessary
#3067
Comments
Poll::Pending
more than necessaryPoll::Pending
more than necessary
4 tasks
github-merge-queue bot
pushed a commit
that referenced
this issue
Jan 3, 2025
…MagicSock (#3062) ## Description This refactors how datagrams flow from the MagicSock (AsyncUdpSocket) to relay server and back. It also vastly simplifies the actors involved in communicating with a relay server. - The `RelayActor` managed all connections to relay servers. - It starts a new `ActiveRelayActor` for each relay server needed. - The `ActiveRelayActor` will exit when unused. - Unless it is for the home relay, this one never exits. - Each `ActiveRelayActor` uses a relay `Client`. - The relay `Client` is now a `Stream` and `Sink` directly connected to the `TcpStream` connected to the relay server. This eliminates several actors previously used here in the `Client` and `Conn`. - Each `ActiveRelayActor` will try and maintain a connection with the relay server. - If connections fail, exponential backoff is used for reconnections. - When `AsyncUdpSocket` needs to send datagrams: - It (now) puts them on a queue to the `RelayActor`. - The `RelayActor` ensures the correct `ActiveRelayActor` is running and forwards datagrams to it. - The `ActiveRelayActor` sends datagrams directly to the relay server. - The relay receive path is now: - Whenever `ActiveRelayActor` is connected it reads from the underlying `TcpStream`. - Received datagrams are placed on an mpsc channel that now bypasses the `RelayActor` and goes straight to the `AsyncUpdSocket` interface. Along the way many bugs are fixed. Some of them: - The relay datagrams send and receive queue now behave more correctly when they are full. So the `AsyncUdpSocket` behaves better. - Though there still is a bug with the send queue not waking up all the tasks that might be waiting to send. This needs a followup: #3067. - The `RelayActor` now avoids blocking. This means it can still react to events when the datagrams queues are full and reconnect relay servers etc as needed to unblock. - The `ActiveRelayActor` also avoids blocking. Allowing it to react to connection breakage and the need to reconnect at any time. - The `ActiveRelayActor` now correctly handles connection errors and retries with backoff. - The `ActiveRleayActor` will no longer queue unsent datagrams forever, but flush them every 400ms. - This also stops the send queue into the `RelayActor` from completely blocking. ## Breaking Changes ### iroh-relay - `Conn` is no longer public. - The `Client` is completely changed. See module docs. ## Notes & open questions - Potentially the relay `Conn` and `Client` don't need to be two separate things now? Though Client is convenient as it only implements one Sink interface, while Conn is also a Frame sink. This means on Conn you often have to use very annoying syntax when calling things like .flush() or .close() etc. - Maybe a few items from the `ActiveRelayActor` can be moved back into the relay `Client`, though that would probably require some gymnastics. The current structure of `ActiveRelayActor` is fairly reasonable and handles things correctly. Though it does have a lot of client knowledge baked in. Being able to reason about the client as a stream + sink is what enabled me to write the good `ActiveRelayActor` though, so I'm fairly happy that this code makes sense as it is. If all goes well this should: Closes #3008 Closes #2971 Closes #2951 ## Change checklist - [x] Self-review. - [x] Documentation updates following the [style guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text), if relevant. - [x] Tests if relevant. - [x] All breaking changes documented. --------- Co-authored-by: Friedel Ziegelmayer <me@dignifiedquire.com>
4 tasks
2 tasks
github-merge-queue bot
pushed a commit
that referenced
this issue
Jan 10, 2025
## Description <!-- A summary of what this pull request achieves and a rough list of changes. --> Closes #3067 Since multiple `Connection`s can have multiple `AsyncUdpSocket` `IpPollers`, it's incorrect to assume that a single `AtomicWaker` can wake up these tasks correctly. Instead, if there's only one `AtomicWaker` for all of them, only the last registered waker will be woken when there's capacity again. I'm changing this such that `poll_writable` will actually add the current task's waker to a list, if it hasn't been added yet. And successfully `recv`ing an item will wake all tasks. A second issue was that *between* the call to `self.sender.capacity()` (it being 0) and `self.waker.register`, another thread might have successfully `recv`d an item *and* called `self.waker.wake()`. This means that the first thread could've registered a waker, while the capacity is actually non-zero. Not terrible (it's very likely that it'll be woken up when the capacity jumps to 2 this time), but also not great. The fix is to re-check the capacity after registering the waker. The downside is that we keep the waker and thus potentially get a spurious wakeup, but that's fine. ## Notes & open questions <!-- Any notes, remarks or open questions you have to make about the PR. --> This... "fixes" the concerns, but I'd actually like to write a loom test for the concerns eventually. It's just... a *lot* of work, and I'd rather prioritize other things, but also improve this part of the code at the same time. ## Change checklist - [x] Self-review. - [x] Documentation updates following the [style guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text), if relevant. - [ ] Tests if relevant. - [x] All breaking changes documented.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
2001616
This likely doesn't turn into a pressing problem in practice, but could become an issue eventually/could help improve performance.
A lot of the difficulty in fixing this is working around the API bottleneck that is the
&self
reference in theAsyncUdpSocket::try_send
API.The text was updated successfully, but these errors were encountered: