Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework session keep-alive logic #1357

Closed
wants to merge 4 commits into from

Conversation

wisp3rwind
Copy link
Contributor

Tentative fix for #1340. Also add some logging to show the timing (at info level for now, so we're sure that it's available from everyone who tests it).

@wisp3rwind
Copy link
Contributor Author

Updated to implement a modified timeout handling:

/// Returns an error when we haven't received a Ping/PongAck for too long.
///
/// The expected keepalive sequence is
/// - Server: Ping
/// - wait 60s
/// - Client: Pong
/// - Server: PongAck
/// - wait 60s
/// - repeat
///
/// This means that we silently lost connection to Spotify servers if
/// - we don't receive a Ping 60s after the last PongAck, or
/// - we don't receive a PongAck immediately after our Pong.
///
/// Currently, we add a safety margin of 5s to these expected deadlines.

Unfortunately, this has turned the code quite a bit more complicated. What do you think? Is this overkill? Any suggestion on improving (simplifying) it?

@wisp3rwind wisp3rwind marked this pull request as ready for review October 1, 2024 15:31
@roderickvd
Copy link
Member

First of all, thanks a lot for this. I'm thrilled to see the recent influx of good contributions here. v0.5 is shaping up to be a solid release, getting these last issues ironed out like this!

Unfortunately, this has turned the code quite a bit more complicated. What do you think? Is this overkill? Any suggestion on improving (simplifying) it?

My take: correctness goes first. Then, making the code as simple as can be - maintaining correctness.

So principally I don't mind more complex code if that's what it takes. But, indeed, maybe a counter-suggestion that I've applied elsewhere, if it fits this use-case also:

Have two watchdogs:

watchdog_rx: Pin<Box<tokio::time::Sleep>>,
watchdog_tx: Pin<Box<tokio::time::Sleep>>,

Polling it:

self.watchdog_tx = tokio::time::sleep(PONG_TIMEOUT);
self.watchdog_rx = tokio::time::sleep(PING_TIMEOUT);

tokio::select! {
    () = &mut self.watchdog_tx, if self.ping_received => {
        self.send_pong().await
    }

    () = &mut self.watchdog_rx => {
        // connection to server timed out
    }
}

And something like:

fn handle_ping(&mut self) {
   self.ping_received = true;
   
}

fn send_pong(&mut self) {
   self.ping_received = false;
   // send the pong, then reset the timer
   self.watchdog_tx = tokio::time::sleep(PONG_TIMEOUT);   
}

fn handle_pong_ack(&mut self) {
    self.watchdog_rx = tokio::time::sleep(PING_TIMEOUT);
}

@wisp3rwind
Copy link
Contributor Author

Hmm, I don't see how this would work out in this case: With the current design, the DispatchTask and keep_alive_task futures only have access to a SessionWeak rather than any shared mutable data &mut self. Thus, the shared state (watchdog futures and the ping_received flag would require additional synchronization. In my code, I've achieved that using the tokio::sync::watch::channel, and opted to keep all Sleep futures in one of the tasks for simplicity.

It might be possible to implement your design by moving the watchdog task from the Session into the DispatchTask. I'll play around with that to see whether it works out. (An argument against that could be if there's a chance that the watchdog task should ever check keep-alive sequences of another transport. On the other hand, it might be easier to have separate watchdogs in that case.)

Am I missing something here?

@wisp3rwind
Copy link
Contributor Author

Closing in favor of #1359.

@wisp3rwind wisp3rwind closed this Oct 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants