Implement server recovery and peer routing #15

Merged 7 commits on Oct 29, 2023

@ekzhang ekzhang (Owner) commented Oct 29, 2023

This change does two things through the new StorageMesh object: it adds persistent storage, and it adds server cluster support.

  • sshx can now run in three modes
    • None: same as before. The server must stay up; whenever it goes down, all previous sessions are lost.
    • Storage: requires Redis. The server regularly snapshots session state to Redis and can tolerate a restart without losing existing session data.
    • Meshed Storage: requires Redis. Multiple servers each handle a subset of sessions, and requests are routed to the closest server. The server that "controls" each session is chosen by geographic distance to the terminal backend client. Each frontend WebSocket is proxied to the server that controls its session.
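
The three modes and the "closest server" rule can be sketched roughly as below. This is an illustration, not the StorageMesh code: `Mode`, `from_config`, `LatLon`, `haversine_km`, and `closest_server` are all hypothetical names, and the PR does not say how distance is measured (it may come from the load balancer's routing rather than an explicit computation). The sketch assumes each server and client has known coordinates and compares great-circle distances.

```rust
/// Hypothetical mirror of the three modes described above (not sshx's types).
#[derive(Debug, Clone, PartialEq)]
pub enum Mode {
    /// No persistence: a restart loses every session.
    None,
    /// Snapshots go to Redis, so a single server can restart safely.
    Storage { redis_url: String },
    /// Redis plus this server's own address, so peers can route to it.
    MeshedStorage { redis_url: String, host: String },
}

impl Mode {
    /// Assumed mapping from two optional settings to a mode.
    pub fn from_config(redis_url: Option<&str>, host: Option<&str>) -> Mode {
        match (redis_url, host) {
            (None, _) => Mode::None,
            (Some(url), None) => Mode::Storage { redis_url: url.to_string() },
            (Some(url), Some(host)) => Mode::MeshedStorage {
                redis_url: url.to_string(),
                host: host.to_string(),
            },
        }
    }

    /// Only the Redis-backed modes keep sessions across a restart.
    pub fn survives_restart(&self) -> bool {
        !matches!(self, Mode::None)
    }
}

/// Latitude and longitude in degrees.
#[derive(Debug, Clone, Copy)]
pub struct LatLon {
    pub lat: f64,
    pub lon: f64,
}

/// Great-circle distance in kilometres (haversine formula).
pub fn haversine_km(a: LatLon, b: LatLon) -> f64 {
    const EARTH_RADIUS_KM: f64 = 6371.0;
    let (lat1, lat2) = (a.lat.to_radians(), b.lat.to_radians());
    let dlat = lat2 - lat1;
    let dlon = (b.lon - a.lon).to_radians();
    let h = (dlat / 2.0).sin().powi(2)
        + lat1.cos() * lat2.cos() * (dlon / 2.0).sin().powi(2);
    2.0 * EARTH_RADIUS_KM * h.sqrt().asin()
}

/// Pick the server nearest to the backend client; `None` if there are no servers.
pub fn closest_server<'a>(
    client: LatLon,
    servers: &[(&'a str, LatLon)],
) -> Option<&'a str> {
    servers
        .iter()
        .min_by(|x, y| haversine_km(client, x.1).total_cmp(&haversine_km(client, y.1)))
        .map(|(name, _)| *name)
}
```

With servers near Frankfurt, Virginia, and Tokyo, a backend client in Paris would be assigned to the Frankfurt server under this rule.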

In Meshed Storage mode, servers coordinate through a Redis Pub/Sub topic, which lets one server take over control of a session from another. The session is then automatically terminated on the server giving up control (the lending server), which immediately relinquishes all of its connections, and web clients reconnect.
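
The handoff described above can be sketched with an in-memory stand-in for the Pub/Sub topic. Everything here is an assumption made for illustration: `Transfer`, `Topic`, `Server`, and `on_transfer` are hypothetical names, the message shape is invented, and `std::sync::mpsc` channels stand in for Redis so the example needs no external crates.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical notice published when a server takes over a session.
#[derive(Debug, Clone, PartialEq)]
pub struct Transfer {
    pub session: String,
    pub new_owner: String,
}

/// In-memory stand-in for a Pub/Sub topic: every subscriber sees every message.
pub struct Topic {
    subscribers: Vec<Sender<Transfer>>,
}

impl Topic {
    pub fn new() -> Self {
        Topic { subscribers: Vec::new() }
    }

    pub fn subscribe(&mut self) -> Receiver<Transfer> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    pub fn publish(&self, msg: Transfer) {
        for tx in &self.subscribers {
            // A subscriber that has gone away is simply skipped.
            let _ = tx.send(msg.clone());
        }
    }
}

/// One server's local view of the sessions it controls.
pub struct Server {
    pub host: String,
    /// Session name -> number of open connections (stand-in for real sockets).
    pub sessions: HashMap<String, usize>,
}

impl Server {
    pub fn new(host: &str) -> Self {
        Server { host: host.to_string(), sessions: HashMap::new() }
    }

    /// React to one transfer notice; returns how many connections were dropped.
    pub fn on_transfer(&mut self, msg: &Transfer) -> usize {
        if msg.new_owner == self.host {
            return 0; // our own announcement: keep the session
        }
        // Another server now controls the session: drop it and its connections.
        self.sessions.remove(&msg.session).unwrap_or(0)
    }

    /// Process every notice currently queued for this server.
    pub fn drain(&mut self, rx: &Receiver<Transfer>) -> usize {
        rx.try_iter().map(|msg| self.on_transfer(&msg)).sum()
    }
}
```

In this sketch the lending server only forgets the session and counts the dropped connections; reconnecting web clients would then be proxied to the new owner.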

This is everything we need to run sshx as a globally distributed cluster.

TODO

  • Make persistence immediate whenever shells are created/deleted, or next_sid/next_uid is changed, since that state is critical
  • Reduce per-shell snapshot data size to about 32 KiB
  • TLS for Redis
  • Write tests (gah, got to manage complexity)
  • Expire a session from local storage after 10 minutes without any activity, heartbeats included (no heartbeats means the backend was disconnected)
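
The last TODO item is not implemented in this PR, but one possible shape for it is sketched below. `LocalSessions`, `touch`, and `expire` are hypothetical names, and the sketch assumes every heartbeat or other activity calls `touch`. Only the 10-minute figure comes from the list above.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Idle period after which a session is dropped from local storage.
pub const IDLE_TIMEOUT: Duration = Duration::from_secs(10 * 60);

/// Hypothetical tracker of when each locally held session was last active.
pub struct LocalSessions {
    last_seen: HashMap<String, Instant>,
}

impl LocalSessions {
    pub fn new() -> Self {
        LocalSessions { last_seen: HashMap::new() }
    }

    /// Record activity (including a heartbeat) for a session.
    pub fn touch(&mut self, name: &str, now: Instant) {
        self.last_seen.insert(name.to_string(), now);
    }

    /// Remove sessions idle for at least `IDLE_TIMEOUT`; returns their names, sorted.
    pub fn expire(&mut self, now: Instant) -> Vec<String> {
        let mut expired = Vec::new();
        self.last_seen.retain(|name, seen| {
            let keep = now.duration_since(*seen) < IDLE_TIMEOUT;
            if !keep {
                expired.push(name.clone());
            }
            keep
        });
        expired.sort();
        expired
    }

    pub fn len(&self) -> usize {
        self.last_seen.len()
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside, keeps the expiry rule easy to test without waiting ten minutes.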

@ekzhang ekzhang linked an issue Oct 29, 2023 that may be closed by this pull request
@ekzhang ekzhang merged commit 53e0055 into main Oct 29, 2023
3 checks passed
@ekzhang ekzhang deleted the ekzhang/routing-recovery branch October 29, 2023 16:37
Development

Successfully merging this pull request may close these issues.

Implement server discovery, peer routing, and recovery