Option for a "fallback" upstream server #125

Open
Shigbeard opened this issue Mar 9, 2018 · 2 comments

Comments

@Shigbeard

As we all know, Starbound is a very... ahem... special piece of software, and is prone to failures (be it due to Lua users who know how to cause the server to have a heart attack, or servers that are just a little overloaded).

There are various tools out there to manage downtime in the event of server failure. I'd like to see StarryPy3k support these tools by providing an "upstream_fallback" IP and port option. Essentially what this would do is (possibly somewhere in the main server loop) attempt to establish a TCP "handshake" with the upstream server every time a player attempts to connect. If the handshake can be established, we send the player to the upstream server. Otherwise, we assume the upstream server is dead and send them to the fallback server. Some communities may use this to send players to a backup server, others may simply use it to redirect to packages such as pseudoStarbound.
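A minimal sketch of the connect-time check described above, assuming an asyncio-based proxy. The names `can_connect` and `choose_upstream`, the `(host, port)` tuples, and the 2-second timeout are all hypothetical and not part of StarryPy3k's actual API or configuration.

```python
import asyncio


async def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) completes within `timeout` seconds."""
    try:
        _reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout
        )
    except (OSError, asyncio.TimeoutError):
        # Refused, unreachable, or no answer in time: treat the upstream as down.
        return False
    # The probe connection is only a health check, so close it straight away.
    writer.close()
    try:
        await writer.wait_closed()
    except OSError:
        pass
    return True


async def choose_upstream(primary, fallback, timeout=2.0):
    """Pick the (host, port) pair a newly connecting player should be proxied to."""
    if await can_connect(*primary, timeout=timeout):
        return primary
    return fallback
```

Something like `await choose_upstream(("127.0.0.1", 21025), ("127.0.0.1", 21026))` would then run once per incoming player connection. The fallback address could be a backup server or a dummy service such as pseudoStarbound.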

@GermaniumSystem
Contributor

I quite like this idea (go figure) but there are a few issues that should be considered.

Most notable are timeouts. Detecting a down service and failing over mid-connection could work if the upstream service refuses the connection, but may fail if the upstream service simply doesn't respond (as Starbound tends to do). This may not be as much of an issue, as Starbound clients appear to have an astronomically high timeout, but we'll have to verify this. Additionally, some servers (such as mine) have a significantly increased timeout to address other Starbound oddities.
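To make the refused-versus-unresponsive distinction concrete, here is a rough sketch of a probe that reports which of the two happened. `ProbeResult`, `probe`, the `opener` parameter, and the 5-second default are hypothetical names and values chosen for illustration. Note that a completed TCP handshake only shows that the operating system is still accepting connections on the port, so a hung Starbound process may still look "up" to this check.

```python
import asyncio
from enum import Enum


class ProbeResult(Enum):
    UP = "up"
    REFUSED = "refused"
    UNRESPONSIVE = "unresponsive"


async def probe(host, port, timeout=5.0, opener=asyncio.open_connection):
    """Classify the upstream as up, actively refusing, or not answering.

    `opener` is injectable so the classification can be exercised without a network.
    Pass an IP literal as `host`: with a hostname that resolves to several addresses,
    asyncio may raise a plain OSError instead of ConnectionRefusedError.
    """
    try:
        _reader, writer = await asyncio.wait_for(opener(host, port), timeout)
    except ConnectionRefusedError:
        # Nothing is listening on the port: fail over immediately.
        return ProbeResult.REFUSED
    except (asyncio.TimeoutError, OSError):
        # Timed out or otherwise unreachable: the slow failure mode.
        return ProbeResult.UNRESPONSIVE
    writer.close()
    return ProbeResult.UP
```

The refused case returns almost instantly, while the unresponsive case costs the full `timeout`, so that value would need to stay well below whatever the client's own timeout turns out to be.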

Next is the issue of intermittent dropped connections. Sometimes Starbound just stops accepting connections for a minute or two before resuming normal operation. As such, dealing with a flapping service should be considered.
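One common way to handle a flapping service is hysteresis: only change the "healthy" verdict after several consecutive probe results disagree with it. The sketch below is an illustration of that idea, not an existing StarryPy3k feature. The class name `UpstreamHealth` and both thresholds are hypothetical.

```python
class UpstreamHealth:
    """Track upstream health, flipping state only after a streak of contrary probes."""

    def __init__(self, fail_threshold=3, recover_threshold=2):
        self.fail_threshold = fail_threshold        # failed probes needed to mark it down
        self.recover_threshold = recover_threshold  # good probes needed to mark it up again
        self.healthy = True
        self._streak = 0  # consecutive probes that contradict the current state

    def record(self, ok):
        """Record one probe result and return the (possibly updated) health state."""
        if ok == self.healthy:
            # The probe agrees with the current state, so any contrary streak is broken.
            self._streak = 0
        else:
            self._streak += 1
            threshold = self.fail_threshold if self.healthy else self.recover_threshold
            if self._streak >= threshold:
                self.healthy = not self.healthy
                self._streak = 0
        return self.healthy
```

With the defaults, a single dropped connection does not trigger a failover, and a single lucky connection during an outage does not send players back to a server that is still struggling.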

Finally, would using an upstream service be best? Doing so makes StarryPy3k's fairly involved installation process even more involved, and dummy-services like pseudoStarbound are simple enough to be rolled into the plugin. However, doing so would limit the possible implementations of this feature and would result in another feature that must be fixed with each update.

@Shigbeard
Author

Well, as it stands I have difficulties routing traffic to pseudoStarbound in the event of server downtime. This is largely due to the fact that Starbound accepts and transmits connection information via TCP, but streams game data such as tiles, background, actor positions etc. via UDP. (Which makes sense, right? Those things are non-critical, and if the packets were somehow dropped or malformed, it's a simple case of the client requesting them again, and even if they weren't dropped or malformed the client hardly needs to verify they got it.)

But anyway, let's go over your concerns point by point. You stressed that:

  • Starbound sometimes ignores/declines connections for a couple of minutes for no reason.
    • This can be worked with. Ultimately, if the server is ignoring requests for X amount of time, it's safe to presume the server is not working at that moment, and therefore it would be safe to send the connection to a service such as pseudoStarbound (or outright drop the connection with a suitable error message).
    • You could theoretically "retry" the server every couple of seconds when a player connects, not too dissimilar to Source Engine's connections.
  • There's a difference between a refused connection and an unresponsive connection
    • Again, we have plenty of time before the client gives up on the connection (and we could theoretically feed the client with data crafted to keep them waiting) during which time we can easily tell the difference between a refused connection and an unresponsive one.
    • In the event it is unresponsive, we have plenty of time to determine this before the client gives up on the connection. We could assume the server has crashed if it doesn't respond within a certain timeframe, which is a reasonable assumption.
  • Is this a suitable feature considering the lack of support from Chucklefish Studios and how involved setup and futureproofing is?
    • This is truly the most reasonable argument against this feature. Ultimately it is on Chucklefish Studios to fix a large number of issues present in Starbound's server build. This suggestion, for one, is in direct response to the instability of the server instance. Should that be resolved, there'd be far less of a requirement to monitor and react to Starbound downtime.
    • That and naughty Lua users...
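The "retry every couple of seconds, then give up" idea from the list above could be sketched roughly as follows. `wait_for_upstream`, `check`, `retry_interval`, and `give_up_after` are hypothetical names, and the defaults are placeholders that would have to sit below the client's real timeout, which this thread has not yet verified.

```python
import asyncio


async def wait_for_upstream(check, retry_interval=2.0, give_up_after=30.0):
    """Call the async `check()` repeatedly until it succeeds or the deadline passes.

    Returns True as soon as `check()` reports the upstream is reachable, or
    False once another retry would run past `give_up_after` seconds.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + give_up_after
    while True:
        if await check():
            return True
        if loop.time() + retry_interval > deadline:
            # No time left for another attempt: the caller should fail over.
            return False
        await asyncio.sleep(retry_interval)
```

The caller holds the player's connection open while this runs. On `False` it would hand the player to the fallback service or drop the connection with a suitable error message.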

Not gonna lie, at this point I've low-key lost my train of thought. What I think I've been trying to say is that the various ways Starbound can fail to accept connections can be compensated for rather effortlessly, but I cannot justify making it harder to future-proof or set up StarryPy3k.
