RFC(ws): On a Gateway to Hel...Discord #8083

Closed · vladfrangu opened this issue Jun 12, 2022 · 5 comments · Fixed by #8260

@vladfrangu (Member) commented Jun 12, 2022

Preface: this RFC (Request For Comments) is meant to gather feedback and opinions on the feature set and some implementation details for @discordjs/ws! While a lot of what's mentioned in this RFC is opinionated on my side (it reflects what I'd like to see implemented in the module), not everything may make it into the initial version, and some things may be added later as extras.

This RFC is split into two parts: the WebSocketManager (called manager from this point forward) and the WebSocketShard (called shard from this point forward). One thing to keep in mind is that, in the context of @discordjs/ws and this RFC, a shard represents more than just the connection to Discord that has a shard id and count.

Before anyone begins to work on this, I'd kindly ask them not to take much, if any, inspiration from the current websocket management implementation in discord.js, beyond the few things worth replicating (like decompression). The current implementation has many pitfalls and plenty of chances to end up in a deadlock. Thanks.

This module also does not need to use EventEmitters for emitting information. If desired, it could use just a callback provided in the manager options. If that route is chosen, the manager should log a warning to the console when the callback is missing.
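
For illustration, here's a minimal sketch of what the callback approach could look like; the handlePayload option name and its signature are hypothetical, not part of this RFC:

import type { GatewayDispatchPayload } from 'discord-api-types/v10';

const manager = new WebSocketManager({
	// Hypothetical option: invoked for every payload the manager would
	// otherwise emit through an EventEmitter
	handlePayload(shardId: number, payload: GatewayDispatchPayload) {
		// Forward to your own handler, queue, or message broker here
	},
});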


WebSocketManager

This is the entry point for using this module. Its job is to handle the creation of shards and to be the central place through which shards can intercommunicate when needed (useful for fetching session ids, ensuring the 5 second delay between identifies is satisfied while accounting for max_concurrency, and so on). It is also the place that emits all payloads the shards receive (at the time of writing this refers only to DISPATCH payloads, with other miscellaneous payloads being optionally emitted in a diagnostics channel (to be decided)).

The manager should support storing and retrieving session information via options to allow for restarts of the process without losing the session.

type Awaitable<T> = T | PromiseLike<T>;

interface SessionInfo {
	sessionId: string;
	sequence: number;
	shardId: number;
	shardCount: number;
}

// In-memory stand-in for Redis or a similar store
const sessionStore = new Map<number, SessionInfo>();

const manager = new WebSocketManager({
	retrieveSessionInfo(shardId: number): Awaitable<SessionInfo | null> {
		// Fetch this info from redis or similar;
		// return null if no information is found
		return sessionStore.get(shardId) ?? null;
	},
	updateSessionInfo(sessionInfo: SessionInfo): Awaitable<void> {
		// Store the info somewhere persistent
		sessionStore.set(sessionInfo.shardId, sessionInfo);
	},
});

The manager should support changing the gateway url, version and compression, just like @discordjs/rest does. For max_concurrency, users should either provide it manually, or the manager should fetch it when .start() is called (or whatever the method that starts creating shards ends up being called).
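
As a sketch, that configuration could look something like this; all option names here are illustrative and subject to change:

const manager = new WebSocketManager({
	token: process.env.DISCORD_TOKEN!,
	gatewayUrl: 'wss://gateway.discord.gg',
	version: '10',
	compression: 'zlib-stream',
	// Either provide max_concurrency manually...
	maxConcurrency: 1,
});

// ...or omit it and let the manager fetch it from /gateway/bot on start
await manager.start();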

The manager should have a robust queueing system to ensure shards that want to connect can do so while respecting the max_concurrency as well as the 5 second identify limit between shards.
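
A minimal sketch of such a queue, assuming Discord's documented rate limit key of shardId % maxConcurrency; a real implementation would live inside the manager and also handle retries and aborts:

const IDENTIFY_DELAY = 5_000;
// One promise chain per rate limit key; identifies on the same key run
// sequentially, at least 5 seconds apart
const identifyBuckets = new Map<number, Promise<void>>();

function queueIdentify(shardId: number, maxConcurrency: number, identify: () => Promise<void>) {
	const key = shardId % maxConcurrency;
	const previous = identifyBuckets.get(key) ?? Promise.resolve();
	const next = previous.then(async () => {
		await identify();
		// Enforce the 5 second gap before the next identify on this key
		await new Promise((resolve) => setTimeout(resolve, IDENTIFY_DELAY));
	});
	identifyBuckets.set(key, next);
	return next;
}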

The manager should allow you to specify ids in the following formats:

  • an array of numbers (where each element will be spawned)
  • ranges you want to spawn
  • a number (maybe we shouldn’t allow this but always require an array or a range object?)

No matter the input, the manager should ensure the biggest id provided is smaller than the total shard count before attempting to create shards.

const manager = new WebSocketManager({
	shardCount: 100,
	// Shards can be a number (for one shard), an array of numbers (spawn all
	// shards with those ids), or a range object as shown below.
	// Internally the manager should parse this in a usable form that doesn't
	// require re-parsing.
	// Given the example below it should spawn all shards from id 0 to id 49
	// inclusive (in total this manager would manage 50 shards)
	shards: {
		start: 0,
		end: 49
	}
});
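
A sketch of the normalization plus the bounds check described above (ShardRange is an illustrative name):

interface ShardRange {
	start: number;
	end: number;
}

function normalizeShards(shards: number | number[] | ShardRange, shardCount: number): number[] {
	const ids =
		typeof shards === 'number'
			? [shards]
			: Array.isArray(shards)
				? shards
				: Array.from({ length: shards.end - shards.start + 1 }, (_, index) => shards.start + index);

	// The biggest id provided must be smaller than the total shard count
	if (Math.max(...ids) >= shardCount) {
		throw new RangeError('Shard ids must be smaller than the total shard count');
	}

	return ids;
}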

The manager should allow the users to decide how they want shards to be spawned: all in the current process, all in a worker_thread, each in its own worker_thread, or N per worker_thread.

Note! While the above is not a requirement for v1 of this module, it should be designed with it in mind so that support can be added easily later.
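
One possible shape for such an option, purely as an illustration (none of these names are final):

// 'all' = every shard in one worker, 1 = one worker per shard,
// N = N shards per worker; 'process' keeps everything in-process
type ShardSpawnStrategy =
	| { type: 'process' }
	| { type: 'worker'; shardsPerWorker: number | 'all' };

const strategy: ShardSpawnStrategy = { type: 'worker', shardsPerWorker: 10 };

const manager = new WebSocketManager({
	shardCount: 100,
	shards: { start: 0, end: 49 },
	strategy,
});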

Inspirations from other modules we can consider for future versions

  • Possibly consider implementing an automatic tracker for shard count vs guild count, to allow automatic resharding based on guild count + user configuration
    • Is this even something we want to do or should we just provide users with a function that will warn them once shards are approaching the limit, so they can take action?
    • This could yield duplicated events in the worst case. Is that something that we consider a dealbreaker for this, if we add this?

WebSocketShard

This represents a connection to Discord for a given shard id and count. It should be able to self-manage as much as possible; it should not need to interact with the manager for anything except identifies, updating session info, and broadcasting the DISPATCH payloads it receives.

On first connection, it should fetch the gateway url, version, compression method and session info for its shard configuration. If it receives session information from the manager, it should attempt to reconnect and resume. If that fails, it should enqueue an identify. It should follow the flow mentioned in Discord’s documentation.
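
Roughly, that flow could look like the sketch below; every method name here is illustrative, not a proposed API:

async function connect(shard: WebSocketShard) {
	// Gateway url, version and compression come from the manager's config
	const { url, version, compression } = await shard.manager.fetchGatewayInformation();
	const session = await shard.manager.retrieveSessionInfo(shard.id);

	await shard.open(url, version, compression);

	if (session) {
		// We had a previous session: try to RESUME it first
		const resumed = await shard.resume(session);
		if (resumed) return;
	}

	// No stored session, or the resume failed: go through the identify queue
	await shard.manager.queueIdentify(shard.id);
}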

The shard should support (a decompression sketch for the zlib-sync path follows this list):

  • No compression, JSON encoding
  • No compression, ETF encoding (?)
  • zlib-sync compression, JSON encoding
  • zlib-sync compression, ETF encoding (even bigger ?)
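
For reference, a sketch of the zlib-sync path with JSON encoding, roughly mirroring how discord.js handles it today (the part worth replicating per the preface):

import zlib from 'zlib-sync';

// One inflate context per connection; the gateway shares a single zlib
// stream across the whole session when using zlib-stream transport
const inflate = new zlib.Inflate({ chunkSize: 65_535 });

function decompress(data: Buffer): string | null {
	// Complete messages end with the Z_SYNC_FLUSH suffix 00 00 ff ff
	const len = data.length;
	const flush =
		len >= 4 &&
		data[len - 4] === 0x00 &&
		data[len - 3] === 0x00 &&
		data[len - 2] === 0xff &&
		data[len - 1] === 0xff;

	inflate.push(data, flush ? zlib.Z_SYNC_FLUSH : zlib.Z_NO_FLUSH);
	if (!flush) return null; // partial frame; wait for the rest

	if (inflate.err) throw new Error(`zlib error ${inflate.err}: ${inflate.msg ?? ''}`);
	return inflate.result!.toString(); // JSON text, ready for JSON.parse
}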

The shard should tell the manager to update its session information for every payload it receives (to ensure the sequence stays up to date) or, if preferable, on a timer. This should be configurable by the user at the manager level, but no matter what, it should always update its information on READY, RESUMED and INVALID_SESSION.
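
A sketch of the per-payload variant, reusing SessionInfo from the earlier example; the shard fields and manager method here are illustrative:

function onGatewayPayload(shard: WebSocketShard, payload: { op: number; s: number | null; t?: string }) {
	// Payloads may carry a sequence number `s`; keeping the stored info
	// current lets a restarted process resume from the right point
	if (payload.s !== null && shard.session) {
		shard.session.sequence = payload.s;
		// Per-payload strategy: persist immediately. A timer-based strategy
		// would batch these, but READY, RESUMED and INVALID_SESSION must
		// always be persisted right away.
		void shard.manager.updateSessionInfo(shard.session);
	}
}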

The shard should dispatch the DISPATCH payloads it receives to the manager, and the manager should emit them alongside the id of the shard they came from. How users consume them from that point forward is their choice (redis, NATS, etc.).

The shard may optionally dispatch all non-DISPATCH payloads it receives if the user enables a debug option. This is to be decided.

If a heartbeat ack hasn't been received since the last heartbeat was sent, the connection should be assumed invalid (a zombie) and closed immediately; do not wait for a close event on the actual websocket connection. The shard should then attempt to resume the session and, if that fails, enqueue an identify.
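
A sketch of that check, run on every heartbeat interval; field and method names are illustrative:

let lastHeartbeatAcked = true;

function onHeartbeatInterval(shard: WebSocketShard) {
	if (!lastHeartbeatAcked) {
		// No HEARTBEAT_ACK since our last heartbeat: assume a zombie
		// connection and close immediately, without waiting for a close
		// event from the underlying socket. Then try to resume.
		shard.destroy({ reason: 'zombie connection', attemptResume: true });
		return;
	}

	lastHeartbeatAcked = false;
	shard.sendHeartbeat();
}

// ...and set lastHeartbeatAcked = true when HEARTBEAT_ACK (opcode 11) arrives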

EDIT: The shard should prevent payloads from being sent until the shard is READY. The only payloads that can bypass this check are HEARTBEAT, IDENTIFY, RESUME, or anything that's strictly related to the auth flow of shard connections.
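
A sketch of that gate, using the opcodes from discord-api-types; how the queue is flushed on READY is up to the implementation:

import { GatewayOpcodes, type GatewaySendPayload } from 'discord-api-types/v10';

// Opcodes allowed through before the shard is READY (auth flow only)
const ALLOWED_BEFORE_READY = new Set<number>([
	GatewayOpcodes.Heartbeat,
	GatewayOpcodes.Identify,
	GatewayOpcodes.Resume,
]);

const pending: GatewaySendPayload[] = [];

function send(payload: GatewaySendPayload, isReady: boolean, write: (payload: GatewaySendPayload) => void) {
	if (!isReady && !ALLOWED_BEFORE_READY.has(payload.op)) {
		// Buffer it; flush the queue once the shard hits READY
		pending.push(payload);
		return;
	}
	write(payload);
}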


Anything else missing

If you think this RFC is missing any critical/crucial bits of information, let me know. I will add it to the main issue body with a bolded EDIT tag to ensure everyone sees it (if they don't want to check the edit history).


Changelog

  • For the strategy for spawning shards, support for N shards per worker thread was added after internal discussions

  • Added shard payload send queue reference to prevent issues where someone sends payloads on an interval and the connection is not ready yet

@Holo-Buckshot

Possibly consider implementing an automatic tracker for shard count vs guild count, to allow automatic resharding based on guild count + user configuration
Is this even something we want to do or should we just provide users with a function that will warn them once shards are approaching the limit, so they can take action?
This could yield duplicated events in the worst case. Is that something that we consider a dealbreaker for this, if we add this?

This sounds like an all-around bad idea, given that a bot could lose or gain several servers at any given time. As for providing a function to notify them, it is my personal opinion that developers themselves know best when they need to add/remove shards. Having the websocket manager do this automatically could lead to awkward bot lag while the shards restart themselves or, in the case of bots that run a great many shards on a single server, cause them to be down for a significant period while the websockets reconnect. You could always make it an option, like 'autoreconfigure' or something, but it should definitely default to false.

@ckohen (Member) commented Jul 12, 2022

There would be significant over- and undershoot before automatically changing shard counts. Kinda like an air conditioner.

@Deivu (Contributor) commented Jul 13, 2022

Worker threads per shard sounds like a good idea: parsing the data happens on the worker, which then just relays it back to the process where it gets used. This should improve heartbeat intervals and avoid delays on them, since we're ensured the thread will only handle websocket shards. Though the idea could be improved: instead of spawning one worker per shard, spawn some amount of shards (user-provided or automatic) per worker, so we reduce the number of threads we spawn.

@didinele (Member) commented

The manager should allow the users to decide how they want shards to be spawned: all in the current process, all in a worker_thread, each in its own worker_thread, or N per worker_thread.

It's already in the RFC.

@didinele (Member) commented

Todo for the future: we should find a way to support waiting for our next identify cross-manager (e.g. via some storage mechanism, like sessions).

@vladfrangu vladfrangu unpinned this issue Aug 7, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 23, 2023