RFC(ws): On a Gateway to Hel...Discord #8083

Closed · vladfrangu opened this issue Jun 12, 2022 · 5 comments · Fixed by #8260

@vladfrangu (Member) commented Jun 12, 2022

Preface: this RFC (Request For Comments) is meant to gather feedback and opinions on the feature set and some implementation details for @discordjs/ws! While a lot of what's mentioned in this RFC is opinionated on my side (it reflects what I'd like to see implemented in the module), not everything may make it into the initial version, and some things may be added later as extras.

This RFC is split into two parts: the WebSocketManager (called manager from this point forward) and the WebSocketShard (called shard from this point forward). One thing to keep in mind is that, in the context of @discordjs/ws and this RFC, a shard represents more than just the connection to Discord that has a shard id and count.

Before anyone begins to work on this, I'd kindly ask them not to take much, if any, inspiration from the current websocket management implementation in discord.js, beyond the few things worth replicating (like decompression). The current implementation has many pitfalls and plenty of chances to end up in a deadlock. Thanks.

This module also does not need to use EventEmitters for emitting information. If desired, it could use just a callback provided in the manager options. If that route is chosen, the manager should log a warning to the console when the callback is missing.
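
For illustration, here's a minimal sketch of what the callback approach could look like; the handlePayload option name and its signature are hypothetical, not part of this RFC:

import type { GatewayDispatchPayload } from 'discord-api-types/v10';

const manager = new WebSocketManager({
	// Hypothetical option: invoked for every payload the manager would
	// otherwise emit through an EventEmitter
	handlePayload(shardId: number, payload: GatewayDispatchPayload) {
		// Forward to your own handler, queue, or message broker here
	},
});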


WebSocketManager

This is the entry point for using this module. Its job is to handle the creation of shards and to be the central place through which shards can intercommunicate when needed (useful for fetching session ids, ensuring the 5 second delay between identifies is satisfied while accounting for max_concurrency, and so on). It is also the place that emits all payloads the shards receive (at the time of writing this refers only to DISPATCH payloads, with other miscellaneous payloads being optionally emitted in a diagnostics channel (to be decided)).

The manager should support storing and retrieving session information via options to allow for restarts of the process without losing the session.

type Awaitable<T> = T | PromiseLike<T>;

interface SessionInfo {
	sessionId: string;
	sequence: number;
	shardId: number;
	shardCount: number;
}

// In-memory stand-in for Redis or a similar store
const sessionStore = new Map<number, SessionInfo>();

const manager = new WebSocketManager({
	retrieveSessionInfo(shardId: number): Awaitable<SessionInfo | null> {
		// Fetch this info from redis or similar;
		// return null if no information is found
		return sessionStore.get(shardId) ?? null;
	},
	updateSessionInfo(sessionInfo: SessionInfo): Awaitable<void> {
		// Store the info somewhere persistent
		sessionStore.set(sessionInfo.shardId, sessionInfo);
	},
});

The manager should support changing the gateway url, version and compression, just like @discordjs/rest does. For max_concurrency, users should either provide it manually, or the manager should fetch it when .start() is called (or whatever the method that starts creating shards ends up being called).
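
As a sketch, that configuration could look something like this; all option names here are illustrative and subject to change:

const manager = new WebSocketManager({
	token: process.env.DISCORD_TOKEN!,
	gatewayUrl: 'wss://gateway.discord.gg',
	version: '10',
	compression: 'zlib-stream',
	// Either provide max_concurrency manually...
	maxConcurrency: 1,
});

// ...or omit it and let the manager fetch it from /gateway/bot on start
await manager.start();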

The manager should have a robust queueing system to ensure shards that want to connect can do so while respecting the max_concurrency as well as the 5 second identify limit between shards.
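
A minimal sketch of such a queue, assuming Discord's documented rate limit key of shardId % maxConcurrency; a real implementation would live inside the manager and also handle retries and aborts:

const IDENTIFY_DELAY = 5_000;
// One promise chain per rate limit key; identifies on the same key run
// sequentially, at least 5 seconds apart
const identifyBuckets = new Map<number, Promise<void>>();

function queueIdentify(shardId: number, maxConcurrency: number, identify: () => Promise<void>) {
	const key = shardId % maxConcurrency;
	const previous = identifyBuckets.get(key) ?? Promise.resolve();
	const next = previous.then(async () => {
		await identify();
		// Enforce the 5 second gap before the next identify on this key
		await new Promise((resolve) => setTimeout(resolve, IDENTIFY_DELAY));
	});
	identifyBuckets.set(key, next);
	return next;
}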

The manager should allow you to specify ids in the following formats:

  • an array of numbers (where each element will be spawned)
  • ranges you want to spawn
  • a number (maybe we shouldn’t allow this but always require an array or a range object?)

No matter the input, the manager should ensure the biggest id provided is smaller than the total shard count before attempting to create shards.

const manager = new WebSocketManager({
	shardCount: 100,
	// Shards can be a number (for one shard), an array of numbers (spawn all
	// shards with those ids), or a range object as shown below.
	// Internally the manager should parse this in a usable form that doesn't
	// require re-parsing.
	// Given the example below it should spawn all shards from id 0 to id 49
	// inclusive (in total this manager would manage 50 shards)
	shards: {
		start: 0,
		end: 49
	}
});
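
A sketch of the normalization plus the bounds check described above (ShardRange is an illustrative name):

interface ShardRange {
	start: number;
	end: number;
}

function normalizeShards(shards: number | number[] | ShardRange, shardCount: number): number[] {
	const ids =
		typeof shards === 'number'
			? [shards]
			: Array.isArray(shards)
				? shards
				: Array.from({ length: shards.end - shards.start + 1 }, (_, index) => shards.start + index);

	// The biggest id provided must be smaller than the total shard count
	if (Math.max(...ids) >= shardCount) {
		throw new RangeError('Shard ids must be smaller than the total shard count');
	}

	return ids;
}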

The manager should allow the users to decide how they want shards to be spawned: all in the current process, all in a worker_thread, each in its own worker_thread, or N per worker_thread.

Note! While the above is not a requirement for v1 of this module, it should be designed with it in mind so that support can be added easily later.
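
One possible shape for such an option, purely as an illustration (none of these names are final):

// 'all' = every shard in one worker, 1 = one worker per shard,
// N = N shards per worker; 'process' keeps everything in-process
type ShardSpawnStrategy =
	| { type: 'process' }
	| { type: 'worker'; shardsPerWorker: number | 'all' };

const strategy: ShardSpawnStrategy = { type: 'worker', shardsPerWorker: 10 };

const manager = new WebSocketManager({
	shardCount: 100,
	shards: { start: 0, end: 49 },
	strategy,
});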

Inspirations from other modules we can consider for future versions

  • Possibly consider implementing an automatic tracker for shard count vs guild count, to allow automatic resharding based on guild count + user configuration
    • Is this even something we want to do or should we just provide users with a function that will warn them once shards are approaching the limit, so they can take action?
    • This could yield duplicated events in the worst case. Is that something that we consider a dealbreaker for this, if we add this?

WebSocketShard

This represents a connection to Discord for a given shard id and count. It should be able to self-manage as much as possible; it should not need to interact with the manager for anything except identifies, updating session info, and broadcasting the DISPATCH payloads it receives.

On first connection, it should fetch the gateway url, version, compression method and session info for its shard configuration. If it receives session information from the manager, it should attempt to reconnect and resume. If that fails, it should enqueue an identify. It should follow the flow mentioned in Discord’s documentation.
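
Roughly, that flow could look like the sketch below; every method name here is illustrative, not a proposed API:

async function connect(shard: WebSocketShard) {
	// Gateway url, version and compression come from the manager's config
	const { url, version, compression } = await shard.manager.fetchGatewayInformation();
	const session = await shard.manager.retrieveSessionInfo(shard.id);

	await shard.open(url, version, compression);

	if (session) {
		// We had a previous session: try to RESUME it first
		const resumed = await shard.resume(session);
		if (resumed) return;
	}

	// No stored session, or the resume failed: go through the identify queue
	await shard.manager.queueIdentify(shard.id);
}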

The shard should support (a decompression sketch for the zlib-sync path follows this list):

  • No compression, JSON encoding
  • No compression, ETF encoding (?)
  • zlib-sync compression, JSON encoding
  • zlib-sync compression, ETF encoding (even bigger ?)
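
For reference, a sketch of the zlib-sync path with JSON encoding, roughly mirroring how discord.js handles it today (the part worth replicating per the preface):

import zlib from 'zlib-sync';

// One inflate context per connection; the gateway shares a single zlib
// stream across the whole session when using zlib-stream transport
const inflate = new zlib.Inflate({ chunkSize: 65_535 });

function decompress(data: Buffer): string | null {
	// Complete messages end with the Z_SYNC_FLUSH suffix 00 00 ff ff
	const len = data.length;
	const flush =
		len >= 4 &&
		data[len - 4] === 0x00 &&
		data[len - 3] === 0x00 &&
		data[len - 2] === 0xff &&
		data[len - 1] === 0xff;

	inflate.push(data, flush ? zlib.Z_SYNC_FLUSH : zlib.Z_NO_FLUSH);
	if (!flush) return null; // partial frame; wait for the rest

	if (inflate.err) throw new Error(`zlib error ${inflate.err}: ${inflate.msg ?? ''}`);
	return inflate.result!.toString(); // JSON text, ready for JSON.parse
}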

The shard should tell the manager to update its session information for every payload it receives (to ensure the sequence stays up to date) or, if preferable, on a timer. This should be configurable by the user at the manager level, but no matter what, it should always update its information on READY, RESUMED and INVALID_SESSION.
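
A sketch of the per-payload variant, reusing SessionInfo from the earlier example; the shard fields and manager method here are illustrative:

function onGatewayPayload(shard: WebSocketShard, payload: { op: number; s: number | null; t?: string }) {
	// Payloads may carry a sequence number `s`; keeping the stored info
	// current lets a restarted process resume from the right point
	if (payload.s !== null && shard.session) {
		shard.session.sequence = payload.s;
		// Per-payload strategy: persist immediately. A timer-based strategy
		// would batch these, but READY, RESUMED and INVALID_SESSION must
		// always be persisted right away.
		void shard.manager.updateSessionInfo(shard.session);
	}
}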

The shard should dispatch the DISPATCH payloads it receives to the manager, and the manager should emit them alongside the id of the shard they came from. How users consume them from that point forward is their choice (redis, NATS, etc.).

The shard may optionally dispatch all non-DISPATCH payloads it receives if the user enables a debug option. This is to be decided.

If a heartbeat ack hasn't been received since the last heartbeat was sent, the connection should be assumed invalid (a zombie) and closed immediately; do not wait for a close event on the actual websocket connection. The shard should then attempt to resume the session and, if that fails, enqueue an identify.
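
A sketch of that check, run on every heartbeat interval; field and method names are illustrative:

let lastHeartbeatAcked = true;

function onHeartbeatInterval(shard: WebSocketShard) {
	if (!lastHeartbeatAcked) {
		// No HEARTBEAT_ACK since our last heartbeat: assume a zombie
		// connection and close immediately, without waiting for a close
		// event from the underlying socket. Then try to resume.
		shard.destroy({ reason: 'zombie connection', attemptResume: true });
		return;
	}

	lastHeartbeatAcked = false;
	shard.sendHeartbeat();
}

// ...and set lastHeartbeatAcked = true when HEARTBEAT_ACK (opcode 11) arrives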

EDIT: The shard should prevent payloads from being sent until the shard is READY. The only payloads that can bypass this check are HEARTBEAT, IDENTIFY, RESUME, or anything that's strictly related to the auth flow of shard connections.
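
A sketch of that gate, using the opcodes from discord-api-types; how the queue is flushed on READY is up to the implementation:

import { GatewayOpcodes, type GatewaySendPayload } from 'discord-api-types/v10';

// Opcodes allowed through before the shard is READY (auth flow only)
const ALLOWED_BEFORE_READY = new Set<number>([
	GatewayOpcodes.Heartbeat,
	GatewayOpcodes.Identify,
	GatewayOpcodes.Resume,
]);

const pending: GatewaySendPayload[] = [];

function send(payload: GatewaySendPayload, isReady: boolean, write: (payload: GatewaySendPayload) => void) {
	if (!isReady && !ALLOWED_BEFORE_READY.has(payload.op)) {
		// Buffer it; flush the queue once the shard hits READY
		pending.push(payload);
		return;
	}
	write(payload);
}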


Anything else missing

If you think this RFC is missing any critical/crucial bits of information, let me know. I will add it to the main issue body with a bolded EDIT tag to ensure everyone sees it (if they don't want to check the edit history).


Changelog

  • For the strategy for spawning shards, support for N shards per worker thread was added after internal discussions

  • Added shard payload send queue reference to prevent issues where someone sends payloads on an interval and the connection is not ready yet

@Holo-Buckshot

Possibly consider implementing an automatic tracker for shard count vs guild count, to allow automatic resharding based on guild count + user configuration
Is this even something we want to do or should we just provide users with a function that will warn them once shards are approaching the limit, so they can take action?
This could yield duplicated events in the worst case. Is that something that we consider a dealbreaker for this, if we add this?

This sounds like an all-around bad idea, given that a bot could lose or gain several servers at any given time. As for providing a function to notify them, it is my personal opinion that developers themselves know best when they need to add/remove shards. Having the websocket manager do this automatically could lead to awkward bot lag while the shards restart themselves or, in the case of bots that run a great many shards on a single server, cause them to be down for a significant period while the websockets reconnect. You could always make it an option, like 'autoreconfigure' or something, but it should definitely default to false.

@ckohen (Member) commented Jul 12, 2022

There would be significant over- and undershoot before automatically changing shard counts. Kinda like an air conditioner.

@Deivu (Contributor) commented Jul 13, 2022

Worker threads per shard sounds like a good idea: parsing the data happens on the worker, which then just relays it back to the process where it gets used. This should improve heartbeat intervals and avoid delays on them, since we're ensured the thread will only handle websocket shards. Though the idea could be improved: instead of spawning one worker per shard, spawn some amount of shards (user-provided or automatic) per worker, so we reduce the number of threads we spawn.

@didinele (Member) commented

The manager should allow the users to decide how they want shards to be spawned: all in the current process, all in a worker_thread, each in its own worker_thread, or N per worker_thread.

It's already in the RFC.

@didinele (Member) commented

Todo for the future: we should find a way to support waiting for our next identify cross-manager (e.g. via some storage mechanism, like sessions).

@vladfrangu vladfrangu unpinned this issue Aug 7, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 23, 2023