
Bug: Messages duplicates (channel.send) under 100% server CPU load #6692

Closed
driver-by opened this issue Sep 25, 2021 · 5 comments

@driver-by

Issue description

I have an issue with message duplication under high server CPU load. It doesn't happen every time and is hard to reproduce on my servers (I reproduced it once), but users of my bot report messages being duplicated 2-4 times.
I checked plenty of things in my bot and suspect a discord.js issue (or a good thing going wrong), because I log the related channel.send usage, and in my logs I get the message only once (I specifically checked the cases where it was duplicated).
So discord.js somehow sends multiple messages from a single channel.send, and it happens only when CPU is close to 100%.

I was on v12 and migrated to v13, the same story.
Current code:
channel.send({content: msg, embeds: embed ? [embed] : null})
in v12 it was
channel.send(msg, embed)

Additional info: I use retryLimit: 5 in the Client constructor. However, from my understanding, it applies to cases where the bot gets 5xx errors and retries the request, so it shouldn't be related to the issue.
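
For reference, a minimal sketch of a client constructed with this option (placeholder intents and token, not the reporter's actual code):

const { Client, Intents } = require('discord.js');

// retryLimit controls how many times a failed REST request is retried.
const client = new Client({
  intents: [Intents.FLAGS.GUILDS, Intents.FLAGS.GUILD_MESSAGES],
  retryLimit: 5,
});

client.login(process.env.DISCORD_TOKEN);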

Codesample

No response

discord.js version

13.1.0

Node.js version

16.9.1

Operating system

Ubuntu 20.04.2 LTS

Priority this issue should have

Medium (should be fixed soon)

Which partials do you have configured?

No Partials

Which gateway intents are you subscribing to?

GUILD_MESSAGES

I have tested this issue on a development release

No response

@JMTK
Contributor

JMTK commented Sep 25, 2021

What I've found usually happens is that the socket will disconnect when it can't process anything (since Node is single-threaded), and if you have retries on, it probably re-sends the message after it reconnects. You might be able to replicate it easily by putting a breakpoint in a debugger on your code, stepping over the send call, waiting for 30 seconds, and seeing if it sends multiple messages.
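
A hypothetical sketch of that reproduction idea (not code from this thread): block the event loop for longer than the default restRequestTimeout shortly after sending, mimicking a breakpoint or 100% CPU. Whether a duplicate actually shows up depends on timing, which is why it is hard to reproduce.

const { Client, Intents } = require('discord.js');

const client = new Client({
  intents: [Intents.FLAGS.GUILDS, Intents.FLAGS.GUILD_MESSAGES],
  retryLimit: 5, // same option the reporter uses
});

client.on('messageCreate', (message) => {
  if (message.content !== '!dup-test') return;

  message.channel.send('duplicate-send test').catch(console.error);

  // Give the REST request a moment to be dispatched, then starve the event
  // loop for longer than restRequestTimeout (15 seconds by default).
  setTimeout(() => {
    const busyUntil = Date.now() + 20_000;
    while (Date.now() < busyUntil) { /* busy-wait to simulate 100% CPU */ }
  }, 250);
});

client.login(process.env.DISCORD_TOKEN);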

@driver-by
Author

@JMTK thanks for your input. I can't reproduce it with the debugger breakpoint, but I'm trying to run my bot without the retryLimit. Will update if it helps.
@JMTK from your understanding of how it works: if I remove the retryLimit, I'll prevent duplicates but might miss messages (because of my 100% CPU load), right?

@driver-by
Author

I have to add that removing retryLimit: 5 didn't help.

@kyranet
Member

kyranet commented Sep 28, 2021

I'd label this as an edge case. It's never optimal to have the CPU at 100%, especially since it can lead to all kinds of unexpected behaviour: timeouts, high latency, etc.

We use a timer to automatically abort requests after a [configurable] amount of time, as seen here:

const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), this.client.options.restRequestTimeout).unref();
return fetch(url, {
  method: this.method,
  headers,
  agent,
  body,
  signal: controller.signal,
}).finally(() => clearTimeout(timeout));

When Node.js's event loop is very busy, the timers can be inaccurate and fire either very early or very late. Besides that, the serialization and processing of the payload to be sent by fetch() may take longer, as may the deserialization of the data received from Discord.
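
For illustration, a standalone sketch (plain Node.js, not discord.js code) of how a busy event loop delays timers like the abort timer above:

const start = Date.now();
setTimeout(() => {
  console.log(`timer fired after ${Date.now() - start}ms (scheduled for 1000ms)`);
}, 1000);

// Simulate 100% CPU for five seconds.
const busyUntil = Date.now() + 5000;
while (Date.now() < busyUntil) { /* spin */ }
// The timer only fires once the loop is free again, at roughly 5000ms instead of 1000ms.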

When the timer reaches its end, the request is automatically aborted, and if retryLimit is set to a large enough value, the request is considered failed (as if the packet had been lost on the Internet, which happens quite frequently) and is sent again.
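
Schematically, that abort-plus-retry path looks something like this (a simplified sketch, not discord.js internals):

// Generic sketch of how a timed-out request that already reached the server
// can be sent twice once it is retried.
async function sendWithRetry(doRequest, retryLimit, timeoutMs) {
  for (let attempt = 0; attempt <= retryLimit; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs).unref();
    try {
      // The server may already have processed the request even if this
      // promise rejects with an abort error because the timer fired.
      return await doRequest(controller.signal);
    } catch (error) {
      if (attempt === retryLimit) throw error;
      // Retrying here creates the message a second time on the server.
    } finally {
      clearTimeout(timer);
    }
  }
}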

That processing sadly happens independently of the fetch() call, and there is no way for us to avoid it; this might be the issue you're having. You can also hit the same timer by sending very large payloads (8MB+ files over a slow or congested network).

To solve the 100% CPU usage, I recommend:

  • Profiling your Node app. There are several profilers available out there; this will help you detect the code that takes the longest to run, so you can optimise it later.
  • Threading the slowest operations (see the sketch after this list). Modern OSs are able to load-balance, so even if the machine is at 100% CPU, as long as your Node app itself has a very low CPU footprint, it'll be able to operate normally-ish.
  • Scaling vertically - getting a more powerful server (or device). Although this can be costly, it's an easy solution that can be considered regardless.
  • Scaling horizontally - getting another server (or device) and micro-servicing the heaviest parts of your Node app. This is the most complicated approach, but overall very future-proof.
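
A minimal sketch of the threading suggestion above (hypothetical file names, not code from this thread), using Node's built-in worker_threads so the heavy job runs off the main thread while discord.js keeps handling gateway and REST traffic:

// heavy-worker.js - runs the CPU-heavy job off the main thread
const { parentPort, workerData } = require('worker_threads');

function heavyJob(n) {
  // Placeholder for the bot's actual heavy computation.
  let acc = 0;
  for (let i = 0; i < n; i++) acc += Math.sqrt(i);
  return acc;
}

parentPort.postMessage(heavyJob(workerData));

// main.js - the bot offloads the job and stays responsive
const { Worker } = require('worker_threads');

function runHeavyJob(input) {
  return new Promise((resolve, reject) => {
    const worker = new Worker('./heavy-worker.js', { workerData: input });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

// e.g. inside a command handler:
// const result = await runHeavyJob(1e8);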

I hope you find this advice useful in solving your issue!

@driver-by
Author

driver-by commented Sep 28, 2021

@kyranet thx for the detailed description 👍 It helps me understand what's happening.
On my side, I'm doing a rework to move the heavy job of my app into a microservice, and I want to run it on a separate server (or servers later) to leave the bot itself in peace to interact with Discord smoothly. When it's ready, I can at least try running it in a separate thread on the same server. I hadn't thought about your point 2, thx.

In general, I just wanted to understand whether I can prevent it from happening while the rework is in progress. However, it seems that it will be unreliable in any case, so I'll go with the temporary solution of running my heavy job less often.

It's not really a bug, so the issue could be closed.

@kyranet kyranet closed this as completed Sep 28, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 26, 2023