Bot is unable to properly connect to certain Discord voice servers #714
Comments
sounds to me like there's a problem with the Tel Aviv voice endpoint. bots and clients do not use the same API or the same endpoint, and we have no control over what HTTP response code discord sends.
…On 18 Jun 2023, 17:11, at 17:11, KontraCity ***@***.***> wrote:
**Git commit reference**
cd5b9ca
**Describe the bug**
The problem is that `on_voice_ready` callback is not called when bot
connects to certain voice channels.
Example of successful voice server connection:
1. DPP logs show this:
`DEBUG: Sending op 4 to join VC, guild xxx channel xxx`
`DEBUG: Connecting voice for guild xxx channel xxx`
`DEBUG: Connecting new voice session...`
`DEBUG: Voice websocket established; UDP endpoint: xxx.xxx.xxx.xxx:xxx
[ssrc=xxx] with 7 modes`
`DEBUG: External IP address: xxx.xxx.xxx.xxx`
2. Bot connects with following request:
`GET /?v=4 HTTP/1.1`
`Host: russia8508.discord.media`
`pragma: no-cache`
`User-Agent: DPP/0.1`
`Upgrade: WebSocket`
`Connection: Upgrade`
`Sec-WebSocket-Key: xxx`
`Sec-WebSocket-Version: 13`
3. Server responds with HTTP 101 response:
`HTTP/1.1 101 Switching Protocols`
`Connection: upgrade`
`sec-websocket-accept: xxx`
`upgrade: websocket`
`CF-Cache-Status: DYNAMIC`
`Report-To:
{"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=xxx"}],"group":"cf-nel","max_age":604800}`
`NEL: {"success_fraction":0.01,"report_to":"cf-nel","max_age":604800}`
`Strict-Transport-Security: max-age=15552000; includeSubDomains;
preload`
`X-Content-Type-Options: nosniff`
`Server: cloudflare`
`CF-RAY: xxx-DME`
4. Server sends JSON payload:
`1{"op":8,"d":{"v":4,"heartbeat_interval":13750.0}}`
`Connection: upgrade`
`sec-websocket-accept: xxx`
`upgrade: websocket`
`CF-Cache-Status: DYNAMIC`
`Report-To:
{"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=xxx"}],"group":"cf-nel","max_age":604800}`
`NEL: {"success_fraction":0.01,"report_to":"cf-nel","max_age":604800}`
`Strict-Transport-Security: max-age=15552000; includeSubDomains;
preload`
`X-Content-Type-Options: nosniff`
`Server: cloudflare`
`CF-RAY: xxx-DME`
5. Connection proceeds. `on_voice_ready` callback is called.
Example of erroneous voice server connection in different Discord
guild:
1. DPP logs show this:
`DEBUG: Sending op 4 to join VC, guild xxx channel xxx`
`DEBUG: Connecting voice for guild xxx channel xxx`
Nothing happens after that.
2. Bot connects with following request:
`GET /?v=4 HTTP/1.1`
`Host: tel-aviv10001.discord.media`
`pragma: no-cache`
`User-Agent: DPP/0.1`
`Upgrade: WebSocket`
`Connection: Upgrade`
`Sec-WebSocket-Key: xxx`
`Sec-WebSocket-Version: 13`
3. Server responds with HTTP 400 response:
`HTTP/1.1 400 Bad request`
`Content-Type: text/html`
`Transfer-Encoding: chunked`
`Connection: keep-alive`
`Cache-Control: no-cache`
`CF-Cache-Status: DYNAMIC`
`Report-To:
{"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=xxx"}],"group":"cf-nel","max_age":604800}`
`NEL: {"success_fraction":0.01,"report_to":"cf-nel","max_age":604800}`
`Strict-Transport-Security: max-age=15552000; includeSubDomains;
preload`
`X-Content-Type-Options: nosniff`
`Server: cloudflare`
`CF-RAY: xxx-DME`
`5a`
`<html><body><h1>400 Bad request</h1>`
`Your browser sent an invalid request.`
`</body></html>`
4. By design of `dpp::discord_voice_client::thread_run()` function bot
tries to reconnect. Steps 2 and 3 are repeated infinitely.
It may be important to mention that in Discord client bot actually
connects to voice channel and sits there even when HTTP 400 error
occurs.
**To Reproduce**
Issue appears to only occur when bot connects to
`tel-avivXXXXX.discord.media` Discord voice servers. The problematic
servers do work in Discord desktop clients, so I think the problem is
with the library.
Steps to reproduce the behavior:
1. Connect to voice channel that is served by
`tel-avivXXXXX.discord.media` voice server.
2. Make the bot connect to it, for example call
`dpp::guild::connect_member_voice()`.
3. Observe that `on_voice_ready` callback is never called although bot
connects to voice channel in Discord UI.
**Expected behavior**
Bot should properly connect to voice server and `on_voice_ready`
callback should be called.
**System Details:**
- Tried it on both
`Linux PC 5.15.90.1-microsoft-standard-WSL2 #1 SMP x86_64 x86_64 x86_64
GNU/Linux`
and
`Linux rpi4 6.1.21-v8+ #1642 SMP PREEMPT aarch64 GNU/Linux`
OSes.
- Desktop Discord Client was used for testing
Some of my assumptions/terms may be wrong because I tried to get how
DPP source code works manually.
|
the 400 error is actually cloudflare error not discord error btw.
|
I tried using another bot with the endpoint and it works. Sounds like the problem is not with it or bot API to me. |
it could be related to this: discord/discord-api-docs#6145 (comment) an error 400 is not something we can prevent or control and is generated by cloudflare not discord. your IP might be shadow banned from bot API perhaps. bot voice uses different endpoints to the user client and different API versions. |
Tried running the bot under VPN to another country, checked that
Not running in "Cloudflare Worker" and the mentioned issue is about authentication error rather than bad request, unlikely to help here. |
I think this might also be the cause of |
where in the code in dpp does it open the new descriptor? |
What i understand from here is that it caches the socket connection if |
no, these keepalives are never used right now in dpp (and when they were used, were not used and are not valid for a websocket connection upgrade) |
yeah had to double check but socket reuse is disabled by default and nothing enables it |
ah i see, the |
can anyone actually duplicate this anywhere but the tel aviv endpoint? if it's specific to one endpoint it smells of a discord bug, so there's not much we can do. |
Yes, I am able to reproduce the problem. Let me show how:
**Bot**
We'll make a small bot for testing:
A fresh bot user and guild may be created for cleanliness of testing, but I don't think that this is necessary.
**Library**
We'll modify library code to show us HTTP response information:
An additional
**Output**
With this setup:
is considered to be successful output;
is considered to be erroneous output.
**Testing**
Region overriding in voice channel settings can be used to cycle bot through several voice servers.
**My results**
aarch64
x86_64
VPN bonus to eliminate IP problems possibility
I can tell that the problem is not unique to Tel-Aviv servers nor to any region. It is rather a lottery and I can't come up with a solution. |
look at the geography of the majority of these places... nearly all are Oceania. it smells of a problem with cloudflare to me. |
Why other bots work then? The servers dislike the request if it is |
if this was the case they would all be broken |
This issue has had no activity and is being marked as stale. If you still wish to continue with this issue please comment to reopen it. |
bump |
So, we can rule out tel-aviv because, as OP mentioned, it's experimental. It's likely discord doesn't want many people touching that until they get more answers. For the other regions, here are my test results (Recorded from the UK):
From this, I'm assuming anything above or equal to 142ms, is random as to if you can join. I'd like to even say India with 131ms could possibly do this too. Automated testing is required to get more answers, but I'm assuming discord can't handle bots that are too far away, requiring you to get shards and deploy them in different locations. This data, ofc, isn't accurate. It could be more or less. I'll do more research and I'll find out what's going on. |
After further discussion in Discord to try and find the issue out, it seems that it's not a library issue at all, my previous assumption is not entirely true either. While doing even more testing, we found that this can happen on any location, at any endpoint. More specifically, I had the rotterdam endpoint Connection definitely plays a part, but this definitely seems like a Discord issue and not a D++ issue. |
#883 now stops this from infinite looping. It's not a fix, but it improves the logging (so DPP will natively tell you about the http error code) and, like I said at the start of this post, stops the infinite loop happening that you mentioned :) |
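For readers following along, the idea of checking the handshake status and giving up after a fixed number of tries can be sketched roughly like this. This is not the code from #883 or anything taken from DPP; `parse_status_code`, `connect_with_retry` and the attempt limit are hypothetical names and values chosen only for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper (not a DPP function): extract the numeric status code
// from an HTTP status line such as "HTTP/1.1 400 Bad request".
// Returns -1 if the line does not look like a status line.
int parse_status_code(const std::string& status_line) {
    if (status_line.compare(0, 5, "HTTP/") != 0) {
        return -1;
    }
    const std::size_t space = status_line.find(' ');
    if (space == std::string::npos || space + 4 > status_line.size()) {
        return -1;
    }
    int code = 0;
    for (std::size_t i = space + 1; i < space + 4; ++i) {
        const char c = status_line[i];
        if (c < '0' || c > '9') {
            return -1;
        }
        code = code * 10 + (c - '0');
    }
    return code;
}

// Hypothetical bounded retry: `attempt` performs one handshake and returns the
// status line it received. Stops at the first 101 or after `max_attempts`.
bool connect_with_retry(const std::function<std::string()>& attempt,
                        int max_attempts) {
    for (int i = 0; i < max_attempts; ++i) {
        if (parse_status_code(attempt()) == 101) {
            return true;  // upgrade accepted, websocket is usable
        }
        // Any other code (for example 400) counts as a failed attempt.
    }
    return false;  // give up instead of looping forever
}
```

A caller would pass a callable that performs one real connection attempt; reporting the last status code on failure would make the log more useful.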
we have reached the limit of what we can do to fix this. I've raised it with discord and if they say there is anything we can do, i shall reopen this issue. |
The cause of the error was so simple that it wasn't worth this long discussion.
Almost all Discord servers ignore this field, but the problematic ones notice the problem and respond with
seems to work, although I didn't test it in every region. |
Bit of a harsh reply, the long discussion was required. It's not an easy issue to fix and I needed to narrow down the issues. I tested adding the WebSocket key when I was debugging the issue and it didn't work, it's possible it's generated wrong but upon looking at other libraries (like I mentioned I did in my previous comment), they don't use this to communicate via audio either. If you want to be certain, run the same test you ran before in all regions and see if it's fixed. A PR is more than welcome if it's fixed! |
if this was the concrete problem, no websockets would ever connect. all dpp websockets use this key as it is not checked by dpp. I have a feeling that implementing this fix will not actually solve the problem. |
Using the same test bot, test code for library and temporary fix, although I had to rename one variable to avoid name conflict with new
Test result:
Make sure it doesn't look like this: (this was my initial error)
because it compiles to this:
You can always double check:
|
seems like a legit fix, will test that on Musicat sometime later |
thanks for the testing and diagnosis; it looks to me that perhaps it is only failing when the random number we used as a key caused an error when passed through a base64 decoder. this would mean it would be time based. I don't think it needs any further thought though. did you use a constant for the websocket key? as shown in the example? or is it a base64 version of the time? I think either would work. |
I used the constant from my first reply. I don't think it really matters, as it is not very important and afaik DPP doesn't check server response to the key, but it's a dirty fix, should be done properly |
👏 👏 👏 , much love, well done! |
this can now be closed, but on discussion we don't know if this should be reported to cloudflare. There is an issue here of RFC non-compliance which lies on cloudflare's doorstep. If you read chapter 6 carefully, which I did, including the example below it, cloudflare or any RFC-compliant websocket server endpoint should not attempt to base64 decode the Sec-WebSocket-Key. cloudflare should take the Sec-WebSocket-Key as is, like raw text, prepend it to a constant GUID, sha1 sum the resulting string and pass the base64 of that hash back to us in the 101 switching protocols reply. what it seems to do instead is attempt to base64 decode the string, then convert the output to utf8. if the conversion fails and contains an invalid encoding then you get an error 400. what's more, only certain endpoints implement this jank. it seems to me to be some kind of check for bad web actors that is broken. the only mistake we made was assuming the other end will follow the word of the RFC to the letter, which clearly gives an explicit example of "leave this text alone and just put it here". people never learn. anyhow, many thanks for your efforts and patience with this issue. if you're ever on the discord server, ping me and I'll be happy to give you a contributor role. |
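For reference, RFC 6455 describes the Sec-WebSocket-Key value as a randomly selected 16-byte nonce that has been base64-encoded. A minimal sketch of producing such a value follows. It is not the DPP implementation or the fix that was merged; `base64_encode` and `make_websocket_key` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>

// Standard base64 alphabet (RFC 4648).
static const char kBase64Alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: encode arbitrary bytes as padded base64.
std::string base64_encode(const std::string& input) {
    std::string out;
    out.reserve(((input.size() + 2) / 3) * 4);
    std::size_t i = 0;
    // Full 3-byte groups become 4 output characters each.
    while (i + 2 < input.size()) {
        const std::uint32_t n =
            (static_cast<std::uint32_t>(static_cast<unsigned char>(input[i])) << 16) |
            (static_cast<std::uint32_t>(static_cast<unsigned char>(input[i + 1])) << 8) |
            static_cast<std::uint32_t>(static_cast<unsigned char>(input[i + 2]));
        out += kBase64Alphabet[(n >> 18) & 63];
        out += kBase64Alphabet[(n >> 12) & 63];
        out += kBase64Alphabet[(n >> 6) & 63];
        out += kBase64Alphabet[n & 63];
        i += 3;
    }
    // One or two leftover bytes are padded with '='.
    const std::size_t rest = input.size() - i;
    if (rest == 1) {
        const std::uint32_t n =
            static_cast<std::uint32_t>(static_cast<unsigned char>(input[i])) << 16;
        out += kBase64Alphabet[(n >> 18) & 63];
        out += kBase64Alphabet[(n >> 12) & 63];
        out += "==";
    } else if (rest == 2) {
        const std::uint32_t n =
            (static_cast<std::uint32_t>(static_cast<unsigned char>(input[i])) << 16) |
            (static_cast<std::uint32_t>(static_cast<unsigned char>(input[i + 1])) << 8);
        out += kBase64Alphabet[(n >> 18) & 63];
        out += kBase64Alphabet[(n >> 12) & 63];
        out += kBase64Alphabet[(n >> 6) & 63];
        out += '=';
    }
    return out;
}

// Hypothetical helper: 16 random bytes, base64-encoded, for Sec-WebSocket-Key.
std::string make_websocket_key() {
    std::random_device rd;
    std::uniform_int_distribution<int> byte(0, 255);
    std::string nonce;
    for (int i = 0; i < 16; ++i) {
        nonce += static_cast<char>(byte(rd));
    }
    return base64_encode(nonce);
}
```

The RFC's own sample key, `dGhlIHNhbXBsZSBub25jZQ==`, is the base64 form of the 16-byte text `the sample nonce`, so any 16-byte nonce encodes to 24 characters ending in `==`.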
Unfortunately, commit 87ec1a1eb5cc03a304d0a211dac51b7972823fd1 didn't fix Musicat failing to connect to the US West and Japan regions. Rotterdam was also a region Musicat couldn't connect to a few weeks ago, but it connects successfully now; US West was working until lately. Did you also test with the old non-base64 header alongside your fix test 12 hours ago, @KontraCity? It seemingly just happens randomly. It might be caused by heavy traffic or simply a network issue, and the header might not play a role in this since we don't know if the server actually checks and validates the header, or the fix isn't really correct, who knows. Here's some log of Musicat trying to connect to a voice channel with the US West region:
track position: 'Oh Oh Oh Sexy Vampire ft Fright Ranger (S3RL vs Justin B remix) - Disko Warp' 0
[INFO] New thread spawned: 413484736
[INFO] Total active thread: 6
[2023-09-29 10:11:54] DEBUG: Sending op 4 to join VC, guild 1133586655811481770 channel 1133586656478371883
[THUMB HQ_MAX] Inserted thumb: 'http://i3.ytimg.com/vi/Q-2T0K3u_3E/maxresdefault.jpg'
[THUMB HQ_MAX] Inserted thumb: 'http://i3.ytimg.com/vi/Q-2T0K3u_3E/hqdefault.jpg'
[Manager::clear_connecting]: 1133586655811481770
[INFO] Thread done: 413484736
[INFO] Total active thread: 6
[2023-09-29 10:11:55] DEBUG: Connecting voice for guild 1133586655811481770 channel 1133586656478371883
[2023-09-29 10:11:56] WARN: Received unhandled code: HTTP/1.1 400 Bad request
[2023-09-29 10:11:56] DEBUG: Attempting to reconnect the websocket...
[INFO] Total thread done: 1
[INFO] Total thread done and joined: 1
[INFO] Total active thread: 5
[2023-09-29 10:11:57] WARN: Received unhandled code: HTTP/1.1 400 Bad request
[2023-09-29 10:11:57] DEBUG: Attempting to reconnect the websocket...
[2023-09-29 10:11:57] WARN: Received unhandled code: HTTP/1.1 400 Bad request
[2023-09-29 10:11:57] DEBUG: Attempting to reconnect the websocket...
[2023-09-29 10:11:58] WARN: Received unhandled code: HTTP/1.1 400 Bad request
[2023-09-29 10:11:58] DEBUG: Attempting to reconnect the websocket...
[2023-09-29 10:11:59] WARN: Received unhandled code: HTTP/1.1 400 Bad request
[2023-09-29 10:11:59] WARN: Reached max loops whilst attempting to read from the websocket. Aborting websocket.
[Manager::clear_wait_vc_ready]: 1133586655811481770
[Manager::clear_connecting]: 1133586655811481770
[Manager::set_vc_ready_timeout WARN] Connection timeout
[2023-09-29 10:12:04] DEBUG: Disconnecting voice, guild: 1133586655811481770
[INFO] Thread done: 367343296
[INFO] Total active thread: 5
[EVENT] on_voice_state_leave: 1133586655811481770
[INFO] Total thread done: 1
[INFO] Total thread done and joined: 1
[INFO] Total active thread: 4 |
I just tried Japan region again and now it connects, US West still failing |
Because the
In DPP code
bytes. |
I don’t know how to make a PR, but this would be my C++11 solution:
|
I took it from |
I tested, this:
works, but dirty because constant. |
Can confirm this finally works.
|