Matrix: having 5 bridged rooms results to M_LIMIT_EXCEEDED: Too Many Requests and failing bridge / potentially incompatible with Matrix Synapse 1.19.0 #1201

Closed
Mikaela opened this issue Aug 20, 2020 · 21 comments
@Mikaela

Mikaela commented Aug 20, 2020

Describe the bug

I have 5 rooms that are bridged to Matrix. When I start matterbridge, the Matrix bridge fails to join the rooms with this error:

time="2020-08-20T07:24:39Z" level=error msg="Bridge matrix.feneas failed to join channel: contents=[123 34 101 114 114 99 111 100 101 34 58 34 77 95 76 73 77 73 84 95 69 88 67 69 69 68 69 68 34 44 34 101 114 114 111 114 34 58 34 84 111 111 32 77 97 110 121 32 82 101 113 117 101 115 116 115 34 44 34 114 101 116 114 121 95 97 102 116 101 114 95 109 115 34 58 57 55 50 56 125] msg=Failed to POST JSON to /_matrix/client/r0/join/!REDACTED:matrix.org code=429 wrapped=M_LIMIT_EXCEEDED: Too Many Requests" func=disableBridge file="gateway/router.go:111" prefix=router

I think this is related to homeserver having recently updated to Synapse 1.19.0

To Reproduce

  1. Configure 5 channels.
  2. Start matterbridge

Expected behavior

Matterbridge starts normally.

Screenshots/debug logs

time="2020-08-20T07:24:39Z" level=error msg="Bridge matrix.feneas failed to join channel: contents=[123 34 101 114 114 99 111 100 101 34 58 34 77 95 76 73 77 73 84 95 69 88 67 69 69 68 69 68 34 44 34 101 114 114 111 114 34 58 34 84 111 111 32 77 97 110 121 32 82 101 113 117 101 115 116 115 34 44 34 114 101 116 114 121 95 97 102 116 101 114 95 109 115 34 58 57 55 50 56 125] msg=Failed to POST JSON to /_matrix/client/r0/join/!REDACTED:matrix.org code=429 wrapped=M_LIMIT_EXCEEDED: Too Many Requests" func=disableBridge file="gateway/router.go:111" prefix=router

Environment (please complete the following information):

  • OS: Debian 9.13
  • Matterbridge version: 1.18.0 e3d8fe4

Additional context

  • The homeserver was recently updated to Synapse 1.19.0
  • I don't have the config file on this machine; please ask if it's really needed.
  • I think Matterbridge may be misbehaving by attempting to join the 5 rooms on every start. You only need to join a room once (unless you leave or get kicked), so shouldn't it detect that it has already joined those rooms rather than trying to rejoin them on every start?
@Mikaela Mikaela added the bug label Aug 20, 2020
@Mikaela Mikaela changed the title from "Matrix: having 5 bridged rooms results to M_LIMIT_EXCEEDED: Too Many Requests and failing bridge" to "Matrix: having 5 bridged rooms results to M_LIMIT_EXCEEDED: Too Many Requests and failing bridge / potentially incompatible with Matrix Synapse 1.19.0" Aug 20, 2020
@42wim
Owner

42wim commented Aug 20, 2020

Thanks for the debugging info 👍

It also needs to join the rooms to get the IDs back to have the channel mapping.

Could you try putting a time.Sleep(time.Second) above this line https://github.com/42wim/matterbridge/blob/master/bridge/matrix/matrix.go#L63 to see if this slowdown fixes it?

@Mikaela
Author

Mikaela commented Aug 21, 2020

> It also needs to join the rooms to get the IDs back to have the channel mapping.

Shouldn't it just perform a sync and get the list of rooms it's already in? https://matrix.org/docs/guides/client-server-api#getting-all-state

> Could you try putting a time.Sleep(time.Second) above this line https://github.com/42wim/matterbridge/blob/master/bridge/matrix/matrix.go#L63 to see if this slowdown fixes it?

I am not sure, I haven't managed to compile Matterbridge before and my server definitely isn't capable of it. I will need to take a look later.

@JeremyRand
Contributor

Running into this issue too.

> Could you try putting a time.Sleep(time.Second) above this line https://github.com/42wim/matterbridge/blob/master/bridge/matrix/matrix.go#L63 to see if this slowdown fixes it?

This didn't help for me, but adding a 10-second sleep instead got it working.

@Mikaela

This comment has been minimized.

@JeremyRand
Contributor

> Would that be time.Sleep(10) ?

time.Sleep(10 * time.Second)

I also needed to add "time" to the import list at the top of the file. Hope this helps you work around it. (I doubt that this is really the correct fix; the bridge really should be able to detect the error and retry, but for a quick and dirty workaround it's gotten my bridge back online.)

Mikaela added a commit to Mikaela/matterbridge that referenced this issue Aug 23, 2020
Co-authored-by: 42wim <wim@42.be>
Co-authored-by: JeremyRand <jeremyrand@airmail.cc>
@Sorunome

> It also needs to join the rooms to get the IDs back to have the channel mapping.

there's an API endpoint for that: https://matrix.org/docs/spec/client_server/latest#get-matrix-client-r0-joined-rooms
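As a rough illustration of that endpoint, here is a Go sketch that calls `GET /_matrix/client/r0/joined_rooms` with plain `net/http` and collects the returned room IDs into a set. The function names, the access token, and the fake homeserver used for the demo are all assumptions made for this example; it does not use or describe the gomatrix API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// joinedRoomsResponse matches the response body of the joined_rooms endpoint.
type joinedRoomsResponse struct {
	JoinedRooms []string `json:"joined_rooms"`
}

// fetchJoinedRooms queries GET /_matrix/client/r0/joined_rooms and returns
// the set of room IDs the account has already joined.
func fetchJoinedRooms(client *http.Client, homeserver, token string) (map[string]bool, error) {
	req, err := http.NewRequest(http.MethodGet, homeserver+"/_matrix/client/r0/joined_rooms", nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)

	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("joined_rooms: unexpected status %s", resp.Status)
	}

	var body joinedRoomsResponse
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return nil, err
	}
	joined := make(map[string]bool, len(body.JoinedRooms))
	for _, id := range body.JoinedRooms {
		joined[id] = true
	}
	return joined, nil
}

// newFakeHomeserver is a test double that answers every request with the
// given room IDs, so the example runs without a real homeserver.
func newFakeHomeserver(rooms ...string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		json.NewEncoder(w).Encode(joinedRoomsResponse{JoinedRooms: rooms})
	}))
}

func main() {
	srv := newFakeHomeserver("!abc:example.org")
	defer srv.Close()

	joined, err := fetchJoinedRooms(srv.Client(), srv.URL, "hypothetical-token")
	if err != nil {
		panic(err)
	}
	for _, id := range []string{"!abc:example.org", "!xyz:example.org"} {
		if joined[id] {
			fmt.Println("already joined:", id)
		} else {
			fmt.Println("needs a join:", id)
		}
	}
}
```

The response contains room IDs only, so a client that is configured with room aliases still needs some way to map one to the other.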

@42wim
Owner

42wim commented Aug 23, 2020

Sure and what about bots that haven't joined rooms and need to join the rooms :)
Those ratelimit defaults on matrix seem very strict, only 0.1 join per second allowed for the local rooms and 0.01 per second for remote rooms? that seems insane?

@Sorunome

> Sure and what about bots that haven't joined rooms and need to join the rooms :)
> Those ratelimit defaults on matrix seem very strict, only 0.1 join per second allowed for the local rooms and 0.01 per second for remote rooms? that seems insane?

The idea is: On startup you fetch all joined rooms, so that you don't attempt to re-join those rooms on startup

Then, when joining new rooms you just send the join there. Then, you handle M_LIMIT_EXCEEDED and re-try after the specified retry_after_ms plus like a second or so just to be sure? and then if you aren't able to join after e.g. 5 tries or so you can consider it failed.

@Mikaela
Author

Mikaela commented Aug 23, 2020

> Sure and what about bots that haven't joined rooms and need to join the rooms :)

As matterbridge cannot register an account by itself, would it be OK to just have documentation telling the user to login as the bot and join the room? I have been doing that anyway to create new rooms and ensure that the bot has full power there so that if I am unavailable and someone else in organisation needs urgent access, they can use the bot account to grant themselves power or manage abusers.

@Sorunome

> Those ratelimit defaults on matrix seem very strict, only 0.1 join per second allowed for the local rooms and 0.01 per second for remote rooms? that seems insane?

Rate limits are actually configured per server; a server admin is free to configure them however they see fit.

@42wim
Owner

42wim commented Aug 23, 2020

> The idea is: On startup you fetch all joined rooms, so that you don't attempt to re-join those rooms on startup
>
> Then, when joining new rooms you just send the join there. Then, you handle M_LIMIT_EXCEEDED and re-try after the specified retry_after_ms plus like a second or so just to be sure? and then if you aren't able to join after e.g. 5 tries or so you can consider it failed.

Ok, feel free to send a PR for this, the library I use is a fork of gomatrix, can be found here: https://github.com/matterbridge/gomatrix/commits/work

I also accept PR's there.

@Sorunome

> > The idea is: On startup you fetch all joined rooms, so that you don't attempt to re-join those rooms on startup
> > Then, when joining new rooms you just send the join there. Then, you handle M_LIMIT_EXCEEDED and re-try after the specified retry_after_ms plus like a second or so just to be sure? and then if you aren't able to join after e.g. 5 tries or so you can consider it failed.
>
> Ok, feel free to send a PR for this, the library I use is a fork of gomatrix, can be found here: https://github.com/matterbridge/gomatrix/commits/work
>
> I also accept PR's there.

Sorry, soru does not use matterbridge.

The retry logic could be something like this (example is typescript stuff):

// Assumed values: the snippet as posted used these constants without defining them.
const RETRY_MAX_TRIES = 5;
const RETRY_TOLERANCE = 1000; // extra milliseconds added on top of retry_after_ms
const RETRY_DEFAULT_TIMEOUT = 5000; // milliseconds to wait if retry_after_ms is missing

async function safeCall<T>(call: () => Promise<T>, tries: number = 0): Promise<T> {
    try {
        // call the function
        return await call();
    } catch (err: any) {
        const errObj = err.error || err.body; // fetch the json'ified error body
        if (!errObj || errObj.errcode !== "M_LIMIT_EXCEEDED" || tries > RETRY_MAX_TRIES) {
            // not a rate-limit error, or we retried too many times: rethrow the error
            throw err;
        }
        // work out how many milliseconds to wait: the server's value plus a retry tolerance, or a fallback
        const retryMs = typeof errObj.retry_after_ms === "number" ? (errObj.retry_after_ms + RETRY_TOLERANCE)
            : RETRY_DEFAULT_TIMEOUT;
        // wait that long, then retry; a failure in the retry propagates to the caller
        await new Promise<void>((resolve) => setTimeout(resolve, retryMs));
        return safeCall<T>(call, tries + 1);
    }
}

@42wim
Owner

42wim commented Aug 23, 2020

Is this change already rolled out on matrix.org?

@fire219

fire219 commented Aug 23, 2020

> Is this change already rolled out on matrix.org?

Yes it is. I just ran into this issue after having to restart matterbridge. My deployment of it is probably one of the larger ones (for Pine64; 8 channels spread across 4 platforms each), so I just had to custom-compile matterbridge with the time.Sleep(10 * time.Second) workaround.

The workaround works, but having to wait an additional 80 seconds for the bridge to come up is not ideal. It's better than our network being broken, though. 😅

@42wim
Owner

42wim commented Aug 23, 2020

sigh, that really sucks that they roll this out so quickly and don't have their own libraries support the rate-limiting

@fire219

fire219 commented Aug 23, 2020

From personal experience, I can say that the general consensus in that community is that the matrix-appservice bots should be used for any sort of bridging. Most of those appservice bots use the Node libraries, which have probably gotten updates to match Synapse v1.19.

I've gotten plenty of complaints from my use of Matterbridge, but running 4 discrete bridges sounds like a nightmare to me...

@42wim
Owner

42wim commented Aug 23, 2020

it's not just bridging, any "normal" client that just joins rooms will get issues.

@Sorunome

Sorunome commented Aug 23, 2020 via email

@Sorunome

Sorunome commented Aug 23, 2020 via email

@42wim
Owner

42wim commented Aug 23, 2020

@Sorunome yes, well guess what if I query joined_rooms I get ID's and not the room aliases :)

So now, for every joined room, even ones that aren't in the matterbridge config, I have to send a GET /_matrix/client/r0/rooms/{roomId}/aliases, which isn't supported in the gomatrix library and is also rate-limited.

So the net result is that I'll have to do at least one extra query to get the same information.

I don't understand why they also rate-limit joining rooms you're already a member of; that shouldn't take that many resources.

@42wim
Owner

42wim commented Aug 23, 2020

Released v1.18.1, which contains this fix.
