Filter out flood of member & hidden event spam when we detect the scrollback is full of it #1339

Open
MadLittleMods opened this issue Feb 2, 2023 · 24 comments
Labels
A-Performance A-Timeline O-Occasional Affects or can be seen by some users regularly or most users rarely S-Major Severely degrades major functionality or product features, with no satisfactory workaround T-Enhancement Z-MadLittleMods

Comments

@MadLittleMods

MadLittleMods commented Feb 2, 2023

Originally opened as an element-web issue (on 2022-04-22) that was incorrectly moved to #491 and then to a discussion.

Your use case

Why would you like to do it?

Rooms can be overwhelmed by bulk spam users joining rooms (thousands and thousands). Each one of those joins and leaves creates an event in the timeline.

Currently, when you try to scroll back in rooms like this, you just get stuck on the thousands of member events, which we only paginate 20 at a time. Each request is slow, and it doesn't even get me any further back toward the results I actually want to see.

The goal of this change is to make the room scrollback usable again and be able to view the history of the room. Otherwise, when these spam incidents occur, that whole time period in the room is essentially a black hole.

What would you like to do? / How would you like to achieve it?

When we detect that the whole /messages response is filled with m.room.member join, leave, and invite events, we can ask the user whether they want to continue scrolling back without them. If they accept, we should add a filter to /messages to not include them.
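A minimal sketch of the detection and the filtered request, not an actual implementation: `TimelineEvent`, `MessagesResponse`, `isFloodOfMemberEvents` and `buildMessagesQuery` are hypothetical names made up for illustration. The only parts taken from the Matrix client-server spec are the `from`, `dir`, `limit` and `filter` query parameters of /messages and the `not_types` field of the room event filter.

```typescript
// Hypothetical shapes for illustration; only the fields used here are modelled.
interface TimelineEvent {
    type: string;
    content: { membership?: string };
}

interface MessagesResponse {
    chunk: TimelineEvent[];
}

// The membership values the proposal wants to skip.
const SKIPPABLE_MEMBERSHIPS = ["join", "leave", "invite"];

/** True when a non-empty page consists solely of join/leave/invite member events. */
function isFloodOfMemberEvents(response: MessagesResponse): boolean {
    return (
        response.chunk.length > 0 &&
        response.chunk.every(
            (ev) =>
                ev.type === "m.room.member" &&
                SKIPPABLE_MEMBERSHIPS.indexOf(ev.content.membership ?? "") !== -1,
        )
    );
}

/** Build the query string for a backwards /messages request, optionally excluding member events. */
function buildMessagesQuery(from: string, limit: number, skipMemberEvents: boolean): string {
    const params: [string, string][] = [
        ["from", from],
        ["dir", "b"],
        ["limit", String(limit)],
    ];
    if (skipMemberEvents) {
        // `filter` takes a JSON-encoded room event filter; `not_types` excludes event types.
        params.push(["filter", JSON.stringify({ not_types: ["m.room.member"] })]);
    }
    return params.map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`).join("&");
}
```

Note that `not_types` drops every m.room.member event, including bans and kicks, so a client would probably only switch the filter on after the user has accepted the prompt.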

Here is a mockup of what the user prompt could look like: "It looks like you're paginating through a lot of member events, would you like to scroll back without them?"


Another option is to automatically start back-paginating by a much bigger value (500).


Another option is to use MSC3030 jump to date to jump past all of the messages. Behind the scenes, we could use /messages with a filter to find the spot and then jump.

Have you considered any alternatives?

It's possible to hide all join/leave messages in the timeline with Settings -> Preferences -> Timeline section -> toggle the Show join/leave messages (invites/removes/bans unaffected) (showJoinLeaves) setting. But this just affects the display of the event. It doesn't help with filtering them out of the /messages pagination requests to begin with to speed things up and get to the results we care about.

Additional context

  • Hide member events
  • Filter out member events when paginating /messages
  • Scrollback should filter member events when there is too many
  • Scrollback is slow and filled with member events
  • Flood of member state spam
  • Filter out bulk spam member events when we detect the scrollback is full of them
@MadLittleMods MadLittleMods changed the title .. Filter out flood of member & hidden event spam when we detect the scrollback is full of it Feb 2, 2023
@MadLittleMods MadLittleMods added A-Performance A-Timeline S-Major Severely degrades major functionality or product features, with no satisfactory workaround O-Occasional Affects or can be seen by some users regularly or most users rarely Z-MadLittleMods labels Feb 2, 2023
@MadLittleMods
Author

MadLittleMods commented Feb 2, 2023

Re-opening here as we're seeing this in the case of Gitter rooms where I synced the room membership. Giant blocks of membership that are impossible to paginate past. My original proposal still seems reasonable to me but this is really just tracking the problem with one potential solution of many.


In the last issue, @t3chguy noted some caveats: historical profiles won't work correctly if we skip fetching membership events, and this also affects push rules, since historical profiles are needed to evaluate whether a given message pings. These seem minor in comparison to the room being unnavigable, though. They are also technical problems we can overcome, for example by using MSC3952 for intentional mentions. Or we can just ignore the problem, since the chance of changing your profile (and receiving a notification) in a scenario like this is probably very small, so you probably won't miss any notifications anyway.

Related:

@t3chguy t3chguy added the X-Needs-Product More input needed from the Product team label Feb 3, 2023
@Johennes
Contributor

Johennes commented Feb 7, 2023

This also impacts things such as live location sharing. In that case, however, the events are in the timeline and possibly encrypted. It would be nice if we could handle these cases with the same approach but I think the /messages filter wouldn't work there, right?

@ekpyron

ekpyron commented Feb 7, 2023

Not entirely sure whether what I'm seeing is covered by this issue here (see also element-hq/roadmap#26 (comment)) - but even with the setting mentioned in the description of this issue here toggled, element is maxing out at 100% CPU usage remaining pretty much unusable after the mass joining of gitter users in a large gitter channel I'm using (https://gitter.im/ethereum/solidity / https://matrix.to/#/#ethereum_solidity:gitter.im). I got multiple people to confirm this using Element Desktop and Element Web. In other words, ever since the mass joining of Gitter users, the room has remained pretty much unusable via Element.

@andybalaam
Member

This is having an impact on the Gitter migration, so might need prioritising @daniellekirkwood @Johennes

@MadLittleMods
Author

MadLittleMods commented Feb 7, 2023

but even with the setting mentioned in the description of this issue here toggled, element is maxing out at 100% CPU usage remaining pretty much unusable after the mass joining of gitter users in a large gitter channel I'm using (https://gitter.im/ethereum/solidity / https://matrix.to/#/#ethereum_solidity:gitter.im).

@ekpyron Please note that the setting mentioned in the issue won't help at all. As mentioned in the description "[that setting] just affects the display of the event. It doesn't help with filtering them out of the /messages pagination requests to begin with to speed things up and get to the results we care about."

@andybalaam
Member

andybalaam commented Feb 8, 2023

Investigating this today. The first thing is to establish the behaviour, since we thought there might be a bug where we don't actually keep back-paginating when we should.

I've created a room on my local synapse with 10000 hidden events followed by some chat messages. When Element Web tries to display the room, it does keep making requests to the messages API, but they get slower and slower until it seems to grind to a halt.

Looking at the actual requests from my Synapse, according to Firefox they are taking ~9ms consistently, so the slowdown is in the client.

@andybalaam
Member

When I set the limit to 2000 instead of 20, I got responses of size 1000, presumably due to a Synapse limit.
After the first 1000 were received, Element Web slowed to a halt and didn't request the next batch for a long time.
This is the most important thing to investigate, I think.

@andybalaam
Member

andybalaam commented Feb 8, 2023

Running it through the Firefox profiler, I see almost all the time is spent inside decryptGroupMessage, and actually in the WASM code of olm. This might be a red herring, or at least a different problem, because I am assuming the Gitter rooms are unencrypted (right?) so I'm going to try this again with an unencrypted room.

@andybalaam
Member

Without encryption, Element Web appears to be loading the hidden events at a rate of 1000 per 7 seconds, which is more reasonable (if not great).

@andybalaam
Member

Although having said that, even when its count of events has reached 10K, it's still processing very heavily and mostly unusable for several minutes. Trying to get a profile.

@andybalaam
Member

It appears to be calling processSyncResponse ~1000 times / second and doSync ~500 times / second

Profile is here: https://share.firefox.dev/3JWEZ7b

@andybalaam
Member

5-10 minutes later it's still unresponsive.

@andybalaam
Member

andybalaam commented Feb 9, 2023

Nope, I misread the profile. It's spending a lot of time inside doSync, but not (necessarily) calling it a lot.

@andybalaam
Member

MessagePanel.shouldShowEvent is being called many, many times, taking ~20ms per call. The same goes for Room.eventShouldLiveIn inside it.

@andybalaam
Member

Something is happening repeatedly for all 10K events whenever we re-render.

In MessagePanel.getTiles we have only 5 events (in my test case) so it must be above there.

@andybalaam
Member

I have a test that crashes node:

In MessagePanel-test.tsx:

    it("should handle large numbers of hidden events quickly", () => {
        const events = [];
        for (let i = 0; i < 10000; i++) {
            events.push(
                TestUtilsMatrix.mkEvent({
                    event: true,
                    type: "unknown.event.type",
                    content: { key: "value" },
                    room: "!room:id",
                    user: "@user:id",
                    ts: 1000000 + i,
                }),
            );
        }
        render(getComponent({ events }, { showHiddenEvents: false }));
    });

crashes with:

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

This seems bad :-)

@andybalaam
Member

Adjusting the 10000 above to 1000 passes the test but it runs in 1395 ms which seems slow, so I can investigate from here.

@andybalaam
Member

If I replace getEventTiles() with return [] the test passes in 55ms, so I can zoom in on that.

@andybalaam
Member

If I replace getNextEventInfo in MessagePanel with a simple impl, the test passes in 80ms.
This code contains a deeply suspicious array.slice, so we might be getting somewhere.

@andybalaam
Member

Removing the slice didn't help, so I'll have to think more deeply :-(

@andybalaam
Member

We are re-running shouldShowEvent O(n^2) times, so I am experimenting with briefly caching the results to make it O(n).
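To illustrate the idea (this is a sketch under assumptions, not the actual Element Web change): `EventLike`, `memoizeByEventId` and `findNextVisible` are hypothetical names, and the forward scan below only stands in for whatever MessagePanel really does when it looks for the next shown event. The point is that a short-lived cache keyed by event ID means the expensive predicate runs at most once per event.

```typescript
// Hypothetical minimal event shape; the real MatrixEvent has far more fields.
interface EventLike {
    id: string;
    type: string;
}

type VisibilityPredicate = (ev: EventLike) => boolean;

/**
 * Wrap an expensive visibility predicate so that each event ID is evaluated
 * at most once for the lifetime of the returned function. Creating a fresh
 * wrapper per render pass keeps the cache "brief" and avoids stale answers.
 */
function memoizeByEventId(predicate: VisibilityPredicate): VisibilityPredicate {
    const cache = new Map<string, boolean>();
    return (ev) => {
        const cached = cache.get(ev.id);
        if (cached !== undefined) return cached;
        const result = predicate(ev);
        cache.set(ev.id, result);
        return result;
    };
}

/** Scan forward from `start` for the next event the predicate says is visible. */
function findNextVisible(
    events: EventLike[],
    start: number,
    shouldShow: VisibilityPredicate,
): EventLike | undefined {
    for (let i = start + 1; i < events.length; i++) {
        if (shouldShow(events[i])) return events[i];
    }
    return undefined;
}
```

With n hidden events in a row, calling `findNextVisible` for every index evaluates an unwrapped predicate roughly n²/2 times, while a wrapped one is evaluated at most n times. The loop itself still does a quadratic number of cheap map lookups, so this only removes the expensive part.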

@andybalaam
Member

andybalaam commented Feb 9, 2023

That has helped a lot, but it still looks like we are calling sync 10K times, or maybe many more times than that.

@Johennes Johennes changed the title Filter out flood of member & hidden event spam when we detect the scrollback is full of it Gitter sunsetting: Filter out flood of member & hidden event spam when we detect the scrollback is full of it Feb 10, 2023
@Johennes Johennes removed the X-Needs-Product More input needed from the Product team label Feb 10, 2023
@Johennes
Contributor

Removing X-Needs-Product for now as we may be able to fix the performance issue without a UX change.

@andybalaam
Member

I created element-hq/element-web#24480 to track further performance work. I think this issue should be used to think about batch sizes for the /messages API, which is something I hadn't considered so far, because the performance problems mask the need to do that.

FWIW I think we should probably double the batch size every time we receive a full batch of hidden events, up to a max of 1000, which seems to be Synapse's default max.
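Roughly, the doubling could look like the sketch below. `nextPaginationLimit` and both constants are hypothetical names; the starting value of 20 and the cap of 1000 come from this thread. Dropping back to the initial size once a batch is not entirely hidden events is my own assumption, not something decided here.

```typescript
const INITIAL_LIMIT = 20; // the current page size mentioned in this issue
const MAX_LIMIT = 1000; // appears to be Synapse's default maximum

/**
 * Pick the `limit` for the next /messages request.
 *
 * @param currentLimit  the limit used for the request that just completed
 * @param receivedCount how many events the server returned
 * @param hiddenCount   how many of those events the client would not display
 */
function nextPaginationLimit(currentLimit: number, receivedCount: number, hiddenCount: number): number {
    const batchWasFull = receivedCount >= currentLimit;
    const allHidden = receivedCount > 0 && hiddenCount === receivedCount;
    if (batchWasFull && allHidden) {
        // A full batch of nothing but hidden events: double, up to the cap.
        return Math.min(currentLimit * 2, MAX_LIMIT);
    }
    // Assumption: fall back to the small page size as soon as something is visible
    // (or the batch came back short), so normal scrollback stays cheap.
    return INITIAL_LIMIT;
}
```

Starting from 20, repeated full batches of hidden events would request 40, 80, 160, 320, 640 and then 1000 events, so the 10K hidden events in the test room above would take well under twenty requests instead of 500.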

@andybalaam andybalaam removed their assignment Feb 22, 2023
@Johennes Johennes changed the title Gitter sunsetting: Filter out flood of member & hidden event spam when we detect the scrollback is full of it Filter out flood of member & hidden event spam when we detect the scrollback is full of it Mar 2, 2023
@t3chguy t3chguy transferred this issue from element-hq/element-web Mar 31, 2023

5 participants