-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Admin API - Users' message event usage statistics #11871
Comments
@buffless-matt I think we would accept PRs for expanding the information available about users, especially as I can see how those could be used to try to find abusive accounts. There's currently no plans to add such a feature though. |
I'm planning on making two PRs for this:
I'm not sure on how to calculate a fair approximation of event storage, but I've come up with the following queries (respectively): SELECT p.displayname AS displayname,
COUNT(ej.json) AS events_count,
SUM(octet_length(ej.json)) AS events_length,
e.sender AS user_id
FROM events e
JOIN event_json ej ON (e.event_id = ej.event_id)
JOIN users u ON (e.sender = u.name)
LEFT JOIN profiles p ON (u.name = '@' || p.user_id || ':' ?HOSTNAME?)
GROUP BY p.displayname,
e.sender SELECT e.room_id AS room_id,
r.name AS room_name,
COUNT(ej.json) AS events_count,
SUM(octet_length(ej.json))
FROM events e
JOIN event_json ej ON (e.event_id = ej.event_id)
JOIN room_stats_state r ON (e.room_id = r.room_id)
LEFT JOIN users u ON (u.name = e.sender)
WHERE u.name IS NULL
GROUP BY e.room_id,
r.name Note: I'm aware that the JSON string byte length calculation will need to be branched (at run-time) based on the storage engine used (e.g. SQLite will need something like Could I get some feedback on the above please? |
Hi Matt, This statistic (count of events, and count of event bytes per-user) used to exist, but I think it got removed because it wasn't used. I don't know if the background job that incrementally tracked these things still exists for some other metrics or whether it got ripped out too. I will just note that these queries will be very slow on realistic homeservers (and I'm not sure what joining to -- Count + Bytes
SELECT COUNT(ej.json) AS events_count,
SUM(octet_length(ej.json)) AS events_length,
e.sender AS user_id
FROM events e
JOIN event_json ej USING (event_id)
LEFT JOIN users u ON (e.sender = u.name)
GROUP BY e.sender;
-- Count (try your best to keep it as an index-only scan)
SELECT COUNT(e.sender) AS events_count,
e.sender AS user_id
FROM events e
JOIN users u ON (e.sender = u.name)
GROUP BY e.sender; Note that events have a ~65 kiB size limit so you may not need to fret about the exact consumed size too much. |
Hi @reivilibre, Thanks for the feedback!
Interesting, I'll dig around through the history and see what I can find.
Was aiming to provide similar output to the pre-existing media stats end-point (and hence followed a similar implementation approach).
This is a good point, thanks! I agree that the count should be enough, so I'll aim for index-only queries (assuming there isn't already something else buried in the code that I can use, which you alluded to above). |
I presume #9602 is what you were referring to and it looks to me (from the linked PR) that the tracking got ripped out too. Is that right?
Unfortunately there is no index on the Assuming we're happy to go down this road, is an index on P.S. I've linked an EMS PR to this issue (to hopefully share some context on EMS use-cases for these proposed admin end-points). |
Looks like you came to the right conclusion, yeah! The design of that code changed over time so I forgot how it was actually implemented until looking at it just now; some metrics used to be bucketed into time slices and
I think incrementally tracking it may be less expensive than adding an index... On my personal homeserver, I tried both a hash index and a B-tree index (I named it
I was quite surprised by how small these were, especially the B-tree which I thought would have been larger than a hash index! Frankly, incremental tracking will still likely make more sense if you only care about the number of rows (but it is more complex to maintain), but I was surprised to see that this workable. |
We're not enthusiastic about adding more indexes to the events table, so suspect incremental tracking in a separate metrics table would be more palatable. |
( |
Thank you. Which background process are you referring to by the way? Is it this one? |
Yup, that is the place we update the those stats. |
Are there any plans to expand the pre-existing admin API offerings for statistics?
E.g. While the pre-existing "Users' media usage statistics" offering covers media, it doesn't cover message events. I'm thinking about this through a lens of:
Edit (2022-03-24): It appears as though some functionality relating to this had existed.
The text was updated successfully, but these errors were encountered: