Contributor reports based on slack interaction #1367
Comments
We can get message counts per user per day with this endpoint, but it requires a Business+, Select/Compliance, or Grid plan so we can't use it: https://api.slack.com/methods/admin.analytics.getFile
Is the goal to save aggregate stats for user activity between release cycles as we did for emails?
Something like that, yes. To note when someone new appears, or when someone stops posting. Who are the top posters, which are the most active channels, and especially we want to track any channel whose name begins with the word boost.
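To make those report ideas concrete, here is a minimal sketch, assuming per-user message totals are already available as plain dicts for two consecutive periods (for example, two release cycles). The function names `compare_periods` and `is_boost_channel` are made up for illustration and are not part of the project.

```python
def compare_periods(previous, current):
    """Compare two {user: message_count} mappings for consecutive periods.

    Returns (new_posters, stopped_posters, ranked), where ranked lists the
    current period's (user, count) pairs from most to least active.
    """
    new_posters = sorted(set(current) - set(previous))
    stopped_posters = sorted(set(previous) - set(current))
    # Sort by count descending, then by name so ties are deterministic.
    ranked = sorted(current.items(), key=lambda item: (-item[1], item[0]))
    return new_posters, stopped_posters, ranked


def is_boost_channel(name):
    """True for channel names that begin with "boost" (case-insensitive)."""
    return name.lower().startswith("boost")
```

The same comparison works for channels instead of users if the dict keys are channel names.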
We have an option here: store all message history in our db, or do some aggregation to store less data based on what we want to show. Currently I'm tracking activity into hourly buckets per channel per user.
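As a rough illustration of the hourly bucketing, here is a sketch that assumes each message is a dict with the `user` and `ts` fields Slack returns, where `ts` is a string of epoch seconds such as "1700000000.000100". The helper names are hypothetical and the actual implementation may differ.

```python
from collections import Counter
from datetime import datetime, timezone


def hour_bucket(ts):
    """Truncate a Slack timestamp string to the start of its UTC hour."""
    seconds = int(float(ts))
    return datetime.fromtimestamp(seconds - seconds % 3600, tz=timezone.utc)


def count_messages(messages, channel_id):
    """Count messages per (channel, user, hour) bucket."""
    counts = Counter()
    for message in messages:
        user = message.get("user")
        if user is None:
            # Assumption: messages without a user id are not attributed to anyone.
            continue
        counts[(channel_id, user, hour_bucket(message["ts"]))] += 1
    return counts
```

Each key of the returned counter corresponds to one stored row: channel, user, hour start, and the count for that hour.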
We need a script to get all message history through the API, which after the first run can be run again on a schedule to get only new messages. However, there is also a webhook that we could use to get notified about new messages as they happen, as an optimization if desired. https://api.slack.com/apis/events-api#rate_limiting
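The scheduled incremental fetch could look roughly like the sketch below. It relies on the `oldest` parameter and cursor pagination of Slack's `conversations.history` method. The `call_api(method, params)` callable is a hypothetical stand-in for whatever HTTP client is used; it is assumed to return the decoded JSON response.

```python
def fetch_new_messages(call_api, channel_id, oldest="0"):
    """Yield messages newer than `oldest`, following cursor pagination.

    `call_api(method, params)` is a hypothetical helper that performs the
    request and returns the decoded JSON response as a dict.
    """
    cursor = None
    while True:
        params = {"channel": channel_id, "oldest": oldest, "limit": 200}
        if cursor:
            params["cursor"] = cursor
        page = call_api("conversations.history", params)
        if not page.get("ok"):
            raise RuntimeError(page.get("error", "unknown error"))
        yield from page.get("messages", [])
        # An empty or missing next_cursor means there are no more pages.
        cursor = page.get("response_metadata", {}).get("next_cursor")
        if not cursor:
            break


def latest_ts(messages, previous="0"):
    """Return the newest timestamp seen, to store as the next run's `oldest`."""
    return max([previous, *(m["ts"] for m in messages)], key=float)
```

After each run, the value from `latest_ts` would be saved per channel so the next scheduled run only asks for newer messages.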
Rather than storing all messages, would it be enough to store something simple like a name, date, and message count, updated on a daily basis? We'd need some type of import for previous data, but I'm looking forward. That would let us get users and counts between release dates, or for all time, without being a heavy lift for the DB. Since Slack messages tend to be more like texts than emails, would we want to count any message, like "ok", "Sure", or "No", or have a minimum character count?
Instead of storing all messages, we keep a count of messages per user per hour per channel to allow further aggregation later.

Incremental updates are supported, fetching only new messages since the last update. However, thread messages do not show up in the main message list, so the message history of every thread ever encountered has to be checked every time.

Hourly buckets are chosen to defer the choice of timezone to later. This allows aggregation and display to be done in any timezone with a whole-hour UTC offset.

The script automatically sleeps when it encounters rate limiting, so while it may take a while, it will finish successfully. Initial run time for the #boost-website channel was 2 minutes 7 seconds.

With the data collected, we can generate this overall activity report:

    SELECT real_name, SUM(b.count)
    FROM slack_slackactivitybucket b, slack_slackuser u
    WHERE b.user_id = u.id
    GROUP BY 1 ORDER BY 2 DESC LIMIT 10;

              real_name           | sum
    ------------------------------+------
     Vinnie Falco                 | 3076
     Rob Beeston                  |  990
     Joaquín M López Muñoz        |  619
     Sam Darwin                   |  534
     René Ferdinand Rivera Morell |  323
     Kenneth Reitz                |  226
     Alan de Freitas              |  179
     Spencer Strickland           |  143
     Julio C Estrada              |  136
     Peter Dimov                  |  119

Or similar reports for any time range that ends on hour boundaries.

Refs #1367
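To illustrate why hourly UTC buckets are enough for any whole-hour offset, here is a small sketch that regroups bucket counts into local calendar days. It assumes the buckets are available as a mapping from timezone-aware UTC hour starts to counts; `daily_totals` is a hypothetical name, and a fixed offset is used for simplicity, so daylight saving changes are not handled.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def daily_totals(buckets, utc_offset_hours):
    """Sum hourly UTC bucket counts into local calendar days.

    `buckets` maps a timezone-aware UTC hour start to a message count.
    `utc_offset_hours` is a whole number of hours, e.g. -5 or 1.
    """
    local = timezone(timedelta(hours=utc_offset_hours))
    totals = Counter()
    for hour_start, count in buckets.items():
        # A whole-hour shift keeps every bucket inside a single local day.
        totals[hour_start.astimezone(local).date()] += count
    return totals
```

With an offset such as +5:30, a UTC hour bucket could straddle local midnight and its count could no longer be assigned to one day, which is why the whole-hour restriction applies.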
Do we need to store users' Slack avatars? Slack gives us URLs to avatars at different pixel sizes. If we do want to display them, should we just store the URL, or download the image into Django media instead of hotlinking? I can't find any Slack documentation on hotlinking behavior of avatar URLs.
no.
If you determine that is the case please store them in the S3 media bucket along with web-user avatars. Or hyperlinking, if that seems to work well and it's supported.
Let me know what the requirements are.
Since there was no response to that question, I would assume there wouldn't be any filtering.
Since a lot of mailing list activity has moved to Slack, we are considering options to create reports based on Slack interaction through the API.
This is in its beginning stages, so we are looking for ideas and feedback.
Consider looking at this part of the API for what kind of data may be available to use:
https://api.slack.com/admins/audit-logs