Contributor reports based on slack interaction #1367

Open
rbbeeston opened this issue Oct 17, 2024 · 10 comments

@rbbeeston
Member

Since a lot of mailing list activity has moved to Slack, we are considering options for creating reports based on Slack interaction through the API.

This is in its early stages, so we are looking for ideas and feedback.
Consider looking at this part of the API to see what kind of data may be available:
https://api.slack.com/admins/audit-logs

rbbeeston converted this from a draft issue Oct 17, 2024
rbbeeston added the Feature (New feature or request) label Oct 17, 2024
rbbeeston assigned gavinwahl and unassigned brianjp93 Oct 22, 2024
@gavinwahl
Collaborator

gavinwahl commented Oct 22, 2024

@rbbeeston

We can get message counts per user per day with this endpoint, but it requires a Business+, Select/Compliance, or Grid plan so we can't use it: https://api.slack.com/methods/admin.analytics.getFile
There is another endpoint that returns all messages, which we could use to download the entire message history: https://api.slack.com/messaging/retrieving
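
As a rough illustration of what downloading the full history involves, here is a minimal sketch of cursor pagination. It assumes the `conversations.history` response shape (`messages` plus `response_metadata.next_cursor`). `fetch_page` is a hypothetical stand-in for the actual HTTP call, and the two fake pages exist only to exercise the loop.

```python
from typing import Callable, Iterator, Optional


def iter_history(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every message by following Slack-style cursor pagination.

    fetch_page(cursor) is a hypothetical callable that performs one
    conversations.history request and returns the decoded JSON response.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("messages", [])
        cursor = page.get("response_metadata", {}).get("next_cursor")
        if not cursor:  # an empty or missing cursor marks the last page
            break


# Fake two-page response, for illustration only.
_PAGES = {
    None: {"messages": [{"ts": "3.0"}, {"ts": "2.0"}],
           "response_metadata": {"next_cursor": "abc"}},
    "abc": {"messages": [{"ts": "1.0"}],
            "response_metadata": {"next_cursor": ""}},
}

all_messages = list(iter_history(lambda cursor: _PAGES[cursor]))
```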

@gavinwahl
Collaborator

Is the goal to save aggregate stats for user activity between release cycles as we did for emails?

@vinniefalco
Member

Something like that, yes. We want to note when someone new appears or when someone stops posting, who the top posters are, and which channels are most active. We especially want to track any channel whose name begins with the word "boost".
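
To make these requirements concrete, the sketch below compares per-user message counts from two reporting periods and filters channel names by prefix. The function names, user names, and counts are all made up for illustration; nothing here reflects the actual implementation.

```python
from collections import Counter


def compare_periods(previous: Counter, current: Counter, top_n: int = 3) -> dict:
    """Compare per-user message counts from two reporting periods."""
    prev_users = {user for user, n in previous.items() if n > 0}
    curr_users = {user for user, n in current.items() if n > 0}
    return {
        "new": sorted(curr_users - prev_users),         # first seen this period
        "gone_quiet": sorted(prev_users - curr_users),  # posted before, not now
        "top": current.most_common(top_n),
    }


def boost_channels(channel_names):
    """Keep only channels whose name starts with 'boost'."""
    return [name for name in channel_names if name.lower().startswith("boost")]


# Hypothetical data, for illustration only.
previous = Counter({"alice": 12, "bob": 3})
current = Counter({"alice": 20, "carol": 7})
report = compare_periods(previous, current)
tracked = boost_channels(["boost-website", "general", "boost-json", "cpp-boost"])
```

Here `cpp-boost` is excluded on purpose, since the request is for channels that have "boost" at the beginning of the name.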

gavinwahl moved this from Accepted to In Progress in website-v2 Oct 29, 2024
@gavinwahl
Collaborator

We have a choice here: store all message history in our db, or do some aggregation and store less data based on what we want to show. Currently I'm tracking activity in hourly buckets per channel per user.
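
For anyone following along, hourly bucketing could look roughly like the sketch below. It assumes each message is a dict with a `user` id and a Slack `ts` string (Unix seconds with a fractional suffix). The function names and channel id are hypothetical, not taken from the actual code.

```python
from collections import Counter
from datetime import datetime, timezone


def hour_bucket(ts: str) -> datetime:
    """Truncate a Slack ts such as '1730219000.000200' to the start of its UTC hour."""
    moment = datetime.fromtimestamp(float(ts), tz=timezone.utc)
    return moment.replace(minute=0, second=0, microsecond=0)


def bucket_messages(channel_id: str, messages) -> Counter:
    """Count messages per (channel, user, UTC hour)."""
    counts = Counter()
    for msg in messages:
        user = msg.get("user")
        if user is None:  # skip messages that carry no user id
            continue
        counts[(channel_id, user, hour_bucket(msg["ts"]))] += 1
    return counts


# Hypothetical messages: two from U1 in the same hour, one from U2 an hour later.
messages = [
    {"user": "U1", "ts": "1730217600.000100"},
    {"user": "U1", "ts": "1730219000.000200"},
    {"user": "U2", "ts": "1730221200.000300"},
    {"ts": "1730221300.000400"},
]
counts = bucket_messages("C1", messages)
```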

@gavinwahl
Collaborator

We need a script to get all message history through the API; after the first run, it can be run again on a schedule to get only new messages. However, there is also a webhook (the Events API) that we could use to get notified about new messages as they happen, as an optimization if desired: https://api.slack.com/apis/events-api#rate_limiting
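
One way the scheduled re-run could fetch only new messages is to remember the newest `ts` seen and pass it as the `oldest` parameter of `conversations.history`. This is a minimal sketch under that assumption. The helper names and the channel id are hypothetical, and persisting the watermark between runs is left out.

```python
from decimal import Decimal


def newest_ts(messages, previous=None):
    """Return the largest Slack ts seen so far, compared numerically."""
    candidates = [msg["ts"] for msg in messages]
    if previous is not None:
        candidates.append(previous)
    # Decimal keeps the microsecond suffix exact when comparing.
    return max(candidates, key=Decimal) if candidates else None


def history_params(channel_id, watermark=None):
    """Build request parameters; 'oldest' limits results to newer messages."""
    params = {"channel": channel_id}
    if watermark is not None:
        params["oldest"] = watermark
    return params


# First run: no watermark yet, so the whole history is requested.
first_run = history_params("C123")
watermark = newest_ts([{"ts": "1730217600.000100"}, {"ts": "1730221200.000300"}])
# Later runs: only messages after the stored watermark.
second_run = history_params("C123", watermark)
```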

@rbbeeston
Member Author

Rather than storing all messages, would it be enough to store something simple like a name, date, and message count, collected on a daily basis? (We'd need some type of import for previous data, but I'm looking forward.) That would allow us to get users and counts between release dates, or for all time, without being a heavy lift for the DB.

Since slack messages tend to be more like texts than emails, would we want to measure counts by any message, like "ok", "Sure", or "No" or have a minimum character count?
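
If a minimum length were wanted, the filter itself would be small. The sketch below assumes each message is a dict with `user` and `text` keys; the `min_chars` parameter and the sample messages are hypothetical.

```python
from collections import Counter


def count_messages(messages, min_chars=0):
    """Count messages per user, ignoring texts shorter than min_chars."""
    counts = Counter()
    for msg in messages:
        text = msg.get("text", "").strip()
        if len(text) >= min_chars:
            counts[msg["user"]] += 1
    return counts


# Hypothetical sample, for illustration only.
sample = [
    {"user": "U1", "text": "ok"},
    {"user": "U1", "text": "I pushed a fix for the docs build"},
    {"user": "U2", "text": "Sure"},
    {"user": "U2", "text": "No"},
]
unfiltered = count_messages(sample)
filtered = count_messages(sample, min_chars=10)
```

With `min_chars=0` every message counts. With a threshold of 10, U2 drops out of the report entirely, which shows how much a cutoff could change who appears as active.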

gavinwahl added a commit that referenced this issue Oct 29, 2024
Instead of storing all messages, we keep a count of messages per user
per hour per channel to allow further aggregation later. Incremental
updates are supported, fetching only new messages since the last update.
However, thread messages do not show up in the main message list so
message history for every thread ever encountered has to be checked
every time.

Hourly buckets are chosen to defer the choice of timezone to later. This
will allow aggregation and display to be done in any timezone with a
whole-hour UTC offset.
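
To illustrate why hourly buckets leave the timezone choice open, this sketch re-aggregates UTC hour buckets into local calendar days for a given whole-hour offset. The function name and the sample counts are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def daily_totals(hourly_counts, utc_offset_hours):
    """Sum hourly UTC buckets into local calendar days for a whole-hour offset."""
    local = timezone(timedelta(hours=utc_offset_hours))
    totals = Counter()
    for bucket_start, count in hourly_counts.items():
        totals[bucket_start.astimezone(local).date()] += count
    return totals


# Hypothetical buckets around midnight UTC.
hourly = {
    datetime(2024, 10, 29, 22, tzinfo=timezone.utc): 5,
    datetime(2024, 10, 29, 23, tzinfo=timezone.utc): 2,
    datetime(2024, 10, 30, 0, tzinfo=timezone.utc): 4,
}
utc_days = daily_totals(hourly, 0)     # split across Oct 29 and Oct 30
plus_two = daily_totals(hourly, 2)     # all three buckets fall on Oct 30
minus_five = daily_totals(hourly, -5)  # all three buckets fall on Oct 29
```

An offset such as UTC+5:30 would not work exactly, because local midnight would fall in the middle of a bucket.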

The script automatically sleeps when it encounters rate limiting, so while
it may take a while, it will finish successfully. Initial run time for
the #boost-website channel was 2 minutes 7 seconds.
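
A sleep-on-rate-limit loop might look like the sketch below. It assumes the API signals rate limiting with HTTP 429 and a `Retry-After` header giving seconds to wait. `send` is a hypothetical callable wrapping one request, and the attempt cap is an addition for safety, not something described in this commit.

```python
import time


def call_with_retry(send, sleep=time.sleep, max_attempts=5):
    """Call send() until the response is not rate limited.

    send is a hypothetical callable returning (status, headers, body).
    """
    for _ in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return body
        # Wait for as long as the server asks before trying again.
        sleep(int(headers.get("Retry-After", "1")))
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)


# Fake responses: one rate-limited reply, then success.
responses = iter([
    (429, {"Retry-After": "3"}, None),
    (200, {}, {"ok": True}),
])
slept = []  # records requested sleeps instead of actually sleeping
result = call_with_retry(lambda: next(responses), sleep=slept.append)
```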

With the data collected, we can generate this overall activity report:

SELECT real_name, SUM(b.count)
FROM slack_slackactivitybucket b,
     slack_slackuser u
WHERE b.user_id = u.id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;

          real_name           | sum
------------------------------+------
 Vinnie Falco                 | 3076
 Rob Beeston                  |  990
 Joaquín M López Muñoz        |  619
 Sam Darwin                   |  534
 René Ferdinand Rivera Morell |  323
 Kenneth Reitz                |  226
 Alan de Freitas              |  179
 Spencer Strickland           |  143
 Julio C Estrada              |  136
 Peter Dimov                  |  119

Or similar reports for any time range that ends on hour boundaries.
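
As a sketch of such a time-range report, the example below runs the same kind of query against an in-memory SQLite database. The table names and the `real_name`, `user_id`, and `count` columns come from the query above; the `bucket_start` column, its text format, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE slack_slackuser (id INTEGER PRIMARY KEY, real_name TEXT);
CREATE TABLE slack_slackactivitybucket (
    user_id INTEGER REFERENCES slack_slackuser(id),
    bucket_start TEXT,  -- assumed column: UTC hour as 'YYYY-MM-DD HH:MM:SS'
    "count" INTEGER
);
INSERT INTO slack_slackuser VALUES (1, 'Ada Example'), (2, 'Bo Example');
INSERT INTO slack_slackactivitybucket VALUES
    (1, '2024-10-29 16:00:00', 4),
    (1, '2024-10-29 17:00:00', 1),
    (2, '2024-10-29 17:00:00', 3),
    (2, '2024-10-30 09:00:00', 9);
""")


def activity_between(conn, start, end):
    """Total messages per user for buckets in the half-open range [start, end)."""
    return conn.execute(
        """
        SELECT u.real_name, SUM(b."count") AS total
        FROM slack_slackactivitybucket b
        JOIN slack_slackuser u ON b.user_id = u.id
        WHERE b.bucket_start >= ? AND b.bucket_start < ?
        GROUP BY u.real_name
        ORDER BY total DESC
        """,
        (start, end),
    ).fetchall()


rows = activity_between(conn, "2024-10-29 00:00:00", "2024-10-30 00:00:00")
```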

Refs #1367
@gavinwahl
Collaborator

gavinwahl commented Nov 5, 2024

Do we need to store users' slack avatars?

Slack gives us URLs to avatars at different pixel sizes. If we do want to display them, should we just store the URL, or download the image into Django media instead of hotlinking? I can't find any Slack documentation on hotlinking behavior of avatar URLs.

@sdarwin
Collaborator

sdarwin commented Nov 5, 2024

The avatars of tens of thousands of slack users? all users.

no.

A few avatars, for a release report?

If you determine that is the case, please store them in the S3 media bucket along with web-user avatars. Or hyperlink to them, if that seems to work well and it's supported.

@gavinwahl
Collaborator

@rbbeeston

> Since slack messages tend to be more like texts than emails, would we want to measure counts by any message, like "ok", "Sure", or "No" or have a minimum character count?

Let me know what the requirements are.

@rbbeeston
Member Author

Since there was no response to that question, I would assume there won't be any filtering.

gavinwahl added a commit that referenced this issue Nov 5, 2024
gavinwahl moved this from In Progress to Blocked in website-v2 Nov 8, 2024
rbbeeston moved this from Blocked to In Progress in website-v2 Nov 8, 2024
gavinwahl added a commit that referenced this issue Nov 13, 2024