feature: add a GitHub action to quell spam PRs #5449

CBID2 · 2024-02-06T17:35:20Z

Problem

I was scrolling through Twitter and someone posted about the spam PR.

Possible Solution

Implement this GitHub action. It'll lessen the workload for maintainers.

wesleytodd · 2024-02-06T17:58:40Z

I am not sure we need an action for this unless there is an action which does more than that one (like spam detection with ai or something). It is easy to delete/close/lock with as many clicks as it is to comment and we don't need an additional third party action (added security risk and maintenance) to do it right?

krzysdz · 2024-02-06T18:36:59Z

unless there is an action which does more than that one (like spam detection with ai or something)

Not sure about AI spam detection, but almost all of these PRs update the readme, have default name (Update filename), change only a single line and have no description, so it should be rather easy to close them automatically.

Most authors of these PRs have a repsitory called "localrepo", so that's another rule that could be used to detect spam from this source (Apna College).

wesleytodd · 2024-02-06T18:43:09Z

Yeah this is a one-off issue of the day. Spam PRs have been a problem for years and they come in different varieties. This is why I am hesitant do add an action for this as I would rather ask GitHub for better spam management tooling.

CBID2 · 2024-02-06T20:01:39Z

Hey @wesleytodd. I found this on X: https://twitter.com/github/status/1311772722234560517
Maybe that could work?

UlisesGascon · 2024-02-06T20:20:37Z

This is why I am hesitant do add an action for this as I would rather ask GitHub for better spam management tooling.

Agree with @wesleytodd. The issue is quite complex because the moderation in GitHub is not an easy thing:

Many people (those that are watching the repo) will receive a notification in their email or in the app/web once a new PR is created. This will occur even before a Github Action is triggered.
Closing PRs is an easy action (1 click), so that is not a big time consumer
Detecting invalid PRs is not the big issue because you can easily to spot them with the practice.
Reporting issues/users is complex because the UI requires many steps to do it for each user, so that is a blocker for many maintainers.
Nuke button like @CBID2 suggested is great when the community needs to slow down due a discussion/flame in certain moments, but is not the long term solution as it will prevent other users from contributing (PRs) that are legit or need help (issues) while using Express.

So, I think that we are quite limited on how much we can do with GitHub actions in this case. There are other scenarios that are less frequent but more prone to use GitHub Actions to moderate, for example when people do comments that include offensive content or heavily language. In most projects that I am involved the moderation is done by the humans behind the project or a specific team that volunteer to do it, it is a heavy job. The same way as it is hard to keep a slack/discord/gitter community channel a safe space by moderating content.

wesleytodd · 2024-02-06T20:38:37Z

In most projects that I am involved the moderation is done by the humans behind the project or a specific team that volunteer to do it, it is a heavy job. The same way as it is hard to keep a slack/discord/gitter community channel a safe space by moderating content.

I think historically the express project has not needed the same style of moderation as things like Node.js does. There have been less contentious discussions and mostly the CoC violations have come from folks outside the project so it was relatively simple to ban and move on. I don't think we need to immediately spin up a moderation team but I do think that the Triage team and TC should have the tools to properly moderate. Right now I don't think we have that well in hand. I believe that we can add this to the list of TODOs to address after we can get next weeks TC meeting organized and finished.

Are we in agreement that a GH Action is most likely not the direction we would want to take to solve this problem?

UlisesGascon · 2024-02-06T20:52:59Z

I believe that we can add this to the list of TODOs to address after we can get next weeks TC meeting organized and finished.

Yes, we can add it to the TODO list and start to work on it in few weeks

Are we in agreement that a GH Action is most likely not the direction we would want to take to solve this problem?

+1 from me

krzysdz · 2024-02-06T20:53:15Z

Maybe a pull reqest template would make some people think before creating a PR?
On the other hand, it doesn't look like those who spam with pull requests will even bother to read it and they'll probably just click "Create pull reqest" with unedited description.

wesleytodd · 2024-02-06T21:52:06Z

Doesn't need to stop the conversation, but since we specifically don't want to use a GHA to do this then this issue is complete. There are more threads to follow up on, but I think we can do that in other discussions more specific to those.

gaurishhs · 2024-02-07T15:36:15Z

A workflow could be an option too, These spam PRs generally don't have more than 2-3 words. Closing PRs with less than 3 words sounds reasonable. GitHub too does something similar in it's documentation repository. GitHub's workflow

CBID2 · 2024-02-07T17:01:09Z

A workflow could be an option too, These spam PRs generally don't have more than 2-3 words. Closing PRs with less than 3 words sounds reasonable. GitHub too does something similar in it's documentation repository. GitHub's workflow

@wesleytodd, I think you should reopen this issue. @gaurishhs made another point about using GitHub actions

wesleytodd · 2024-02-07T17:20:13Z

I do agree that workflow is a bit more well suited IMO. I am still hesitant and I would like to also look at other ways but yeah in the mean time lets re-open the issue so we don't end up having multiple on the same topic or miss out on good ideas like @gaurishhs'.

deepak4566 · 2024-02-07T17:27:17Z

even one more better thing is maximum spam pr revolve around README.md ,if we can specify to block those pr updating readme for now (upto the spam pr's get less) we can handle this through github actions for this so that good pr's may not effect.

TimGels · 2024-02-07T17:27:30Z

Is it possible to have the workflow also incorporate a check for new contributors when doing the x words check? I feel like that people who contributed, and already have code merged, in the past do not pose much of a threat in regards to spam. Or am I overthinking it?

CBID2 · 2024-02-07T18:25:32Z

Is it possible to have the workflow also incorporate a check for new contributors when doing the x words check? I feel like that people who contributed, and already have code merged, in the past do not pose much of a threat in regards to spam. Or am I overthinking it?

I don't think you're overthinking it @TimGels. Previous contributors should be spared in some way.

gaurishhs · 2024-02-07T18:27:56Z

I think the workflow should be something like this:

The expressjs member check can be omitted / replaced with a collaborator check

Pratik-Kumar-621 · 2024-02-07T20:12:13Z

I think the workflow should be something like this:

The expressjs member check can be omitted / replaced with a collaborator check

I don't think word count matters because some bugs only requires a minimal changes.

Pratik-Kumar-621 · 2024-02-07T20:21:46Z

And if word count check is added, they must find some other way to open spam prs. The problem here is those who make spam prs, they just started their development journey and when they got introduced to github (by some youtubers or some articles), they tried to test it by themselves. They didn't have any idea that the thing they are doing is a headace of someone else.

The only way we can stop this is by creating awarness.

CBID2 · 2024-02-07T22:35:00Z

And if word count check is added, they must find some other way to open spam prs. The problem here is those who make spam prs, they just started their development journey and when they got introduced to github (by some youtubers or some articles), they tried to test it by themselves. They didn't have any idea that the thing they are doing is a headace of someone else.

The only way we can stop this is by creating awarness.

True, but knowing that there those who create spam PRs out of malice, raising awareness is not enough. Protective measures are a must too.

Pratik-Kumar-621 · 2024-02-07T23:04:11Z

And if word count check is added, they must find some other way to open spam prs. The problem here is those who make spam prs, they just started their development journey and when they got introduced to github (by some youtubers or some articles), they tried to test it by themselves. They didn't have any idea that the thing they are doing is a headace of someone else.
The only way we can stop this is by creating awarness.

True, but knowing that there those who create spam PRs out of malice, raising awareness is not enough. Protective measures are a must too.

There are Rest APIs for getting the info of prs and also for actions. Making a bot which can automate spam pr closing using those rest apis can help. But not sure how to detect spam.

animesh-chouhan · 2024-02-09T17:05:25Z

Heya! An update on my last comment: I think I'll be working on an AI based spam detector for use-cases like this, which would analyze diff files of PRs and conclude whether it is worthwhile to keep or not. Will update with progress on that here if y'all would like that.

You don't need AI for this. How are you going to approach this?

I agree with @QuantGeekDev, detecting this kind of spam is relatively easy and using AI would be an overkill. A simple rule-based approach will work well in this case: https://github.com/animesh-chouhan/github-spamstop/blob/main/index.js#L48-L66

This uses a pull-request template which needs to be edited with the details of the change which would easily deter a large amount of such PRs.

PR template: https://github.com/animesh-chouhan/github-spamstop/blob/main/.github/pull_request_template.md

nicholasgriffintn · 2024-02-13T19:39:56Z

Really, I think you could probably just reject anything with "update: ''" as a PR title, I don't think that would be too disruptive for normal PRs.

SakuraBlossomTree · 2024-02-14T03:51:16Z

Maybe we can add a PR template when people want to issue a PR

It is probably as this people won't be able to know how to change basic things in the PR template

ADTC · 2024-02-14T04:29:22Z

Most of them can be auto-closed if they meet all of the below criteria:

PR has only one commit.
Commit changes only one file.
Commit has a message subject matching regex Update [^ ]*\.[^ ]+ (match is for all file names generally).
Commit has no message body.

This should auto-close almost all of them. Maybe 1 or 2 every month may not match, but it's much less work to manually check and close them.

PS: The regex match should be on commit subject, not PR title because the newbie spammers are more likely to change the title.

For fun: Close them with a cheeky comment: Congrats! You are now an open source spam contributor. Now learn how to make real contributions and your training will be complete. [Add a Hindi translation of the same.]

Kamleshpaul · 2024-02-14T12:33:18Z

I think we can get auther (who made PR) total lifetime PR count if it is more then 10 or whatever we can prevent that ?

like this

name: Spam Detect

on:
  pull_request:
    types:
      - opened

jobs:
  check_author_pr_count:
    runs-on: ubuntu-latest

    steps:
    - name: Checkout code
      uses: actions/checkout@v2

    - name: Set up Node.js
      uses: actions/setup-node@v4
      with:
        node-version: '14'

    - name: Install dependencies
      run: npm install github-api

    - name: Count Merged PRs
      id: pr_count
      run: |
        AUTHOR=$(curl -sSL https://api.github.com/repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }} | jq -r '.user.login')

        MERGED_PR_COUNT=$(curl -sSL -H "Authorization: Bearer ${{ secrets.GITHUB_TOKEN }}" "https://api.github.com/search/issues?q=is:pr+author:$AUTHOR+is:merged" | jq -r '.total_count')
        
        echo "merged_pr_count=$MERGED_PR_COUNT" >> $GITHUB_OUTPUT

    - name: Check PR Count
      run: |
        if [[ "${{ steps.pr_count.outputs.merged_pr_count }}" -gt 20 ]]; then
          echo "PR count is greater than 20. Proceed with the workflow."
        else
          echo "PR count is not greater than 20. Blocking the PR."
          exit 1  
        fi

this is it's test https://github.com/Kamleshpaul/github-spam-action-test/pulls

SakuraBlossomTree · 2024-02-14T12:48:27Z

I think we can get auther (who made PR) total lifetime PR count if it is more then 10 or whatever we can prevent that ?

They already made a github action which takes care of this

ServerDeveloper9447 · 2024-02-18T04:15:39Z

Well, it looks like the issue will be solved when the pr is merged

wesleytodd · 2024-02-18T15:23:05Z

Hey folks, just to update you: We have a thread open with GitHub support where they are passing along our requests to the various product teams on how to improve the moderation features. I personally am still 👎 on an action which auto closes PRs as it is both a maintenance burden and something best handled by features on GitHub's side. We are focused right now on landing some governance changes and will be following up with re-constituting the triage team. I think ideally this decision is left up to the triage team members. So likely there will not be action on this or the PR until we can get that team organized again. If you are interested in helping that please follow the instructions to get involved there (as some of you already have, thank you very much for that).

dougwilson · 2024-02-18T15:24:56Z

They have slowed down too for the time being, the last one already is two days old.

jonchurch · 2024-03-26T14:58:44Z

Closing as not planned.

Many words have been written now in this issue and others about auto-closing spam PRs, and we have come to the same conclusion every time. It is trivial to implement, but is a sledgehammer approach for what is now a mosquitto level annoyance. Closing and locking a handful of spam issues/PRs now and then is the easiest thing a maitainer will be doing that day, so it doesn't save much by labor.

There is very little benefit in having an auto-close action. It will still spam people's Github notifications no matter if it's closed and locked at the speed of a human or a machine. The only thing we'd prevent there is people commenting disparagingly on the PR before it gets locked by a human, generating more notifications. We haven't had any of that in the past couple weeks (we did get some hostile comments when this topic was trending originally, which it no longer is).

The PRs are still coming, but intermittently. The storm has passed, this can be reevaluated if another flood happens in the future from the previous source or any other.

Thank you all for your eagerness to help. If you want an ear to the ground on things that the project is prioritizing and possibly looking for help with, peruse the Discussions repo https://github.com/expressjs/discussions/issues

not7cd · 2024-04-19T19:09:30Z

This can be reconsidered with the use of label-action. By labelling those PRs this action can create a meaningful message to the spammer to inform them why it's getting closed.

See an example from Renovate:

wesleytodd · 2024-04-23T15:47:39Z

Hey @not7cd,

I love the idea of "create a meaningful message to the spammer to inform them why it's getting closed". Would you be willing to sign up to own this and help participate in the project to maintain the tooling? The main thing here is that closing a few spam PRs is a simple but short term thing. Working on setting up automation means time taken away from much more important project priorities we have right now. So it is mostly a prioritization thing.

So far the folks who showed up to make noise about ways to "solve" this problem have not showed up to reply with this sort of context in the spam PRs. If anyone in the threads wants to start posting these prewritten messages to the authors then we (the maintainers) can stop with our current approach of closing them as spam. This would save us a lot of time and help those users learn, but it is work and we are volunteers.

For everyone in this thread, please we need your help:

Start helping those users by posting nice responses telling them why this is bad
Start participating in the project, which could mean following activity repos and replying to issues (that is the road to being a triager) or opening PRs with helpful things