Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Not all repos with topics are targetted by the workflow | [Tracking] Sync Tooling - push - ErikSchierboom - f28daf625fa326fafdcf261db7699377c8d1ed37 #225

Closed
github-actions bot opened this issue May 11, 2022 · 31 comments
Labels
bug Something isn't working ✔ tracking done This marks tracking issues as (presumably) done so that they will be auto-closed. 👁 tracking issue This issue tracks the merge-status of PRs created by this repo.

Comments

@github-actions
Copy link

Workflow run: https://github.com/exercism/org-wide-files/actions/runs/2306860844

@github-actions github-actions bot added the 👁 tracking issue This issue tracks the merge-status of PRs created by this repo. label May 11, 2022
This was referenced May 11, 2022
@ee7
Copy link
Member

ee7 commented May 13, 2022

To others: Sascha manually triggered exercism/nim-test-runner#138

@SaschaMann
Copy link
Contributor

SaschaMann commented May 13, 2022

That's really strange. Perhaps fetching the full repo object and comparing them between the listed and unlisted repos could show some differences that explain this?

This might be an issue with the GH search itself, too.

(Sorry for the less than helpful comments, can't look into it properly atm due to PC issues)

@SleeplessByte
Copy link
Member

Maybe the problem is that when the tool does this it's paginated and only looks at the first page?

@SaschaMann
Copy link
Contributor

SaschaMann commented May 13, 2022

That was my initial thought too, but the call to github.paginate should take care of it.

Sending the same query via curl shows that we should not be hitting the rate limits either:

$ curl \
>   -H "Accept: application/vnd.github.v3+json" \
>   https://api.github.com/search/repositories?q=org:exercism+topic:exercism-tooling+is:public+archived:false
{
  "total_count": 106,
  "incomplete_results": false,
  "items": [

@iHiD
Copy link
Member

iHiD commented May 13, 2022

Could you retry and see if the rust-test-runner works now pls?

@ee7
Copy link
Member

ee7 commented May 13, 2022

Could you retry and see if the rust-test-runner works now pls?

The below command outputs 109 repo names, and includes rust-test-runner.

I think I was seeing fewer at one point. But it might have been an intermittent rate-limiting or pagination thing. And --limit 200 might be optimistic for gh search repos, but it isn't immediately documented that it won't work.

Did you change something?

$ gh search repos --owner=exercism --topic=exercism-tooling --limit 200 | cut -f1 | sort
exercism/abap-test-runner
exercism/bash-analyzer
exercism/bash-test-runner
exercism/cfml-test-runner
exercism/clojure-analyzer
exercism/clojure-representer
exercism/clojurescript-test-runner
exercism/clojure-test-runner
exercism/coffeescript-test-runner
exercism/common-lisp-analyzer
exercism/common-lisp-representer
exercism/common-lisp-test-runner
exercism/cpp-test-runner
exercism/c-representer
exercism/crystal-test-runner
exercism/csharp-analyzer
exercism/csharp-representer
exercism/csharp-test-runner
exercism/c-test-runner
exercism/dart-test-runner
exercism/delphi-test-runner
exercism/DEPRECATED.rikki-ruby-analyzer
exercism/d-test-runner
exercism/elixir-analyzer
exercism/elixir-representer
exercism/elixir-test-runner
exercism/elm-analyzer
exercism/elm-representer
exercism/elm-test-runner
exercism/emacs-lisp-test-runner
exercism/erlang-analyzer
exercism/erlang-representer
exercism/erlang-test-runner
exercism/fortran-test-runner
exercism/fsharp-analyzer
exercism/fsharp-representer
exercism/fsharp-test-runner
exercism/generic-analyzer
exercism/generic-representer
exercism/generic-test-runner
exercism/go-analyzer
exercism/go-representer
exercism/go-test-runner
exercism/groovy-test-runner
exercism/haskell-test-runner
exercism/haxe-analyzer
exercism/haxe-representer
exercism/haxe-test-runner
exercism/j-analyzer
exercism/java-analyzer
exercism/java-representer
exercism/javascript-analyzer
exercism/javascript-representer
exercism/javascript-test-runner
exercism/java-test-runner
exercism/j-representer
exercism/j-test-runner
exercism/julia-analyzer
exercism/julia-representer
exercism/julia-test-runner
exercism/kotlin-analyzer
exercism/kotlin-test-runner
exercism/lfe-test-runner
exercism/lua-test-runner
exercism/mips-test-runner
exercism/nim-representer
exercism/nim-test-runner
exercism/objective-c-test-runner
exercism/ocaml-test-runner
exercism/perl5-analyzer
exercism/perl5-test-runner
exercism/pharo-smalltalk-test-runner
exercism/php-test-runner
exercism/plsql-test-runner
exercism/prolog-test-runner
exercism/purescript-test-runner
exercism/python-analyzer
exercism/python-representer
exercism/python-test-runner
exercism/racket-test-runner
exercism/raku-test-runner
exercism/reasonml-test-runner
exercism/red-analyzer
exercism/red-representer
exercism/red-test-runner
exercism/r-test-runner
exercism/ruby-analyzer
exercism/ruby-representer
exercism/ruby-test-runner
exercism/rust-analyzer
exercism/rust-representer
exercism/rust-test-runner
exercism/scala-analyzer
exercism/scala-test-runner
exercism/scheme-test-runner
exercism/sml-test-runner
exercism/stub-analyzer
exercism/swift-test-runner
exercism/tcl-test-runner
exercism/typescript-analyzer
exercism/typescript-representer
exercism/typescript-test-runner
exercism/vbnet-test-runner
exercism/vimscript-test-runner
exercism/wasm-test-runner
exercism/wren-representer
exercism/wren-test-runner
exercism/x86-64-assembly-test-runner
exercism/zig-test-runner

@iHiD
Copy link
Member

iHiD commented May 13, 2022

I removed leading white space from the rust-test-runners topic. But I don't know if I actually did, because in most of the UI that doesn't show, but on one specific bit it did. So I'm wondering if it's strip'd in most places, but not in some.

@SaschaMann
Copy link
Contributor

@ee7 could you submit a PR that adds your whitespace checks to GitHub please? :D

@ee7
Copy link
Member

ee7 commented May 13, 2022

I removed leading white space from the rust-test-runners topic.

I see.

By the way, is it intended that https://github.com/exercism/delphi-test-runner is private? As far as I can see, it's the only private exercism repo:

$ gh repo list exercism --visibility private

Showing 1 of 1 repositories in @exercism that match your search

exercism/delphi-test-runner    private  Oct  5, 2021

(It's probably also better to use gh repo list exercism than gh search repos --owner=exercism).

@iHiD
Copy link
Member

iHiD commented May 13, 2022

By the way, is it intended that https://github.com/exercism/delphi-test-runner is private? As far as I can see, it's the only private exercism repo:

Yes. It contains delphi's source code, which is proprietory (delphi is our only non open-source language AFAICT)

@ee7
Copy link
Member

ee7 commented May 13, 2022

Yes. It contains delphi's source code

Ah.

@ee7 could you submit a PR that adds your whitespace checks to GitHub please? :D

@SaschaMann You jest, but I've actually been considering a repo that runs periodic checks on Exercism repo metadata. Then we can get notifications for things like:

I suppose it could even be a periodic workflow in exercism/configlet.

@iHiD
Copy link
Member

iHiD commented May 13, 2022

(cc @kytrinyx to this too)

@SaschaMann
Copy link
Contributor

SaschaMann commented May 13, 2022

If you're gonna implement that, please add "repo is empty" as that has caused us issues a few times already :)

@ee7
Copy link
Member

ee7 commented May 14, 2022

please add "repo is empty"

You'd like a failing check when any Exercism repo has isEmpty as true in the below?

The 5 repos with the lowest disk usage currently:

gh repo list exercism \
    --limit 500 --no-archived --visibility public --json diskUsage,isEmpty,name \
    | jq 'sort_by(.diskUsage) | .[0:5]'
[
  {
    "diskUsage": 6,
    "isEmpty": false,
    "name": "babel-preset-javascript"
  },
  {
    "diskUsage": 11,
    "isEmpty": false,
    "name": "guys-slack-bot"
  },
  {
    "diskUsage": 12,
    "isEmpty": false,
    "name": "open-source-stats"
  },
  {
    "diskUsage": 12,
    "isEmpty": false,
    "name": "backlog"
  },
  {
    "diskUsage": 13,
    "isEmpty": false,
    "name": "wip"
  }
]

(https://github.com/exercism/babel-preset-javascript is quite small: CODE_OF_CONDUCT.md and labels).

that has caused us issues a few times already :)

#110? Can we filter out empty repos before we fork?

@SaschaMann
Copy link
Contributor

SaschaMann commented May 15, 2022

#110? Can we filter out empty repos before we fork?

Yes but I'd argue that the org-wide-files should be synced to those repos as well and that a repo should generally not be empty, even if it only has a dummy commit. It seems like an easier fix to not have empty repos than to add logic to this and the webhooks to handle those situations.

@ErikSchierboom
Copy link
Member

Yes but I'd argue that the org-wide-files should be synced to those repos as well and that a repo should generally not be empty, even if it only has a dummy commit. It seems like an easier fix to not have empty repos than to add logic to this and the webhooks to handle those situations.

Agreed. At the least, a repo should have a README.md file

@ErikSchierboom
Copy link
Member

ErikSchierboom commented May 17, 2022

The below command outputs 109 repo names, and includes rust-test-runner.

Do we have any open issues remaining regarding not all repos being targeted?

@ee7
Copy link
Member

ee7 commented May 17, 2022

Yes but I'd argue that the org-wide-files should be synced to those repos as well and that a repo should generally not be empty, even if it only has a dummy commit. It seems like an easier fix to not have empty repos than to add logic to this and the webhooks to handle those situations.

Agreed. At the least, a repo should have a README.md file

I'm more persuaded by the counter-argument: it seems easier to be robust to the existence of an empty repo, rather than being blocked by one and needing manual intervention (so org-wide-files jobs fail until we remove, or add content to, an empty repo that is encountered). A new repo might be empty for a short period while it's being set up, and it'd be easy to forget that it would break org-wide-files jobs. Creating an empty repo might be convenient for someone while setting something up, and the fact that it has happened before means that it can happen in the future.

I could also claim that an empty repo does not do anything yet, and so does not urgently need to be targeted by the next org-wide-files PR. Furthermore, an empty repo might not yet have the correct topics set, so org-wide-files cannot sync the correct files anyway.

We could have a periodic job that runs e.g. daily, and does things like producing an error if there is an empty repo. But that doesn't help if an empty repo is created shortly before an org-wide-files job runs, and making the "exercism repo linting" job prevent an org-wide-files job is even harder than filtering empty repos, and produces the same outcome - org-wide-files being blocked.

But I probably don't have the full picture, and I'm not the one who has to deal with it. Do whatever's easiest, of course.

@ee7
Copy link
Member

ee7 commented May 17, 2022

The below command outputs 109 repo names, and includes rust-test-runner.

Do we have any open issues remaining regarding not all repos being targeted?

I think only this issue. I believe org-wide-files has now targeted every repo we wanted - it was only nim-test-runner that was not targeted in this case.

It's strange that nim-test-runner was missing, though.

It does have the topic:

gh repo view exercism/nim-test-runner --json repositoryTopics
{
  "repositoryTopics": [
    {
      "name": "exercism-tooling"
    },
    {
      "name": "exercism-test-runner"
    }
  ]
}

In case it's a weird rate-limiting problem, maybe we could try using GITHUB_TOKEN to make an authenticated request?

And/or switch to gh (or curl), if we're suspicious that it might perform better than actions/github-script?

Getting all the Exercism-owned, public, non-archived, repos with the exercism-tooling topic with gh:

gh repo list exercism \
    --visibility public \
    --no-archived \
    --limit 500 \
    --topic exercism-tooling \
    --json name \
    --jq 'sort_by(.name)' \
    | jq '.[:3]' 

(The final pipe to jq is just to slice to keep the output short below).

[
  {
    "name": "abap-test-runner"
  },
  {
    "name": "bash-analyzer"
  },
  {
    "name": "bash-test-runner"
  }
]

I think this avoids the search API, because if I write --limit 1234 I don't see this message: https://github.com/cli/cli/blob/9454ad376953/pkg/cmd/repo/list/list.go#L181-L182

@ErikSchierboom
Copy link
Member

In case it's a weird rate-limiting problem, maybe we could try using GITHUB_TOKEN to make an authenticated request?

I think it already does that: https://github.com/actions/github-script#actionsgithub-script

A pre-authenticated octokit/rest.js client with pagination plugins

@SaschaMann
Copy link
Contributor

SaschaMann commented May 18, 2022

I don't have a particular preference for github-script vs gh for this particular use case. Though I'm not sure if this would be as readable in Bash. These lines would need to be rewritten:

script: |
if (context.eventName === 'repository_dispatch') {
return context.payload.client_payload.repos
} else if (context.eventName === 'workflow_dispatch') {
return JSON.parse(context.payload.inputs.repos)
} else {
const allRepos = (await github.paginate(github.rest.search.repos, {
q: 'org:exercism+is:public+archived:false'
})).flatMap(({ full_name }) => [full_name])
const toolingRepos = (await github.paginate(github.rest.search.repos, {
q: 'org:exercism+topic:exercism-tooling+is:public+archived:false'
})).flatMap(({ full_name }) => [full_name])
const trackRepos = (await github.paginate(github.rest.search.repos, {
q: 'org:exercism+topic:exercism-track+is:public+archived:false'
})).flatMap(({ full_name }) => [full_name])
return allRepos.filter(r => !toolingRepos.includes(r) && !trackRepos.includes(r))
}

Not relying on the search API seems like a good idea for resilience.

@ErikSchierboom
Copy link
Member

But then what does it use? The graphql API? Something special?

@SaschaMann
Copy link
Contributor

I didn't look up what the gh command does but there's an endpoint that returns all repositories in an organisation. You could then filter these based on topics afterwards.

@ErikSchierboom
Copy link
Member

I think that is what they do: https://github.com/cli/cli/blob/00e97121ec8cf4de86bdda704fb711302a8dfc96/pkg/cmd/repo/list/http.go#L80

@ee7
Copy link
Member

ee7 commented May 18, 2022

I think the source of truth is https://github.com/cli/cli/blob/00e97121ec8c/pkg/cmd/repo/list/http.go#L32, but I don't understand exactly what I'm seeing. It looks like both listRepos and searchRepos use the GraphQL API v4, and FromSearch is set to true if e.g. we filter non-archived or by topic.

(Edit: Erik beat me by 7 seconds).

@ErikSchierboom
Copy link
Member

If the graphql search API is used, I'd assume that also has rate limits, so are we gaining anything by not using github script?

@ee7
Copy link
Member

ee7 commented May 18, 2022

Aha, I've worked it out.

If I write

gh repo list exercism --no-archived --topic exercism-tooling --limit 1234

Then the output does start with

warning: this query uses the Search API which is capped at 1000 results maximum

But the warning does not appear if I add some --json option:

gh repo list exercism --no-archived --topic exercism-tooling --limit 1234 --json name

I think they could still print that warning on stderr (which is where it appears when --json is not used).

@ee7
Copy link
Member

ee7 commented May 18, 2022

If the graphql search API is used, I'd assume that also has rate limits, so are we gaining anything by not using github script?

I don't know. But the GraphQL API and the REST API do have different limits: https://docs.github.com/en/graphql/overview/resource-limitations#rate-limit

Maybe let's do nothing for now, and keep an eye on the number of repos targeted by an org-wide-files run? Or we could e.g. obtain the repo list by two independent methods, and require them to agree.

@github-actions github-actions bot added the ✔ tracking done This marks tracking issues as (presumably) done so that they will be auto-closed. label May 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working ✔ tracking done This marks tracking issues as (presumably) done so that they will be auto-closed. 👁 tracking issue This issue tracks the merge-status of PRs created by this repo.
Projects
None yet
Development

No branches or pull requests

5 participants