Bug: Not all repos with topics are targetted by the workflow | [Tracking] Sync Tooling - push - ErikSchierboom - f28daf625fa326fafdcf261db7699377c8d1ed37 #225

github-actions · 2022-05-11T11:56:12Z

Workflow run: https://github.com/exercism/org-wide-files/actions/runs/2306860844

ee7 · 2022-05-13T11:02:37Z

To others: Sascha manually triggered exercism/nim-test-runner#138

SaschaMann · 2022-05-13T11:03:38Z

That's really strange. Perhaps fetching the full repo object and comparing them between the listed and unlisted repos could show some differences that explain this?

This might be an issue with the GH search itself, too.

(Sorry for the less than helpful comments, can't look into it properly atm due to PC issues)

SleeplessByte · 2022-05-13T11:50:59Z

Maybe the problem is that when the tool does this it's paginated and only looks at the first page?

SaschaMann · 2022-05-13T12:04:28Z

That was my initial thought too, but the call to github.paginate should take care of it.

Sending the same query via curl shows that we should not be hitting the rate limits either:

$ curl \
>   -H "Accept: application/vnd.github.v3+json" \
>   https://api.github.com/search/repositories?q=org:exercism+topic:exercism-tooling+is:public+archived:false
{
  "total_count": 106,
  "incomplete_results": false,
  "items": [

iHiD · 2022-05-13T12:31:17Z

Could you retry and see if the rust-test-runner works now pls?

ee7 · 2022-05-13T12:34:58Z

Could you retry and see if the rust-test-runner works now pls?

The below command outputs 109 repo names, and includes rust-test-runner.

I think I was seeing fewer at one point. But it might have been an intermittent rate-limiting or pagination thing. And --limit 200 might be optimistic for gh search repos, but it isn't immediately documented that it won't work.

Did you change something?

$ gh search repos --owner=exercism --topic=exercism-tooling --limit 200 | cut -f1 | sort
exercism/abap-test-runner
exercism/bash-analyzer
exercism/bash-test-runner
exercism/cfml-test-runner
exercism/clojure-analyzer
exercism/clojure-representer
exercism/clojurescript-test-runner
exercism/clojure-test-runner
exercism/coffeescript-test-runner
exercism/common-lisp-analyzer
exercism/common-lisp-representer
exercism/common-lisp-test-runner
exercism/cpp-test-runner
exercism/c-representer
exercism/crystal-test-runner
exercism/csharp-analyzer
exercism/csharp-representer
exercism/csharp-test-runner
exercism/c-test-runner
exercism/dart-test-runner
exercism/delphi-test-runner
exercism/DEPRECATED.rikki-ruby-analyzer
exercism/d-test-runner
exercism/elixir-analyzer
exercism/elixir-representer
exercism/elixir-test-runner
exercism/elm-analyzer
exercism/elm-representer
exercism/elm-test-runner
exercism/emacs-lisp-test-runner
exercism/erlang-analyzer
exercism/erlang-representer
exercism/erlang-test-runner
exercism/fortran-test-runner
exercism/fsharp-analyzer
exercism/fsharp-representer
exercism/fsharp-test-runner
exercism/generic-analyzer
exercism/generic-representer
exercism/generic-test-runner
exercism/go-analyzer
exercism/go-representer
exercism/go-test-runner
exercism/groovy-test-runner
exercism/haskell-test-runner
exercism/haxe-analyzer
exercism/haxe-representer
exercism/haxe-test-runner
exercism/j-analyzer
exercism/java-analyzer
exercism/java-representer
exercism/javascript-analyzer
exercism/javascript-representer
exercism/javascript-test-runner
exercism/java-test-runner
exercism/j-representer
exercism/j-test-runner
exercism/julia-analyzer
exercism/julia-representer
exercism/julia-test-runner
exercism/kotlin-analyzer
exercism/kotlin-test-runner
exercism/lfe-test-runner
exercism/lua-test-runner
exercism/mips-test-runner
exercism/nim-representer
exercism/nim-test-runner
exercism/objective-c-test-runner
exercism/ocaml-test-runner
exercism/perl5-analyzer
exercism/perl5-test-runner
exercism/pharo-smalltalk-test-runner
exercism/php-test-runner
exercism/plsql-test-runner
exercism/prolog-test-runner
exercism/purescript-test-runner
exercism/python-analyzer
exercism/python-representer
exercism/python-test-runner
exercism/racket-test-runner
exercism/raku-test-runner
exercism/reasonml-test-runner
exercism/red-analyzer
exercism/red-representer
exercism/red-test-runner
exercism/r-test-runner
exercism/ruby-analyzer
exercism/ruby-representer
exercism/ruby-test-runner
exercism/rust-analyzer
exercism/rust-representer
exercism/rust-test-runner
exercism/scala-analyzer
exercism/scala-test-runner
exercism/scheme-test-runner
exercism/sml-test-runner
exercism/stub-analyzer
exercism/swift-test-runner
exercism/tcl-test-runner
exercism/typescript-analyzer
exercism/typescript-representer
exercism/typescript-test-runner
exercism/vbnet-test-runner
exercism/vimscript-test-runner
exercism/wasm-test-runner
exercism/wren-representer
exercism/wren-test-runner
exercism/x86-64-assembly-test-runner
exercism/zig-test-runner

iHiD · 2022-05-13T15:56:55Z

I removed leading white space from the rust-test-runners topic. But I don't know if I actually did, because in most of the UI that doesn't show, but on one specific bit it did. So I'm wondering if it's strip'd in most places, but not in some.

SaschaMann · 2022-05-13T16:03:47Z

@ee7 could you submit a PR that adds your whitespace checks to GitHub please? :D

ee7 · 2022-05-13T16:05:33Z

I removed leading white space from the rust-test-runners topic.

I see.

By the way, is it intended that https://github.com/exercism/delphi-test-runner is private? As far as I can see, it's the only private exercism repo:

$ gh repo list exercism --visibility private

Showing 1 of 1 repositories in @exercism that match your search

exercism/delphi-test-runner    private  Oct  5, 2021

(It's probably also better to use gh repo list exercism than gh search repos --owner=exercism).

iHiD · 2022-05-13T16:06:15Z

By the way, is it intended that https://github.com/exercism/delphi-test-runner is private? As far as I can see, it's the only private exercism repo:

Yes. It contains delphi's source code, which is proprietory (delphi is our only non open-source language AFAICT)

ee7 · 2022-05-13T16:15:03Z

Yes. It contains delphi's source code

Ah.

@ee7 could you submit a PR that adds your whitespace checks to GitHub please? :D

@SaschaMann You jest, but I've actually been considering a repo that runs periodic checks on Exercism repo metadata. Then we can get notifications for things like:

I suppose it could even be a periodic workflow in exercism/configlet.

iHiD · 2022-05-13T16:16:53Z

(cc @kytrinyx to this too)

SaschaMann · 2022-05-13T16:45:45Z

If you're gonna implement that, please add "repo is empty" as that has caused us issues a few times already :)

ee7 · 2022-05-14T11:50:46Z

please add "repo is empty"

You'd like a failing check when any Exercism repo has isEmpty as true in the below?

The 5 repos with the lowest disk usage currently:

gh repo list exercism \
    --limit 500 --no-archived --visibility public --json diskUsage,isEmpty,name \
    | jq 'sort_by(.diskUsage) | .[0:5]'

[
  {
    "diskUsage": 6,
    "isEmpty": false,
    "name": "babel-preset-javascript"
  },
  {
    "diskUsage": 11,
    "isEmpty": false,
    "name": "guys-slack-bot"
  },
  {
    "diskUsage": 12,
    "isEmpty": false,
    "name": "open-source-stats"
  },
  {
    "diskUsage": 12,
    "isEmpty": false,
    "name": "backlog"
  },
  {
    "diskUsage": 13,
    "isEmpty": false,
    "name": "wip"
  }
]

(https://github.com/exercism/babel-preset-javascript is quite small: CODE_OF_CONDUCT.md and labels).

that has caused us issues a few times already :)

#110? Can we filter out empty repos before we fork?

SaschaMann · 2022-05-15T01:06:56Z

#110? Can we filter out empty repos before we fork?

Yes but I'd argue that the org-wide-files should be synced to those repos as well and that a repo should generally not be empty, even if it only has a dummy commit. It seems like an easier fix to not have empty repos than to add logic to this and the webhooks to handle those situations.

ErikSchierboom · 2022-05-17T09:53:43Z

Yes but I'd argue that the org-wide-files should be synced to those repos as well and that a repo should generally not be empty, even if it only has a dummy commit. It seems like an easier fix to not have empty repos than to add logic to this and the webhooks to handle those situations.

Agreed. At the least, a repo should have a README.md file

ErikSchierboom · 2022-05-17T09:57:13Z

The below command outputs 109 repo names, and includes rust-test-runner.

Do we have any open issues remaining regarding not all repos being targeted?

ee7 · 2022-05-17T10:23:07Z

Yes but I'd argue that the org-wide-files should be synced to those repos as well and that a repo should generally not be empty, even if it only has a dummy commit. It seems like an easier fix to not have empty repos than to add logic to this and the webhooks to handle those situations.

Agreed. At the least, a repo should have a README.md file

I'm more persuaded by the counter-argument: it seems easier to be robust to the existence of an empty repo, rather than being blocked by one and needing manual intervention (so org-wide-files jobs fail until we remove, or add content to, an empty repo that is encountered). A new repo might be empty for a short period while it's being set up, and it'd be easy to forget that it would break org-wide-files jobs. Creating an empty repo might be convenient for someone while setting something up, and the fact that it has happened before means that it can happen in the future.

I could also claim that an empty repo does not do anything yet, and so does not urgently need to be targeted by the next org-wide-files PR. Furthermore, an empty repo might not yet have the correct topics set, so org-wide-files cannot sync the correct files anyway.

We could have a periodic job that runs e.g. daily, and does things like producing an error if there is an empty repo. But that doesn't help if an empty repo is created shortly before an org-wide-files job runs, and making the "exercism repo linting" job prevent an org-wide-files job is even harder than filtering empty repos, and produces the same outcome - org-wide-files being blocked.

But I probably don't have the full picture, and I'm not the one who has to deal with it. Do whatever's easiest, of course.

ee7 · 2022-05-17T10:24:46Z

The below command outputs 109 repo names, and includes rust-test-runner.

Do we have any open issues remaining regarding not all repos being targeted?

I think only this issue. I believe org-wide-files has now targeted every repo we wanted - it was only nim-test-runner that was not targeted in this case.

It's strange that nim-test-runner was missing, though.

It does have the topic:

gh repo view exercism/nim-test-runner --json repositoryTopics

{
  "repositoryTopics": [
    {
      "name": "exercism-tooling"
    },
    {
      "name": "exercism-test-runner"
    }
  ]
}

In case it's a weird rate-limiting problem, maybe we could try using GITHUB_TOKEN to make an authenticated request?

And/or switch to gh (or curl), if we're suspicious that it might perform better than actions/github-script?

Getting all the Exercism-owned, public, non-archived, repos with the exercism-tooling topic with gh:

gh repo list exercism \
    --visibility public \
    --no-archived \
    --limit 500 \
    --topic exercism-tooling \
    --json name \
    --jq 'sort_by(.name)' \
    | jq '.[:3]'

(The final pipe to jq is just to slice to keep the output short below).

[
  {
    "name": "abap-test-runner"
  },
  {
    "name": "bash-analyzer"
  },
  {
    "name": "bash-test-runner"
  }
]

I think this avoids the search API, because if I write --limit 1234 I don't see this message: https://github.com/cli/cli/blob/9454ad376953/pkg/cmd/repo/list/list.go#L181-L182

ErikSchierboom · 2022-05-18T08:51:42Z

In case it's a weird rate-limiting problem, maybe we could try using GITHUB_TOKEN to make an authenticated request?

I think it already does that: https://github.com/actions/github-script#actionsgithub-script

A pre-authenticated octokit/rest.js client with pagination plugins

SaschaMann · 2022-05-18T10:02:07Z

I don't have a particular preference for github-script vs gh for this particular use case. Though I'm not sure if this would be as readable in Bash. These lines would need to be rewritten:

org-wide-files/.github/workflows/sync-rest.yml

Lines 78 to 97 in cd20548

    
                     script: | 
        
                       if (context.eventName === 'repository_dispatch') { 
        
                         return context.payload.client_payload.repos 
        
                       } else if (context.eventName === 'workflow_dispatch') { 
        
                         return JSON.parse(context.payload.inputs.repos) 
        
                       } else { 
        
                         const allRepos = (await github.paginate(github.rest.search.repos, { 
        
                           q: 'org:exercism+is:public+archived:false' 
        
                         })).flatMap(({ full_name }) => [full_name]) 
        
                         const toolingRepos = (await github.paginate(github.rest.search.repos, { 
        
                           q: 'org:exercism+topic:exercism-tooling+is:public+archived:false' 
        
                         })).flatMap(({ full_name }) => [full_name]) 
        
                         const trackRepos = (await github.paginate(github.rest.search.repos, { 
        
                           q: 'org:exercism+topic:exercism-track+is:public+archived:false' 
        
                         })).flatMap(({ full_name }) => [full_name]) 
        
                         return allRepos.filter(r => !toolingRepos.includes(r) && !trackRepos.includes(r)) 
        
                       }

Not relying on the search API seems like a good idea for resilience.

ErikSchierboom · 2022-05-18T10:29:43Z

But then what does it use? The graphql API? Something special?

SaschaMann · 2022-05-18T10:41:54Z

I didn't look up what the gh command does but there's an endpoint that returns all repositories in an organisation. You could then filter these based on topics afterwards.

ErikSchierboom · 2022-05-18T10:45:05Z

I think that is what they do: https://github.com/cli/cli/blob/00e97121ec8cf4de86bdda704fb711302a8dfc96/pkg/cmd/repo/list/http.go#L80

ee7 · 2022-05-18T10:45:12Z

I think the source of truth is https://github.com/cli/cli/blob/00e97121ec8c/pkg/cmd/repo/list/http.go#L32, but I don't understand exactly what I'm seeing. It looks like both listRepos and searchRepos use the GraphQL API v4, and FromSearch is set to true if e.g. we filter non-archived or by topic.

(Edit: Erik beat me by 7 seconds).

ErikSchierboom · 2022-05-18T10:52:31Z

If the graphql search API is used, I'd assume that also has rate limits, so are we gaining anything by not using github script?

ee7 · 2022-05-18T10:59:20Z

Aha, I've worked it out.

If I write

gh repo list exercism --no-archived --topic exercism-tooling --limit 1234

Then the output does start with

warning: this query uses the Search API which is capped at 1000 results maximum

But the warning does not appear if I add some --json option:

gh repo list exercism --no-archived --topic exercism-tooling --limit 1234 --json name

I think they could still print that warning on stderr (which is where it appears when --json is not used).

ee7 · 2022-05-18T11:05:24Z

If the graphql search API is used, I'd assume that also has rate limits, so are we gaining anything by not using github script?

I don't know. But the GraphQL API and the REST API do have different limits: https://docs.github.com/en/graphql/overview/resource-limitations#rate-limit

Maybe let's do nothing for now, and keep an eye on the number of repos targeted by an org-wide-files run? Or we could e.g. obtain the repo list by two independent methods, and require them to agree.

github-actions bot added the 👁 tracking issue This issue tracks the merge-status of PRs created by this repo. label May 11, 2022

github-actions bot added the ✔ tracking done This marks tracking issues as (presumably) done so that they will be auto-closed. label May 25, 2022

github-actions bot closed this as completed May 26, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug: Not all repos with topics are targetted by the workflow | [Tracking] Sync Tooling - push - ErikSchierboom - f28daf625fa326fafdcf261db7699377c8d1ed37 #225

Bug: Not all repos with topics are targetted by the workflow | [Tracking] Sync Tooling - push - ErikSchierboom - f28daf625fa326fafdcf261db7699377c8d1ed37 #225

github-actions bot commented May 11, 2022

ee7 commented May 13, 2022

SaschaMann commented May 13, 2022 •

edited

Loading

SleeplessByte commented May 13, 2022

SaschaMann commented May 13, 2022 •

edited

Loading

iHiD commented May 13, 2022

ee7 commented May 13, 2022 •

edited

Loading

iHiD commented May 13, 2022

SaschaMann commented May 13, 2022

ee7 commented May 13, 2022

iHiD commented May 13, 2022

ee7 commented May 13, 2022 •

edited

Loading

iHiD commented May 13, 2022

SaschaMann commented May 13, 2022 •

edited

Loading

ee7 commented May 14, 2022 •

edited

Loading

SaschaMann commented May 15, 2022 •

edited

Loading

ErikSchierboom commented May 17, 2022

ErikSchierboom commented May 17, 2022 •

edited

Loading

ee7 commented May 17, 2022 •

edited

Loading

ee7 commented May 17, 2022 •

edited

Loading

ErikSchierboom commented May 18, 2022

SaschaMann commented May 18, 2022 •

edited

Loading

ErikSchierboom commented May 18, 2022

SaschaMann commented May 18, 2022

ErikSchierboom commented May 18, 2022

ee7 commented May 18, 2022 •

edited

Loading

ErikSchierboom commented May 18, 2022

ee7 commented May 18, 2022

ee7 commented May 18, 2022 •

edited

Loading

Bug: Not all repos with topics are targetted by the workflow | [Tracking] Sync Tooling - push - ErikSchierboom - f28daf625fa326fafdcf261db7699377c8d1ed37 #225

Bug: Not all repos with topics are targetted by the workflow | [Tracking] Sync Tooling - push - ErikSchierboom - f28daf625fa326fafdcf261db7699377c8d1ed37 #225

Comments

github-actions bot commented May 11, 2022

ee7 commented May 13, 2022

SaschaMann commented May 13, 2022 • edited Loading

SleeplessByte commented May 13, 2022

SaschaMann commented May 13, 2022 • edited Loading

iHiD commented May 13, 2022

ee7 commented May 13, 2022 • edited Loading

iHiD commented May 13, 2022

SaschaMann commented May 13, 2022

ee7 commented May 13, 2022

iHiD commented May 13, 2022

ee7 commented May 13, 2022 • edited Loading

iHiD commented May 13, 2022

SaschaMann commented May 13, 2022 • edited Loading

ee7 commented May 14, 2022 • edited Loading

SaschaMann commented May 15, 2022 • edited Loading

ErikSchierboom commented May 17, 2022

ErikSchierboom commented May 17, 2022 • edited Loading

ee7 commented May 17, 2022 • edited Loading

ee7 commented May 17, 2022 • edited Loading

ErikSchierboom commented May 18, 2022

SaschaMann commented May 18, 2022 • edited Loading

ErikSchierboom commented May 18, 2022

SaschaMann commented May 18, 2022

ErikSchierboom commented May 18, 2022

ee7 commented May 18, 2022 • edited Loading

ErikSchierboom commented May 18, 2022

ee7 commented May 18, 2022

ee7 commented May 18, 2022 • edited Loading

SaschaMann commented May 13, 2022 •

edited

Loading

SaschaMann commented May 13, 2022 •

edited

Loading

ee7 commented May 13, 2022 •

edited

Loading

ee7 commented May 13, 2022 •

edited

Loading

SaschaMann commented May 13, 2022 •

edited

Loading

ee7 commented May 14, 2022 •

edited

Loading

SaschaMann commented May 15, 2022 •

edited

Loading

ErikSchierboom commented May 17, 2022 •

edited

Loading

ee7 commented May 17, 2022 •

edited

Loading

ee7 commented May 17, 2022 •

edited

Loading

SaschaMann commented May 18, 2022 •

edited

Loading

ee7 commented May 18, 2022 •

edited

Loading

ee7 commented May 18, 2022 •

edited

Loading