Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: use repo search to select repositories #397

Merged
merged 11 commits into from
Oct 27, 2023

Conversation

jamestelfer
Copy link
Contributor

@jamestelfer jamestelfer commented Oct 24, 2023

What does this change

Using the --repo-search option, it is possible to supply a GitHub repository search query that is used to generate the set of repositories to process.

This feature is GItHub-only at the moment.

This allows for searches that reference:

  • partial names with boolean queries: owner:buildkite-plugins in:name docker AND plugin
  • one or multiple owners owner:cultureamp owner:jamestelfer fork:true in:name plugin
  • README contents
  • license

I initially started this as a "warm up" to implementing something similar with code search, but the partial name search and license search are potentially quite useful in themselves.

Obviously this means that search syntax and results depend on the underlying source being queried, so implementations for GitLab et al would be required to see this feature properly take root.

References:

$ multi-gitter run examples/general/test.sh --repo-search "user:jamestelfer fork:only" --dry-run --commit-message "chore: foo"

<log output elided />
INFO[0102] Cloning and running script                    repo=jamestelfer/multi-gitter
INFO[0103] Script output: jamestelfer/multi-gitter       repo=jamestelfer/multi-gitter
DEBU[0103] diff --git a/.features.yaml b/.features.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..73311e25f5ad0766cd0576c7302b96f9bc76a4b7
--- /dev/null
+++ b/.features.yaml
@@ -0,0 +1,3 @@
+
+features: ~
+
INFO[0103] Skipping pushing changes because of dry run   repo=jamestelfer/multi-gitter
Repositories with a successful run:
  jamestelfer/jenkins-inheritance-plugin #0
  jamestelfer/slow-cheetah.xdt #0
  jamestelfer/global-build-stats-plugin #0
  jamestelfer/wheel-of-misfortune #0
  jamestelfer/blast-1 #0
  jamestelfer/cloudformation-cli #0
  jamestelfer/step-templates-buildkite-plugin #0
  jamestelfer/fusionauth-site #0
  jamestelfer/aws-lambda-go #0
  jamestelfer/testza #0
  jamestelfer/avro #0
  jamestelfer/ca-go #0
  jamestelfer/rocketcrab #0
  jamestelfer/terraform-cdk #0
  jamestelfer/aws-cdk #0
  jamestelfer/multi-gitter #0

What issue does it fix

Closes #398

Refer also to original discussion.

Notes for the reviewer

Happy for feedback/changes of direction.

Checklist

  • Made sure the PR follows the CONTRIBUTING.md guidelines
  • Tests if something new is added

Using the `--repo-search` option, it is possible to supply a GitHub
repository search query that is used to generate the set of
repositories to process.
@jamestelfer jamestelfer marked this pull request as ready for review October 25, 2023 04:05
@jamestelfer
Copy link
Contributor Author

@lindell I've managed to get some tests on the go, so I've taken this out of draft. Very open to feedback and changes of direction, let me know what you think!

@lindell
Copy link
Owner

lindell commented Oct 25, 2023

Nice! Will try to take a look this evening :D

Copy link
Owner

@lindell lindell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution!

It looks very good. The biggest concern I have is accidentally writing a query that returns too many results. Which will hit the incomplete results error, and give the user an error that they don't know what to do with. Do you have any ideas about it?

internal/scm/github/github.go Outdated Show resolved Hide resolved
internal/scm/github/github.go Outdated Show resolved Hide resolved
cmd/platform.go Outdated Show resolved Hide resolved
@lindell
Copy link
Owner

lindell commented Oct 25, 2023

PS. I fixed the linting timeout. If you merge master, it should fix the error.

@jamestelfer jamestelfer requested a review from lindell October 25, 2023 22:25
@jamestelfer
Copy link
Contributor Author

The biggest concern I have is accidentally writing a query that returns too many results. Which will hit the incomplete results error, and give the user an error that they don't know what to do with. Do you have any ideas about it?

Fair point, and to be honest one that I hadn't considered enough.

The doco states that:

To keep the REST API fast for everyone, we limit how long any individual query can run. For queries that exceed the time limit, the API returns the matches that were already found prior to the timeout, and the response has the incomplete_results property set to true.

We can handle this by either:

  1. failing with an error that directs the user to refine their search (so that it's less likely to return incomplete results), or
  2. continue with the incomplete results and emit a warning.

In my opinion, the warning will likely get lost, and the user will get unexpected results instead. I'm going to go ahead and refine the error message (option 1), but I'm happy to change this if you would like (2) or you have a different approach.

Now suggests possible remediation for the user, and includes a test for
this case.
@jamestelfer
Copy link
Contributor Author

I've updated it now so the error message reads:

search timed out on GitHub and was marked incomplete: try refining the search to return fewer results or be less complex

Let me know what you think.

The test was flakey because the times were all the same and map ordering
is non-deterministic. Changing the time slightly for each ensures that
they're going to come out in order.
Copy link
Owner

@lindell lindell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the new error for incomplete results is good 👍

When playing around with it a bit further, for the query "org:google", where the total repos returned is 2569. It did not error on incomplete results, but instead

GET https://api.github.com/search/repositories?page=11&per_page=100&q=fork%3Atrue+org%3Agoogle: 422 Only the first 1000 search results are available

This message came after making 10 requests.

So we should also add a check if the total results are >1000 and error right away in that case.

internal/scm/github/github.go Outdated Show resolved Hide resolved
Wraps up nil check, easier to read.
The search will eventually fail, so short circuit the waiting for the
user.
@jamestelfer
Copy link
Contributor Author

Fantastic feedback, much appreciated. Thanks for testing out the upper bound like that, it's much better to get a fast failure.

@jamestelfer jamestelfer requested a review from lindell October 26, 2023 05:29
Copy link
Owner

@lindell lindell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 Thanks for the contribution!

@lindell lindell merged commit 0f8c2dc into lindell:master Oct 27, 2023
8 checks passed
@github-actions
Copy link
Contributor

Included in release v0.48.0 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use repository metadata search as input for target repository list
2 participants