Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional filters before checkout repo #92

Open
elicwhite opened this issue Dec 14, 2019 · 8 comments
Open

Additional filters before checkout repo #92

elicwhite opened this issue Dec 14, 2019 · 8 comments

Comments

@elicwhite
Copy link
Contributor

I'd like to be able to run some lightweight filters that can't be encoded in the GitHub search API before checking out the repo to do heavier weight repos.

For example, my search results are matching repos that has node_modules checked in and the query is matching files inside of node_modules.

For example, a url like this: riphunter07/ATproject/ATproject/node_modules.old/react-native/jest/setup.js.

I could encode NOT node_modules in my query, but I think that would exclude modules with:

const foo = require('./node_modules/foo/foo.js');

Github lets you do searches with file names but I don't think that lets you do negative file name searches:

Libraries/Renderer/shims/ReactNative fork:false language:JavaScript filename: NOT native_modules NOT flow-typed

Being able to do this early filtering (on paths, or even other things that GitHub returns as part of the API search results) would let me avoid having to clone hundreds of repos that are using React Native to realize my search is finding code in react-native core in each of these projects.

@parshap
Copy link
Contributor

parshap commented Dec 14, 2019

You might be able to accomplish what you want with -path:node_modules (see docs on exclusion), but GitHub search is not exactly the most reliable or robust thing, so I think the type of functionality you’re talking about makes sense. At NerdWallet, we’ve solved this using the should_migrate hook, but we’re only ever working with ~100 repos so the cost of cloning is ok. Being able to do a pre-clone filter makes sense for your type of use case.

Do you have a proposal in mind for how this would work?

@elicwhite
Copy link
Contributor Author

I’m not sure how it would work. The simple case of filtering the file path is relatively straight forward. You could even make that an env variable that could be passed to a script or just grep directly.

However, I imagine there is some other data returned from the github api that might be nice to filter on, although I’m not sure what that is. In that case I’d love to process it with a script. Maybe you could json stringify it and call a script with that as stdin, or just stick it in an env var too :-D

@parshap
Copy link
Contributor

parshap commented Dec 14, 2019

I like the serialized json over stdin idea. I think that might be the most robust, but maybe not the most ergonomic to script against. If there’s a limited number of basic data (strings), env vars might make more sense.

I think the next step would be to see what context is available at that time that would be relevant to filtering (gh api response, anything else?) and go from there.

@elicwhite
Copy link
Contributor Author

elicwhite commented Dec 14, 2019

That sounds like a great plan to me! Do you happen to have any bandwidth or interest to work on this? Otherwise I’ll take a stab when I need it.

Edit: it was silly for me to ask that. I’ll dig into this myself when I work on this next

@elicwhite
Copy link
Contributor Author

This is an example element of the array of the JSON response for a search request:

{
  "name": "ProgressViewIOS.ios.js",
  "path": "Libraries/Components/ProgressViewIOS/ProgressViewIOS.ios.js",
  "sha": "76a1f5efb5294638826265b6a4009e6419da1fd4",
  "url": "https://api.github.com/repositories/214770447/contents/Libraries/Components/ProgressViewIOS/ProgressViewIOS.ios.js?ref=d5e5d39c2dd9822af07448ff47e44ed1df44d8e7",
  "git_url": "https://api.github.com/repositories/214770447/git/blobs/76a1f5efb5294638826265b6a4009e6419da1fd4",
  "html_url": "https://github.com/yinhangfeng/react-native-web/blob/d5e5d39c2dd9822af07448ff47e44ed1df44d8e7/Libraries/Components/ProgressViewIOS/ProgressViewIOS.ios.js",
  "repository": {
    "id": 214770447,
    "node_id": "MDEwOlJlcG9zaXRvcnkyMTQ3NzA0NDc=",
    "name": "react-native-web",
    "full_name": "yinhangfeng/react-native-web",
    "private": false,
    "owner": {
      "login": "yinhangfeng",
      "id": 2867940,
      "node_id": "MDQ6VXNlcjI4Njc5NDA=",
      "avatar_url": "https://avatars0.githubusercontent.com/u/2867940?v=4",
      "gravatar_id": "",
      "url": "https://api.github.com/users/yinhangfeng",
      "html_url": "https://github.com/yinhangfeng",
      "followers_url": "https://api.github.com/users/yinhangfeng/followers",
      "following_url": "https://api.github.com/users/yinhangfeng/following{/other_user}",
      "gists_url": "https://api.github.com/users/yinhangfeng/gists{/gist_id}",
      "starred_url": "https://api.github.com/users/yinhangfeng/starred{/owner}{/repo}",
      "subscriptions_url": "https://api.github.com/users/yinhangfeng/subscriptions",
      "organizations_url": "https://api.github.com/users/yinhangfeng/orgs",
      "repos_url": "https://api.github.com/users/yinhangfeng/repos",
      "events_url": "https://api.github.com/users/yinhangfeng/events{/privacy}",
      "received_events_url": "https://api.github.com/users/yinhangfeng/received_events",
      "type": "User",
      "site_admin": false
    },
    "html_url": "https://github.com/yinhangfeng/react-native-web",
    "description": null,
    "fork": false,
    "url": "https://api.github.com/repos/yinhangfeng/react-native-web",
    "forks_url": "https://api.github.com/repos/yinhangfeng/react-native-web/forks",
    "keys_url": "https://api.github.com/repos/yinhangfeng/react-native-web/keys{/key_id}",
    "collaborators_url": "https://api.github.com/repos/yinhangfeng/react-native-web/collaborators{/collaborator}",
    "teams_url": "https://api.github.com/repos/yinhangfeng/react-native-web/teams",
    "hooks_url": "https://api.github.com/repos/yinhangfeng/react-native-web/hooks",
    "issue_events_url": "https://api.github.com/repos/yinhangfeng/react-native-web/issues/events{/number}",
    "events_url": "https://api.github.com/repos/yinhangfeng/react-native-web/events",
    "assignees_url": "https://api.github.com/repos/yinhangfeng/react-native-web/assignees{/user}",
    "branches_url": "https://api.github.com/repos/yinhangfeng/react-native-web/branches{/branch}",
    "tags_url": "https://api.github.com/repos/yinhangfeng/react-native-web/tags",
    "blobs_url": "https://api.github.com/repos/yinhangfeng/react-native-web/git/blobs{/sha}",
    "git_tags_url": "https://api.github.com/repos/yinhangfeng/react-native-web/git/tags{/sha}",
    "git_refs_url": "https://api.github.com/repos/yinhangfeng/react-native-web/git/refs{/sha}",
    "trees_url": "https://api.github.com/repos/yinhangfeng/react-native-web/git/trees{/sha}",
    "statuses_url": "https://api.github.com/repos/yinhangfeng/react-native-web/statuses/{sha}",
    "languages_url": "https://api.github.com/repos/yinhangfeng/react-native-web/languages",
    "stargazers_url": "https://api.github.com/repos/yinhangfeng/react-native-web/stargazers",
    "contributors_url": "https://api.github.com/repos/yinhangfeng/react-native-web/contributors",
    "subscribers_url": "https://api.github.com/repos/yinhangfeng/react-native-web/subscribers",
    "subscription_url": "https://api.github.com/repos/yinhangfeng/react-native-web/subscription",
    "commits_url": "https://api.github.com/repos/yinhangfeng/react-native-web/commits{/sha}",
    "git_commits_url": "https://api.github.com/repos/yinhangfeng/react-native-web/git/commits{/sha}",
    "comments_url": "https://api.github.com/repos/yinhangfeng/react-native-web/comments{/number}",
    "issue_comment_url": "https://api.github.com/repos/yinhangfeng/react-native-web/issues/comments{/number}",
    "contents_url": "https://api.github.com/repos/yinhangfeng/react-native-web/contents/{+path}",
    "compare_url": "https://api.github.com/repos/yinhangfeng/react-native-web/compare/{base}...{head}",
    "merges_url": "https://api.github.com/repos/yinhangfeng/react-native-web/merges",
    "archive_url": "https://api.github.com/repos/yinhangfeng/react-native-web/{archive_format}{/ref}",
    "downloads_url": "https://api.github.com/repos/yinhangfeng/react-native-web/downloads",
    "issues_url": "https://api.github.com/repos/yinhangfeng/react-native-web/issues{/number}",
    "pulls_url": "https://api.github.com/repos/yinhangfeng/react-native-web/pulls{/number}",
    "milestones_url": "https://api.github.com/repos/yinhangfeng/react-native-web/milestones{/number}",
    "notifications_url": "https://api.github.com/repos/yinhangfeng/react-native-web/notifications{?since,all,participating}",
    "labels_url": "https://api.github.com/repos/yinhangfeng/react-native-web/labels{/name}",
    "releases_url": "https://api.github.com/repos/yinhangfeng/react-native-web/releases{/id}",
    "deployments_url": "https://api.github.com/repos/yinhangfeng/react-native-web/deployments"
  },
  "score": 84.579025
}

@elicwhite
Copy link
Contributor Author

I think of these things I'd like to expose

path
repository.name
repository.owner.login

I think exposing these as environment variables is reasonable. Maybe matching #88 with:

SHEPHERD_GITHUB_REPO_OWNER
SHEPHERD_GITHUB_REPO_NAME

as well as a

SHEPHERD_MATCH_PATH

What do you think about that?

Also, checkout.ts is the file that runs should_migrate, so naively I'd expect to make this file run should_checkout before checking it out. However, it doesn't have this info. github.ts is the file that gets this data, and it returns an array of repo names, losing this additional info.

It probably doesn't make sense to have github.ts run executeSteps. Instead maybe getCandidateRepos needs to change to return more info? I'm curious what you think.

Here's a concrete proposal:

Make getCandidateRepos take another argument called something like filterCandidate.
Then we change this line

repoNames = searchResults.map((r: any) => r.repository.full_name).sort();

to something like

repoNames = searchResults.filter(candidate => filterCandidate(candidate)).map((r: any) => r.repository.full_name).sort();

The adapter also implements another method called like

getFilterCandidateEnvironmentVariable(candidate: CandidateResult): Promise<IEnvironmentVariables>

Then checkout.ts would do something like:

repos = await adapter.getCandidateRepos(onRetry, async (candidate) => {
  const envVars = adapter.getFilterCandidateEnvironmentVariable(candidate);
  const stepsResults = await executeSteps(context, repo, 'should_checkout', envVars);
  return stepsResults.succeeded;
});

It looks like there will be some grossness to this implementation like multiple ways exec-in-repo will have to take env vars, however you might need to see it to give concrete feedback. I'll try to turn this into a PR.

elicwhite added a commit to elicwhite/shepherd that referenced this issue Dec 17, 2019
elicwhite added a commit to elicwhite/shepherd that referenced this issue Dec 17, 2019
@elicwhite
Copy link
Contributor Author

The problem with the above proposal is that it runs all of these should_checkouts together, and it should be inline together.

I implemented the above proposal and the output is something like this (with three repos where should_checkout is exit 2

> Running should_checkout steps
$ exit 2
> Running should_checkout steps
$ exit 2
> Running should_checkout steps
$ exit 2
Step "exit 2" exited with 2
Step "exit 2" exited with 2
Step "exit 2" exited with 2
✔ Loaded 0 repos

@elicwhite
Copy link
Contributor Author

I need to get the search results (or at least the data that will become the environment variables) at the time I'm running forEachRepo. I think that means that I need to attach the variables to the repos object returned by getCandidateRepos.

This data would only be present if the org option is not being used. This data is also specific to the github adapter, so it probably shouldn't be parsed or manipulated by anything other than the github adapter. hmm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants