Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add support for directory walk #29

Closed
wants to merge 2 commits into from

Conversation

Dentrax
Copy link

@Dentrax Dentrax commented Jun 3, 2022

Fixes #28

Signed-off-by: Furkan furkan.turkal@trendyol.com
Co-authored-by: Batuhan batuhan.apaydin@trendyol.com

@Dentrax
Copy link
Author

Dentrax commented Jun 3, 2022

Example output:

$ go run . check ./apko/.github/workflows

[PASS] ../apko/.github/workflows/add-issues.yml
[PASS] ../apko/.github/workflows/build.yaml
[PASS] ../apko/.github/workflows/codeql.yaml
[PASS] ../apko/.github/workflows/go-tests.yaml
[FAIL] ../apko/.github/workflows/mink-e2e.yaml : check failed: found 5 unpinned refs: ["docker/setup-qemu-action@v2.0.0" "chainguard-dev/actions/kind-diag@main" "chainguard-dev/actions/setup-kind@main" "chainguard-dev/actions/setup-mink@main" "imjasonh/setup-crane@v0.1"]
check failed: found 5 unpinned refs: ["docker/setup-qemu-action@v2.0.0" "chainguard-dev/actions/kind-diag@main" "chainguard-dev/actions/setup-kind@main" "chainguard-dev/actions/setup-mink@main" "imjasonh/setup-crane@v0.1"]
exit status 1

.gitignore Outdated Show resolved Hide resolved
command/check.go Outdated Show resolved Hide resolved
@Dentrax Dentrax force-pushed the check-dir branch 2 times, most recently from 9683495 to dda8ce6 Compare February 13, 2023 19:48
@Dentrax
Copy link
Author

Dentrax commented Feb 13, 2023

Hey @sethvargo, really sorry, I dropped the ball here. Just made some changes according to your reviews. New commit includes:

  • restrict to only y(a|ml) files
  • introduce a new func called doer to make it available in all the commands
  • summarize the total findings for calculating the last exit code
  • add test coverage

PTAL when possible. Comments have been addressed. 🤞

* ability to traverse dirs
* restrict to .yml and .yaml files only
* plumb in through lower in the stack
* summarize the total findings for the exit code
* add test case

Signed-off-by: Furkan <furkan.turkal@trendyol.com>
Co-authored-by: Batuhan <batuhan.apaydin@trendyol.com>
Copy link
Owner

@sethvargo sethvargo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Dentrax - the walker implementation looks correct, but what I was trying to explain in my previous comment is that this is still incorrect. By invoking Run for each file, we could generate hundreds or thousands of upstream requests. Think if I ran this on a directory with 100 identical YAML files, each with 10 GitHub Actions references. That would generate 1000 unique calls to the GitHub API.

Furthermore, what if something changes between when we resolve file_a to file_z? It's possible that a reference could resolve differently if an upstream developer pushes a new version. This is pretty horrible, since now you have different checksums for the same reference on the same ratchet run.

Instead, I think we need to walk as you've done here, but instead of calling "Run", we need to parse all the files and de-duplicate references. Then resolve all references. Then Walk again and rewrite each time. Does that make sense?

@Dentrax
Copy link
Author

Dentrax commented Feb 14, 2023

we could generate hundreds or thousands of upstream requests.

Oh, good point. I obviously missed that part, ratchet actually does HTTP requests to upstream.

what if something changes between when we resolve file_a to file_z?

I think you mean there would be a slight time-window that would be ending up with reference drift between local vs upstream.

we need to parse all the files and de-duplicate references. Then resolve all references.

Could you please clarify, what does de-duplicate references mean in this context? What is my reference key? Is it refs := refsList.All() but for aggregated by traversing all files? By doing so, I would not send duplicated HTTP requests for already resolved files. Is that correct?

@sethvargo
Copy link
Owner

I think you mean there would be a slight time-window that would be ending up with reference drift between local vs upstream.

I mean, if we're iterating over files and making upstream HTTP calls, it's possible that the same reference (actions/checkout@v3) could resolve to different checksums as each file is parsed, if GitHub pushes a new package version while Ratchet is running. It's a real-life race condition.

Could you please clarify, what does de-duplicate references mean in this context? What is my reference key? Is it refs := refsList.All() but for aggregated by traversing all files? By doing so, I would not send duplicated HTTP requests for already resolved files. Is that correct?

I think so, yea. We would need all the references, then we would need to drop them in a map/dedup. Then resolve the references. Then write the resolutions back to each time. This might mean we need to walk the files twice, but that's probably fine.

Signed-off-by: Furkan <furkan.turkal@trendyol.com>
@Dentrax
Copy link
Author

Dentrax commented Feb 16, 2023

Hey @sethvargo, I have just made some changes according to your reviews and ideas. PTAL when possible. Thanks!


In walk mode, I traverse each file and cache the all references. pin and update commands resolves the all corresponding references from the cache.

I had to introduce a new FetchAndCacheReferences func to traverse and feed to cache, it seems duplicate to Pin function in the first place but couldn't find a much cleaner way.

Added necessary unit test cases to cover walking logic. Used bitwise-like cache hit counter to ensure the correctness.

@Dentrax Dentrax changed the title feat: check: support directory feat: add support for directory walk Feb 16, 2023
@Dentrax
Copy link
Author

Dentrax commented Mar 25, 2023

Kindly ping @sethvargo

@sethvargo sethvargo closed this in #64 Feb 3, 2024
sethvargo added a commit that referenced this pull request Feb 3, 2024
This is a refactor to enable Ratchet to support multiple files in a
single run. All files are parsed once (to limit upstream API calls) and
then updated in serial.

Closes #29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

check: support directory scan
2 participants