Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: cmp/stream - allow to exclude all('*') #18810

Closed
wants to merge 1 commit into from

Conversation

marxus
Copy link

@marxus marxus commented Jun 25, 2024

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • The title of the PR conforms to the Toolchain Guide
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • My new feature complies with the feature status guidelines.
  • I have added a brief description of why this PR is necessary and/or what this PR solves.
    happen depending on risk/complexity).

This one is regarding:
https://argo-cd.readthedocs.io/en/stable/operator-manual/config-management-plugins/#plugin-tar-stream-exclusions

This one is a tricky one. It's not a "new feature", yet it's not really a "bug fix", it's just merely allowing to set
ARGOCD_REPO_SERVER_PLUGIN_TAR_EXCLUSIONS='*'

No need to edit the current document since the document doesn't suggest anything about not excluding all the files
In practice, if you actually try to exclude all the files you will get the 'no files to send' error

Why should we allow exclude everything?

  • well, it's the "admin" choice to do so, if he knows what he's doing then it shouldn't bother the argo development team and he should have the freedom to do so, if it's not introducing breaking changes.
  • maybe i got an empty repo and i already exclude '.git/*', in that case i'll have the same error result. but who to decide that an empty repository can't get processed by a plugin that generates manifests on the fly from build envs/other sources?

So is it backward compatible?

  • yes, because it just gives you the possibility to configure to send an empty TAR to the CMP server (this is probably not configured anywhere because it was not allowed...) , it doesn't disable the possibility to send a non-empty TAR..

Why was it not allowed so far?

  • I guess there was an assumption that if it's a GitOps tool you should have at least some sources that you want to process in order to generate manifests.
  • to safeguard users from misconfiguration of the exclusion globs. (which it doesn't really keep them safe..). we might consider adding a notice such as "Use exclusion wisly and make sure you do not exclude files that are needed for your plugin operation".

My use case:
we have a monorepo, we do not change files in the file system during generation yet we can not use the same source to generate manifests. the default tar-gz/stream/untar-gz per generation operation takes a heavy toll on the Disk I/O.
our quick workaround this problem could be:

  • copy nothing (exlude '*')
  • have a shared volume between the argocd-repo-server and the plugin sidecar. (shared /tmp/_argocd-repo)
  • change the plugin manifest like so:
apiVersion: argoproj.io/v1alpha1
kind: ConfigManagementPlugin
metadata:
  name: my-plugin
spec:
  generate:
    command:
      - bash
      - -c
      - |
        ### added changes to support such change, pseudo code
        cd /_tmp/_argocd-repo && match the "correct-repo-uuid" using "$ARGOCD_APP_SOURCE_REPO_URL"
        cd "correct-repo-uuid/$ARGOCD_APP_SOURCE_PATH"
        
        ### rest of the plugin commands
        kustimoze build . | helm template . | envsubst | you-name-it | argocd-vault-plugin | etc etc...

I'm not claiming this is a way to go. or such methods should be the preferred way of doing stuff, there are probably more pitfalls one might think about, such as multiple plugins needs extra attention, discovery rules needed to be carefully crafted, generate operations that do change the filesystem in order to manipulate the manifests, etc etc etc
but all those concerns are offloaded to the admin and are not part of the argocd scope

for us it's a quick win by enabling something which is only logical - allowing the admin to exclude what ever he want.

this can allow other quick wins for admins, without being too hacky or have a customized forked binary that they don't want to maintain, i've collected issues that could benefit or in someway related:

#16146
#15763
#17775
#18054
#17951
#16837
#18795

@crenshaw-dev @jsolana @czchen @momilo @clement-heetch @woehrl01

and this is the PR the introduced the big change CMP way of operation #8600
my PR doesn't revert anything. just allows one to have a logical escape hatch

Copy link

bunnyshell bot commented Jun 25, 2024

❌ Preview Environment deleted from Bunnyshell

Available commands (reply to this comment):

  • 🚀 /bns:deploy to deploy the environment

Copy link

bunnyshell bot commented Jun 25, 2024

❌ Preview Environment deleted from Bunnyshell

Available commands (reply to this comment):

  • 🚀 /bns:deploy to deploy the environment

@marxus marxus force-pushed the master branch 2 times, most recently from adc44ea to ee95e55 Compare June 25, 2024 07:49
Signed-off-by: Amit Marcus <marxus@gmail.com>
Copy link

codecov bot commented Jun 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 50.30%. Comparing base (c95d4ee) to head (2a9d041).
Report is 528 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #18810      +/-   ##
==========================================
- Coverage   50.30%   50.30%   -0.01%     
==========================================
  Files         315      315              
  Lines       43126    43124       -2     
==========================================
- Hits        21695    21693       -2     
- Misses      18951    18955       +4     
+ Partials     2480     2476       -4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@marxus marxus marked this pull request as ready for review June 25, 2024 09:11
@marxus marxus requested a review from a team as a code owner June 25, 2024 09:11
@marxus marxus closed this Jun 25, 2024
@marxus marxus reopened this Jun 25, 2024
Copy link

bunnyshell bot commented Jun 25, 2024

❌ Preview Environment deleted from Bunnyshell

Available commands (reply to this comment):

  • 🚀 /bns:deploy to deploy the environment

Copy link

bunnyshell bot commented Jun 25, 2024

❌ Preview Environment deleted from Bunnyshell

Available commands (reply to this comment):

  • 🚀 /bns:deploy to deploy the environment

Copy link
Member

@crenshaw-dev crenshaw-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GIF of Larry David thinking to himself, "What's even the point of GitOps? Folks just wanna ship stuff using whatever tool happens to have API credentials, reproducibility doesn't matter, audit history doesn't matter, change review doesn't matter. Delete Argo CD and replace it with Postman."

@crenshaw-dev
Copy link
Member

So.... I'm not violently opposed to making the change. After all, you're the admin, you could do the same thing but probably in a worse way.

But I really wanna know...... exactly what are you doing with this gitopsless gitops deployment?

@marxus
Copy link
Author

marxus commented Jun 25, 2024

Hey, thanks for responding

1st of all, Argo is great, and also the plugin system. it changed the way we think and design our solution and probably made some of our CI obsolete (like, generating manifests, commit them, and then consume them "statically") + gave us the possibility to create on the fly dynamic envs by injecting parameters and such. we're basically using mostly the plugin system as an entry-point in our Apps without using the built in tools like "Helm" or "Kustomize" (we do actually use them, but inside our plugin...since we need to pipe output to envsubst and argocd-vault-plugin and such... )

Okay, To the PR title's usecase:
so basically we experienced a great toll on cpu/mem/disk-io usage when we moved from an older version of argo, which used the same source of a repo in CMPs, to a new version which TarGZ's and streams the entire repo for each generation iteration...(actually it happens twice, once for discovery and once for generation command)

We only have one generic plugin that we maintain that does not change the filesystem during run time (so we keep the git state clean..).

we have 500 apps on the same monorepo. a commit triggers a cache miss on all of them at once. we have approx 3000 files, they don't weight that much, but it's the numbers of files that's heavy on the i/o
it's 500 apps * 3000 files * 2 (discovery and generation) = 3000000 files that get targz and streamed, which chocks everything. refresh times are at 30min.
before moving to the new way of doing CMPs, we didn't encounter that problem. zero copies, only generate commands, less than 30 sec for a full refresh.

so we merely looking for an escape hatch.

  • we share the /tmp mount with the sidecar (yes, i know in the docs it claims that it can be used for traversal path exploits.. but it's our plugin, the only plugin...and our toolchain)

  • no files are streamed

  • we do need the files, so we just change dir within the plugin into the shared /tmp/_argocd-repo/-repo-uuid-, and run the command.

  • we get to keep all the BuildEnv and plugin parameters in the ENV.

  • yes. it's an escape hatch. but it gives us a quick win in just 3 lines of bash. it can be a quick win for people facing the same problem, with this minor change of argo codebase

  • it's also logical, why limit what a one can exclude

  • it also could be a quick win for people that want to "include" files (as seen in other issues) they can just cp only the files they need from the shared /tmp/_argocd-repo mount.... no need for fancy issues like chore: transmit only app manifests to the cmp-server for plugin-based applications #18054 which can cause their own edge cases (for example manifest-generate-paths states which files trigger a change, but they are not exclusively all the files needed for generation... e.g, bases that doesnt change, but overlays that do...)...
    or feat: add plugin tar inclusion #16146 that is WIP for over 8 months (an wont get merged?), 288 files changes, 111 commits, endless lines of code and complexity for something that they wouldn't probably need before the CMP change. (and they can roll their own in the plugin in a much more simple way - not needing to maintain a fork on production as they're doing now.).

  • it offloads the responsibility to the Admins for a short period until a more mature way of avoid this TGZ streaming hell is handled.

  • yes we know we can reduce the cache miss by optimizing manifest-generate-paths and avoid a generate operation. we already use it but we want our plugin to be fast again regardless.

@marxus
Copy link
Author

marxus commented Jun 26, 2024

@crenshaw-dev Hey. I've took the time to try and express via a real example with real code snippets. please take a look at:

https://github.com/marxus/argo-cd/tree/cmp-required-changes-demo

@crenshaw-dev
Copy link
Member

I think the project is better-served by improving monorepo support for plugins in general rather than facilitating a hack. I think the combination of these two would largely solve the performance problem:

  1. feat: Allow return the client without performing a matchRepository #18053
  2. chore: transmit only app manifests to the cmp-server for plugin-based applications #18054

For now, I think you could accomplish basically what you need by excluding every file extension in the repo except one that you know is super lightweight. For example:

*.yaml,*.json # only thing that gets copied is README.md

This change would make that hack a bit easier: #16146

The PR isn't actually 288 files, there was a botched rebase.

@marxus
Copy link
Author

marxus commented Jun 27, 2024

I agree, we should improve monorepo support

even that from a configuration wise, the changes for PLUGIN_TAR_EXCLUSIONS and for workingDir (as seen here https://github.com/marxus/argo-cd/tree/cmp-required-changes-demo) can have legit cases + they break nothing.
and they do not imply that you can use them to hack your way.

On another topic:
Does every MatchRepository and GenerateManifests create it's own temp TarGz? because if so we can actually make it much more faster by creating a TarGz for streaming next to /_tmp/_argocd-repo/[repo-uuid], as part as hard-refresh flow.

after git pull/checkout + git clean is done for the revision, we already know the exclusion+inclusion options because they are static, and we create the TarGz as tmp right next to the checked in code, the cmp-server that streams then pick it up instead of recreating it for each app.

\_ /tmp/_argocd-repo
  \_ repo-uuid # the repo folder
  \_ repo-uuid-myplugin-v1.0.tgz # added the the plugin name/version as suffix to support https://github.com/argoproj/argo-cd/pull/16146 the work on inclusion/exlusion based on plugin.
  \_ repo-uuid-otherplugin-v2.0.tgz 

incase for a non static inclusion, like the one that offered using manifest-path annotation, we can use them as intermediate TarGzs

@momilo
Copy link

momilo commented Jun 29, 2024

@marxus - your hack to use a shared volume is a very interesting idea (acknowledging the risks, naturally). Our engineers have been desperately looking for a solution (....or abandonment of argo-cd...) ever since the migration of CMP to the sidecar, which rendered all our deployments very painful (and unreliable - timeout issues etc).

I'll definitely try the suggestion.

And, needtless to say, we would also welcome an option to either "exclude all", or to simply have "--include-only" param (which might be more palatable to argo-cd maintainers, from a philosophical perspective). In general, when operating monorepo, there is no need to include anything beyond a very small subset of folders - hence having only an option "--exclude" does not seem to be a fitting design choice.

@crenshaw-dev
Copy link
Member

Does every MatchRepository and GenerateManifests create it's own temp TarGz?

Yep, this is an anti-traversal mechanism.

because if so we can actually make it much more faster by creating a TarGz for streaming next to /_tmp/_argocd-repo/[repo-uuid], as part as hard-refresh flow.

after git pull/checkout + git clean is done for the revision, we already know the exclusion+inclusion options because they are static, and we create the TarGz as tmp right next to the checked in code, the cmp-server that streams then pick it up instead of recreating it for each app.

It sounds like what you're describing is basically introducing a persistent cache in the CMP. We've intentionally avoided that for these reasons:

  1. managing multiple revisions while avoiding races or duplication
  2. managing temporary on-disk mutations (e.g. kustomize edit set) while avoiding duplications
  3. architectural simplicity for maintainability

Of course, once user namespaces has broad cloud provider support, we'll be able to abandon the sidecar-based isolation strategy and avoid the network overhead/flakiness.

I'm gonna close this for now, since I think you can accomplish the same thing pretty today easily:

For now, I think you could accomplish basically what you need by excluding every file extension in the repo except one that you know is super lightweight. For example:

.yaml,.json # only thing that gets copied is README.md

I do recommend following these two and reviewing/testing:

@crenshaw-dev
Copy link
Member

Closing just to express that I don't think we'll merge the PR, obviously more discussion is very welcome. :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants