ArgoCD starts to DDoS the related Helm repository #12314

Closed
3 tasks done
fotto1 opened this issue Feb 7, 2023 · 5 comments · Fixed by #19530
Labels
bug Something isn't working

Comments

@fotto1

fotto1 commented Feb 7, 2023

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

Our ArgoCD Applications have started to DDoS our Helm repositories.

For our deployment we use GitOps with an app-of-apps approach that creates nearly 64 different ArgoCD Applications. Our GitOps setup has multiple branches, and we run multiple deployments on the same Argo CD environment in one AWS EKS cluster.

With 32 deployments that means 2048 ArgoCD Applications based on the same Helm repository, plus 32 Argo CD Applications running as app of apps via GitOps.

To Reproduce

  1. Configure a GitOps repository with a set of ArgoCD Applications that use the same Helm chart repository and a sync interval of 1 minute.
  2. Create multiples of these ArgoCD Applications via GitOps.
  3. Check the access log of the Helm chart repository (I use JFrog Artifactory) for calls to the index.yaml file.

Expected behavior

Since ArgoCD always requests the same index.yaml file from the Helm chart repository, I would expect it to be cached in ArgoCD and used as a shared resource between ArgoCD Applications, rather than every single ArgoCD Application fetching the index.yaml file itself.

Our index.yaml is 20.26 MB, and 2048 ArgoCD Applications each fetch it once per hour for 12 hours. That makes 20.26 MB × 2048 × 12 = 497,909.76 MB ≈ 486.24 GB per day.

That is also what we see in the access logs of our Helm repository.

Our expectation, with the file cached centrally and checked once per minute, is 20.26 MB × 60 × 24 ≈ 28.49 GB per day.
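
As a quick check, the two figures above follow from simple arithmetic (standalone snippet, not Argo CD code):

package main

import "fmt"

func main() {
	const indexMB = 20.26 // size of index.yaml in MB

	// Observed: 2048 Applications each fetch the index once per hour, for 12 hours a day.
	observed := indexMB * 2048 * 12
	fmt.Printf("observed: %.2f MB/day (~%.2f GB/day)\n", observed, observed/1024)

	// Expected with a central cache: one fetch per minute, 24 hours a day.
	expected := indexMB * 60 * 24
	fmt.Printf("expected: %.2f MB/day (~%.2f GB/day)\n", expected, expected/1024)
}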

Version

argocd: v2.6.0+acc554f
  BuildDate: 2023-02-06T21:44:18Z
  GitCommit: acc554f3d99010e0353b498a595844b30090556f
  GitTreeState: clean
  GoVersion: go1.18.10
  Compiler: gc
  Platform: linux/amd64
argocd-server: v2.5.6+9db2c94
  BuildDate: 2023-01-10T19:30:17Z
  GitCommit: 9db2c9471f6ff599c3f630b446e940d3a065620b
  GitTreeState: clean
  GoVersion: go1.18.9
  Compiler: gc
  Platform: linux/amd64
  Kustomize Version: v4.5.7 2022-08-02T16:35:54Z
  Helm Version: v3.10.3+g835b733
  Kubectl Version: v0.24.2
  Jsonnet Version: v0.18.0
@fotto1 added the bug (Something isn't working) label Feb 7, 2023
@jessesuen
Member

The feature request makes sense. We need the equivalent of what we are currently doing with git ls-remote but for a helm chart repository.

For the implementation, I feel we could cache the index.yaml in Redis and make subsequent requests for index.yaml with the If-Modified-Since header. If the response is HTTP 304, then we can use the cached result.

This all presumes jFrog Artifactory supports the If-Modified-Since header.
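
A rough Go sketch of that conditional-GET idea (illustrative only: fetchIndex is a made-up name, and cachedData/cachedTime stand in for whatever a Redis-backed lookup would return):

package sketch

import (
	"errors"
	"io"
	"net/http"
	"time"
)

// fetchIndex does a conditional GET for index.yaml, reusing the cached copy
// when the server answers 304 Not Modified.
func fetchIndex(indexURL string, cachedData []byte, cachedTime time.Time) ([]byte, error) {
	req, err := http.NewRequest(http.MethodGet, indexURL, nil)
	if err != nil {
		return nil, err
	}
	if len(cachedData) > 0 {
		// Ask the server to reply 304 if the index has not changed since the cached copy.
		req.Header.Set("If-Modified-Since", cachedTime.UTC().Format(http.TimeFormat))
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusNotModified:
		// Nothing changed on the server: reuse the cached index.
		return cachedData, nil
	case http.StatusOK:
		return io.ReadAll(resp.Body)
	default:
		return nil, errors.New("failed to get index: " + resp.Status)
	}
}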

@fotto1 are you open to submitting a fix for this?

@fotto1
Author

fotto1 commented Feb 9, 2023

@jessesuen sounds like a good proposal. Do you perhaps have a link to the merge request for the similar change on git repositories?

Can you provide some guidance on which parts of the code must be touched to implement the change you proposed?

After that I will check whether I can make the change on my own.

@fotto1
Author

fotto1 commented Oct 5, 2023

This also seems related to the problem described by @hmoravec in #8698.

@able8

able8 commented Dec 22, 2023

We often encounter 'helm dependency build' failing with a timeout after 1m30s, see #3977 (comment).
It would be nice to cache the helm charts.

rpc error: code = Unknown desc = Manifest generation error (cached): 'helm dependency build' failed timeout after 3m0s

@fotto1
Author

fotto1 commented Mar 13, 2024

Based on multiple discussions, also with my colleague @andrei-gavrila, we identified the underlying issue.

It seems Argo CD creates, for every Helm chart, its own client with a repo URL and a cache. That means every configured application has its own cache and stores its own index.yaml file.

The indexCache sits on the nativeHelmChart client:

type nativeHelmChart struct {
	chartCachePaths argoio.TempPaths
	repoURL         string
	creds           Creds
	repoLock        sync.KeyLock
	enableOci       bool
	indexCache      indexCache
	proxy           string
}

This is how the index is retrieved:

if !noCache && c.indexCache != nil {
    if err := c.indexCache.GetHelmIndex(c.repoURL, &data); err != nil && err != cache.ErrCacheMiss {
        log.Warnf("Failed to load index cache for repo: %s: %v", c.repoURL, err)
    }
}

if len(data) == 0 {
    start := time.Now()
    var err error
    data, err = c.loadRepoIndex()
    if err != nil {
        return nil, err
    }
    log.WithFields(log.Fields{"seconds": time.Since(start).Seconds()}).Info("took to get index")

    if c.indexCache != nil {
        if err := c.indexCache.SetHelmIndex(c.repoURL, data); err != nil {
            log.Warnf("Failed to store index cache for repo: %s: %v", c.repoURL, err)
        }
    }
}

Unfortunately, this is how it's really retrieved (no helm involvement):

func (c *nativeHelmChart) loadRepoIndex() ([]byte, error) {
	indexURL, err := getIndexURL(c.repoURL)
	if err != nil {
		return nil, err
	}

	req, err := http.NewRequest(http.MethodGet, indexURL, nil)
	if err != nil {
		return nil, err
	}
	if c.creds.Username != "" || c.creds.Password != "" {
		// only basic supported
		req.SetBasicAuth(c.creds.Username, c.creds.Password)
	}

	tlsConf, err := newTLSConfig(c.creds)
	if err != nil {
		return nil, err
	}

	tr := &http.Transport{
		Proxy:             proxy.GetCallback(c.proxy),
		TLSClientConfig:   tlsConf,
		DisableKeepAlives: true,
	}
	client := http.Client{Transport: tr}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer func() { _ = resp.Body.Close() }()

	if resp.StatusCode != http.StatusOK {
		return nil, errors.New("failed to get index: " + resp.Status)
	}
	return io.ReadAll(resp.Body)
}

There is no Helm involvement, see the http.NewRequest(http.MethodGet, indexURL, nil) call, and yes, that means every Helm chart client has its own cache.


The original feature commit by @alexmt that enabled caching in Argo CD (it includes an environment variable to configure the cache lifetime, and the cache is set on the client, e.g. 1000 charts, one repo, one client -> one cache):

5889bbb

Not sure why this has been changed, but the change explains our problem.
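
For reference, the fix that eventually landed (see the commits below, "fix: cache helm-index in Redis cluster") shares the cached index per repository URL instead of per client. A simplified illustration of that shared-cache idea in Go, with made-up names rather than the actual implementation:

package sketch

import "sync"

// Illustration only (not the actual Argo CD code): a process-wide index cache
// keyed by repository URL, so every Application that points at the same Helm
// repository shares one copy of index.yaml instead of downloading its own.
type sharedIndexCache struct {
	mu      sync.Mutex
	entries map[string][]byte // repo URL -> raw index.yaml
}

func newSharedIndexCache() *sharedIndexCache {
	return &sharedIndexCache{entries: map[string][]byte{}}
}

// GetOrLoad returns the cached index for repoURL and calls load only on a miss.
// Simplified on purpose: one lock for all repos and no expiry.
func (c *sharedIndexCache) GetOrLoad(repoURL string, load func() ([]byte, error)) ([]byte, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if data, ok := c.entries[repoURL]; ok {
		return data, nil
	}
	data, err := load()
	if err != nil {
		return nil, err
	}
	c.entries[repoURL] = data
	return data, nil
}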

todaywasawesome added a commit that referenced this issue Aug 21, 2024
* fix: cache helm-index in Redis cluster

Signed-off-by: JenTing Hsiao <hsiaoairplane@gmail.com>

* Update repository.go

Fix order

Signed-off-by: Dan Garfield <dan@codefresh.io>

---------

Signed-off-by: JenTing Hsiao <hsiaoairplane@gmail.com>
Signed-off-by: Dan Garfield <dan@codefresh.io>
Co-authored-by: Dan Garfield <dan@codefresh.io>
ashutosh16 pushed a commit that referenced this issue Aug 23, 2024
pasha-codefresh pushed a commit to pasha-codefresh/argo-cd that referenced this issue Oct 9, 2024