Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrency issues with Helm generator - archive already exists #5271

Open
gaeljw opened this issue Aug 17, 2023 · 21 comments
Open

Concurrency issues with Helm generator - archive already exists #5271

gaeljw opened this issue Aug 17, 2023 · 21 comments
Labels
area/helm kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on. triage/under-consideration

Comments

@gaeljw
Copy link

gaeljw commented Aug 17, 2023

What happened?

Given the following file structure where the base has a helmChart reference:

base/kustomization.yaml <-- has a helmChart reference
overlays/overlay1/kustomization.yaml
overlays/overlay2/kustomization.yaml

If you run Kustomize on both overlays at the same time, there's a chance one of the 2 commands fail with following error:

Error: failed to untar:
a file or directory with the name /tmp/git@gitlab.com_xxxxxx_devops_apps/base/apps/data-query-gateway/charts/k8s-app-1.2.7.tgz already exists:
unable to run: 'helm pull --untar --untardir /tmp/git@gitlab.com_xxxxxx_devops_apps/base/apps/data-query-gateway/charts --repo https://argo:D64mNn3f6ZU77UGm2YWy@gitlab.com/api/v4/projects/29393074/packages/helm/stable k8s-app --version 1.2.7'
with env=[HELM_CONFIG_HOME=/tmp/kustomize-helm-837700893/helm HELM_CACHE_HOME=/tmp/kustomize-helm-837700893/helm/.cache HELM_DATA_HOME=/tmp/kustomize-helm-837700893/helm/.data]
(is 'helm' installed?)

See several users reporting the issue in the context of ArgoCD where it's a common setup to have multiple "applications" concurrently synchronizing and if they happen to share same Kustomize base, the error above happens.

What did you expect to happen?

Ideally, it should be possible to run Kustomize on both overlays without any concurrency issues.

How can we reproduce it (as minimally and precisely as possible)?

N/A

Expected output

N/A

Actual output

N/A

Kustomize version

5.0.3

Operating system

Linux

@gaeljw gaeljw added the kind/bug Categorizes issue or PR as related to a bug. label Aug 17, 2023
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Aug 17, 2023
@gaeljw
Copy link
Author

gaeljw commented Aug 17, 2023

I haven't looked at the code yet. Is there any reason to raise this error in the 1st place?

I see several options:

  • do not check for file already present, if file already present, assume it's the good one and continue
  • implement some kind of "lock"/retries mechanism: if a run is ongoing, wait few milliseconds and retry?
  • download the charts in a repo with a unique name. I think this would conflict with existing features though.

Maybe the new behaviour could be enabled with a flag like --ignore-helm-concurrency-issue (just to give the idea, the name should be shorter of course!) if we're not confident it won't break existing usages.

@gaeljw
Copy link
Author

gaeljw commented Aug 18, 2023

Actually, looking now at the code, Kustomize is "only" calling Helm with a command similar to helm pull --untar --untardir ... --repo ... ... and the error comes from Helm.

func (p *plugin) pullCommand() []string {
args := []string{
"pull",
"--untar",
"--untardir", p.absChartHome(),
"--repo", p.Repo,
p.Name}
if p.Version != "" {
args = append(args, "--version", p.Version)
}
return args
}

It would be nice if Helm supported some kind of "overwrite" mode then I guess. But this might still bring concurrency issues? Maybe it's still best to handle it at Kustomize level?

Related issues on Helm:

@be-hase
Copy link

be-hase commented Sep 26, 2023

I am also encountering this problem.

@DiptoChakrabarty
Copy link
Member

hi @gaeljw if you don't mind ,can you please share the kustomization file along with files you are using under the overlays directory so that we can also debug it once

@greenkiwi
Copy link

@DiptoChakrabarty I've created an example here:

https://github.com/greenkiwi/kustomize-helm-concurrency-issue

It can be run locally or it can be run via the Github Action there. It has our script that runs kustomize in parallel for the builds. Note, I've added a for loop, just to help ensure that the problem can be seen.

It also appears that if the chart is already there with the correct version, it doesn't have an issue.

@gaeljw
Copy link
Author

gaeljw commented Oct 17, 2023

@DiptoChakrabarty I can provide another minimal reproduction case as well if needed but it's quite similar to the one given by @greenkiwi (thanks!)

@shakefu
Copy link

shakefu commented Nov 2, 2023

Could a potential fix be to use os.MkdirTemp as the untar target and then copy the resulting filesystem into the target path, so helm won't fight itself?

@natasha41575
Copy link
Contributor

/assign @annasong20

for further investigation

@gaeljw
Copy link
Author

gaeljw commented Nov 28, 2023

Is there any progress on this issue? Do you think there's a chance it can be fixed?

@annasong20
Copy link
Contributor

/unassign
Sorry @natasha41575, @gaeljw, I don't think I'll have time to investigate this issue.

@annasong20
Copy link
Contributor

Hi @gaeljw, sorry for the late response. I can reproduce your issue. Yes, as your comments point out, the problem is due to parallel calls to https://github.com/kubernetes-sigs/kustomize/blob/master/plugin/builtin/helmchartinflationgenerator/HelmChartInflationGenerator.go#L256-L261. Multiple threads see that the .tgz file doesn't exist and attempt to download it. After the 1st thread, successive downloads fail because the .tgz was downloaded by the 1st.

Our stance is that this is a helm issue, because Helm clearly doesn't support parallel runs of helm pull, as aforementioned. Pasting the terminal output here for clarity

helm pull --untar --untardir /tmp/charts/minecraft-3.1.3 --repo https://itzg.github.io/minecraft-server-charts minecraft --version 3.1.3

Error: failed to untar: a file or directory with the name /tmp/charts/minecraft-3.1.3/minecraft-3.1.3.tgz already exists

See Kustomize's stance on Helm support here.

However, if this issue is popular enough, I personally would be willing to accept a patch that either uses "locks" or unique download directory names (which will hurt performance). Would need to discuss with other Kustomize maintainers, especially @koba1t.

/area helm
/triage under-consideration

@k8s-ci-robot k8s-ci-robot added area/helm triage/under-consideration and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 20, 2023
@greenkiwi
Copy link

@annasong20 Should we also be raising this issue with helm?

@gaeljw
Copy link
Author

gaeljw commented Dec 21, 2023

Thanks for the feedback @annasong20

For now Helm considers this is not an issue with Helm but its usage in Kustomize.
See the related issue: helm/helm#12315

If anyone have strong arguments to present to Helm, please do :)

I'm more in the idea of fixing it in Kustomize but I don't have enough knowledge in Go to proceed with a PR.

@annasong20
Copy link
Contributor

annasong20 commented Dec 21, 2023

@gaeljw Thanks for the context! I can empathize with Helm's argument.

Will discuss this issue with the team after the holidays, then we can update the triage status.
/cc @greenkiwi @koba1t @natasha41575

@natasha41575
Copy link
Contributor

We have discussed this among the maintainers and we agree that this is a valid issue and we should try to fix it if we can. We are not sure that a locking mechanism will be possible because it is two different processes running, but we might be able to do something with the directory structure (e.g. adding a hash). If we modify the directory structure, we need to be really careful that we don't break anything with users' existing helm charts, whether those are local charts or remote charts that have already been pulled and cached by kustomize.

We are also open to other solutions if anyone has a feasible alternative.

/triage accepted

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jan 3, 2024
@koba1t
Copy link
Member

koba1t commented Jan 3, 2024

I feel this opinion helps to resolve this problem.
helm/helm#12315 (comment)

Ignore this error from Helm.

This idea helps resolve this problem without hurting performance and breaking change.
I believe this is a better solution, but it takes time to implement.

@koba1t
Copy link
Member

koba1t commented Jan 24, 2024

I am waiting for this PR to be merged.
helm/helm#12725

@imperfect-fourth
Copy link

any updates on this issue? there's no activity on the above mentioned PR in the last 3 months.

@Skaronator
Copy link

I don't think helm/helm#12725 get merged any time soon. It would be faster if we fix this in kustomize.

@gaeljw
Copy link
Author

gaeljw commented Jun 21, 2024

For the record, at my company, we changed our structure to workaround this issue.

Instead of having N Kustomize folders -> 1 Kustomize base -> 1 Helm chart, we integrated the generation of the N variants in the Helm chart itself using arrays/dict in the Helm values.

In our case, it makes sense and it's easier to maintain in the end, but it might not be for everyone for sure.

@shoppingjaws
Copy link

Adding retry-mechanism to solve this in kustomize not helm, right?
Is there any idea?

khrm added a commit to khrm/infra-deployments that referenced this issue Dec 18, 2024
khrm added a commit to khrm/infra-deployments that referenced this issue Dec 18, 2024
khrm added a commit to khrm/infra-deployments that referenced this issue Dec 18, 2024
rhtap-qe-bots pushed a commit to redhat-appstudio-qe/infra-deployments that referenced this issue Dec 18, 2024
khrm added a commit to khrm/infra-deployments that referenced this issue Dec 18, 2024
rhtap-qe-bots-2 pushed a commit to redhat-appstudio-qe/infra-deployments that referenced this issue Dec 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/helm kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on. triage/under-consideration
Projects
None yet
Development

No branches or pull requests