Cannot apply CRD and a CR using it in the same plan/apply due to SSA #1367

Open
heschlie opened this issue Aug 12, 2021 · 63 comments
Labels
acknowledged (issue has undergone initial review and is in our work queue), manifest, progressive apply, upstream-terraform

Comments

@heschlie

Terraform version, Kubernetes provider version and Kubernetes version

Terraform version: 0.14.11
Kubernetes Provider version: 2.4.1
Kubernetes version: EKS 1.17

Terraform configuration

There is a bit going on here, but essentially this is the output from the terraform_flux_provider, and through some HCL abuse I'm massaging it into the right format.

resource "kubernetes_manifest" "install" {
  for_each   = { for manifest in local.install_manifest : join("-", [manifest.kind, manifest.metadata.name]) => manifest }
  depends_on = [kubernetes_namespace.flux_system]
  manifest   = each.value
}

resource "kubernetes_manifest" "sync" {
  for_each   = { for manifest in local.sync_manifest : join("-", [manifest.kind, manifest.metadata.name]) => manifest }
  depends_on = [kubernetes_manifest.install]
  manifest   = each.value
}

Question

Essentially I am using the kubernetes_manifest resource, and am trying to:

  1. Deploy some custom resource definitions
  2. Deploy some custom resources using the above definitions

Upon doing this I am greeted with an error during the plan because the CRDs have not been created and SSA is not happy about it:

Acquiring state lock. This may take a few moments...

Error: Failed to determine GroupVersionResource for manifest

  on main.tf line 49, in resource "kubernetes_manifest" "sync":
  49: resource "kubernetes_manifest" "sync" {

no matches for kind "Kustomization" in group "kustomize.toolkit.fluxcd.io"


Error: Failed to determine GroupVersionResource for manifest

  on main.tf line 49, in resource "kubernetes_manifest" "sync":
  49: resource "kubernetes_manifest" "sync" {

no matches for kind "GitRepository" in group "source.toolkit.fluxcd.io"

Releasing state lock. This may take a few moments...
ERRO[0105] Hit multiple errors:
Hit multiple errors:
exit status 1

Is there a way to tell the provider that things are ok, and not try to plan this? It seems like a bug or required feature before this comes out of experimental, as asking for someone to first apply the CRDs, then add and apply the CRs doesn't seem like a valid long term solution.

@kyschouv

I came to ask the same thing, as I ran into this trying to migrate a bunch of things from Helm charts to straight Terraform. There are a lot of similar issues on the old repo:

hashicorp/terraform-provider-kubernetes-alpha#247
hashicorp/terraform-provider-kubernetes-alpha#218
hashicorp/terraform-provider-kubernetes-alpha#235

This is going to make adoption of kubernetes_manifest very spotty. It'll basically require a separate Terraform step to plan and apply any CRDs ahead of using those CRDs. This will be further complicated by using something like Operator Lifecycle Manager, where CRDs are installed as part of the operator, and several operators might have dependent CRDs. For example, my current dependency chain looks like this:

cert-manager -> azure service operator -> grafana (uses a mysql database/user deployed through azure service operator).

cert-manager is installed through OLM and creates CRDs which must be used to generate resources ahead of installing azure service operator, which then creates CRDs that are used to create resources ahead of installing grafana. My plan was to utilize depends_on to handle the dependency chain, but because of the issue above (and in the linked old issues), that's not working. It would require separating each of those into a separate Terraform plan/apply step and then performing them in order, which will become cumbersome fast.

My current workaround is to place CRD usage into a local Helm chart and use that through helm_release. It's very hacky, but Terraform doesn't seem to evaluate those during the plan stage (at least not in a breaking way if the CRDs aren't installed yet), so it's a usable workaround. It would be a lot cleaner to be able to just use kubernetes_manifest though.
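The local-chart workaround described above might look roughly like the following. This is a minimal sketch, not the commenter's actual configuration: the chart paths, release names, and namespace are placeholders, and `helm_release.operator` stands in for whatever release installs the CRDs.

```hcl
# Placeholder: the release that installs the operator together with its CRDs.
resource "helm_release" "operator" {
  name             = "azure-service-operator"
  chart            = "${path.module}/charts/azure-service-operator" # assumed local path
  namespace        = "operators"                                     # assumed namespace
  create_namespace = true
}

# Local chart whose templates/ directory contains only the custom resources.
resource "helm_release" "custom_resources" {
  name      = "grafana-database"
  chart     = "${path.module}/charts/grafana-database" # assumed local path
  namespace = "operators"

  # Ordering only: the CRs are sent to the API server after the CRDs exist.
  depends_on = [helm_release.operator]
}
```

The trade-off is that Terraform only tracks the release as a whole, so individual attributes of the custom resources are not visible in the plan.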

@nbraun-wolf

nbraun-wolf commented Aug 15, 2021

So as far as I understand, it wouldn't even work to separate these 20k lines of YAML from cert-manager into individual files so they could be used as manifests, because the moment we run apply, it does this server-side check and reports that the CRD does not exist. If we at least had an exclude flag, applying twice wouldn't be so bad, but without one, whatever the first apply targeted is run again in the second apply if we target only the CRD the first time around.

I am using a Makefile now. It's not that nice, but at least I can spin everything up with one command.

@alxndr13

alxndr13 commented Sep 2, 2021

For anyone having the same issue:

I wanted to install cert-manager using Helm and deploy a ClusterIssuer manifest in the same terraform apply step. The ClusterIssuer obviously depends on the cert-manager.io/v1 apiVersion, therefore the plan failed due to SSA.

I'm now using the kubectl provider for the ClusterIssuer Resource and combined with a depends_on meta argument on my cert-manager Helm release the plan and apply went through :)
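A minimal sketch of that combination, assuming the gavinbunney/kubectl provider's `kubectl_manifest` resource and the `set` block syntax of the 2.x Helm provider. The `installCRDs` chart value and the self-signed issuer are illustrative choices, not taken from the comment, and the chart value name may differ between cert-manager chart versions.

```hcl
terraform {
  required_providers {
    helm    = { source = "hashicorp/helm" }
    kubectl = { source = "gavinbunney/kubectl" }
  }
}

resource "helm_release" "cert_manager" {
  name             = "cert-manager"
  repository       = "https://charts.jetstack.io"
  chart            = "cert-manager"
  namespace        = "cert-manager"
  create_namespace = true

  # Assumption: this chart version installs its CRDs via `installCRDs`.
  set {
    name  = "installCRDs"
    value = "true"
  }
}

# The manifest is passed as YAML text, so the plan does not need to
# resolve the ClusterIssuer kind against the cluster.
resource "kubectl_manifest" "cluster_issuer" {
  yaml_body = <<-YAML
    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: selfsigned
    spec:
      selfSigned: {}
  YAML

  depends_on = [helm_release.cert_manager]
}
```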

@dhirschfeld

I had hoped that the manifest resource being promoted out of alpha meant this problem had been resolved. Unfortunately, I've just hit it again with argocd. It would be great if someone from HashiCorp could comment on any plans to resolve this long-standing issue.

It might be a slightly different reason than the original hashicorp/terraform-provider-kubernetes-alpha#72, but the end result is the same: you can't use the manifest resource to deploy anything which depends on a CRD, which greatly limits its usefulness.

@redwyre

redwyre commented Oct 31, 2021

This is a deep issue, because any time you want to reference a CRD, you need to split your terraform project into something that can be applied separately, e.g. using Traefik with CRDs means I need a plan to install Traefik first, then another to set up all my services.

I don't know what the solution would be here, I'd guess either terraform needs the ability to run a multi-stage plan, or have much better support for splitting a project into pieces.
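For illustration, the split described here could be laid out as two root modules applied one after the other. This is only a sketch: the directory names are hypothetical, and the chart repository URL, the `traefik.io/v1alpha1` API group, and the service names are assumptions that depend on the Traefik version in use.

```hcl
# --- stage-1/main.tf: applied first; installs Traefik and its CRDs ---
resource "helm_release" "traefik" {
  name             = "traefik"
  repository       = "https://traefik.github.io/charts" # assumed chart repository
  chart            = "traefik"
  namespace        = "traefik"
  create_namespace = true
}

# --- stage-2/main.tf: a separate root module with its own state, ---
# --- planned and applied only after stage-1 has been applied      ---
resource "kubernetes_manifest" "ingress_route" {
  manifest = {
    apiVersion = "traefik.io/v1alpha1" # assumed; older releases use another group
    kind       = "IngressRoute"
    metadata = {
      name      = "example"
      namespace = "default"
    }
    spec = {
      entryPoints = ["web"]
      routes = [{
        match    = "Host(`example.internal`)"
        kind     = "Rule"
        services = [{ name = "example", port = 80 }]
      }]
    }
  }
}
```

Because the second module is planned against a cluster where the CRD already exists, the provider can resolve the IngressRoute kind during planning.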

@zorgzerg

For anyone having the same issue:

I wanted to install cert-manager using Helm and deploy a ClusterIssuer manifest in the same terraform apply step. The ClusterIssuer obviously depends on the cert-manager.io/v1 apiVersion, therefore the plan failed due to SSA.

I'm now using the kubectl provider for the ClusterIssuer Resource and combined with a depends_on meta argument on my cert-manager Helm release the plan and apply went through :)

I have the same problem when deploying cert-manager with two resources, a Helm release and a ClusterIssuer, and WTF???

╷
│ Error: Failed to determine GroupVersionResource for manifest
│ 
│   with kubernetes_manifest.cluster_issuer,
│   on main.tf line 50, in resource "kubernetes_manifest" "cluster_issuer":
│   50: resource "kubernetes_manifest" "cluster_issuer" {
│ 
│ cannot select exact GV from REST mapper
╵
ERRO[0011] 1 error occurred:
        * exit status 1

We need a fix! Badly.

@jufa2401

jufa2401 commented Dec 1, 2021

@alexsomesan sorry for pinging, but have you seen this issue?

@EugenMayer

This issue alone renders the manifest resource entirely useless IMHO. There is no good way, for now, to apply CRDs in an isolated way, so that one can e.g. deploy the CRDs of all the helm charts like traefik, chunkydata, minio-operator ... and a lot more (since CRDs are now used everywhere).

One has to manually pull all the CRDs from the projects' helm/kustomize charts into some kind of CRD-only resources (a cumbersome and fragile task, esp. to maintain), then apply those in an extra, isolated terraform run, and only then start using the manifest resource to create CRs.

It really reminds me of the 'typescript typings' issue back in the day. Now we will probably see all the projects grind out the CRDs in custom, isolated helm charts/kustomize repositories. Then (maybe) a common standard will follow so that CRDs can be 'looked up' by kubectl automagically (and installed) .. and until then terraform kubernetes manifest will rely on this 2-step process of installing CRDs first.

This is by no means the fault of terraform / kubernetes manifest here; it is what happens when you introduce strong typings late in the game (looks at kubernetes). ConfigMaps (no typing) were the way to go, then we got CRDs and now we have the issue of 'typings first, then declaration of instances'.

The same issue/mistake was made with typescript / typings - or maybe it's just the way those huge things evolve: step by step, with a lot of painful intermediate steps in between.

That said, considering the long road until those CRDs can be installed in an isolated way, this resource is basically a playground IMHO. I'm not sure it makes sense to maintain this huge (and nice) little thing here until then - but of course that is not up to me to judge.

We would love to use manifests so much! But I guess we need to wait for the helm/kustomize/kubernetes projects to make up their minds about CRDs and 'typings' first.

@alekc

alekc commented Jan 17, 2022

One possible solution in the case of argocd (for those who end up here through google) is to use terraform's target option.

This is how I am doing it atm, with a 4-stage eks cluster bootstrapping:

RuntimeArgoCDPlan:
  stage: runtime_argocd_plan
  extends:
    - .runtimeJob
  #  needs:
  #    - ClusterApply
  script:
    - gitlab-terraform init
    - gitlab-terraform validate
    - gitlab-terraform plan -target=module.runtime.helm_release.argocd
    - gitlab-terraform plan-json -target=module.runtime.helm_release.argocd
  artifacts:
    paths:
      - ${TF_ROOT}/plan.cache
    reports:
      terraform: ${TF_ROOT}/plan.json

RuntimeArgoCDApply:
  stage: runtime_argocd_apply
  extends:
    - .runtimeJob
  needs:
    - RuntimeArgoCDPlan
  script:
    - gitlab-terraform init
    - gitlab-terraform apply -target=module.runtime.helm_release.argocd
  when: manual

RuntimePlan:
  stage: runtime_plan
  extends:
    - .runtimeJob
  needs:
    - RuntimeArgoCDApply
  script:
    - gitlab-terraform init
    - gitlab-terraform validate
    - gitlab-terraform plan
    - gitlab-terraform plan-json
  artifacts:
    paths:
      - ${TF_ROOT}/plan.cache
    reports:
      terraform: ${TF_ROOT}/plan.json

RuntimeApply:
  stage: runtime_apply
  extends:
    - .runtimeJob
  needs:
    - RuntimePlan
  script:
    - gitlab-terraform init
    - gitlab-terraform apply
  when: manual

Basically, on the first run you install only the argocd helm chart, and then everything else is deployed through argo manifests (which support unknown CRDs if they are in the helm chart).

Since this solution uses --target, it's not necessary to split the project and state.

ArchiFleKs added a commit to particuleio/terraform-kubernetes-addons that referenced this issue Jan 27, 2022
Because of hashicorp/terraform-provider-kubernetes#1367

First deployment of VolumeSnapshotClass fail because the CRDs does not
exist yet.

Fixes #807

Signed-off-by: Kevin Lefevre <kevin@particule.io>
@rlees85

rlees85 commented Jan 27, 2022

Just going to add the Kustomization provider doesn't have this problem either: https://registry.terraform.io/providers/kbst/kustomization/latest/docs

You can deploy the CRDs using Helm or whatever and then the CRs themselves by using a Kustomization that depends on the Helm release.
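A rough sketch of that setup, assuming the `kustomization_build` data source and `kustomization_resource` resource of the kbst/kustomization provider. The chart path, the kustomization directory, and the release name are placeholders.

```hcl
terraform {
  required_providers {
    helm          = { source = "hashicorp/helm" }
    kustomization = { source = "kbst/kustomization" }
  }
}

# Placeholder: the release that ships the CRDs.
resource "helm_release" "operator" {
  name  = "operator"
  chart = "${path.module}/charts/operator" # assumed local path
}

# Builds the kustomization from local files; this step does not
# need the CRDs to exist in the cluster.
data "kustomization_build" "custom_resources" {
  path = "${path.module}/custom-resources" # assumed dir containing kustomization.yaml
}

# One Terraform resource per Kubernetes object in the build output.
resource "kustomization_resource" "custom_resources" {
  for_each = data.kustomization_build.custom_resources.ids
  manifest = data.kustomization_build.custom_resources.manifests[each.value]

  depends_on = [helm_release.operator]
}
```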

Would also like to be able to do this directly with a kubernetes_manifest. Would be so much cleaner.

@ArchiFleKs

ArchiFleKs commented Jan 27, 2022

The kubectl provider also seems to work for this use case. I reverted to it here instead of kubernetes_manifest, which was not able to deploy because the CRDs are created by the dependencies.

@DaleyKD

DaleyKD commented Aug 9, 2022

I can't believe I'm going to have to add a totally separate provider just to do manifests.

@primeos-work

Ok, so this is currently the 3rd most upvoted issue of this provider: https://github.com/hashicorp/terraform-provider-kubernetes/issues?q=is%3Aissue+is%3Aopen+sort%3Areactions-%2B1-desc

The main perceived problem for this issue seems to be Terraform's "limitation" that everything needs to be planned in advance. However, technically, this doesn't seem to be a blocker for this issue. This should be proven by the fact that other providers do reportedly already enable this common use case (e.g., kbst/kustomization and gavinbunney/kubectl).

The actual limitation here seems to be that this provider wants to validate CRs during the planning phase (i.e., a design choice by this provider). This validation is obviously only possible if the CRD already exists. Therefore, I propose the following solution: We add a validate (better names are welcome) argument for the kubernetes_manifest resource. This argument would default to true, in which case the behavior remains exactly as is. However, if the user chooses to set validate = false, the provider would first check if the resource already exists and, if this isn't the case, the planning stage would simply conclude that the resource needs to be created and not do any further validation (i.e., the user is responsible for the correctness of the provided data and for specifying the correct dependencies so that the resource always gets created after the CRD - this should be documented and otherwise it will simply fail during the apply phase). This should also be the only behavior that needs to be altered. If the CR already exists then the CRD also already has to exist. There might be some additional things to consider when implementing this (I'm not yet familiar with the code, unfortunately) but I don't think it should make this provider worse / cause additional limitations (especially since the current design already has quite a few problems/limitations: destroying CRDs will also destroy the associated CRs (which this provider cannot track/handle properly), the CRD could be destroyed/altered between the plan and apply phases, the provider fails if the CR already exists on K8s but not in Terraform's state file, dependency tracking between CRDs and CRs is left to the user, and there are likely even more issues/limitations).

What do you think of this proposal/idea? Does this seem like a good idea or did I miss anything relevant (like new issues this validation bypass could create, technical limitations, etc.)? It would be great if we could find a way to finally resolve this issue/limitation as this seems to be quite a major limitation of this provider and something that should be avoidable. Feedback is welcome :)

@alexsomesan
Member

alexsomesan commented Sep 6, 2022

Hi @primeos-work!

Thanks for taking the time to analyse this problem and make such detailed suggestions!
In order to move this conversation further in a constructive way, please let me make some corrections to your assumptions about how the provider is implemented and why.

The reason why the provider needs access to the CRD during planning is not for validation purposes (in fact, validation is still a missing feature in the manifest resource), but rather because the provider needs to establish the structure (aka schema) of the resource so that Terraform can correctly persist the state for it. Terraform's state storage and the provider protocol are strongly typed. The entire structure of a resource, including the types of all its attributes, needs to be known to Terraform from the creation of the resource and Terraform will assume it stays constant once it's been defined.

The only way we can fully describe a CR resource to Terraform at creation time is by looking up its schema details in its parent CRD. Unless we do this, all the structural information we would have about the CR is the set of attributes the user included in the configuration. This is not a full description of the resource, but rather a subset of it. The minute the user then updates the resource adding a new, previously unspecified attribute value, Terraform will fail and report inconsistent state structure. This will lead to corrupt state storage and will make any update operations on the CR resource impossible.

Because of the above reasons, we cannot avoid requesting the CRD structure during planning.

@primeos-work

Hey @alexsomesan, thanks a lot for your fast and detailed response with the corrections!
My bad, I already feared it couldn't be quite that simple, but let's try to find a new way then.

because the provider needs to establish the structure (aka schema) of the resource so that Terraform can correctly persist the state for it

Does this step necessarily have to happen during the planning stage or would it, e.g., be possible to use only the set of attributes the user included in the configuration for the plan, if the CRD lookup fails, and then perform the CRD lookup during the apply stage (when creating the CR)? That way, the final state would be correct and only the plan should change. For correctness, the apply could even fail if the CR already exists during the apply stage as it was manually/externally created since the planning (IIRC this is even already the current behavior). So from a theoretical standpoint this approach should mostly keep the correctness of applying the plan (the only issue I see is if the CRD changes between the plan and apply stages - I'll try to test the current behavior later - this might be acceptable though because the CRD is versioned and there shouldn't occur any breaking changes anyway). Would this be something that could be implemented or are there additional technical challenges/restrictions (e.g., the mentioned provider protocol - I'm not sure if the creation you mentioned refers to the creation of the full description/specification/structure during the planning or the creation of the actual object during the application of the plan)?

@alexsomesan
Member

alexsomesan commented Sep 7, 2022

@primeos-work, unfortunately it's still not that simple (as you realised yourself).

Let's first consider what is the point of having a "plan" step in Terraform. It is to present the user with an accurate preview of exactly what is going to be performed during "apply". This is to give the user a chance to vet the proposed changes and confirm or abort before anything is touched. Once a "plan" is confirmed by the user, "apply" will only perform exactly what was "promised" during the plan, not more not less. That means we establish a trust contract between Terraform and the user. This is a major differentiator of Terraform against other tools that might at first glance seem "simpler". I can appreciate that maybe this value isn't immediately recognised by new Terraform users, but that is a whole other conversation.

Because the plan is a "promise" that should not be broken during apply, Terraform enforces consistency checks on values and types between the plan and the apply result. One of these checks is for schema types of resources and attributes to match between plan and apply. Missing or extra attributes between plan and apply will fail this check and in turn the whole operation fails. This is why there is no other option but to fully construct and "promise" the type of the resource during planning. Doing it later will be rejected by Terraform's consistency checks.

Terraform isn't trying to be unreasonably pedantic here. There is a very solid point about keeping types consistent like this. Consider some resource depending on a kubernetes_manifest resource. You might want to interpolate the value of one of the manifest attributes into that other resource's configuration. During planning, Terraform will decide on the ordering of operations on these resources as well as the values displayed for attributes of the dependent resource based on the availability of a value for the upstream manifest attributes. But if it doesn't know if there even is an attribute there, how can it make any promises about the structure of the dependent resource? Without type checking, it might expect a string value and instead get a set of booleans at apply time. This would render the whole promise of the plan useless and thus the whole point (and effort) of having a planning mechanism in Terraform.
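As an illustration of that kind of dependency, here is a small sketch in which one manifest interpolates an attribute of another through the `object` attribute that kubernetes_manifest exports. The cert-manager kinds, names, and namespace are only examples and are not taken from this thread; the configuration assumes the cert-manager CRDs are already installed in the cluster.

```hcl
resource "kubernetes_manifest" "cluster_issuer" {
  manifest = {
    apiVersion = "cert-manager.io/v1"
    kind       = "ClusterIssuer"
    metadata   = { name = "selfsigned" }
    spec       = { selfSigned = {} }
  }
}

resource "kubernetes_manifest" "certificate" {
  manifest = {
    apiVersion = "cert-manager.io/v1"
    kind       = "Certificate"
    metadata = {
      name      = "example"
      namespace = "default"
    }
    spec = {
      secretName = "example-tls"
      dnsNames   = ["example.internal"]
      issuerRef = {
        kind = "ClusterIssuer"
        # Typed reference into the upstream resource: Terraform has to know
        # during planning that object.metadata.name exists and is a string.
        name = kubernetes_manifest.cluster_issuer.object.metadata.name
      }
    }
  }
}
```

The reference also gives Terraform the ordering between the two resources without an explicit depends_on.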

In conclusion, if the planning phase is of little value to one's use-case then there are alternative providers that decided to opt out of offering these guarantees (a few mentioned above in this thread). These providers just treat the resources as blocks of unstructured text (from Terraform's POV) that they just throw at the API. While they do provide a seemingly simpler first-use experience, they will soon fall short when it comes to combining the resources in more complex configurations.

There is no point in us creating yet another provider with the same limitations. Instead, this provider tries to offer as much of Terraform's value propositions as possible and that includes properly following the plan-apply workflow. The solution for the problem we are discussing here is best solved in Terraform itself, in order to preserve all the guarantees Terraform is designed to offer. We are having ongoing conversations with the Terraform team about approaches to it and they are experimenting with various solutions. Once they settle on an approach, we will be discussing it with the wider audience.

@primeos-work

Thanks again for all of your insights @alexsomesan!

If I may share my thoughts (from a user's perspective) on Terraform's planning and state features / design decisions (a bit out of scope here but I feel like it's relevant - then again it shouldn't be something new): I do see the value of Terraform’s planning and it does indeed provide many great advantages (that I like/appreciate as a user). However, it obviously also comes with many drawbacks that can be quite a PITA (if I may say so), especially when first working with Terraform. But as you already mentioned it can be a huge benefit in the long run. Overall, I’m quite split on this design decision. I do find it nice/useful but its strictness also often seems to limit practicability (or at least comfort / ease of use; in addition, it would be nice if diverged state could be imported semi-automatically / detected better, etc.). Terraform’s strict and complete state tracking especially seems a bit out of place or at least redundant in cases like K8s where the complete, declarative state is already available (sure, it still offers advantages / additional features but I'd say in such cases it provides fewer advantages and more drawbacks than in other cases). This will also result in multiple issues/annoyances if the K8s cluster’s state diverges from what Terraform thinks it still is but this divergence should obviously be avoided by the user in the first place. And even with all of the nice planning features you can still do things like using the kubernetes_manifest resource to change a CRD in a breaking way (or even deleting it) which can cause lots of issues for CRs (other kubernetes_manifest resources) that depend on this CRD (e.g., causing Terraform to fail later in the apply operation or in the next plan/apply operation without any additional changes). This can rightfully be considered misuse but in a perfect world without misuse Terraform could be much simpler (e.g., I love typing but it's kinda only there to prevent misuse).

-> Anyway, what I’m trying to say is that it might be nice if Terraform could be a bit less strict in some places (in the sense of “perfect is the enemy of good”) but that’s obviously also a very dangerous thing to do and needs to be considered very carefully (to avoid making things worse / opening Pandora's box).

Regarding the motivation of the current behavior/design (dependencies, variable interpolation, promises, etc.):

But if it doesn't know if there even is an attribute there, how can it make any promises about the structure of the dependent resource?

In that case it shouldn’t (and I'd say that's fine). IMO the normal behavior of kubernetes_manifest should remain unchanged and there should either be an additional argument (like the previously proposed validate argument that needs a better name) or another resource for this special “mode”/hack where the CRD cannot be looked up. In that case the resource would either not output anything (which would be the easiest and should be fine for users) or it would only output the attributes specified in the Terraform configuration (which I wouldn’t suggest as it comes with a lot of challenges (having to infer the types and trying to validate them once the CRD is available) and one should be able to use Terraform variables, data sources, etc. instead). This would trade one feature for another but I think it should be fine for most if not all use cases here (the main use case that I see would be to depend on a default value from the CRD that isn’t part of the Terraform configuration but in practice there should always be pretty acceptable workarounds like making that value part of the Terraform configuration and setting that attribute explicitly).

This would render the whole promise of the plan useless and thus the whole point (and effort) of having a planning mechanism in Terraform.

So with an implementation as described above this problem should be avoided (the plan wouldn't be complete but this is already the case as Terraform cannot know everything in advance anyway) and only some features of kubernetes_manifest would be limited. Overall, I’d still consider it a significant improvement as it should be better than the current alternatives (using unverified 3rd-party providers with less elegant approaches, using the -target option, splitting the Terraform configuration up into multiple root modules (so that the CRDs are created before the CRs), or using even worse hacks (I've seen some that likely shouldn't even be shared)).

In conclusion, if the planning phase is of little value to one's use-case then there are alternative providers that decided to opt out of offering these guarantees (a few mentioned above in this thread). These providers just treat the resources as blocks of unstructured text (from Terraform's POV) that they just throw at the API. While they do provide a seemingly simpler first-use experience, they will soon fall short when it comes to combining the resources in more complex configurations.

I agree with all of that. Tbh it’d be nice though to have a more official one (in terms of reviews/verifications, maintenance / community support, etc.) – but that is of course out-of-scope for this issue (and a challenge in general).

The solution for the problem we are discussing here is best solved in Terraform itself, in order to preserve all the guarantees Terraform is designed to offer. We are having ongoing conversations with the Terraform team about approaches to it and they are experimenting with various solutions. Once they settle on an approach, we will be discussing it with the wider audience.

That sounds very interesting! Thanks a lot for looking into this :) I’m excited to learn more about this so, if possible, please do share a link here once that discussion is public (unless it'll be discussed in this issue anyway).

I hope my thoughts/suggestions here (from a user's perspective) do provide some value and that some parts of it might be considered. I wish I could be of more help but unfortunately my knowledge of Terraform, the code/implementation, design decisions, data structures, protocols, etc. is still far too limited for more concrete and reasonable ideas.

@ghost

ghost commented Oct 12, 2022

In my case, I'm using the helm provider to install fluxcd and trying to make a kubernetes_manifest with a GitRepository object.

Simply adding a skip_kind_check = true would be enough to avoid this problem.

@thechubbypanda

thechubbypanda commented Dec 5, 2023

May have been said already but another potential way of solving this would be to:

  • Allow a resource to dictate that it provides X CRD.
  • Then have terraform just trust the provides tag until apply phase.
  • If during apply, after applying the provider, the resource does not exist, then fail.

Does this make sense?

@andrewhharmon

any update on this issue?

@jmgilman

jmgilman commented Apr 9, 2024

The kubectl provider has been abandoned and there's an existing issue where non-namespaced resources fail to get created. This prevents one of the primary workarounds to the issue documented here.

Is it possible to get an update on whether this is planned to be addressed?

@froblesmartin

The kubectl provider has been abandoned and there's an existing issue where non-namespaced resources fail to get created. This prevents one of the primary workarounds to the issue documented here.

Is it possible to get an update on whether this is planned to be addressed?

@jmgilman check https://github.com/alekc/terraform-provider-kubectl, I haven't used it, but someone mentioned it in an issue of the other repository

@YarekTyshchenko

Are there any open PRs that address this issue? It looks like the validation should treat a missing CRD as if the resource itself was missing, as in k8s it's not possible to have a resource created with a missing CRD (iirc).

@charlierm

Any update on this at all? It means we're having to put everything in separate stacks.

@pepsipu

pepsipu commented Jun 2, 2024

this sent me back to pulumi 😭

@atheiman

atheiman commented Jun 3, 2024

@alexsomesan could the kubernetes provider include a second resource (let's just call it kubernetes_manifest_functional) that does not make all the plan promise guarantees you describe but does satisfy all the use cases defined here?

@YarekTyshchenko

YarekTyshchenko commented Jun 3, 2024

@alexsomesan could the kubernetes provider include a second resource (let's just call it kubernetes_manifest_functional) that does not make all the plan promise guarantees you describe but does satisfy all the use cases defined here?

But this defeats the point as you'd basically never be able to use the non functional version.

IMO most of the "errors" should be folded into manifest not existing:

  • No connection to cluster
  • No CRD
  • No resource instance

Edit: I of course agree that this is best solved inside Terraform itself, but if wishes were fishes.

@fabn

fabn commented Jun 8, 2024

Really, after years we still have this issue?

Why wouldn't this work?

May have been said already but another potential way of solving this would be to:

  • Allow a resource to dictate that it provides X CRD.

  • Then have terraform just trust the provides tag until apply phase.

  • If during apply, after applying the provider, the resource does not exist, then fail.

Does this make sense?

@gagbo

gagbo commented Jun 25, 2024

So meanwhile the only way to resolve the issue is to comment out the kubernetes_manifest resources that rely on CRDs, apply, and then uncomment the manifests and reapply? Not sure I’m understanding this correctly.

@alekc

alekc commented Jun 25, 2024 via email

@heschlie
Author

It's not far off the 3-year mark since I opened this issue. Even back then we were opposed to deploying k8s resources via Terraform, but deploying Flux with it to help bootstrap the cluster seemed like a good idea. This was before Flux offered a provider to do the bootstrapping; we migrated to that once it was available, IIRC.

I suppose the lesson here is: don't manage your k8s resources in TF unless you absolutely have to. This is probably not a surprise to most folks, but how k8s manages its resources isn't very compatible with TF.

@alekc

alekc commented Jun 25, 2024 via email

@sdemjanenko

sdemjanenko commented Jul 19, 2024

Reflecting on the previous comment here, I propose that Terraform could benefit from supporting multiple plan + apply cycles. Specifically, if Terraform detects that a Custom Resource (CR) cannot be planned because its CRD isn’t yet installed, it could defer the CR to a subsequent cycle, provided that it can still make progress in the current cycle.

During the apply phase, Terraform would only execute the actions from the completed plan and then indicate if additional cycles are needed.

This approach would be more efficient than the current workarounds, which involve separating resources into different sets of Terraform files, or temporarily commenting out parts of the code or using conditional logic to manage resource dependencies. Such enhancements would streamline the development process, allowing Terraform to handle resource ordering more intuitively and reducing the manual effort required to ensure resources are applied in the correct order.
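For reference, the "conditional logic" workaround mentioned above might look roughly like the sketch below. The variable name, the resource name, and the `example.com/v1` custom resource are all hypothetical placeholders, not something taken from this thread.

```hcl
# Hypothetical toggle: leave it false for the first apply (CRDs only),
# then set it to true once the CRDs exist in the cluster.
variable "crds_installed" {
  type    = bool
  default = false
}

resource "kubernetes_manifest" "example_cr" {
  # With count = 0 no instance is planned, so the provider
  # does not need to look up the CRD during plan.
  count = var.crds_installed ? 1 : 0

  # Placeholder custom resource; apiVersion and kind are assumptions.
  manifest = {
    apiVersion = "example.com/v1"
    kind       = "Example"
    metadata = {
      name      = "demo"
      namespace = "default"
    }
    spec = {}
  }
}
```

Under these assumptions, a first `terraform apply` would create the CRDs, and a second run with `-var="crds_installed=true"` would add the custom resource. It is still two apply cycles, just without editing the files in between.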

Update:

@voronin-ilya

The best workaround I've found is to package the custom resources YAMLs into a small Helm chart, bundle it with Terraform module code, and then install it using the helm_release resource:

resource "helm_release" "custom_resources" {
  # Helm release names may not contain underscores, so use a hyphen here.
  name  = "custom-resources"
  chart = "${path.module}/custom_resources"

  depends_on = [
    helm_release.crds
  ]
}
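The snippet above depends on a `helm_release.crds` resource that isn't shown. A minimal sketch of what it could look like follows, assuming the CRDs are shipped by some upstream chart; the release name, repository URL, and chart name are placeholders rather than values from this thread.

```hcl
# Hypothetical release that installs the chart shipping the CRDs.
# All three values below are placeholders.
resource "helm_release" "crds" {
  name       = "example-operator"
  repository = "https://charts.example.com"
  chart      = "example-operator"
}
```

As far as I understand it, this ordering works because Helm renders and submits the templates at apply time, so the plan should not need the CRD to exist yet.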

@swissbuechi

The best workaround I've found is to package the custom resources YAMLs into a small Helm chart, bundle it with Terraform module code, and then install it using the helm_release resource:

resource "helm_release" "custom_resources" {
  # Helm release names may not contain underscores, so use a hyphen here.
  name  = "custom-resources"
  chart = "${path.module}/custom_resources"

  depends_on = [
    helm_release.crds
  ]
}

I did exactly the same thing for the cert-manager clusterissuer.

cert-manager.tf
charts/cert-manager-clusterissuer
├── Chart.yaml
└── templates/
    ├── cluster_issuer_prod.yaml
    └── cluster_issuer_staging.yaml

cert-manager.tf:

locals {
  solvers_ingress_class_name = "ingress-nginx"
}

resource "kubernetes_namespace_v1" "cert_manager" {
  metadata {
    name = "cert-manager"
  }
}

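resource "helm_release" "cert_manager" {
  # Release name and namespace both reuse the namespace resource, which also
  # makes the release implicitly depend on the namespace being created first.
  name        = kubernetes_namespace_v1.cert_manager.metadata[0].name
  repository  = "https://charts.jetstack.io"
  chart       = "cert-manager"
  version     = var.cert_manager_helm_version
  namespace   = kubernetes_namespace_v1.cert_manager.metadata[0].name
  max_history = 1
  # Have the chart install the cert-manager CRDs along with the controller.
  set {
    name  = "installCRDs"
    value = "true"
  }
}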

resource "helm_release" "cert_manager_clusterissuer" {
  name        = "cert-manager-clusterissuer"
  chart       = "${path.module}/charts/cert-manager-clusterissuer"
  max_history = 1
  set {
    name  = "acme_email"
    value = var.acme_email
  }
  set {
    name  = "solvers_ingress_class_name"
    value = local.solvers_ingress_class_name
  }
  depends_on = [
    helm_release.cert_manager
  ]
}
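The configuration above references two input variables that aren't declared in the snippet. A minimal sketch of the declarations, with the types and descriptions being my assumptions:

```hcl
# Assumed declaration; the thread does not show the original.
variable "cert_manager_helm_version" {
  description = "Version of the cert-manager Helm chart to install."
  type        = string
}

# Assumed declaration; passed to the chart as .Values.acme_email.
variable "acme_email" {
  description = "Contact email registered with the ACME server."
  type        = string
}
```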

Chart.yaml:

apiVersion: v2
name: cert-manager-clusterissuer
version: 0.1.0

cluster_issuer_prod.yaml:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: {{ .Values.acme_email }}
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          ingressClassName: {{ .Values.solvers_ingress_class_name }}

cluster_issuer_staging.yaml:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: {{ .Values.acme_email }}
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
    - http01:
        ingress:
          ingressClassName: {{ .Values.solvers_ingress_class_name }}

@gmestre-freiheit

This issue has been open for more than 3 years with no viable solution and, as most have already said here, this renders the provider useless for half of its use cases.
I believe this is still an issue that must be fixed and not just accepted.

If time is the issue, I can invest some. Is there a maintainer available with whom a solution could be discussed?
Maybe we could schedule a call, go over the multiple solutions described here and their drawbacks, and arrive at a solution that could be implemented.
