
Add a limited generic YAML resource to the provider #195

Closed
wants to merge 15 commits

Conversation

lawrencegripper

Fixes #183

Hi,

First up, hear me out - I think this is a good approach with serious upside for a large number of users. I know this has been looked at before; I've spent time reading those threads and hope that this approach is different enough to warrant consideration.

What doesn't this solve?

Creation of standard types (services, pods, etc.). I propose that the preferred method for these remains the existing rich-schema resources.

What does it solve?

This allows users to create other, more exotic types in Kubernetes through Terraform. The main aim is to support CRDs (I'm currently testing with an ingress resource but plan to test extensively against CRDs).

It follows a similar model to the existing Azure TF provider, in which rich types are provided but a 'fallback' lets users supply JSON directly to the Azure APIs. Like that resource, it has heavy limitations (updates are create/deletes, the YAML must contain only a single resource, and there are no outputs).

Why is it different from past attempts?

The code doesn't require any knowledge of the resources it's creating. Using the UniversalDecoder and PartialObjectMetadata along with the discovery client and RESTClient, it is able to parse the YAML provided, validate that the Kubernetes cluster can support the type defined, and create a RESTClient capable of performing CRUD operations on the resource in K8s.
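For illustration, here is a minimal sketch of that flow, assuming client-go and apimachinery (the helper name and exact path logic are mine, not the PR's code):

```go
import (
	"fmt"

	metav1beta1 "k8s.io/apimachinery/pkg/apis/meta/v1beta1"
	"k8s.io/client-go/discovery"
	"sigs.k8s.io/yaml"
)

// resolveResourcePath decodes only the type and object metadata from a
// manifest, then asks the cluster (via discovery) whether it serves that
// kind, returning the REST path used for CRUD on the resource.
func resolveResourcePath(manifest []byte, dc discovery.DiscoveryInterface) (string, error) {
	metaObj := &metav1beta1.PartialObjectMetadata{}
	if err := yaml.Unmarshal(manifest, metaObj); err != nil {
		return "", err
	}
	gv := metaObj.APIVersion // e.g. "extensions/v1beta1"
	resources, err := dc.ServerResourcesForGroupVersion(gv)
	if err != nil {
		return "", fmt.Errorf("cluster does not serve %s: %+v", gv, err)
	}
	for _, r := range resources.APIResources {
		if r.Kind == metaObj.Kind {
			// Core-group ("v1") and cluster-scoped resources use slightly
			// different paths; omitted here for brevity.
			return fmt.Sprintf("/apis/%s/namespaces/%s/%s", gv, metaObj.Namespace, r.Name), nil
		}
	}
	return "", fmt.Errorf("kind %s not found in %s", metaObj.Kind, gv)
}
```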

This means that, like the ARM Deployment resource, it can support new K8s resources going forward without code changes.

Secondly, the code avoids odd parsing and reflection over the YAML and the cluster's version of the resource to spot diffs. It uses a CustomizeDiff function along with the ResourceVersion and UID fields present on K8s resources. If any change is made to the K8s resource outside of TF, it will be picked up by the diff and the resource will be recreated in the correct configuration.
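A rough sketch of that idea using the 0.11-era helper/schema SDK (readLiveResourceVersion is a hypothetical helper standing in for the provider's read logic, not this PR's code):

```go
import (
	"github.com/hashicorp/terraform/helper/schema"
)

func customizeDiff(d *schema.ResourceDiff, meta interface{}) error {
	liveVersion, err := readLiveResourceVersion(d.Id(), meta) // hypothetical helper
	if err != nil {
		return err
	}
	// Any edit made outside Terraform bumps the object's resourceVersion;
	// flag the attribute as changed and force recreation so Terraform
	// restores the declared configuration.
	if d.Get("resource_version").(string) != liveVersion {
		if err := d.SetNewComputed("resource_version"); err != nil {
			return err
		}
		return d.ForceNew("resource_version")
	}
	return nil
}
```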

Status

Currently this is manually tested. The code is commented to aid understanding, but I haven't updated the provider docs. If there is interest in taking this forward I will:

  • Add Documentation
  • Add Tests (CRDs and review for edge cases)
  • Tidy up code

@ghost added the size/XL label Oct 26, 2018
@pdecat
Contributor

pdecat commented Oct 26, 2018

That would be a great addition!

What about using JSON instead of YAML, to be able to leverage Terraform's jsonencode() function to configure the resource (and the soon-to-come jsondecode() function)?

Maybe both languages could be supported, and the resource named something like kubernetes_generic with mutually exclusive json_body and yaml_body fields.

Edit: the current implementation does not seem too tied to YAML, so maybe just a body field with automatic encoding detection could do it.

@paultyng
Contributor

@pdecat JSON is valid YAML, so I think in theory it would just work.
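For instance, the same decode call accepts both encodings; a minimal, runnable sketch using sigs.k8s.io/yaml (not necessarily this PR's exact decoder):

```go
package main

import (
	"fmt"

	"sigs.k8s.io/yaml"
)

func main() {
	yamlDoc := []byte("kind: ConfigMap\napiVersion: v1\nmetadata:\n  name: demo")
	jsonDoc := []byte(`{"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"demo"}}`)

	var fromYAML, fromJSON map[string]interface{}
	// The same Unmarshal call handles both inputs, since every JSON
	// document is also a valid YAML document.
	if err := yaml.Unmarshal(yamlDoc, &fromYAML); err != nil {
		panic(err)
	}
	if err := yaml.Unmarshal(jsonDoc, &fromJSON); err != nil {
		panic(err)
	}
	fmt.Println(fromYAML["kind"], fromJSON["kind"]) // ConfigMap ConfigMap
}
```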

@pdecat
Contributor

pdecat commented Oct 26, 2018

@paultyng that's what I figured while reviewing the code; it is not much tied to YAML.

// Create the resource in Kubernetes
err = client.Post().AbsPath(absPath["POST"]).Body(rawObj).Do().Into(metaObj)
if err != nil {
return fmt.Errorf("failed to create resource in kuberentes: %+v", err)

Typo

Suggested change
- return fmt.Errorf("failed to create resource in kuberentes: %+v", err)
+ return fmt.Errorf("failed to create resource in kubernetes: %+v", err)

// defined in the YAML
client, absPath, rawObj, err := getRestClientFromYaml(yaml, meta.(KubeProvider))
if err != nil {
return fmt.Errorf("failed to create kuberentes rest client for resource: %+v", err)

Typo

Suggested change
- return fmt.Errorf("failed to create kuberentes rest client for resource: %+v", err)
+ return fmt.Errorf("failed to create kubernetes rest client for resource: %+v", err)

// defined in the YAML
client, absPaths, _, err := getRestClientFromYaml(yaml, meta.(KubeProvider))
if err != nil {
return fmt.Errorf("failed to create kuberentes rest client for resource: %+v", err)

Suggested change
- return fmt.Errorf("failed to create kuberentes rest client for resource: %+v", err)
+ return fmt.Errorf("failed to create kubernetes rest client for resource: %+v", err)


client, absPaths, _, err := getRestClientFromYaml(yaml, meta.(KubeProvider))
if err != nil {
return fmt.Errorf("failed to create kuberentes rest client for resource: %+v", err)

Suggested change
- return fmt.Errorf("failed to create kuberentes rest client for resource: %+v", err)
+ return fmt.Errorf("failed to create kubernetes rest client for resource: %+v", err)

metaObj := &meta_v1beta1.PartialObjectMetadata{}
err = client.Delete().AbsPath(absPaths["DELETE"]).Do().Into(metaObj)
if err != nil {
return fmt.Errorf("failed to delete kuberentes resource '%s': %+v", metaObj.SelfLink, err)

Suggested change
- return fmt.Errorf("failed to delete kuberentes resource '%s': %+v", metaObj.SelfLink, err)
+ return fmt.Errorf("failed to delete kubernetes resource '%s': %+v", metaObj.SelfLink, err)


client, absPaths, _, err := getRestClientFromYaml(yaml, meta.(KubeProvider))
if err != nil {
return false, fmt.Errorf("failed to create kuberentes rest client for resource: %+v", err)

Suggested change
- return false, fmt.Errorf("failed to create kuberentes rest client for resource: %+v", err)
+ return false, fmt.Errorf("failed to create kubernetes rest client for resource: %+v", err)

@paultyng
Contributor

paultyng commented Oct 26, 2018

@lawrencegripper thanks for this PR, this is a common ask (see #150, #141, #109, and also terraform-provider-k8s), and definitely something we are interested in doing. We also do something similar for the nomad_job resource, but we have learned lessons from both the way that is implemented and from the Azure resource you mention.

Due to the way the type and diffing system was implemented in Terraform 0.11 and earlier, we couldn't get the UX to work quite the way we wanted. In Terraform 0.12, coupled with improvements to the type system that allow for better nested complex types, we think we can provide this capability in a way that fits much more cleanly with how Terraform users expect it to work (and how Kubernetes natives expect it to work), but it requires changes to the Terraform Provider SDK to enable those new features.

You've also run into some of the complications with the CustomizeDiff work you started. Kubernetes uses its own three-way diffing logic, different from Terraform's. This resource should really replicate that functionality so that it works in the way Kubernetes users expect (the de facto standard being how kubectl apply works). Unfortunately, that diffing logic isn't packaged up in an easy-to-use way, and even though there is work to implement server-side apply, relying on it would limit the usefulness to the Kubernetes versions where that lands and onward.
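For context, a hedged sketch of what that three-way logic looks like via apimachinery's strategicpatch package (the variable sources are descriptive; this is not code from kubectl or this PR). The typed struct supplies the strategic-merge patch metadata, which is part of why the approach doesn't generalize easily to CRDs:

```go
import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/strategicpatch"
)

// threeWayPatch mirrors kubectl apply's inputs: lastApplied comes from the
// kubectl.kubernetes.io/last-applied-configuration annotation, desired is the
// user's new manifest, and live is the current object from the API server.
func threeWayPatch(lastApplied, desired, live []byte) ([]byte, error) {
	return strategicpatch.CreateThreeWayMergePatch(
		lastApplied, desired, live,
		&corev1.Service{}, // typed struct provides the patch-strategy metadata
		false,             // don't silently overwrite conflicting live changes
	)
}
```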

We have an ongoing project internally where we have collected the requirements for this new resource, and we are just waiting for 0.12 to be released (and for the new maintainer of this provider, @alexsomesan, to ramp up somewhat) before working on the proposal for implementation. Once 0.12 has landed and "baked" somewhat, we will probably post the proposal on this repository for community feedback.

@lawrencegripper
Author

@paultyng awesome, great to hear there are improvements that can be made to this when TF 0.12 arrives. In the meantime, I think the UX of the code in this PR does fit the brief of matching the expectations of TF and K8s users.

I've recorded a quick run-through of using the resource. First I use TF to create the resource in the cluster, then I use kubectl to delete the resource and show TF recreating it. Next I edit the resource with kubectl (changing port 80->81) and re-run TF. As the edit has caused the resource_version to change, this is picked up by TF and an update is planned and then applied. The user sees the update reason referencing the resource_version as having changed.

[Screen recording: yamlux]

Regarding the three-way merges and kubectl apply: I spent a bit of time using govendor to pull in the jsonmergepatch logic from kubectl and, after playing with it for a while, discounted that approach (technically it felt doable, although I didn't prove that).

One thing that put me off this approach was that it didn't seem to fit my expectations of how a Terraform provider should work (these may well not be right). I have an expectation that TF follows a desired-configuration approach and will fix any drift between the declared configuration and the actual running state. As such, if I declare a YAML definition for a resource, I expect that to be what is running. I didn't feel users would expect manual edits made to the resource through kubectl to be maintained, as they would be with a three-way merge (for example, if they added additional configuration fields not specified in the YAML in TF).

Another limitation of the three-way merge approach is that the provider has no knowledge of which fields in the YAML support updating. Essentially you would need to dynamically decide which fields are likely to require ForceNew to be set on them, which I didn't see as feasible. kubectl offers a --force option which makes apply fall back to delete-then-create. I looked at using this option, but it creates another issue: in some circumstances TF would update the resource and in others it would recreate it, and when running a plan operation the user wouldn't know which was going to happen; that didn't feel nice.

That being said, if three-way merge support is a blocker to this work being merged, I'm happy to revisit the problem and attempt to vendor in the kubectl apply patch logic for use within the provider.

@paultyng
Contributor

paultyng commented Oct 29, 2018

> One thing that put me off this approach was that it didn't seem to fit my expectations of how a Terraform provider should work (these may well not be right). I have an expectation that TF follows a desired-configuration approach and will fix any drift between the declared configuration and the actual running state. As such, if I declare a YAML definition for a resource, I expect that to be what is running. I didn't feel users would expect manual edits made to the resource through kubectl to be maintained, as they would be with a three-way merge (for example, if they added additional configuration fields not specified in the YAML in TF).

I think the issue with this is that users aren't the only thing editing the YAML. Admission controllers, scaling systems, etc. are all modifying resources, so there is potential for a lot of non-user drift in the configuration.

An ingress is an interesting example: in a cloud provider, if you provision an ingress, I would imagine it gets updated with information about the cloud's ingress system, which could potentially lead to an update loop?

@paultyng
Contributor

paultyng commented Oct 29, 2018

@lawrencegripper
Author

lawrencegripper commented Oct 30, 2018

Damn - good spot, hadn't considered that.

I had a play this morning and I believe I have a solution which will work with mutating admission controllers and other controllers updating the object. Happy to take questions on this, as I went quite far down the rabbit hole and found it quite hard to explain... here goes:

I loop through the fields set in the user-provided YAML, for example the "image" field, and then pull the "image" field from the resource as it was returned to me by Kubernetes after creation in the cluster.

I then build a hash from these values at the time of creation. When TF does a Read to compare state, it rebuilds this hash and compares it against the one stored in state at creation time, triggering an update if there is a diff. In the hash I ignore status and other server-side fields, along with any fields not specified by the user in their YAML. (Note: we can choose here whether to enforce matching on ALL non-server-side fields or only compare those explicitly set.)

For admission controllers, like this example mutating controller I wrote, my expectation is that, as the mutation occurs at the time of resource creation, the RESTClient will receive the mutated object back in the response. This allows TF to deal with a definition that will be mutated on deployment into the cluster, as the hash represents the mutated values.

To test this worked as expected, I added a service to the YAML example with a type of LoadBalancer set. This causes the cloud provider to assign an IP and set it in the status object asynchronously. When just using the ResourceVersion, this would have wrongly triggered an update from TF. With the new hash approach this works as expected: TF ignores the change to the status object.

The end result is that, in my limited testing, it now plays nicely with a mutating admission controller and with updates occurring to the status fields of the resource (or other fields not directly set by the user). I still use the ResourceVersion to optimize the process, only recalculating the YAML hash when there has been a change.
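To make the idea concrete, here's an illustrative sketch of the field hash (my own reconstruction, not the PR's code): walk only the keys the user set in their YAML and hash the corresponding values from the object Kubernetes returned, so status and other server-populated fields never affect the hash.

```go
import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash"
	"sort"
)

func hashUserFields(userFields, liveObj map[string]interface{}) string {
	h := sha256.New()
	walkAndHash(userFields, liveObj, h, "")
	return hex.EncodeToString(h.Sum(nil))
}

func walkAndHash(user, live map[string]interface{}, h hash.Hash, path string) {
	// Sort keys so the hash is deterministic across runs.
	keys := make([]string, 0, len(user))
	for k := range user {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		liveVal, ok := live[k]
		if !ok {
			continue // skip fields the server didn't return
		}
		if userMap, isMap := user[k].(map[string]interface{}); isMap {
			if liveMap, isMap := liveVal.(map[string]interface{}); isMap {
				walkAndHash(userMap, liveMap, h, path+k+".")
				continue
			}
		}
		// Hash the *live* value for keys the user specified, so values a
		// mutating admission controller rewrote at creation are baked in.
		fmt.Fprintf(h, "%s%s=%v;", path, k, liveVal)
	}
}
```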

@paultyng Is this something that you would consider merging assuming I add the appropriate tests to prove out this behavior and update the docs accordingly?

@lawrencegripper
Author

lawrencegripper commented Dec 3, 2018

Hi,

@paultyng and/or @alexsomesan, did you have a chance to review the PR with the new changes? Could you give me a steer on the likelihood of this being merged?

Happy to add tweaks, testing and integration tests if this is something that will be merged.

@alexsomesan
Member

@lawrencegripper Hi there and nice to meet you!
As Paul mentioned, the main reason we are holding off on this feature right now is that we cannot provide a user experience consistent with other Terraform resources using the current provider API of Terraform. As Paul also mentioned, we want to wait for Terraform 0.12 to land and the updates to the provider SDK to stabilize a bit, and then evaluate how they can help us offer the right user experience.
When I talk about UX in this case, what I mean is that we don't want to treat the manifest content as a single attribute; instead we want to be able to present TF-native diffing behavior deep into the manifest, like the other TF resources do now. This includes the ability to create and run plans before apply and have those behave as users expect of TF.

On a different note, I will be at Kubecon in Seattle next week. If you happen to also be there, I would love to have a chat and discuss this and other things. Do let me know if you can.

@lawrencegripper
Author

Makes sense - thanks for the reply and sorry for the slow response over the holidays.

What I might do in the meantime is spin this PR off into a separate provider which people can use while that work is ongoing.

I'm based in the UK so didn't make it over to KubeCon this time round, but let me know if you're ever over in London.

@jhoblitt
Contributor

jhoblitt commented Jan 2, 2019

As the number of K8s resources continues to grow and usage of CRDs becomes more common, an "escape hatch" to be able to use almost arbitrary YAML configs is essentially mandatory.

I'm not a user of kustomize, but it looks like it does attempt to do some ordering of YAML configs: https://github.com/kubernetes-sigs/kustomize/blob/bd83773a1eb6a3853c831d2d9c393a29c0f95483/pkg/gvk/gvk.go#L68-L87. As it is just YAML, it should be possible for a TF YAML resource to do some limited parsing and attempt to auto-determine dependencies, as in the sketch below.
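A minimal sketch of that kind-ordering idea (the priority list here is illustrative and abridged, not kustomize's exact one):

```go
import "sort"

// Kinds that other resources commonly depend on are applied first; this
// list is an abridged illustration, not kustomize's actual ordering.
var orderFirst = []string{
	"Namespace",
	"CustomResourceDefinition",
	"ServiceAccount",
	"Role",
	"RoleBinding",
	"ConfigMap",
	"Secret",
	"Service",
	"Deployment",
}

func kindRank(kind string) int {
	for i, k := range orderFirst {
		if k == kind {
			return i
		}
	}
	return len(orderFirst) // unknown kinds (e.g. CRD instances) go last
}

// sortManifests orders parsed manifests by kind rank before applying them.
func sortManifests(kinds []string) {
	sort.Slice(kinds, func(i, j int) bool {
		return kindRank(kinds[i]) < kindRank(kinds[j])
	})
}
```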

@jhoblitt
Contributor

jhoblitt commented Jan 2, 2019

@lawrencegripper it would be great if the proposed provider wasn't named kubernetes, so it could be used in parallel.

@pdecat
Contributor

pdecat commented Jan 12, 2019

Hi @lawrencegripper, did you intentionally push that last change to the branch used for this PR?

@lawrencegripper
Author

@pdecat Sorry - that was a mistake while working on pulling the changes out into an experimental interim provider while the work on a fully supported approach is under way at HashiCorp.

I'll close this one off now; as @alexsomesan mentioned, the official provider is taking a different route with the support of changes in TF. When that ships, I'll mark the interim provider deprecated.

@lawrencegripper
Author

For those still interested, the fork has matured a bit with a larger set of tests and now supports CRDs defined in YAML too. It's still an experiment, so use with caution: https://github.com/lawrencegripper/terraform-provider-kubernetes-yaml

@pablokbs

BTW, the provider from lawrencegripper is no longer maintained and has been replaced by this other fork.

@ghost locked and limited conversation to collaborators Apr 21, 2020
Successfully merging this pull request may close these issues.

Ability to deploy CRD's