Support multi-document YAML files with yamldecode() #29729

Closed
nyurik opened this issue Oct 11, 2021 · 27 comments

Comments

@nyurik
Contributor

nyurik commented Oct 11, 2021

yamldecode causes an error if YAML contains more than one document, causing significant complications when working with Kubernetes. Please remove this restriction, returning a list of objects in case of a multidocument yaml. Alternatively, you could introduce a yamllistdecode() function that always assumes the yaml to be multidocument, and always returns a list of objects.

Current Terraform Version

Terraform v1.0.8
on linux_amd64
+ provider registry.terraform.io/gavinbunney/kubectl v1.11.3
+ provider registry.terraform.io/hashicorp/kubernetes v2.5.0
...

Use-cases

Given an arbitrary Kubernetes YAML manifest, e.g. a CRD manifest downloaded from a site, one should be able to apply it using this code. Related Stack Overflow questions: 1 and 2

# DOES NOT WORK -- yamldecode() will fail on multi-document YAML 
resource "kubernetes_manifest" "crd" {
  # create a map of  {  "kind" -- "name"  =>  yaml  }
  for_each = {
    for value in yamldecode(data.http.crd.body) : "${value["kind"]}--${value["metadata"]["name"]}" => value
  }
  manifest = each.value
}

Attempted Solutions

This solution works OK, but it is obviously very brittle with all the regex YAML manipulation:

resource "kubernetes_manifest" "crd" {
  # Create a map { "kind--name" => yaml_doc } from the multi-document yaml text.
  # Each element is a separate kubernetes resource.
  # Must use \n---\n to avoid splitting on strings and comments containing "---".
  # YAML allows "---" to be the first and last line of a file, so make sure
  # raw yaml begins and ends with a newline.
  # The "---" can be followed by spaces, so need to remove those too.
  # Skip blocks that are empty or comments-only in case yaml began with a comment before "---".
  for_each = {
    for value in [
      for yaml in split(
        "\n---\n",
        "\n${replace(data.http.crd.body, "/(?m)^---[[:blank:]]*(#.*)?$/", "---")}\n"
      ) :
      yamldecode(yaml)
      if trimspace(replace(yaml, "/(?m)(^[[:blank:]]*(#.*)?$)+/", "")) != ""
    ] : "${value["kind"]}--${value["metadata"]["name"]}" => value
  }
  manifest = each.value
}

Proposal

Per above, allow yamldecode(...) to decode multi-document yaml files, or introduce an additional parameter or a new function to handle them.

See also

The 3rd party kubectl provider even introduced a dedicated kubectl_file_documents data source to handle this specific case.

@nyurik nyurik added enhancement new new issue not yet triaged labels Oct 11, 2021
@bflad bflad added the functions label Dec 2, 2021
@mvoitko

mvoitko commented Jan 18, 2022

Would the Terraform team be open to a PR?

@nyurik
Contributor Author

nyurik commented Feb 4, 2022

@mvoitko how difficult would it be to implement something like this? I think such simple functionality is likely to be accepted by the TF core team, but of course it could also result in wasted effort (hopefully not).

@crw
Collaborator

crw commented Feb 5, 2022

Discussing and validating a potential solution (even if it seems fairly straightforward on the surface) will increase the chances a PR gets accepted. I'd recommend writing up a description of the proposed changes before starting work, and verifying there aren't any hidden reasons the functionality exists in its current form. I hope this helps!

@AndreiBanaruTakeda

Documentation on yamldecode() states clearly that:

Only one YAML document is permitted. If multiple documents are present in the given string then this function will return an error.

This contributor describes a way to do it with locals, the split function (splitting on the --- separator that starts a new document), and a count inside the kubernetes_manifest resource.

Your way is more refined when we look at the regex (spaces, comments that might be found around ---).

@feczo

feczo commented Mar 30, 2022

You can use https://github.com/patrickdappollonio/kubectl-slice as a workaround

curl -sL https://github.com/patrickdappollonio/kubectl-slice/releases/download/v1.2.1/kubectl-slice_1.2.1_linux_x86_64.tar.gz | tar -xvzf -;
rm -rf slices hcl;
./kubectl-slice -f document.yaml -o slices 2>&1 | grep  -oP "Wrote \K.+yaml" | while read yamlfile; do echo 'yamldecode(file("'$yamlfile'"))' | terraform console >>hcl; done;
cat hcl

@nyurik
Contributor Author

nyurik commented Mar 30, 2022

@AndreiBanaruTakeda that simple split example is not very good because it ignores a lot of corner cases. See the example above in the Attempted Solutions section - it also splits by --- but tries to handle many more cases. I went through many iterations with it, and I know there are still some edge cases that I am not handling - that's why I think it is important to implement this as part of the terraform core function.

@mvoitko

mvoitko commented Apr 1, 2022

@feczo We already have a workaround; I am talking about a more robust solution.

@t7tran

t7tran commented Jun 24, 2022

Instead of string splitting, a nice way of traversing multi-doc yaml https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs/data-sources/kubectl_file_documents.

@mvoitko

mvoitko commented Jun 29, 2022

Instead of string splitting, a nice way of traversing multi-doc yaml https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs/data-sources/kubectl_file_documents.

Instead of one function, this workaround suggests using a new provider.

@brettjacobson

It really feels like this provider should support this use case, without relying on another provider to parse the multi-YAML document

@nyurik
Contributor Author

nyurik commented Jun 29, 2022

@brettjacobson we are not talking about "another provider" -- instead this issue is about the built-in terraform function to parse YAML, without any specific provider's support.

@levmel

levmel commented Oct 6, 2022

terraform-multidecoder-yaml_json

Access multiple YAML and/or JSON files with their relative paths in one step.

Documentation can be found here:

GitHub:
https://github.com/levmel/terraform-multidecoder-yaml_json

Terraform Registry:
https://registry.terraform.io/modules/levmel/yaml_json/multidecoder/latest

Usage

Place this module in the location where you need to access multiple different YAML and/or JSON files (different paths possible) and pass your path(s) in the filepaths parameter, which takes a set of strings containing the relative paths of the YAML and/or JSON files. You can change the module name if you want!

module "yaml_json_decoder" {
  source  = "levmel/yaml_json/multidecoder"
  version = "0.2.1"
  filepaths = ["routes/nsg_rules.yml", "failover/cosmosdb.json", "network/private_endpoints/*.yaml", "network/private_links/config_file.yml", "network/private_endpoints/*.yml", "pipeline/config/*.json"]
}

Patterns to access YAML and/or JSON files from relative paths:

To access all YAML and/or JSON files in a folder, enter your path as follows: "folder/rest_of_folders/*.yaml", "folder/rest_of_folders/*.yml" or "folder/rest_of_folders/*.json".

To be able to access a specific YAML and/or a JSON file in a folder structure use this "folder/rest_of_folders/name_of_yaml.yaml", "folder/rest_of_folders/name_of_yaml.yml" or "folder/rest_of_folders/name_of_yaml.json"

If you like to select all YAML and/or JSON files within a folder, then you should use "*.yml", "*.yaml", "*.json" format notation. (see above in the USAGE section)

YAML delimiter support is available from version 0.1.0!

WARNING: Only the relative path must be specified. The path.root (it is included in the module by default) should not be passed, but everything after it.

Access YAML and JSON entries

Now you can access all entries within all the YAML and/or JSON files you've selected like that: "module.yaml_json_decoder.files.[name of your YAML or JSON file].entry". If the name of your YAML or JSON file is "name_of_your_config_file" then access it as follows "module.yaml_json_decoder.files.name_of_your_config_file.entry".

Example of multi YAML and JSON file accesses from different paths (directories)

first YAML file:

routes/nsg_rules.yml

rdp:
  name: rdp
  priority: 80
  direction: Inbound
  access: Allow
  protocol: Tcp
  source_port_range: "*"
  destination_port_range: 3399
  source_address_prefix: VirtualNetwork
  destination_address_prefix: "*"
  
---
  
ssh:
  name: ssh
  priority: 70
  direction: Inbound
  access: Allow
  protocol: Tcp
  source_port_range: "*"
  destination_port_range: 24
  source_address_prefix: VirtualNetwork
  destination_address_prefix: "*"

second YAML file:

services/logging/monitoring.yml

application_insights:
  application_type: other
  retention_in_days: 30
  daily_data_cap_in_gb: 20
  daily_data_cap_notifications_disabled: true
  logs:
  # Optional fields
   - "AppMetrics"
   - "AppAvailabilityResults"
   - "AppEvents"
   - "AppDependencies"
   - "AppBrowserTimings"
   - "AppExceptions"
   - "AppExceptions"
   - "AppPerformanceCounters"
   - "AppRequests"
   - "AppSystemEvents"
   - "AppTraces"

first JSON file:

test/config/json_history.json

{
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}

main.tf

module "yaml_json_multidecoder" {
  source  = "levmel/yaml_json/multidecoder"
  version = "0.2.1"
  filepaths = ["routes/nsg_rules.yml", "services/logging/monitoring.yml", "test/config/*.json"]
}

output "nsg_rules_entry" {
  value = module.yaml_json_multidecoder.files.nsg_rules.aks.ssh.source_address_prefix
}

output "application_insights_entry" {
  value = module.yaml_json_multidecoder.files.monitoring.application_insights.daily_data_cap_in_gb
}

output "json_history" {
  value = module.yaml_json_multidecoder.files.json_history.glossary.title
}

Changes to Outputs:

  • nsg_rules_entry = "VirtualNetwork"
  • application_insights_entry = 20
  • json_history = "example glossary"

@brandongallagher-tag

brandongallagher-tag commented Oct 6, 2022

@levmel Does this support multiple yaml objects in the same yaml file, delimited by ---?

key: value
---
key: value

Then parses 2 YAML objects?

@levmel

levmel commented Oct 6, 2022

@brandongallagher-tag Not yet to be honest, because I split up my configuration always logically into different config files. That is why I never use delimiters in my config. I can add it to the next release.

For now it supports multiple YAML objects that are located in the same file, but without the delimiter.

Edit:
I added delimiter support and jsondecode support in the newest version 0.2.1. Check out my GitHub repo. :) Cheers! Btw. I updated my previous comment completely.

@dcshiman

dcshiman commented Dec 1, 2022

Instead of string splitting, a nice way of traversing multi-doc yaml https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs/data-sources/kubectl_file_documents.

I think this is the best solution. My implementation for installing multi-document YAML with kubectl is as follows:

data "http" "yaml_file" {
  url = "https://path.to.yaml.file.yaml"
}

data "kubectl_file_documents" "docs" {
  content = data.http.yaml_file.response_body
}

locals {
  yaml_file = [
    for v in data.kubectl_file_documents.docs.documents : {
      data : yamldecode(v)
      content : v
    }
  ]
}

resource "kubectl_manifest" "install" {
  for_each = {
    for v in local.yaml_file : lower(join("/", compact([
      v.data.apiVersion,
      v.data.kind,
      lookup(lookup(v.data, "metadata", {}), "namespace", ""),
      lookup(lookup(v.data, "metadata", {}), "name", "")
    ]))) => v.content
  }
  yaml_body  = each.value
}

But this is for applying kubectl files, not for general use cases.

@avbenavides

I found a way to do it w/o any additional providers. Let me know if it works for your particular use case.

# Assign to a local after splitting by "---"
locals { my_manifests = split("---", templatefile("multiple-manifests.yaml", {})) }

# Iterate by mapping with a range
resource "kubernetes_manifest" "many_objects" {
  for_each = zipmap(range(0,length(local.my_manifests)),local.my_manifests)
  manifest = yamldecode(each.value)
}

@joaocc
Contributor

joaocc commented Oct 4, 2023

Hi. I was facing this exact question at this moment, when trying to process (via templates) multiple kubernetes files, converting them to HCL map/objects/...
The workarounds proposed here seem to work in the majority of cases, but will probably fail if "---" occurs anywhere else in the files or, even worse, if we use heredocs with embedded multi-document YAML (in which case the heredoc-ed YAML will also get split, creating a mess).
For more robust handling, something needs to actually parse the input into separate documents (as the kubectl_file_documents data source mentioned above proposes to do). Unfortunately gavinbunney/kubectl seems to be abandoned, and the fork that currently seems more active and is volunteering to fix some issues (alekc/kubectl) still has not managed to address the pending ones, making this difficult to recommend as a general solution.

@tvildo

tvildo commented Nov 29, 2023

The following algorithm works well for me:

  1. Detect each document separator and replace it with SPECIAL_TOKEN="YAMLSEPARATOR"
  2. Split the string on the special token
  3. Remove invalid strings, which may contain only comments. For that we try to decode the YAML.
  4. At the end, compact the array to remove empty strings
  5. The result is an array of separate YAML documents

locals {
  yamlstrings = compact([
    for s in split(
      "YAMLSEPARATOR",
      replace(file("${path.module}/files/standard-install.yaml"), "/(?m:^---$)/", "YAMLSEPARATOR")
    ) :
    try(yamlencode(yamldecode(s)), "")
  ])
}

@elasticdotventures

elasticdotventures commented Jan 24, 2024

Please don't propose to address this by splitting multi-part documents with antipatterns such as a string split on "---".

Yes, maybe that solves your immediate need today, and maybe it works fine for 99% of cases, but what about tomorrow's case, where your HCL falls into the 1% where the string-split solution doesn't work?

For example, where the string --- is embedded (but properly escaped) in the file body, such as Markdown tables ("|---|---|") in the YAML, or multipart MIME content (e.g. PEM-encoded public certificates, ---- BEGIN CERTIFICATE ---) inside a text block in the YAML.

String splitting is an antipattern. Don't use it: your program will break in the future in a non-obvious way, possibly at a critical moment.
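
To make that failure mode concrete, here is a minimal sketch. The ConfigMap content and the local names are invented for illustration; it shows a single YAML document that a plain split on "---" cuts into several pieces:

```hcl
locals {
  # Hypothetical single-document manifest. The quoted value holds a
  # Markdown table rule, which contains two "---" runs.
  single_doc = <<-EOT
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: example
    data:
      table: "|---|---|"
  EOT

  # Naive split: every "---" is treated as a document boundary,
  # including the ones inside the quoted string.
  naive_chunks = split("---", local.single_doc)
}

output "naive_chunk_count" {
  # One document went in, but more than one chunk comes out, and none
  # of the chunks is the original document.
  value = length(local.naive_chunks)
}
```

Anchoring the separator to a whole line (as some of the regex workarounds above do) avoids this particular case, but it is still pattern matching rather than YAML parsing.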

@nyurik
Contributor Author

nyurik commented Jan 24, 2024

@elasticdotventures it's a hacky workaround to a pressing problem that unfortunately was never addressed. The community clearly needs a solution to this, and in the meantime... uses horribly unstable crutches, simply because a bad solution is better than no solution.

@crw
Collaborator

crw commented Mar 7, 2024

Thank you for your continued interest in this issue.

Terraform version 1.8 launches with support of provider-defined functions. It is now possible to implement your own functions! We would love to see this implemented as a provider-defined function.

Please see the provider-defined functions documentation to learn how to implement functions in your providers. If you are new to provider development, learn how to create a new provider with the Terraform Plugin Framework. If you have any questions, please visit the Terraform Plugin Development category in our official forum.

We hope this feature unblocks future function development and provides more flexibility for the Terraform community. Thank you for your continued support of Terraform!

@red8888

red8888 commented Mar 21, 2024

@crw I read through some of the documentation but the docs don't really indicate what this looks like from the user's perspective. They really need a full example going from "here's your provider code" => "here's the actual HCL using that provider".

Do you build a custom function into your provider and it becomes available in the global HCL namespace?

So I could create a "better yaml" provider, users add it like any other provider (provider "better-yaml" {}), and then the function betteryamldecode() is just available to them like any native HCL function?

I think the docs need to explain this better- maybe I'm dumb though lol.

@crw
Collaborator

crw commented Mar 21, 2024

@red8888 tutorials are being developed, and should be available "soon." The reference docs do not really have usage information, but check out the usage docs for a random function I happened to have open; it will show naming conventions and usage conventions:

https://registry.terraform.io/providers/hashicorp/aws/latest/docs/functions/arn_build
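
For readers with the same question, here is a rough sketch of the calling convention, based on that page. The argument values are placeholders, and the exact provider release that first ships functions is not pinned here, so treat that part as an assumption. A provider-defined function is not added to the global namespace; it is called as provider::<local provider name>::<function name>, and the provider only needs to be listed in required_providers:

```hcl
terraform {
  # Provider-defined functions need Terraform v1.8 or later.
  required_version = ">= 1.8.0"

  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Assumption: a provider release recent enough to include functions.
    }
  }
}

output "role_arn" {
  # Arguments: partition, service, region, account ID, resource.
  # The account ID and role name are placeholder values.
  value = provider::aws::arn_build("aws", "iam", "", "444455556666", "role/example")
}
```

So a hypothetical "better yaml" provider would expose its decoder under its own provider:: prefix rather than as a bare betteryamldecode().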

@jrhouston
Contributor

We are working on a provider defined function for this particular use-case in the Kubernetes provider here: hashicorp/terraform-provider-kubernetes#2428.

@apparentlymart
Contributor

The hashicorp/kubernetes provider now offers a function named manifest_decode_multi that should meet the originally-described use-case.

To use it you'll need to be using Terraform v1.8.0 or later (because that was the first release to support provider-contributed functions) and hashicorp/kubernetes v2.28.0 or later (because that's when the functions were first introduced).

Here's an adaptation of the example in the original issue comment to use the new function:

terraform {
  required_version = ">= 1.8.0"

  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.28.0"
    }
  }
}

locals {
  crds = provider::kubernetes::manifest_decode_multi(data.http.crd.body)
}

resource "kubernetes_manifest" "crd" {
  for_each = {
    for manifest in local.crds :
    "${manifest.kind}--${manifest.metadata.name}" => manifest
  }

  manifest = each.value
}
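
The example refers to data.http.crd without declaring it. A minimal sketch of that missing piece follows, with a placeholder URL that is not from this thread. It reads the payload through response_body, the attribute used in an earlier comment here; as far as I know, recent hashicorp/http releases prefer it over body, so adjust to whichever your provider version exposes:

```hcl
data "http" "crd" {
  # Placeholder URL; point this at the multi-document manifest to apply.
  url = "https://example.com/crds.yaml"
}

output "crd_document_count" {
  # Number of documents the function decoded from the downloaded text.
  value = length(provider::kubernetes::manifest_decode_multi(data.http.crd.response_body))
}
```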

The reported use-case therefore seems to be met now, albeit using a Kubernetes provider feature instead of a builtin. Terraform's built-in functions aim to meet broad use-cases; multi-document YAML is less common but broadly used in the Kubernetes community, so the Kubernetes provider is a reasonable home for that functionality when used in conjunction with Kubernetes.

If there are other prominent use-cases for multi-document YAML beyond Kubernetes then it might be worthwhile to have a separate functions-only provider that's focused primarily on YAML without any Kubernetes-specific assumptions, but since all of the examples in this issue were Kubernetes-related we'll wait to see if that generalization is warranted.

Thanks!

@mdesouky

@dcshiman thanks, tried this yesterday and worked like a charm 👍🏼


I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 29, 2024