Support multi-document YAML files with yamldecode() #29729

nyurik · 2021-10-11T16:24:07Z

yamldecode() fails if the YAML contains more than one document, which causes significant complications when working with Kubernetes. Please remove this restriction and return a list of objects for multi-document YAML. Alternatively, you could introduce a yamllistdecode() function that always treats the input as multi-document and always returns a list of objects.

Current Terraform Version

Terraform v1.0.8
on linux_amd64
+ provider registry.terraform.io/gavinbunney/kubectl v1.11.3
+ provider registry.terraform.io/hashicorp/kubernetes v2.5.0
...

Use-cases

Given an arbitrary Kubernetes YAML manifest, e.g. a CRD manifest downloaded from a site, one should be able to apply it using this code. Related stack overflow questions: 1 and 2

# DOES NOT WORK -- yamldecode() will fail on multi-document YAML 
resource "kubernetes_manifest" "crd" {
  # create a map of  {  "kind" -- "name"  =>  yaml  }
  for_each = {
    for value in yamldecode(data.http.crd.body) : "${value["kind"]}--${value["metadata"]["name"]}" => value
  }
  manifest = each.value
}

Attempted Solutions

This solution works OK, but it is obviously very brittle because of all the regex-based YAML manipulation.

resource "kubernetes_manifest" "crd" {
  # Create a map { "kind--name" => yaml_doc } from the multi-document yaml text.
  # Each element is a separate kubernetes resource.
  # Must use \n---\n to avoid splitting on strings and comments containing "---".
  # YAML allows "---" to be the first and last line of a file, so make sure
  # raw yaml begins and ends with a newline.
  # The "---" can be followed by spaces, so need to remove those too.
  # Skip blocks that are empty or comments-only in case yaml began with a comment before "---".
  for_each = {
    for value in [
      for yaml in split(
        "\n---\n",
        "\n${replace(data.http.crd.body, "/(?m)^---[[:blank:]]*(#.*)?$/", "---")}\n"
      ) :
      yamldecode(yaml)
      if trimspace(replace(yaml, "/(?m)(^[[:blank:]]*(#.*)?$)+/", "")) != ""
    ] : "${value["kind"]}--${value["metadata"]["name"]}" => value
  }
  manifest = each.value
}

Proposal

Per above, allow yamldecode(...) to decode multi-document yaml files, or introduce an additional parameter or a new function to handle them.

levmel · 2022-10-06

terraform-multidecoder-yaml_json

Access multiple YAML and/or JSON files with their relative paths in one step.

Documentation can be found here:

GitHub:
https://github.com/levmel/terraform-multidecoder-yaml_json

Terraform Registry:
https://registry.terraform.io/modules/levmel/yaml_json/multidecoder/latest

Usage

Place this module where you need to access multiple YAML and/or JSON files (they may live in different paths) and pass your path(s) in the filepaths parameter, which takes a set of strings containing the relative paths of the YAML and/or JSON files. You can change the module name if you want!

module "yaml_json_decoder" {
  source  = "levmel/yaml_json/multidecoder"
  version = "0.2.1"
  filepaths = ["routes/nsg_rules.yml", "failover/cosmosdb.json", "network/private_endpoints/*.yaml", "network/private_links/config_file.yml", "network/private_endpoints/*.yml", "pipeline/config/*.json"]
}

Patterns to access YAML and/or JSON files from relative paths:

To access all YAML and/or JSON files in a folder, enter your path as follows: "folder/rest_of_folders/*.yaml", "folder/rest_of_folders/*.yml" or "folder/rest_of_folders/*.json".

To access a specific YAML and/or JSON file in a folder structure, use "folder/rest_of_folders/name_of_yaml.yaml", "folder/rest_of_folders/name_of_yaml.yml" or "folder/rest_of_folders/name_of_yaml.json".

If you like to select all YAML and/or JSON files within a folder, then you should use "*.yml", "*.yaml", "*.json" format notation. (see above in the USAGE section)

YAML delimiter support is available from version 0.1.0!

WARNING: Only the relative path must be specified. The path.root (it is included in the module by default) should not be passed, but everything after it.

Access YAML and JSON entries

Now you can access all entries within all the YAML and/or JSON files you've selected like that: "module.yaml_json_decoder.files.[name of your YAML or JSON file].entry". If the name of your YAML or JSON file is "name_of_your_config_file" then access it as follows "module.yaml_json_decoder.files.name_of_your_config_file.entry".

Example of multi YAML and JSON file accesses from different paths (directories)

first YAML file:

routes/nsg_rules.yml

rdp:
  name: rdp
  priority: 80
  direction: Inbound
  access: Allow
  protocol: Tcp
  source_port_range: "*"
  destination_port_range: 3399
  source_address_prefix: VirtualNetwork
  destination_address_prefix: "*"
  
---
  
ssh:
  name: ssh
  priority: 70
  direction: Inbound
  access: Allow
  protocol: Tcp
  source_port_range: "*"
  destination_port_range: 24
  source_address_prefix: VirtualNetwork
  destination_address_prefix: "*"

second YAML file:

services/logging/monitoring.yml

application_insights:
  application_type: other
  retention_in_days: 30
  daily_data_cap_in_gb: 20
  daily_data_cap_notifications_disabled: true
  logs:
  # Optional fields
   - "AppMetrics"
   - "AppAvailabilityResults"
   - "AppEvents"
   - "AppDependencies"
   - "AppBrowserTimings"
   - "AppExceptions"
   - "AppExceptions"
   - "AppPerformanceCounters"
   - "AppRequests"
   - "AppSystemEvents"
   - "AppTraces"

first JSON file:

test/config/json_history.json

{
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}

main.tf

module "yaml_json_multidecoder" {
  source  = "levmel/yaml_json/multidecoder"
  version = "0.2.1"
  filepaths = ["routes/nsg_rules.yml", "services/logging/monitoring.yml", "test/config/*.json"]
}

output "nsg_rules_entry" {
  value = module.yaml_json_multidecoder.files.nsg_rules.ssh.source_address_prefix
}

output "application_insights_entry" {
  value = module.yaml_json_multidecoder.files.monitoring.application_insights.daily_data_cap_in_gb
}

output "json_history" {
  value = module.yaml_json_multidecoder.files.json_history.glossary.title
}

Changes to Outputs:

nsg_rules_entry = "VirtualNetwork"
application_insights_entry = 20
json_history = "example glossary"

brandongallagher-tag · 2022-10-06T14:41:57Z

@levmel Does this support multiple yaml objects in the same yaml file, delimited by ---?

key: value
---
key: value

Then parses 2 YAML objects?

levmel · 2022-10-06T15:11:46Z

@brandongallagher-tag Not yet, to be honest, because I always split my configuration logically into different config files. That is why I never use delimiters in my config. I can add it to the next release.

For now it supports multiple YAML objects that are located in the same file, but without the delimiter.

Edit:
I added delimiter support and jsondecode support in the newest version 0.2.1. Check out my GitHub repo. :) Cheers! Btw. I updated my previous comment completely.

dcshiman · 2022-12-01T04:48:25Z

Instead of string splitting, a nice way of traversing multi-doc YAML is the kubectl_file_documents data source: https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs/data-sources/kubectl_file_documents.

I think this is the best solution. My implementation to install kubectl documents from multi-document YAML is as follows:

data "http" "yaml_file" {
  url = "https://path.to.yaml.file.yaml"
}

data "kubectl_file_documents" "docs" {
  content = data.http.yaml_file.response_body
}

locals {
  yaml_file = [
    for v in data.kubectl_file_documents.docs.documents : {
      data : yamldecode(v)
      content : v
    }
  ]
}

resource "kubectl_manifest" "install" {
  for_each = {
    for v in local.yaml_file : lower(join("/", compact([
      v.data.apiVersion,
      v.data.kind,
      lookup(lookup(v.data, "metadata", {}), "namespace", ""),
      lookup(lookup(v.data, "metadata", {}), "name", "")
    ]))) => v.content
  }
  yaml_body  = each.value
}

But this is for applying kubectl files, not for general use cases.

avbenavides · 2023-09-22T12:00:55Z

I found a way to do it w/o any additional providers. Let me know if it works for your particular use case.

# Assign to a local after splitting by "---"
# templatefile() requires a vars argument; an empty map is passed here.
locals {
  my_manifests = split("---", templatefile("multiple-manifests.yaml", {}))
}

# Iterate by mapping with a range
resource "kubernetes_manifest" "many_objects" {
  for_each = zipmap(range(0,length(local.my_manifests)),local.my_manifests)
  manifest = yamldecode(each.value)
}

joaocc · 2023-10-04T20:12:39Z

Hi. I was facing this exact question just now, when trying to process (via templates) multiple kubernetes files, converting them to HCL map/objects/...
The proposed workarounds seem to work in the majority of cases, but will probably fail if "---" occurs anywhere else in the files or, even worse, if we use heredocs with embedded multi-yaml files (in which case the heredoc-ed YAML will also get split, creating a mess).
For more robust handling, something needs to properly parse the input into multiple documents (as the kubectl_file_documents data source mentioned above proposes to do). Unfortunately gavinbunney/kubectl seems to be abandoned, and the fork that currently seems more active and is volunteering to fix some issues (alekc/kubectl) still has not managed to address the pending ones, making this difficult to recommend as a general solution.

tvildo · 2023-11-29T14:17:52Z

The following algorithm works well for me:

1. Detect the document separator and replace it with SPECIAL_TOKEN="YAMLSEPARATOR".
2. Split the string on the special token.
3. Remove invalid strings, which may contain only comments. For that, we try to decode the YAML.
4. At the end, compact the array to remove empty strings.
5. The result is the separate YAML documents as an array.

locals {
  yamlstrings = compact([
    for s in split(
      "YAMLSEPARATOR",
      replace(file("${path.module}/files/standard-install.yaml"), "/(?m:^---$)/", "YAMLSEPARATOR")
    ) : try(yamlencode(yamldecode(s)), "")
  ])
}

elasticdotventures · 2024-01-24T21:58:54Z

Please don't propose to address this by splitting multipart documents using antipatterns such as a string split on "---".

Yes, maybe that solves your immediate need today, and maybe it works fine for 99% of cases, but what about tomorrow's case, where your HCL falls into the 1% where the string-split solution doesn't work?

For example, where the string --- is embedded (but properly escaped) in the file body, such as markdown tables "|---|---|" in the YAML, or a multipart MIME body (e.g. PEM-encoded public certificates, ---- BEGIN CERTIFICATE ---) inside a text block in the YAML.

String split is an antipattern; don't use it. Your program will break in the future in a non-obvious way, possibly at a critical moment.
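
To make the failure mode concrete, here is a minimal sketch. The manifest content and the local names are made up for illustration. It is a single YAML document whose block scalar contains a markdown table row, and a naive split on "---" still breaks it into several fragments:

```hcl
locals {
  # Hypothetical single-document manifest; the block scalar holds "|---|---|".
  example_yaml = <<-EOT
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: demo
    data:
      table: |
        |---|---|
        | a | b |
  EOT

  # Naive split: "---" inside the block scalar is treated as a separator,
  # so this yields more than one fragment for what is really one document.
  naive_parts = split("---", local.example_yaml)
}

output "naive_part_count" {
  # Anything greater than 1 here means the single document was torn apart.
  value = length(local.naive_parts)
}
```

Anchoring the pattern to a whole line (as some of the regex-based workarounds above do) avoids this particular case, but not a line inside a block scalar that consists solely of "---".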

nyurik · 2024-01-24T22:04:40Z

@elasticdotventures it's a hacky workaround to a pressing problem that unfortunately was never addressed. The community clearly needs a solution to this, and in the meantime uses horribly unstable crutches, simply because a bad solution is better than no solution.

crw · 2024-03-07T00:38:34Z

Thank you for your continued interest in this issue.

Terraform version 1.8 launches with support of provider-defined functions. It is now possible to implement your own functions! We would love to see this implemented as a provider-defined function.

Please see the provider-defined functions documentation to learn how to implement functions in your providers. If you are new to provider development, learn how to create a new provider with the Terraform Plugin Framework. If you have any questions, please visit the Terraform Plugin Development category in our official forum.

We hope this feature unblocks future function development and provides more flexibility for the Terraform community. Thank you for your continued support of Terraform!

red8888 · 2024-03-21T14:11:16Z

@crw I read through some of the documentation but the docs don't really indicate what this looks like from the user's perspective. They really need a full example going from "here's your provider code" => "here's the actual HCL using that provider".

Do you build a custom function into your provider and it becomes available in the global HCL namespace?

So I could create a "better yaml" provider, users add it like any other provider (provider "better-yaml" {}), and then the function betteryamldecode() is just available to them like any native HCL function?

I think the docs need to explain this better- maybe I'm dumb though lol.

crw · 2024-03-21T16:15:14Z

@red8888 tutorials are being developed, and should be available "soon." The reference docs do not really have usage information, but check out the usage docs for a random function I happened to have open; they show the naming and usage conventions:

https://registry.terraform.io/providers/hashicorp/aws/latest/docs/functions/arn_build
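
As a rough sketch of what this looks like from the user's side: the function is not added to the global namespace. It is called through the provider's local name as provider::<name>::<function>, and the provider has to be declared in required_providers. The account ID and role name below are placeholders, and the minimum AWS provider version is an assumption that should be checked against the provider changelog:

```hcl
terraform {
  required_version = ">= 1.8.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.40.0" # assumed minimum version that ships provider functions
    }
  }
}

output "role_arn" {
  # Arguments: partition, service, region, account ID, resource (placeholder values).
  value = provider::aws::arn_build("aws", "iam", "", "444455556666", "role/example")
}
```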

jrhouston · 2024-03-28T03:16:33Z

We are working on a provider defined function for this particular use-case in the Kubernetes provider here: hashicorp/terraform-provider-kubernetes#2428.

apparentlymart · 2024-05-29T22:35:24Z

The hashicorp/kubernetes provider now offers a function named manifest_decode_multi that should meet the originally-described use-case.

To use it you'll need to be using Terraform v1.8.0 or later (because that was the first release to support provider-contributed functions) and hashicorp/kubernetes v2.28.0 or later (because that's when the functions were first introduced).

Here's an adaptation of the example in the original issue comment to use the new function:

terraform {
  required_version = ">= 1.8.0"

  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.28.0"
    }
  }
}

locals {
  crds = provider::kubernetes::manifest_decode_multi(data.http.crd.body)
}

resource "kubernetes_manifest" "crd" {
  for_each = {
    for manifest in local.crds :
    "${manifest.kind}--${manifest.metadata.name}" => manifest
  }

  manifest = each.value
}
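
The example above refers to data.http.crd without defining it. A minimal sketch of that data source could look like the following, where the URL is a placeholder rather than a real manifest location. Note that an earlier comment in this thread reads the same data through the response_body attribute, so depending on the version of the hashicorp/http provider in use, data.http.crd.response_body may be the attribute to pass instead of body:

```hcl
# Hypothetical data source backing data.http.crd; the URL is a placeholder.
data "http" "crd" {
  url = "https://example.com/path/to/crds.yaml"
}
```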

The reported use-case therefore seems to be met now, albeit using a Kubernetes provider feature instead of a builtin. Terraform's built-in functions aim to meet broad use-cases; multi-document YAML is less common but broadly used in the Kubernetes community, so the Kubernetes provider is a reasonable home for that functionality when used in conjunction with Kubernetes.

If there are other prominent use-cases for multi-document YAML beyond Kubernetes then it might be worthwhile to have a separate functions-only provider that's focused primarily on YAML without any Kubernetes-specific assumptions, but since all of the examples in this issue were Kubernetes-related we'll wait to see if that generalization is warranted.

Thanks!

mdesouky · 2024-05-29T23:48:05Z

@dcshiman thanks, tried this yesterday and worked like a charm 👍🏼

github-actions · 2024-06-29T02:06:32Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

nyurik added the enhancement and new labels Oct 11, 2021

bflad added the functions label Dec 2, 2021

crw added the provider/kubernetes label Feb 3, 2022

joaocc mentioned this issue Oct 4, 2023

Add method to load multi-yaml document cloudposse/terraform-provider-utils#331

Open

c4milo mentioned this issue Jan 25, 2024

Support decoding multidoc YAML files zclconf/go-cty-yaml#12

Open

jrhouston mentioned this issue Mar 28, 2024

Add provider defined functions for encoding and decoding Kubernetes manifests hashicorp/terraform-provider-kubernetes#2428

Merged

19 tasks

apparentlymart closed this as completed May 29, 2024

github-actions bot locked as resolved and limited conversation to collaborators Jun 29, 2024
