
Proposal: Generator Plugins for Configuration Generation #3310

Closed
apparentlymart opened this issue Sep 23, 2015 · 16 comments


@apparentlymart
Contributor

So far in my use of Terraform in my workplace I've uncovered a number of use-cases where it's useful to generate Terraform configurations based on outside data.

So far I've been handling that by having users run extra preparation steps before they run terraform plan, which write JSON-style Terraform configs into a directory. This has worked okay, but the need for extra steps means that the standard usage pattern of Terraform does not apply.

Based on what I've learned from generating Terraform configs, I'd like to offer a proposal for integrating the concept of configuration generation directly into Terraform. This is an attempt to "pave the cowpath" by taking a pattern that I've already successfully applied and building Terraform syntax around it.

Consider the following configuration as a motivating example:

// Load a directory full of files into an S3 bucket
resource "aws_s3_bucket_object" "file" {
    foreach "dir_entry" {
        path = "${path.module}/htdocs"
        recursive = true
        include_dirs = false
    }

    bucket = "${var.s3_bucket_name}"
    key = "${foreach.relative_path}"
    source = "${foreach.absolute_path}"
}

(This example is inspired by my terraform-s3-dir utility.)

The resource construct is extended with a new child block foreach, which is conceptually similar to count but rather than producing resources based on a sequence of numbers it produces resources based on executing a generator plugin, which in this example is dir_entry.

The contract for a generator plugin is to take the given input arguments and produce (essentially) a map[string]interface{}, where each map key is a hashable unique identifier for an object and the value is a structure that can be used from the ${foreach...} interpolation syntax.

Some details in the sections that follow...

Effect on terraform plan

As noted earlier, foreach is conceptually similar to count in that it causes one resource configuration block to produce multiple resource instances. In the case of count these are named like aws_s3_bucket_object.file.0. In the case of foreach a similar convention applies, except that the index is replaced by the result of applying hashcode.String to each item's key.

Since the resources are identified by a hash of their key, diffing can produce a sensible result as long as the keys remain consistent between runs. In the dir_entry example above the object key could be the same as the relative_path attribute value, so adding and removing files would cause the corresponding resource instances to be added and removed in the diff.
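To make the addressing scheme concrete, here is a minimal Go sketch of hashing an item key into a stable instance index. (Terraform's helper/hashcode.String is CRC-32-based; the exact function below is an illustration of the idea, not the upstream implementation.)

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// hashString derives a non-negative int from a CRC-32 checksum, in the
// spirit of Terraform's helper/hashcode.String (illustrative sketch).
func hashString(s string) int {
	v := int(crc32.ChecksumIEEE([]byte(s)))
	if v >= 0 {
		return v
	}
	return -v
}

// instanceAddress shows how a foreach item key could be turned into a
// stable resource instance address, by analogy with count's
// aws_s3_bucket_object.file.0 convention.
func instanceAddress(resourceAddr, itemKey string) string {
	return fmt.Sprintf("%s.%d", resourceAddr, hashString(itemKey))
}

func main() {
	// The same key always hashes to the same index, so an unchanged
	// file maps to the same instance address on every run.
	fmt.Println(instanceAddress("aws_s3_bucket_object.file", "index.html"))
}
```

Because the index is a pure function of the key, the diff stays stable for exactly the reason described above: only added or removed keys produce added or removed instances.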

foreach configuration block

Since generation is a plan-time concept, the foreach block may only contain interpolations that are known statically, such as var, path, and interpolation functions. It specifically may not reference resource or module attributes.

Otherwise the structure of the configuration block is under the control of the generator plugin, much as with provisioners.

${foreach...} interpolation syntax

The ${foreach...} interpolation syntax is again comparable to the ${count...} syntax, but the attributes within it depend on the structure returned by the generator function. Each generator is free to define its own set of attributes in an arbitrary hierarchical structure, just like resources can.

Generator plugin interface

Generator plugins have a similar interface to provisioner plugins, including the same configuration validation methods but with the Apply method replaced with Generate(*ResourceConfig) map[string]interface{} .

It is expected that a generator will produce tens of items at most, so returning the whole structure in-memory (rather than streaming it e.g. using a goroutine) should be sufficient. A generator producing hundreds or thousands of items would result in hundreds or thousands of Terraform resources, which I believe is already beyond Terraform's design assumptions.

Whereas provisioners are verbs, generators should be named as nouns describing what kind of items the generator produces, so that the declaration reads as (for example) "For each directory entry...".

Interaction with count

To avoid the combinatoric complexity that would result, using count and foreach together in the same resource block is not permitted.

Additional Example Use-cases

Some further examples of generator plugins that might be implemented, and what they could be used for...

DNS records from a standard zone file

resource "aws_route53_record" "foo" {
    foreach "dns_zone_record" {
        // A "zone file" per RFC1035
        zone = "${file('example.com.zone')}"
        // Disregard SOA records
        ignore_types = ["SOA"]
    }

    zone_id = "${var.route53_zone_id}"
    name = "${foreach.name}"
    type = "${foreach.type}"
    records = ["${foreach.records}"]
    ttl = "${foreach.ttl}"
}

Users (or indeed anything else) from a YAML file

resource "aws_iam_user" "user" {
    foreach "yaml_entry" {
        // Give a YAML document that either has a mapping or a list at its root,
        // to produce one resource instance per entry in that structure.
        source = "${file('users.yaml')}"
    }

    name = "${foreach.username}"
    path = "/staff/"
}

A custom external program for generating VPN routes

resource "aws_vpn_connection_route" "route" {
    foreach "local_exec_result" {
        // Any executable that produces a JSON object as output and
        // exits with a successful status.
        command = "${path.module}/generate-route-map"
    }

    destination_cidr_block = "${foreach.cidr_block}"
    vpn_connection_id = "${aws_vpn_connection.main.id}"
}
@apparentlymart
Contributor Author

I am closing this because I think the same use-cases could be met in a much simpler way in conjunction with some other proposals I've made:

  • Implement Data-driven Terraform Configuration #4169 to allow Terraform to read data from various sources before planning.
  • Add a foreach meta-attribute to resources that takes directly an array or map of values and "fans out" to one resource instance for each array/map item. foreach may be interpolated with var and data dependencies, just like count.
  • Optionally implement Partial/Progressive Configuration Changes #4149 to allow dynamic mappings, at the expense of requiring multiple Terraform runs to successfully apply the configuration completely.

Here are two of the examples above, re-worked to fit this model:

// A new yaml_entries interpolation function allows YAML parsing
resource "aws_iam_user" "user" {
    foreach = ["${yaml_entries(file("users.yaml"))}"]

    name = "${foreach.value.username}"
    path = "/staff/"
}
// A hypothetical "execute a local program" data-source, in conjunction with a JSON
// parsing function, supports the VPN route table example.
data "local_exec" "route_map" {
    command = "${path.module}/generate-route-map"
}
resource "aws_vpn_connection_route" "route" {
    // The generate-route-map program prints JSON to stdout,
    // which we can now parse.
    foreach = ["${json_entries(data.local_exec.route_map.stdout)}"]

    destination_cidr_block = "${foreach.value.cidr_block}"
    vpn_connection_id = "${aws_vpn_connection.main.id}"
}

However, I'm not going to make another issue for this right now since I've already created a rather substantial pyramid of aspirational architecture proposals, and so I'm going to leave this one for now and consider it again later based on the outcomes of those proposals.

@OJFord
Contributor

OJFord commented Jan 14, 2017

Any further thoughts on this @apparentlymart?

As far as I can tell, your second example above (the re-worked one, not the original) is possible today with data.external, but the first still isn't.

It's the "bulk upload to S3 without specifying everything" use-case that I'm particularly interested in. I nearly had it with count; then ran into it not allowing interpolation of resource attributes.

@apparentlymart
Contributor Author

apparentlymart commented Jan 14, 2017

Hi @OJFord!

This is indeed not yet entirely possible. My time to work on "big stuff" in Terraform has been limited recently due to other life stuff taking priority, but I would still like to find a way to address these use-cases.

For the S3 use-case perhaps it might end up looking something like this, using similar building blocks as I described in my last comment above:

resource "aws_s3_bucket_object" "file" {
  foreach = "${deepfiles("${path.module}/htdocs")}"

  bucket = "${var.s3_bucket_name}"
  key = "${foreach.value}"
  source = "${path.module}/${foreach.value}"
}

The missing parts here are:

  • The foreach meta-attribute as I described before.
  • A new deepfiles interpolation function that returns the relative paths of all of the files descending from a given directory.

This case is simpler than some of the others because reading files from disk doesn't require reaching out to external services and so we can safely do it "statically" while loading the config, similarly to how file works. That means this example can get away with a limited foreach that doesn't support resource attributes, which avoids implementing #4149 before this would work.


If we added that deepfiles function then we could actually get this done with just count today I think, albeit with some performance drawbacks on deep trees due to repeatedly re-evaluating the function:

resource "aws_s3_bucket_object" "file" {
  count = "${length(deepfiles("${path.module}/htdocs"))}"

  bucket = "${var.s3_bucket_name}"
  key = "${element(deepfiles("${path.module}/htdocs"), count.index)}"
  source = "${path.module}/${element(deepfiles("${path.module}/htdocs"), count.index)}"
}

Could be interesting to implement that function first, since it'd probably be useful for other use-cases as well, and then if we add something like foreach later that would build on this by making the syntax more straightforward and ensuring that the function gets called only once.

@OJFord
Contributor

OJFord commented Jan 14, 2017

Thanks for the reply!

My time to work on "big stuff" in Terraform has been limited recently due to other life stuff taking priority

As it should - no problem!

If we added that deepfiles function then we could actually get this done with just count today

Oh, that would do it for me if that works. The restriction on count's interpolation is only that it can't access other resources' attributes, then; not that it can't do anything non-constant?

@pmoust
Contributor

pmoust commented May 11, 2017

Would you be open to re-opening this?
Related: #8573 #4410

I can see this being very valuable, albeit hard to implement safely pre-1.0.

@apparentlymart
Contributor Author

The Terraform team (which I am now a part of, but wasn't when I made my earlier comments) is currently leaning towards the foreach solution I described in my most recent comment above, but in order for that to be useful we first need to fix some annoyances and limitations with how Terraform deals with lists and maps in general, so this has got folded into a general bucket of configuration language improvements.

At the time of writing it's just a set of use-cases rather than a plan, but we're planning to consider all of the use-cases together and see what language features make sense to cover them as best we can with as little complexity as possible.

So I'm going to leave this closed not because there's no plan to work on something like this, but because it represents a defunct plan that will be replaced with a new plan in future. We'll open new issues when there's a more concrete plan to work on these new features.

@pmoust
Contributor

pmoust commented May 11, 2017

@apparentlymart awesome, thanks for the heads up.

@BerndWessels

Please let us know here once there is something usable coming out, thanks.

@deitch

deitch commented Jun 22, 2017

Is there a simple equivalent of deepfiles() ? I would take a simple one-level deep if I could, some way to say, "load all of the entries in this dir up to s3".

@deitch

deitch commented Jun 22, 2017

Looking for some way to hack around this. If I could get an ls (or find) to run and save the result in a var, I could then split that and use it as input into an aws_s3_bucket_object count(), which is the equivalent. But, yeah, a deepfiles() and/or foreach is worlds better.

@apparentlymart
Contributor Author

There is currently no such thing built in, but the external data source provides a way to gather data using an external program, which you may be able to use here with some effort.

As an alternative for the S3 use-case in particular, you could consider instead actually generating configuration, using a tool like terraform-s3-dir. I've successfully used that tool to deploy several static webapps to S3, though indeed it does add an extra step to the workflow.

@deitch

deitch commented Jun 22, 2017

@apparentlymart thanks. If there were no other way to do it, I was going to use external (I have used it in a number of other places). I shied away from it here because it rigidly insists on JSON, which means I need to wrap the whole thing to make it work, and because it is so much heavier than a simple lsdir() if it were built in.

I saw terraform-s3-dir as well, but these files are a small part of a larger tf module (for kubernetes, in case you are curious), which is used in a larger project. It isn't realistic to ask users to call a separate command before invoking tf for this small part.

external it is, then.

@inferiorhumanorgans

@apparentlymart foreach and file globbing would be a much welcomed solution.

@gnydick

gnydick commented Mar 6, 2018

I've been thinking about sunsetting the inheritance mechanism in the config language and treating them as simple boilerplates. I'd hand craft the base configs and just write some code that would effectively copy and paste with the differences. We have only simple use cases, so it'd be pretty trivial to abstract out the different config options.

@apparentlymart
Contributor Author

Hi all,

For those who are still monitoring this old issue, I just wanted to note that the more recent plan is now discussed in #17179. That issue alone only covers the underlying mechanism of iterating over lists and maps in order to produce multiple resources, but with that building block in place adding functions for creating the lists and maps to iterate over should be reasonably straightforward.

With those functions added, this will be essentially what I'd proposed with the foreach attribute above, albeit renamed to for_each for consistency with our usual naming conventions.

If you'd like to follow the discussion and implementation of this feature, I'd suggest watching #17179.

@apparentlymart
Contributor Author

Just to finally close this discussion out:

Today we released Terraform 0.12.8, which includes a new function fileset that can be used in a similar way to the hypothetical deepfiles I was discussing in earlier comments:

resource "aws_s3_bucket_object" "file" {
  for_each = fileset("${path.module}/htdocs", "**/*") # or a more constrained pattern, if you like

  bucket = var.s3_bucket_name
  key    = each.value
  source = "${path.module}/${each.value}"
}

This also uses the new resource for_each feature, so the resulting instances will have addresses like aws_s3_bucket_object.file["index.html"]. That means that when you add and remove files, Terraform will be able to directly associate each file path with a specific instance and thus apply the necessary create, update, or delete action only to that one instance.

Some other interesting patterns are possible with these and other building blocks too, like differentiating between template files and other files to give the template files special treatment:

locals {
  all_files        = fileset("${path.module}/htdocs", "**/*")
  file_is_template = { for fn in local.all_files : fn => length(regexall("\\.tmpl$", fn)) > 0 }
  static_files     = toset([for fn, is_template in local.file_is_template : fn if !is_template])
  template_files   = {
    for fn, is_template in local.file_is_template :
    substr(fn, 0, length(fn)-5) => fn # rendered path maps to template path
    if is_template
  }
}

resource "aws_s3_bucket_object" "static" {
  for_each = local.static_files

  bucket = var.s3_bucket_name
  key    = each.value
  source = "${path.module}/${each.value}"
}

resource "aws_s3_bucket_object" "templated" {
  for_each = local.template_files

  bucket  = var.s3_bucket_name
  key     = each.key
  content = templatefile("${path.module}/${each.value}", {
    any_key = "any_value",
  })
}

The templatefile call can include data from elsewhere in the configuration, allowing parts of the deployed file tree to include information determined dynamically by Terraform.

I'm sharing this here just in case those who were following this issue find it interesting to see how this work finally concluded -- a good example of why it's helpful to keep the use case separate from the proposed solution! 😀 But given how old this issue is and how long it's been closed, I'm going to lock it now to reflect that we don't intend to keep monitoring this issue. If you have any questions or find any problems with any of the features I've described here, please do feel free to either ask a question in the community forum or open a new issue in this repository and include the information requested in our issue template as best as you are able. Thanks!

@hashicorp hashicorp locked as resolved and limited conversation to collaborators Sep 4, 2019