Data-driven Terraform Configuration #4705

Closed · wants to merge 2 commits

Conversation

apparentlymart (Contributor) commented Jan 17, 2016

This is where I'm working on the implementation of the proposal from #4169.

(This PR supersedes #4961, and has been rebased onto dev-0.7 rather than master so it can build on the reworked plugin bits and type system changes.)

Since this change spans multiple Terraform layers, the sections that follow summarize the changes in each layer, in the hope of making this changeset easier to review. The PR is broken into a sequence of commits which, as far as possible, change only one layer at a time so that each change can be understood in isolation.

Configuration (config package)

In the config layer, data sources are introduced by extending the existing Resource concept with a new field Mode, which indicates which operations and lifecycle the resource follows:

  • ManagedResourceMode: previously the only mode; Terraform creates and "owns" this resource, updating its configuration and eventually destroying it.
  • DataResourceMode: new in this change; Terraform only reads from this resource.

In the configuration language, resource blocks map to ManagedResourceMode resources and data blocks map to DataResourceMode resources.

data blocks don't permit provisioner or lifecycle sub-blocks because these concepts do not make sense for a resource that only has a Refresh action. Internally, data resources always have an empty Provisioners slice and a zero-value ResourceLifecycle instance.

A similar extension has been made to ResourceVariable, which can now represent both the existing TYPE.NAME.ATTR variables and the new data.TYPE.NAME.ATTR variables, again using a Mode field as the discriminator.

Since managed resources and data resources are both kinds of resources, they both appear in the Resources slice within the configuration struct. The Resource.Id() implementation keeps them distinct by adding a data. prefix to data resource ids, a convention that continues through to the core layer.
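
As a rough sketch, assuming the names described above rather than the PR's exact code, the config-layer changes amount to something like:

```go
package config

import "fmt"

// ResourceMode discriminates between the two resource lifecycles.
type ResourceMode int

const (
	ManagedResourceMode ResourceMode = iota // created, updated, destroyed by Terraform
	DataResourceMode                        // only ever read
)

type Resource struct {
	Mode ResourceMode
	Name string
	Type string
	// ... raw config, Provisioners (always empty for data resources),
	// Lifecycle (always zero-valued for data resources), etc.
}

// Id prefixes data resource ids with "data." so the two namespaces
// stay distinct throughout the core layer.
func (r *Resource) Id() string {
	switch r.Mode {
	case DataResourceMode:
		return fmt.Sprintf("data.%s.%s", r.Type, r.Name)
	default:
		return fmt.Sprintf("%s.%s", r.Type, r.Name)
	}
}

// ResourceVariable gains the same Mode discriminator: TYPE.NAME.ATTR
// parses to ManagedResourceMode, data.TYPE.NAME.ATTR to DataResourceMode.
```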

  • ResourceMode enumeration and Mode attribute on config.Resource
  • Parsing of data blocks from configuration files
  • Parsing of data.TYPE.NAME.ATTR variables and Mode attribute on config.ResourceVariable

Core changes

Core is where the codepaths for managed and data resources diverge most, since data resources have a simpler lifecycle.

The ResourceProvider interface has a new method DataSources, which is analogous to Resources. The Validate phase is consistent between the two, except that the provider abstraction distinguishes between ValidateResource and ValidateDataSource, both of which are supported by EvalValidate depending on mode.

The remainder of the workflow is completely distinct and handled by two different codepaths, switching on the resource mode inside terraform/transform_resource.go.

Even though ultimately data resources support only a "read" operation, the standard plan/apply model is supported by splitting a read into two steps in the ResourceProvider interface:

  • ReadDataDiff: takes the config and returns a diff as if the data resource were being "created", allowing core to know about the data source's computed attributes without actually reading any data.
  • ReadDataApply: takes the diff, uses it to obtain the configuration attributes, actually loads the data and returns a state.
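
For illustration, the new methods might look like this, shown as a standalone interface against the terraform package's existing types (signatures are assumed from the description above, not copied from the PR):

```go
package terraform

// DataSource describes a data source a provider supports, analogous
// to ResourceType for managed resources.
type DataSource struct {
	Name string
}

// The new ResourceProvider methods, gathered here for illustration.
type dataSourceProvider interface {
	// DataSources is analogous to Resources.
	DataSources() []DataSource

	// ValidateDataSource is the data-mode counterpart of ValidateResource.
	ValidateDataSource(t string, c *ResourceConfig) ([]string, []error)

	// ReadDataDiff returns a diff as if the data resource were being
	// "created", without actually reading any data.
	ReadDataDiff(info *InstanceInfo, c *ResourceConfig) (*InstanceDiff, error)

	// ReadDataApply consumes that diff, performs the real read, and
	// returns the resulting state.
	ReadDataApply(info *InstanceInfo, d *InstanceDiff) (*InstanceState, error)
}
```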

The important special behavior for data resources is that during the "refresh" walk they will check to see if their config contains computed values, and if it doesn't then the diff/apply steps are run immediately, rather than waiting until the real plan and apply phases. This ensures that non-computed data source attributes can be safely used inside provider configurations, bypassing the chicken-and-egg problems that are caused by computed provider arguments.
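
In sketch form (the helper below is illustrative; the real implementation lives in the EvalTree nodes), the refresh-walk shortcut is roughly:

```go
// refreshDataResource sketches the refresh-time decision for a data
// resource, using the provider methods added by this PR.
func refreshDataResource(p ResourceProvider, info *InstanceInfo, cfg *ResourceConfig) (*InstanceState, error) {
	if len(cfg.ComputedKeys) > 0 {
		// Some arguments are still unknown; defer the read to the
		// plan/apply phases, where it will appear as a "read" diff.
		return nil, nil
	}

	// Config is fully known: do the diff/apply immediately so that
	// provider configurations depending on this data resource can
	// interpolate real values before plan runs.
	diff, err := p.ReadDataDiff(info, cfg)
	if err != nil {
		return nil, err
	}
	return p.ReadDataApply(info, diff)
}
```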

A significant difference compared to managed resources is that a data source "read" does not get access to any previous state; we always create an entirely new instance on each refresh. The intended user-facing mental model for data resources is that they are not stateful at all, and we persist them in the on-disk state file only so that -refresh=false can act as expected without breaking the rest of the workflow.

  • ResourceProvider interface changes
  • Provider plugin stubs for these new methods
  • EvalValidate calls appropriate provider validate method based on resource mode.
  • ResourceStateKey understands how to deal with "orphan" data resources in the state.
  • graphNodeExpandedResource branches in EvalTree to support the different lifecycle for data resources.
    • On refresh, do the diff/apply early if configuration is not computed and update the state
    • On plan, record a creation diff if there is no state already recorded from the previous step
    • On apply, if creation diff is present then produce a populated instance state
  • graphNodeOrphanResource branches in EvalTree to support the different lifecycle for data resources.
    • On both refresh and apply, remove the resource from the state
  • Clean up data resource instances from state during terraform destroy (or applying a plan -destroy)

helper/schema support for data sources

In the helper/schema layer, the new map of supported data sources is kept separate from the existing map of supported resources. Data sources use the familiar schema.Resource type but with only a Read implementation required and Create, Update, and Delete functions forbidden.

The Read implementation works in essentially the same way as it does for managed resources, getting access to its configuration attributes via d.Get(...) and setting computed attributes with d.Set(...). The only notable differences are that d.Get(...) won't return values of computed attributes set on previous runs, and calling d.SetId(...) is optional.
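
Put together, a hypothetical data source in this style might look like the following (all names invented for illustration):

```go
package example

import (
	"strings"

	"github.com/hashicorp/terraform/helper/schema"
)

// dataSourceExample defines a data source: only Read is set; Create,
// Update, and Delete are forbidden for data sources.
func dataSourceExample() *schema.Resource {
	return &schema.Resource{
		Read: dataSourceExampleRead,
		Schema: map[string]*schema.Schema{
			"input": {
				Type:     schema.TypeString,
				Required: true,
			},
			"output": {
				Type:     schema.TypeString,
				Computed: true,
			},
		},
	}
}

func dataSourceExampleRead(d *schema.ResourceData, meta interface{}) error {
	input := d.Get("input").(string)

	// "Read" the data; a real data source would query some API here.
	d.Set("output", strings.ToUpper(input))

	// Optional for a plain data source, but required when the data
	// source is shimmed to act as a managed resource.
	d.SetId(input)
	return nil
}
```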

To help us migrate existing "logical resources" to instead be data sources, a helper is provided to wrap a data source implementation and shim it to work as a resource implementation. In this case, the Read implementation must call d.SetId(...) in order to meet the expectations of a managed resource implementation.
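
Continuing the hypothetical example, wiring the data source and its resource shim into a provider might look like this (the DataSourceResourceShim helper name matches what shipped in helper/schema, but treat the exact signature as assumed):

```go
func Provider() *schema.Provider {
	return &schema.Provider{
		DataSourcesMap: map[string]*schema.Resource{
			"example_thing": dataSourceExample(),
		},
		ResourcesMap: map[string]*schema.Resource{
			// Deprecated resource form, shimmed onto the data source
			// so existing configurations keep working (with a warning).
			"example_thing": schema.DataSourceResourceShim(
				"example_thing", dataSourceExample(),
			),
		},
	}
}
```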

  • DataSourcesMap within helper.Provider
  • Implementations of DataSources, ValidateDataSource, ReadDataDiff, and ReadDataApply
  • Backward-compatibility shim for using data sources as logical resources
    • Deprecation warning when using a resource-shimmed data source

provider/terraform: example remote state data source

As an example to show things working end-to-end, the terraform_remote_state resource is transformed into a data source, and the backward-compatibility shim is used to maintain the now-deprecated resource.

  • terraform_remote_state data source

Targeting Data Resources

ResourceAddress is extended with a ResourceMode to handle the distinct managed and data resource namespaces. data.TYPE.NAME can be used to target data resources, for consistency with how data resources are referenced elsewhere.
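
For example, `terraform plan -target=data.terraform_remote_state.foo` (a hypothetical address) would restrict the operation to that single data resource plus its dependencies.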

  • ResourceAddress support for data.TYPE.NAME syntax and ResourceMode.

UI Changes

When data resource reads appear in plan output, we show them using a distinct presentation to make it clear that no real infrastructure will be altered by this operation:

[screenshot: terraform-datasources-plan]

Since a data resource read is internally just a "create" diff for the resource, this is just some sleight of hand in the UI layer to present it differently.

A "read" diff will appear only if the read operation cannot be completed during the "refresh" phase due to computed configuration.

  • Change diff output to show "reads" differently
  • Hide the "(ID: ...)" suffix when refreshing data sources, since a data source doesn't get an id until after it is "refreshed".

Other stuff

  • User-oriented documentation
  • Prevent data resources from being tainted explicitly with terraform taint ("tainting" is not meaningful for data resources because they are not created/destroyed.)

Commit messages:

This represents a data source configuration.
This allows the config loader to read "data" blocks from the config and
turn them into DataSource objects.

This just reads the data from the config file. It doesn't validate the
data nor do anything useful with it.
phinze (Contributor) commented Feb 2, 2016

Hi @apparentlymart, this is looking good so far! I'm working on the Terraform / Vault integration which could really take advantage of Data Sources for several of its main data types.

So I'm checking in on what your near-term plans are for this branch. If you don't expect to be picking it up soon, perhaps we can discuss me picking up where you left off?

Let me know! 😀

apparentlymart (Contributor, Author) commented
@phinze, @jen20: I checked off all the items on my list, so this is now feature-complete according to my original plan.

I have done some ad-hoc manual testing to exercise the various combinations of computed/non-computed configs, dependent resources, dependent providers, etc.


Unfortunately the one situation that still doesn't seem to work is the very case that this feature was intended to solve:

data "null_data_source" "test" {
  inputs = {
    aws_region = "us-west-2"
  }
}

provider "aws" {
  region = "${data.null_data_source.test.outputs.aws_region}"
}

resource "aws_instance" "foo" {
  instance_type = "t1.foo"
  ami           = "ami-abc123"
}

In the above configuration, the aws provider configuration depends on the data resource. The intended behavior is that the data instance gets created before the aws provider is configured. However, this fails with the following error on terraform plan:

```
$ terraform plan
Error configuring: 1 error(s) occurred:

* no error reported by variable "test" is nil
```

This error seems to be occurring during the "input" walk; running with -input=false makes the error go away and lets the plan complete as expected:

```
$ terraform plan -input=false
Refreshing Terraform state prior to plan...

data.null_data_source.test: Refreshing state...

The Terraform execution plan has been generated and is shown below.
Resources are shown in alphabetical order for quick scanning. Green resources
will be created (or destroyed and then created if an existing resource
exists), yellow resources are being changed in-place, and red resources
will be destroyed. Cyan entries are data sources to be read.

Note: You didn't specify an "-out" parameter to save this plan, so when
"apply" is called, Terraform can't guarantee this is what will execute.

+ aws_instance.foo
    ami:                      "ami-abc123"
    availability_zone:        "<computed>"
    ebs_block_device.#:       "<computed>"
    ephemeral_block_device.#: "<computed>"
    instance_state:           "<computed>"
    instance_type:            "t1.foo"
    key_name:                 "<computed>"
    placement_group:          "<computed>"
    private_dns:              "<computed>"
    private_ip:               "<computed>"
    public_dns:               "<computed>"
    public_ip:                "<computed>"
    root_block_device.#:      "<computed>"
    security_groups.#:        "<computed>"
    source_dest_check:        "1"
    subnet_id:                "<computed>"
    tenancy:                  "<computed>"
    vpc_security_group_ids.#: "<computed>"


Plan: 2 to add, 0 to change, 0 to destroy.
```

I'm not sure this is really resolvable... the only viable path I can see would be to skip asking for inputs on a provider that has any computed configuration, but even that doesn't seem like it'd work since it is the interpolation itself that is failing. If you have any other ideas I'd love to hear them! 😀

ghost commented Apr 25, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
