Data-driven Terraform Configuration #4705

Closed · wants to merge 2 commits

Conversation

apparentlymart (Contributor) commented Jan 17, 2016

This is where I'm working on the implementation of the proposal from #4169.

(This PR supersedes #4961, and has been rebased onto dev-0.7 rather than master so it can build on the reworked plugin bits and type system changes.)

Since this change spans multiple Terraform layers, the sections that follow summarize the changes in each layer, in the hope of making this changeset easier to review. The PR is broken into a sequence of commits which, as far as possible, change only one layer at a time so that each change can be understood in isolation.

Configuration (config package)

In the config layer, data sources are introduced by extending the existing Resource concept with a new field Mode, which indicates which operations and lifecycle the resource follows:

  • ManagedResourceMode: previously the only mode; Terraform creates and "owns" this resource, updating its configuration and eventually destroying it.
  • DataResourceMode: new in this change; Terraform only reads from this resource.

In the configuration language, resource blocks map to ManagedResourceMode resources and data blocks map to DataResourceMode resources.

data blocks don't permit provisioner or lifecycle sub-blocks because these concepts do not make sense for a resource that only has a Refresh action. Internally, data resources always have an empty Provisioners slice and a zero-value ResourceLifecycle instance.

A similar extension has been made to ResourceVariable, which can now represent both the existing TYPE.NAME.ATTR variables and the new data.TYPE.NAME.ATTR variables, again using a Mode field as the discriminator.

Since managed resources and data resources are both kinds of resources, they both appear in the Resources slice within the configuration struct. The Resource.Id() implementation keeps them distinct by adding a data. prefix to data resource ids, a convention that continues through to the core layer.
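
As a rough sketch, assuming the names described above rather than the PR's exact code, the config-layer changes amount to something like:

```go
package config

import "fmt"

// ResourceMode discriminates between the two resource lifecycles.
type ResourceMode int

const (
	ManagedResourceMode ResourceMode = iota // created, updated, destroyed by Terraform
	DataResourceMode                        // only ever read
)

type Resource struct {
	Mode ResourceMode
	Name string
	Type string
	// ... raw config, Provisioners (always empty for data resources),
	// Lifecycle (always zero-valued for data resources), etc.
}

// Id prefixes data resource ids with "data." so the two namespaces
// stay distinct throughout the core layer.
func (r *Resource) Id() string {
	switch r.Mode {
	case DataResourceMode:
		return fmt.Sprintf("data.%s.%s", r.Type, r.Name)
	default:
		return fmt.Sprintf("%s.%s", r.Type, r.Name)
	}
}

// ResourceVariable gains the same Mode discriminator: TYPE.NAME.ATTR
// parses to ManagedResourceMode, data.TYPE.NAME.ATTR to DataResourceMode.
```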

  • ResourceMode enumeration and Mode attribute on config.Resource
  • Parsing of data blocks from configuration files
  • Parsing of data.TYPE.NAME.ATTR variables and Mode attribute on config.ResourceVariable

Core changes

Core is where the codepaths for managed and data resources diverge most, since data resources have a simpler lifecycle.

The ResourceProvider interface has a new method DataSources, which is analogous to Resources. The Validate phase is consistent between the two, except that the provider abstraction distinguishes between ValidateResource and ValidateDataSource, both of which are supported by EvalValidate depending on mode.

The remainder of the workflow is completely distinct and handled by two different codepaths, switching on the resource mode inside terraform/transform_resource.go.

Even though ultimately data resources support only a "read" operation, the standard plan/apply model is supported by splitting a read into two steps in the ResourceProvider interface:

  • ReadDataDiff: takes the config and returns a diff as if the data resource were being "created", allowing core to know about the data source's computed attributes without actually reading any data.
  • ReadDataApply: takes the diff, uses it to obtain the configuration attributes, actually loads the data and returns a state.
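
For illustration, the new methods might look like this, shown as a standalone interface against the terraform package's existing types (signatures are assumed from the description above, not copied from the PR):

```go
package terraform

// DataSource describes a data source a provider supports, analogous
// to ResourceType for managed resources.
type DataSource struct {
	Name string
}

// The new ResourceProvider methods, gathered here for illustration.
type dataSourceProvider interface {
	// DataSources is analogous to Resources.
	DataSources() []DataSource

	// ValidateDataSource is the data-mode counterpart of ValidateResource.
	ValidateDataSource(t string, c *ResourceConfig) ([]string, []error)

	// ReadDataDiff returns a diff as if the data resource were being
	// "created", without actually reading any data.
	ReadDataDiff(info *InstanceInfo, c *ResourceConfig) (*InstanceDiff, error)

	// ReadDataApply consumes that diff, performs the real read, and
	// returns the resulting state.
	ReadDataApply(info *InstanceInfo, d *InstanceDiff) (*InstanceState, error)
}
```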

The important special behavior for data resources is that during the "refresh" walk they will check to see if their config contains computed values, and if it doesn't then the diff/apply steps are run immediately, rather than waiting until the real plan and apply phases. This ensures that non-computed data source attributes can be safely used inside provider configurations, bypassing the chicken-and-egg problems that are caused by computed provider arguments.
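
In sketch form (the helper below is illustrative; the real implementation lives in the EvalTree nodes), the refresh-walk shortcut is roughly:

```go
// refreshDataResource sketches the refresh-time decision for a data
// resource, using the provider methods added by this PR.
func refreshDataResource(p ResourceProvider, info *InstanceInfo, cfg *ResourceConfig) (*InstanceState, error) {
	if len(cfg.ComputedKeys) > 0 {
		// Some arguments are still unknown; defer the read to the
		// plan/apply phases, where it will appear as a "read" diff.
		return nil, nil
	}

	// Config is fully known: do the diff/apply immediately so that
	// provider configurations depending on this data resource can
	// interpolate real values before plan runs.
	diff, err := p.ReadDataDiff(info, cfg)
	if err != nil {
		return nil, err
	}
	return p.ReadDataApply(info, diff)
}
```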

A significant difference compared to managed resources is that a data source "read" does not get access to any previous state; we always create an entirely new instance on each refresh. The intended user-facing mental model for data resources is that they are not stateful at all, and we persist them in the on-disk state file only so that -refresh=false can act as expected without breaking the rest of the workflow.

  • ResourceProvider interface changes
  • Provider plugin stubs for these new methods
  • EvalValidate calls appropriate provider validate method based on resource mode.
  • ResourceStateKey understands how to deal with "orphan" data resources in the state.
  • graphNodeExpandedResource branches in EvalTree to support the different lifecycle for data resources.
    • On refresh, do the diff/apply early if configuration is not computed and update the state
    • On plan, record a creation diff if there is no state already recorded from the previous step
    • On apply, if creation diff is present then produce a populated instance state
  • graphNodeOrphanResource branches in EvalTree to support the different lifecycle for data resources.
    • On both refresh and apply, remove the resource from the state
  • Clean up data resource instances from state during terraform destroy (or applying a plan -destroy)

helper/schema support for data sources

In the helper/schema layer, the new map of supported data sources is kept separate from the existing map of supported resources. Data sources use the familiar schema.Resource type but with only a Read implementation required and Create, Update, and Delete functions forbidden.

The Read implementation works in essentially the same way as it does for managed resources, getting access to its configuration attributes via d.Get(...) and setting computed attributes with d.Set(...). The only notable differences are that d.Get(...) won't return values of computed attributes set on previous runs, and calling d.SetId(...) is optional.
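
Put together, a hypothetical data source in this style might look like the following (all names invented for illustration):

```go
package example

import (
	"strings"

	"github.com/hashicorp/terraform/helper/schema"
)

// dataSourceExample defines a data source: only Read is set; Create,
// Update, and Delete are forbidden for data sources.
func dataSourceExample() *schema.Resource {
	return &schema.Resource{
		Read: dataSourceExampleRead,
		Schema: map[string]*schema.Schema{
			"input": {
				Type:     schema.TypeString,
				Required: true,
			},
			"output": {
				Type:     schema.TypeString,
				Computed: true,
			},
		},
	}
}

func dataSourceExampleRead(d *schema.ResourceData, meta interface{}) error {
	input := d.Get("input").(string)

	// "Read" the data; a real data source would query some API here.
	d.Set("output", strings.ToUpper(input))

	// Optional for a plain data source, but required when the data
	// source is shimmed to act as a managed resource.
	d.SetId(input)
	return nil
}
```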

To help us migrate existing "logical resources" to instead be data sources, a helper is provided to wrap a data source implementation and shim it to work as a resource implementation. In this case, the Read implementation must call d.SetId(...) in order to meet the expectations of a managed resource implementation.
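
Continuing the hypothetical example, wiring the data source and its resource shim into a provider might look like this (the DataSourceResourceShim helper name matches what shipped in helper/schema, but treat the exact signature as assumed):

```go
func Provider() *schema.Provider {
	return &schema.Provider{
		DataSourcesMap: map[string]*schema.Resource{
			"example_thing": dataSourceExample(),
		},
		ResourcesMap: map[string]*schema.Resource{
			// Deprecated resource form, shimmed onto the data source
			// so existing configurations keep working (with a warning).
			"example_thing": schema.DataSourceResourceShim(
				"example_thing", dataSourceExample(),
			),
		},
	}
}
```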

  • DataSourcesMap within helper.Provider
  • Implementations of DataSources, ValidateDataSource, ReadDataDiff, and ReadDataApply
  • Backward-compatibility shim for using data sources as logical resources
    • Deprecation warning when using a resource-shimmed data source

provider/terraform: example remote state data source

As an example to show things working end-to-end, the terraform_remote_state resource is transformed into a data source, and the backward-compatibility shim is used to maintain the now-deprecated resource.

  • terraform_remote_state data source

Targeting Data Resources

ResourceAddress is extended with a ResourceMode to handle the distinct managed and data resource namespaces. data.TYPE.NAME can be used to target data resources, for consistency with how data resources are referenced elsewhere.
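
For example, `terraform plan -target=data.terraform_remote_state.foo` (a hypothetical address) would restrict the operation to that single data resource plus its dependencies.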

  • ResourceAddress support for data.TYPE.NAME syntax and ResourceMode.

UI Changes

When data resource reads appear in plan output, we show them using a distinct presentation to make it clear that no real infrastructure will be altered by this operation:

[screenshot: terraform-datasources-plan]

Since a data resource read is internally just a "create" diff for the resource, this is just some sleight of hand in the UI layer to present it differently.

A "read" diff will appear only if the read operation cannot be completed during the "refresh" phase due to computed configuration.

  • Change diff output to show "reads" differently
  • Hide the "(ID: ...)" suffix when refreshing data sources, since a data source doesn't get an id until after it is "refreshed".

Other stuff

  • User-oriented documentation
  • Prevent data resources from being tainted explicitly with terraform taint ("tainting" is not meaningful for data resources because they are not created/destroyed.)

Commit messages:

This represents a data source configuration.
This allows the config loader to read "data" blocks from the config and
turn them into DataSource objects.

This just reads the data from the config file. It doesn't validate the
data nor do anything useful with it.
phinze (Contributor) commented Feb 2, 2016

Hi @apparentlymart, this is looking good so far! I'm working on the Terraform / Vault integration which could really take advantage of Data Sources for several of its main data types.

So I'm checking in on what your near-term plans are for this branch. If you don't expect to be picking it up soon, perhaps we can discuss me picking up where you left off?

Let me know! 😀

apparentlymart (Contributor, Author) commented
@phinze, @jen20: I checked off all the items on my list, so this is now feature-complete according to my original plan.

I have done some ad-hoc manual testing to exercise the various combinations of computed/non-computed configs, dependent resources, dependent providers, etc.


Unfortunately the one situation that still doesn't seem to work is the very case that this feature was intended to solve:

data "null_data_source" "test" {
  inputs = {
    aws_region = "us-west-2"
  }
}

provider "aws" {
  region = "${data.null_data_source.test.outputs.aws_region}"
}

resource "aws_instance" "foo" {
  instance_type = "t1.foo"
  ami           = "ami-abc123"
}

In the above configuration, the aws provider configuration depends on the data resource. The intended behavior is that the data instance gets created before the aws provider is configured. However, this fails with the following error on terraform plan:

```
$ terraform plan
Error configuring: 1 error(s) occurred:

* no error reported by variable "test" is nil
```

This error seems to be occurring during the "input" walk; running with -input=false makes the error go away and lets the plan complete as expected:

```
$ terraform plan -input=false
Refreshing Terraform state prior to plan...

data.null_data_source.test: Refreshing state...

The Terraform execution plan has been generated and is shown below.
Resources are shown in alphabetical order for quick scanning. Green resources
will be created (or destroyed and then created if an existing resource
exists), yellow resources are being changed in-place, and red resources
will be destroyed. Cyan entries are data sources to be read.

Note: You didn't specify an "-out" parameter to save this plan, so when
"apply" is called, Terraform can't guarantee this is what will execute.

+ aws_instance.foo
    ami:                      "ami-abc123"
    availability_zone:        "<computed>"
    ebs_block_device.#:       "<computed>"
    ephemeral_block_device.#: "<computed>"
    instance_state:           "<computed>"
    instance_type:            "t1.foo"
    key_name:                 "<computed>"
    placement_group:          "<computed>"
    private_dns:              "<computed>"
    private_ip:               "<computed>"
    public_dns:               "<computed>"
    public_ip:                "<computed>"
    root_block_device.#:      "<computed>"
    security_groups.#:        "<computed>"
    source_dest_check:        "1"
    subnet_id:                "<computed>"
    tenancy:                  "<computed>"
    vpc_security_group_ids.#: "<computed>"


Plan: 2 to add, 0 to change, 0 to destroy.
```

I'm not sure this is really resolvable... the only viable path I can see would be to skip asking for inputs on a provider that has any computed configuration, but even that doesn't seem like it'd work since it is the interpolation itself that is failing. If you have any other ideas I'd love to hear them! 😀

ghost commented Apr 25, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
