Data-driven Terraform Configuration #6598
@phinze, @jen20: I checked off all the items on my list, so this is now feature-complete according to my original plan. Along with running the automated tests many times along the way 😀, I have done some ad-hoc manual testing to exercise the various combinations of computed/non-computed configs, dependent resources, dependent providers, etc. It seems to work as I expected.
Previously resources were assumed to always support the full set of create, read, update and delete operations, and Terraform's resource management lifecycle. Data sources introduce a new kind of resource that only supports the "read" operation.

To support this, a new "Mode" field is added to the Resource concept within the config layer, which can be set to ManagedResourceMode (to indicate the only mode previously possible) or DataResourceMode (to indicate that only "read" is supported).

To support both managed and data resources in the tests, the stringification of resources in config_string.go is adjusted slightly to use the Id() method rather than the unusual type[name] serialization from before, causing a simple mechanical adjustment to the loader tests' expected result strings.
This allows the config loader to read "data" blocks from the config and turn them into DataSource objects. This just reads the data from the config file. It doesn't validate the data nor do anything useful with it.
This allows ${data.TYPE.NAME.FIELD} interpolation syntax at the configuration level, though since there is no special handling of them in the core package this currently just acts as an alias for ${TYPE.NAME.FIELD}.
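For illustration, a hypothetical `data` block and a reference to it might look like this (`example_thing`, `example_server`, and their attributes are invented for the example):

```hcl
data "example_thing" "lookup" {
  filter = "production"
}

resource "example_server" "web" {
  # At this stage, ${data.TYPE.NAME.FIELD} behaves as an alias for
  # ${TYPE.NAME.FIELD}; core gains special handling for it later.
  network_id = "${data.example_thing.lookup.network_id}"
}
```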
This is a breaking change to the ResourceProvider interface that adds the new operations relating to data sources. DataSources, ValidateDataSource, ReadDataDiff and ReadDataApply are the data source equivalents of Resources, Validate, Diff and Apply (respectively) for managed resources.

The diff/apply model seems at first glance a rather strange workflow for read-only resources, but implementing data resources in this way allows them to fit cleanly into the standard plan/apply lifecycle in cases where the configuration contains computed arguments and thus the read must be deferred until apply time.

Along with breaking the interface, we also fix up the plugin client/server and helper/schema implementations of it, which are all of the callers used when provider plugins use helper/schema. This would be a breaking change for any provider plugin that directly implements the provider interface, but no known plugins do this and it is not recommended.

At the helper/schema layer the implementer sees ReadDataApply as a "Read", as opposed to "Create" or "Update" as in the managed resource Apply implementation. The planning mechanics are handled entirely within helper/schema, so that complexity is hidden from the provider implementation itself.
In the "schema" layer a Resource is just any "thing" that has a schema and supports some or all of the CRUD operations. Data sources introduce a new use of Resource to represent read-only resources, which require some different InternalValidate logic.
Historically we've had some "read-only" and "logical" resources. With the addition of the data source concept these will gradually become data sources, but we need to retain backward compatibility with existing configurations that use the now-deprecated resources. This shim is intended to allow us to easily create a resource from a data source implementation. It adjusts the schema as needed and adds stub Create and Delete implementations. This would ideally also produce a deprecation warning whenever such a shimmed resource is used, but the schema system doesn't currently have a mechanism for resource-specific validation, so that remains just a TODO for the moment.
As a first example of a real-world data source, the pre-existing terraform_remote_state resource is adapted to be a data source. The original resource is shimmed to wrap the data source for backward compatibility.
For backward compatibility we will continue to support using the data sources that were formerly logical resources as resources for the moment, but we want to warn the user about it since this support is likely to be removed in future. This is done by adding a new "deprecation message" feature to schema.Resource, but for the moment this is done as an internal feature (not usable directly by plugins) so that we can collect additional use-cases and design a more general interface before creating a compatibility constraint.
This will undoubtedly evolve as implementation continues, but this is some initial documentation based on the design doc.
Once a data resource gets into the state, the state system needs to be able to parse its id to match it with resources in the configuration. Since data resources live in a separate namespace from managed resources, the extra "mode" discriminator is required to specify which namespace we're talking about, just as we do in the resource configuration.
Data resources live in a separate namespace from managed resources, so we need to call a different provider method depending on the mode of the resource we're visiting. Managed resources use ValidateResource, while data resources use ValidateDataSource, since at the provider level of abstraction each provider has separate sets of resources and data sources respectively.
The key difference between data and managed resources is in their respective lifecycles. Now the expanded resource EvalTree switches on the resource mode, generating a different lifecycle for each mode. For this initial change only managed resources are implemented, using the same implementation as before; data resources are no-ops. The data resource implementation will follow in a subsequent change.
This implements the main behavior of data resources, including both the early read in cases where the configuration is non-computed and the split plan/apply read for cases where full configuration can't be known until apply time.
The handling of data "orphans" is simpler than for managed resources because the only thing we need to deal with is our own state, and the validation pass guarantees that by the time we get to refresh or apply the instance state is no longer needed by any other resources and so we can safely drop it with no fanfare.
Previously they would get left behind in the state because we had no support for planning their destruction. Now we'll create a "destroy" plan and act on it by just producing an empty state on apply, thus ensuring that the data resources don't get left behind in the state after everything else is gone.
The ResourceAddress struct grows a new "Mode" field to match Resource, and its parser learns to recognize the "data." prefix so it can set that field. This allows -target to be applied to data sources, although that is arguably not a very useful thing to do. Other future uses of resource addressing, like the state plumbing commands, may make better use of this.
Since the data resource lifecycle contains no steps to deal with tainted instances, we must make sure that they never get created. Doing this out in the command layer is not the best, but this is currently the only layer that has enough information to make this decision and so this simple solution was preferred over a more disruptive refactoring, under the assumption that this taint functionality eventually gets reworked in terms of StateFilter anyway.
Data resources don't have ids when they refresh, so we'll skip showing the "(ID: ...)" indicator for these. Showing it with no id makes it look like something is broken.
New resources logically don't have "old values" for their attributes, so showing them as updates from the empty string is misleading and confusing. Instead, we'll skip showing the old value in a creation diff.
Internally a data source read is represented as a creation diff for the resource, but in the UI we'll show it as a distinct icon and color so that the user can more easily understand that these operations won't affect any real infrastructure. Unfortunately by the time we get to formatting the plan in the UI we only have the resource names to work with, and can't get at the original resource mode. Thus we're forced to infer the resource mode by exploiting knowledge of the naming scheme.
A companion to the null_resource resource, this is here primarily to enable manual quick testing of data source workflows without depending on any external services. The "inputs" map gets copied to the computed "outputs" map on read, "rand" gives a random number to exercise cases with constantly-changing values (an anti-pattern!), and "has_computed_default" is settable in config but computed if not set.
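Usage might look roughly like this, assuming the data source is named `null_data_source` and using the attribute names from the description above (the shipped schema may differ):

```hcl
data "null_data_source" "values" {
  inputs = {
    greeting = "hello"
  }
}

output "greeting" {
  # "inputs" is copied verbatim into the computed "outputs" map on read.
  value = "${lookup(data.null_data_source.values.outputs, "greeting")}"
}

output "random" {
  # "rand" changes on every read -- an anti-pattern, useful only for testing.
  value = "${data.null_data_source.values.rand}"
}
```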
Provider nodes interpolate their config during the input walk, but this is very early and so it's pretty likely that any resources referenced are entirely absent from the state. As a special case then, we tolerate the normally-fatal case of having an entirely missing resource variable so that the input walk can complete, albeit skipping the providers that have such interpolations. If these interpolations end up still being unresolved during refresh (e.g. because the config references a resource that hasn't been created yet) then we will catch that error on the refresh pass, or indeed on the plan pass if -refresh=false is used.
Hi @apparentlymart! Both @phinze and I have reviewed this and decided to merge as-is. This is an amazing piece of work, and a fantastic OSS contribution! It will be in the first 0.7 beta along with a couple of additional sources. Hopefully by the time 0.7 actually hits we will have been able to expand the range of data sources available significantly. Thanks for all your work both on the initial proposal and on a solid implementation!
@apparentlymart Martin, the amount and quality of work you did for Terraform is just incredible.
This is where I'm working on the implementation of the proposal from #4169.
This has been re-opened a bunch of times by this point as it's moved from my fork to the main repo, from master to dev-0.7, and now back from dev-0.7 to master again. 😑
Since this change spans multiple Terraform layers, the sections that follow summarize the changes in each layer, in the hope of making this changeset easier to review. The PR is broken into a sequence of commits which, as far as possible, change only one layer at a time so that each change can be understood in isolation.
## Configuration (`config` package)

In the `config` layer, data sources are introduced by expanding the existing `Resource` concept with a new field `Mode`, which represents which operations/lifecycle this resource follows:

- `ManagedResourceMode`: previously the only mode; Terraform creates and "owns" this resource, updating its configuration and eventually destroying it.
- `DataResourceMode`: Terraform only reads from this resource.

In the configuration language, `resource` blocks map to `ManagedResourceMode` resources and `data` blocks map to `DataResourceMode` resources. `data` blocks don't permit `provisioner` or `lifecycle` sub-blocks because these concepts don't make sense for a resource that only has a "read" action. Internally, data resources always have an empty `Provisioners` slice and a zero-value `ResourceLifecycle` instance.

A similar extension has been made to `ResourceVariable`, which can now represent both the existing `TYPE.NAME.ATTR` variables and the new `data.TYPE.NAME.ATTR` variables, again using a `Mode` field as the discriminator.

Since managed resources and data resources are both kinds of resources, they both appear in the `Resources` slice within the configuration struct. The `Resource.Id()` implementation keeps them distinct by adding a `data.` prefix to data resource ids, a convention that continues through to the core layer.

- `ResourceMode` enumeration and `Mode` attribute on `config.Resource`
- `data` blocks from configuration files
- `data.TYPE.NAME.ATTR` variables and `Mode` attribute on `config.ResourceVariable`
## Core changes

Within core is where we find the biggest divergence of codepaths for managed vs. data resources, since data resources have a simpler lifecycle.

The `ResourceProvider` interface has a new method `DataSources`, which is analogous to `Resources`. The Validate phase is consistent between the two, except that the provider abstraction distinguishes between `ValidateResource` and `ValidateDataSource`, both of which are supported by `EvalValidate` depending on mode.

The remainder of the workflow is completely distinct and handled by two different codepaths, switching on the resource mode inside `terraform/transform_resource.go`.

Even though ultimately data resources support only a "read" operation, the standard plan/apply model is supported by splitting a read into two steps in the `ResourceProvider` interface:

- `ReadDataDiff`: takes the config and returns a diff as if the data resource were being "created", allowing core to know about the data source's computed attributes without actually reading any data.
- `ReadDataApply`: takes the diff, uses it to obtain the configuration attributes, actually loads the data, and returns a state.

The important special behavior for data resources is that during the "refresh" walk they check whether their config contains computed values; if it doesn't, the diff/apply steps are run immediately rather than waiting until the real plan and apply phases. This ensures that non-computed data source attributes can safely be used inside provider configurations, bypassing the chicken-and-egg problems caused by computed provider arguments.

A significant difference compared to managed resources is that a data source "read" does not get access to any previous state; we always create an entirely new instance on each refresh. The intended user-facing mental model for data resources is that they are not stateful at all, and we persist them in the on-disk state file only so that `-refresh=false` can act as expected without breaking the rest of the workflow.

- `ResourceProvider` interface changes
- `EvalValidate` calls the appropriate provider validate method based on resource mode
- `ResourceStateKey` understands how to deal with "orphan" data resources in the state
- `graphNodeExpandedResource` branches in `EvalTree` to support the different lifecycle for data resources
- `graphNodeOrphanResource` branches in `EvalTree` to support the different lifecycle for data resources
- `terraform destroy` (or applying a `plan -destroy`)

## `helper/schema` support for data sources

In the `helper/schema` layer, the new map of supported data sources is kept separate from the existing map of supported resources. Data sources use the familiar `schema.Resource` type but with only a `Read` implementation required and `Create`, `Update`, and `Delete` functions forbidden.

The `Read` implementation works in essentially the same way as it does for managed resources, getting access to its configuration attributes via `d.Get(...)` and setting computed attributes with `d.Set(...)`. The only notable differences are that `d.Get(...)` won't return values of computed attributes set on previous runs, and calling `d.SetId(...)` is optional.

To help us migrate existing "logical resources" to instead be data sources, a helper is provided to wrap a data source implementation and shim it to work as a resource implementation. In this case, the `Read` implementation must call `d.SetId(...)` in order to meet the expectations of a managed resource implementation.

- `DataSourcesMap` within `helper.Provider`
- `DataSources`, `ValidateDataSource`, `ReadDataDiff` and `ReadDataApply`
## `provider/terraform`: example remote state data source

As an example to show things working end-to-end, the `terraform_remote_state` resource is transformed into a data source, and the backward-compatibility shim is used to maintain the now-deprecated resource.

- `terraform_remote_state` data source
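Consuming the new data source might look like this (the backend name and config keys are illustrative, and this assumes remote state outputs surface as attributes of the data source):

```hcl
data "terraform_remote_state" "network" {
  backend = "s3"

  config {
    bucket = "example-state"
    key    = "network/terraform.tfstate"
  }
}

resource "example_server" "web" {
  # "subnet_id" is assumed to be an output of the remote state.
  subnet_id = "${data.terraform_remote_state.network.subnet_id}"
}
```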
## Targeting Data Resources
`ResourceAddress` is extended with a `ResourceMode` field to handle the distinct managed and data resource namespaces. `data.TYPE.NAME` can be used to target data resources, for consistency with how data resources are referenced elsewhere.

- `ResourceAddress` support for `data.TYPE.NAME` syntax and `ResourceMode`

## UI Changes
When data resource reads appear in plan output, we show them using a distinct presentation to make it clear that no real infrastructure will be altered by the operation. Since a data resource read is internally just a "create" diff for the resource, this is simply some sleight of hand in the UI layer to present it differently.

A "read" diff appears only if the read operation cannot be completed during the "refresh" phase due to computed configuration.
## Other stuff

- `terraform taint`: "tainting" is not meaningful for data resources because they are not created/destroyed.