-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Use a DAG to resolve locals, globals, and include #858
base: main
Are you sure you want to change the base?
Conversation
locals
, globals
, and include
Obviously it hasn't been hooked into any of the actual config parsing yet, either. |
@apottere Could you update the README so we have a better idea of how you expect these config features to be used? @yorinasub17 I think you were driving these discussions and did a lot of the recent work in this area. Could you take a look and share your thoughts? |
Yup I plan on taking a look, but have been buried with my other tasks. I took a quick look earlier in the week and realized this is pretty big and requires some deeper thinking so I decided to wait until I have a bit of time to sit down and dive deep. I expect to have some time early next week to review this. |
@brikis98 I'd rather chat about functionality here (instead of updating the readme) if that's ok, since there's a very good chance what I put in the readme now will have to change. This was my desired solution to #814, which allows a developer to have This approach also allows you to reference The test in the PR has an example of how this could be used, but I'll explain in more detail here.
And a child
Terragrunt can figure out which variables depend on other variables using builtin HCL2 methods, and then graph them out to determine a resolution order. It can figure out that the child's Once terragrunt has resolved the include path (and only the locals that are required for that), it can go find the parent and merge the parent's globals and locals into the graph. At that point it has everything it needs to evaluate all of the variables and create a map that can be added to the EvaluationContext for decoding the rest of the HCL. In the example above, the parent has For the parent's Another thing I forgot to mention in my first comment is that this approach is fairly trivial to modify to support multiple includes. You basically just repeat the |
Also another reminder that this is a POC and not finished code, e.g. it's not actually hooked up to the real evaluation pipeline yet and has a bunch of |
Also, I'm going to be out of town for 2 weeks, so no rush getting to this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I finally got the chance to sit down and read through this. Thanks for your contribution! I've been thinking about the graph approach, but wanted to start simple so it is good to see an iteration from the community that explores this.
I think this is a good start. I took a pass through and highlighted a few of the glaring things that stood out, but unfortunately I must admit that I didn't do a complete thorough review and add all my comments. This is primarily because the lack of comments on all the functions made it hard to follow and understand what you were going for in each of the implementation. E.g why does addVertices
need a side effect function to add vertices to the callers' array? Why does evaluateVariable
return a boolean, and seems to ignore the errors? You don't need to answer these as I realized this after reading the code a few times, but definitely could have used some comments to assist in this.
So as a next step, can you:
- Add comments to each function and struct with a description of the intended action, and expected return items, especially those functions that are unconventional like
evaluateVariable
. - I could use some help understanding the intended usage of
globals
in relation tolocals
, especially after all this graph logic. E.g when canglobals
referencelocals
? Can a childlocal
reference the parentlocal
? Can a child reference the parentlocal
in the inputs? What aboutinclude
? And what is supposed to happen if both the child and the parent hasglobal
with the same name? Does the child win or the parent? Ideally, you would capture all this in the README.
Speaking of that...
I'd rather chat about functionality here (instead of updating the readme) if that's ok, since there's a very good chance what I put in the readme now will have to change.
We follow RDD here and it is actually important to us that our contributors follow it as much as possible. We mention this practice in our contribution guideline. For example, if you had updated the README first for feedback with your intentions of supporting globals referencing locals in a loop, it would have made it a lot easier to understand what you were going for in the graph approach. We also could have discussed there if the graph was necessary, or if we should go for the simpler approach of reorganizing the order of evaluation. Also as a reviewer, it is much easier to understand and discuss a README change than it is to discuss the code update.
|
||
const local = "local" | ||
const global = "global" | ||
const include = "include" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const namespace is shared with the package. Since this can be used for other files in this package, I think it better to put this in a constants.go
file.
// 4. Verify DAG and reduce graph | ||
// a. Verify no globals are used in path to include (if exists) | ||
// 5. Evaluate everything except globals | ||
// 6. If include exists, find parent HCL, parse, and extract locals and globals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean the child can refer to parent locals? I am not sure I want to support that. The whole point of separating globals
(which we intend to support all the looping logic) and locals
was to avoid all the messy logic in merging variable references. What happens if both the child and parent define the same name? Does the child win? Or the parent? What about when referring to locals in the parent? Is that referring to the childs local?
For this reason, the idea was to keep locals
as variables that are only referencable in the current config, and globals
are variables that are in the shared namespace. Having this split allows us to recommend locals
for most use cases, and for the cases where you need to reference other config or for resolving variables that depend on include
, you can use globals
but the usage should be limited since it adds complexity to your terragrunt config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, locals
are always local
to the file where they're defined. globals
can be shared between files, and their values can be based off locals
in the file they're declared in.
If a child config overrides a global variable, the child config's expression will be evaluated based on the child config's locals, and the value of the global will be used in the parent and the child for that global.
// a. Verify that there are no globals that are empty. | ||
// 10. Evaluate everything, skipping things that were evaluated in (5) | ||
func ParseConfigVariables(filename string, terragruntOptions *options.TerragruntOptions) (*EvaluationResult, *EvaluationResult, error) { | ||
globals := evaluatorGlobals{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
globals := evaluatorGlobals{ | |
globalEvaluatorConfig := evaluatorGlobals{ |
or globalEvaluatorOptions
? Using a name like globals
makes it easy to confuse with the globals
block later on in the code.
// Add root of graph | ||
globals.graph.Add(globals.root) | ||
|
||
child := *newConfigEvaluator(filename, globals, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why dereference the pointer here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignorance, mostly :D I really don't have a good grasp of pointer semantics in go, so that's something I'll read up on before I clean up the code.
|
||
// 5 | ||
diags := globals.evaluateVariables(false) | ||
if diags != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if diags != nil { | |
if diags != nil && diags.HasErrors() { |
|
||
// 10 | ||
diags = globals.evaluateVariables(true) | ||
if diags != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if diags != nil { | |
if diags != nil && diags.HasErrors() { |
return blocksByType[locals][0].Body, blocksByType[globals][0].Body, blocksByType[include][0].Body, diags | ||
} else { | ||
return blocksByType[locals][0].Body, blocksByType[globals][0].Body, nil, diags | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this should return a struct to simplify this logic? This will also make it clear what each positional return type is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this code will change significantly to be less brittle. Right now it's just happy-path.
return nil | ||
} | ||
|
||
func (eval *configEvaluator) evaluateVariable(vertex variableVertex, diags hcl.Diagnostics, evaluateGlobals bool) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be easier to understand if the evaluateGlobals
bool was replaced with a list of variable references to evaluate.
} | ||
} | ||
|
||
func (eval *configEvaluator) addVertices(vertexType string, block hcl.Body, consumer func(vertex variableVertex) error) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having trouble understanding what this function is doing. I can tell that it is a routine to add vertices to the graph, but what is difficult to understand are:
- Each of the conditions for when a vertex is added.
- The purpose of the consumer function. In general, side effect based functions make it hard to parse and understand, which makes it harder to maintain the code. If you can come up with a way to avoid that, it would be great. E.g you can make this function return all the vertices that were added to the graph, and then have the caller do what it needs to do by looping through those vertices. That is easier to parse than having to understand that this is a callback that is called on each vertex that was added to the graph.
|
||
variables := target.Expr.Variables() | ||
|
||
if variables == nil || len(variables) <= 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if variables == nil || len(variables) <= 0 { | |
if variables == nil || len(variables) == 0 { |
@yorinasub17 I just got back and settled from vacation, so I should be more responsive from here on out. Thanks for taking a look! I'll update the README shortly with some examples of what I'd expect to be possible. I'll also take a look at all the comments in the code, but I tried to stress in my original comment that is is not releasable code in any way, shape, or form. I'd really like to just discuss the entire strategy before I spend time perfecting the implementation, which might just be thrown away. If/when I get the go-ahead, I'll get the code to a place where I think it's ready for review and it should be a lot easier to follow then - till then, it's 100% a POC. Ideally, you should only have to look at the README (coming), PR, and example test for now. I'll add another comment once the README changes are done. |
@yorinasub17 Made a first pass of updates to the readme, mainly in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking the time to write the README! It was very useful in understanding how each of the new functionality can help.
Other than the naming changes, I think what you have makes sense. I am not entirely convinced that we should support the cross calls between globals
and locals
, but I think there is a way to make the graph parsing logic easier to follow so as long as we can get it simplified and easier to read, I think I am ok moving forward with the graph based approach.
However, given that it is a pretty big change, I would like a second opinion from @brikis98 .
what you want. | ||
|
||
|
||
##### `include.relative` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be named path_relative_to_include
, so the direction is clear in the name, and it parallels the existing function.
`locals` defined in the parent config of an `include` block into the current context. If you wish to reuse variables | ||
globally, consider using `yaml` or `json` files that are included and merged using the `terraform` built in functions | ||
available to `terragrunt`. | ||
##### `include.relative_reverse` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to above, this should be named path_relative_from_include
.
@brikis98 did you get a chance to take a look at this yet? |
Not yet. Really buried this week. Will try to get to it as soon as I can. |
Hey folks, apologies for the very long delay. I kept snoozing this in favor of other priorities... And somehow more than a month went by before I got to it. I appreciate your patience. I looked through the README updates in this PR and here are some thoughts:
|
No worries, thanks for taking a look @brikis98!
I agree - I've been mulling over this in the back of my head for a while now and I think what I would really like from a project like terragrunt is basically terraform, but where terraform projects (with one remote state each) are the resources. This is already partially realized with the current "dependency" block and local variables.
I'm also starting to think that the answer to this might be
I really like this idea, but:
I definitely agree. I'm also open to other suggestions, and I'd like to hear @yorinasub17's thoughts on it as well. |
I've been getting around the lack of inherited "globals" by using some smarts in my leaf
Of course if some |
I like this approach a lot! This makes sense and agree that reduces the complexity of the language.
This distinction already exists in the
From a design principle perspective, we've been leaning towards being explicit as opposed to implicit, and generally want to avoid adding implicit features if we can. I think if we were implementing our configs ourselves, we would always err on the side of explicitly specifying the imports. This is because it avoids ambiguities where you may have multiple |
Alright, so is anyone up for opening a new PR that updates the |
As a user, this sort of feels like shoehorning two use-cases into a single block for the sole purpose of not having to support a second block. I think it's fair to say that:
This could also be solved by only allowing a single dependency to auto-import, and it would be opt-in. If you want fine-grained control over merging, just don't add that flag. If you want auto-merging, it would merge your configurations exactly as it does today. -- One thing I didn't think of in my original comment is that using dependency blocks doesn't solve the use-case of letting child configs affect the behavior of parent configs. We would need to have some concept of variables or inputs for a |
Another issue we need to figure out is whether or not to allow dependencies in included configs, and if allowed, how to also include those dependencies in your own config. |
Also, would things like |
Sorry for all of the individual comments, I just keep thinking of things after the fact. Another thought: does it make sense for the importer to get access to all of the imported config's locals? It seems like it would be useful to have a way to get some intermediate values from a config you import, but that would kind of violate the concept of those variables being |
Those are fair points. You are right about the I am actually now convinced that we should have an
I think adding this in with the limitation would frustrate more users. Psychologically, it is much easier to argue for 1+N imports when we already support 1 import.
This was always awkward to me because there is logic in the parent config that depends on who is importing. This resembles monkey patching, which is powerful but usually frowned upon. The nice thing about the new model is that this is handled by being more explicit in the config. E.g in the current model, we rely on
(Note that this assumes merge does a nested merge, which I am not 100% sure it does. But we can probably implement a helper function in terragrunt that allows it.) Now I know this is slowly getting into flame war territory (explicit vs implicit; convention vs configuration), but I really like the fact that everything is now one way and thus much easier to mentally parse. This also provides a workaround for cyclic imports. Another example: currently we rely on
prod/terragrunt.hcl
prod/region/terragrunt.hcl
prod/region/env/terragrunt.hcl
prod/region/env/vpc/terragrunt.hcl
If everything can only be imported one way, then I think we can and should allow dependencies. Having a different construct for
If we assume that imports are only one way, then I think this makes sense and is not much mental overhead to process. The only reason why I was against auto merging/making available |
Another reason why allowing locals across imports in the new model feels better than the old model: the parent |
I'm definitely a fan of explicit vs implicit, as long as it doesn't get in the way. For context, a lot of our terragrunt configs have ended up looking something like this:
The leaf
All of the heavy lifting is done in the parent While auto-import would help keep our "marker" files simple, I'm definitely open to other solutions too. One that I can think of off the top of my head is moving everything that affects terraform (
And our child configs could look something like this:
With a deep-merge helper function, it gets much easier to include all of the parent config except a certain part as well. This approach also relies heavily on being able to affect the behavior of the config based on where it's imported from. I'm all for making that behavior change explicit, though. Just like terraform modules, we could have variables specified in the config that must be supplied (or have defaults) in order to import it. This would remove a lot of the "magic" we have right now while still supporting the use-case. Consider the following example:
Obviously how the variables are defined and used is up for debate, but something like this would allow you to explicitly change the behavior of an imported config with expected results.
I like that as well, but it still feels weird that you could break another config that includes the current config by changing the name or value of a After typing this all out, its starting to feel like I'm just trying to re-create all of the features of a terraform module in terragrunt... not sure if that's a good or bad thing. |
Makes sense to me.
What you're really defining then is not a config file (something relatively static), but a function that can be called from other places. Functions have inputs, which affect their behavior, and outputs, which are the data they return to the rest of the world. So to implement the usage pattern where the parent config has almost all the logic, the pattern you're describing might look like this: import "common" {
path = "../../common/terragrunt.hcl"
}
terraform = common.some_function(input1, input2) That's the explicit version of what you're describing... But I'm not sure we want to go down that path?
Yup. The reality is that Terragrunt exists only to work around weaknesses in Terraform... And it would be far better if Terraform didn't have those weaknesses in the first place. I wrote up a mini RFC before that basically gets Terragrunt to do what we want from Terraform: #759. IMO, this approach is DRY, explicit, and easy to understand and maintain. People didn't seem too enthused about it though 😁 |
Yeah, what I'm looking for is a function in the theoretical sense - inputs get computed and make an output. However, what I really think I want is not an HCL function, but something like a data source from terraform. Since "imports" would no longer be "parents", I think it's reasonable for them to be parameterized - and maybe "imports" is the wrong terminology at that point. Data sources are a good analogy because they're essentially pure functions - you put in inputs, and you can use the resulting outputs without side-effects. An un-parameterized import is very simple to understand, but without parameters its extremely limited on how DRY it can really keep your code. At some point you'll have to make the last-mile adjustments to the config, and you'll most likely do that by using copy and paste in the files that import it.
That's pretty much exactly what I'm looking for, it's unfortunate it didn't gain much traction. Having a |
Yup, but the data sources in Terraform are implemented as functions in Go, not in Terraform 😁
It seems like the main complaints were:
|
I was talking more about the UX, and less about the implementation. Obviously data sources are functions somehwere 😛
Yeah, both of those issues make sense. I definitely think that separate files would be necessary to keep things clean, but it would be really nice for them to have access to the other modules in the directory without explicitly specifying dependency locations - that way dependencies in-environment can be discovered naturally by references the way terraform does it. As far as "hitting up + enter destroying things accidentally" goes, why not require the non xxx-all commands to specify a single module (edit: if there's more than just the default
|
FYI I was a huge fan of the single folder approach, and still am. I think it's worth trying and would definitely use it for all of my projects. |
I wrote up an RFC for the idea proposed in #858 (comment). Would appreciate feedback from those following this PR as it is an alternative approach to addressing the problem that |
I believe that the RFC is the way to go. With imports more problems are solved than when using just |
source-prefix = "src-" | ||
} | ||
globals { | ||
region = "us-west-2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
west?
I know it's been a while, is this still something that is active? |
@7nwwrkdnakht3 try the new deep merge option within the |
This is a possible solution to #814.
This PR uses the same DAG that terraform uses to perform dependency resolution. All
locals
,globals
, andinclude
block attributes are added to the graph, and then edges are generated between nodes by extracting variable references in the attribute's expression (this is supported natively in HCL2).The graph is then evaluated and updated in a few stages:
locals
/include
are evaluated, to see if an include exists.locals
andglobals
are added to the graph if a parent is includedThis allows any variable to depend on any other variable, with a few exceptions:
include
expression can depend on aglobal
variable, necessarilyDocs from the code:
There's more cleanup/validation/documentation that needs to be done, but a POC can be found in
config/config_graph_test.go
andtest/fixture-config-graph
that shows how a config might be structured with this PR.Note:
include
now has to be a variable instead of a function because AFAIK there's no way to extract function references from an HCL2 expression.