Dag for local data eval #13155

Open · wants to merge 15 commits into base: main
Conversation

lbajolet-hashicorp (Contributor)

This PR is a proof-of-concept for a DAG-based approach to scheduling the evaluation of datasources and locals.

It is meant to address the current limitation that data sources cannot use locals in their configuration, which prevents post-processing through functions in a data source's configuration, and daisy-chaining multiple datasources and locals.

It is also meant to serve as a base for introducing DAG-based scheduling to the rest of Packer core later on.

This code works in several phases:

1- Gather dependencies: for each local and datasource, we round up the dependencies we can extract from the HCL code. This approach is not flawless and can be thrown off by local aliasing, as we rely on the traversals extracted from the raw code without knowledge of the scope in which the reference is made.
2- Build a DAG with components as vertices and dependencies as edges: in this step we build a simple DAG from the locals and datasources. Since we already have the detected dependencies, we can add them all to the graph, and since the graph can be validated (i.e. checked for circular dependencies), this removes the need to recursively evaluate the components.
3- Walk the graph: walking the graph with a function that evaluates each node is the final step. Since dependencies are closer to the roots of the graph, they are evaluated before we proceed to the next vertex, so it is safe to evaluate each component when we reach it.
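The three phases can be sketched with a hand-rolled depth-first evaluation. The names here (`component`, `evalOrder`) are illustrative only; the actual PR relies on the dag package ported from Terraform rather than this minimal version:

```go
package main

import "fmt"

// component stands in for a local or datasource block, with the
// refStrings of the components it references.
type component struct {
	name string
	deps []string
}

// evalOrder returns the components ordered so that every dependency
// precedes its dependents, or an error when a cycle is detected.
func evalOrder(comps []component) ([]string, error) {
	byName := map[string]component{}
	for _, c := range comps {
		byName[c.name] = c
	}
	const (
		unvisited = iota
		onPath // on the current path: seeing it again means a cycle
		done
	)
	state := map[string]int{}
	var order []string
	var visit func(name string) error
	visit = func(name string) error {
		switch state[name] {
		case onPath:
			return fmt.Errorf("circular dependency involving %s", name)
		case done:
			return nil
		}
		state[name] = onPath
		for _, dep := range byName[name].deps {
			if err := visit(dep); err != nil {
				return err
			}
		}
		state[name] = done
		order = append(order, name)
		return nil
	}
	for _, c := range comps {
		if err := visit(c.name); err != nil {
			return nil, err
		}
	}
	return order, nil
}

func main() {
	comps := []component{
		{name: "data.amazon-ami.base", deps: []string{"local.region"}},
		{name: "local.region"},
		{name: "local.ami_id", deps: []string{"data.amazon-ami.base"}},
	}
	order, err := evalOrder(comps)
	fmt.Println(order, err)
	// [local.region data.amazon-ami.base local.ami_id] <nil>
}
```

Note how locals and datasources interleave freely in the resulting order, which is exactly what the phased approach cannot do.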

This DAG approach is gated behind a command-line flag so we don't make it the default just yet. Assuming this makes its way into the Packer code, it should stay gated for a while so users can experiment with it and report potential bugs/limitations that we can address before making it the default.
Once we are confident it can become the default, the UseDAG flag can be repurposed/inverted to request a rollback to the current, phased approach to evaluation, as an escape hatch in case the DAG breaks a user's workflow.

The hcl2template package contains references already, but these are
linked to a particular type.
This becomes problematic if we want to support cross-type references, so
this commit adds a new abstraction: refString.

A refString contains the kind of component (var, local or data), its
type (if applicable), and its name, so that the combination of the
three points to a cty object that can be linked to a block in the
configuration.

Right now, only `var`, `local` and `data` are supported, but the type
is extensible enough that anything else fitting this model, such as
sources, could potentially be supported in the future.
The `Name` function is used by the DAG library to determine which vertex
is associated with a component, so the `Name` attribute/function needs
to reproduce the combination of type and name, allowing the vertex to
be accurately fetched from the graph.
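A sketch of what a refString-like abstraction could look like; the field names here are illustrative, not necessarily the ones used in the PR:

```go
package main

import "fmt"

// refString identifies a component across types: the kind of block
// ("var", "local" or "data"), an optional type (e.g. the datasource
// type), and the block's own name.
type refString struct {
	Component string
	Type      string // empty for var and local
	LocalName string
}

// Name reproduces the combination of type and name so that a DAG
// library can match a vertex back to its component.
func (r refString) Name() string {
	if r.Type == "" {
		return fmt.Sprintf("%s.%s", r.Component, r.LocalName)
	}
	return fmt.Sprintf("%s.%s.%s", r.Component, r.Type, r.LocalName)
}

func main() {
	fmt.Println(refString{Component: "local", LocalName: "region"}.Name())
	// local.region
	fmt.Println(refString{Component: "data", Type: "amazon-ami", LocalName: "base"}.Name())
	// data.amazon-ami.base
}
```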
Since we are in the process of integrating a new way to orchestrate
dependency management and evaluation for datasources and local
variables, we need to split the current function that manages the
evaluation of local variables, so that recursion and the actual
evaluation of a single local variable are two separate functions.

This will allow for evaluating a single variable once the dag is ready
to be introduced.
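A hypothetical sketch of the split, with illustrative names and signatures (the real functions operate on hcl expressions and PackerConfig state): `evaluateLocal` evaluates exactly one variable, while the recursive wrapper keeps the phased behaviour.

```go
package main

import "fmt"

// local is a stand-in for a local variable block: a name, the names of
// the locals it depends on, and an evaluation function.
type local struct {
	name string
	deps []string
	expr func(vars map[string]string) string
}

// evaluateLocal evaluates a single local, assuming its dependencies are
// already present in vars. This is the function a DAG walk would call
// per vertex, since the walk guarantees the ordering.
func evaluateLocal(l local, vars map[string]string) string {
	return l.expr(vars)
}

// evaluateAllLocals keeps the phased, recursive shape: visit
// dependencies first, then evaluate the variable itself.
func evaluateAllLocals(ls map[string]local, vars map[string]string, name string) {
	if _, done := vars[name]; done {
		return
	}
	for _, dep := range ls[name].deps {
		evaluateAllLocals(ls, vars, dep)
	}
	vars[name] = evaluateLocal(ls[name], vars)
}

func main() {
	locals := map[string]local{
		"region": {name: "region", expr: func(map[string]string) string { return "us-east-1" }},
		"bucket": {
			name: "bucket",
			deps: []string{"region"},
			expr: func(vars map[string]string) string { return "packer-" + vars["region"] },
		},
	}
	vars := map[string]string{}
	evaluateAllLocals(locals, vars, "bucket")
	fmt.Println(vars["bucket"]) // packer-us-east-1
}
```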
As with variables, this commit introduces a new function whose purpose
is evaluating the datasource directly, without looking at its
dependencies, or recursively trying to execute them.

Instead, we rely on the DAG to determine when it is safe to execute it;
when using the phased approach, the current logic still applies.
When registering dependencies for datasources and locals, we now use
refString.

This allows the functions that detect dependencies to register not only
dependencies of the same type, but any type that refString supports:
data, local and var.

This can then be leveraged for orchestrating evaluation of those
components in a non-phased way (i.e. with a DAG for dependency
management).
The dag package is a port from Terraform to Packer, changing what
little was needed to fit our current dependency ecosystem.
Most of the changes are on the type of diagnostics returned, as
Terraform has its own type for them, while we rely on hcl's Diagnostics.

Other than that, the functionality is essentially equivalent, and the
code was barely touched.
Now that the codebase is set up for it, this commit adds the logic that
uses the internal DAG package and is able to orchestrate evaluation of
datasources and locals in a non-phased way.

This code first detects the dependencies for those components, then
builds a graph from them, with edges representing the dependency links
between them, and finally walks the graph in dependency order to
evaluate those components.

This can act as a drop-in replacement for the current phased logic, but
both should be supported until we are confident that the approach
works, and that there are few to no bugs left to squash.
Following up on the DAG work, this commit adds a new initialisation
option that disables the DAG on request.

By default we are going to use the DAG approach, with an option to fall
back to the older evaluation algorithm in case users end up in an edge
case that prevents them from building a template.
For all the commands that call Initialise, we introduce a new flag:
UseSequential.

This disables DAG scheduling for evaluating datasources and locals,
serving as a fallback from the newly introduced approach.

`hcl2_upgrade` is a special case here: since the template is always
JSON, there cannot be any datasources, so the DAG is meaningless for
this command and is not integrated into its code path.
The implementation of the DAG as extracted from Terraform relied on a
Root vertex being injected into the graph as the last node to visit.

This is used as a sanity check in Terraform, but doesn't apply to our
use case for now, as we always execute everything and have no need for
this root node.

Instead, we change how Validate operates so it does not error when
there is no valid root node for the graph, while still letting us call
it to check for self-referencing edges and circular dependencies.
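A sketch of what this relaxed Validate could look like, assuming a plain adjacency-map representation; the real dag package operates on vertex types and hcl Diagnostics instead:

```go
package main

import "fmt"

// validate checks only what this change needs: no self-referencing
// edges and no cycles, with no requirement that the graph have a
// single root vertex.
func validate(edges map[string][]string) error {
	// Self-referencing edges are an immediate error.
	for from, tos := range edges {
		for _, to := range tos {
			if from == to {
				return fmt.Errorf("%s references itself", from)
			}
		}
	}
	// Cycle check: depth-first search tracking the current path.
	const (
		unvisited = iota
		onPath
		done
	)
	state := map[string]int{}
	var visit func(n string) error
	visit = func(n string) error {
		switch state[n] {
		case onPath:
			return fmt.Errorf("circular dependency through %s", n)
		case done:
			return nil
		}
		state[n] = onPath
		for _, m := range edges[n] {
			if err := visit(m); err != nil {
				return err
			}
		}
		state[n] = done
		return nil
	}
	for n := range edges {
		if err := visit(n); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	ok := map[string][]string{"a": {"b"}, "b": nil}
	cyclic := map[string][]string{"a": {"b"}, "b": {"a"}}
	fmt.Println(validate(ok), validate(cyclic) != nil) // <nil> true
}
```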
Local variables had an attribute called Name with the name of the local
variable.

However, when an error was encountered while validating or walking the
DAG of locals/datasources, the raw structure of the vertex was printed
out, making the error message hard to understand.

To clean this up, we rename the `Name` attribute of local variables to
`LocalName` and introduce a `Name()` function for that block, so that
the complete name of the variable is clearly reported.
Walk visits the graph in reverse topological order, doing the visit
concurrently when possible.

This is nice as it can speed up execution of datasources and locals.
However, the `Variables` map stored in the config, and the production
of the evaluation context from it, are not meant to be used
concurrently, so we ended up with cases where Packer crashed because of
concurrent accesses to that map.

Until we can change this behaviour, we fall back to the sequential
visit algorithm for those vertices, limiting the risk of such
conflicts.
When preparing the datasources to add to the DAG for evaluating the
build prerequisites, we ended up in a weird situation in which the
datasource for every vertex pointed to the same block.

This is because of Go's loop semantics (prior to Go 1.22), where the
same loop variable is reused across iterations, so in the end every
datasource vertex pointed to the same instance of a datasource block.

To avoid this, we instead fetch each datasource through its reference,
so the pointer is tied to the actual datasource block rather than the
variable scoped to the loop.
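The fix can be illustrated as follows; `datasourceBlock` is a stand-in for the real block type. On Go versions before 1.22, taking `&block` inside `for _, block := range blocks` aliases a single reused variable, whereas indexing the map by reference yields distinct pointers:

```go
package main

import "fmt"

// datasourceBlock is a stand-in for the real datasource block type.
type datasourceBlock struct{ name string }

func main() {
	blocks := map[string]*datasourceBlock{
		"a": {name: "a"},
		"b": {name: "b"},
	}

	vertices := []*datasourceBlock{}
	for ref := range blocks {
		// Fetching the block through its reference ties the pointer to
		// the actual datasource block, not to a loop-scoped copy (which,
		// before Go 1.22, was a single variable reused across iterations).
		vertices = append(vertices, blocks[ref])
	}

	distinct := map[*datasourceBlock]bool{}
	for _, v := range vertices {
		distinct[v] = true
	}
	fmt.Println(len(distinct)) // 2: each vertex points at its own block
}
```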
Evaluated local variables used to be written directly to the
PackerConfig as each variable was created.

This was somewhat of an issue for testing, as we have a bunch of tests
that relied on `PackerConfig.Variables` being set only when we actually
write something.

This is not really a concern for normal use, just for testing, but to
limit the number of changes to the tests in hcl2template, I opted to
change how variables' values are retained: evaluating a single variable
now returns a Variable in addition to hcl.Diagnostics, so we only
create the map of variables once something has actually been evaluated.
Since this series of commits introduces the DAG only for locals and
data sources, we need to make sure that the behaviour is what we
expect.

Therefore, this commit adds a basic test, with `packer build` and
`packer validate`, evaluating a template with locals and data sources
depending on one another.

This is rejected by the sequential evaluation method, as we process the
different types one by one, whereas the DAG allows us to interleave the
two, while still rejecting circular dependencies (and doing so before
they even get evaluated) and self-references.
@lbajolet-hashicorp marked this pull request as ready for review September 11, 2024 18:27
@lbajolet-hashicorp requested a review from a team as a code owner September 11, 2024 18:27

@JenGoldstrich (Contributor) left a comment:

Largely this LGTM; I think we should clean up the unused DAG functions and consider adding some more tests. Let me know what you think!

// Copyright (c) HashiCorp, Inc.
// SPDX-License-Identifier: BUSL-1.1

package dag
Contributor:

Since this code was largely copied from Terraform, I wonder if it would make sense to put the DAG logic in its own repository. I understand that's a bit more work, but it removes some of the complexity from the Packer code base, and multiple HashiCorp tools using HCL need to build a DAG to solve the same problem.

Contributor Author:

So I do agree with you: if the DAG lib becomes a common component for multiple HashiCorp products, it makes sense to make it a library, but I think this is something to discuss with Terraform, especially the people who developed/maintain that code.
I would consider opening an issue later and including them in it to discuss that possibility, or discussing it out-of-band (maybe as a first step before opening issues).


// StronglyConnected returns the list of strongly connected components
// within the Graph g. This information is primarily used by this package
// for cycle detection, but strongly connected components have widespread
Contributor:

Nit, but I think we should just mention what it's used for in the case of HCL component DAGs.

Suggested change
// for cycle detection, but strongly connected components have widespread
// for cycle detection.

Contributor Author:

Yeah, I imagine we can cut the comment short here, good call.

// Copyright (c) HashiCorp, Inc.
// SPDX-License-Identifier: BUSL-1.1

package dag
Contributor:

In my opinion we should not include these unused dag functions, just to minimize the amount of imported code. I know we may use them in the future, but it's also likely we'll have to debug the existing functionality, and extra unused code makes that harder.

Contributor Author:

As we discussed, I understand the rationale for that simplification: it reduces the drag on maintenance, and if we don't know if/when we'll use those functions, we can get rid of them and re-introduce them later.
That said, given the code was imported essentially as-is from Terraform, I would argue it is worth keeping the two as similar as possible, if only so that when a bug is fixed or an improvement is added in either one, we can adopt it easily; and any changes we want may be easier to reason about without significant differences between the two packages.
It's more of a maybe/potential though; in the current state, I can understand why reducing the amount of code introduced would be desirable.
Any thoughts @nywilken @devashish-patel ?

// report it as a local variable, which is not necessarily what we
// want to process here, so we continue.
//
// Note: this is kinda brittle, we should understand scopes to accurately
Contributor:

You mention this is brittle, can you think of a scenario where this would cause valid dependencies to be ignored?

Contributor Author:

To be fair, I don't have an example. I had tried to replicate a concern of mine with local aliases in a for expression, but the local aliases weren't returned by the hcl function, so it's mostly a non-concern at this point.
I can remove this comment until we encounter a real-life example that highlights this problem; at present, it seems reasonably solid on the templates we have in the repo.

return &retGraph, nil
}

func (cfg *PackerConfig) evaluateBuildPrereqs(skipDatasources bool) hcl.Diagnostics {
Contributor:

The logic in this parser seems minimally unit tested. I notice that we have the packer_tests that invoke a larger section of the code, but I feel it may be worth adding more testing to the logic in the parser; maybe we could make these functions public instead of private and invoke them in unit tests, with small tests that make sure we skip things aliased as local, for example.

Contributor Author:

I agree it is not unit tested, and to be frank there are a bunch of prerequisites to making this unit-testable; at the moment we rely on the Initialize tests (and subsequent phases) to test it, at least for regression checks.

Also to be clear: that code is in a parser.go file, but at this point we're not doing any kind of parsing (that was done before), so it's kind of clumsy to have that logic in this file. I opted to add it here for this PR since this is where the phased code was already located, but honestly we should split this logic into separate files to avoid the confusion.
