Dag for local data eval #13155

Open · wants to merge 15 commits into base: main
Conversation

lbajolet-hashicorp (Contributor)

This PR is a proof-of-concept for a DAG-based approach to scheduling the evaluation of datasources and locals.

It is meant to address the current limitation that data sources cannot use locals in their configuration, which prevents post-processing through functions in a data source's configuration, and daisy-chaining multiple datasources and locals.

It is also meant to serve as a base for introducing DAG-based scheduling to the rest of Packer core later on.

This code works in several phases:

1- Gather dependencies: for each local and datasource, we round up the dependencies we can extract from the HCL code. This approach is not flawless and can be thrown off by local aliasing, as we rely on the traversals extracted from the raw code without knowledge of the scope in which the reference is made.
2- Build a DAG with components as vertices and dependencies as edges: in this step we build a simple DAG from the locals and datasources. Since we already have the detected dependencies, we can add them all to the graph, and since the graph can be validated (i.e. checked for circular dependencies), this removes the need to recursively evaluate the components.
3- Walk the graph: walking the graph with a function that evaluates each node is the final step. Since dependencies are closer to the roots of the graph, they are evaluated before we proceed to the next vertex, so it is safe to evaluate each component when we reach it.
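The three phases can be sketched with a hand-rolled depth-first evaluation. The names here (`component`, `evalOrder`) are illustrative only; the actual PR relies on the dag package ported from Terraform rather than this minimal version:

```go
package main

import "fmt"

// component stands in for a local or datasource block, with the
// refStrings of the components it references.
type component struct {
	name string
	deps []string
}

// evalOrder returns the components ordered so that every dependency
// precedes its dependents, or an error when a cycle is detected.
func evalOrder(comps []component) ([]string, error) {
	byName := map[string]component{}
	for _, c := range comps {
		byName[c.name] = c
	}
	const (
		unvisited = iota
		onPath // on the current path: seeing it again means a cycle
		done
	)
	state := map[string]int{}
	var order []string
	var visit func(name string) error
	visit = func(name string) error {
		switch state[name] {
		case onPath:
			return fmt.Errorf("circular dependency involving %s", name)
		case done:
			return nil
		}
		state[name] = onPath
		for _, dep := range byName[name].deps {
			if err := visit(dep); err != nil {
				return err
			}
		}
		state[name] = done
		order = append(order, name)
		return nil
	}
	for _, c := range comps {
		if err := visit(c.name); err != nil {
			return nil, err
		}
	}
	return order, nil
}

func main() {
	comps := []component{
		{name: "data.amazon-ami.base", deps: []string{"local.region"}},
		{name: "local.region"},
		{name: "local.ami_id", deps: []string{"data.amazon-ami.base"}},
	}
	order, err := evalOrder(comps)
	fmt.Println(order, err)
	// [local.region data.amazon-ami.base local.ami_id] <nil>
}
```

Note how locals and datasources interleave freely in the resulting order, which is exactly what the phased approach cannot do.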

This DAG approach is gated behind a command-line flag so we don't make it the default just yet. Assuming this makes its way into the Packer code, it should stay gated for a while so users can experiment with it and report potential bugs/limitations that we can address before making it the default.
Once we are confident it can become the default, the UseDAG flag can be repurposed/inverted to request a rollback to the current, phased approach to evaluation, as an escape hatch in case the DAG breaks a user's workflow.

The hcl2template package contains references already, but these are
linked to a particular type.
This becomes problematic if we want to support cross-type references, so
this commit adds a new abstraction: refString.

A refString contains the kind of component (var, local or data), its
type (if applicable), and its name, so that the combination of the
three points to a cty object that can be linked to a block in the
configuration.

Right now, only `var`, `local` and `data` are supported, but the type
is extensible enough that anything else fitting this model, such as
sources, could potentially be supported in the future.
The `Name` function is used by the DAG library to determine which vertex
is associated with a component, so the `Name` attribute/function needs
to reproduce the combination of type and name, allowing the vertex to
be accurately fetched from the graph.
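A sketch of what a refString-like abstraction could look like; the field names here are illustrative, not necessarily the ones used in the PR:

```go
package main

import "fmt"

// refString identifies a component across types: the kind of block
// ("var", "local" or "data"), an optional type (e.g. the datasource
// type), and the block's own name.
type refString struct {
	Component string
	Type      string // empty for var and local
	LocalName string
}

// Name reproduces the combination of type and name so that a DAG
// library can match a vertex back to its component.
func (r refString) Name() string {
	if r.Type == "" {
		return fmt.Sprintf("%s.%s", r.Component, r.LocalName)
	}
	return fmt.Sprintf("%s.%s.%s", r.Component, r.Type, r.LocalName)
}

func main() {
	fmt.Println(refString{Component: "local", LocalName: "region"}.Name())
	// local.region
	fmt.Println(refString{Component: "data", Type: "amazon-ami", LocalName: "base"}.Name())
	// data.amazon-ami.base
}
```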
Since we are in the process of integrating a new way to orchestrate
dependency management and evaluation for datasources and local
variables, we need to split the current function that manages the
evaluation of local variables, so that recursion and the actual
evaluation of a single local variable are two separate functions.

This will allow for evaluating a single variable once the dag is ready
to be introduced.
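A hypothetical sketch of the split, with illustrative names and signatures (the real functions operate on hcl expressions and PackerConfig state): `evaluateLocal` evaluates exactly one variable, while the recursive wrapper keeps the phased behaviour.

```go
package main

import "fmt"

// local is a stand-in for a local variable block: a name, the names of
// the locals it depends on, and an evaluation function.
type local struct {
	name string
	deps []string
	expr func(vars map[string]string) string
}

// evaluateLocal evaluates a single local, assuming its dependencies are
// already present in vars. This is the function a DAG walk would call
// per vertex, since the walk guarantees the ordering.
func evaluateLocal(l local, vars map[string]string) string {
	return l.expr(vars)
}

// evaluateAllLocals keeps the phased, recursive shape: visit
// dependencies first, then evaluate the variable itself.
func evaluateAllLocals(ls map[string]local, vars map[string]string, name string) {
	if _, done := vars[name]; done {
		return
	}
	for _, dep := range ls[name].deps {
		evaluateAllLocals(ls, vars, dep)
	}
	vars[name] = evaluateLocal(ls[name], vars)
}

func main() {
	locals := map[string]local{
		"region": {name: "region", expr: func(map[string]string) string { return "us-east-1" }},
		"bucket": {
			name: "bucket",
			deps: []string{"region"},
			expr: func(vars map[string]string) string { return "packer-" + vars["region"] },
		},
	}
	vars := map[string]string{}
	evaluateAllLocals(locals, vars, "bucket")
	fmt.Println(vars["bucket"]) // packer-us-east-1
}
```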
As with variables, this commit introduces a new function whose purpose
is evaluating the datasource directly, without looking at its
dependencies, or recursively trying to execute them.

Instead, we rely on the DAG to determine when it is safe to execute it;
when using the phased approach, the current logic still applies.
When registering dependencies for datasources and locals, we now use
refString.

This allows the functions that detect dependencies to register not only
dependencies of the same type, but any type that refString supports:
data, local and var.

This can then be leveraged for orchestrating evaluation of those
components in a non-phased way (i.e. with a DAG for dependency
management).
The dag package is a port from Terraform to Packer, changing what
little was needed to fit our current dependency ecosystem.
Most of the changes are on the type of diagnostics returned, as
Terraform has its own type for them, while we rely on hcl's Diagnostics.

Other than that, the functionality is essentially equivalent, and the
code was barely touched.
Now that the codebase is set up for it, this commit adds the logic that
uses the internal DAG package and is able to orchestrate evaluation of
datasources and locals in a non-phased way.

This code first detects the dependencies for those components, then
builds a graph from them, with edges representing the dependency links
between them, and finally walks the graph in dependency order to
evaluate those components.

This can act as a drop-in replacement for the current phased logic, but
both should be supported until we are confident that the approach
works, and that there are few to no bugs left to squash.
Following up on the DAG work, this commit adds a new initialisation
option that disables the DAG on request.

By default we are going to use the DAG approach, with an option to fall
back to the older evaluation algorithm in case users end up in an edge
case that prevents them from building a template.
For all the commands that call Initialise, we introduce a new flag:
UseSequential.

This disables DAG scheduling for evaluating datasources and locals,
serving as a fallback from the newly introduced approach.

`hcl2_upgrade` is a special case here: since the template is always
JSON, there cannot be any datasources, so the DAG is meaningless for
this command and is not integrated into its code path.
The implementation of the DAG as extracted from Terraform relied on a
Root vertex being injected into the graph as the last node to visit.

This is used as a sanity check in Terraform, but doesn't apply to our
use case for now, as we always execute everything and have no need for
this root node.

Instead, we change how Validate operates so it does not error when
there is no valid root node for the graph, while still letting us call
it to check for self-referencing edges and circular dependencies.
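A sketch of what this relaxed Validate could look like, assuming a plain adjacency-map representation; the real dag package operates on vertex types and hcl Diagnostics instead:

```go
package main

import "fmt"

// validate checks only what this change needs: no self-referencing
// edges and no cycles, with no requirement that the graph have a
// single root vertex.
func validate(edges map[string][]string) error {
	// Self-referencing edges are an immediate error.
	for from, tos := range edges {
		for _, to := range tos {
			if from == to {
				return fmt.Errorf("%s references itself", from)
			}
		}
	}
	// Cycle check: depth-first search tracking the current path.
	const (
		unvisited = iota
		onPath
		done
	)
	state := map[string]int{}
	var visit func(n string) error
	visit = func(n string) error {
		switch state[n] {
		case onPath:
			return fmt.Errorf("circular dependency through %s", n)
		case done:
			return nil
		}
		state[n] = onPath
		for _, m := range edges[n] {
			if err := visit(m); err != nil {
				return err
			}
		}
		state[n] = done
		return nil
	}
	for n := range edges {
		if err := visit(n); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	ok := map[string][]string{"a": {"b"}, "b": nil}
	cyclic := map[string][]string{"a": {"b"}, "b": {"a"}}
	fmt.Println(validate(ok), validate(cyclic) != nil) // <nil> true
}
```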
Local variables had an attribute called Name with the name of the local
variable.

However, when an error was encountered while validating or walking the
DAG of locals/datasources, the raw structure of the vertex was printed
out, making the error message hard to understand.

To clean this up, we rename the `Name` attribute of local variables to
`LocalName` and introduce a `Name()` function for that block, so that
the complete name of the variable is clearly reported.
Walk visits the graph in reverse topological order, doing the visit
concurrently when possible.

This is nice as it can speed up execution of datasources and locals.
However, the `Variables` map stored in the config, and the production
of the evaluation context from it, are not meant to be used
concurrently, so we ended up with cases where Packer crashed because of
concurrent accesses to that map.

Until we can change this behaviour, we fall back to the sequential
visit algorithm for those vertices, limiting the risk of such
conflicts.
When preparing the datasources to add to the DAG for evaluating the
build prerequisites, we ended up in a weird situation in which the
datasource for every vertex pointed to the same block.

This is because of Go's loop semantics (prior to Go 1.22), where the
same loop variable is reused across iterations, so in the end every
datasource vertex pointed to the same instance of a datasource block.

To avoid this, we instead fetch each datasource through its reference,
so the pointer is tied to the actual datasource block rather than the
variable scoped to the loop.
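The fix can be illustrated as follows; `datasourceBlock` is a stand-in for the real block type. On Go versions before 1.22, taking `&block` inside `for _, block := range blocks` aliases a single reused variable, whereas indexing the map by reference yields distinct pointers:

```go
package main

import "fmt"

// datasourceBlock is a stand-in for the real datasource block type.
type datasourceBlock struct{ name string }

func main() {
	blocks := map[string]*datasourceBlock{
		"a": {name: "a"},
		"b": {name: "b"},
	}

	vertices := []*datasourceBlock{}
	for ref := range blocks {
		// Fetching the block through its reference ties the pointer to
		// the actual datasource block, not to a loop-scoped copy (which,
		// before Go 1.22, was a single variable reused across iterations).
		vertices = append(vertices, blocks[ref])
	}

	distinct := map[*datasourceBlock]bool{}
	for _, v := range vertices {
		distinct[v] = true
	}
	fmt.Println(len(distinct)) // 2: each vertex points at its own block
}
```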
Evaluated local variables used to be written directly to the
PackerConfig as each variable was created.

This was somewhat of an issue for testing, as we have a bunch of tests
that relied on `PackerConfig.Variables` being set only when we actually
write something.

This is not really a concern for normal use, just for testing, but to
limit the number of changes to the tests in hcl2template, I opted to
change how variables' values are retained: evaluating a single variable
now returns a Variable in addition to hcl.Diagnostics, so we only
create the map of variables once something has actually been evaluated.
Since this series of commits introduces the DAG only for locals and
data sources, we need to make sure that the behaviour is what we
expect.

Therefore, this commit adds a basic test, with `packer build` and
`packer validate`, evaluating a template with locals and data sources
depending on one another.

This is rejected by the sequential evaluation method, as we process the
different types one by one, whereas the DAG allows us to interleave the
two, while still rejecting circular dependencies (and doing so before
they even get evaluated) and self-references.
@lbajolet-hashicorp marked this pull request as ready for review September 11, 2024 18:27
@lbajolet-hashicorp requested a review from a team as a code owner September 11, 2024 18:27

@JenGoldstrich (Contributor) left a comment:

Largely this LGTM; I think we should clean up the unused DAG functions and consider adding some more tests. Let me know what you think!

// Copyright (c) HashiCorp, Inc.
// SPDX-License-Identifier: BUSL-1.1

package dag
Contributor:

Since this code was largely copied from Terraform, I wonder if it would make sense to put the DAG logic in its own repository. I understand that's a bit more work, but it removes some of the complexity from the Packer code base, and multiple HashiCorp tools using HCL need to build a DAG to solve the same problem.

Contributor Author:

So I do agree with you: if the DAG lib becomes a common component for multiple HashiCorp products, it makes sense to make it a library, but I think this is something to discuss with Terraform, especially the people who developed/maintain that code.
I would consider opening an issue later and including them in it to discuss that possibility, or discussing it out-of-band (maybe as a first step before opening issues).


// StronglyConnected returns the list of strongly connected components
// within the Graph g. This information is primarily used by this package
// for cycle detection, but strongly connected components have widespread
Contributor:

Nit, but I think we should just mention what it's used for in the case of HCL component DAGs.

Suggested change
// for cycle detection, but strongly connected components have widespread
// for cycle detection.

Contributor Author:

Yeah, I imagine we can cut the comment short here, good call.

// Copyright (c) HashiCorp, Inc.
// SPDX-License-Identifier: BUSL-1.1

package dag
Contributor:

In my opinion we should not include these unused dag functions, just to minimize the amount of imported code. I know we may use them in the future, but it's also likely we'll have to debug the existing functionality, and extra unused code makes that harder.

Contributor Author:

As we discussed, I understand the rationale for that simplification: it reduces the drag on maintenance, and if we don't know if/when we'll use those functions, we can get rid of them and re-introduce them later.
That said, given the code was imported essentially as-is from Terraform, I would argue it is worth keeping the two as similar as possible, if only so that when a bug is fixed or an improvement is added in either one, we can adopt it easily; and any changes we want may be easier to reason about without significant differences between the two packages.
It's more of a maybe/potential though; in the current state, I can understand why reducing the amount of code introduced would be desirable.
Any thoughts @nywilken @devashish-patel ?

// report it as a local variable, which is not necessarily what we
// want to process here, so we continue.
//
// Note: this is kinda brittle, we should understand scopes to accurately
Contributor:

You mention this is brittle, can you think of a scenario where this would cause valid dependencies to be ignored?

Contributor Author:

To be fair, I don't have an example. I had tried to replicate a concern of mine with local aliases in a for expression, but the local aliases weren't returned by the hcl function, so it's mostly a non-concern at this point.
I can remove this comment until we encounter a real-life example that highlights this problem; at present, it seems reasonably solid on the templates we have in the repo.

return &retGraph, nil
}

func (cfg *PackerConfig) evaluateBuildPrereqs(skipDatasources bool) hcl.Diagnostics {
Contributor:

The logic in this parser seems minimally unit tested. I notice that we have the packer_tests that invoke a larger section of the code, but I feel it may be worth adding more testing to the logic in the parser; maybe we could make these functions public instead of private and invoke them in unit tests, with small tests that make sure we skip things aliased as local, for example.

Contributor Author:

I agree it is not unit tested, and to be frank there are a bunch of prerequisites to making this unit-testable; at the moment we rely on the Initialize tests (and subsequent phases) to test it, at least for regression checks.

Also to be clear: that code is in a parser.go file, but at this point we're not doing any kind of parsing (that was done before), so it's kind of clumsy to have that logic in this file. I opted to add it here for this PR since this is where the phased code was already located, but honestly we should split this logic into separate files to avoid the confusion.
