Abstract manifest generation from tasks #6565

stu-k · 2023-01-10T17:59:11Z

resolves #6357
resolves #6534

Description

Abstract manifest generation out of the task classes, so that the manifest is generated beforehand and passed to each task when it is instantiated.

Checklist

I have read the contributing guide and understand what's expected of me
I have signed the CLA
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
I have opened an issue to add/update docs, or docs changes are not required/relevant for this PR
I have run changie new to create a changelog entry

ChenyuLInx

Overall looks good!

core/dbt/cli/main.py

core/dbt/task/base.py

core/dbt/task/compile.py

core/dbt/task/generate.py

core/dbt/task/runnable.py

core/dbt/cli/main.py

stu-k · 2023-01-18T18:53:36Z

core/dbt/cli/requires.py

+    # if there are no positional args, the decorator was called with parens: @decorator(...)
+    # otherwise, it was used bare as @decorator and args0[0] is the decorated function
+    if len(args0) == 0:
+        return outer_wrapper
+    return outer_wrapper(args0[0])


Is there a better, more Pythonic way to have a decorator accept args?
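
One common pattern, sketched here as a possible answer rather than as what this PR adopted, is to make the decorated function an optional first parameter and accept every option as keyword-only. The names `requires_manifest`, `write`, and `calls` are hypothetical and chosen for illustration; they are not dbt's API.

```python
import functools

calls = []  # hypothetical log so the example has something observable


def requires_manifest(func=None, *, write=True):
    """Decorator usable as @requires_manifest or @requires_manifest(write=False)."""

    def outer_wrapper(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            # Stand-in for "generate (and optionally write) the manifest".
            calls.append((f.__name__, write))
            return f(*args, **kwargs)

        return wrapper

    if func is None:
        # Called with parentheses: return the real decorator.
        return outer_wrapper
    # Used bare: Python passed the decorated function in directly.
    return outer_wrapper(func)


@requires_manifest
def parse():
    return "parsed"


@requires_manifest(write=False)
def compile_cmd():
    return "compiled"
```

Because the options are keyword-only, a positional argument can only ever be the decorated function, which removes the need to inspect the length of an args tuple.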

ChenyuLInx

Overall looks good, I was able to confirm a run works as expected locally.

core/dbt/cli/requires.py

core/dbt/task/base.py

jtcohen6

Great work on this @stu-k! I left some comments to contextualize a few of the interrelated pieces that this PR is touching.

I called out two required changes that I'd consider blocking to merge:

Now that we've converted more manifest-using commands, we'll need @requires.manifest on those commands as well
The build command/task needs different behavior for compile_manifest

Then, I think we should open up a separate ticket to take another look at abstracting manifest compilation / graph generation from tasks.

jtcohen6 · 2023-01-22T14:51:02Z

core/dbt/task/run_operation.py

@@ -47,7 +42,7 @@ def _run_unsafe(self) -> agate.Table:

    def run(self) -> RunOperationResultsArtifact:
        start = datetime.utcnow()
-        self._runtime_initialize()
+        self.compile_manifest()


Current behavior: run-operation does not "compile" the manifest, and it does not support node selection, i.e. dbt run-operation --select ... --exclude .... (The former is a prerequisite for the latter.)

There is a compelling proposal for why run-operation should support node selection: #5005. (This might also be a step toward eventually supporting macros that ref() ephemeral models in run-operation, though this change by itself doesn't get us there. That's a long, separate story.)

The only downside is that compilation (= resolving ephemeral models + constructing the DAG) takes time, especially in large projects. There is no caching/reuse/diffing between invocations, in the way that there is for project parsing, so it's a fixed cost every time.

So: I don't think this is a bad change, I just want to call out that it is a change.

core/dbt/cli/requires.py

core/dbt/task/compile.py

jtcohen6 · 2023-01-22T15:10:04Z

core/dbt/cli/main.py

 def parse(ctx, **kwargs):
    """Parses the project and provides information on performance"""
-    click.echo(f"`{inspect.stack()[0][3]}` called\n flags: {ctx.obj['flags']}")
+    # manifest generation and writing happens in @requires.manifest
    return None, True


The changes in this PR mean that:

dbt parse will always write the manifest (= overwrite manifest.json). IMO that's a good change! I'd opened an issue for it recently: [CT-1759] dbt parse should (over)write manifest.json by default #6534. So we can also remove --write-manifest as a parameter for this command, and in params.py.

We no longer have the option of dbt parse --compile, which allows us to output detailed performance timing that includes the DAG construction step (= resolving ephemeral model references + linking nodes/edges).

For the second point: I'd be happy with kicking that out of scope for this PR, and opening a tech debt ticket to track it. There's a bigger idea here: "Abstract manifest compilation / graph generation from tasks." Idea being, move compile_manifest() out from task initialization, into either a conditional step within @requires.manifest, or as its own @requires.graph step. Then, tasks would accept the graph as an argument.

Considerations there:

Manifest compilation / graph generation does require the adapter, and therefore RuntimeConfig, as an input

It mutates the manifest and returns a Graph

The build task produces + uses a different graph from all other tasks, with extra test edges

If we stored the graph on ctx.obj, and passed it into each task explicitly, we would unlock the ability for programmatic invocations to cache + reuse the graph between invocations. That could make a difference for performance in very large projects. (With the important caveat that the external application would be responsible for distinguishing between standard and build-specific graphs.)
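
To make the last consideration concrete, here is a minimal sketch of a separate graph step that stores its result on a context dict and reuses it on later invocations. Everything in it is hypothetical: `requires_graph`, `build_graph`, the `"graphs"` key, and the plain dict standing in for ctx.obj are illustrative stand-ins, not dbt-core code. Real graph generation would also need the adapter and RuntimeConfig as inputs.

```python
import functools

build_count = {"standard": 0, "build": 0}  # how often each graph is constructed


def build_graph(manifest, kind):
    """Hypothetical stand-in for graph generation; returns a list of edges."""
    build_count[kind] += 1
    edges = [(parent, child) for child, parents in manifest.items() for parent in parents]
    if kind == "build":
        # The build task uses a different graph with extra test edges.
        edges.append(("test_on_model_a", "model_b"))
    return edges


def requires_graph(kind="standard"):
    """Decorator that puts a cached graph on ctx_obj before the command runs."""

    def outer_wrapper(func):
        @functools.wraps(func)
        def wrapper(ctx_obj, *args, **kwargs):
            cache = ctx_obj.setdefault("graphs", {})
            if kind not in cache:
                # Standard and build graphs are cached under separate keys.
                cache[kind] = build_graph(ctx_obj["manifest"], kind)
            return func(ctx_obj, cache[kind], *args, **kwargs)

        return wrapper

    return outer_wrapper


@requires_graph()
def run(ctx_obj, graph):
    return len(graph)


@requires_graph(kind="build")
def build(ctx_obj, graph):
    return len(graph)


# A toy manifest: node name -> list of parent node names.
ctx_obj = {"manifest": {"model_b": ["model_a"], "model_c": ["model_a", "model_b"]}}
```

Calling `run(ctx_obj)` twice constructs the standard graph only once. Keeping the build graph under its own key is one way to handle the caveat about distinguishing the two kinds.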

Finally, a separate proposal, out of scope for this PR but highly relevant: The parse task should return the Manifest here, instead of None (#6547). That could be as simple as changing this last line to:

def parse(ctx, **kwargs) -> Tuple[Manifest, bool]:
    """Parses the project and provides information on performance"""
    # manifest generation and writing happens in @requires.manifest
    return ctx.obj["manifest"], True

And then:

from dbt.cli.main import dbtRunner
dbt = dbtRunner()
manifest, _ = dbt.invoke(['parse'])

There's a bit more refinement we should do on that proposal first, to make sure it's an idea we're happy with.

Thank you for the context. I had spoken with Gerda about the function of the existing ParseTask class, and she had said I could take out the tracking. Adding it back in shouldn't be difficult, and could be done conditionally.

For your proposal of returning the manifest from the task directly, I think that's something we need to discuss on the exec team. I believe we should have a standard output for each of these functions, perhaps a dict with a result or message or whatever key should contain result, in case we want to add meta information to that result object later.

> I had spoken with Gerda about the function of the existing ParseTask class, and she had said I could take out the tracking. Adding it back in shouldn't be difficult, and could be done conditionally.

This is fine with me too. Gerda consolidated all the events in this task, anyway, to just ParseCmdOut (in the main branch). We'll just want to remove that event now that it's no longer being called anywhere. For simplicity, let's track that in a separate issue, rather than try to do branch merging shenanigans here.

The point I was making above was that dbt parse --compile would also conditionally include the manifest compilation / graph generation step, which can be very slow (as we've seen in recent reports/issues from folks with large projects). Let's track that in the new issue, "Abstract manifest compilation / graph generation from tasks."

> For your proposal of returning the manifest from the task directly, I think that's something we need to discuss on the exec team. I believe we should have a standard output for each of these functions, perhaps a dict with a result or message or whatever key should contain result, in case we want to add meta information to that result object later.

Agreed! Let's keep discussing in the separate linked issue. The biggest considerations are:

How to keep the results relatively consistent for all commands (although different tasks already return differently typed result objects)

Should we expose the full manifest, as part of our public API? Just a part of it? Should we call this "experimental" functionality (liable to change in future versions)? We have the power to decide these things!
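
As a rough illustration of the standard-output idea discussed above, a uniform return value could be a small container with a fixed `result` slot plus room for metadata. `CommandResult`, its fields, and the two sample functions are hypothetical and only sketch the shape under discussion. Nothing here is an agreed dbt interface.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class CommandResult:
    """Hypothetical uniform envelope returned by every command function."""

    result: Any  # command-specific payload (a manifest, run results, ...)
    success: bool = True
    meta: Dict[str, Any] = field(default_factory=dict)  # room for later additions


def parse_command(manifest: Dict[str, Any]) -> CommandResult:
    # A parse-like command would place the manifest in the result slot.
    return CommandResult(result=manifest, meta={"command": "parse"})


def run_command(statuses: Dict[str, str]) -> CommandResult:
    # A run-like command returns different data in the same envelope.
    ok = all(status == "success" for status in statuses.values())
    return CommandResult(result=statuses, success=ok, meta={"command": "run"})


parsed = parse_command({"nodes": {}})
ran = run_command({"model_a": "success", "model_b": "error"})
```

Callers can then rely on `.result` and `.success` for every command, while the differently typed payloads stay inside the envelope.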

We might want to call these changes out here:
https://docs.getdbt.com/guides/migration/versions/upgrading-to-v1.5

Did the proposed "Abstract manifest compilation / graph generation from tasks." issue ever get created?

In the meantime, I'm going to delete both of these from params.py as part of #6546 since they don't appear to do anything:

dbt-core/core/dbt/cli/params.py

Lines 44 to 49 in 3f76f82

compile_parse = click.option(
    "--compile/--no-compile",
    envvar=None,
    help="TODO: No help text currently available",
    default=True,
)

dbt-core/core/dbt/cli/params.py

Lines 491 to 496 in 3f76f82

write_manifest = click.option(
    "--write-manifest/--no-write-manifest",
    envvar=None,
    help="TODO: No help text currently available",
    default=True,
)

core/dbt/cli/main.py

core/dbt/task/build.py

core/dbt/cli/main.py

core/dbt/cli/requires.py

jtcohen6

Nice work @stu-k!

Could I ask you to open two follow-up issues that came up in conversation?

Remove ParseCmdOut, now that the parse command isn’t doing any of its own custom event firing / logging. (Or, conditionally add the logging back.)
Abstracting graph generation into a decorator, rather than a step buried within task initialization. We can plan to tackle it in the next few months, whether as part of "API-ification" (phase 2) or "library-ification" (support for programmatic invocations / dbt-server).

stu-k · 2023-01-24T16:32:35Z

@jtcohen6 Created

MichelleArk · 2023-01-24T23:02:29Z

core/dbt/cli/main.py

 def docs_generate(ctx, **kwargs):
    """Generate the documentation website for your project"""
-    config = RuntimeConfig.from_parts(ctx.obj["project"], ctx.obj["profile"], ctx.obj["flags"])
-    task = GenerateTask(ctx.obj["flags"], config)
+    task = GenerateTask(ctx.obj["flags"], ctx.obj["runtime_config"])


Sorry that I'm catching this post-merge, but should GenerateTask be passed the manifest?

I think so. It inherits from CompileTask, and the reason it needs the manifest is that the generate task goes to the warehouse and fetches all info about the models in the current dbt project.

Good catch! docs generate also performs a full compile of all manifest nodes, unless --select is passed, in which case it compiles only those nodes, or --no-compile is passed, in which case it doesn't compile at all.
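
A minimal sketch of that branching, assuming plain node names in place of dbt's real selector syntax. `nodes_to_compile` and its parameters are hypothetical names used only for illustration.

```python
from typing import Iterable, List, Optional


def nodes_to_compile(
    all_nodes: Iterable[str],
    select: Optional[List[str]] = None,
    compile_enabled: bool = True,
) -> List[str]:
    """Return the nodes a docs-generate-like command would compile."""
    if not compile_enabled:
        # --no-compile: skip compilation entirely.
        return []
    if select:
        # --select: compile only the selected nodes, keeping manifest order.
        wanted = set(select)
        return [node for node in all_nodes if node in wanted]
    # Default: full compile of every node in the manifest.
    return list(all_nodes)


manifest_nodes = ["model_a", "model_b", "model_c"]
```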

Looks like the issue is being addressed by @aranke in the cli test work here.

cla-bot bot added the cla:yes label Jan 10, 2023

Abstract manifest generation from tasks

a0a4d5b

stu-k force-pushed the CT-1582/abstract-task-manifest-gen branch from ea5841a to a0a4d5b January 10, 2023 18:28

Add generated CLI API docs

8615b39

ChenyuLInx self-requested a review January 13, 2023 21:00

ChenyuLInx requested changes Jan 13, 2023

stu-k added 4 commits January 18, 2023 11:04

decorators, writing, remove env var

c4ac379

Merge branch 'feature/click-cli' into CT-1582/abstract-task-manifest-gen

2237873

Remove parse from legacy main.py

0f1ecad

rebuild docs

e5fea9a

stu-k commented Jan 18, 2023

stu-k and others added 3 commits January 18, 2023 14:42

take manifest in task init, write in decorator

592033e

Add generated CLI API docs

b673061

remove unused kwargs to write_manifest

d42de06

ChenyuLInx approved these changes Jan 18, 2023

core/dbt/cli/requires.py

core/dbt/task/base.py

stu-k added 4 commits January 20, 2023 14:46

Refactor RunOperation unit tests

0945988

Merge branch 'feature/click-cli' into CT-1582/abstract-task-manifest-gen

c675dd9

Manifest decorator doesn't overwrite

4a62ada

Regen docs, fix error

22eb4a7

stu-k marked this pull request as ready for review January 20, 2023 21:15

stu-k requested a review from a team January 20, 2023 21:15

stu-k requested review from a team as code owners January 20, 2023 21:15

stu-k requested a review from gshank January 20, 2023 21:15

jtcohen6 requested changes Jan 22, 2023

stu-k and others added 5 commits January 23, 2023 12:00

Use manifest decorator, pass into new tasks

a7e778b

Add generated CLI API docs

b6d7ea2

Fix manifest typos

fcf49a9

Add generated CLI API docs

28efb2a

Add compile_manifest back to BuildTask

c1a1fe8

stu-k requested a review from jtcohen6 January 23, 2023 18:18

jtcohen6 approved these changes Jan 24, 2023

This was referenced Jan 24, 2023

[CT-1890] Reintroduce logging to manifest generation #6707

Closed

[CT-1891] Abstract graph generation from task classes #6708

Closed

stu-k merged commit 92b7166 into feature/click-cli Jan 24, 2023

stu-k deleted the CT-1582/abstract-task-manifest-gen branch January 24, 2023 17:05

MichelleArk reviewed Jan 24, 2023

This was referenced Jan 25, 2023

[CT-1582] [Feature] Refactor tasks to breakout parsing the manifest into a separate piece #6357

Closed

[CT-1759] dbt parse should (over)write manifest.json by default #6534

Closed

This was referenced Jan 31, 2023

[CT-1947] Alias --models to --select for all commands except dbt ls #6787

Merged

Move docs to run on PR merge #6805

Closed

This was referenced Feb 8, 2023

Fix Click CLI test DB Name #6895

Merged

Fix Project Env Var Tests #6916

Merged

Merge feature/click-cli into main #6931

Merged

dbeatty10 mentioned this pull request Jul 19, 2023

[CT-2849] [Bug] Giving key error snowflake #8154

Closed
