New command: dbt show #7208

Merged
merged 32 commits into from
Apr 4, 2023
Changes from 25 commits
e6dcca4
New command: dbt show
aranke Mar 22, 2023
f0ad043
merge main
aranke Mar 22, 2023
eab733f
merge main
aranke Mar 24, 2023
d726061
migrate classes to google proto
aranke Mar 24, 2023
bc09cd7
make proto_types
aranke Mar 24, 2023
f28af9c
fix test_events
aranke Mar 24, 2023
2bcb790
get most tests working
aranke Mar 24, 2023
e0db983
write couple tests for interactive preview
aranke Mar 24, 2023
45a7f1a
Merge branch 'main' into command_show
aranke Mar 24, 2023
3133d22
more fixing and refactoring
aranke Mar 24, 2023
09bec45
get everything except ephemeral models working
aranke Mar 24, 2023
e845b37
remove debug print
aranke Mar 24, 2023
ea6b2b0
fix test_events
aranke Mar 24, 2023
f25b44b
preview ephemeral nodes
aranke Mar 24, 2023
2ff263d
add changelog
aranke Mar 24, 2023
b3cc37d
fix style issue in requires.py
aranke Mar 24, 2023
9a6baa9
Address comments from Jerco
aranke Mar 28, 2023
3f36494
Restore core/dbt/cli/requires.py
Mar 28, 2023
f885193
Merge branch 'main' into command_show
aranke Mar 28, 2023
aac83b5
remove numbers from log
aranke Mar 28, 2023
ac008e0
test agate table for length
aranke Mar 28, 2023
5c96420
use table instead of limit_table
aranke Mar 28, 2023
ffc66bd
don't keep null in seed
aranke Mar 28, 2023
e20af84
remove flaky tests
aranke Mar 28, 2023
0bec7c4
don't test json formatting via string
aranke Mar 28, 2023
310ddf3
simplify show tests
aranke Mar 29, 2023
05fcd34
Fire events for excluded nodes
aranke Apr 4, 2023
b576018
merge from main
aranke Apr 4, 2023
c5d4f11
fix proto_types
aranke Apr 4, 2023
b1f29c1
rename: node -> result
aranke Apr 4, 2023
06fd124
fix if/else, log @ debug
aranke Apr 4, 2023
4309fac
fix args lookup
aranke Apr 4, 2023
6 changes: 6 additions & 0 deletions .changes/unreleased/Features-20230324-123621.yaml
@@ -0,0 +1,6 @@
kind: Features
body: 'New command: dbt show'
time: 2023-03-24T12:36:21.173419-07:00
custom:
  Author: aranke
  Issue: 7207 7179 6359
48 changes: 48 additions & 0 deletions core/dbt/cli/main.py
@@ -13,6 +13,7 @@
from dbt.task.debug import DebugTask
from dbt.task.run import RunTask
from dbt.task.serve import ServeTask
from dbt.task.show import ShowTask
from dbt.task.test import TestTask
from dbt.task.snapshot import SnapshotTask
from dbt.task.seed import SeedTask
@@ -262,6 +263,7 @@ def docs_serve(ctx, **kwargs):
@p.favor_state
@p.deprecated_favor_state
@p.full_refresh
@p.show_output_format
@p.indirect_selection
@p.introspect
@p.parse_only
@@ -298,6 +300,52 @@ def compile(ctx, **kwargs):
return results, success


# dbt show
@cli.command("show")
@click.pass_context
@p.defer
@p.deprecated_defer
@p.exclude
@p.favor_state
@p.deprecated_favor_state
@p.full_refresh
@p.show_output_format
@p.show_limit
@p.indirect_selection
@p.introspect
@p.parse_only
@p.profile
@p.profiles_dir
@p.project_dir
@p.select
@p.selector
@p.inline
@p.state
@p.deprecated_state
@p.target
@p.target_path
@p.threads
@p.vars
@p.version_check
@requires.preflight
@requires.profile
@requires.project
@requires.runtime_config
@requires.manifest
def show(ctx, **kwargs):
    """Generates executable SQL for a named resource or inline query, runs that SQL, and returns a preview of the
    results. Does not materialize anything to the warehouse."""
    task = ShowTask(
        ctx.obj["flags"],
        ctx.obj["runtime_config"],
        ctx.obj["manifest"],
    )

    results = task.run()
    success = task.interpret_results(results)
    return results, success


# dbt debug
@cli.command("debug")
@click.pass_context
19 changes: 17 additions & 2 deletions core/dbt/cli/params.py
@@ -6,7 +6,6 @@
from dbt.cli.resolvers import default_project_dir, default_profiles_dir
from dbt.version import get_version_information


args = click.option(
"--args",
envvar=None,
@@ -187,6 +186,22 @@
default="selector",
)

show_output_format = click.option(
    "--output",
    envvar=None,
    help="Output format for dbt compile and dbt show",
    type=click.Choice(["json", "text"], case_sensitive=False),
    default="text",
)

show_limit = click.option(
    "--limit",
    envvar=None,
    help="Limit the number of results returned by dbt show",
    type=click.INT,
    default=5,
)
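
As a rough illustration of what the two new options accept, here is a standard-library analogue. It uses `argparse` rather than `click` (a deliberate swap so the sketch runs without third-party packages), and `build_parser` and the program name are hypothetical. It only mirrors the choices, case-insensitivity, and defaults shown in the diff above; it is not how dbt wires its CLI.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the --output and --limit options (argparse, not click)."""
    parser = argparse.ArgumentParser(prog="show-sketch")
    parser.add_argument(
        "--output",
        type=str.lower,  # lower-case the value first, approximating case_sensitive=False
        choices=["json", "text"],
        default="text",
    )
    parser.add_argument("--limit", type=int, default=5)
    return parser


# With no flags, the defaults from the diff apply: text output, five rows.
defaults = build_parser().parse_args([])
print(defaults.output, defaults.limit)

# Mixed-case values are normalised before the choices check.
custom = build_parser().parse_args(["--output", "JSON", "--limit", "10"])
print(custom.output, custom.limit)
```

Any value outside `json` and `text` is rejected at parse time, which matches the intent of `click.Choice` in the real option.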

output_keys = click.option(
"--output-keys",
envvar=None,
@@ -335,7 +350,7 @@
"type": tuple,
}

inline = click.option("--inline", envvar=None, help="Pass SQL inline to dbt compile and preview")
inline = click.option("--inline", envvar=None, help="Pass SQL inline to dbt compile and show")

# `--select` and `--models` are analogous for most commands except `dbt list` for legacy reasons.
# Most CLI arguments should use the combined `select` option that aliases `--models` to `--select`.
37 changes: 26 additions & 11 deletions core/dbt/events/types.proto
@@ -1643,17 +1643,6 @@ message ConcurrencyLineMsg {
ConcurrencyLine data = 2;
}

// Q028
message CompiledNode {
string node_name = 1;
string compiled = 2;
}

message CompiledNodeMsg {
EventInfo info = 1;
CompiledNode data = 2;
}

// Q029
message WritingInjectedSQLForNode {
NodeInfo node_info = 1;
@@ -1781,6 +1770,32 @@ message CommandCompletedMsg {
CommandCompleted data = 2;
}

// Q041
message ShowNode {
    string node_name = 1;
    string preview = 2;
    bool is_inline = 3;
    string output_format = 4;
}

message ShowNodeMsg {
    EventInfo info = 1;
    ShowNode data = 2;
}

// Q042
message CompiledNode {
Review comment from @jtcohen6 (Contributor), Mar 27, 2023:

Why are we recreating this as a new event type (with a new code)? The additional fields aren't a breaking change, right?

I think this will need to be updated to account for the changes we made last week to our event/proto system (#7190). Edit: I was mistaken about what had changed in that PR.

Reply from the PR author (Member):

They're not. I just ran out of space in the numbering and wanted to move them to the right spot before shipping.

    string node_name = 1;
    string compiled = 2;
    bool is_inline = 3;
    string output_format = 4;
}

message CompiledNodeMsg {
    EventInfo info = 1;
    CompiledNode data = 2;
}

// W - Node testing

// Skipped W001
46 changes: 38 additions & 8 deletions core/dbt/events/types.py
@@ -1,3 +1,5 @@
import json

from dbt.ui import line_wrap_message, warning_tag, red, green, yellow
from dbt.constants import MAXIMUM_SEED_SIZE_NAME, PIN_PACKAGE_URL
from dbt.events.base_types import (
@@ -1605,14 +1607,6 @@ def message(self) -> str:
return f"Concurrency: {self.num_threads} threads (target='{self.target_name}')"


class CompiledNode(InfoLevel):
def code(self):
return "Q028"

def message(self) -> str:
return f"Compiled node '{self.node_name}' is:\n{self.compiled}"


class WritingInjectedSQLForNode(DebugLevel):
def code(self):
return "Q029"
@@ -1719,6 +1713,42 @@ def message(self) -> str:
return f"Command `{self.command}` {status} at {self.completed_at} after {self.elapsed:0.2f} seconds"


class ShowNode(InfoLevel):
    def code(self):
        return "Q041"

    def message(self) -> str:
        if self.output_format == "json":
            if self.is_inline:
                return json.dumps({"show": json.loads(self.preview)}, indent=2)
            else:
                return json.dumps(
                    {"node": self.node_name, "show": json.loads(self.preview)}, indent=2
                )
        else:
            if self.is_inline:
                return f"Previewing inline node:\n{self.preview}"
            else:
                return f"Previewing node '{self.node_name}':\n{self.preview}"
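
To make the four output shapes concrete, the following is a minimal standalone sketch of the same branching as `ShowNode.message`, written as a plain function so it runs outside dbt. The name `show_message` and the sample values are assumptions for illustration; the sample preview string is borrowed from the review discussion of this file.

```python
import json


def show_message(node_name: str, preview: str, is_inline: bool, output_format: str) -> str:
    """Hypothetical plain-function version of the ShowNode.message branching."""
    if output_format == "json":
        # The preview arrives as a JSON string, so decode it before re-wrapping.
        payload = {"show": json.loads(preview)}
        if not is_inline:
            payload = {"node": node_name, **payload}
        return json.dumps(payload, indent=2)
    label = "inline node" if is_inline else f"node '{node_name}'"
    return f"Previewing {label}:\n{preview}"


sample_preview = '[{"id": 1.0}]'
print(show_message("my_sql_model", sample_preview, is_inline=False, output_format="text"))

# A programmatic consumer can parse the JSON form directly.
parsed = json.loads(show_message("my_sql_model", sample_preview, False, "json"))
print(parsed["node"], parsed["show"])
```

Decoding the preview before calling `json.dumps` keeps the rows as a nested JSON array rather than an escaped string, which is what makes the JSON form convenient to consume.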


class CompiledNode(InfoLevel):
    def code(self):
        return "Q042"

    def message(self) -> str:
        if self.output_format == "json":
            if self.is_inline:
                return json.dumps({"compiled": self.compiled}, indent=2)
            else:
                return json.dumps({"node": self.node_name, "compiled": self.compiled}, indent=2)
Review comment on lines +1740 to +1744 (Contributor):

[Not blocking] It makes sense to me that we'd want to support similar arguments for both compile and show, including --output. The current JSON output does feel a bit inconsistent between them:

10:54:04  {
  "node": "my_sql_model",
  "compiled": "select 1 as id"
}
...
10:54:17  Previewing node 'my_sql_model':
[{"id": 1.0}]

I don't have a very strong feeling about what to show here. The most important use case for JSON-formatted output is programmatic consumers of the show result set.

Reply from the PR author (Member):

Good catch! This is a regression, fixed with a new test.

        else:
            if self.is_inline:
                return f"Compiled inline node is:\n{self.compiled}"
            else:
                return f"Compiled node '{self.node_name}' is:\n{self.compiled}"


# =======================================================
# W - Node testing
# =======================================================
498 changes: 251 additions & 247 deletions core/dbt/events/types_pb2.py

Large diffs are not rendered by default.

35 changes: 18 additions & 17 deletions core/dbt/task/base.py
@@ -4,11 +4,14 @@
import traceback
from abc import ABCMeta, abstractmethod
from contextlib import nullcontext
from typing import Type, Union, Dict, Any, Optional
from datetime import datetime
from typing import Type, Union, Dict, Any, Optional

import dbt.exceptions
from dbt import tracking
from dbt.flags import get_flags
from dbt.adapters.factory import get_adapter
from dbt.config import RuntimeConfig, Project
from dbt.config.profile import read_profile
from dbt.contracts.graph.manifest import Manifest
from dbt.contracts.results import (
NodeStatus,
@@ -17,13 +20,7 @@
RunStatus,
RunningStatus,
)
from dbt.exceptions import (
NotImplementedError,
CompilationError,
DbtRuntimeError,
DbtInternalError,
)
from dbt.logger import log_manager
from dbt.events.contextvars import get_node_info
from dbt.events.functions import fire_event
from dbt.events.types import (
LogDbtProjectError,
@@ -38,14 +35,16 @@
NodeCompiling,
NodeExecuting,
)
from dbt.events.contextvars import get_node_info
from .printer import print_run_result_error

from dbt.adapters.factory import get_adapter
from dbt.config import RuntimeConfig, Project
from dbt.config.profile import read_profile
import dbt.exceptions
from dbt.exceptions import (
NotImplementedError,
CompilationError,
DbtRuntimeError,
DbtInternalError,
)
from dbt.flags import get_flags
from dbt.graph import Graph
from dbt.logger import log_manager
from .printer import print_run_result_error


class NoneConfig:
@@ -204,6 +203,8 @@ def __init__(self, config, adapter, node, node_index, num_nodes):
self.skip = False
self.skip_cause: Optional[RunResult] = None

self.run_ephemeral_models = False

@abstractmethod
def compile(self, manifest: Manifest) -> Any:
pass
@@ -325,7 +326,7 @@ def compile_and_execute(self, manifest, ctx):
ctx.timing.append(timing_info)

# for ephemeral nodes, we only want to compile, not run
if not ctx.node.is_ephemeral_model:
if not ctx.node.is_ephemeral_model or self.run_ephemeral_models:
ctx.node.update_event_status(node_status=RunningStatus.Executing)
fire_event(
NodeExecuting(
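
The updated guard in `compile_and_execute` can be read as a small truth table. The sketch below isolates just that boolean; `should_execute` is a hypothetical helper name, and which runner sets `run_ephemeral_models` to `True` is an assumption, since that part of the change is not shown in this diff.

```python
def should_execute(is_ephemeral_model: bool, run_ephemeral_models: bool) -> bool:
    """Mirror of the updated guard: skip execution only for ephemeral nodes without an opt-in."""
    return (not is_ephemeral_model) or run_ephemeral_models


# Enumerate all four combinations of the two flags.
for is_ephemeral in (False, True):
    for opt_in in (False, True):
        print(is_ephemeral, opt_in, should_execute(is_ephemeral, opt_in))
```

With the default of `False` set in `__init__`, existing runners keep the old compile-only behaviour for ephemeral models.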
42 changes: 28 additions & 14 deletions core/dbt/task/compile.py
@@ -4,7 +4,7 @@
from dbt.contracts.graph.manifest import WritableManifest
from dbt.contracts.results import RunStatus, RunResult
from dbt.events.functions import fire_event
from dbt.events.types import CompileComplete, CompiledNode
from dbt.events.types import CompiledNode
from dbt.exceptions import DbtInternalError, DbtRuntimeError
from dbt.graph import ResourceTypeSelector
from dbt.node_types import NodeType
@@ -61,23 +61,27 @@ def get_runner_type(self, _):
return CompileRunner

def task_end_messages(self, results):
if getattr(self.args, "inline", None):
result = results[0]
fire_event(
CompiledNode(node_name=result.node.name, compiled=result.node.compiled_code)
)
is_inline = bool(getattr(self.args, "inline", None))

if self.selection_arg:
if is_inline:
matched_results = [result for result in results if result.node.name == "inline_query"]
elif self.selection_arg:
matched_results = [
result for result in results if result.node.name == self.selection_arg[0]
result for result in results if result.node.name in self.selection_arg
]
if len(matched_results) == 1:
result = matched_results[0]
fire_event(
CompiledNode(node_name=result.node.name, compiled=result.node.compiled_code)
)
# No selector passed, compiling all nodes
else:
matched_results = []

fire_event(CompileComplete())
for result in matched_results:
Review comment (Contributor):

@jtcohen6 did we agree that firing one event for each selected node is a good idea? What if we accidentally select 1000 nodes?

Reply from the PR author (Member):

@ChenyuLInx The node name explicitly needs to be in the selector, so this situation is pretty unlikely. E.g., the selector my_model+ will be filtered down to just my_model.

Review comment (Contributor):

If I had my druthers, I'd prefer we do this in a way that kept more of this logic within dbt's selection syntax, e.g. "we only want to support the FQN selector method."

As it is, we're going to apply the selection syntax, and then filter it down to only preview the nodes whose names explicitly appear in the --select arg. This means the show command won't support yaml selectors, tag:-based selection (even if only one node has that tag), etc. We also won't be showing a log message explaining why, if a node is selected, it's not being previewed.

$ dbt show -s tag:doesnt_exist
17:26:05  Running with dbt=1.5.0-b4
17:26:05  Found 1 model, 0 tests, 0 snapshots, 1 analysis, 420 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics, 0 groups
17:26:05  The selection criterion 'tag:doesnt_exist' does not match any nodes
17:26:05
17:26:05  Nothing to do. Try checking your model configs and model specification args
$ dbt show -s tag:one_node_has_this_tag
17:26:20  Running with dbt=1.5.0-b4
17:26:21  Found 1 model, 0 tests, 0 snapshots, 1 analysis, 420 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics, 0 groups
17:26:21
17:26:21  Concurrency: 5 threads (target='dev')
17:26:21

In terms of the "happy path" for the show command, the intent is to only show resource(s) explicitly specified in the --select syntax. Could I ask that we at least fire an event here, if a set of nodes is returned from the selection syntax, and then filtered out because the node's name wasn't explicitly passed to --select?

$ dbt show -s tag:one_node_has_this_tag
17:26:20  Running with dbt=1.5.0-b4
17:26:21  Found 1 model, 0 tests, 0 snapshots, 1 analysis, 420 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics, 0 groups
17:26:21  The selection criterion 'tag:one_node_has_this_tag' selected one node, but the 'show' command will only preview models explicitly named in the 'select' argument
17:26:21
17:26:21  Concurrency: 5 threads (target='dev')
17:26:21

If we can do that, then this is fine by me.

Review comment (Contributor):

Is this comment addressed? @aranke

Reply from the PR author (Member):

@ChenyuLInx Yes, done now.

fire_event(
CompiledNode(
node_name=result.node.name,
compiled=result.node.compiled_code,
is_inline=is_inline,
output_format=self.args.output,
)
)
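
The filtering discussed in the review thread above can be sketched in isolation. This is a simplified illustration, assuming the select argument is a sequence of strings and that nodes are compared by exact name; `split_by_explicit_name`, the node names, and the log wording are all hypothetical, and the event the PR actually fires for excluded nodes is not shown in this diff.

```python
from typing import List, Sequence, Tuple


def split_by_explicit_name(
    result_names: Sequence[str], selection_arg: Sequence[str]
) -> Tuple[List[str], List[str]]:
    """Split result node names into (matched, excluded) by exact membership in the select arguments."""
    explicit = set(selection_arg)
    matched = [name for name in result_names if name in explicit]
    excluded = [name for name in result_names if name not in explicit]
    return matched, excluded


# Hypothetical run: the selection syntax returned two nodes, but only one is named verbatim.
matched, excluded = split_by_explicit_name(
    ["my_model", "tagged_model"], ["my_model", "tag:one_node_has_this_tag"]
)
print("previewed:", matched)
for name in excluded:
    # Hypothetical wording; the PR's actual event text is not shown in this diff.
    print(f"Node '{name}' was selected but not named explicitly, so it is not previewed")
```

Returning the excluded names alongside the matches is what allows a caller to report why a selected node produced no preview, which is the behaviour the reviewer asked for.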

def _get_deferred_manifest(self) -> Optional[WritableManifest]:
if not self.args.defer:
@@ -119,3 +123,13 @@ def _runtime_initialize(self):
process_node(self.config, self.manifest, sql_node)

super()._runtime_initialize()

    def _handle_result(self, result):
        super()._handle_result(result)

        if (
            result.node.is_ephemeral_model
            and type(self) is CompileTask
            and (self.args.select or getattr(self.args, "inline", None))
        ):
            self.node_results.append(result)