-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New command: dbt show #7208
New command: dbt show #7208
Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work on this!! I took it for a spin and left some comments/questions. Depending on the complexity involved, I'm open to chatting through what's a blocker to merging, and what could be a follow-on UX improvement for later.
} | ||
|
||
// Q042 | ||
message CompiledNode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we recreating as a new event type (with a new code)? The additional fields aren't a breaking change, right?
I think this will need to be updated to account for the changes we made last week to our event/proto system (#7190) I was mistaken about what had changed in that PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're not, I just ran out of space in the numbering and wanted to move them to the right spot before shipping.
if self.output_format == "json": | ||
if self.is_inline: | ||
return json.dumps({"compiled": self.compiled}, indent=2) | ||
else: | ||
return json.dumps({"node": self.node_name, "compiled": self.compiled}, indent=2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Not blocking] It makes sense to me that we'd want to support similar arguments for both compile
and show
, including --output
. The current JSON output does feel a bit inconsistent between them:
10:54:04 {
"node": "my_sql_model",
"compiled": "select 1 as id"
}
...
10:54:17 Previewing node 'my_sql_model':
[{"id": 1.0}]
I don't have a very strong feeling about what to show here. The most important use case for JSON-formatted output is programmatic consumers of the show
result set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
This is a regression, fixed with a new test.
execution_time=0, | ||
message=None, | ||
adapter_response=adapter_response.to_dict(), | ||
agate_table=execute_result, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this (agate_table
) how we'd recommend that programmatic invocations access the result set of the show
command?
>>> from dbt.cli.main import dbtRunner
>>> dbt = dbtRunner()
>>> results, success = dbt.invoke(['show', '--select', 'my_sql_model', '--output', 'json'])
>>> results[0].agate_table.print_table()
| id |
| -- |
| 1 |
# do some io.StringIO() hacking to get JSON value
If someone has requested --output json
, I think the show
task should also return the JSON-formatted output to the caller, one way or another.
(I understand that the result will also be available in the logs, but this feels like something the result object should really contain directly!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, yes :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a test that illustrates this behavior.
Docs issue created at dbt-labs/docs.getdbt.com#3097 |
Created a backlog ticket to investigate why the length of the Agate tables are wrong in tests: #7234. Results are correct during 🎩. |
@ChenyuLInx I've confirmed in Jaffle Shop that neither |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good over all! When I tested locally if I try to show a wide model(has many columns), I would get this
00:20:46 Running with dbt=1.5.0-b4
00:20:46 Found 5 models, 20 tests, 0 snapshots, 0 analyses, 312 macros, 0 operations, 3 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
00:20:46
00:20:46 Concurrency: 1 threads (target='dev')
00:20:46
00:20:46 Previewing node 'orders':
| order_id | customer_id | order_date | status | credit_card_amount | coupon_amount | ... |
| -------- | ----------- | ---------- | --------- | ------------------ | ------------- | --- |
| 1 | 1 | 2018-01-01 | returned | 10 | 0 | ... |
| 2 | 3 | 2018-01-02 | completed | 20 | 0 | ... |
| 3 | 94 | 2018-01-04 | completed | 0 | 1 | ... |
| 4 | 50 | 2018-01-05 | completed | 0 | 25 | ... |
| 5 | 64 | 2018-01-05 | completed | 0 | 0 | ... |```
def test_second_ephemeral_model(self, project): | ||
run_dbt(["deps"]) | ||
(results, log_output) = run_dbt_and_capture( | ||
["show", "--inline", models__second_ephemeral_model] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding so many ephemeral model tests! Do we have one that tests a model that have a ref to a ephemeral model?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, the second ephemeral model is an ephemeral model that references another ephemeral model.
@@ -39,6 +40,7 @@ def test_default(self, project): | |||
assert get_lines("first_model") == ["select 1 as fun"] | |||
assert any("_test_compile as schema" in line for line in get_lines("second_model")) | |||
|
|||
@pytest.mark.skip("Investigate flaky test #7179") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we try to prioritize this issue before the 1.5 release, feels like something we would want to have coverage on.
|
||
fire_event(CompileComplete()) | ||
for result in matched_results: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jtcohen6 did we agree on firing one event for each selected node is a good idea? what if we accidentally selected 1000 node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ChenyuLInx The node name explicitly needs to be in the selector, so this situation is pretty unlikely. e.g., selector my_model+
will be filtered down to just my_model
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I had my druthers, I'd prefer we do this in a way that kept more of this logic within dbt's selection syntax, e.g. "we only want to support the FQN
selector method."
As it is, we're going to apply the selection syntax, and then filter it down to only preview the nodes whose names explicitly appear in the --select
arg. This means the show
command won't support yaml selectors, tag:
-based selection (even if only one node has that tag), etc. We also won't be showing a log message explaining why, if a node is selected, it's not being previewed.
$ dbt show -s tag:doesnt_exist
17:26:05 Running with dbt=1.5.0-b4
17:26:05 Found 1 model, 0 tests, 0 snapshots, 1 analysis, 420 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics, 0 groups
17:26:05 The selection criterion 'tag:doesnt_exist' does not match any nodes
17:26:05
17:26:05 Nothing to do. Try checking your model configs and model specification args
$ dbt show -s tag:one_node_has_this_tag
17:26:20 Running with dbt=1.5.0-b4
17:26:21 Found 1 model, 0 tests, 0 snapshots, 1 analysis, 420 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics, 0 groups
17:26:21
17:26:21 Concurrency: 5 threads (target='dev')
17:26:21
In terms of the "happy path" for the show
command, the intent is to only show
resource(s) explicitly specified in the --select
syntax. Could I ask that we at least fire an event here, if a set of nodes is returned from the selection syntax, and then filtered out because the node's name wasn't explicitly passed to --select
?
$ dbt show -s tag:one_node_has_this_tag
17:26:20 Running with dbt=1.5.0-b4
17:26:21 Found 1 model, 0 tests, 0 snapshots, 1 analysis, 420 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics, 0 groups
17:26:21 The selection criterion 'tag:one_node_has_this_tag' selected one node, but the 'show' command will only preview models explicitly named in the 'select' argument
17:26:21
17:26:21 Concurrency: 5 threads (target='dev')
17:26:21
If we can do that, then this is fine by me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment addressed? @aranke
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ChenyuLInx Yes, done now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment about using the new postflight
decorator. Otherwise after looking at the following nothing stands out
- Command was created in
cli/main.py
- New params were created in
cli/params.py
- Events in
events/types.proto
look correct - Messages for events in
events/types.py
look correct - Tests for both compile and show make sense
Will leave for someone else to approve.
@ChenyuLInx @jtcohen6 The issue I discussed during standup was due to some bad configuration locally, recreating from scratch seemed to resolve the issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two last things, one is related to preview doesn't print all columns when there are many of them(see my last review comment), the other one is related to selection behavior mentioned below.
|
||
fire_event(CompileComplete()) | ||
for result in matched_results: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment addressed? @aranke
When just Examples:
vs
|
@Mathyoub I played around with this a little, unfortunately there isn't a way to do this without making |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked about this live, there't an issue regarding seed node is still included in the show but we can's show it. A follow up issue will be created.
dbt show --select +orders
19:52:09 Running with dbt=1.5.0-b5
19:52:10 Found 5 models, 20 tests, 0 snapshots, 0 analyses, 309 macros, 0 operations, 3 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
19:52:10
19:52:10 Concurrency: 3 threads (target='dev')
19:52:10
Error: Traceback (most recent call last):
File "/Users/chenyuli/git/dbt-core/core/dbt/cli/requires.py", line 100, in wrapper
result, success = func(*args, **kwargs)
File "/Users/chenyuli/git/dbt-core/core/dbt/cli/main.py", line 345, in show
results = task.run()
File "/Users/chenyuli/git/dbt-core/core/dbt/task/runnable.py", line 438, in run
result = self.execute_with_hooks(selected_uids)
File "/Users/chenyuli/git/dbt-core/core/dbt/task/runnable.py", line 401, in execute_with_hooks
res = self.execute_nodes()
File "/Users/chenyuli/git/dbt-core/core/dbt/task/runnable.py", line 350, in execute_nodes
self.run_queue(pool)
File "/Users/chenyuli/git/dbt-core/core/dbt/task/runnable.py", line 259, in run_queue
self._raise_set_error()
File "/Users/chenyuli/git/dbt-core/core/dbt/task/runnable.py", line 240, in _raise_set_error
raise self._raise_next_tick
dbt.exceptions.DbtRuntimeError: Runtime Error
Database Error in seed raw_payments (seeds/raw_payments.csv)
can't execute an empty query
resolves #7207
resolves #7179
resolves #6359
Description
Checklist
changie new
to create a changelog entry