
New command: dbt show #7208

Merged
merged 32 commits into from
Apr 4, 2023


@aranke (Member) commented Mar 22, 2023

resolves #7207
resolves #7179
resolves #6359

Description

Checklist

@cla-bot (bot) added the cla:yes label Mar 22, 2023
@github-actions (Contributor)

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@aranke marked this pull request as ready for review March 24, 2023 19:38
@aranke requested a review from a team March 24, 2023 19:38
@aranke requested review from a team as code owners March 24, 2023 19:38
@jtcohen6 (Contributor) left a comment

Nice work on this!! I took it for a spin and left some comments/questions. Depending on the complexity involved, I'm open to chatting through what's a blocker to merging, and what could be a follow-on UX improvement for later.

}

// Q042
message CompiledNode {
@jtcohen6 (Contributor) commented Mar 27, 2023

Why are we recreating as a new event type (with a new code)? The additional fields aren't a breaking change, right?

I think this will need to be updated to account for the changes we made last week to our event/proto system (#7190). Edit: I was mistaken about what had changed in that PR.

Member Author

They're not; I just ran out of space in the numbering and wanted to move them to the right spot before shipping.

Comment on lines +1740 to +1744
if self.output_format == "json":
    if self.is_inline:
        return json.dumps({"compiled": self.compiled}, indent=2)
    else:
        return json.dumps({"node": self.node_name, "compiled": self.compiled}, indent=2)
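The branch above can be exercised in isolation with a minimal sketch. `CompiledNodeMessage` is a hypothetical stand-in for the event class, modeling only the four attributes the snippet reads (`output_format`, `is_inline`, `node_name`, `compiled`); the plain-text fallback message is an assumption.

```python
import json
from dataclasses import dataclass


@dataclass
class CompiledNodeMessage:
    """Hypothetical stand-in for the compile event; not dbt's actual class."""

    node_name: str
    compiled: str
    is_inline: bool = False
    output_format: str = "text"

    def message(self) -> str:
        if self.output_format == "json":
            if self.is_inline:
                # Inline SQL has no node name, so only the compiled SQL is emitted.
                return json.dumps({"compiled": self.compiled}, indent=2)
            return json.dumps({"node": self.node_name, "compiled": self.compiled}, indent=2)
        # Assumed plain-text fallback for the default output format.
        return f"Compiled node '{self.node_name}' is:\n{self.compiled}"


print(CompiledNodeMessage("my_sql_model", "select 1 as id", output_format="json").message())
```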
Contributor

[Not blocking] It makes sense to me that we'd want to support similar arguments for both compile and show, including --output. The current JSON output does feel a bit inconsistent between them:

10:54:04  {
  "node": "my_sql_model",
  "compiled": "select 1 as id"
}
...
10:54:17  Previewing node 'my_sql_model':
[{"id": 1.0}]

I don't have a very strong feeling about what to show here. The most important use case for JSON-formatted output is programmatic consumers of the show result set.
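Compile wraps its JSON in an object keyed by `node`, while show prints a bare list of rows. One hypothetical way to align them is sketched here: a `preview_json` helper (not dbt's API; the `show` key is an assumption) that reuses compile's envelope and omits `node` for inline queries, as compile does. It also assumes numeric cells may arrive as `decimal.Decimal`, as agate's Number type produces; converting those to float would be consistent with the `1.0` printed above.

```python
import json
from decimal import Decimal
from typing import Any, Optional, Sequence


def _json_default(value: Any) -> Any:
    # json cannot serialize Decimal natively; fall back to float.
    if isinstance(value, Decimal):
        return float(value)
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


def preview_json(node_name: Optional[str], column_names: Sequence[str],
                 rows: Sequence[Sequence[Any]]) -> str:
    """Hypothetical helper: serialize a preview with the same top-level shape as compile."""
    records = [dict(zip(column_names, row)) for row in rows]
    payload: dict = {"show": records}
    if node_name is not None:
        # Mirror compile: include the node name only for non-inline previews.
        payload = {"node": node_name, **payload}
    return json.dumps(payload, indent=2, default=_json_default)


print(preview_json("my_sql_model", ["id"], [(Decimal(1),)]))
```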

Member Author

Good catch!
This is a regression, fixed with a new test.

execution_time=0,
message=None,
adapter_response=adapter_response.to_dict(),
agate_table=execute_result,
Contributor

Is this (agate_table) how we'd recommend that programmatic invocations access the result set of the show command?

>>> from dbt.cli.main import dbtRunner
>>> dbt = dbtRunner()
>>> results, success = dbt.invoke(['show', '--select', 'my_sql_model', '--output', 'json'])
>>> results[0].agate_table.print_table()
| id |
| -- |
|  1 |
# do some io.StringIO() hacking to get JSON value

If someone has requested --output json, I think the show task should also return the JSON-formatted output to the caller, one way or another.

(I understand that the result will also be available in the logs, but this feels like something the result object should really contain directly!)
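Until the result object carries the JSON directly, a caller could build records from the table without the `io.StringIO()` detour. This sketch assumes only that the object exposes `column_names` and iterable `rows` (which should hold for agate's `Table`); `table_to_records` is a hypothetical helper, and a `SimpleNamespace` stands in for `results[0].agate_table` so the example runs without dbt.

```python
import json
from types import SimpleNamespace
from typing import Any, Dict, List


def table_to_records(table: Any) -> List[Dict[str, Any]]:
    """Hypothetical helper: turn a table-like object into a list of dicts.

    Relies only on `table.column_names` and on each row yielding its values
    in column order when iterated.
    """
    return [dict(zip(table.column_names, row)) for row in table.rows]


# Stand-in for results[0].agate_table from the REPL session above.
fake_table = SimpleNamespace(column_names=("id",), rows=[(1,)])
print(json.dumps(table_to_records(fake_table)))
```

With a real agate table, numeric cells are typically `decimal.Decimal`, so `json.dumps` would need a `default=` handler (for example, one that converts Decimals to float).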

Member Author

For now, yes :(

Member Author

I added a test that illustrates this behavior.

core/dbt/cli/requires.py (outdated, resolved)
core/dbt/cli/main.py (outdated, resolved)
core/dbt/task/show.py (outdated, resolved)
@aranke (Member Author) commented Mar 28, 2023

Docs issue created at dbt-labs/docs.getdbt.com#3097

@aranke (Member Author) commented Mar 28, 2023

Created a backlog ticket to investigate why the Agate table lengths are wrong in tests: #7234.

Results are correct during 🎩.

@aranke (Member Author) commented Mar 28, 2023

@ChenyuLInx I've confirmed in Jaffle Shop that neither compile nor show creates new tables in DuckDB.

@ChenyuLInx (Contributor) left a comment

Looks good overall! When I tested locally and tried to show a wide model (one with many columns), I got this:

00:20:46  Running with dbt=1.5.0-b4
00:20:46  Found 5 models, 20 tests, 0 snapshots, 0 analyses, 312 macros, 0 operations, 3 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
00:20:46  
00:20:46  Concurrency: 1 threads (target='dev')
00:20:46  
00:20:46  Previewing node 'orders':
| order_id | customer_id | order_date | status    | credit_card_amount | coupon_amount | ... |
| -------- | ----------- | ---------- | --------- | ------------------ | ------------- | --- |
|        1 |           1 | 2018-01-01 | returned  |                 10 |             0 | ... |
|        2 |           3 | 2018-01-02 | completed |                 20 |             0 | ... |
|        3 |          94 | 2018-01-04 | completed |                  0 |             1 | ... |
|        4 |          50 | 2018-01-05 | completed |                  0 |            25 | ... |
|        5 |          64 | 2018-01-05 | completed |                  0 |             0 | ... |
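agate's `Table.print_table` takes a `max_columns` argument that, as far as I know, defaults to 6 and appends a `...` column when it truncates, which matches the six-columns-then-`...` output above; passing `max_columns=None` should print every column. The stdlib-only sketch below reproduces that truncation rule for illustration; `format_preview` is hypothetical, not agate's code.

```python
from typing import Any, List, Optional, Sequence


def format_preview(column_names: Sequence[str], rows: Sequence[Sequence[Any]],
                   max_columns: Optional[int] = 6) -> List[str]:
    """Render rows as a pipe table, keeping at most `max_columns` columns.

    When columns are dropped, a trailing "..." column is appended, as in the
    truncated preview above. Passing max_columns=None keeps every column.
    """
    limit = len(column_names) if max_columns is None else max_columns
    truncated = len(column_names) > limit
    extra = ["..."] if truncated else []
    headers = [str(c) for c in column_names[:limit]] + extra
    body = [[str(v) for v in row[:limit]] + extra for row in rows]

    # Each column is as wide as its longest cell, header included.
    widths = [max([len(h)] + [len(r[i]) for r in body]) for i, h in enumerate(headers)]

    def render(cells: Sequence[str]) -> str:
        return "| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths)) + " |"

    separator = "| " + " | ".join("-" * w for w in widths) + " |"
    return [render(headers), separator] + [render(r) for r in body]


columns = [f"col_{i}" for i in range(9)]  # hypothetical 9-column model
for text in format_preview(columns, [tuple(range(9))]):
    print(text)
```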

tests/functional/show/test_show.py (outdated, resolved)
def test_second_ephemeral_model(self, project):
    run_dbt(["deps"])
    (results, log_output) = run_dbt_and_capture(
        ["show", "--inline", models__second_ephemeral_model]
Contributor

Thanks for adding so many ephemeral model tests! Do we have one that tests a model that has a ref to an ephemeral model?

Member Author

Yup, the second ephemeral model is an ephemeral model that references another ephemeral model.

tests/functional/show/fixtures.py (outdated, resolved)
@@ -39,6 +40,7 @@ def test_default(self, project):
        assert get_lines("first_model") == ["select 1 as fun"]
        assert any("_test_compile as schema" in line for line in get_lines("second_model"))

    @pytest.mark.skip("Investigate flaky test #7179")
Contributor

Can we try to prioritize this issue before the 1.5 release? It feels like something we'd want coverage on.


fire_event(CompileComplete())
for result in matched_results:
Contributor

@jtcohen6 did we agree that firing one event for each selected node is a good idea? What if we accidentally selected 1000 nodes?

Member Author

@ChenyuLInx The node name needs to appear explicitly in the selector, so this situation is pretty unlikely; e.g., the selector my_model+ will be filtered down to just my_model.
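A rough sketch of the filtering described here: strip graph operators from each `--select` token, then intersect with the selected node names. `explicit_names` and `nodes_to_preview` are hypothetical helpers, not dbt code. They only understand `+`, `@`, and numeric depth operators (e.g. `2+my_model+1`), and they treat any `method:value` token such as `tag:foo` as naming no node, which reproduces the tag-selection gap raised in the next comment.

```python
import re
from typing import Iterable, Set

# Optional leading "@", "+", or "N+", then the name, then optional trailing "+" or "+N".
_GRAPH_OPS = re.compile(r"^(?:@|\d*\+)?(?P<name>.*?)(?:\+\d*)?$")


def explicit_names(select_args: Iterable[str]) -> Set[str]:
    """Hypothetical: names that appear literally in the --select arguments."""
    names: Set[str] = set()
    for arg in select_args:
        for token in arg.split():
            name = _GRAPH_OPS.match(token).group("name")
            # Method selectors such as tag:foo do not name a node directly.
            if name and ":" not in name:
                names.add(name)
    return names


def nodes_to_preview(selected: Iterable[str], select_args: Iterable[str]) -> Set[str]:
    """Keep only selected node names that were named explicitly in --select."""
    return set(selected) & explicit_names(select_args)


print(nodes_to_preview({"my_model", "child_model"}, ["my_model+"]))
```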

Contributor

If I had my druthers, I'd prefer we do this in a way that kept more of this logic within dbt's selection syntax, e.g. "we only want to support the FQN selector method."

As it is, we're going to apply the selection syntax, and then filter it down to only preview the nodes whose names explicitly appear in the --select arg. This means the show command won't support yaml selectors, tag:-based selection (even if only one node has that tag), etc. We also won't be showing a log message explaining why, if a node is selected, it's not being previewed.

$ dbt show -s tag:doesnt_exist
17:26:05  Running with dbt=1.5.0-b4
17:26:05  Found 1 model, 0 tests, 0 snapshots, 1 analysis, 420 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics, 0 groups
17:26:05  The selection criterion 'tag:doesnt_exist' does not match any nodes
17:26:05
17:26:05  Nothing to do. Try checking your model configs and model specification args
$ dbt show -s tag:one_node_has_this_tag
17:26:20  Running with dbt=1.5.0-b4
17:26:21  Found 1 model, 0 tests, 0 snapshots, 1 analysis, 420 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics, 0 groups
17:26:21
17:26:21  Concurrency: 5 threads (target='dev')
17:26:21

In terms of the "happy path" for the show command, the intent is to only show resource(s) explicitly specified in the --select syntax. Could I ask that we at least fire an event here, if a set of nodes is returned from the selection syntax, and then filtered out because the node's name wasn't explicitly passed to --select?

$ dbt show -s tag:one_node_has_this_tag
17:26:20  Running with dbt=1.5.0-b4
17:26:21  Found 1 model, 0 tests, 0 snapshots, 1 analysis, 420 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics, 0 groups
17:26:21  The selection criterion 'tag:one_node_has_this_tag' selected one node, but the 'show' command will only preview models explicitly named in the 'select' argument
17:26:21
17:26:21  Concurrency: 5 threads (target='dev')
17:26:21

If we can do that, then this is fine by me.
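The event requested above amounts to a set difference plus a log call. In this minimal sketch, Python's `logging` module stands in for dbt's `fire_event`, `warn_on_filtered_nodes` is a hypothetical helper, and the message wording follows the proposal above.

```python
import logging
from typing import Iterable, Optional

logger = logging.getLogger("show_sketch")  # stand-in for dbt's event system


def warn_on_filtered_nodes(criterion: str, selected: Iterable[str],
                           previewed: Iterable[str]) -> Optional[str]:
    """Log (and return) a message when selected nodes will not be previewed."""
    selected_set = set(selected)
    if not selected_set - set(previewed):
        return None  # every selected node is being previewed; nothing to report
    count = len(selected_set)
    noun = "one node" if count == 1 else f"{count} nodes"
    message = (
        f"The selection criterion '{criterion}' selected {noun}, but the 'show' "
        "command will only preview models explicitly named in the 'select' argument"
    )
    logger.warning(message)
    return message


logging.basicConfig(level=logging.WARNING)
warn_on_filtered_nodes("tag:one_node_has_this_tag", {"my_model"}, set())
```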

Contributor

Is this comment addressed? @aranke

Member Author

@ChenyuLInx Yes, done now.

@stu-k (Contributor) left a comment

One comment about using the new postflight decorator. Otherwise, after looking at the following, nothing stands out:

  • Command was created in cli/main.py
  • New params were created in cli/params.py
  • Events in events/types.proto look correct
  • Messages for events in events/types.py look correct
  • Tests for both compile and show make sense

Will leave for someone else to approve.

core/dbt/cli/main.py (outdated, resolved)
@aranke (Member Author) commented Mar 29, 2023

@ChenyuLInx @jtcohen6 The issue I discussed during standup was due to a bad local configuration; recreating it from scratch seems to have resolved it.
After we fire events for excluded nodes, I think this PR is ready to merge?

@aranke requested review from jtcohen6 and ChenyuLInx March 29, 2023 17:52
@ChenyuLInx (Contributor) left a comment

Two last things: one is that the preview doesn't print all columns when there are many of them (see my last review comment); the other is the selection behavior mentioned below.


fire_event(CompileComplete())
for result in matched_results:
Contributor

Is this comment addressed? @aranke

@Mathyoub (Contributor)

When dbt show is run with neither --select nor --inline, it doesn’t handle this super “gracefully” and returns the whole stack trace. It’s a little different from dbt docs, where you need to give it another argument, but I think we should handle this case more similarly to how docs does.

Examples:

(env) ➜  postgres_jaffle_shop git:(main) ✗ dbt show         
16:23:34  Running with dbt=1.5.0-b4
Traceback (most recent call last):
  File "/Users/matthewbeall/src/dbt-test-projects/postgres_jaffle_shop/env/bin/dbt", line 33, in <module>
    sys.exit(load_entry_point('dbt-core==1.5.0b4', 'console_scripts', 'dbt')())
  File "/Users/matthewbeall/src/dbt-test-projects/postgres_jaffle_shop/env/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/matthewbeall/src/dbt-test-projects/postgres_jaffle_shop/env/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/matthewbeall/src/dbt-test-projects/postgres_jaffle_shop/env/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/matthewbeall/src/dbt-test-projects/postgres_jaffle_shop/env/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/matthewbeall/src/dbt-test-projects/postgres_jaffle_shop/env/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/matthewbeall/src/dbt-test-projects/postgres_jaffle_shop/env/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/matthewbeall/src/dbt-test-projects/postgres_jaffle_shop/env/lib/python3.9/site-packages/dbt/cli/requires.py", line 68, in wrapper
    (results, success) = func(*args, **kwargs)
  File "/Users/matthewbeall/src/dbt-test-projects/postgres_jaffle_shop/env/lib/python3.9/site-packages/dbt/cli/requires.py", line 127, in wrapper
    return func(*args, **kwargs)
  File "/Users/matthewbeall/src/dbt-test-projects/postgres_jaffle_shop/env/lib/python3.9/site-packages/dbt/cli/requires.py", line 149, in wrapper
    return func(*args, **kwargs)
  File "/Users/matthewbeall/src/dbt-test-projects/postgres_jaffle_shop/env/lib/python3.9/site-packages/dbt/cli/requires.py", line 175, in wrapper
    return func(*args, **kwargs)
  File "/Users/matthewbeall/src/dbt-test-projects/postgres_jaffle_shop/env/lib/python3.9/site-packages/dbt/cli/requires.py", line 210, in wrapper
    return func(*args, **kwargs)
  File "/Users/matthewbeall/src/dbt-test-projects/postgres_jaffle_shop/env/lib/python3.9/site-packages/dbt/cli/main.py", line 335, in show
    results = task.run()
  File "/Users/matthewbeall/src/dbt-test-projects/postgres_jaffle_shop/env/lib/python3.9/site-packages/dbt/task/runnable.py", line 420, in run
    self._runtime_initialize()
  File "/Users/matthewbeall/src/dbt-test-projects/postgres_jaffle_shop/env/lib/python3.9/site-packages/dbt/task/show.py", line 37, in _runtime_initialize
    raise DbtRuntimeError("Either --select or --inline must be passed to show")
dbt.exceptions.DbtRuntimeError: Runtime Error
  Either --select or --inline must be passed to show

vs

(env) ➜  postgres_jaffle_shop git:(main) ✗ dbt docs  
Usage: dbt docs [OPTIONS] COMMAND [ARGS]...

  Generate or serve the documentation website for your project

Options:
  -h, --help  Show this message and exit.

Commands:
  generate  Generate the documentation website for your project
  serve     Serve the documentation website for your project

@aranke requested a review from ChenyuLInx April 4, 2023 19:04
@aranke (Member Author) commented Apr 4, 2023

@Mathyoub I played around with this a little; unfortunately, there isn't a way to do this without making inline and select subcommands of show, so I'm going to punt on this for now.

@ChenyuLInx (Contributor) left a comment

We talked about this live: there's an issue where seed nodes are still included in show even though we can't show them. A follow-up issue will be created.

dbt show --select +orders 
19:52:09  Running with dbt=1.5.0-b5
19:52:10  Found 5 models, 20 tests, 0 snapshots, 0 analyses, 309 macros, 0 operations, 3 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
19:52:10  
19:52:10  Concurrency: 3 threads (target='dev')
19:52:10  
Error: Traceback (most recent call last):
  File "/Users/chenyuli/git/dbt-core/core/dbt/cli/requires.py", line 100, in wrapper
    result, success = func(*args, **kwargs)
  File "/Users/chenyuli/git/dbt-core/core/dbt/cli/main.py", line 345, in show
    results = task.run()
  File "/Users/chenyuli/git/dbt-core/core/dbt/task/runnable.py", line 438, in run
    result = self.execute_with_hooks(selected_uids)
  File "/Users/chenyuli/git/dbt-core/core/dbt/task/runnable.py", line 401, in execute_with_hooks
    res = self.execute_nodes()
  File "/Users/chenyuli/git/dbt-core/core/dbt/task/runnable.py", line 350, in execute_nodes
    self.run_queue(pool)
  File "/Users/chenyuli/git/dbt-core/core/dbt/task/runnable.py", line 259, in run_queue
    self._raise_set_error()
  File "/Users/chenyuli/git/dbt-core/core/dbt/task/runnable.py", line 240, in _raise_set_error
    raise self._raise_next_tick
dbt.exceptions.DbtRuntimeError: Runtime Error
  Database Error in seed raw_payments (seeds/raw_payments.csv)
    can't execute an empty query
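Until the follow-up lands, one possible guard is to drop seeds from the set of nodes show executes, presumably because a seed has no SQL body, which would be consistent with the "can't execute an empty query" error above. This is a hypothetical sketch: `PreviewCandidate` stands in for a manifest node, and only a `resource_type` string (with `"seed"` for seeds) is assumed.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Resource types assumed to have no query that show could preview.
NON_PREVIEWABLE = {"seed"}


@dataclass(frozen=True)
class PreviewCandidate:
    """Hypothetical stand-in for a selected manifest node."""

    name: str
    resource_type: str


def previewable(nodes: Iterable[PreviewCandidate]) -> List[PreviewCandidate]:
    """Drop nodes whose resource type has nothing to query."""
    return [n for n in nodes if n.resource_type not in NON_PREVIEWABLE]


selected = [
    PreviewCandidate("raw_payments", "seed"),
    PreviewCandidate("orders", "model"),
]
print([n.name for n in previewable(selected)])
```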

5 participants