[CT-1585] [Feature] New top level commands: interactive preview #6359

ChenyuLInx · 2022-12-02T01:43:13Z

Describe the feature

See #6358 for more background of what interactive means.

This issue covers supporting interactive preview as a native function of the dbt CLI. Examples of how this happens in dbt-rpc and dbt-server: (link to actual code)

AC:
See last comment

jtcohen6 · 2022-12-02T11:16:35Z

A lot of community interest in this one! :) #5418, #6087, #6090

Mostly the same comments as in #6358 (comment). I do think a totally new command for this one makes sense. It would work much the same as interactive compile, with the added step of actually executing the compiled query.

```shell
$ dbt preview --code <base64encoded>
```

From the CLI perspective, this should feel much the same as:

```sql
-- macros/preview.sql
{% macro preview(code) %}
  {% set result = run_query(code) %}
  {% do result.print_table() %}
{% endmacro %}
```

```shell
$ dbt run-operation preview --args '{"code": "select 1 as id, True as is_odd union all select 2, False"}'
| id | is_odd |
| -- | ------ |
|  1 |   True |
|  2 |  False |
```
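The proposed `--code <base64encoded>` argument means a client has to base64-encode the Jinja-SQL before handing it over, which sidesteps shell quoting and embedded newlines. Here is a minimal standard-library sketch. `encode_code`, `decode_code`, and `build_preview_argv` are hypothetical helpers, and `dbt preview --code` is the command proposed in this thread, not an existing CLI.

```python
import base64
import shlex
from typing import List


def encode_code(code: str) -> str:
    """Base64-encode Jinja-SQL so it survives shell quoting and newlines."""
    return base64.b64encode(code.encode("utf-8")).decode("ascii")


def decode_code(encoded: str) -> str:
    """Inverse of encode_code, as the CLI would apply on receipt."""
    return base64.b64decode(encoded).decode("utf-8")


def build_preview_argv(code: str) -> List[str]:
    # Hypothetical: `dbt preview --code` is the command proposed in this issue.
    return ["dbt", "preview", "--code", encode_code(code)]


if __name__ == "__main__":
    query = "select 1 as id, True as is_odd union all select 2, False"
    argv = build_preview_argv(query)
    print(shlex.join(argv))  # safe to paste into a shell
    assert decode_code(argv[-1]) == query
```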

formatting + passing the result

In the existing implementation, execute_result returns an agate.Table, and then we turn it into a simpler ResultTable Python object. My inclination here is that, on the CLI, we'd send the result to stdout, with the format depending on the --log-format:

- Text format: agate.Table.print_table() (GitHub-flavored Markdown), though capturing the output rather than sending it straight to sys.stdout.
  - Here's a fun one: did you know there's a dbt seed --show? The way we implemented that (a long time ago) is pretty funny :)
- JSON format: either agate.Table.print_json(), or our own JSON serialization of the ResultTable object
- Binary format: ~~...hm...~~ let's ignore for now
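Below is a sketch of picking the output format from `--log-format`. It renders into a string buffer instead of writing to `sys.stdout`, and stays dependency-free rather than calling agate. The `ResultTable` fields here are assumptions standing in for dbt's real object, and `render` is a hypothetical entry point.

```python
import io
import json
from dataclasses import dataclass, field
from typing import Any, List, Sequence


@dataclass
class ResultTable:
    # Hypothetical stand-in for dbt's ResultTable; field names are assumptions.
    column_names: List[str]
    rows: List[Sequence[Any]] = field(default_factory=list)


def render_text(table: ResultTable) -> str:
    """Render a Markdown-style table (similar to agate's print_table) into a string."""
    out = io.StringIO()
    cells = [[str(v) for v in row] for row in table.rows]
    widths = [
        max([len(name)] + [len(row[i]) for row in cells])
        for i, name in enumerate(table.column_names)
    ]
    out.write("| " + " | ".join(n.ljust(w) for n, w in zip(table.column_names, widths)) + " |\n")
    out.write("| " + " | ".join("-" * w for w in widths) + " |\n")
    for row in cells:
        out.write("| " + " | ".join(v.rjust(w) for v, w in zip(row, widths)) + " |\n")
    return out.getvalue()


def render_json(table: ResultTable) -> str:
    """Serialize rows as a list of {column: value} objects."""
    records = [dict(zip(table.column_names, row)) for row in table.rows]
    return json.dumps(records)


def render(table: ResultTable, log_format: str) -> str:
    # Binary is deliberately unsupported, matching "let's ignore for now".
    if log_format == "json":
        return render_json(table)
    if log_format == "text":
        return render_text(table)
    raise ValueError(f"unsupported log format: {log_format!r}")
```

Returning a string (rather than printing) keeps the same function usable by programmatic consumers like dbt-server, which can decide where the output goes.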

For programmatic consumers (dbt-server), should we return the ResultTable directly as a Python object? Include a serialized representation in a structured log event? Is there a way we could proto-tize that result object, to maximize interoperability? (Is it time for us to finally roll our own "dataframe" helper for result sets, and replace agate?)

limiting result set

What are the arguments we want to support for this? # of lines?

Hm. I'm not 100% sure that this should be in dbt-core, but it does feel like it could be a place to enact a configurable limit to the result set. I'd be curious to hear what folks like @bossnunta @isaac-taylor @davidharting think.

The default should be either:

- something reasonable (--limit 500)
- nothing at all; dbt doesn't try to limit the result set unless passed an explicit configuration

Options for how to implement:

- Modify SQL! Add limit 500 to the end of queries, or detect if they already contain their own limit clause, in which case don't add one.
  - Of course, this makes sense for SQL, but not for other languages! We could add it as an optional arg to the statement macro, and attempt some conditional logic based on language. Python dataframes do support methods like .take(<size>) or .head(<size>), and we could try passing that into submit_python_job (which would also need a refactor to support "interactive" queries, never mind how slow it would be on some platforms).
- Add a new argument to adapter.execute: adapter.execute(<code>, fetch=True, limit=500). Then, in each database adapter, we plumb this through to connections.execute. Many cursors support fetchmany(size=<int>) (e.g. psycopg2, snowflake-connector-python), and there's a natural spot for it here.
  - This feels like the surest way to go, and the least error-prone.
  - But if a user really did want >500 rows, we'd need to either do SQL parsing to detect that their query includes limit 1000 (as above), or provide another means to pass in that information, e.g. in the dbt Cloud IDE.
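Both options can be sketched against the Python DB-API, using the standard-library `sqlite3` driver as a stand-in for psycopg2 or snowflake-connector-python. `add_limit_clause` and this `execute(conn, sql, fetch, limit)` signature are hypothetical. They mirror the proposed `adapter.execute(<code>, fetch=True, limit=500)` and are not dbt's actual adapter code. The regex is intentionally naive, which illustrates why option 1 is error-prone.

```python
import re
import sqlite3
from typing import Any, List, Optional, Tuple

# Naive detection of a trailing LIMIT clause. Real SQL (subqueries, comments,
# dialect differences) defeats this quickly; that is the risk of option 1.
_TRAILING_LIMIT = re.compile(r"\blimit\s+\d+\s*;?\s*$", re.IGNORECASE)


def add_limit_clause(sql: str, limit: int) -> str:
    """Option 1 (sketch): append `limit N` unless the query already ends with one."""
    if _TRAILING_LIMIT.search(sql):
        return sql
    # Newline guards against a trailing `-- comment` swallowing the clause.
    return f"{sql.rstrip().rstrip(';')}\nlimit {limit}"


def execute(
    conn: sqlite3.Connection, sql: str, fetch: bool = False, limit: Optional[int] = None
) -> List[Tuple[Any, ...]]:
    """Option 2 (sketch): leave the SQL alone and cap rows at the cursor."""
    cursor = conn.cursor()
    cursor.execute(sql)
    if not fetch:
        return []
    if limit is None:
        return cursor.fetchall()
    return cursor.fetchmany(size=limit)
```

With option 2 the database may still evaluate more of the query than is fetched, but the user's SQL is never rewritten.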

Let's keep thinking through this part. I think it can be out of scope for the first-pass implementation.

isaac-taylor · 2022-12-02T14:44:57Z

> Hm. I'm not 100% sure that this should be in dbt-core, but it does feel like it could be a place to enact a configurable limit to the result set. I'd be curious to hear what folks like @bossnunta @isaac-taylor @davidharting think.

In my head, the question comes down to "should we modify user code directly?" And if so (the IDE does today), where in the stack should that happen?

My guess is Core is the right place long term. That will make extending interactive preview/compile support to Python (if that's even doable) a bit easier, as we won't have multiple components needing to understand user code. I'm thinking the IDE shouldn't know about anything related to dbt code parsing itself.

jtcohen6 · 2022-12-05T09:32:16Z

@isaac-taylor Agreed. We can discuss this more offline — what's your + @bossnunta's appetite to have a "Limit [500]" checkbox in the IDE again? That's probably the right UX, if what we're doing is passing a --limit [500] argument into the top-level dbt-core command (programmatically via dbt-server).

isaac-taylor · 2022-12-05T21:35:35Z

> @isaac-taylor Agreed. We can discuss this more offline — what's your + @bossnunta's appetite to have a "Limit [500]" checkbox in the IDE again? That's probably the right UX, if what we're doing is passing a --limit [500] argument into the top-level dbt-core command (programmatically via dbt-server).

Seems like just a missing feature from 1.1 that we had in 1.0. Very doable from a technical perspective. It's about UX and prioritization, I think. Let me bring that one up with Nate to see if it's on his radar.

bossnunta · 2022-12-05T22:08:55Z

> @isaac-taylor Agreed. We can discuss this more offline — what's your + @bossnunta's appetite to have a "Limit [500]" checkbox in the IDE again? That's probably the right UX, if what we're doing is passing a --limit [500] argument into the top-level dbt-core command (programmatically via dbt-server).

I think having a version of "limit [500]" as a UX (and/or centralized admin configuration) is a very good approach here. I don't believe we had that in IDE v1.0 either. I shared notes on this in Slack. (What we do in both 1.0 and 1.1 is default to 500 rows, overridden when the query has its own limit clause.)

However, from a product perspective, the IDE shouldn't really be doing SQL parsing to deal with this, in my opinion. Ideally we could do this with dbt-server (or maybe Core, given that it's about parsing?)

jtcohen6 · 2022-12-06T14:57:54Z

> However, from a product perspective, the IDE shouldn't really be doing SQL parsing to deal with this, in my opinion. Ideally we could do this with dbt-server (or maybe Core, given that it's about parsing?)

If we have a checkbox, we don't need to do SQL parsing at all, anywhere. Ideal!

It will be on the user to either:

- put their own number in the "limit" checkbox
- uncheck the box and add their own custom limit <number> within their query

davidharting · 2022-12-13T19:30:16Z

Just for the historical record, we did at one time have a "limit 500" checkbox in cloud IDE v1.0, until we merged https://github.com/dbt-labs/dbt-cloud/pull/2284. Looking at the PR and the issue it closes, it is lost to history precisely why we made this change. I do know that it was unpopular with some folks, though, and I am more than happy to go back to a checkbox-style approach.

In my memory, the biggest reason for the change was the amount of UI real estate that the checkbox took up. It seemed to mismatch how "primary" we thought this functionality should be.

jtcohen6 · 2023-01-05T11:20:49Z

As in #6358 (comment), we'd also want the ability to "preview" a specific model, within the context of that model, to correctly resolve any instances of (e.g.) {{ this }} or {{ is_incremental() }}:

```shell
$ dbt preview --select specific_model
```

The idea being, we compile specific_model (versus arbitrary Jinja-SQL), and then preview its compiled_code. I'm using --select above for consistency, but it obviously wouldn't make sense for more than one node to be selected for preview. That's a check we can add ("must select only one node"), or we can use a different flag/argument for clarity.
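The "must select only one node" check could look like the sketch below. `PreviewSelectionError` and `single_node_for_preview` are hypothetical names, and the input is assumed to be the unique IDs that the selector resolved to.

```python
from typing import Iterable, List


class PreviewSelectionError(Exception):
    """Hypothetical error raised when --select does not resolve to exactly one node."""


def single_node_for_preview(selected_unique_ids: Iterable[str]) -> str:
    # Preview compiles one node in its own context ({{ this }}, is_incremental()),
    # so the selection must resolve to exactly one node.
    nodes: List[str] = sorted(set(selected_unique_ids))
    if len(nodes) != 1:
        raise PreviewSelectionError(
            f"preview requires exactly one selected node, got {len(nodes)}: {nodes}"
        )
    return nodes[0]
```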

Thinking about "compile" and "preview" in the dbt Cloud IDE: Who should be responsible for determining if the user is compiling/previewing a specific model in the project, versus some arbitrary Jinja-SQL in a scratchpad? Should the IDE detect that the open file is an actual saved file in the project? Should it pass along the file name in addition to the file contents, so that dbt-core/dbt-server/someone can detect that the file name matches a manifest node and use the code to update that node? This gets trickier if we think about a user compiling/previewing unsaved code changes that correspond to a manifest-registered node.

davidharting · 2023-01-05T14:29:00Z

> Who should be responsible for determining if the user is compiling/previewing a specific model in the project, versus some arbitrary Jinja-SQL in a scratchpad?

The way I would naively expect this to work is for dbt itself to support two different, mutually exclusive flags for the two use cases: --select for existing models, and --query (or something like that) to pass in arbitrary SQL or Python (probably base64-encoded?).

Then I would expect it to be the job of clients to determine which case they are in and use the appropriate flag.

Does that seem plausible?
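The two mutually exclusive flags can be sketched with `argparse`. This is purely illustrative: dbt's real CLI is built on click, `build_parser` is a hypothetical helper, and `--select`/`--query` are the names floated in this comment.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser for the proposed command; flag names come from this thread.
    parser = argparse.ArgumentParser(prog="dbt preview")
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--select", help="an existing model to compile and preview")
    source.add_argument("--query", help="arbitrary SQL or Python, base64-encoded")
    return parser
```

With this shape, the client picks exactly one flag, and passing both (or neither) is rejected before any compilation happens.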

b-per · 2023-01-16T12:55:57Z

I agree with the last comment about having 2 different flags/options.

In BigQuery, the --select option could use the free Preview feature (e.g. with bq head) rather than running a query that would scan data and cost money.

jtcohen6 · 2023-01-16T13:19:54Z

Thanks @b-per! Two comments that came up in our related conversation just now:

- In cases where users are just trying to preview an already-existing table, we could try doing a "free" preview of that table, instead of actually submitting select * from {{ table }} as a standalone query. This would require us to understand if the table already exists, if its query logic has been updated since it was last built, ...
- BQ makes it possible to estimate the cost of a query before running it. The closest analogue on other databases would be to return an EXPLAIN plan. Could we expose this somehow? dbt preview --dry-run? dbt preview --explain? Maybe this is actually the right "workflow" for explaining queries (= estimating cost/complexity), first imagined in this quite old issue: workflow for explain queries? #401

To be clear, I think both of those should be out of scope for the initial implementation (this issue). But it's good to have these potential (adapter-specific) extensions in the back of our minds, while doing the work to add + expose this foundational capability.
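For a concrete picture of the "return an EXPLAIN plan instead of results" idea, the sketch below uses SQLite's `EXPLAIN QUERY PLAN` through the standard-library `sqlite3` module. SQLite gives no cost estimate like a BigQuery dry run, and the syntax differs per database (e.g. `EXPLAIN` on Postgres), so a hypothetical `dbt preview --explain` would need an adapter-specific implementation. `explain` is a hypothetical helper.

```python
import sqlite3
from typing import List


def explain(conn: sqlite3.Connection, sql: str) -> List[str]:
    """Return the human-readable steps of SQLite's query plan for `sql`."""
    rows = conn.execute(f"EXPLAIN QUERY PLAN {sql}").fetchall()
    # The last column of each row is the readable `detail` text.
    return [row[-1] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("create table orders (id integer, amount real)")
    for step in explain(conn, "select * from orders where amount > 10"):
        print(step)
```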

jtcohen6 · 2023-01-20T20:01:34Z

Not something addressed previously in this issue, and definitely out of scope for the first iteration: previewing transformations written in Python. On some platforms, this would require writing the final df.head(500) to a temp table, and then running select * from that temp table.

How to know if it's Python?

- In the case of dbt preview --select specific_model, we'd be able to inspect that model's language attribute.
- Should preview --code also accept a --language argument?
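Combining both sources of truth might look like this sketch: the selected model's `language` attribute wins, a hypothetical `--language` argument covers inline `--code`, and the two must agree when both are given. `resolve_language` is a hypothetical helper, and defaulting inline code to SQL is an assumption.

```python
from typing import Optional

# Languages dbt models can currently be written in.
SUPPORTED_LANGUAGES = {"sql", "python"}


def resolve_language(node_language: Optional[str], language_flag: Optional[str]) -> str:
    """Decide which language a preview request is written in.

    node_language: the selected model's `language` attribute (dbt preview --select).
    language_flag: value of a hypothetical --language argument (dbt preview --code).
    """
    if node_language and language_flag and node_language != language_flag:
        raise ValueError(
            f"--language {language_flag} conflicts with the model's language {node_language}"
        )
    # Assumption: inline code without --language is treated as SQL.
    language = node_language or language_flag or "sql"
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    return language
```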

lostmygithubaccount · 2023-01-23T02:23:16Z

@jtcohen6 It's not uncommon for enterprise users to request the ability to disable preview in the IDE. Would it be worth having a project-level setting, with an override flag we pass through to dbt Server and into Cloud? Then they could have that as an option and always pass the override into their dbt commands.

jtcohen6 · 2023-01-23T09:15:29Z

@lostmygithubaccount To me, dbt_project.yml doesn't feel like the right place to define that, versus an account-level option within dbt Cloud access administration. I'd expect that setting to flow from Cloud app config into Runtime + IDE. I suspect we don't want to show a "Preview" button at all in the IDE if it's disabled, rather than having everything appear the same but then raising a dbt-core error.

iknox-fa · 2023-02-06T20:17:05Z

Per BLG 2/6/23
@ChenyuLInx is going to check with the IDE team to verify whether or not a run is needed to reflect the current state of the selected model.

@aranke What is the desired behavior for CLI usage with respect to the above? Can you double-check it with Doug / Jeremy?

ChenyuLInx · 2023-03-15T16:27:59Z

Run is not needed.

aranke · 2023-03-16T20:21:14Z

Acceptance criteria for ticket:

- Support both inline and model execution, similar to interactive compile.
- Support both command-line table printing and JSON output, with the flag --output json.
- Limit the number of rows to show with the --limit flag.
  - Use 5 by default, to copy pandas head() behavior.

Nice to have:

- Deprecate the --show option for dbt seed and dbt build in favor of dbt preview cc @jtcohen6

ChenyuLInx added the enhancement and triage labels Dec 2, 2022

github-actions bot changed the title ~~[Feature] [Feature] New top level commands: interactive preview~~ [CT-1585] [Feature] [Feature] New top level commands: interactive preview Dec 2, 2022

ChenyuLInx mentioned this issue Dec 2, 2022

[CT-1581] [Epic] dbt-core as a library: first steps #6356 (Closed, 23 tasks)

ChenyuLInx changed the title ~~[CT-1585] [Feature] [Feature] New top level commands: interactive preview~~ [CT-1585] [Feature] New top level commands: interactive preview Dec 2, 2022

dbeatty10 added the Refinement label and removed the triage label Dec 2, 2022

jtcohen6 added the python_api and Team:Execution labels and removed the Refinement label Dec 2, 2022

jtcohen6 mentioned this issue Jan 5, 2023

[CT-1751] Config to optionally skip population of relation cache #6526 (Closed)

lostmygithubaccount mentioned this issue Jan 10, 2023

[CT-1361] dbt CLI preview command #6087 (Closed, 3 tasks)

leahwicz assigned aranke Jan 31, 2023

jtcohen6 mentioned this issue Mar 13, 2023

[CT-2301] [Epic] API-ification: outstanding tasks for v1.5 #7162 (Closed)

aranke mentioned this issue Mar 24, 2023

New command: dbt show #7208 (Merged, 6 tasks)

aranke closed this as completed in #7208 Apr 4, 2023
