Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-2429] dbt show: option to use already-materialized model instead of rerunning SQL #7391

Open
jtcohen6 opened this issue Apr 18, 2023 · 9 comments
Labels
enhancement New feature or request Refinement Maintainer input needed

Comments

@jtcohen6
Copy link
Contributor

jtcohen6 commented Apr 18, 2023

As a dbt developer, especially on BigQuery, I want to preview my just-built table in the fastest & cheapest way possible.


When users run dbt show --select my_model, rather than recompiling and executing that model's SQL, give users the option to preview the already-materialized version of my_model in the data warehouse.

This goes beyond just --inline "select * from {{ ref('that_model') }}", because it would also unlock adapter-specific optimizations.

Options:

  • Config, --from-materialized (I hate this name)
  • Different command: dbt head, to match bq head?

In the short term, it would be the responsibility of users to track logical/applied state, and do their own "cache invalidation" if they've edited the model's definition since they last built it. (In the longer term, we could get cleverer about this in stateful runtime environments.)

This would be functionally identical for views and ephemeral models.

@jtcohen6 jtcohen6 added enhancement New feature or request Team:Execution Team:Adapters Issues designated for the adapter area of the code labels Apr 18, 2023
@github-actions github-actions bot changed the title dbt show: option to use already-materialized model instead of rerunning SQL [CT-2429] dbt show: option to use already-materialized model instead of rerunning SQL Apr 18, 2023
@b-per
Copy link
Contributor

b-per commented Apr 18, 2023

Throwing other options for the config name:

  • --existing
  • --last-refreshed
  • --materialized
  • --last-materialized

@jtcohen6
Copy link
Contributor Author

In theory, this would also work for Python models — build first, then preview.

Another option is to have this command/config implicitly materialize the model first: run it, then show it. I'd rather the building be explicit, though — and then provide the option of showing the just-built thing.

Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Jan 17, 2024
@b-per
Copy link
Contributor

b-per commented Jan 17, 2024

I'd love for this to happen! This could be very useful to save costs when querying data and avoid having people to go to their Data Warehouse interface to do so.

@github-actions github-actions bot removed the stale Issues that have gone stale label Jan 18, 2024
@mikealfare mikealfare removed the Team:Adapters Issues designated for the adapter area of the code label Feb 13, 2024
@dbeatty10
Copy link
Contributor

New config vs. different command

Of the two options, I prefer a different command (like described in #10265).

e.g. I greatly prefer this:

dbt head --select my_model

Rather than:

dbt show --select my_model --head

Advantages

Creating a new command has the advantage of being twice as easy to type out. It would also handle situations that dbt show doesn't currently:

  • Python models
  • sources
  • snapshots
  • incremental models
  • etc.

Trade-offs

Unless we did some fancy footwork, it wouldn't work in any cases where dbt doesn't materialize a database object though:

  • ephemeral models
  • analyses
  • etc.

e.g., if we just do a direct select * from {{ ref('that_model') }}, then new command (or CLI flag) would work IFF the node type "exists in database". But we can imagine adding some logic to handle the case where there won't be an object in the database (because it is ephemeral, etc.) by just doing the same thing as the current dbt show in those specific cases.

@pransito
Copy link

I think this is a very good idea to allow dbt show to do more

dbt show with these flags would be great!

also these flags would be helpful:

--to-file "my_output.parquet" --file-output-format "parquet"

@maxmue
Copy link

maxmue commented Aug 29, 2024

It's off-topic, but how do I make dbt show (latest cloud cli) show all columns instead of omitting many while just showing ...?

@dbeatty10
Copy link
Contributor

@maxmue to see all the columns, I use the --output json option which will output a JSON format of one line per column (rather than one line per row).

$ dbt show -s table_x --output json
14:58:28  Running with dbt=1.8.5
14:58:31  Registered adapter: duckdb=1.8.3
14:58:31  Found 4 models, 410 macros, 1 unit test
14:58:31  
14:58:32  Concurrency: 1 threads (target='dev')
14:58:32  
14:58:32  {
  "node": "table_x",
  "show": [
    {
      "starship": "USS Voyager",
      "quote": "There's coffee in that nebula"
    }
  ]
}

If you're hoping for some way to keep the same tabular format but without the ..., please open a feature request here.

@dbeatty10
Copy link
Contributor

@pransito

also these flags would be helpful:

--to-file "my_output.parquet" --file-output-format "parquet"

In general, this would need to be a separate feature request.

But I can say ahead of time that we're not planning on exporting the results of dbt show into different file formats like Parquet, CSV, TSV, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Refinement Maintainer input needed
Projects
None yet
Development

No branches or pull requests

6 participants