Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make dbt-sql and default-sql templates public #1463

Merged
merged 9 commits into from
Jun 4, 2024

Conversation

lennartkats-db
Copy link
Contributor

Changes

This makes the dbt-sql and default-sql templates public.

These templates were previously not listed and marked "experimental" since structured streaming tables were still in gated preview and would result in weird error messages when a workspace wasn't enabled for the preview.

This PR also incorporates some of the feedback and learnings for these templates so far.

@codecov-commenter
Copy link

codecov-commenter commented Jun 3, 2024

Codecov Report

Attention: Patch coverage is 0% with 7 lines in your changes missing coverage. Please review.

Project coverage is 53.57%. Comparing base (e22dd8a) to head (2deaa71).
Report is 128 commits behind head on main.

Files Patch % Lines
libs/template/materialize.go 0.00% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1463      +/-   ##
==========================================
+ Coverage   52.25%   53.57%   +1.31%     
==========================================
  Files         317      351      +34     
  Lines       18004    20271    +2267     
==========================================
+ Hits         9408    10860    +1452     
- Misses       7903     8612     +709     
- Partials      693      799     +106     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

- 'dbt run --target=${bundle.target} --vars "{ dev_schema: {{$dev_schema}} }"'
{{- if (regexp "^yes").MatchString .personal_schemas}}
- 'dbt seed --target=${bundle.target} --vars "{ dev_schema: ${workspace.current_user.short_name} }"'
- 'dbt run --target=${bundle.target} --vars "{ dev_schema: ${workspace.current_user.short_name} }"'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will also apply to the prod target, which contradicts the explanation in the template:

yes, use a schema based on the current user name during development

You could use target overrides to only apply this for the dev target.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should actually work as intended since the "dev_schema" var is only used in the dev profile: https://github.com/databricks/cli/pull/1463/files#diff-0ca9390ffa1ff34b28194f7947d477dd79b02f9fe6aa14a578483469913e4d9aR21.

This is the place where our dev/prod targets and the dbt profiles come together. Which can look a bit weird. To make the logic of the generated yml seem bit simpler, I'm no longer passing "dev_schema" unless yes, use a schema based on the current user name during development is actually enabled. Once that is enabled, customers will have to stare at the slightly more complicated ymkl.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible to do this exclusively for the dev target using overrides. Then it is more clear the dev schema only pertains to the dev target. It's pretty confusing as is, IMO.

targets:
  dev:
    resources:
      jobs:
        dbt_project_job:
          tasks:
            - task_key: dbt
              dbt_task:
                commands:
                  - 'dbt deps --target=${bundle.target}'
                  - 'dbt seed --target=${bundle.target} --vars "{ dev_schema: ${workspace.current_user.short_name} }"'
                  - 'dbt run --target=${bundle.target} --vars "{ dev_schema: ${workspace.current_user.short_name} }"'

  prod:
    resources:
      jobs:
        dbt_project_job:
          tasks:
            - task_key: dbt
              dbt_task:
                commands:
                  - 'dbt deps --target=${bundle.target}'
                  - 'dbt seed --target=${bundle.target}'
                  - 'dbt run --target=${bundle.target}'

The top level resource then shouldn't specify commands, as lists are appended in overrides.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or define another variable that interpolates the inner string for --vars.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The challenge with overrides is that it duplicates things and breaks visual editing. And in this case, we're just passing the value of a parameter, not changing behavior. So I'd rather just keep a single version of this job.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about a comment then? The generated configuration should pre-empt questions people have. Maybe "the dev schema is not used in the prod target" or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that sounds good, added!

@pietern
Copy link
Contributor

pietern commented Jun 3, 2024

Right now the workspace host is displayed as if it were a variable. The intent is for this to be human-readable right?

Screenshot 2024-06-03 at 16 24 44

@lennartkats-db
Copy link
Contributor Author

@pietern

Right now the workspace host is displayed as if it were a variable. The intent is for this to be human-readable right?

I'm not sure what you mean? It doesn't look like a variable on your screenshot? Do you mean that it looks a bit like an input prompt? That is intentional to make all settings appear similar.

@pietern
Copy link
Contributor

pietern commented Jun 3, 2024

I would expect it to read like prose, like the preceding text.

The template targets workspace https://...

As opposed to a key: value.

@lennartkats-db
Copy link
Contributor Author

@pietern yeah, it totally could. My mental model has been that this is just another setting but one that is set indirectly. That seemed like the easiest way to explain it. Any suggestions for phrasing if we present that in a different way?

I tried

Welcome to the dbt template for Databricks Asset Bundles!

Workspace selected based on your current profile: https://e2-dogfood.staging.cloud.databricks.com
(See https://docs.databricks.com/dev-tools/cli/profiles.html for how to change this).

Please provide a unique name for this project.
project_name [dbt_project]:

But the text just seems a bit out of place

# The dbt commands to run (see also the dev/prod profiles in dbt_profiles/profiles.yml)
- 'dbt deps --target=${bundle.target}'
- 'dbt seed'
- 'dbt run'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the target not required after running deps?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, it is. Good catch. As you can see it is used in the other branch of the {{if}}.

@lennartkats-db lennartkats-db enabled auto-merge June 4, 2024 08:52
@lennartkats-db lennartkats-db added this pull request to the merge queue Jun 4, 2024
Merged via the queue into databricks:main with commit aa36aee Jun 4, 2024
5 checks passed
@lennartkats-db lennartkats-db deleted the enable-sql-templates branch June 4, 2024 09:05
pietern added a commit that referenced this pull request Jun 4, 2024
CLI:
 * Update OpenAPI spec ([#1466](#1466)).

Bundles:
 * Upgrade TF provider to 1.46.0 ([#1460](#1460)).
 * Add support for Lakehouse monitoring ([#1307](#1307)).
 * Make dbt-sql and default-sql templates public ([#1463](#1463)).

Internal:
 * Abstract over filesystem interaction with libs/vfs ([#1452](#1452)).
 * Add `filer.Filer` to read notebooks from WSFS without omitting their extension ([#1457](#1457)).
 * Fix listing notebooks in a subdirectory ([#1468](#1468)).

API Changes:
 * Changed `databricks account storage-credentials list` command to return .
 * Added `databricks consumer-listings batch-get` command.
 * Added `databricks consumer-providers batch-get` command.
 * Removed `databricks apps create-deployment` command.
 * Added `databricks apps deploy` command.

OpenAPI commit 37b925eba37dfb3d7e05b6ba2d458454ce62d3a0 (2024-06-03)

Dependency updates:
 * Bump github.com/hashicorp/go-version from 1.6.0 to 1.7.0 ([#1454](#1454)).
 * Bump github.com/hashicorp/hc-install from 0.6.4 to 0.7.0 ([#1453](#1453)).
@pietern pietern mentioned this pull request Jun 4, 2024
github-merge-queue bot pushed a commit that referenced this pull request Jun 4, 2024
CLI:
* Update OpenAPI spec
([#1466](#1466)).

Bundles:
* Upgrade TF provider to 1.46.0
([#1460](#1460)).
* Add support for Lakehouse monitoring
([#1307](#1307)).
* Make dbt-sql and default-sql templates public
([#1463](#1463)).

Internal:
* Abstract over filesystem interaction with libs/vfs
([#1452](#1452)).
* Add `filer.Filer` to read notebooks from WSFS without omitting their
extension ([#1457](#1457)).
* Fix listing notebooks in a subdirectory
([#1468](#1468)).

API Changes:
* Changed `databricks account storage-credentials list` command to
return .
 * Added `databricks consumer-listings batch-get` command.
 * Added `databricks consumer-providers batch-get` command.
 * Removed `databricks apps create-deployment` command.
 * Added `databricks apps deploy` command.

OpenAPI commit 37b925eba37dfb3d7e05b6ba2d458454ce62d3a0 (2024-06-03)

Dependency updates:
* Bump github.com/hashicorp/go-version from 1.6.0 to 1.7.0
([#1454](#1454)).
* Bump github.com/hashicorp/hc-install from 0.6.4 to 0.7.0
([#1453](#1453)).
github-merge-queue bot pushed a commit that referenced this pull request Jun 4, 2024
## Changes

This fixes a last-minute regression that snuck into
#1463: unfortunately we need to
use `USE IDENTIFIER('schema')` to select a schema for now. In the future
we expect we can just use `USE SCHEMA 'schema'`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants