Palm dbt

dbt plugin for Palm CLI

This plugin adds dbt-specific commands to Palm CLI.

Installing

Install this plugin along with palm:

pip install palm-dbt

Or install from source:

python3 -m pip install .

Configuring your project

To configure your project to use the palm-dbt plugin, you need a .palm/config.yaml file, which you can create by running palm init. Once you have your config file, add the palm-dbt plugin with the following configuration:

plugins:
  - dbt

Check dbt plugin version

To check the version of palm-dbt, run the following inside a project where palm is configured with the dbt plugin:

palm plugin versions

Palm-ing an existing dbt project

palm-dbt ships with a command to containerize and convert your existing dbt project.

For example, if you wanted to containerize your existing dbt project running on 0.21.0, you would run:

  palm containerize --version 0.21.0

Adding palm dbt macros

palm-dbt uses the git branch name to set the schema for all commands via env vars. This allows palm to clean up test data after each run, ensuring that your data warehouse stays clean and free of development/test data.

To enable this functionality, palm-dbt ships with two macros that handle schema naming and cleanup:

  • generate_schema_name - This macro overrides the dbt-core macro to auto-generate a schema name based on your current git branch and PALM_DBT_ENV.

  • drop_branch_schemas - This macro uses the branch-named schema and the TEST database to clean up any models generated by running dbt in development or test environments. Calls to this macro are baked into many of the palm dbt commands.

See "About the palm dbt branch naming macros" below for more information.

To install these macros, run palm install from within a project that is configured to use the palm-dbt plugin.
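The real drop_branch_schemas is a dbt macro written in Jinja SQL. The Python below is only a hypothetical sketch of the kind of cleanup it performs: it assumes the branch-prefixed schema names are already known and that the warehouse accepts DROP SCHEMA IF EXISTS <db>.<schema> CASCADE. The function name drop_statements is illustrative, not part of palm-dbt.

```python
from typing import List

# Hypothetical sketch only: the real drop_branch_schemas is a dbt (Jinja SQL) macro.
# Assumes the warehouse accepts DROP SCHEMA IF EXISTS <db>.<schema> CASCADE.


def drop_statements(database: str, branch_prefix: str, schemas: List[str]) -> List[str]:
    """Return DROP statements for branch-namespaced schemas only."""
    statements = []
    for schema in schemas:
        # Skip anything without the branch prefix, so un-namespaced
        # schemas such as "public" are never dropped.
        if schema.startswith(branch_prefix + "_"):
            statements.append(f"DROP SCHEMA IF EXISTS {database}.{schema} CASCADE")
    return statements


if __name__ == "__main__":
    for sql in drop_statements(
        "TEST",
        "feature_data_100_update_widget",
        ["feature_data_100_update_widget_public", "public"],
    ):
        print(sql)
```

Filtering on the branch prefix is a design choice in this sketch: it keeps a cleanup step from ever touching a schema that was not created for the current branch.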

Recommended (optional) protected branch configuration

To keep your runs idempotent, we recommend that you do not run palm-dbt commands against main, master, or any other production-like branch you may be using.

To prevent palm from running against specific branches, add the following config to your project's .palm/config.yaml:

protected_branches:
  - main
  - master
  # Any other branches you want to protect
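The check palm performs is not documented here; the following is a minimal sketch of what a protected-branch guard could look like, assuming .palm/config.yaml has already been parsed into a dict (parsing is left out to keep the example dependency-free). The helpers current_branch and is_protected are hypothetical names; only the git rev-parse --abbrev-ref HEAD command is real.

```python
import subprocess
from typing import Iterable


def current_branch() -> str:
    """Return the name of the checked-out git branch."""
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def is_protected(branch: str, protected_branches: Iterable[str]) -> bool:
    """True if the branch appears in the protected_branches list."""
    return branch in set(protected_branches)


if __name__ == "__main__":
    # Stand-in for the parsed .palm/config.yaml (hypothetical values).
    config = {"plugins": ["dbt"], "protected_branches": ["main", "master"]}
    branch = current_branch()
    if is_protected(branch, config.get("protected_branches", [])):
        raise SystemExit(f"Refusing to run palm-dbt on protected branch {branch!r}")
    print(f"{branch!r} is not protected; continuing")
```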

About the palm dbt branch naming macros

One of the most painful parts of data testing is shared mutable state. palm-dbt eliminates it by namespacing each dbt run. For git branches other than main or master, palm prefixes the calculated schema name with a formatted version of your branch name. In CI, the name is additionally prefixed with ci_. For example:

  • You open a branch named FEATURE/DATA-100/update-widget.
  • When you run palm run in your local environment, the schema public is built as feature_data_100_update_widget_public, and the schema sales is built as feature_data_100_update_widget_sales.
  • In CI, the schemas are ci_feature_data_100_update_widget_public and ci_feature_data_100_update_widget_sales, respectively.
  • In prod, the schemas are public and sales, respectively.

Refs will automatically update as well. This way, you can use a single test database and not worry about conflicts between developers, or between branches for the same developer (like during hotfixes).
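The naming rule in the example above can be approximated in a few lines of Python. This is a sketch reverse-engineered from that example, not the generate_schema_name macro itself (which is Jinja SQL): the normalization rule (lowercase, collapse non-alphanumeric runs into "_") and the PALM_DBT_ENV values "development", "ci" and "prod" are assumptions.

```python
import os
import re


def branch_prefix(branch: str) -> str:
    # Assumed rule: lowercase, then collapse each run of
    # non-alphanumeric characters into a single underscore.
    return re.sub(r"[^a-z0-9]+", "_", branch.lower()).strip("_")


def schema_name(custom_schema: str, branch: str, env: str) -> str:
    """Approximate the branch-namespaced schema name (hypothetical env values)."""
    if env == "prod" or branch in ("main", "master"):
        # Production and main/master keep the plain schema name.
        return custom_schema
    name = f"{branch_prefix(branch)}_{custom_schema}"
    if env == "ci":
        name = f"ci_{name}"
    return name


if __name__ == "__main__":
    env = os.environ.get("PALM_DBT_ENV", "development")
    print(schema_name("public", "FEATURE/DATA-100/update-widget", env))
```

With PALM_DBT_ENV unset, this prints feature_data_100_update_widget_public, matching the local-environment case in the list above.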

palm-dbt and dbt deps

In palm-dbt we have determined that running dbt deps before every command is problematic for a few reasons:

  1. It takes time, slowing down development, CI, and every production run.
  2. If dbt Hub or GitHub has an outage, our dbt commands fail and remain broken until the upstream issue is resolved.
  3. If you forget to run dbt deps, the resulting error messages can be quite confusing.

To solve these problems, we have decided that running dbt clean && dbt deps should happen in the Dockerfile, when the image is being built.

To support this decision your project must do the following:

  1. Include RUN dbt clean && dbt deps in the Dockerfile.
  2. Include a volume entry for the packages directory (dbt_modules) in docker-compose.yaml, such as - /app/{{packages_dir}}. This prevents the .:/app volume from overwriting the deps installed when the image was built.

If you use palm containerize, this is done for you!

If you need to change your deps, run palm build to rebuild the image, which installs the updated deps.

Typical palm-dbt workflow

From a non-protected branch, running palm run will:

  1. drop (if it exists) the namespaced schema in development
  2. create the namespaced schema in development
  3. seed and run
  4. drop the namespaced schema in development

Why drop it? So your testing is atomic.

Want to persist it? Use the --persist flag.
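The order of steps can be summarized as a plan. This hypothetical Python sketch only lists the steps palm run is described as taking; it does not execute anything, and the function name plan_palm_run and the step wording are illustrative. dbt seed and dbt run are real dbt commands, but how palm actually drops and creates schemas is not shown here.

```python
from typing import List


def plan_palm_run(schema: str, persist: bool = False) -> List[str]:
    """List the steps of a palm run on a non-protected branch (sketch only)."""
    steps = [
        f"drop schema {schema} if it exists",  # start from a clean slate
        f"create schema {schema}",
        "dbt seed",
        "dbt run",
    ]
    if not persist:
        # Default: tear the namespaced schema down so each run is atomic.
        steps.append(f"drop schema {schema}")
    return steps


if __name__ == "__main__":
    for step in plan_palm_run("feature_data_100_update_widget_public"):
        print(step)
```

Passing persist=True mirrors the --persist flag: the final drop is skipped so you can inspect the built models.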