---
name: Add a new dbt model
about: Request the addition of a new model to the dbt DAG.
title: Add a new dbt model
labels: ''
assignees: ''
---
(Replace or delete anything in parentheses with your own issue content.)
(Brief description of the task here.)
- Name: (What should the model be called? See Model naming for guidance.)
- Model type: (SQL or Python? See Model type (SQL or Python) for guidance.)
- Materialization: (Should the model be a table or a view? See Model materialization for guidance.)
- Tests:
  - (Add a bulleted list of tests here. See Model tests for guidance.)
- Description: (Provide a rich description for this model that will be displayed in documentation. Markdown is supported, and encouraged for more complex models. See Model description for guidance.)
(Use this checklist if the assignee already knows how to add a dbt model. Otherwise, delete it in favor of the long checklist in the following section.)
- Confirm that a subdirectory for this model's database exists in the `dbt/models/` directory, and if not, create one, add a new `schema.yml` file, and update `dbt_project.yml` to document the `+schema` attribute
- Define the SQL query or Python script that creates the model in the model subdirectory, following any existing file naming scheme
- Use `source()` and `ref()` to reference other models where possible
- [SQL models only] Optionally configure model materialization in the query file
- Update the `schema.yml` file in the subfolder of `dbt/models/` to point to the new model definition
- [Python models only] Configure any third-party pure Python packages
- Add tests to the model schema definition in `schema.yml`
- [SQL models only] If your model definition requires any new macros, make sure those macros are tested in `dbt/macros/tests/test_all.sql`
- Commit your changes to a branch and open a pull request
Complete the following checklist to add the model:
- Confirm that a subdirectory for this model's database exists in the `dbt/models/` directory (e.g. `dbt/models/default/` for the `default.new_model` model). If a subdirectory does not yet exist, create one, add a `schema.yml` file to the directory to store model properties, and update `dbt_project.yml` to document the new directory under the `models.athena` key with a `+schema` attribute.
```yaml
# Table example (only the model name would change for a view)
# schema.yml
models:
  - name: default.new_model
```
```diff
# View or table example
--- a/dbt/dbt_project.yml
+++ b/dbt/dbt_project.yml
 models:
   athena:
     +materialized: view
+    default:
+      +schema: default
     census:
       +schema: census
```
- Define the SQL query or Python script that creates the model in the appropriate subfolder of the `dbt/models/` directory. For example, if you're adding a view to the `default` schema, then the model definition file should live in `dbt/models/default`. The file should have the same name as the model that appears in Athena. A period in the model name should separate the entity name from the database namespace (e.g. `default.vw_pin_universe.sql`). All views should have a name prefixed with `vw_`.
```bash
# SQL view example
touch dbt/models/default/default.vw_new_model.sql

# SQL table example
touch dbt/models/proximity/proximity.new_model.sql

# Python model example
touch dbt/models/proximity/proximity.new_model.py
```
```sql
-- SQL view or table example
-- Either dbt/models/default/default.vw_new_model.sql
-- or dbt/models/default/default.new_model.sql
select pin10, year
from {{ source('raw', 'foobar') }}
join {{ ref('default.vw_pin_universe') }}
using (pin10, year)
```
```python
# Python model example
# dbt/models/default/default.new_model.py
import pandas as pd


def model(dbt, spark_session):
    raw_foobar = dbt.source("raw", "foobar")
    vw_pin_universe = dbt.ref("default.vw_pin_universe")
    result = pd.merge(raw_foobar, vw_pin_universe, on=["pin10", "year"])
    dbt.write(result[["pin10", "year"]])
```
- [SQL models only] Optionally configure model materialization. If the output of the query should be a view, no action is necessary, since all models in this repository materialize as views by default. If the output should be a table, with table data stored in S3, add a `config` block to the top of the query file to configure materialization.
```diff
# Table example
--- a/dbt/models/default/default.new_model.sql
+++ b/dbt/models/default/default.new_model.sql
+{{
+    config(
+        materialized='table',
+        partitioned_by=['year'],
+        bucketed_by=['pin10'],
+        bucket_count=1
+    )
+}}
 select pin10, year
 from {{ source('raw', 'foobar') }}
 join {{ ref('default.vw_pin_universe') }}
 using (pin10, year)
```
- Update the `schema.yml` file in the subfolder of `dbt/models/` where you created your model definition. Make sure to add descriptions for new entities (models, sources, columns, etc.). See Model description and Column descriptions for specific guidance on doc locations and using docs blocks.
```diff
# Table example (only the model name would change for a view)
--- a/dbt/models/default/schema.yml
+++ b/dbt/models/default/schema.yml
 models:
+  - name: default.new_model
+    description: New model
+    columns:
+      - name: pin10
+        description: 10-digit PIN
+      - name: year
+        description: Year
   - name: default.vw_pin_history
     description: PIN history
     data_tests:
```
- [Python models only] If you need any third-party pure Python packages that are not preinstalled in the Athena PySpark environment, follow the docs for configuring Python model dependencies.
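As a rough illustration of what a dependency declaration can look like: on some dbt adapters (e.g. Snowflake, Databricks), a Python model can declare third-party packages via the `packages` argument to `dbt.config`. This is a hedged sketch, not this repo's documented mechanism for Athena, and the `rapidfuzz` package is purely illustrative; follow the dependency docs linked above for the actual setup.

```python
# Hypothetical sketch only: declaring a third-party package from inside
# a Python model via dbt.config. Whether `packages` is honored depends
# on the adapter; confirm the Athena PySpark mechanism in the repo docs.
import pandas as pd


def model(dbt, spark_session):
    # Ask dbt to make the (illustrative) rapidfuzz package available
    dbt.config(materialized="table", packages=["rapidfuzz"])

    # Import inside the model body, once dbt has installed the package
    from rapidfuzz import fuzz  # noqa: F401

    raw_foobar = dbt.source("raw", "foobar")
    vw_pin_universe = dbt.ref("default.vw_pin_universe")
    result = pd.merge(raw_foobar, vw_pin_universe, on=["pin10", "year"])
    dbt.write(result[["pin10", "year"]])
```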
- Add tests to your new model definition in `schema.yml`.
```diff
# Table example (only the model name would change for a view)
--- a/dbt/models/default/schema.yml
+++ b/dbt/models/default/schema.yml
 models:
   - name: default.new_model
     description: New model
     columns:
       - name: pin10
         description: 10-digit PIN
       - name: year
         description: Year
+    data_tests:
+      - unique_combination_of_columns:
+          name: new_model_unique_by_pin_and_year
+          combination_of_columns:
+            - pin10
+            - year
   - name: default.vw_pin_history
     description: PIN history
     data_tests:
```
- [SQL models only] If your model definition requires any new macros, make sure those macros are tested in `dbt/macros/tests/test_all.sql`. If any tests need implementing, follow the pattern set by existing tests to implement them.
- Commit your changes to a branch and open a pull request to build your model and run tests in a CI environment.