diff --git a/website/docs/docs/build/unit-tests.md b/website/docs/docs/build/unit-tests.md new file mode 100644 index 00000000000..71c464cbd40 --- /dev/null +++ b/website/docs/docs/build/unit-tests.md @@ -0,0 +1,258 @@ +--- +title: "Unit tests" +sidebar_label: "Unit tests" +description: "Learn how to use unit tests on your SQL models." +search_weight: "heavy" +id: "unit-tests" +keywords: + - unit test, unit tests, unit testing, dag +--- +:::note closed beta + +Unit testing is currently in closed beta for dbt Cloud accounts that have updated to a [versionless environment](/docs/dbt-versions/upgrade-core-in-cloud). + +It is available now as an alpha feature for dbt Core v1.8 users. + +::: + +Historically, dbt's test coverage was confined to [“data” tests](/docs/build/data-tests), which assess the quality of input data or the structure of the resulting datasets. However, these tests could only be executed _after_ building a model. + +Now, we are introducing a new type of test to dbt: unit tests. In software programming, unit tests validate small portions of your functional code, and they work much the same way here. Unit tests allow you to validate your SQL modeling logic on a small set of static inputs _before_ you materialize your full model in production. Unit tests enable test-driven development, benefiting developer efficiency and code reliability. + +## Before you begin + +- We currently only support unit testing SQL models. +- We currently only support adding unit tests to models in your _current_ project. +- If your model has multiple versions, by default the unit test will run on *all* versions of your model. Read [unit testing versioned models](#unit-testing-versioned-models) for more information. + +Read the [reference doc](/reference/resource-properties/unit-tests) for more details about formatting your unit tests. 
+ +### When to add a unit test to your model + +You should unit test a model: +- When your SQL contains complex logic: + - Regex + - Date math + - Window functions + - `case when` statements when there are many `when`s + - Truncation + - Recursion +- When you're writing custom logic to process input data, similar to creating a function. +- We don't recommend conducting unit testing for functions like `min()` since these functions are tested extensively by the warehouse. If an unexpected issue arises, it's more likely a result of issues in the underlying data rather than the function itself. Therefore, fixture data in the unit test won't provide valuable information. +- Logic for which you had bugs reported before. +- Edge cases not yet seen in your actual data that you want to handle. +- Prior to refactoring the transformation logic (especially if the refactor is significant). +- Models with high "criticality" (public, contracted models or models directly upstream of an exposure). + +## Unit testing a model + +This example creates a new `dim_customers` model with a field `is_valid_email_address` that calculates whether or not the customer’s email is valid: + + + +```sql +with customers as ( + + select * from {{ ref('stg_customers') }} + +), + +accepted_email_domains as ( + + select * from {{ ref('top_level_email_domains') }} + +), + +check_valid_emails as ( + + select + customers.customer_id, + customers.first_name, + customers.last_name, + coalesce (regexp_like( + customers.email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$' + ) + = true + and accepted_email_domains.tld is not null, + false) as is_valid_email_address + from customers + left join accepted_email_domains + on customers.email_top_level_domain = lower(accepted_email_domains.tld) + +) + +select * from check_valid_emails +``` + + +The logic posed in this example can be challenging to validate. 
You can add a unit test to this model to ensure the `is_valid_email_address` logic captures all known edge cases: emails without `.`, emails without `@`, and emails from invalid domains. + + + +```yaml +unit_tests: + - name: test_is_valid_email_address + description: "Check my is_valid_email_address logic captures all known edge cases - emails without ., emails without @, and emails from invalid domains." + model: dim_customers + given: + - input: ref('stg_customers') + rows: + - {customer_id: 1, email: cool@example.com, email_top_level_domain: example.com} + - {customer_id: 2, email: cool@unknown.com, email_top_level_domain: unknown.com} + - {customer_id: 3, email: badgmail.com, email_top_level_domain: gmail.com} + - {customer_id: 4, email: missingdot@gmailcom, email_top_level_domain: gmail.com} + - input: ref('top_level_email_domains') + rows: + - {tld: example.com} + - {tld: gmail.com} + expect: + rows: + - {customer_id: 1, is_valid_email_address: true} + - {customer_id: 2, is_valid_email_address: false} + - {customer_id: 3, is_valid_email_address: false} + - {customer_id: 4, is_valid_email_address: false} + +``` + + +The previous example defines the mock data using the inline `dict` format, but you can also use `csv` either inline or in a separate fixture file. + +You only have to define the mock data for the columns you care about. This enables you to write succinct and _specific_ unit tests. + +:::note + +The direct parents of the model that you’re unit testing (in this example, `stg_customers` and `top_level_email_domains`) need to exist in the warehouse before you can execute the unit test. + +Use the `--empty` flag to build an empty version of the models to save warehouse spend. + +```bash + +dbt run --select "stg_customers top_level_email_domains" --empty + +``` + +Alternatively, use `dbt build` to, in lineage order: + +- Run the unit tests on your model. +- Materialize your model in the warehouse. +- Run the data tests on your model. 
+ +::: + +Now you’re ready to run this unit test. You have a couple of options for commands depending on how specific you want to be: + +- `dbt test --select dim_customers` runs _all_ of the tests on `dim_customers`. +- `dbt test --select "dim_customers,test_type:unit"` runs all of the _unit_ tests on `dim_customers`. +- `dbt test --select test_is_valid_email_address` runs the test named `test_is_valid_email_address`. + +```shell + +dbt test --select test_is_valid_email_address +16:03:49 Running with dbt=1.8.0-a1 +16:03:49 Registered adapter: postgres=1.8.0-a1 +16:03:50 Found 6 models, 5 seeds, 4 data tests, 0 sources, 0 exposures, 0 metrics, 410 macros, 0 groups, 0 semantic models, 1 unit test +16:03:50 +16:03:50 Concurrency: 5 threads (target='postgres') +16:03:50 +16:03:50 1 of 1 START unit_test dim_customers::test_is_valid_email_address ................... [RUN] +16:03:51 1 of 1 FAIL 1 dim_customers::test_is_valid_email_address ............................ [FAIL 1 in 0.26s] +16:03:51 +16:03:51 Finished running 1 unit_test in 0 hours 0 minutes and 0.67 seconds (0.67s). +16:03:51 +16:03:51 Completed with 1 error and 0 warnings: +16:03:51 +16:03:51 Failure in unit_test test_is_valid_email_address (models/marts/unit_tests.yml) +16:03:51 + +actual differs from expected: + +@@ ,customer_id,is_valid_email_address +→ ,1 ,True→False + ,2 ,False +...,... ,... + + +16:03:51 +16:03:51 compiled Code at models/marts/unit_tests.yml +16:03:51 +16:03:51 Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1 + +``` + +The clever regex statement wasn’t as clever as initially thought, as the model incorrectly flagged `cool@example.com` (customer 1's email) as an invalid email address. 
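Why the first pattern rejected a well-formed address can be sketched outside the warehouse. The following is a minimal illustration in Python, under the assumption that the regex engine received both backslashes from the SQL literal. How a warehouse treats `\\` inside a string literal varies by dialect and settings, and Python's `re` module only stands in for the warehouse's regex function here. Under that assumption, `\\.` means "a literal backslash followed by any character", so no ordinary email address can match. The helper `is_match` and the variable names are illustrative and not part of dbt.

```python
import re

# Assumption: the regex engine sees both backslashes, so "\\." reads as
# "an escaped (literal) backslash, then any single character".
double_escaped = r'^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$'

# With a single backslash, "\." reads as "a literal dot".
single_escaped = r'^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'

# Three of the emails from the unit test fixture above.
emails = ["cool@example.com", "badgmail.com", "missingdot@gmailcom"]


def is_match(pattern, email):
    """Return True if the whole email matches the pattern (hypothetical helper)."""
    return re.match(pattern, email) is not None


# Map each email to (result with double backslash, result with single backslash).
results = {
    email: (is_match(double_escaped, email), is_match(single_escaped, email))
    for email in emails
}

for email, (double, single) in results.items():
    print(f"{email}: double-escaped={double}, single-escaped={single}")
```

In this sketch, only the single-backslash pattern accepts `cool@example.com`; the two malformed addresses are rejected by both patterns.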
+ +Updating the regex logic to `'^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'` (those pesky escape characters) and rerunning the unit test solves the problem: + +```shell + +dbt test --select test_is_valid_email_address +16:09:11 Running with dbt=1.8.0-a1 +16:09:12 Registered adapter: postgres=1.8.0-a1 +16:09:12 Found 6 models, 5 seeds, 4 data tests, 0 sources, 0 exposures, 0 metrics, 410 macros, 0 groups, 0 semantic models, 1 unit test +16:09:12 +16:09:13 Concurrency: 5 threads (target='postgres') +16:09:13 +16:09:13 1 of 1 START unit_test dim_customers::test_is_valid_email_address ................... [RUN] +16:09:13 1 of 1 PASS dim_customers::test_is_valid_email_address .............................. [PASS in 0.26s] +16:09:13 +16:09:13 Finished running 1 unit_test in 0 hours 0 minutes and 0.75 seconds (0.75s). +16:09:13 +16:09:13 Completed successfully +16:09:13 +16:09:13 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1 + +``` + +Your model is now ready for production! Adding this unit test helped catch an issue with the SQL logic _before_ you materialized `dim_customers` in your warehouse and will better ensure the reliability of this model in the future. + +## Unit testing versioned models + +When a unit test is added to a model, it will run on all versions of the model by default. +Using the example in this article, if you have versions 1, 2, and 3 of `dim_customers`, the `test_is_valid_email_address` unit test will run on all 3 versions. + +To unit test only a specific version (or versions) of a model, include the desired version(s) in the unit test's `versions` config: + +```yml + +unit_tests: + - name: test_is_valid_email_address + model: dim_customers + versions: + include: + - 2 + ... + +``` + +In this scenario, if you have versions 1, 2, and 3 of `dim_customers`, your `test_is_valid_email_address` unit test will run on _only_ version 2. 
+ +To unit test all versions except a specific version (or versions) of a model, you can exclude the relevant version(s) in the model config: + +```yml + +unit_tests: + - name: test_is_valid_email_address + model: dim_customers + versions: + exclude: + - 1 + ... + +``` +So, if you have versions 1, 2, and 3 of `dim_customers`, your `test_is_valid_email_address` unit test will run on _only_ versions 2 and 3. + +If you want to unit test a model that references the pinned version of the model, you should specify that in the `ref` of your input: + +```yml + +unit_tests: + - name: test_is_valid_email_address + model: dim_customers + given: + - input: ref('stg_customers', v=1) + ... + +``` + + + diff --git a/website/docs/reference/commands/test.md b/website/docs/reference/commands/test.md index 373ad9b6db3..cad61a05ac5 100644 --- a/website/docs/reference/commands/test.md +++ b/website/docs/reference/commands/test.md @@ -3,6 +3,7 @@ title: "About dbt test command" sidebar_label: "test" id: "test" --- + `dbt test` runs tests defined on models, sources, snapshots, and seeds. It expects that you have already created those resources through the appropriate commands. @@ -29,3 +30,47 @@ dbt test --select "one_specific_model,test_type:generic" ``` For more information on writing tests, see the [Testing Documentation](/docs/build/data-tests). + + + + + +`dbt test` runs data tests defined on models, sources, snapshots, and seeds and unit tests defined on SQL models. It expects that you have already created those resources through the appropriate commands. + +The tests to run can be selected using the `--select` flag discussed [here](/reference/node-selection/syntax). 
+ +```bash +# run data and unit tests +dbt test + +# run only data tests +dbt test --select test_type:data + +# run only unit tests +dbt test --select test_type:unit + +# run tests for one_specific_model +dbt test --select "one_specific_model" + +# run tests for all models in package +dbt test --select "some_package.*" + +# run only data tests defined singularly +dbt test --select "test_type:singular" + +# run only data tests defined generically +dbt test --select "test_type:generic" + +# run data tests limited to one_specific_model +dbt test --select "one_specific_model,test_type:data" + +# run unit tests limited to one_specific_model +dbt test --select "one_specific_model,test_type:unit" +``` + +For more information on writing tests, read the [data testing](/docs/build/data-tests) and [unit testing](/docs/build/unit-tests) documentation. + + + + + diff --git a/website/docs/reference/resource-configs/store_failures_as.md b/website/docs/reference/resource-configs/store_failures_as.md index dd61030afb8..005193a5381 100644 --- a/website/docs/reference/resource-configs/store_failures_as.md +++ b/website/docs/reference/resource-configs/store_failures_as.md @@ -17,7 +17,7 @@ You can configure it in all the same places as `store_failures`, including singu #### Singular test -[Singular test](https://docs.getdbt.com/docs/build/tests#singular-data-tests) in `tests/singular/check_something.sql` file +[Singular test](https://docs.getdbt.com/docs/build/data-tests#singular-data-tests) in `tests/singular/check_something.sql` file ```sql {{ config(store_failures_as="table") }} @@ -29,7 +29,7 @@ where 1=0 #### Generic test -[Generic tests](https://docs.getdbt.com/docs/build/tests#generic-data-tests) in `models/_models.yml` file +[Generic tests](https://docs.getdbt.com/docs/build/data-tests#generic-data-tests) in `models/_models.yml` file ```yaml models: diff --git a/website/docs/reference/resource-properties/unit-tests.md b/website/docs/reference/resource-properties/unit-tests.md 
new file mode 100644 index 00000000000..40c3414e373 --- /dev/null +++ b/website/docs/reference/resource-properties/unit-tests.md @@ -0,0 +1,204 @@ +--- +title: "About unit tests property" +sidebar_label: "Unit tests" +resource_types: [models] +datatype: test +--- + + +Unit tests validate your SQL modeling logic on a small set of static inputs before you materialize your full model in production. They support a test-driven development approach, improving both the efficiency of developers and reliability of code. + +To run only your unit tests, use the command: +`dbt test --select test_type:unit` + + + +```yml + +unit_tests: + - name: # this is the unique name of the test + model: + versions: #optional + include: #optional + exclude: #optional + config: + meta: {dictionary} + tags: | [] + given: + - input: # optional for seeds + format: dict | csv + # if format csv, either define dictionary of rows or name of fixture + rows: + - {dictionary} + fixture: + - input: ... # declare additional inputs + expect: + format: dict | csv + # if format csv, either define dictionary of rows or name of fixture + rows: + - {dictionary} + fixture: + overrides: # optional: configuration for the dbt execution environment + macros: + is_incremental: true | false + dbt_utils.current_timestamp: str + # ... any other jinja function from https://docs.getdbt.com/reference/dbt-jinja-functions + # ... any other context property + vars: {dictionary} + env_vars: {dictionary} + - name: ... # declare additional unit tests + + ``` + + + + +## About writing unit tests + +Unit tests are currently limited to testing SQL models and only models in your current project. + +### Versions +If your model has multiple versions, the unit test will run on *all* versions of your model by default. 
To unit test only specific version(s) of your model, list them under `include` or `exclude` in the unit test's `versions` config: + +```yaml + +# my test_is_valid_email_address unit test will run on all versions of my_model +unit_tests: + - name: test_is_valid_email_address + model: my_model + ... + +# my test_is_valid_email_address unit test will run on ONLY version 2 of my_model +unit_tests: + - name: test_is_valid_email_address + model: my_model + versions: + include: + - 2 + ... + +# my test_is_valid_email_address unit test will run on all versions EXCEPT 1 of my_model +unit_tests: + - name: test_is_valid_email_address + model: my_model + versions: + exclude: + - 1 + ... + +``` + +### Format + +When using `format: dict`, you must supply an inline dictionary for `rows:` (`dict` is the default if you don’t specify a `format`): + +```yml + +unit_tests: + - name: test_my_model + model: my_model + given: + - input: ref('my_model_a') + format: dict + rows: + - {id: 1, name: gerda} + - {id: 2, name: michelle} + ... +``` + +When using `format: csv`, you can supply either: + - An inline csv string for `rows:` + + ```yaml + unit_tests: + - name: test_my_model + model: my_model + given: + - input: ref('my_model_a') + format: csv + rows: | + id,name + 1,gerda + 2,michelle + ... + ``` + + + - The name of a csv file in the `tests/fixtures` directory in your project (or the directory configured for [test-paths](https://docs.getdbt.com/reference/project-configs/test-paths)) for `fixture`: + + ```yaml + unit_tests: + - name: test_my_model + model: my_model + given: + - input: ref('my_model_a') + format: csv + fixture: my_model_a_fixture + ... 
+ ``` + + ```csv + # tests/fixtures/my_model_a_fixture.csv + id,name + 1,gerda + 2,michelle + ``` + +### Input + +- `input:` is a string that represents a `ref` or `source` call: + - `ref('my_model')` or `ref('my_model', v='2')` or `ref('dougs_project', 'users')` + - `source('source_schema', 'source_name')` +- `input:` is optional for seeds: + - If you don’t supply an input for a seed, we will use the seed *as* the input. + - If you do supply an input for a seed, we will use that input instead. +- You can also have “empty” inputs by setting rows to an empty list: `rows: []`. + +## Examples +```yml + +unit_tests: + - name: test_is_valid_email_address # this is the unique name of the test + model: dim_customers # name of the model I'm unit testing + given: # the mock data for your inputs + - input: ref('stg_customers') + rows: + - {customer_id: 1, email: cool@example.com, email_top_level_domain: example.com} + - {customer_id: 2, email: cool@unknown.com, email_top_level_domain: unknown.com} + - {customer_id: 3, email: badgmail.com, email_top_level_domain: gmail.com} + - {customer_id: 4, email: missingdot@gmailcom, email_top_level_domain: gmail.com} + - input: ref('top_level_email_domains') + rows: + - {tld: example.com} + - {tld: gmail.com} + expect: # the expected output given the inputs above + rows: + - {customer_id: 1, is_valid_email_address: true} + - {customer_id: 2, is_valid_email_address: false} + - {customer_id: 3, is_valid_email_address: false} + - {customer_id: 4, is_valid_email_address: false} + +``` + +```yml + +unit_tests: + - name: test_is_valid_email_address # this is the unique name of the test + model: dim_customers # name of the model I'm unit testing + given: # the mock data for your inputs + - input: ref('stg_customers') + rows: + - {customer_id: 1, email: cool@example.com, email_top_level_domain: example.com} + - {customer_id: 2, email: cool@unknown.com, email_top_level_domain: unknown.com} + - {customer_id: 3, email: badgmail.com, email_top_level_domain: 
gmail.com} + - {customer_id: 4, email: missingdot@gmailcom, email_top_level_domain: gmail.com} + - input: ref('top_level_email_domains') + format: csv + rows: | + tld + example.com + gmail.com + expect: # the expected output given the inputs above + format: csv + fixture: valid_email_address_fixture_output + +``` diff --git a/website/sidebars.js b/website/sidebars.js index c29167545fe..9c67619bf90 100644 --- a/website/sidebars.js +++ b/website/sidebars.js @@ -277,9 +277,17 @@ const sidebarSettings = { "docs/build/python-models", ], }, + { + type: "category", + label: "Tests", + link: { type: "doc", id: "docs/build/data-tests" }, + items: [ + "docs/build/data-tests", + "docs/build/unit-tests", + ], + }, "docs/build/snapshots", "docs/build/seeds", - "docs/build/data-tests", "docs/build/jinja-macros", "docs/build/sources", "docs/build/exposures", @@ -811,7 +819,7 @@ const sidebarSettings = { }, { type: "category", - label: "For tests", + label: "For data tests", items: [ "reference/data-test-configs", "reference/resource-configs/fail_calc", @@ -822,6 +830,13 @@ const sidebarSettings = { "reference/resource-configs/where", ], }, + { + type: "category", + label: "For unit tests", + items: [ + "reference/resource-properties/unit-tests", + ], + }, { type: "category", label: "For sources",