Fix expect_row_values_to_have_data_for_every_n_datepart errors when both start and end dates are set #115

Merged: 6 commits merged on Oct 8, 2021


jeremyyeo
Contributor

Hey @clausherther, just implementing @barberscott's fix for #113 here.


Tested with the following:

model / schema

```sql
-- dim_dates.sql
SELECT *
  FROM (VALUES ('2020-01-01'::DATE), ('2020-01-02'::DATE), ('2020-01-03'::DATE), ('2020-01-04'::DATE))
    AS my_table(date_at)
```

```yaml
# dim_dates.yml
version: 2
models:
  - name: dim_dates
    columns:
      - name: date_at
        tests:
          - accepted_values:
              values: ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]
    tests:
      - dbt_expectations.expect_row_values_to_have_data_for_every_n_datepart:
          # Expect this to PASS.
          date_col: date_at
          date_part: day
          test_start_date: "2020-01-01"
          test_end_date: "2020-01-04"
      - dbt_expectations.expect_row_values_to_have_data_for_every_n_datepart:
          # Expect this to PASS.
          date_col: date_at
          date_part: day
          test_end_date: "2020-01-04"
      - dbt_expectations.expect_row_values_to_have_data_for_every_n_datepart:
          # Expect this to FAIL.
          date_col: date_at
          date_part: day
          test_start_date: "2019-12-01"
      - dbt_expectations.expect_row_values_to_have_data_for_every_n_datepart:
          # Expect this to FAIL.
          date_col: date_at
          date_part: day
          test_end_date: "2020-12-01"
      - dbt_expectations.expect_row_values_to_have_data_for_every_n_datepart:
          # Expect this to FAIL.
          date_col: date_at
          date_part: day
          test_start_date: "2020-01-01"
          test_end_date: "2020-01-06"
```
dbt test -m dim_dates output

```
Running with dbt=0.20.2
Found 4 models, 14 tests, 0 snapshots, 0 analyses, 540 macros, 0 operations, 0 seed files, 0 sources, 0 exposures

15:08:39 | Concurrency: 1 threads (target='dev')
15:08:39 | 
15:08:39 | 1 of 6 START test accepted_values_dim_dates_date_at__2020_01_01__2020_01_02__2020_01_03__2020_01_04 [RUN]
15:08:42 | 1 of 6 PASS accepted_values_dim_dates_date_at__2020_01_01__2020_01_02__2020_01_03__2020_01_04 [PASS in 3.26s]
15:08:42 | 2 of 6 START test dbt_expectations_expect_row_values_to_have_data_for_every_n_datepart_dim_dates_date_at__day__2019_12_01 [RUN]
15:08:47 | 2 of 6 FAIL 31 dbt_expectations_expect_row_values_to_have_data_for_every_n_datepart_dim_dates_date_at__day__2019_12_01 [FAIL 31 in 4.69s]
15:08:47 | 3 of 6 START test dbt_expectations_expect_row_values_to_have_data_for_every_n_datepart_dim_dates_date_at__day__2020_01_04 [RUN]
15:08:51 | 3 of 6 PASS dbt_expectations_expect_row_values_to_have_data_for_every_n_datepart_dim_dates_date_at__day__2020_01_04 [PASS in 4.35s]
15:08:51 | 4 of 6 START test dbt_expectations_expect_row_values_to_have_data_for_every_n_datepart_dim_dates_date_at__day__2020_01_04__2020_01_01 [RUN]
15:08:55 | 4 of 6 PASS dbt_expectations_expect_row_values_to_have_data_for_every_n_datepart_dim_dates_date_at__day__2020_01_04__2020_01_01 [PASS in 3.89s]
15:08:55 | 5 of 6 START test dbt_expectations_expect_row_values_to_have_data_for_every_n_datepart_dim_dates_date_at__day__2020_01_06__2020_01_01 [RUN]
15:08:59 | 5 of 6 FAIL 1 dbt_expectations_expect_row_values_to_have_data_for_every_n_datepart_dim_dates_date_at__day__2020_01_06__2020_01_01 [FAIL 1 in 3.82s]
15:08:59 | 6 of 6 START test dbt_expectations_expect_row_values_to_have_data_for_every_n_datepart_dim_dates_date_at__day__2020_12_01 [RUN]
15:09:04 | 6 of 6 FAIL 331 dbt_expectations_expect_row_values_to_have_data_for_every_n_datepart_dim_dates_date_at__day__2020_12_01 [FAIL 331 in 4.60s]
15:09:04 | 
15:09:04 | Finished running 6 tests in 30.05s.

Completed with 3 errors and 0 warnings:

Failure in test dbt_expectations_expect_row_values_to_have_data_for_every_n_datepart_dim_dates_date_at__day__2019_12_01 (models/testing_dbt_expectations/dim_dates.yml)
  Got 31 results, configured to fail if != 0

  compiled SQL at target/compiled/snowflake/models/testing_dbt_expectations/dim_dates.yml/schema_test/dbt_expectations_expect_row_va_983cdf96730122e510782406c066c373.sql

Failure in test dbt_expectations_expect_row_values_to_have_data_for_every_n_datepart_dim_dates_date_at__day__2020_01_06__2020_01_01 (models/testing_dbt_expectations/dim_dates.yml)
  Got 1 result, configured to fail if != 0

  compiled SQL at target/compiled/snowflake/models/testing_dbt_expectations/dim_dates.yml/schema_test/dbt_expectations_expect_row_va_bdb191a0149f9b0d3c08c5c8db6ab40d.sql

Failure in test dbt_expectations_expect_row_values_to_have_data_for_every_n_datepart_dim_dates_date_at__day__2020_12_01 (models/testing_dbt_expectations/dim_dates.yml)
  Got 331 results, configured to fail if != 0

  compiled SQL at target/compiled/snowflake/models/testing_dbt_expectations/dim_dates.yml/schema_test/dbt_expectations_expect_row_va_4cd32499e35de4d43a0427496f6139d9.sql

Done. PASS=3 WARN=0 ERROR=3 SKIP=0 TOTAL=6
```
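The failure counts in the log can be checked by hand. The Python sketch below rests on two assumptions:

* The test builds a daily date spine from the resolved start date up to, but not including, the resolved end date.
* It reports one failing row for each spine day with no matching `date_at` value.

The end-exclusive spine is inferred from the reported counts (31, 1 and 331). This PR does not state it. `count_missing_days` is a hypothetical helper, not part of dbt_expectations. Bounds not set in the YAML are filled in by hand with the `min`/`max` of `date_at` (2020-01-01 and 2020-01-04).

```python
from __future__ import annotations

from datetime import date, timedelta

# Dates present in dim_dates (the four VALUES rows in dim_dates.sql).
MODEL_DATES = {date(2020, 1, day) for day in range(1, 5)}


def count_missing_days(start: date, end: date, present: set[date]) -> int:
    """Count days in [start, end) that have no row in `present`.

    Hypothetical helper: assumes an end-exclusive daily spine, where each
    missing day becomes one failing row of the test.
    """
    missing = 0
    day = start
    while day < end:
        if day not in present:
            missing += 1
        day += timedelta(days=1)
    return missing


# (description, resolved start, resolved end). An unset bound falls back to
# min/max(date_at), i.e. 2020-01-01 or 2020-01-04.
CASES = [
    ("start 2020-01-01, end 2020-01-04", date(2020, 1, 1), date(2020, 1, 4)),
    ("end 2020-01-04 only", date(2020, 1, 1), date(2020, 1, 4)),
    ("start 2019-12-01 only", date(2019, 12, 1), date(2020, 1, 4)),
    ("end 2020-12-01 only", date(2020, 1, 1), date(2020, 12, 1)),
    ("start 2020-01-01, end 2020-01-06", date(2020, 1, 1), date(2020, 1, 6)),
]

if __name__ == "__main__":
    for description, start, end in CASES:
        print(f"{description}: {count_missing_days(start, end, MODEL_DATES)} missing")
```

Running the script gives 0, 0, 31, 331 and 1 missing days. This lines up with the PASS, PASS, FAIL 31, FAIL 331 and FAIL 1 results in the log. The 331 comes from 2020 being a leap year: 335 days from 2020-01-01 up to 2020-12-01, minus the 4 days with data.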

clausherther self-requested a review on October 8, 2021 15:50

clausherther left a comment


Thanks! Hoping to cut a release this weekend, pending a related PR.

```diff
@@ -914,7 +914,13 @@ tests:
 
 Expects model to have values for every grouped `date_part`.
 
-For example, this tests whether a model has data for every `day` (grouped on `date_col`) from either a specified `start_date` and `end_date`, or for the `min`/`max` value of the specified `date_col`.
+For example, this tests whether a model has data for every `day` (grouped on `date_col`) between either:
```

@jeremyyeo fyi, just updated the README a bit. Thanks for adding the extra documentation!

```diff
@@ -5,7 +5,7 @@
 name: 'dbt_expectations'
 version: '0.4.0'
 
-require-dbt-version: [">=0.20.0", "<0.21.0"]
+require-dbt-version: [">=0.20.0", "<0.22.0"]
```

FYI, rebased to support 0.21
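The rebase widens the upper bound so that dbt 0.21.x is accepted. As we understand dbt's `require-dbt-version`, a list of constraints describes a range, and the running dbt version must satisfy every entry. The Python sketch below checks that reading with plain numeric tuple comparison. `satisfies` and `parse_version` are hypothetical helpers. The parser only accepts plain `X.Y.Z` strings, so pre-release tags are out of scope here.

```python
from __future__ import annotations

import operator

# Supported comparison operators, two-character symbols first so ">=" wins over ">".
OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a plain 'X.Y.Z' string into a comparable tuple, e.g. (0, 21, 0)."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version: str, constraints: list[str]) -> bool:
    """Return True only if `version` meets every constraint in the list."""
    current = parse_version(version)
    for constraint in constraints:
        for symbol, compare in OPERATORS:
            if constraint.startswith(symbol):
                if not compare(current, parse_version(constraint[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported constraint: {constraint!r}")
    return True


OLD_RANGE = [">=0.20.0", "<0.21.0"]
NEW_RANGE = [">=0.20.0", "<0.22.0"]

if __name__ == "__main__":
    for version in ("0.20.2", "0.21.0", "0.22.0"):
        print(version, satisfies(version, OLD_RANGE), satisfies(version, NEW_RANGE))
```

Under this reading, 0.20.2 passes both ranges and 0.21.0 passes only the new one. 0.22.0 is still rejected.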

```jinja
{%- set dr = run_query(sql) -%}
{%- set db_start_date = dr.columns[0].values()[0].strftime('%Y-%m-%d') -%}
{%- set db_end_date = dr.columns[1].values()[0].strftime('%Y-%m-%d') -%}
{% endif %}
```

👍
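The Jinja excerpt above closes an `{% if %}` block that runs the `min`/`max` query. The README describes the intended behaviour: use the explicit `test_start_date`/`test_end_date` when given, otherwise the `min`/`max` of `date_col`. The Python sketch below illustrates that control flow under this assumption; it is not the macro's actual code. `resolve_date_range` and `query_min_max` are hypothetical names, with `query_min_max` standing in for `run_query`. The sketch shows that when both dates are set, no query result is needed at all.

```python
from datetime import date
from typing import Callable, Optional, Tuple


def resolve_date_range(
    test_start_date: Optional[str],
    test_end_date: Optional[str],
    query_min_max: Callable[[], Tuple[date, date]],
) -> Tuple[str, str]:
    """Return the (start, end) dates to test as 'YYYY-MM-DD' strings.

    Hypothetical helper: the database is only queried when at least one bound
    is missing, so setting both dates never touches a query result.
    """
    db_start_date: Optional[str] = None
    db_end_date: Optional[str] = None
    if not test_start_date or not test_end_date:
        db_min, db_max = query_min_max()
        # Same formatting as the Jinja excerpt's strftime('%Y-%m-%d') calls.
        db_start_date = db_min.strftime("%Y-%m-%d")
        db_end_date = db_max.strftime("%Y-%m-%d")
    start = test_start_date if test_start_date else db_start_date
    end = test_end_date if test_end_date else db_end_date
    return start, end
```

For example, `resolve_date_range("2019-12-01", None, lambda: (date(2020, 1, 1), date(2020, 1, 4)))` keeps the explicit start. It fills the end from the queried maximum, returning `("2019-12-01", "2020-01-04")`.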

clausherther merged commit b3c5231 into calogica:main on Oct 8, 2021
clausherther added a commit that referenced this pull request Nov 10, 2021
* add `interval` argument

for checking presence every n-date_parts instead of every date_part

* update docs for explaining new `interval` arg

[expect_row_values_to_have_data_for_every_n_datepart](https://github.com/calogica/dbt-expectations/tree/0.4.2#expect_row_values_to_have_data_for_every_n_datepart)

* expect_table_columns_to_match_ordered_list: refactor row_number to use loop.index (#112)

* Fixes #111 - refactor row_number to use loop.index

* Update CHANGELOG

* handle data types for `mod` and incorporate windowing

This test will handle the mod function, which only takes integer arguments, more stably. It also aggregates row counts across intervals when joining on the date spine to correctly detect data presence in the target model

update conditions based on interval

update styling

update styling

* Add support for dbt 0.21 (#116)

* Update README.md

* remove unintentional styling changes

* Fix expect_row_values_to_have_data_for_every_n_datepart errors when both start and end dates are set (#115)

* fix none when both test dates are set

* Add support for dbt 0.21 (#116)

* Update README.md

* fix none when both test dates are set

* Update README

Co-authored-by: Claus Herther <claus@calogica.com>

* add `interval` argument

for checking presence every n-date_parts instead of every date_part

* update docs for explaining new `interval` arg

[expect_row_values_to_have_data_for_every_n_datepart](https://github.com/calogica/dbt-expectations/tree/0.4.2#expect_row_values_to_have_data_for_every_n_datepart)

* handle data types for `mod` and incorporate windowing

This test will handle the mod function, which only takes integer arguments, more stably. It also aggregates row counts across intervals when joining on the date spine to correctly detect data presence in the target model

update conditions based on interval

update styling

update styling

* remove unintentional styling changes

* Change datepart param and fix formatting

* Reformat join to match prior style

* simplify tie-out of model data to spine with interval truncation

the condition added to the model_data CTE is meant to emulate (kind of) Snowflake's [`TIME_SLICE`](https://docs.snowflake.com/en/sql-reference/functions/time_slice.html), which should allow exact matches to the base_dates CTE for better time bucketing

* Add schema test

* replace calls to subquery with calls directly to columns in `model_data` CTE

* add comments/examples for new interval additions

Co-authored-by: Claus Herther <claus@calogica.com>
Co-authored-by: Jeremy Yeo <jeremyyeo@users.noreply.github.com>
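The commit message above describes two ways of checking for data every `interval` date parts:

* Keep every n-th spine date using an integer `mod`, then sum row counts over a window.
* Later, truncate each model date to the start of its interval, similar in spirit to Snowflake's `TIME_SLICE`, so it matches a spine date exactly.

The Python sketch below implements both for day-level intervals. It is not the package's SQL and rests on three assumptions:

* The spine is end-exclusive.
* Intervals are aligned to the test start date; Snowflake's `TIME_SLICE` has its own alignment rules.
* `missing_by_window`, `slice_start` and `missing_by_truncation` are hypothetical names, not dbt_expectations macros.

```python
from __future__ import annotations

from collections import Counter
from datetime import date, timedelta
from typing import Iterable


def missing_by_window(
    start: date, end: date, interval: int, model_dates: Iterable[date]
) -> list[date]:
    """Mod + window approach: keep every n-th spine day, sum counts over its window."""
    counts = Counter(model_dates)
    n_days = (end - start).days  # integer day offsets, so `%` is an integer mod
    missing = []
    for offset in range(n_days):
        if offset % interval != 0:
            continue  # only every `interval`-th spine day starts a bucket
        bucket = start + timedelta(days=offset)
        # The window covers [bucket, bucket + interval), clipped to the spine end.
        window = range(min(interval, n_days - offset))
        if sum(counts[bucket + timedelta(days=k)] for k in window) == 0:
            missing.append(bucket)
    return missing


def slice_start(day: date, origin: date, interval: int) -> date:
    """Snap `day` down to the first day of its `interval`-day slice from `origin`."""
    offset = (day - origin).days
    return origin + timedelta(days=(offset // interval) * interval)


def missing_by_truncation(
    start: date, end: date, interval: int, model_dates: Iterable[date]
) -> list[date]:
    """Truncation approach: bucket each in-range model date, then match buckets exactly."""
    in_range = [d for d in model_dates if start <= d < end]
    per_slice = Counter(slice_start(d, start, interval) for d in in_range)
    spine = [start + timedelta(days=k) for k in range(0, (end - start).days, interval)]
    return [bucket for bucket in spine if per_slice[bucket] == 0]


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    sample = [date(2020, 1, 1), date(2020, 1, 4)]
    for n in (1, 2, 3):
        print(n, missing_by_window(date(2020, 1, 1), date(2020, 1, 7), n, sample))
        print(n, missing_by_truncation(date(2020, 1, 1), date(2020, 1, 7), n, sample))
```

Both functions report the same missing buckets. With rows only on 2020-01-01 and 2020-01-04, an `interval` of 3 leaves no gaps over [2020-01-01, 2020-01-07). An `interval` of 2 flags the bucket starting 2020-01-05. The truncation version swaps the window for a group-by and an exact match, which reflects the "simplify tie-out" motivation in the commit message.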

2 participants