User-supplied fail_calc for tests #3321

jtcohen6 · 2021-05-05T13:36:50Z

Describe the feature

Today, the 'test' materialization hard-codes count(*) as the way to calculate failures from a test:

https://github.com/fishtown-analytics/dbt/blob/26fb58bd1b08218781b182918f5ed4dec3f735d9/core/dbt/include/global_project/macros/materializations/test.sql#L4-L7

I think that makes sense 90% of the time in the general case, but we want to give users the ability to customize the "failure calculation" if they'd like it to be something other than count(*). This is important for backward compatibility, since schema tests could previously calculate and return whatever numeric value they wanted. In the wild, this could be as simple as sum(column) instead of count(*), or it could be as complex as the dbt_utils.equality test:

(select count(*) from unioned) +
        (select abs(
            (select count(*) from a_minus_b) -
            (select count(*) from b_minus_a)
            ))

I'm hopeful this is quite straightforward to implement: it's just a matter of pulling in the fail_calc and templating it into the materialization.
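The templating step could look roughly like the sketch below. It uses Python's standard-library sqlite3 module rather than dbt's Jinja. The names build_test_sql and run_test, the dbt_internal_test alias, and the "!= 0" defaults for warn_if / error_if are assumptions for illustration, not dbt's actual macro. The sketch wraps whatever query a test returns and evaluates a user-supplied fail_calc over those rows, falling back to count(*).

```python
import sqlite3

# Hypothetical default, mirroring the hard-coded count(*) described above.
DEFAULT_FAIL_CALC = "count(*)"


def build_test_sql(main_sql: str, fail_calc: str = DEFAULT_FAIL_CALC,
                   warn_if: str = "!= 0", error_if: str = "!= 0") -> str:
    """Hypothetical helper (not dbt's macro): wrap a test query so that
    `fail_calc` is templated in place of count(*).

    `fail_calc` is any SQL aggregate expression evaluated over the rows the
    test query returns; warn_if / error_if are comparisons applied to it.
    """
    return (
        f"select\n"
        f"  {fail_calc} as failures,\n"
        f"  {fail_calc} {warn_if} as should_warn,\n"
        f"  {fail_calc} {error_if} as should_error\n"
        f"from (\n{main_sql}\n) as dbt_internal_test"
    )


def run_test(conn, main_sql, **kwargs):
    """Run the wrapped query and return (failures, should_warn, should_error)."""
    failures, should_warn, should_error = conn.execute(
        build_test_sql(main_sql, **kwargs)
    ).fetchone()
    return failures, bool(should_warn), bool(should_error)


conn = sqlite3.connect(":memory:")
conn.execute("create table payments (id integer, amount integer)")
conn.executemany("insert into payments values (?, ?)",
                 [(1, 10), (2, -5), (3, -20)])

# A test query that returns the "bad" rows: negative payment amounts.
negative_amounts = "select amount from payments where amount < 0"

print(run_test(conn, negative_amounts))                                 # (2, True, True)
print(run_test(conn, negative_amounts, fail_calc="sum(abs(amount))"))  # (25, True, True)
```

With the default, the test reports 2 failures (the number of returned rows). With fail_calc set to sum(abs(amount)), the same query reports 25, the total magnitude of the negative payments, which is the "sum(column) instead of count(*)" case.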

Questions

Should fail_calc be a test config or a test property? I lean toward property, since I think this is an essential component of the test definition and less like something that wants to be set for many different types of tests at once, e.g. from dbt_project.yml. (In a post-#2401 world, where configs can be set in schema.yml files, this is hopefully a less meaningful distinction!)
Could this interact strangely with percentage values of warn_if / error_if (Net-new test configs #3258)? Yes! I don't think we need to solve for every edge case there now.

jtcohen6 · 2021-05-11T20:18:08Z

Update: We're going to make this a test config, following the pattern sketched out in #3258.

I don't think it makes a ton of sense for users to override the default configs set by the generic test definition, but hey, they'll be able to if they want to. I could even see this being compelling: let's say you want the unique test to calculate failures as the number of original rows that contain a duplicated value, rather than the number of distinct duplicated values.
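To make the unique-test example concrete, here is a small sqlite3 sketch in Python. It assumes that the unique test's query returns one row per duplicated value, with that value's row count in a column named n_records (column names are illustrative assumptions). It compares the default count(*) with a hypothetical override of sum(n_records), the kind of expression a user could supply as the fail_calc config.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (order_id integer)")
# 1 appears twice and 2 appears three times; 3 is unique.
conn.executemany("insert into orders values (?)",
                 [(1,), (1,), (2,), (2,), (2,), (3,)])

# Assumed shape of a unique test query: one row per duplicated value,
# plus how many original rows share that value.
unique_test_sql = """
    select order_id as unique_field, count(*) as n_records
    from orders
    where order_id is not null
    group by order_id
    having count(*) > 1
"""


def failures(fail_calc: str) -> int:
    """Evaluate a fail_calc expression over the rows the test returns."""
    sql = f"select {fail_calc} from ({unique_test_sql}) as dbt_internal_test"
    return conn.execute(sql).fetchone()[0]


print(failures("count(*)"))        # 2: number of distinct duplicated values
print(failures("sum(n_records)"))  # 5: number of original rows sharing a duplicated value
```

The test query itself is unchanged; only the aggregate applied to its output differs, which is why treating fail_calc as an overridable config is enough to switch between the two definitions of "failure".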

jtcohen6 · 2021-06-02T15:03:01Z

Resolved by #3392

jtcohen6 added the enhancement (New feature or request) and dbt tests (Issues related to built-in dbt testing functionality) labels May 5, 2021

jtcohen6 added this to the Margaret Mead milestone May 5, 2021

jtcohen6 mentioned this issue May 5, 2021

Configurable description in yaml config for generic tests #3249

Closed

kwigley self-assigned this May 11, 2021

kwigley linked a pull request May 19, 2021 that will close this issue

New test configs: where, limit, warn_if, error_if, fail_calc #3336

Closed

kwigley mentioned this issue May 21, 2021

New test configs: where, limit, warn_if, error_if, fail_calc #3336

Closed

leahwicz mentioned this issue May 24, 2021

Release v0.20.0 RC1 #3388

Closed

kwigley mentioned this issue May 26, 2021

New test configs #3392

Merged

jtcohen6 closed this as completed Jun 2, 2021
