[Bug] Unit Test fails when using a versioned model as an input #10528

Closed
2 tasks done
kbrock91 opened this issue Aug 6, 2024 · 2 comments · Fixed by #10889
Labels
bug Something isn't working model_versions unit tests Issues related to built-in dbt unit testing functionality

Comments

kbrock91 commented Aug 6, 2024

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Unit tests fail if an input reference is a versioned model and the version is not explicitly defined in both the model AND the unit test.

Expected Behavior

The unit test should default to the latest version specified in the yml config for the input versioned model.

Steps To Reproduce

  1. In the dbt Cloud IDE, create a versioned model (e.g. stg_tpch_orders, with versions 1 and 2), a second model (e.g. customer_tier) that references the versioned model, and a unit test on the second model (customer_tier) that uses the versioned model as an input
  2. Run dbt build --select customer_tier
  • If the customer_tier model does not reference a specific version, the unit test always fails, whether or not the unit test input specifies a version. The code below shows this scenario.
  • If the customer_tier model does reference a specific version, the unit test fails when its input does not specify a version. The unit test passes if you explicitly specify ref('stg_tpch_orders', v=2) in both places
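
For reference, the workaround from the last bullet could look roughly like the sketch below. It assumes customer_tier.sql has also been changed to select from {{ ref('stg_tpch_orders', v=2) }}, and it trims the rows to a single customer for brevity; it illustrates the reported workaround and is not a fix for the underlying bug.

```yaml
unit_tests:
  - name: tiers_are_working
    model: customer_tier
    given:
      - input: ref('stg_tpch_customers')
        format: dict
        rows:
          - {customer_key: 629}

      # Pin the same version that customer_tier.sql references (assumed: v=2)
      - input: ref('stg_tpch_orders', v=2)
        format: dict
        rows:
          - {customer_key: 629, total_price: 163443}

    expect:
      rows:
        - {customer_key: 629, tier_name: tier1}
```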

Code for reference

schema.yml for stg_tpch_orders

  - name: stg_tpch_orders
    description: staging layer for orders data
    versions:
      - v: 1
        columns:
          - include: all
            exclude: [comment]
        deprecation_date: 2024-8-31 00:00:00.00+00:00
      - v: 2
        columns:
          - include: all
    latest_version: 2

customer_tier.sql

{{
    config(
        materialized='table'
    )
}}

with customer as (
    select * from {{ ref('stg_tpch_customers') }}
),

orders as (
    select * from {{ ref('stg_tpch_orders') }}
),
final as (
    select
        customer.customer_key,
        sum(orders.total_price) as lifetime_value,
        case 
            when lifetime_value <= 200000 then 'tier1'
            when lifetime_value > 2000000 then 'tier2'
            when lifetime_value between 1000000 and 1999999 then 'tier3'
            when lifetime_value between 0 and 999999 then 'tier4' 
        end as tier_name, 
        max(orders.comment) as comment
    from customer
        inner join orders
            on customer.customer_key = orders.customer_key
    group by 1
)

select * from final

unit test:

unit_tests:
  - name: tiers_are_working
    description: "check if the logic for tiering is working correctly"
    model: customer_tier
    given:
      - input: ref('stg_tpch_customers')
        format: dict
        rows:
          - {customer_key: 629}
          - {customer_key: 4}
          - {customer_key: 1}
          - {customer_key: 26}

      - input: ref('stg_tpch_orders') 
        format: dict
        rows:
          - {customer_key: 629, total_price: 163443}
          - {customer_key: 4, total_price: 4134568}
          - {customer_key: 1,  total_price: 1428872}
          - {customer_key: 26, total_price: 418512}

    expect:
      rows:
        - {customer_key: 629,  tier_name: tier1}
        - {customer_key: 4,    tier_name: tier2}
        - {customer_key: 1,    tier_name: tier3}
        - {customer_key: 26,   tier_name: tier4}    

Relevant log output

14:20:52 Compilation Error in unit_test tiers_are_working (models/marts/intermediate/intermediate.yml)
  Unit_Test 'unit_test.analytics.customer_tier.tiers_are_working' (models/marts/intermediate/intermediate.yml) depends on a node named 'stg_tpch_orders' which was not found

Environment

- OS:
- Python:
- dbt: Latest Version in dbt Cloud IDE

Which database adapter are you using with dbt?

No response

Additional Context

No response

@kbrock91 kbrock91 added bug Something isn't working triage labels Aug 6, 2024
@dbeatty10 dbeatty10 added model_versions unit tests Issues related to built-in dbt unit testing functionality labels Aug 6, 2024
dbeatty10 (Contributor) commented Aug 9, 2024

EDIT: ignore the comment below; it worked for me when I tried it again today (2024-08-29).


With dbt-core 1.8.3, even specifying it in both places didn't work for me. See below for details.

models/fct_orders.sql

select *
from {{ ref('stg_orders', v=2) }}

models/_unit_tests.yml

unit_tests:
  - name: test_10528
    model: fct_orders
    given:
      - input: ref('stg_orders', v=2) 
        format: dict
        rows:
          - {id: 2}
    expect:
        rows:
          - {id: 2}

models/_models.yml

models:
  - name: stg_orders
    versions:
      - v: 1
        columns:
          - include: all
            exclude: [added_column]
      - v: 2
        columns:
          - include: all
    latest_version: 2

models/stg_orders_v1.sql

select 1 as id

models/stg_orders_v2.sql

select 1 as id, 2 as added_column

dbeatty10 (Contributor) commented

The reprex below has three cases:

  1. test_10528_a: model does not reference a specific version
  2. test_10528_b: model does reference a specific version but it is not the latest version
  3. test_10528_c: model does reference a specific version and it is the latest version

The first is a simplification of the example given in this bug report.
The second was described in this bug report and also reported in #10623.
The third works without issues. It continued to work after a prerelease version of the model was added to the project.

Reprex

models/_unit_tests.yml

unit_tests:

  #
  - name: test_10528_a
    description: model **does not** reference a specific version
    model: fct_orders_a
    given:
      - input: ref('stg_orders')
        format: dict
        rows:
          - {id: 2}
    expect:
        rows:
          - {id: 2}

  #
  - name: test_10528_b
    description: model **does** reference a specific version **but** it is not the latest version
    model: fct_orders_b
    given:
      # This will work if updated to ref('stg_orders', v=1) though
      - input: ref('stg_orders')
        format: dict
        rows:
          - {id: 3}
    expect:
        rows:
          - {id: 3}

  #
  - name: test_10528_c
    description: model **does** reference a specific version **and** it is the latest version
    model: fct_orders_c
    given:
      - input: ref('stg_orders')
        format: dict
        rows:
          - {id: 4}
    expect:
        rows:
          - {id: 4}

models/_models.yml

models:
  - name: stg_orders
    latest_version: 2
    versions:
      - v: 1
        columns:
          - include: all
            exclude: [added_column]
      - v: 2
        columns:
          - include: all

models/stg_orders_v1.sql

select 1 as id

models/stg_orders_v2.sql

select 1 as id, 2 as added_column

models/fct_orders_a.sql

select *
from {{ ref('stg_orders') }}

models/fct_orders_b.sql

select *
from {{ ref('stg_orders', v=1) }}

models/fct_orders_c.sql

select *
from {{ ref('stg_orders', v=2) }}

Run these commands:

dbt run --empty
dbt test --select test_10528_a test_10528_b test_10528_c

Get this output:

$ dbt test --select test_10528_a test_10528_b test_10528_c

18:34:36  Running with dbt=1.8.0
18:34:39  Registered adapter: duckdb=1.8.3
18:34:39  Found 5 models, 1 analysis, 410 macros, 3 unit tests
18:34:39  
18:34:39  Concurrency: 1 threads (target='dev')
18:34:39  
18:34:39  1 of 3 START unit_test fct_orders_a::test_10528_a .............................. [RUN]
18:34:39  1 of 3 ERROR fct_orders_a::test_10528_a ........................................ [ERROR in 0.03s]
18:34:39  2 of 3 START unit_test fct_orders_b::test_10528_b .............................. [RUN]
18:34:39  2 of 3 ERROR fct_orders_b::test_10528_b ........................................ [ERROR in 0.01s]
18:34:39  3 of 3 START unit_test fct_orders_c::test_10528_c .............................. [RUN]
18:34:39  3 of 3 PASS fct_orders_c::test_10528_c ......................................... [PASS in 0.16s]
18:34:39  
18:34:39  Finished running 3 unit tests in 0 hours 0 minutes and 0.35 seconds (0.35s).
18:34:39  
18:34:39  Completed with 2 errors and 0 warnings:
18:34:39  
18:34:39    Compilation Error in unit_test test_10528_a (models/_unit_tests.yml)
  Unit_Test 'unit_test.my_project.fct_orders_a.test_10528_a' (models/_unit_tests.yml) depends on a node named 'stg_orders' which was not found
18:34:39  
18:34:39    Compilation Error in unit_test test_10528_b (models/_unit_tests.yml)
  Unit_Test 'unit_test.my_project.fct_orders_b.test_10528_b' (models/_unit_tests.yml) depends on a node named 'stg_orders' with version '1' which was not found
18:34:39  
18:34:39  Done. PASS=1 WARN=0 ERROR=2 SKIP=0 TOTAL=3
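
As the comment inside test_10528_b notes, the second error goes away when the unit test input pins the same version that fct_orders_b references. A sketch of that adjusted test, using only names from the reprex above:

```yaml
unit_tests:
  - name: test_10528_b
    description: model **does** reference a specific version **but** it is not the latest version
    model: fct_orders_b
    given:
      # Matches ref('stg_orders', v=1) in models/fct_orders_b.sql
      - input: ref('stg_orders', v=1)
        format: dict
        rows:
          - {id: 3}
    expect:
      rows:
        - {id: 3}
```

This only sidesteps the second case; this thread gives no equivalent workaround for test_10528_a, where the model itself does not reference a version.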
