Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] dbt unit tests raise an error when QUOTED_IDENTIFIERS_IGNORE_CASE = true in Snowflake #1160

Open
2 tasks done
ivanroquev opened this issue Jun 12, 2024 · 13 comments
Open
2 tasks done
Labels
bug Something isn't working needs_spike unit tests Issues related to built-in dbt unit testing functionality

Comments

@ivanroquev
Copy link

ivanroquev commented Jun 12, 2024

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

No significative log output without log-level debug and the error is unexpected.

14:14:13 Finished running 4 data tests, 1 unit test, 1 project hook in 0 hours 0 minutes and 9.68 seconds (9.68s).
14:14:13 Command end result
14:14:13
14:14:13 Completed with 1 error and 0 warnings:
14:14:13
14:14:13 'actual_or_expected'
14:14:13
14:14:13 Done. PASS=4 WARN=0 ERROR=1 SKIP=0 TOTAL=5

Expected Behavior

Complete the test successfully like:

06:57:00 Finished running 4 data tests, 1 unit test, 1 project hook in 0 hours 0 minutes and 11.69 seconds (11.69s).
06:57:00
06:57:00 Completed successfully
06:57:00
06:57:00 Done. PASS=5 WARN=0 ERROR=0 SKIP=0 TOTAL=5

Steps To Reproduce

With this config:

models:
  - name: fact_chargers
    description: This table has basic information about charges
    columns:
      - name: country
        data_tests:
          - accepted_values:
              values: [
              'LU',
              'BE',
          ]
      - name: transaction_id
        data_tests:
          - unique
          - not_null
      - name: external_id
      - name: promotion_id
      - name: is_promotion_applied
        data_tests:
          - accepted_values:
              name: is_promotion_applied_accepted_values
              values: [ true, false ]


unit_tests:
  - name: test_fact_chargers_external_id
    description: "Unit test FACT_CHARGERS"
    model: fact_chargers
    overrides:
      macros:
        # unit test this model in "full refresh" mode
        is_incremental: true
    given:
      - input: source("DW", "fact_chargers")
        rows:
           - {
               transaction_id: transaction1,
               business_date: 2024-06-11,
               external_id: null,
               promotion_id: null,
               is_promotion_applied: false,
             }
           - {
              transaction_id: transaction2,
              business_date: 2024-06-11,
              external_id: null,
              promotion_id: null,
              is_promotion_applied: false,
             }
      - input: source("DW", "fact_promotions")
        rows:
          - {subtype: redemption, channel: emobility, external_id: ext-abcd, business_date: 2024-06-11, coupon_id: abcd1234, ticket_id: transaction1}
          - {subtype: redemption, channel: emobility, external_id: ext-addd, business_date: 2024-06-11, coupon_id: addd1235, ticket_id: transaction2}
          - {subtype: assignation, channel: emobility, external_id: ext-asig, business_date: 2024-06-11, coupon_id: addd1235, ticket_id: transaction1}
          - {subtype: redemption, channel: store, external_id: ext-store-redemption, business_date: 2024-06-11, coupon_id: addd1235, ticket_id: transaction2}
    expect:
      rows:
        - {
            transaction_id: transaction1,
            business_date: 2024-06-11,
            external_id: ext-abcd,
            promotion_id: abcd1234,
            is_promotion_applied: true,
          }
        - {
            transaction_id: transaction2,
            business_date: 2024-06-11,
            external_id: ext-addd,
            promotion_id: addd1235,
            is_promotion_applied: true,
          }

Running the test like that:

dbt test --select fact_chargers -t stg --vars '{"from_date_key": "2024-06-10", "to_date_key": "2024-06-11"}'

With this output:

+--------------+-------------+-----------+------------+--------------------+------------------+
|TRANSACTION_ID|BUSINESS_DATE|EXTERNAL_ID|PROMOTION_ID|IS_PROMOTION_APPLIED|ACTUAL_OR_EXPECTED|
+--------------+-------------+-----------+------------+--------------------+------------------+
|transaction1  |2024-06-11   |ext-abcd   |abcd1234    |true                |actual            |
|transaction2  |2024-06-11   |ext-addd   |addd1235    |true                |actual            |
|transaction1  |2024-06-11   |ext-abcd   |abcd1234    |true                |expected          |
|transaction2  |2024-06-11   |ext-addd   |addd1235    |true                |expected          |
+--------------+-------------+-----------+------------+--------------------+------------------+

Relevant log output

14:14:12  SQL status: SUCCESS 4 in 1.0 seconds
14:14:12  Applying DROP to: DEVELOPMENT.star_model.test_fact_chargers_external_id__dbt_tmp
14:14:12  Using snowflake connection "unit_test.bs.fact_chargers.test_fact_chargers_external_id"
14:14:12  On unit_test.bs.fact_chargers.test_fact_chargers_external_id: /* {"app": "dbt", "dbt_version": "1.8.2", "profile_name": "BS", "target_name": "stg", "node_id": "unit_test.bs.fact_chargers.test_fact_chargers_external_id"} */
drop table if exists DEVELOPMENT.star_model.test_fact_chargers_external_id__dbt_tmp cascade
14:14:12  SQL status: SUCCESS 1 in 0.0 seconds
 -------- SCHEMA ------------- 
| column               | data_type |
| -------------------- | --------- |
| TRANSACTION_ID       | Text      |
| BUSINESS_DATE        | Date      |
| EXTERNAL_ID          | Text      |
| PROMOTION_ID         | Text      |
| IS_PROMOTION_APPLIED | Boolean   |
| ACTUAL_OR_EXPECTED   | Text      |

actual
14:14:12  On unit_test.bs.fact_chargers.test_fact_chargers_external_id: Close
14:14:13  Unhandled error while executing 
'actual_or_expected'
14:14:13  Traceback (most recent call last):
  File "/Users/ivan.roque/Library/Python/3.9/lib/python/site-packages/dbt/task/base.py", line 368, in safe_run
    result = self.compile_and_execute(manifest, ctx)
  File "/Users/ivan.roque/Library/Python/3.9/lib/python/site-packages/dbt/task/base.py", line 314, in compile_and_execute
    result = self.run(ctx.node, manifest)
  File "/Users/ivan.roque/Library/Python/3.9/lib/python/site-packages/dbt/task/base.py", line 415, in run
    return self.execute(compiled_node, manifest)
  File "/Users/ivan.roque/Library/Python/3.9/lib/python/site-packages/dbt/task/test.py", line 264, in execute
    unit_test_node, unit_test_result = self.execute_unit_test(test, manifest)
  File "/Users/ivan.roque/Library/Python/3.9/lib/python/site-packages/dbt/task/test.py", line 237, in execute_unit_test
    actual = self._get_unit_test_agate_table(table, "actual")
  File "/Users/ivan.roque/Library/Python/3.9/lib/python/site-packages/dbt/task/test.py", line 339, in _get_unit_test_agate_table
    unit_test_table = result_table.where(
  File "/Users/ivan.roque/Library/Python/3.9/lib/python/site-packages/agate/table/where.py", line 21, in where
    if test(row):
  File "/Users/ivan.roque/Library/Python/3.9/lib/python/site-packages/dbt/task/test.py", line 340, in <lambda>
    lambda row: row["actual_or_expected"] == actual_or_expected
  File "/Users/ivan.roque/Library/Python/3.9/lib/python/site-packages/agate/mapped_sequence.py", line 88, in __getitem__
    return self.dict()[key]
KeyError: 'actual_or_expected'

14:14:13  5 of 5 ERROR fact_chargers::test_fact_chargers_external_id ..................... [ERROR in 3.62s]

Environment

- OS: MacOS
- Python: 3.9
- dbt: 1.8.2
- dbt-snowflake: 1.8.2

Which database adapter are you using with dbt?

snowflake

Additional Context

I've had to replace the _get_unit_test_agate_table method to parse all dict keys to lowercase like this:

def _get_unit_test_agate_table(self, result_table, actual_or_expected: str):
    import agate
    # Convertir todas las claves de cada fila a lowercase
    def lowercase_keys(row):
        return {k.lower(): v for k, v in row.items()}

    # Convertir todas las filas del result_table a lowercase
    lowercased_rows = [lowercase_keys(row) for row in result_table.rows]

    filtered_rows = [
        row for row in lowercased_rows if row.get("actual_or_expected") == actual_or_expected.lower()
    ]

    # Convertir las filas filtradas de nuevo a un objeto agate.Table
    unit_test_table = agate.Table.from_object(filtered_rows)

    columns = list(unit_test_table.columns.keys())
    columns.remove("actual_or_expected")

    return unit_test_table.select(columns)

The original method is:

def _get_unit_test_agate_table(self, result_table, actual_or_expected: str):
    unit_test_table = result_table.where(
        lambda row: row["actual_or_expected"].lower() == actual_or_expected
    )
    columns = list(unit_test_table.columns.keys())
    columns.remove("actual_or_expected")
    return unit_test_table.select(columns)
@ivanroquev ivanroquev added bug Something isn't working triage labels Jun 12, 2024
@dbeatty10 dbeatty10 added the unit tests Issues related to built-in dbt unit testing functionality label Jun 12, 2024
@jocelyncui22
Copy link

Hi any updates on this error? Our team is encountering similar issues to this.

@jeremyyeo
Copy link
Contributor

jeremyyeo commented Jul 9, 2024

On Snowflake, if param QUOTED_IDENTIFIERS_IGNORE_CASE is true then unit tests will fail (not expected).

# dbt_project.yml
name: my_dbt_project
profile: all
config-version: 2
version: "1.0.0"

models:
   my_dbt_project:
      +materialized: table

on-run-start: "{{ check_param() }}"

# models/tests.yml
unit_tests:
  - name: test_is_valid
    model: bar
    given:
      - input: ref('foo')
        rows:
          - {id: 1}
    expect:
      rows: 
          - {id: 2}
-- models/foo.sql
select 1 id

-- models/bar.sql
select id * 2 as id from {{ ref('foo') }}

-- macros/check_param.sql
{% macro check_param() %}
    {% set r = run_query("show parameters like 'QUOTED_IDENTIFIERS_IGNORE_CASE' in account;") %}
    {% if execute %}
    {% do r.print_table() %}
    {% endif %}
{% endmacro %}

(1) First make sure param is set to false in Snowflake:

image

(2) Build:

$ dbt --debug build

03:35:14  Running 1 on-run-start hook
03:35:14  Using snowflake connection "master"
03:35:14  On master: /* {"app": "dbt", "dbt_version": "1.8.3", "profile_name": "all", "target_name": "sf-eu", "connection_name": "master"} */
show parameters like 'QUOTED_IDENTIFIERS_IGNORE_CASE' in account;
03:35:14  Opening a new connection, currently in state init
03:35:16  SQL status: SUCCESS 1 in 3.0 seconds
| key                  | value | default | level   | description          | type    |
| -------------------- | ----- | ------- | ------- | -------------------- | ------- |
| QUOTED_IDENTIFIER... | false | false   | ACCOUNT | If true, the case... | BOOLEAN |
03:35:16  Writing injected SQL for node "operation.my_dbt_project.my_dbt_project-on-run-start-0"
03:35:16  1 of 1 START hook: my_dbt_project.on-run-start.0 ............................... [RUN]
03:35:16  1 of 1 OK hook: my_dbt_project.on-run-start.0 .................................. [OK in 0.00s]
03:35:16  
03:35:16  On master: Close
03:35:17  Concurrency: 1 threads (target='sf-eu')
03:35:17  
03:35:17  Began running node model.my_dbt_project.foo
03:35:17  1 of 3 START sql table model dbt_jyeo.foo ...................................... [RUN]
03:35:17  Re-using an available connection from the pool (formerly list_analytics_dbt_jyeo, now model.my_dbt_project.foo)
03:35:17  Began compiling node model.my_dbt_project.foo
03:35:17  Writing injected SQL for node "model.my_dbt_project.foo"
03:35:17  Began executing node model.my_dbt_project.foo
03:35:17  Writing runtime sql for node "model.my_dbt_project.foo"
03:35:17  Using snowflake connection "model.my_dbt_project.foo"
03:35:17  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.3", "profile_name": "all", "target_name": "sf-eu", "node_id": "model.my_dbt_project.foo"} */
create or replace transient table analytics.dbt_jyeo.foo
         as
        (select 1 id
        );
03:35:17  Opening a new connection, currently in state closed
03:35:20  SQL status: SUCCESS 1 in 3.0 seconds
03:35:20  On model.my_dbt_project.foo: Close
03:35:21  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '72b6ef92-cde4-4239-bb45-03aff682b5b5', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x155241c10>]}
03:35:21  1 of 3 OK created sql table model dbt_jyeo.foo ................................. [SUCCESS 1 in 4.31s]
03:35:21  Finished running node model.my_dbt_project.foo
03:35:21  Began running node unit_test.my_dbt_project.bar.test_is_valid
03:35:21  2 of 3 START unit_test bar::test_is_valid ...................................... [RUN]
03:35:21  Re-using an available connection from the pool (formerly model.my_dbt_project.foo, now unit_test.my_dbt_project.bar.test_is_valid)
03:35:21  Began compiling node unit_test.my_dbt_project.bar.test_is_valid
03:35:21  Began executing node unit_test.my_dbt_project.bar.test_is_valid
03:35:21  Using snowflake connection "unit_test.my_dbt_project.bar.test_is_valid"
03:35:21  On unit_test.my_dbt_project.bar.test_is_valid: /* {"app": "dbt", "dbt_version": "1.8.3", "profile_name": "all", "target_name": "sf-eu", "node_id": "unit_test.my_dbt_project.bar.test_is_valid"} */
describe table analytics.dbt_jyeo.foo
03:35:21  Opening a new connection, currently in state closed
03:35:24  SQL status: SUCCESS 1 in 2.0 seconds
03:35:24  Writing injected SQL for node "unit_test.my_dbt_project.bar.test_is_valid"
03:35:24  Writing injected SQL for node "unit_test.my_dbt_project.bar.test_is_valid"
03:35:24  Using snowflake connection "unit_test.my_dbt_project.bar.test_is_valid"
03:35:24  On unit_test.my_dbt_project.bar.test_is_valid: /* {"app": "dbt", "dbt_version": "1.8.3", "profile_name": "all", "target_name": "sf-eu", "node_id": "unit_test.my_dbt_project.bar.test_is_valid"} */
create or replace temporary table analytics.dbt_jyeo.test_is_valid__dbt_tmp
         as
        (select * from (
        with __dbt__cte__foo as (

-- Fixture for foo
select 
    
        try_cast('1' as NUMBER(1,0))
     as id
) select id * 2 as id from __dbt__cte__foo
    ) as __dbt_sbq
    where false
    limit 0

        );
03:35:25  SQL status: SUCCESS 1 in 1.0 seconds
03:35:25  Using snowflake connection "unit_test.my_dbt_project.bar.test_is_valid"
03:35:25  On unit_test.my_dbt_project.bar.test_is_valid: /* {"app": "dbt", "dbt_version": "1.8.3", "profile_name": "all", "target_name": "sf-eu", "node_id": "unit_test.my_dbt_project.bar.test_is_valid"} */
describe table analytics.dbt_jyeo.test_is_valid__dbt_tmp
03:35:25  SQL status: SUCCESS 1 in 0.0 seconds
03:35:25  Writing runtime sql for node "unit_test.my_dbt_project.bar.test_is_valid"
03:35:25  Using snowflake connection "unit_test.my_dbt_project.bar.test_is_valid"
03:35:25  On unit_test.my_dbt_project.bar.test_is_valid: /* {"app": "dbt", "dbt_version": "1.8.3", "profile_name": "all", "target_name": "sf-eu", "node_id": "unit_test.my_dbt_project.bar.test_is_valid"} */
-- Build actual result given inputs
with dbt_internal_unit_test_actual as (
  select
    id, 'actual' as "actual_or_expected"
  from (
    with __dbt__cte__foo as (

-- Fixture for foo
select 
    
        try_cast('1' as NUMBER(1,0))
     as id
) select id * 2 as id from __dbt__cte__foo
  ) _dbt_internal_unit_test_actual
),
-- Build expected result
dbt_internal_unit_test_expected as (
  select
    id, 'expected' as "actual_or_expected"
  from (
    select 
    
        try_cast('2' as NUMBER(2,0))
     as id
  ) _dbt_internal_unit_test_expected
)
-- Union actual and expected results
select * from dbt_internal_unit_test_actual
union all
select * from dbt_internal_unit_test_expected
03:35:26  SQL status: SUCCESS 2 in 1.0 seconds
03:35:26  Applying DROP to: analytics.dbt_jyeo.test_is_valid__dbt_tmp
03:35:26  Using snowflake connection "unit_test.my_dbt_project.bar.test_is_valid"
03:35:26  On unit_test.my_dbt_project.bar.test_is_valid: /* {"app": "dbt", "dbt_version": "1.8.3", "profile_name": "all", "target_name": "sf-eu", "node_id": "unit_test.my_dbt_project.bar.test_is_valid"} */
drop table if exists analytics.dbt_jyeo.test_is_valid__dbt_tmp cascade
03:35:26  SQL status: SUCCESS 1 in 0.0 seconds
03:35:26  On unit_test.my_dbt_project.bar.test_is_valid: Close
03:35:27  2 of 3 PASS bar::test_is_valid ................................................. [PASS in 5.82s]
03:35:27  Finished running node unit_test.my_dbt_project.bar.test_is_valid
03:35:27  Began running node model.my_dbt_project.bar
03:35:27  3 of 3 START sql table model dbt_jyeo.bar ...................................... [RUN]
03:35:27  Re-using an available connection from the pool (formerly unit_test.my_dbt_project.bar.test_is_valid, now model.my_dbt_project.bar)
03:35:27  Began compiling node model.my_dbt_project.bar
03:35:27  Writing injected SQL for node "model.my_dbt_project.bar"
03:35:27  Began executing node model.my_dbt_project.bar
03:35:27  Writing runtime sql for node "model.my_dbt_project.bar"
03:35:27  Using snowflake connection "model.my_dbt_project.bar"
03:35:27  On model.my_dbt_project.bar: /* {"app": "dbt", "dbt_version": "1.8.3", "profile_name": "all", "target_name": "sf-eu", "node_id": "model.my_dbt_project.bar"} */
create or replace transient table analytics.dbt_jyeo.bar
         as
        (select id * 2 as id from analytics.dbt_jyeo.foo
        );
03:35:27  Opening a new connection, currently in state closed
03:35:32  SQL status: SUCCESS 1 in 5.0 seconds
03:35:32  On model.my_dbt_project.bar: Close
03:35:33  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '72b6ef92-cde4-4239-bb45-03aff682b5b5', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1546cc8d0>]}
03:35:33  3 of 3 OK created sql table model dbt_jyeo.bar ................................. [SUCCESS 1 in 5.85s]
  1. Set the param to true:

image

  1. Build again:
$ dbt --debug build
03:36:50  Running 1 on-run-start hook
03:36:50  Using snowflake connection "master"
03:36:50  On master: /* {"app": "dbt", "dbt_version": "1.8.3", "profile_name": "all", "target_name": "sf-eu", "connection_name": "master"} */
show parameters like 'QUOTED_IDENTIFIERS_IGNORE_CASE' in account;
03:36:50  Opening a new connection, currently in state init
03:36:53  SQL status: SUCCESS 1 in 2.0 seconds
| key                  | value | default | level   | description          | type    |
| -------------------- | ----- | ------- | ------- | -------------------- | ------- |
| QUOTED_IDENTIFIER... | true  | false   | ACCOUNT | If true, the case... | BOOLEAN |
03:36:53  Writing injected SQL for node "operation.my_dbt_project.my_dbt_project-on-run-start-0"
03:36:53  1 of 1 START hook: my_dbt_project.on-run-start.0 ............................... [RUN]
03:36:53  1 of 1 OK hook: my_dbt_project.on-run-start.0 .................................. [OK in 0.00s]
03:36:53  
03:36:53  On master: Close
03:36:54  Concurrency: 1 threads (target='sf-eu')
03:36:54  
03:36:54  Began running node model.my_dbt_project.foo
03:36:54  1 of 3 START sql table model dbt_jyeo.foo ...................................... [RUN]
03:36:54  Re-using an available connection from the pool (formerly list_analytics, now model.my_dbt_project.foo)
03:36:54  Began compiling node model.my_dbt_project.foo
03:36:54  Writing injected SQL for node "model.my_dbt_project.foo"
03:36:54  Began executing node model.my_dbt_project.foo
03:36:54  Writing runtime sql for node "model.my_dbt_project.foo"
03:36:54  Using snowflake connection "model.my_dbt_project.foo"
03:36:54  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.3", "profile_name": "all", "target_name": "sf-eu", "node_id": "model.my_dbt_project.foo"} */
create or replace transient table analytics.dbt_jyeo.foo
         as
        (select 1 id
        );
03:36:54  Opening a new connection, currently in state closed
03:36:58  SQL status: SUCCESS 1 in 4.0 seconds
03:36:58  On model.my_dbt_project.foo: Close
03:36:58  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '36b5e6c9-f0d6-4016-a8e8-e1737f793857', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x13ff9b650>]}
03:36:58  1 of 3 OK created sql table model dbt_jyeo.foo ................................. [SUCCESS 1 in 4.97s]
03:36:58  Finished running node model.my_dbt_project.foo
03:36:58  Began running node unit_test.my_dbt_project.bar.test_is_valid
03:36:58  2 of 3 START unit_test bar::test_is_valid ...................................... [RUN]
03:36:58  Re-using an available connection from the pool (formerly model.my_dbt_project.foo, now unit_test.my_dbt_project.bar.test_is_valid)
03:36:58  Began compiling node unit_test.my_dbt_project.bar.test_is_valid
03:36:58  Began executing node unit_test.my_dbt_project.bar.test_is_valid
03:36:59  Using snowflake connection "unit_test.my_dbt_project.bar.test_is_valid"
03:36:59  On unit_test.my_dbt_project.bar.test_is_valid: /* {"app": "dbt", "dbt_version": "1.8.3", "profile_name": "all", "target_name": "sf-eu", "node_id": "unit_test.my_dbt_project.bar.test_is_valid"} */
describe table analytics.dbt_jyeo.foo
03:36:59  Opening a new connection, currently in state closed
03:37:03  SQL status: SUCCESS 1 in 4.0 seconds
03:37:03  Writing injected SQL for node "unit_test.my_dbt_project.bar.test_is_valid"
03:37:03  Writing injected SQL for node "unit_test.my_dbt_project.bar.test_is_valid"
03:37:03  Using snowflake connection "unit_test.my_dbt_project.bar.test_is_valid"
03:37:03  On unit_test.my_dbt_project.bar.test_is_valid: /* {"app": "dbt", "dbt_version": "1.8.3", "profile_name": "all", "target_name": "sf-eu", "node_id": "unit_test.my_dbt_project.bar.test_is_valid"} */
create or replace temporary table analytics.dbt_jyeo.test_is_valid__dbt_tmp
         as
        (select * from (
        with __dbt__cte__foo as (

-- Fixture for foo
select 
    
        try_cast('1' as NUMBER(1,0))
     as id
) select id * 2 as id from __dbt__cte__foo
    ) as __dbt_sbq
    where false
    limit 0

        );
03:37:04  SQL status: SUCCESS 1 in 1.0 seconds
03:37:04  Using snowflake connection "unit_test.my_dbt_project.bar.test_is_valid"
03:37:04  On unit_test.my_dbt_project.bar.test_is_valid: /* {"app": "dbt", "dbt_version": "1.8.3", "profile_name": "all", "target_name": "sf-eu", "node_id": "unit_test.my_dbt_project.bar.test_is_valid"} */
describe table analytics.dbt_jyeo.test_is_valid__dbt_tmp
03:37:04  SQL status: SUCCESS 1 in 0.0 seconds
03:37:04  Writing runtime sql for node "unit_test.my_dbt_project.bar.test_is_valid"
03:37:04  Using snowflake connection "unit_test.my_dbt_project.bar.test_is_valid"
03:37:04  On unit_test.my_dbt_project.bar.test_is_valid: /* {"app": "dbt", "dbt_version": "1.8.3", "profile_name": "all", "target_name": "sf-eu", "node_id": "unit_test.my_dbt_project.bar.test_is_valid"} */
-- Build actual result given inputs
with dbt_internal_unit_test_actual as (
  select
    id, 'actual' as "actual_or_expected"
  from (
    with __dbt__cte__foo as (

-- Fixture for foo
select 
    
        try_cast('1' as NUMBER(1,0))
     as id
) select id * 2 as id from __dbt__cte__foo
  ) _dbt_internal_unit_test_actual
),
-- Build expected result
dbt_internal_unit_test_expected as (
  select
    id, 'expected' as "actual_or_expected"
  from (
    select 
    
        try_cast('2' as NUMBER(2,0))
     as id
  ) _dbt_internal_unit_test_expected
)
-- Union actual and expected results
select * from dbt_internal_unit_test_actual
union all
select * from dbt_internal_unit_test_expected
03:37:05  SQL status: SUCCESS 2 in 0.0 seconds
03:37:05  Applying DROP to: analytics.dbt_jyeo.test_is_valid__dbt_tmp
03:37:05  Using snowflake connection "unit_test.my_dbt_project.bar.test_is_valid"
03:37:05  On unit_test.my_dbt_project.bar.test_is_valid: /* {"app": "dbt", "dbt_version": "1.8.3", "profile_name": "all", "target_name": "sf-eu", "node_id": "unit_test.my_dbt_project.bar.test_is_valid"} */
drop table if exists analytics.dbt_jyeo.test_is_valid__dbt_tmp cascade
03:37:05  SQL status: SUCCESS 1 in 1.0 seconds
03:37:05  On unit_test.my_dbt_project.bar.test_is_valid: Close
03:37:06  Unhandled error while executing 
'actual_or_expected'
03:37:06  Traceback (most recent call last):
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.8.latest/lib/python3.11/site-packages/dbt/task/base.py", line 368, in safe_run
    result = self.compile_and_execute(manifest, ctx)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.8.latest/lib/python3.11/site-packages/dbt/task/base.py", line 314, in compile_and_execute
    result = self.run(ctx.node, manifest)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.8.latest/lib/python3.11/site-packages/dbt/task/base.py", line 415, in run
    return self.execute(compiled_node, manifest)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.8.latest/lib/python3.11/site-packages/dbt/task/test.py", line 264, in execute
    unit_test_node, unit_test_result = self.execute_unit_test(test, manifest)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.8.latest/lib/python3.11/site-packages/dbt/task/test.py", line 237, in execute_unit_test
    actual = self._get_unit_test_agate_table(table, "actual")
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.8.latest/lib/python3.11/site-packages/dbt/task/test.py", line 336, in _get_unit_test_agate_table
    unit_test_table = result_table.where(
                      ^^^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.8.latest/lib/python3.11/site-packages/agate/table/where.py", line 21, in where
    if test(row):
       ^^^^^^^^^
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.8.latest/lib/python3.11/site-packages/dbt/task/test.py", line 337, in <lambda>
    lambda row: row["actual_or_expected"] == actual_or_expected
                ~~~^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.8.latest/lib/python3.11/site-packages/agate/mapped_sequence.py", line 88, in __getitem__
    return self.dict()[key]
           ~~~~~~~~~~~^^^^^
KeyError: 'actual_or_expected'

03:37:06  2 of 3 ERROR bar::test_is_valid ................................................ [ERROR in 7.74s]
03:37:06  Finished running node unit_test.my_dbt_project.bar.test_is_valid
03:37:06  Began running node model.my_dbt_project.bar
03:37:06  3 of 3 SKIP relation dbt_jyeo.bar .............................................. [SKIP]
03:37:06  Finished running node model.my_dbt_project.bar
03:37:06  Connection 'master' was properly closed.
03:37:06  Connection 'unit_test.my_dbt_project.bar.test_is_valid' was properly closed.
03:37:06  Connection 'list_analytics_dbt_jyeo' was properly closed.
03:37:06  
03:37:06  Finished running 2 table models, 1 unit test, 1 project hook in 0 hours 0 minutes and 23.85 seconds (23.85s).
03:37:06  Command end result
03:37:06  
03:37:06  Completed with 1 error and 0 warnings:
03:37:06  
03:37:06    'actual_or_expected'
03:37:06  
03:37:06  Done. PASS=1 WARN=0 ERROR=1 SKIP=1 TOTAL=3

The test retrieval query is the same but the resulting "actual_or_expected" column name is lower or uppercased depending on the param.

image

image

When the param is on - the returned result set is uppercased "ACTUAL_OR_EXPECTED" (because the explicit quote to lowercase it is ignored) therefore it is not retrievable by its lower case key "actual_or_expected":

https://github.com/dbt-labs/dbt-core/blob/88b8b10df170d229b442c520209681bb9348065f/core/dbt/task/test.py#L342

Workaround:

Turn off the param in Snowflake for users who don't have direct access to edit the dbt-core source code like ivan above (i.e. pretty much all dbt Cloud users).

See workaround below where we override the default built in macro that does unit test.

@ivanroquev
Copy link
Author

Hello, in my case the param is TRUE, thanks for the information, I am not admin and I can't set it to false.
+------------------------------+-----+-------+-------+--------------------------------------------------+-------+ |key |value|default|level |description |type | +------------------------------+-----+-------+-------+--------------------------------------------------+-------+ |QUOTED_IDENTIFIERS_IGNORE_CASE|true |false |ACCOUNT|If true, the case of quoted identifiers is ignored|BOOLEAN| +------------------------------+-----+-------+-------+--------------------------------------------------+-------+

@i-dalbo
Copy link

i-dalbo commented Jul 18, 2024

I was just testing now the unit test feature, and I stumbled on the same error

10:07:38  1 of 1 START unit_test redacted_model_name  [RUN]
10:07:40  Unhandled error while executing
'actual_or_expected'
10:07:40  1 of 1 ERROR redacted_model_name::redacted_unit_test_name [ERROR in 2.85s]
10:07:41  Finished running 1 unit test, 3 project hooks in 0 hours 0 minutes and 8.33 seconds (8.33s).
10:07:42
10:07:42  Completed with 1 error and 0 warnings:
10:07:42
10:07:42    'actual_or_expected'

This is the system config

- OS: Linux
- Python: 3.9
- dbt: 1.8.3
- dbt-snowflake: 1.8.3

I don't want and don't intend to change any account parameters for the sake of unit testing without evaluating any side effects before, so I would expected this to be solved on the dbt side. Is there any timeline for solving this bug?

@sqnico
Copy link

sqnico commented Jul 18, 2024

I tried a workaround, by altering the QUOTED_IDENTIFIERS_IGNORE_CASE parameter at the session level. It did not work either.

#dbt_project.yml

on-run-start:
  [
    "ALTER SESSION SET QUOTED_IDENTIFIERS_IGNORE_CASE = FALSE",
    "{{ check_param() }}",
  ]
-- macros/check_params.sql

{% macro check_param() %}
    {% set r = run_query("show parameters like 'QUOTED_IDENTIFIERS_IGNORE_CASE';") %}
    {% if execute %} {% do r.print_table() %} {% endif %}
{% endmacro %}

Log output (same error: Unhandled error while executing 'actual_or_expected'):

18:04:43  Running 2 on-run-start hooks
18:04:43  1 of 2 START hook: hardware_dbt.on-run-start.0 ................................. [RUN]
18:04:43  1 of 2 OK hook: hardware_dbt.on-run-start.0 .................................... [SUCCESS 1 in 0.48s]
| key                  | value | default | level   | description          | type    |
| -------------------- | ----- | ------- | ------- | -------------------- | ------- |
| QUOTED_IDENTIFIER... | false | false   | SESSION | If true, the case... | BOOLEAN |
18:04:44  2 of 2 START hook: hardware_dbt.on-run-start.1 ................................. [RUN]
18:04:44  2 of 2 OK hook: hardware_dbt.on-run-start.1 .................................... [OK in 0.00s]
18:04:44  
18:04:44  Concurrency: 4 threads (target='dev')
18:04:44  
18:04:44  1 of 1 START unit_test fact_orders_hourly::test_is_scac_code_valid ............. [RUN]
18:04:47  Unhandled error while executing 
'actual_or_expected'
18:04:47  1 of 1 ERROR fact_orders_hourly::test_is_scac_code_valid ....................... [ERROR in 3.45s]
18:04:47  
18:04:47  Running 1 on-run-end hook
18:04:48  1 of 1 START hook: hardware_dbt.on-run-end.0 ................................... [RUN]
18:04:48  1 of 1 OK hook: hardware_dbt.on-run-end.0 ...................................... [OK in 0.00s]
18:04:48  
18:04:49  
18:04:49  Finished running 1 unit test, 3 project hooks in 0 hours 0 minutes and 21.64 seconds (21.64s).
18:04:49  
18:04:49  Completed with 1 error and 0 warnings:
18:04:49  
18:04:49    'actual_or_expected'
18:04:49  
18:04:49  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

Config:

- OS: MacOS
- Python: 3.10.10
- dbt-adapters: 1.3.2
- dbt-common: 1.5.0
- dbt-core: 1.8.3
- dbt-snowflake: 1.8.3

@jeremyyeo
Copy link
Contributor

jeremyyeo commented Jul 18, 2024

@sqnico i think that's cause of dbt's multi-threaded / connection pooling mechanism or something. When the unit test runs it's a different session from the session that ran the alter session statement. Probably can see that in Snowflake Query History's session id column.

@sqnico
Copy link

sqnico commented Jul 18, 2024

@jeremyyeo, thank you for your message. I checked, you are correct (see below)

CleanShot 2024-07-18 at 14 23 50

@jeremyyeo
Copy link
Contributor

@sqnico can you add this macro to your project and try again?

-- macros/get_unit_test.sql

{% macro get_unit_test_sql(main_sql, expected_fixture_sql, expected_column_names) -%}
-- For accounts that have this param set to true, we need to make it false for the query.
-- https://github.com/dbt-labs/dbt-snowflake/issues/1160
alter session set QUOTED_IDENTIFIERS_IGNORE_CASE = false;

-- Build actual result given inputs
with dbt_internal_unit_test_actual as (
  select
    {% for expected_column_name in expected_column_names %}{{expected_column_name}}{% if not loop.last -%},{% endif %}{%- endfor -%}, {{ dbt.string_literal("actual") }} as {{ adapter.quote("actual_or_expected") }}
  from (
    {{ main_sql }}
  ) _dbt_internal_unit_test_actual
),
-- Build expected result
dbt_internal_unit_test_expected as (
  select
    {% for expected_column_name in expected_column_names %}{{expected_column_name}}{% if not loop.last -%}, {% endif %}{%- endfor -%}, {{ dbt.string_literal("expected") }} as {{ adapter.quote("actual_or_expected") }}
  from (
    {{ expected_fixture_sql }}
  ) _dbt_internal_unit_test_expected
)
-- Union actual and expected results
select * from dbt_internal_unit_test_actual
union all
select * from dbt_internal_unit_test_expected
{%- endmacro %}

I'm overriding the default one from https://github.com/dbt-labs/dbt-adapters/blob/4b3806818f3831bbd0206a6b22fdcb4fe2a505c5/dbt/include/global_project/macros/materializations/tests/helpers.sql#L23 and adding the session param altering within it - this would make it use the same "session".

@sqnico
Copy link

sqnico commented Jul 18, 2024

@jeremyyeo Amazing, this is a great workaround! It works! Thank you so much for your help

21:50:45  Running with dbt=1.8.3
21:50:46  Registered adapter: snowflake=1.8.3
21:50:47  Found 109 models, 2 snapshots, 8 seeds, 1 operation, 470 data tests, 93 sources, 22 exposures, 1008 macros, 1 unit test
21:50:47  
21:50:48  Concurrency: 4 threads (target='dev')
21:50:48  
21:50:48  1 of 1 START unit_test fact_orders_hourly::test_is_scac_code_valid ............. [RUN]
21:50:51  1 of 1 PASS fact_orders_hourly::test_is_scac_code_valid ........................ [PASS in 3.08s]
21:50:51  
21:50:51  Running 1 on-run-end hook
21:50:52  1 of 1 START hook: hardware_dbt.on-run-end.0 ................................... [RUN]
21:50:52  1 of 1 OK hook: hardware_dbt.on-run-end.0 ...................................... [OK in 0.00s]
21:50:52  
21:50:52  
21:50:52  Finished running 1 unit test, 1 project hook in 0 hours 0 minutes and 5.30 seconds (5.30s).
21:50:52  
21:50:52  Completed successfully
21:50:52  
21:50:52  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

@dbeatty10 dbeatty10 changed the title [Bug] dbt unit test - KeyError: 'actual_or_expected' [Bug] dbt unit tests raise an error when QUOTED_IDENTIFIERS_IGNORE_CASE = true in Snowflake Jul 18, 2024
@dbeatty10 dbeatty10 removed the triage label Jul 18, 2024
@ChenyuLInx
Copy link
Contributor

ChenyuLInx commented Aug 5, 2024

@aranke mentioned #1107 this is related to fix it here.

@aranke
Copy link
Member

aranke commented Aug 5, 2024

I reached out to @dbt-labs/adapters internally to see if they have a preferred solution here.

@aranke aranke transferred this issue from dbt-labs/dbt-core Aug 6, 2024
@aranke
Copy link
Member

aranke commented Aug 6, 2024

From offline sync with @mikealfare @colin-rogers-dbt, we determined that this is a Snowflake issue so transferred over.

@colin-rogers-dbt
Copy link
Contributor

Will address as part of #1181

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs_spike unit tests Issues related to built-in dbt unit testing functionality
Projects
None yet
Development

No branches or pull requests

9 participants