[CT-3259] [Regression] Running empty seed file raises unhandled exception #8895

ccharlesgb · 2023-10-25T12:40:21Z

Is this a regression in a recent version of dbt-core?

I believe this is a regression in dbt-core functionality
I have searched the existing issues, and I could not find an existing issue for this regression

Current Behavior

We have an empty seed file in one of our dbt projects (sometimes we need to manually intervene and add lines to it). It has been running successfully for a while, but after upgrading to dbt 1.6.6 we get an unhandled exception when the seed runs:

12:21:14  Running with dbt=1.6.6
12:21:15  Registered adapter: bigquery=1.6.7
12:21:15  Unable to do partial parsing because of a version mismatch
12:21:16  Found 3 models, 2 seeds, 0 sources, 0 exposures, 0 metrics, 394 macros, 0 groups, 0 semantic models
12:21:16  
12:21:25  Concurrency: 8 threads (target='integration-local')
12:21:25  
12:21:25  2 of 2 START seed file dry_run.my_seed ............................................................................................................................................. [RUN]
12:21:25  1 of 2 START seed file dry_run.empty_seed .......................................................................................................................................... [RUN]
12:21:25  Unhandled error while executing 
'NoneType' object has no attribute 'upper'
12:21:25  1 of 2 ERROR loading seed file dry_run.empty_seed .................................................................................................................................. [ERROR in 0.09s]
12:21:30  2 of 2 OK loaded seed file dry_run.my_seed ......................................................................................................................................... [INSERT 2 in 4.28s]
12:21:30  
12:21:30  Finished running 2 seeds in 0 hours 0 minutes and 13.56 seconds (13.56s).
12:21:30  
12:21:30  Completed with 1 error and 0 warnings:
12:21:30  
12:21:30    'NoneType' object has no attribute 'upper'
12:21:30  
12:21:30  Done. PASS=1 WARN=0 ERROR=1 SKIP=0 TOTAL=2

Expected/Previous Behavior

In dbt 1.5.x it works fine:

12:24:09  Running with dbt=1.5.8
12:24:10  Registered adapter: bigquery=1.5.7
12:24:10  Unable to do partial parsing because of a version mismatch
12:24:10  Found 3 models, 0 tests, 0 snapshots, 0 analyses, 360 macros, 0 operations, 2 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
12:24:10  
12:24:11  Concurrency: 8 threads (target='integration-local')
12:24:11  
12:24:11  1 of 2 START seed file dry_run.empty_seed .......................................................................................................................................... [RUN]
12:24:11  2 of 2 START seed file dry_run.my_seed ............................................................................................................................................. [RUN]
12:24:15  2 of 2 OK loaded seed file dry_run.my_seed ......................................................................................................................................... [INSERT 2 in 3.76s]
12:24:16  1 of 2 OK loaded seed file dry_run.empty_seed ...................................................................................................................................... [INSERT 0 in 4.34s]
12:24:16  
12:24:16  Finished running 2 seeds in 0 hours 0 minutes and 5.52 seconds (5.52s).
12:24:16  
12:24:16  Completed successfully
12:24:16  
12:24:16  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

Steps To Reproduce

In dbt 1.6.x (I am using BigQuery), create a project and add a seed that contains only a header row:

#my_seed.csv
a,b,c

Then when you run dbt build/seed you will see the error.
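For anyone scripting the reproduction, here is a minimal sketch of generating the header-only seed. The temporary directory, the seeds/ folder name, and both helper functions are hypothetical names chosen for illustration; only the a,b,c header comes from the report above.

```python
import csv
import tempfile
from pathlib import Path
from typing import List


def write_header_only_seed(path: Path, columns: List[str]) -> None:
    """Write a CSV that contains a header row and no data rows."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as handle:
        csv.writer(handle).writerow(columns)


def count_data_rows(path: Path) -> int:
    """Return the number of rows that follow the header."""
    with path.open(newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)


# Assumption: a temporary directory stands in for the dbt project root.
project_dir = Path(tempfile.mkdtemp())
seed_path = project_dir / "seeds" / "my_seed.csv"
write_header_only_seed(seed_path, ["a", "b", "c"])
print(seed_path, "data rows:", count_data_rows(seed_path))
```

Pointing a real project's seed path at a file like this and running the seed command should then exercise the failing code path on an affected version.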

Relevant log output

Already posted.

Environment

- OS: Mac OS
- Python: 3.8.16
- dbt (working version): 1.5.8
- dbt (regression version): 1.6.6

Which database adapter are you using with dbt?

bigquery

Additional Context

No response


graciegoheen · 2023-10-25T17:15:33Z

Hi! So to confirm, the expected behavior is that when you have a csv file with just the headings:

#my_seed.csv
a,b,c

dbt will create an empty table in the warehouse?

I'm seeing a slightly different error on Snowflake, but only with --full-refresh.

Interestingly, I do not see this error when running dbt-snowflake 1.6.1 with dbt-core 1.6.0.

dataders · 2023-10-25T17:29:26Z

This happens on both Snowflake and BigQuery.

dbt-snowflake

dbt -d  seed -s empders --full-refresh

> Database Error in seed empders (seeds/empders.csv)
  002040 (42601): SQL compilation error:
  Unsupported data type 'NONE'.

There's no stack trace here, but Snowflake's query history shows the generated query that fails:

create table TEST_DB.dbt_ajs.empders (col1 None,col2 None)
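The shape of that statement suggests an unresolved (None) column type was interpolated straight into the DDL text. The sketch below only illustrates that string-formatting effect; build_create_table and build_create_table_checked are hypothetical helpers, not dbt's actual macros or adapter code.

```python
from typing import Dict, Optional


def build_create_table(relation: str, column_types: Dict[str, Optional[str]]) -> str:
    """Naively interpolate column types into DDL (illustrative only)."""
    columns = ",".join(f"{name} {type_}" for name, type_ in column_types.items())
    return f"create table {relation} ({columns})"


def build_create_table_checked(relation: str, column_types: Dict[str, Optional[str]]) -> str:
    """Same as above, but fail early when a column has no resolved type."""
    missing = [name for name, type_ in column_types.items() if type_ is None]
    if missing:
        raise ValueError(f"no data type resolved for columns: {', '.join(missing)}")
    return build_create_table(relation, column_types)


# A None type is rendered as the literal text "None" by the f-string.
unresolved = {"col1": None, "col2": None}
ddl = build_create_table("TEST_DB.dbt_ajs.empders", unresolved)
print(ddl)  # create table TEST_DB.dbt_ajs.empders (col1 None,col2 None)
```

A check like the one in build_create_table_checked would turn the opaque database error into a message that names the affected columns.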

dbt-bigquery

Here's the dbt-bigquery stack trace (relevant extract below).

BigQueryAdapter.load_dataframe()

AttributeError: 'NoneType' object has no attribute 'upper'

  File "/Users/dataders/miniforge3/envs/dbt/lib/python3.10/site-packages/dbt/adapters/bigquery/impl.py", line 781, in load_dataframe
    load_config.schema = bq_schema
  File "/Users/dataders/miniforge3/envs/dbt/lib/python3.10/site-packages/google/cloud/bigquery/job/base.py", line 172, in __setattr__
    super(_JobConfig, self).__setattr__(name, value)
  File "/Users/dataders/miniforge3/envs/dbt/lib/python3.10/site-packages/google/cloud/bigquery/job/load.py", line 477, in schema
    [field.to_api_repr() for field in value],
  File "/Users/dataders/miniforge3/envs/dbt/lib/python3.10/site-packages/google/cloud/bigquery/job/load.py", line 477, in <listcomp>
    [field.to_api_repr() for field in value],
  File "/Users/dataders/miniforge3/envs/dbt/lib/python3.10/site-packages/google/cloud/bigquery/schema.py", line 281, in to_api_repr
    if self.field_type.upper() in _STRUCT_TYPES:
AttributeError: 'NoneType' object has no attribute 'upper'
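The last frame can be reproduced in isolation with a small stand-in. FakeSchemaField below is a hypothetical class written for this illustration, not the google-cloud-bigquery SchemaField, and the contents of _STRUCT_TYPES are assumed; the sketch only mirrors the field_type.upper() check shown in the traceback.

```python
from typing import Optional

# Assumed values; the real constant lives in google/cloud/bigquery/schema.py.
_STRUCT_TYPES = ("RECORD", "STRUCT")


class FakeSchemaField:
    """Hypothetical stand-in for a BigQuery schema field."""

    def __init__(self, name: str, field_type: Optional[str], mode: str = "NULLABLE") -> None:
        self.name = name
        self.field_type = field_type
        self.mode = mode

    def to_api_repr(self) -> dict:
        # Same kind of check as the failing line: calling .upper() on None raises.
        if self.field_type.upper() in _STRUCT_TYPES:
            return {"name": self.name, "type": "RECORD", "mode": self.mode, "fields": []}
        return {"name": self.name, "type": self.field_type.upper(), "mode": self.mode}


# Columns a, b, c with no resolved type, as in the empty seed.
schema = [FakeSchemaField(name, None) for name in ("a", "b", "c")]
error_message = None
try:
    [field.to_api_repr() for field in schema]
except AttributeError as exc:
    error_message = str(exc)
print(error_message)  # 'NoneType' object has no attribute 'upper'
```

So the exception is a symptom: the None field type is produced earlier, before the schema reaches the BigQuery client.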

ccharlesgb · 2023-10-25T17:54:54Z

> Hi! So to confirm, the expected behavior is that when you have a csv file with just the headings:
> #my_seed.csv
> a,b,c
> dbt will create an empty table in the warehouse?

Yes, exactly. We used to see dbt create an empty table with three columns: a, b and c.

ccharlesgb · 2023-10-25T18:13:08Z

It looks like the field_type that dbt 1.6 passes to the BQ client is None. Printing the value of bq_schema in load_dataframe shows this:

    def load_dataframe(self, database, schema, table_name, agate_table, column_override):
        bq_schema = self._agate_to_schema(agate_table, column_override)
        print("BQ-SCHEMA")
        print(bq_schema)

BQ-SCHEMA
[SchemaField('a', None, 'NULLABLE', None, None, (), None), SchemaField('b', None, 'NULLABLE', None, None, (), None), SchemaField('c', None, 'NULLABLE', None, None, (), None)]

In 1.5 the type is inferred as an integer (I didn't specify the seed schema in the YAML):

BQ-SCHEMA
[SchemaField('a', 'INT64', 'NULLABLE', None, None, (), None), SchemaField('b', 'INT64', 'NULLABLE', None, None, (), None), SchemaField('c', 'INT64', 'NULLABLE', None, None, (), None)]
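One way a None type can appear is a type lookup that has no entry for a newly introduced inferred type. The sketch below is a guess at that general mechanism, informed only by the None field types above and by the title of the linked fix ("Support new agate Integer data_type in adapter code"). The Text, Number, and Integer classes, the two mappings, and convert_type are all hypothetical and are not dbt or agate code.

```python
from typing import Dict, Optional


class Text:
    """Hypothetical inferred type for string columns."""


class Number:
    """Hypothetical inferred type for numeric columns."""


class Integer:
    """Hypothetical inferred type added later, unknown to the old mapping."""


# Mapping written before Integer existed.
OLD_CONVERSIONS: Dict[type, str] = {Text: "string", Number: "int64"}

# Mapping extended to cover the new type.
NEW_CONVERSIONS: Dict[type, str] = {**OLD_CONVERSIONS, Integer: "int64"}


def convert_type(inferred: object, conversions: Dict[type, str]) -> Optional[str]:
    """Return the warehouse type for an inferred type, or None if unmapped."""
    return conversions.get(type(inferred))


before = convert_type(Integer(), OLD_CONVERSIONS)
after = convert_type(Integer(), NEW_CONVERSIONS)
print(before, after)  # None int64
```

If something like this is what happened, a silent None return would explain why both adapters failed only further downstream.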

gshank · 2023-10-26T18:39:56Z

This regression was caused by #8561 / #8153, released in 1.6.3.

jan-benisek · 2023-10-31T15:36:20Z

The same happens in Redshift after updating to 1.6.6.
Any idea when this is going to be fixed?

This is my csv

"a","b","c","d","e","f","g"

When I add an empty second row, it works:

"a","b","c","d","e","f","g"
"","","","","","",""

But of course I do not want that extra null row there.

martynydbt · 2023-10-31T16:58:09Z

@jan-benisek this issue is in our current sprint, so, accounting for potential rollover, expect a fix in 1.5 to 3 weeks.

ccharlesgb added the bug, regression, and triage labels Oct 25, 2023

github-actions bot changed the title ~~[Regression] Running empty seed file raises unhandled exception~~ [CT-3259] [Regression] Running empty seed file raises unhandled exception Oct 25, 2023

graciegoheen added the backport 1.6.latest, High Severity, and backport 1.7.latest labels and removed the triage label Oct 25, 2023

graciegoheen added the Impact: Adapters label Oct 25, 2023

gshank self-assigned this Oct 26, 2023

gshank mentioned this issue Nov 6, 2023

Support new agate Integer data_type in adapter code #9004

Merged

gshank closed this as completed in #9004 Nov 7, 2023
