[CT-3259] [Regression] Running empty seed file raises unhandled exception #8895

Closed
2 tasks done
ccharlesgb opened this issue Oct 25, 2023 · 7 comments · Fixed by #9004
Labels: backport 1.6.latest, backport 1.7.latest, bug, High Severity, Impact: Adapters, regression

Comments

@ccharlesgb

Is this a regression in a recent version of dbt-core?

  • I believe this is a regression in dbt-core functionality
  • I have searched the existing issues, and I could not find an existing issue for this regression

Current Behavior

We have an empty seed file in one of our dbt projects (sometimes we need to manually intervene and add lines to it). It has been running successfully for a while, but after upgrading to dbt 1.6.6, dbt raises an unhandled exception when the seed runs:

12:21:14  Running with dbt=1.6.6
12:21:15  Registered adapter: bigquery=1.6.7
12:21:15  Unable to do partial parsing because of a version mismatch
12:21:16  Found 3 models, 2 seeds, 0 sources, 0 exposures, 0 metrics, 394 macros, 0 groups, 0 semantic models
12:21:16  
12:21:25  Concurrency: 8 threads (target='integration-local')
12:21:25  
12:21:25  2 of 2 START seed file dry_run.my_seed ............................................................................................................................................. [RUN]
12:21:25  1 of 2 START seed file dry_run.empty_seed .......................................................................................................................................... [RUN]
12:21:25  Unhandled error while executing 
'NoneType' object has no attribute 'upper'
12:21:25  1 of 2 ERROR loading seed file dry_run.empty_seed .................................................................................................................................. [ERROR in 0.09s]
12:21:30  2 of 2 OK loaded seed file dry_run.my_seed ......................................................................................................................................... [INSERT 2 in 4.28s]
12:21:30  
12:21:30  Finished running 2 seeds in 0 hours 0 minutes and 13.56 seconds (13.56s).
12:21:30  
12:21:30  Completed with 1 error and 0 warnings:
12:21:30  
12:21:30    'NoneType' object has no attribute 'upper'
12:21:30  
12:21:30  Done. PASS=1 WARN=0 ERROR=1 SKIP=0 TOTAL=2

Expected/Previous Behavior

In dbt 1.5.x it works fine:

12:24:09  Running with dbt=1.5.8
12:24:10  Registered adapter: bigquery=1.5.7
12:24:10  Unable to do partial parsing because of a version mismatch
12:24:10  Found 3 models, 0 tests, 0 snapshots, 0 analyses, 360 macros, 0 operations, 2 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
12:24:10  
12:24:11  Concurrency: 8 threads (target='integration-local')
12:24:11  
12:24:11  1 of 2 START seed file dry_run.empty_seed .......................................................................................................................................... [RUN]
12:24:11  2 of 2 START seed file dry_run.my_seed ............................................................................................................................................. [RUN]
12:24:15  2 of 2 OK loaded seed file dry_run.my_seed ......................................................................................................................................... [INSERT 2 in 3.76s]
12:24:16  1 of 2 OK loaded seed file dry_run.empty_seed ...................................................................................................................................... [INSERT 0 in 4.34s]
12:24:16  
12:24:16  Finished running 2 seeds in 0 hours 0 minutes and 5.52 seconds (5.52s).
12:24:16  
12:24:16  Completed successfully
12:24:16  
12:24:16  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

Steps To Reproduce

  1. In dbt 1.6.x (I am using BigQuery), create a project and add a seed file that contains only a header row:
#my_seed.csv
a,b,c

  2. Run dbt build or dbt seed, and you will see the error.
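
For anyone reproducing this, the header-only seed can be generated and sanity-checked with a few lines of Python. This is a minimal sketch: the file name and column names follow the example above, and the temporary directory is only an assumption for illustration (in a real project the file would go in the seeds directory).

```python
import csv
import tempfile
from pathlib import Path

# Write a seed file that contains only the header row, as in the report.
seed_dir = Path(tempfile.mkdtemp())  # stand-in for the project's seeds directory
seed_path = seed_dir / "my_seed.csv"
seed_path.write_text("a,b,c\n")

# Read it back: the header is present, but there are no data rows.
with seed_path.open(newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    rows = list(reader)

print(header)
print(len(rows))
```

With no data rows, there are no values from which a column type could be inferred, which is the situation the rest of this thread is about.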

Relevant log output

Already posted.

Environment

- OS: macOS
- Python: 3.8.16
- dbt (working version): 1.5.8
- dbt (regression version): 1.6.6

Which database adapter are you using with dbt?

bigquery

Additional Context

No response

@ccharlesgb ccharlesgb added bug Something isn't working regression triage labels Oct 25, 2023
@github-actions github-actions bot changed the title [Regression] Running empty seed file raises unhandled exception [CT-3259] [Regression] Running empty seed file raises unhandled exception Oct 25, 2023
@graciegoheen
Contributor

Hi! So to confirm, the expected behavior is that when you have a csv file with just the headings:

#my_seed.csv
a,b,c

dbt will create an empty table in the warehouse?

I'm seeing a slightly different error on Snowflake, but only with --full-refresh:
[Screenshot, 2023-10-25 at 11:13:15 AM]

Interestingly, I do not see this error when running dbt-snowflake 1.6.1 with dbt-core 1.6.0:
[Screenshot, 2023-10-25 at 11:14:29 AM]

@graciegoheen graciegoheen added backport 1.6.latest High Severity bug with significant impact that should be resolved in a reasonable timeframe backport 1.7.latest and removed triage labels Oct 25, 2023
@dataders
Contributor

dataders commented Oct 25, 2023

This happens on both Snowflake and BigQuery.

dbt-snowflake

dbt -d  seed -s empders --full-refresh

> Database Error in seed empders (seeds/empders.csv)
  002040 (42601): SQL compilation error:
  Unsupported data type 'NONE'.

There's no stack trace here, but according to the Snowflake query history, this is the generated query that fails:

create table TEST_DB.dbt_ajs.empders (col1 None,col2 None)
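
The literal None in that statement is what appears when a Python None is rendered into the column list as text. A rough illustration follows; render_create_table is a made-up helper for this sketch and is not dbt's actual seed macro.

```python
def render_create_table(relation, columns):
    """Render a CREATE TABLE statement from (name, type) pairs.

    Hypothetical helper for illustration only, not dbt code.
    """
    column_sql = ",".join(f"{name} {col_type}" for name, col_type in columns)
    return f"create table {relation} ({column_sql})"


# When type inference yields None, str(None) leaks into the SQL text.
ddl = render_create_table(
    "TEST_DB.dbt_ajs.empders",
    [("col1", None), ("col2", None)],
)
print(ddl)
```

So the Snowflake error is most likely the same missing column type surfacing at a different point: the warehouse receives "None" as if it were a data type name.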

dbt-bigquery

Here's the dbt-bigquery stack trace (relevant extract below).

BigQueryAdapter.load_dataframe()

AttributeError: 'NoneType' object has no attribute 'upper'

  File "/Users/dataders/miniforge3/envs/dbt/lib/python3.10/site-packages/dbt/adapters/bigquery/impl.py", line 781, in load_dataframe
    load_config.schema = bq_schema
  File "/Users/dataders/miniforge3/envs/dbt/lib/python3.10/site-packages/google/cloud/bigquery/job/base.py", line 172, in __setattr__
    super(_JobConfig, self).__setattr__(name, value)
  File "/Users/dataders/miniforge3/envs/dbt/lib/python3.10/site-packages/google/cloud/bigquery/job/load.py", line 477, in schema
    [field.to_api_repr() for field in value],
  File "/Users/dataders/miniforge3/envs/dbt/lib/python3.10/site-packages/google/cloud/bigquery/job/load.py", line 477, in <listcomp>
    [field.to_api_repr() for field in value],
  File "/Users/dataders/miniforge3/envs/dbt/lib/python3.10/site-packages/google/cloud/bigquery/schema.py", line 281, in to_api_repr
    if self.field_type.upper() in _STRUCT_TYPES:
AttributeError: 'NoneType' object has no attribute 'upper'
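
The last frame boils down to calling .upper() on a field type that is None. Below is a self-contained sketch with a stand-in class. StandInField and try_repr are hypothetical and only mimic the one line from schema.py shown in the traceback; the contents of _STRUCT_TYPES are an assumption.

```python
_STRUCT_TYPES = ("RECORD", "STRUCT")  # assumed values, for illustration only


class StandInField:
    """Hypothetical stand-in for a schema field; not the BigQuery client class."""

    def __init__(self, name, field_type):
        self.name = name
        self.field_type = field_type

    def to_api_repr(self):
        # Mirrors the failing line in the traceback above.
        is_struct = self.field_type.upper() in _STRUCT_TYPES
        return {"name": self.name, "type": self.field_type.upper(), "struct": is_struct}


def try_repr(field):
    """Return the API representation, or the error text if the call fails."""
    try:
        return field.to_api_repr()
    except AttributeError as exc:
        return f"AttributeError: {exc}"


print(try_repr(StandInField("a", "int64")))  # a typed column serializes fine
print(try_repr(StandInField("a", None)))     # an untyped column raises AttributeError
```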

@ccharlesgb
Author

Hi! So to confirm, the expected behavior is that when you have a csv file with just the headings:

#my_seed.csv
a,b,c

dbt will create an empty table in the warehouse?

Yes, exactly. We used to see dbt create a blank table with 3 columns: a, b, and c.

@ccharlesgb
Author

It looks like the field_type dbt 1.6 is passing to the BQ client is None. If I print the value of bq_schema here:

    def load_dataframe(self, database, schema, table_name, agate_table, column_override):
        bq_schema = self._agate_to_schema(agate_table, column_override)
        print("BQ-SCHEMA")
        print(bq_schema)

BQ-SCHEMA
[SchemaField('a', None, 'NULLABLE', None, None, (), None), SchemaField('b', None, 'NULLABLE', None, None, (), None), SchemaField('c', None, 'NULLABLE', None, None, (), None)]

In 1.5 it infers the columns as integers (I didn't specify column types for the seed in the YAML):

BQ-SCHEMA
[SchemaField('a', 'INT64', 'NULLABLE', None, None, (), None), SchemaField('b', 'INT64', 'NULLABLE', None, None, (), None), SchemaField('c', 'INT64', 'NULLABLE', None, None, (), None)]
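
One way to picture a guard against this is to fall back to a default type whenever inference returns None. This is only a sketch under assumptions: fill_missing_types is a hypothetical helper working on plain (name, type) tuples, the INT64 default simply matches what 1.5 produced above, and none of this is necessarily how dbt handles it.

```python
from typing import List, Optional, Tuple

Column = Tuple[str, Optional[str]]


def fill_missing_types(columns: List[Column], default: str = "INT64") -> List[Tuple[str, str]]:
    """Replace a missing (None) column type with a default.

    Hypothetical helper for illustration; the default is an assumption.
    """
    return [(name, col_type if col_type is not None else default)
            for name, col_type in columns]


# The shape of the 1.6 result above: every column type is None.
inferred = [("a", None), ("b", None), ("c", None)]
print(fill_missing_types(inferred))

# Columns that already have a type are left alone.
print(fill_missing_types([("a", "STRING"), ("b", None)]))
```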

@gshank
Contributor

gshank commented Oct 26, 2023

This regression was caused by #8561 / #8153, released in 1.6.3.

@gshank gshank self-assigned this Oct 26, 2023
@jan-benisek

jan-benisek commented Oct 31, 2023

The same happens on Redshift after updating to 1.6.6.
Any idea when this is going to be fixed?

This is my CSV:

"a","b","c","d","e","f","g"

When I add an empty second row, it works:

"a","b","c","d","e","f","g"
"","","","","","",""

but of course I do not want that extra null row there.
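
To see which seeds in a project are affected without editing them, a small script can list the CSV files that have a header but no data rows. A minimal sketch, assuming the seeds live under a directory you pass in; find_header_only_seeds is a hypothetical helper, not part of dbt, and the demo files are created in a temporary directory.

```python
import csv
import tempfile
from pathlib import Path


def find_header_only_seeds(seed_dir):
    """Return CSV files under seed_dir that contain a header row and nothing else."""
    affected = []
    for path in sorted(Path(seed_dir).rglob("*.csv")):
        with path.open(newline="") as f:
            # csv.reader yields [] for blank lines, so those are skipped.
            rows = [row for row in csv.reader(f) if row]
        if len(rows) == 1:
            affected.append(path)
    return affected


# Demo with the two variants from this comment.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "header_only.csv").write_text('"a","b","c"\n')
(demo_dir / "with_null_row.csv").write_text('"a","b","c"\n"","",""\n')

found = [p.name for p in find_header_only_seeds(demo_dir)]
print(found)
```

Only the header-only file is reported; the variant with the extra null row counts as having one data row, which matches the observation that it loads.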

@martynydbt

@jan-benisek this issue is in our current sprint, so accounting for potential rollover, expect a fix in 1.5 to 3 weeks.

6 participants