-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do case-insensitive schema comparisons to test for schema existance #1663
Do case-insensitive schema comparisons to test for schema existance #1663
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment here, but this is looking really close!
core/dbt/task/runnable.py
Outdated
required_schemas_lowered = set(map(lower_tuple, required_schemas)) | ||
existing_schemas_lowered = set(map(lower_tuple, existing_schemas)) | ||
|
||
for database, schema in (required_schemas_lowered - existing_schemas_lowered): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing worth pointing out here: this is going to call create_schema
with the lower-cased versions of database
and schema
. The adapter.create_schema method will quote these identifiers as configured in the dbt_project.yml
file, so I think that given:
quoting:
database: True
schema: True
and a target configured with:
database: MY_DATABASE
schema: MY_SCHEMA
dbt would errantly create a schema at "my_database"."my_schema"
, which would be incorrect.
I think that means that we want to generate some sort of mapping between the lower-cased schema tuples and their untransformed counterparts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh interesting, I didn't know about the quoting option. It feels like if someone specified quoting=True
, dbt should actually be doing all db or schema checks with case-sensitivity? However, there are a lot of functions out there that are lower()
ing things by default and that feels like a harder fix.
Would another solution be to do a schema check before creating it? The quote_policy is used in the check, but unsure how the internals work. Although I would be worried about projects with lots of required schemas.
if adapter.check_schema_exists(db, schema):
adapter.create_schema(db, schema)
Anyway, I updated the PR to create the schema with the original model/target values. One remaining bug is that if your project and your target have different cases, this will call the create_schema function twice (because of that extra add() for the snowflake use {schema} bug)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like if someone specified quoting=True, dbt should actually be doing all db or schema checks with case-sensitivity?
Yeah, I think that's generally right, but it can be pretty tricky to do exactly the right thing in every circumstance. Snowflake supports a quoted_identifier_ignore_case parameter, which is all the proof I need that it would be impossible for dbt to aways be exactly right on every database it supports :). Instead, we disallow dbt to operate on two databases/schemas that differ only in casing - I don't think there's a good use case for that, so dbt just fails hard if it finds that that's the case - it's usually a misconfiguration!
Would another solution be to do a schema check before creating it?
Yeah! That's a great idea! I like the way that you implemented this.
One remaining bug is that if your project and your target have different cases ... this will call the create_schema function twice
Really good point! I took a quick scan through the codebase, and I don't think dbt runs use schema ...
anymore, so we could conceivably remove this line of code!! I'd need to think a lot harder about that, but if it turns out that we can remove this line, then we'd also be fixing #913 :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok - I spent some more time verifying that we don't run the use schema...
query anywhere. Let's remove it!! Are you comfortable making that change in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed!
f03f9cc
to
265f6d3
Compare
Redshift integration tests are breaking:
Is this a race condition in the tests? |
@edmundyan it is in fact a race condition in Redshift :( That's one for another issue though! Let me re-run that particular test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @edmundyan!
…ered Do case-insensitive schema comparisons to test for schema existance
Resolves #1651
Looked through the unit tests and didn't find anything for
dbt.task
classes. Happy to help or contribute if you feel that's needed :)Edit:
closes #913