Check length of escaped strings in the adapter test #6567

Merged
merged 2 commits into from
Jan 11, 2023

Conversation

Contributor

@dbeatty10 dbeatty10 commented Jan 10, 2023

resolves #6566

See the following PRs for context and validation that it doesn't break any dbt Labs adapters:

Description

Adapter maintainers can choose to extend one of the following pytests (which are expected to be mutually exclusive):

  • BaseEscapeSingleQuotesQuote
  • BaseEscapeSingleQuotesBackslash

We've observed at least one case where the adapter made the wrong choice but the test still passed. This PR aims to trigger a CI failure if the wrong choice is made accidentally, and I validated that it does so here.
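To illustrate why a length check can catch what a value comparison misses, here is a minimal Python sketch. It is not the dbt test itself: `eval_literal` is a hypothetical toy evaluator, and the dialect it models (backslash escapes a quote, while a doubled quote closes and reopens the literal so the pieces are concatenated) is an assumption made for illustration, not a statement about any specific database.

```python
def escape_quote(value: str) -> str:
    """'Quote' style: escape a single quote by doubling it."""
    return value.replace("'", "''")


def escape_backslash(value: str) -> str:
    """'Backslash' style: escape a single quote with a backslash."""
    return value.replace("'", "\\'")


def eval_literal(sql: str, doubled_quote_is_escape: bool) -> str:
    """Toy evaluator for one single-quoted SQL literal (hypothetical dialects).

    If doubled_quote_is_escape is False, '' contributes no character,
    modelling a dialect that treats it as two adjacent literals.
    """
    body = sql[1:-1]  # strip the outer SQL quotes
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            out.append(body[i + 1])  # backslash escapes the next character
            i += 2
        elif ch == "'" and body[i + 1:i + 2] == "'":
            if doubled_quote_is_escape:
                out.append("'")
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


raw = "they're"

# Wrong choice for the assumed dialect: 'Quote' style where '' is not an escape.
actual = eval_literal("'" + escape_quote(raw) + "'", doubled_quote_is_escape=False)
expected = eval_literal("'they''re'", doubled_quote_is_escape=False)

value_check_passes = actual == expected        # both sides lose the quote
length_check_passes = len(actual) == len(raw)  # 6 vs 7

print(value_check_passes)   # True
print(length_check_passes)  # False
```

In this model the value comparison passes because both sides are mangled the same way, while the length comparison against the known length of `they're` fails.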

Checklist

Member

@aranke aranke left a comment

LGTM, I like defining the tests and expected values in SQL.

@dbeatty10 dbeatty10 merged commit dd4b47d into main Jan 11, 2023
@dbeatty10 dbeatty10 deleted the dbeatty/verify-escaped-length branch January 11, 2023 00:26
Contributor

@VersusFacit VersusFacit left a comment

First, we really appreciate that you made those validation PRs up front.

Really intriguing! So, the adapter maintainer picked a macro quoting policy that did not coordinate to the realities of their database system? Curious!

I like how the tests are defined and the numeric confirmation is key.

My only concern is this absolutely insane case that does happen and might be important to clarify:

'{{ escape_single_quotes('they're') }}'

Double quotes really aren't that sacred, and at first glance it'd be hard to know that this is what (at least on Snowflake) the quoting policy function does.

image

^So like, this is a thing...that works

I figure we should add a case to account for this. What's your thought?

@VersusFacit
Contributor

Oops didn't get my message in before you went on the merge

@dbeatty10
Contributor Author

the adapter maintainer picked a macro quoting policy that did not coordinate to the realities of their database system?

Yep! In this case, it was us as the maintainers of dbt-spark 😅. Since it was so easy to accidentally overlook, it provided motivation to open #6566 to make it hard to get wrong.

Based on our DMs, it seems like part of your concern is that we're covering test cases like:
'{{ escape_single_quotes("they're") }}'
but not:
'{{ escape_single_quotes('they're') }}'

I tried the latter in my local environment, and it gave the following error:

>           raise CompilationException(str(e), node) from e
E           dbt.exceptions.CompilationException: Compilation Error in model test_escape_single_quotes (models/test_escape_single_quotes.sql)
E             expected token ',', got 're'
E               line 2
E                 '{{ escape_single_quotes('they're') }}' as actual,

projects/dbt-core/core/dbt/clients/jinja.py:516: CompilationException

I think the issue is that "they're" can be parsed as a valid string literal within Jinja whereas 'they're' cannot.
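As a rough analogy, Python's own parser treats these two quoting styles the same way. The sketch below uses the standard-library `ast` module as a stand-in; it does not run Jinja, and the helper name `parses` is made up for illustration.

```python
import ast


def parses(expression: str) -> bool:
    """Return True if the text is a syntactically valid Python expression."""
    try:
        ast.parse(expression, mode="eval")
        return True
    except SyntaxError:
        return False


# Double-quoted argument: the apostrophe is just a character in the string.
double_quoted = 'escape_single_quotes("they\'re")'

# Single-quoted argument: the string ends after "they", leaving a stray "re'".
single_quoted = "escape_single_quotes('they're')"

print(parses(double_quoted))  # True
print(parses(single_quoted))  # False
```

The second form fails before any escaping function could run, which matches the compilation error above being raised at the template level rather than by the database.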

Based on my current understanding, I'm comfortable with the changes made by this PR. But I'm open to continued discussion if we're missing something important.

@VersusFacit
Contributor

VersusFacit commented Jan 11, 2023

Got it. So, on some db's (like Spark) it'll be a syntax error and on others (like Snowflake), both options will work -- what a nightmare hah.

'{{ escape_single_quotes("they're") }}'

  1. Outermost (the single quotes around the whole expression): SQL level
  2. Innermost (the apostrophe in they're): will be escaped
  3. In-between (the double quotes): what's inside these is passed to the escape function, but not these themselves

My concern was around that last of the three. I hope we have literature or guidance somewhere that makes this clear, because otherwise, that's some deep arcane type magic that would have bothered (among others) early career data engineer me 😄
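For anyone who finds the three layers easier to see in code, here is a small Python sketch of the rendering step. It is a toy stand-in for Jinja, not how dbt renders templates: `render` and the regular expression are hypothetical, and the quote-doubling strategy is just one of the two escaping styles, picked for illustration.

```python
import re

# Matches {{ escape_single_quotes("...") }} and captures only what sits
# between the double quotes (the in-between layer is not captured).
CALL = re.compile(r'\{\{\s*escape_single_quotes\("([^"]*)"\)\s*\}\}')


def escape_single_quotes(value: str) -> str:
    """Illustrative 'Quote' style: double every single quote."""
    return value.replace("'", "''")


def render(template: str) -> str:
    """Replace each macro call with its escaped argument."""
    return CALL.sub(lambda m: escape_single_quotes(m.group(1)), template)


template = """'{{ escape_single_quotes("they're") }}'"""
rendered = render(template)

print(rendered)  # 'they''re'
```

The outer single quotes pass through untouched to SQL, the double quotes disappear because they only delimit the argument, and the apostrophe is the only character the escape function rewrites.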

Agreed this Issue is resolved. Reclosing the issue. Thanks Doug!

Successfully merging this pull request may close these issues.

[CT-1782] [Bug] Check length of escaped strings in the adapter test