Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

yaml quoting not working with NativeEnvironment jinja evaluator #2597

Closed
drewbanin opened this issue Jun 29, 2020 · 12 comments
Closed

yaml quoting not working with NativeEnvironment jinja evaluator #2597

drewbanin opened this issue Jun 29, 2020 · 12 comments
Labels
bug Something isn't working
Milestone

Comments

@drewbanin
Copy link
Contributor

Describe the bug

dbt's NativeEnvironment introduced a functional change to how Jinja strings are evaluated. In dbt v0.17.0, a schema test can no longer be configured with a quoted column name.

Steps To Reproduce

# schema.yml

version: 2
models:
  - name: debug

    columns:
      - name: MyId
        quote: true
        tests:
          - relationships:
              to: ref('debug')
              field: '"MyId"'
-- models/debug.sql

select 1 as "MyId"

Results:

Database Error in test relationships_debug__MyId____MyId___ref_debug_ (models/schema.yml)
  column "myid" does not exist
  LINE 12:     select MyId as id from "analytics"."test_schema"."debug"
                      ^
  HINT:  Perhaps you meant to reference the column "debug.MyId" or the column "child.id".
  compiled SQL at target/compiled/neondwh/models/schema.yml/schema_test/relationships_debug__MyId____MyId___ref_debug_.sql

Expected behavior

I would expect the yaml/jinja string '"MyId"' to be resolved to the string "MyId", not MyId.

The output of dbt --version:

dbt v0.17.0

The operating system you're using: macOS

The output of python --version: 3.7.7

Additional context

Using Jinja2==2.11.2

@drewbanin drewbanin added the bug Something isn't working label Jun 29, 2020
@drewbanin drewbanin added this to the 0.17.1 milestone Jun 29, 2020
@beckjake
Copy link
Contributor

Is this a dupe of #2468 ?

@drewbanin
Copy link
Contributor Author

drewbanin commented Jun 29, 2020

I don't think so -- the issue here is that strings like '"my string"' are being stripped of double quotes, right?

@jtcohen6
Copy link
Contributor

As I understand it, a resolution to #2468 could offer a resolution here too, if debug.MyId were configured with quote: true. But the more straightforward issue is around the ability to pass quotes as part of a string value through the native env, which I think more closely relates to #2531.

Cringe-y, but this does work:

- relationships:
    to: ref('my_first_dbt_model')
    field: '''"MyId"'''

@drewbanin
Copy link
Contributor Author

yeah - I believe that this is a yaml/jinja parsing/rendering issue. If that's the case, we should fix this for 0.17.1. If not, then we can investigate other options. My instinct here is that this isn't directly related to the relationships test, but I have not yet verified that this is the case.

@beckjake
Copy link
Contributor

Well I'm not sure how we can handle that without customizing the yaml/jinja parsing stuff further. Yaml eats one level of quotes, jinja eats another. Neither part really "knows" about quoting. I think the best fix is to honor the quote parameter and tell people to use that.

@drewbanin
Copy link
Contributor Author

These configs aren't typed as database identifiers -- I don't think the quote config is a good general solution to this problem. We need to provide a way for users to supply quoted strings to yaml configs.

Didn't we take specific action to preserve these quotes? I see the _requote_result method in clients/jinja.py but it doesn't appear to me that we're calling it anywhere. I don't recall intentionally ditching that functionality - was removing the call to this function intentional?

@beckjake
Copy link
Contributor

It was removed as part of #2395 because (I thought) we were successfully avoiding calling the renderer on column names. We can put it back, I guess.

@drewbanin
Copy link
Contributor Author

Ok, thanks for the info. I just don't like the idea of dbt needing to be overly smart about where/when it chomps off wrapping quotes. The column name change in #2395 is a good one, but I'd hesitate to add similar logic for the relationships specifically test given that:

  1. Any user could supply their own schema tests which won't benefit from this logic
  2. This behavior should ideally be explicit / consistent for other configs too

Do you recall if there were any specific drawbacks related to the addition of _requote_result? From what I understand, I think it could be a good idea to start calling that method again, but my memory of removing it in the first place is pretty hazy!

@beckjake
Copy link
Contributor

It was just another layer of difficult-to-predict behavior. It had some weird potential special-case behavior where we'd potentially requote things that shouldn't be requoted. I think something like "{{ a }}" ~ "{{ b }}" would get requoted if len(a+b)==19, but not otherwise (may require some fiddling around with curly braces to reproduce, but you get the idea).

@dmarts
Copy link

dmarts commented Jun 30, 2020

Hi all I had this same problem and used the quote parameter to get around it. But couldn't see how to use that with a model-level test e.g.

    - dbt_utils.unique_combination_of_columns:
        combination_of_columns:
          - Company
          - "Purchase Order"
          - Item Number

The missing piece in the workaround was to add square brackets on the column names like so:

    - dbt_utils.unique_combination_of_columns:
        combination_of_columns:
          - [Company]
          - [Purchase Order]
          - [Item Number]

I guess neither yaml and jinja want to eat the square brackets so they make it all the way to Snowflake 👍

Edit: this didn't work after all, so we've moved back to 16 for now 🙃

@drewbanin
Copy link
Contributor Author

hey @dmarts - that's not good. I'm not sure what the right way to address this use-case is at present, but it's definitely something that should work in some way in dbt. Going to give this a think and follow back up here

@drewbanin
Copy link
Contributor Author

errr..... I might have misunderstood your use case! Are you just trying to supply a list of columns to combination_of_columns? In that case, the PR attached above (#2599) should fix this in dbt v0.17.1. This fix will preserve double quotes found inside of a yaml string.

That's a bit of a tricky concept. Let's consider the example:

    - dbt_utils.unique_combination_of_columns:
        combination_of_columns:
          - Company
          - "Purchase Order"
          - '"Item Number"'

In this example:

  • Company is unquoted
  • `"Purchase Order" is wrapped in a single set of quotes
  • '"Item Number"' is wrapped in two sets of quotes (double quotes inside of single quotes)

I think that if you're providing an identifier to a macro like unique_combination_of_columns, you'll need to use the 3rd option shown here. This does not work correctly in 0.17.0, but that's a regression, and we'll have it fixed for 0.17.1.

The two sets of quotes are necessary because this file is first parsed by dbt's YAML parser. To YAML, "Purchase Order" and Purchase Order are both equivalent -- the quotes are optional, and they are stripped by the YAML parser when reading the file. This means that dbt will operate on a string like Purchase Order, which is not a valid identifier when passed into a SQL statement.

So, we need to use two sets of quotes here, and the PR above does just that. I'm going to close this issue out, but I'm more than happy to continue the discussion. Please let me know if I missed anything or if you have any questions about the fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants