Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

preserve empty strings in native env templating #2608

Closed
wants to merge 1 commit into from

Conversation

drewbanin
Copy link
Contributor

Do not merge this PR

Just wanted to call out another issue we're seeing with the Jinja NativeEnv implementation introduced in 0.17.1. This PR fixes a problem where the empty string '' is converted to None.

Example:

version: 2

models:
  - name: debug

    columns:
      - name: id
        tests:
          - accepted_values:
              values: ['', 'a']

Here, the test compiles to ....where id not in ('None', 'a').

I think the number of holes to plug in our current approach to native env rendering is exceeding the number of thumbs we have available to us. Let's regroup and figure out a long-term sustainable approach to type inference for yaml+jinja.

@drewbanin
Copy link
Contributor Author

@cla-bot check

@cla-bot cla-bot bot added the cla:yes label Jul 6, 2020
@cla-bot
Copy link

cla-bot bot commented Jul 6, 2020

The cla-bot has been summoned, and re-checked this pull request!

Copy link
Contributor

@beckjake beckjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should test this, of course...

@drewbanin
Copy link
Contributor Author

Do we want to further restrict the set of types that dbt will convert to? The docs say that literal_eval will:

Safely evaluate an expression node or a string containing a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None.

So, literal_eval will convert string values to tuples, lists, dicts and sets. I think it could be a good idea to restrict this set of destination types to non-container types. I do think that coercing string input to a container type is an awesome feature when used intentionally, but a very bad feature when used by accident. A string like "('abc', 'def')" is a pretty reasonable looking SQL fragment, but in the native env context, it would be interpreted as a Python tuple!

I'd be ok with documenting and supporting this behavior if we're happy living with it for the foreseeable future. If we have any reservations about something like this though, it could be a good idea to remove support today and consider explicit ways to support these container types in a native env in the future

@beckjake
Copy link
Contributor

beckjake commented Jul 6, 2020

so, I know earlier I said that I thought it would be hard to add an as_bool that's the reverse of the default, but maybe I was wrong. What about something like this?

class NativeMarker(str):
    pass


def get_environment(...):
   ...
   env.filters['as_native'] = NativeMarker


def quoted_native_concat(nodes):
    head = list(islice(nodes, 2))

    if not head:   # this turns get_rendered('') into ''
        return ''

    if len(head) == 1:
        raw = head[0]
        if not isinstance(raw, NativeMarker):
            return str(raw)
    else:
        # turn multiple nodes into a string
        return "".join([str(v) for v in chain(head, nodes)])

    # if we got here, there was a single node marked with the 'as_native' filter.
    try:
        return literal_eval(raw)
    except (ValueError, SyntaxError, MemoryError):
        return raw

@beckjake
Copy link
Contributor

beckjake commented Jul 6, 2020

That's kind of a reset to the 0.16 behavior, but it makes the native rendering sort of opt-in at the jinja level, with some caveats around the native code generator behaving slightly differently from the standard code generator. Maybe that's what we always wanted?

@beckjake
Copy link
Contributor

beckjake commented Jul 6, 2020

You could pretty easily extend this with filters that assert the type, too. so we could implement an as_bool filter that ensures the result of literal_eval is a bool, etc... but this seems like a potential one-size-fits-all solution for the issue.

It does suck for anyone who was using enabled: "{{ var('my_var_value') }} and now has to tack an | as_native on the end, but maybe optimizing for them is less important than optimizing for everyone else?

@drewbanin
Copy link
Contributor Author

cool, that approach is broadly aligned with how I was thinking about this too. There's a lot to like about the 0.17.0 behavior, but it has too many quirks to be a sane default. I just keep coming back to the case of key: '00'. Any implementation that evaluates this string out to 0 is just wrong. If the only answer there is making the native env opt-in/explicit, then so be it IMO.

We're going to need to write something up about this change to make sure folks understand why we changed the behavior in 0.17.0 and why we're changing it back in 0.17.1 (or does this make our next release 0.18.0??). I think most folks would agree that making native env rendering explicit is the right decision long-term, and if that's the case, then it will benefit everyone to make this change as soon as possible.

@drewbanin
Copy link
Contributor Author

I confused myself with this one :)

Closing as this was replaced by #2618

See also #2612

@drewbanin drewbanin closed this Jul 8, 2020
@kwigley kwigley deleted the fix/literal_eval_empty_string branch February 12, 2021 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants