Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Add set/zip function to contexts #5107

Merged
merged 8 commits into from
May 13, 2022

Conversation

jeremyyeo
Copy link
Contributor

@jeremyyeo jeremyyeo commented Apr 19, 2022

resolves #2345

Description

Adding the set and zip functions as prescribed in the issue linked above.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have added information about my change to be included in the CHANGELOG.

@cla-bot cla-bot bot added the cla:yes label Apr 19, 2022
@github-actions
Copy link
Contributor

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@jeremyyeo
Copy link
Contributor Author

jeremyyeo commented Apr 20, 2022

@jtcohen6 any thoughts on adding any of the others mentioned in the issue (i.e. any(), all())?

Also had some flake8 fails on:

flake8...................................................................Failed
- hook id: flake8
- exit code: 1

core/dbt/context/base.py:8:1: F401 'dbt.clients.yaml_helper.SafeLoader' imported but unused
core/dbt/context/base.py:8:1: F401 'dbt.clients.yaml_helper.Loader' imported but unused
core/dbt/context/base.py:8:1: F401 'dbt.clients.yaml_helper.Dumper' imported but unused

in my pre-commit, wasn't sure if it was okay to remove or not so I just committed with --no-verify.


Looks like my black formatter ignored the #noqa bit and formatted that particular one-liner into 4 separate lines (I have format on save turned on in VS Code)... changing those imports back to a single line results in a passing pre-commit.

@jeremyyeo jeremyyeo marked this pull request as ready for review April 23, 2022 20:31
@jeremyyeo jeremyyeo requested review from a team as code owners April 23, 2022 20:31
@emmyoop emmyoop requested review from emmyoop and removed request for a team and VersusFacit April 25, 2022 13:34
Copy link
Member

@emmyoop emmyoop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far. Sorry it took a bit to review! We've been having some conversations around the best pattern for functions that we add.

@jeremyyeo jeremyyeo changed the title Feature: Add set function to contexts Feature: Add set/zip function to contexts Apr 27, 2022
@gshank gshank requested a review from emmyoop April 28, 2022 13:29
@gshank
Copy link
Contributor

gshank commented May 2, 2022

Did you want to try creating the "try_set" methods that @emmyoop asked for?

@jeremyyeo
Copy link
Contributor Author

Did you want to try creating the "try_set" methods that @emmyoop asked for?

Hey @gshank yep keen to do that but kind of waiting to see if we will be adding a new special error type to raise (#5167). You reckon I should just proceed with a generic TypeError for now?

def try_set(value):
    try:
        return set(value)
    except TypeError:
        raise TypeError

@gshank
Copy link
Contributor

gshank commented May 3, 2022

It looks like we mainly raise CompilationExceptions in the contexts, partly because Jinja catches the exceptions and we don't have code to distinguish between parse-time and run-time for those errors. For now, just make it a CompilationException? We plan to do some work on organizing exceptions at some point and we can clean it up then.

@jeremyyeo
Copy link
Contributor Author

jeremyyeo commented May 4, 2022

@gshank / @emmyoop what do you think of the try_ methods "skipping" the raise if the user was the one that set up a default return value? Would make it behave similar to other methods, e.g:

var('foo')  
> error if variable 'foo' not defined.

var('foo', 'bar') 
> var 'foo' set to 'bar' if var 'foo' not defined.

@emmyoop
Copy link
Member

emmyoop commented May 4, 2022

@jeremyyeo the try methods shouldn't allow a default. The intention of their use is for when the value really should always be the expected type. ie. if it's not iterable, etc it should be a hard stop. If the project wants to allow a default then the non-try version of the method should be used. We plan to have both versions of these methods where appropriate so the most appropriate one can be used and we don't break any projects going forward.

@gshank
Copy link
Contributor

gshank commented May 5, 2022

Looks like this is getting really close :-). If you can just add a couple of simple tests we can get this wrapped up. Probably a test in tests/functional/context_methods would be good; just something simple like a model that uses the functions.

Copy link
Member

@emmyoop emmyoop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some test changes but otherwise is looking great!

{% set set_result = set([1, 2, 2, 3, "foo", False]) %}
{% set try_set_result = try_set([1, 2, 2, 3, "foo", False]) %}

{% set simple_set = (set_result == try_set_result == set((1, 2, 3, "foo", False))) %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is a bit off since you're using the new method to check the new method.

Instead of using tests I would use macros that log the values of set_result andtry_set_result. Something like:

{% macro validate_set() %}
    {% set set_result = set([1, 2, 2, 3, 'foo', False]) %}
    {{ log("set_result: " ~ set_result) }}
    {% set try_set_result = try_set([1, 2, 2, 3, 'foo', False]) %}
    {{ log("try_set_result: " ~ try_set_result) }}
{% endmacro %}

Then in the test you can capture the logs from run-operation with the --debug flag for the macro and assert the logs show up:

def test_builtin_set_function(self, project):
        _, log_output = run_dbt_and_capture(["--debug", "run-operation", "validate_set"])

        expected_set = {False, 1, 2, 3, 'foo'}
        assert f"set_result: {expected_set}" in log_output
        assert f"try_set_result: {expected_set}" in log_output

Also do the same for zip for consistency.

@jeremyyeo jeremyyeo requested a review from emmyoop May 13, 2022 10:24
Copy link
Member

@emmyoop emmyoop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great @jeremyyeo! Thanks for working with us on the few iterations as we move towards a better way to implement these methods.

I'll get it merged in!

@emmyoop emmyoop merged commit 7e43f36 into main May 13, 2022
@emmyoop emmyoop deleted the feature/2345-more-python-functions branch May 13, 2022 15:25
@emmyoop
Copy link
Member

emmyoop commented May 13, 2022

@jtcohen6 should this be back ported to 1.1.latest?

@leahwicz
Copy link
Contributor

@emmyoop I would wait for this to go out in v1.2 since it's net new functionality. @jtcohen6 let me know if you feel differently

agoblet pushed a commit to BigDataRepublic/dbt-core that referenced this pull request May 20, 2022
* add set function to contexts

* add zip function to contexts

* add changelog

* add try_ equivalents

* remove defaults

* add tests

* update tests
@skwaugh
Copy link

skwaugh commented Jun 17, 2022

Any chance of this making it into 1.2?

@jtcohen6
Copy link
Contributor

jtcohen6 commented Jul 14, 2022

@skwaugh This will be in v1.2!

@jeremyyeo @emmyoop I'm working on docs for this right now (dbt-labs/docs.getdbt.com#1635), and it looks like some of the docstrings need updating. (E.g. all methods include :param default: A default value to return if *args is not iterable, including the ones that do not accept a default argument.)

One other note: To my mind, the implementation of set and try_set here feels exactly backwards. A big inspiration for the try_ nomenclature is Snowflake's try_cast and BigQuery's safe_cast, as opposed to just cast. The idea there is that cast raises an error if the provided column expression / literal value cannot be converted or coerced to the desired type, whereas try_cast/safe_cast return a default value (null) instead.

When using CAST, a query can fail if BigQuery is unable to perform the cast. If you want to protect your queries from these types of errors, you can use SAFE_CAST.

As a dbt user, and as someone familiar with those SQL methods, that's what I was expecting those methods to do, too.

I'm also struggling to come up with a working example where zip fails (raising a TypeError), leading it to return the default value instead. I've tried setting one of the inputs to a non-iterable value:

{% set my_list_a = 1234 %}
{% set my_list_b = ['alice', 'bob', 'another'] %}
{% set my_zip = zip(my_list_a, my_list_b) | list %}

But this just raises:

00:11:08  Encountered an error:
'NoneType' object is not iterable

@jeremyyeo
Copy link
Contributor Author

and it looks like some of the docstrings need updating.

Ah yes - thanks for catching that :)

To my mind, the implementation of set and try_set here feels exactly backwards.

Yeah I think so too.

However, at the same time, I'm pretty sure we were trying to align these new functions (in terms of whether try_ "returns default upon error" or "error upon error") with the behaviour of existing context methods like toyaml, tojson:

def toyaml(
value: Any, default: Optional[str] = None, sort_keys: bool = False
) -> Optional[str]:

Which has no try_ equivalent at the moment. I suppose we could:

  1. Swap try_zip/try_set with _zip/_set but then this will deviate a little bit with the other existing methods toyaml, etc.
  2. Swap try_zip/try_set with _zip/_set AND add try_ equivalents of the existing methods too (try_toyaml)?

I'm also struggling to come up with a working example where zip fails (raising a TypeError)

Non-iters seem to raise TypeError in Python...

image

I think it's because:

{% set my_list_a = 1234 %}
{% set my_list_b = ['alice', 'bob', 'another'] %}
{% set my_zip = zip(my_list_a, my_list_b) %}  -- Here the TypeError is raised and value returned is None
{% set into_list = my_zip | list %} -- Effectively we're doing `list(None)`

And list(None) does result in 'NoneType' object is not iterable.

image

@jtcohen6
Copy link
Contributor

jtcohen6 commented Jul 14, 2022

@jeremyyeo Thanks for the quick response! And good call on the | list filter being the culprit here. I was able to get this example working:

{% set my_list_a = 1234 %}
{% set my_list_b = ['alice', 'bob', 'another'] %}
{% set my_zip = zip(my_list_a, my_list_b, default = []) | list %}
{% do log(my_zip, info = true) %} {# [] #}

I'm catching myself back up on these threads from a few months ago, and just dropped a few thoughts in #5167 (comment). I recall agreeing at the time that:

  • This would necessarily feel inconsistent with toyaml/fromyaml/tojson/fromjson (we don't love how those were implemented, too much in one method)
  • At the risk of some interim inconsistency, it's worth doing the right thing going forward. Later on, we can scope and schedule work to bring the others into line.

My strong preference is to get this right going forward, by giving the right functionality to the right name. Two options for us right now (while we're in the v1.2 RC period):

  1. Switch the behavior of set + try_set, zip + try_zip
  2. Rename try_* methods to indicate the opposite behavior, e.g. strict_

@emmyoop @nathaniel-may open to your thoughts as well!

@jtcohen6
Copy link
Contributor

@nathaniel-may and I had a chance to discuss live. We're definitely lacking a larger "language design" discussion and spec which would make it easier for community members to contribute, and for us to review, these types of changes with confidence. If he's interested, I think @dbeatty10 might be able to help with this :)

We don't have the bandwidth to do that in the immediate term, but it's definitely worth doing. In the meantime, let's move forward with one of the short-term fixes identified above, to include in v1.2.0-rc2, just to avoid shipping functionality that does the exact opposite of what (I suspect) many users will expect.

@emmyoop
Copy link
Member

emmyoop commented Jul 14, 2022

@jtcohen6 I'd lean toward option 2 to avoid breaking users until we've figured out the standard we want to live by. strict is better than try 💯 . And while try has meaning in Snowflake, does it for other adapters? I don't love using try if it only has meaning in Snowflake. I'd rather move away from it entirely.

@jtcohen6
Copy link
Contributor

jtcohen6 commented Jul 14, 2022

@emmyoop Okay, I buy it! If we choose to adopt strict_* as a consistent pattern, it will also be much easier to add in strict_toyaml et al, since it requires no change to the current behavior of toyaml.

@jeremyyeo I can pick this up, unless you have the motivation and capacity to see it through. We'd just need to get the change merged + backported by next Tuesday.

@nathaniel-may
Copy link
Contributor

nathaniel-may commented Jul 14, 2022

minor note: would you consider toyaml_strict so in an alphabetized docs page they show up next to each other?

Clarification question: If I see a function with strict in its name, what exactly do I expect from its behavior?

@jtcohen6
Copy link
Contributor

jtcohen6 commented Jul 14, 2022

minor note: would you consider toyaml_strict so in an alphabetized docs page they show up next to each other?

👍 good call! all about it

Clarification question: If I see a function with strict in its name, what exactly do I expect from its behavior?

As I understand it, "strict" methods:

  • should raise a standard exception if the value cannot be converted to a set / yaml / json, zipped, etc
  • should not not allow for a default value to gracefully fail over to

Whereas the "non-strict" versions of those methods return None if the conversion/serialization fails (or a configurable default instead of None).

IIRC, we had previously discussed strict: bool = False as an additional optional argument of these methods, but decided it would be better to implement separate methods / variants.

(One thought: Should it be to_set() and to_set_strict()? Not a hill we must fall down on right now.)

@jeremyyeo
Copy link
Contributor Author

@jtcohen6 Took a crack at that tags/v1.2.0rc1...rename-strict-mode

Should we do toyaml and tojson at this time too?

@jtcohen6
Copy link
Contributor

@jeremyyeo That branch looks great! Want to open a PR for it?

I just opened a dedicated issue for this switch: #5475

The timeline to do fromjson/tojson/fromyaml/toyaml is not as pressing. The reason to rename try_set and try_zip right now is that they've already been included in an RC, and not yet in a final release, so we only have one week to correct it before we've made a commitment to backwards compatibility. That said, we should definitely make sure the work is tracked (in either #5475 or #5167), clearly defined, and (probably) open for community contribution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make zip function available in Jinja
7 participants