Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add macro to get columns #516

Merged
merged 13 commits into from
Mar 28, 2022
Merged

add macro to get columns #516

merged 13 commits into from
Mar 28, 2022

Conversation

patkearns10
Copy link
Contributor

@patkearns10 patkearns10 commented Mar 11, 2022

This is a:

  • bug fix PR with no breaking changes — please ensure the base branch is main
  • new functionality — please ensure the base branch is the latest dev/ branch
  • a breaking change — please ensure the base branch is the latest dev/ branch

Description & motivation

Similar to dbt_utils.get_column_values, except this pulls column names, rather than the values within a column. Like dbt_utils.star, but as a list, to be iterated over.

This is basically dbt_utils.star, but removing some of the last steps.

Unsure what the purpose of the top couple lines of star macro are:

{% macro star(from, relation_alias=False, except=[], prefix='', suffix='') -%}
    {{ return(adapter.dispatch('star', 'dbt_utils')(from, relation_alias, except, prefix, suffix)) }}
{% endmacro %}

{% macro default__star(from, relation_alias=False, except=[], prefix='', suffix='') -%}
    {%- do dbt_utils._is_relation(from, 'star') -%}
    {%- do dbt_utils._is_ephemeral(from, 'star') -%}

So I left those out. But I assume maybe that is important?

Checklist

  • I have verified that these changes work locally on the following warehouses (Note: it's okay if you do not have access to all warehouses, this helps us understand what has been covered)
    • BigQuery
    • Postgres
    • Redshift
    • Snowflake
  • I followed guidelines to ensure that my changes will work on "non-core" adapters by:
    • dispatching any new macro(s) so non-core adapters can also use them (e.g. the star() source)
    • using the limit_zero() macro in place of the literal string: limit 0
    • using dbt_utils.type_* macros instead of explicit datatypes (e.g. dbt_utils.type_timestamp() instead of TIMESTAMP
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)
  • I have added an entry to CHANGELOG.md

@patkearns10 patkearns10 requested a review from joellabes March 11, 2022 02:47
Copy link
Contributor

@joellabes joellabes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @patkearns10, thanks for opening this!

Unsure what the purpose of the top couple lines of star macro are:

{% macro star(from, relation_alias=False, except=[], prefix='', suffix='') -%}
    {{ return(adapter.dispatch('star', 'dbt_utils')(from, relation_alias, except, prefix, suffix)) }}
{% endmacro %}

{% macro default__star(from, relation_alias=False, except=[], prefix='', suffix='') -%}
    {%- do dbt_utils._is_relation(from, 'star') -%}
    {%- do dbt_utils._is_ephemeral(from, 'star') -%}

The first chunk is to enable multiple dispatch: https://docs.getdbt.com/reference/dbt-jinja-functions/dispatch. In short, they allow other adapters to provide an override version of this macro, if they need to.

The is_relation and is_ephemeral blocks then check that the thing you passed in is a real ref/source object, and that it's not ephemeral (because CTEs can't be used with this, because they don't go into the info schema).

So you do need all parts of that 😬

I'd be very interested in whether the star macro could call this new macro directly, instead of having two copies of the exclude code etc. I haven’t looked at whether that's possible or not, but it feels reasonable - would love if you had a go as an extra for experts!

README.md Outdated Show resolved Hide resolved
@patkearns10
Copy link
Contributor Author

Okay I moved some things around!
the star macro now references the get_columns macro.

tested it works locally, by recreating the macros and removing the dispatch logic. How do I test the dispatch logic?

get_columns macro:
Screen Shot 2022-03-23 at 10 15 29 AM

call get_columns:
Screen Shot 2022-03-23 at 10 30 15 AM

call get_columns with except:
Screen Shot 2022-03-23 at 10 30 39 AM

replace within local star macro:
Screen Shot 2022-03-23 at 10 15 20 AM

call local star macro:
Screen Shot 2022-03-23 at 10 15 07 AM

call local star macro with except:
Screen Shot 2022-03-23 at 10 38 15 AM

@patkearns10 patkearns10 requested a review from joellabes March 22, 2022 23:38
Copy link
Contributor

@joellabes joellabes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great :chefs_kiss: Just a couple of naming/doc tweaks for clarity. As for how to test the dispatched stuff, you'd want to add a dbt Singular test which validates the behaviour you expect.

Something along similar lines to how codegen is tested: https://github.com/dbt-labs/dbt-codegen/blob/main/integration_tests/tests/test_generate_source.sql - I don't know if there's a way to compare two jinja lists' contents, or whether you'd have to turn it into a comma-separated string and compare that to an expected value 🤷

macros/sql/star.sql Outdated Show resolved Hide resolved
macros/sql/get_columns.sql Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
macros/sql/get_columns.sql Outdated Show resolved Hide resolved
patkearns10 and others added 7 commits March 24, 2022 10:01
Co-authored-by: Joel Labes <joel.labes@dbtlabs.com>
    -----------
    Actual:
    -----------
    --->"field_3"as "test_field_3"<---

    -----------
    Expected:
    -----------
    --->"field_3" as "test_field_3"<---
@patkearns10
Copy link
Contributor Author

Added tests, documentation, and a new (opt-in) argument for lowercase output:

star UPPER:

Screen Shot 2022-03-24 at 1 41 15 PM

star lower:

Screen Shot 2022-03-24 at 1 40 55 PM

get_filtered_columns UPPER:

Screen Shot 2022-03-24 at 1 44 37 PM

get_filtered_columns lower:

Screen Shot 2022-03-24 at 1 42 11 PM

integration tests:

Screen Shot 2022-03-24 at 1 39 45 PM

@patkearns10 patkearns10 requested a review from joellabes March 24, 2022 02:46
Copy link
Contributor

@joellabes joellabes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big fan overall! My one complaint is the customisable lowercasing - since you can just pass it into the lower filter, I don't think adding two extra params adds enough value. I talk about that more on one of the specific comments - lmk what you think

README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
integration_tests/tests/sql/test_star_output_lower.sql Outdated Show resolved Hide resolved

{% else %}

select 'ok' limit 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be {{ limit_zero() }} - not all databases support limit clauses (notably T-SQL)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!
@joellabes -- this is no longer required, as I switched to using the seed/model method. Should I remove this file then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that makes sense - the Core team are working on a wider testing suite that I hope will also cover packages - if that wasn't in the pipeline then I'd suggest we kept this around because "we’d need it one day" but I think we'll have a more long term solution soon.

@patkearns10
Copy link
Contributor Author

Removed lowercase and swapped to different testing method.

Screen Shot 2022-03-26 at 12 03 52 PM
Screen Shot 2022-03-26 at 12 03 38 PM

Curious some other tests were failing 🤔 but I didn't touch those and don't really want to dig into that.
Screen Shot 2022-03-26 at 12 03 23 PM

@patkearns10 patkearns10 requested a review from joellabes March 26, 2022 01:19
@joellabes
Copy link
Contributor

OK I looked into this, and you're off the hook for the tests - they're checking target.name for snowflake instead of target.type so aren't lowercasing the properties 😬
image

https://github.com/dbt-labs/dbt-utils/blob/main/integration_tests/models/sql/test_unpivot.sql#L18-L21

And the test_get_column_values_use_default one I assume came from running dbt build or something - it has a post-hook which drops the seed after it's created which is a bit finnicky unless it's done as seed/run/test.

These won't break on the CI jobs, (idk why that isn't running 🤔) so I think we're good to merge! Thanks for getting this done 🥳

Copy link
Contributor

@joellabes joellabes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😍 😍

@joellabes joellabes merged commit d279542 into main Mar 28, 2022
@joellabes joellabes removed the pending label Mar 28, 2022
joellabes added a commit that referenced this pull request Apr 6, 2022
* Update README.md

* Mutually excl range examples in disclosure triangle

* Fix union_relations error when no include/exclude provided

* Fix union_relations error when no include/exclude provided (#509)

* Update CHANGELOG.md

* Add to_condition to relationships where

* very minor nit - update "an new" to "a new" (#519)

* add quoting to split_part (#528)

* add quoting to split_part

* update docs for split_part

* typo

* corrected readme syntax

* revert and update to just documentation

* add new line

* Update README.md

* Update README.md

* Update README.md

Co-authored-by: Joel Labes <joel.labes@dbtlabs.com>

* add macro to get columns (#516)

* add macro to get columns

* star macro should use get_columns

* add adapter.

* swap adapter for dbt_utils

Co-authored-by: Joel Labes <joel.labes@dbtlabs.com>

* update documentation

* add output_lower arg

* update name to get_filtered_columns_in_relation from get_columns

* add tests

* forgot args

* too much whitespace removal

    -----------
    Actual:
    -----------
    --->"field_3"as "test_field_3"<---

    -----------
    Expected:
    -----------
    --->"field_3" as "test_field_3"<---

* didnt mean to move a file that i did not create. moving things back.

* remove lowercase logic

* limit_zero

Co-authored-by: Joel Labes <joel.labes@dbtlabs.com>

* Add listagg macro and integration test

* remove type in listagg macro

* updated integration test

* Add redshift to listagg macro

* remove redshift listagg

* explicitly named group by column

* updated default values

* Updated example to use correct double vs. single quotes

* whitespace control

* Added redshift specific macro

* Remove documentation

* Update integration test so less likely to accidentally work

Co-authored-by: Joel Labes <joel.labes@dbtlabs.com>

* default everything but measure to none

* added limit functionality for other dbs

* syntax bug for postgres

* update redshift macro

* fixed block def control

* Fixed bug in redshift

* Bug fix redshift

* remove unused group_by arg

* Added additional test without order by col

* updated to regex replace

* typo

* added more integration_tests

* attempt to make redshift less complicated

* typo

* update redshift

* replace to substr

* More explicit versions with added complexity

* handle special characters

Co-authored-by: Joel Labes <joel.labes@dbtlabs.com>
Co-authored-by: Jamie Rosenberg <james.rosenberg@canva.com>
Co-authored-by: Pat Kearns <pat.kearns@fishtownanalytics.com>
dbeatty10 pushed a commit that referenced this pull request Apr 7, 2022
* Fix/timestamp withought timezone (#458)

* timestamp and changelog updates

* changelog fix

* Add context for why change to no timezone

Co-authored-by: Joel Labes <joel.labes@dbtlabs.com>

* also ignore dbt_packages (#463)

* also ignore dbt_packages

* Update CHANGELOG.md

* Update CHANGELOG.md

Co-authored-by: Joel Labes <joel.labes@dbtlabs.com>

* date_spine: transform comment to jinja (#462)

* Have union_relations raise exception when include parameter results in no columns (#473)

* Raise exception if no columns in column_superset

* Add relation names to compiler error message

* Add `union_relations` fix to changelog

* Added case for handling postgres foreign tables... (#476)

* Add link for fewer_rows_than schema test in docs (#465)

* Added case for handling postgres foreign tables (tables which are external to current database and are imported into current database from remote data stores by using Foreign Data Wrappers functionallity).

* Reworked getting of postges table_type.

* Added needed changes to CHANGELOG.

Co-authored-by: José Coto <jlcoto@users.noreply.github.com>
Co-authored-by: Taras Stetsiak <tstetsiak@health-union.com>

* Enhance usability of star macro by only generating column aliases when prefix and/or suffix is specified (#468)

* The star macro should only produce column aliases when there is either a prefix or suffix specified.

* Enhanced the readme for the star macro.

* Add new integration test

Co-authored-by: Nick Perrott <nperrott@roiti.com>
Co-authored-by: Josh Elston-Green
Co-authored-by: Joel Labes <joel.labes@dbtlabs.com>

* fix: extra brace typo in insert_by_period_materialization (#480)

* Support quoted column names in sequential_values test (#479)

* Add any value (#501)

* Add link for fewer_rows_than schema test in docs (#465)

* Update get_query_results_as_dict example to demonstrate accessing columnar results as dictionary values (#474)

* Update get_qu
ery_results_as_dict example to demonstrate accessing columnar results as dictionary values

* Use slugify in example

* Fix slugify example with dbt_utils. package prefix

Co-authored-by: Elize Papineau <elize.papineau@dbtlabs.com>

* Add note about not_null_where deprecation to Readme (#477)

* Add note about not_null_where deprecation to Readme

* Add docs to unique_where test

* Update pull_request_template.md to reference `main` vs `master` (#496)

* Correct coalesce -> concatenation typo (#495)

* add any_value cross-db macro

* Missing colon in test

* Update CHANGELOG.md

Co-authored-by: José Coto <jlcoto@users.noreply.github.com>
Co-authored-by: Elize Papineau <elizepapineau@gmail.com>
Co-authored-by: Elize Papineau <elize.papineau@dbtlabs.com>
Co-authored-by: Joe Ste.Marie <stemarie.joe@gmail.com>
Co-authored-by: Niall Woodward <niall@niallrees.com>

* Fix changelog

* Second take at fixing pivot to allow single quotes (#503)

* fix pivot : in pivoted column value, single quote must be escaped (on postgresql)
else ex. syntax error near : when color = 'blue's'

* patched expected

* single quote escape : added dispatched version of the macro to support bigquery & snowflake

* second backslash to escape in Jinja, change case of test file columns

Let's see if other databases allow this

* explicitly list columns to compare

* different tests for snowflake and others

* specific comparison seed

* Don't quote identifiers for apostrophe, to avoid BQ and SF problems

* Whitespace management for macros

* Update CHANGELOG.md

Co-authored-by: Marc Dutoo <marc.dutoo@gmail.com>

* Add bool or cross db (#504)

* Create bool_or cross-db func

* Forgot a comma

* Update CHANGELOG.md

* Code review tweaks

* Fix union_relations error when no include/exclude provided (#509)

* Update CHANGELOG.md

* Add _is_ephemeral test to get_column_values (#518)

* Add _is_ephemeral test

Co-authored-by: Elize Papineau <elize.papineau@dbtlabs.com>

* Add deduplication macro (#512)

* Update README.md

* Mutually excl range examples in disclosure triangle

* Fix union_relations error when no include/exclude provided

* Fix union_relations error when no include/exclude provided (#509)

* Update CHANGELOG.md

* Add dedupe macro

* Add test for dedupe macro

* Add documentation to README

* Add entry to CHANGELOG

* Implement review

* Typed materialized views as views (#525)

* Typed materialized views as views

* Update get_relations_by_pattern.sql

* Moving fix from get_tables_by_pattern_sql

reverting changes to this file to add a fix to the macro get_tables_by_pattern_sql

* removing quoting from table_type

removing quoting from table_type as this was causing an error when calling this macro within get_tables_by_pattern_sql

* calling get_table_types_sql for materialized views

calling get_table_types_sql macro to handle materialized views in sources.

* Add `alias` argument to `deduplicate` macro (#526)

* Add `alias` argument to `deduplicate

* Test `alias` argument

* Rename `alias` to `relation_alias`

* Fix/use generic test naming style instead of schema test (#521)

* Updated Rreferences to 'schema test' in README along with small improvements to test descriptions.  Updates were also carried out in folder structure and integration README

* Updated references to 'schema test' in Changelog

* updated changelog with changes to documentation and fproject file structure

* Apply suggestions from code review

Update macro descriptions to be "asserts that"

* Update CHANGELOG.md

* Update README.md

Co-authored-by: Joel Labes <joel.labes@dbtlabs.com>

* Remove extraneous whitespace (#529)

* rm whitespace from date_trunc

* datediff

* rm uncessary whitespace control

* change log

* fix CHANGELOG

* address comments

* Feature/add listagg macro (#530)

* Update README.md

* Mutually excl range examples in disclosure triangle

* Fix union_relations error when no include/exclude provided

* Fix union_relations error when no include/exclude provided (#509)

* Update CHANGELOG.md

* Add to_condition to relationships where

* very minor nit - update "an new" to "a new" (#519)

* add quoting to split_part (#528)

* add quoting to split_part

* update docs for split_part

* typo

* corrected readme syntax

* revert and update to just documentation

* add new line

* Update README.md

* Update README.md

* Update README.md

Co-authored-by: Joel Labes <joel.labes@dbtlabs.com>

* add macro to get columns (#516)

* add macro to get columns

* star macro should use get_columns

* add adapter.

* swap adapter for dbt_utils

Co-authored-by: Joel Labes <joel.labes@dbtlabs.com>

* update documentation

* add output_lower arg

* update name to get_filtered_columns_in_relation from get_columns

* add tests

* forgot args

* too much whitespace removal

    -----------
    Actual:
    -----------
    --->"field_3"as "test_field_3"<---

    -----------
    Expected:
    -----------
    --->"field_3" as "test_field_3"<---

* didnt mean to move a file that i did not create. moving things back.

* remove lowercase logic

* limit_zero

Co-authored-by: Joel Labes <joel.labes@dbtlabs.com>

* Add listagg macro and integration test

* remove type in listagg macro

* updated integration test

* Add redshift to listagg macro

* remove redshift listagg

* explicitly named group by column

* updated default values

* Updated example to use correct double vs. single quotes

* whitespace control

* Added redshift specific macro

* Remove documentation

* Update integration test so less likely to accidentally work

Co-authored-by: Joel Labes <joel.labes@dbtlabs.com>

* default everything but measure to none

* added limit functionality for other dbs

* syntax bug for postgres

* update redshift macro

* fixed block def control

* Fixed bug in redshift

* Bug fix redshift

* remove unused group_by arg

* Added additional test without order by col

* updated to regex replace

* typo

* added more integration_tests

* attempt to make redshift less complicated

* typo

* update redshift

* replace to substr

* More explicit versions with added complexity

* handle special characters

Co-authored-by: Joel Labes <joel.labes@dbtlabs.com>
Co-authored-by: Jamie Rosenberg <james.rosenberg@canva.com>
Co-authored-by: Pat Kearns <pat.kearns@fishtownanalytics.com>

* patch default behaviour in get_column_values (#533)

* Update changelog, add missing quotes around get_table_types_sql

* rm whitespace

Co-authored-by: Joe Markiewicz <74217849+fivetran-joemarkiewicz@users.noreply.github.com>
Co-authored-by: Anders <swanson.anders@gmail.com>
Co-authored-by: Mikaël Simarik <mikael.simarik@gmail.com>
Co-authored-by: Graham Wetzler <graham@wetzler.dev>
Co-authored-by: Taras <32882370+Aesthet@users.noreply.github.com>
Co-authored-by: José Coto <jlcoto@users.noreply.github.com>
Co-authored-by: Taras Stetsiak <tstetsiak@health-union.com>
Co-authored-by: nickperrott <46330920+nickperrott@users.noreply.github.com>
Co-authored-by: Nick Perrott <nperrott@roiti.com>
Co-authored-by: Ted Conbeer <tconbeer@users.noreply.github.com>
Co-authored-by: Armand Duijn <armandduijn@users.noreply.github.com>
Co-authored-by: Elize Papineau <elizepapineau@gmail.com>
Co-authored-by: Elize Papineau <elize.papineau@dbtlabs.com>
Co-authored-by: Joe Ste.Marie <stemarie.joe@gmail.com>
Co-authored-by: Niall Woodward <niall@niallrees.com>
Co-authored-by: Marc Dutoo <marc.dutoo@gmail.com>
Co-authored-by: Judah Rand <17158624+judahrand@users.noreply.github.com>
Co-authored-by: Luis Leon <98919783+luisleon90@users.noreply.github.com>
Co-authored-by: Brid Moynihan <moynihanbridgit@gmail.com>
Co-authored-by: SunriseLong <44146580+SunriseLong@users.noreply.github.com>
Co-authored-by: Grace Goheen <53586774+graciegoheen@users.noreply.github.com>
Co-authored-by: Jamie Rosenberg <james.rosenberg@canva.com>
Co-authored-by: Pat Kearns <pat.kearns@fishtownanalytics.com>
Co-authored-by: James McNeill <55981540+jpmmcneill@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants