Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added new macros array_append and array_construct and integration tests #595

Merged
merged 18 commits into from
Jun 14, 2022

Conversation

graciegoheen
Copy link
Contributor

@graciegoheen graciegoheen commented May 18, 2022

This is a:

  • documentation update
  • bug fix with no breaking changes
  • new functionality
  • a breaking change

All pull requests from community contributors should target the main branch (default).

Description & motivation

Created two new cross db macros array_construct and array_append for use in a package the dbt Labs pro-serv team is developing.

Additionally created cross db macros cast_array_to_string and array_concat to be able to implement integration tests.

Checklist

  • I have verified that these changes work locally on the following warehouses (Note: it's okay if you do not have access to all warehouses, this helps us understand what has been covered)
    • BigQuery
    • Postgres
    • Redshift
    • Snowflake
  • I followed guidelines to ensure that my changes will work on "non-core" adapters by:
    • dispatching any new macro(s) so non-core adapters can also use them (e.g. the star() source)
    • using the limit_zero() macro in place of the literal string: limit 0
    • using dbt_utils.type_* macros instead of explicit datatypes (e.g. dbt_utils.type_timestamp() instead of TIMESTAMP
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)
  • I have added an entry to CHANGELOG.md

@graciegoheen graciegoheen self-assigned this May 18, 2022
@graciegoheen
Copy link
Contributor Author

graciegoheen commented May 18, 2022

Open to suggestions here as to the best way to test these macros! I can't seem to store a column of ARRAY type in a seed file, so am currently casting the arrays as strings in order to compare.

Note: There is no integration test on cast_array_to_string.

@graciegoheen graciegoheen requested a review from dbeatty10 May 19, 2022 15:47
Copy link
Contributor

@dbeatty10 dbeatty10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cool stuff @graciegoheen 🤩 ! The integration tests can be some of the trickiest parts -- impressed what you were able to do there.

Initial thought

  • In light of us migrating all cross-database macros into dbt-core, 1) we'll need to figure out if these are considered "cross-database macros" or not, and 2) if so, do we want to include them in the current migration effort into adapters or not.

Would love to hear opinions from you and others!

These seem like cross-database macros to me. I don't have strong opinions whether they should be included in the migration or not. On one hand, it would be nice to get any foundational macros all the way into core. On the other, we've been attracted to the idea that the macros we're migrating have been vetted by years of use.

Idea for an additional macro

In Python, append adds an element to a list, and extend concatenates the first list with another list.

Wondering if we might want to add complementary macro named array_extend or array_concat? The former would align with Python naming conventions, whereas the latter would align with SQL naming conventions.

Taking a quick look at array functions in different databases, it looks like

Database Append single item to array Extend one array with another
Postgres array_append array_cat
Snowflake array_append array_cat
Redshift N/A array_concat
BigQuery N/A array_concat

@graciegoheen
Copy link
Contributor Author

Hi @dbeatty10 - thanks for the great feedback!

  • I think these are considered "cross-database macros" in that they produce the same output regardless of the db you're using
  • Regardless of if they should be migrated to core or not, I think they do belong in dbt-utils for the time being (otherwise these will probably just exist with the dbt_project_evaluator package as they do currently)
  • I like the idea of adding an array_concat macro, I can spend some time developing what that would look like.

@graciegoheen graciegoheen changed the title Added new macros array_append and create_array and integration tests Added new macros array_append and array_construct and integration tests May 25, 2022
@graciegoheen
Copy link
Contributor Author

array_concat has been added!

@graciegoheen graciegoheen requested review from b-per and dbeatty10 May 25, 2022 13:27
@b-per
Copy link
Contributor

b-per commented May 26, 2022

🚀

Quick questions on my side about tests. Should we add a few tests to verify the behavior with NULLs? For example:

  • array_append with a NULL element to append
  • array_append with a NULL array and a NON NULL element
  • array_append with a NULL array and a NULL element
  • array_concat with some NULL array (on the left and/or right)
  • cast_array_to_string with a NULL array

@graciegoheen
Copy link
Contributor Author

graciegoheen commented Jun 6, 2022

@b-per For bigquery: raises an error if the query result has an ARRAY which contains NULL elements, although such an ARRAY can be used inside the query.

Do you think these array functions should prevent you from having a NULL element of the array? Or should I just add a note in the documentation? Not sure how to handle this...

@b-per
Copy link
Contributor

b-per commented Jun 7, 2022

Hmm. To be honest, I was actually not sure what would be the outcome of adding NULL to arrays, (either, actually adding a NULL or not doing anything) and wanted to make sure that the behaviour was the same across all DWs.

BigQuery can't return an array with NULL values but couldn't we use cast_array_to_string that you created? Their docs mention that calling FORMAT("%T", [1, NULL, 3]) works for example.

@graciegoheen
Copy link
Contributor Author

graciegoheen commented Jun 7, 2022

A couple of things:

  • cast_array_to_string currently uses string_agg for bigquery (a work-around to array_to_string only supporting string type arrays) which doesn't allow for NULL values. We could switch this to FORMAT (but that will also add spaces in between each array element, so will need to adjust that so that it matches cross-db).
  • the bigger question is probably, does this still count as "cross-db compatible" if you can't output an array with a NULL value (for example, if you run this and try to select * from test_array_concat with an a NULL value in the array, you get an error in BigQuery)

Copy link
Contributor

@dbeatty10 dbeatty10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks awesome @graciegoheen 🤩

Glad that you and @b-per discovered that BigQuery does not allowing arrays containing nulls in select statements.

That leaves us with a range of options for moving forward, including:

  1. Include the affected array functions in dbt-utils with a documented note
  2. Include the affected array functions in dbt-utils, but throw a "not supported" error for bigquery
  3. Decide not to include the affected array functions in dbt-utils

Option 1 feels like the best balance of functionality and consistent behavior, so let's ship it! :shipit:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants