
Handle more than 100 fields to compute hashid #970

Merged: 2 commits into master from chris/fix-normalization-hashid on Nov 13, 2020

Conversation

Contributor

@ChristopheDuong ChristopheDuong commented Nov 13, 2020

What

When running normalization on large tables, we combine all columns with the concat function and compute a hash over the result to generate a surrogate key, which can then be used as a primary key / foreign key in nested tables.

Unfortunately, the current code fails on Postgres, because that SQL engine does not allow passing more than 100 arguments to a function. Closes #913.

How

Override the dbt_utils.concat and dbt_utils.surrogate_key macros to handle more than 100 columns on Postgres by using the binary || concatenation operator as an alternative to the concat function.
(This should properly be fixed in the dbt-utils repository, but we will override it for the moment.)

https://www.postgresqltutorial.com/postgresql-concat-function/
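The idea behind the override can be sketched in Python (the function names below are illustrative, not the actual macro code, and the real dbt_utils macros also handle casting and null-separator details): instead of emitting `concat(col1, col2, ...)` with one argument per column, we chain the binary `||` operator, which has no argument-count limit on Postgres.

```python
def postgres_concat(expressions):
    """Build a Postgres concatenation expression by chaining the binary
    || operator instead of calling concat(...), which is limited to
    100 arguments on Postgres."""
    if not expressions:
        return "''"
    return " || ".join(expressions)


def surrogate_key(columns):
    """Hash the concatenated (null-safe, varchar-cast) columns into a
    surrogate key, in the spirit of dbt_utils.surrogate_key."""
    casted = [f"coalesce(cast({col} as varchar), '')" for col in columns]
    return f"md5({postgres_concat(casted)})"
```

For example, `surrogate_key(["id", "name"])` produces a single `md5(... || ...)` expression that stays valid regardless of how many columns are chained together.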

hash_node_columns = jinja_call(f"dbt_utils.surrogate_key([\n {hash_node_columns}\n ])")
# Disable dbt_utils.surrogate_key in favor of our own version to fix a bug with Postgres (#913).
# hash_node_columns = jinja_call(f"dbt_utils.surrogate_key([\n {hash_node_columns}\n ])")
# We should re-enable it when our PR to dbt_utils is merged
Contributor


Can you add the link to the dbt-utils PR?

Contributor Author

@ChristopheDuong ChristopheDuong Nov 13, 2020


The dbt-utils PR is here: dbt-labs/dbt-utils#296

@sherifnada
Contributor

Don't forget to publish the image once it's good to go.

Contributor

@jrhizor jrhizor left a comment


Looks good. Not blocking merging this PR, but I would feel safer with a test for this behavior in the destination integration tests.

@ChristopheDuong ChristopheDuong merged commit 4a52515 into master Nov 13, 2020
@ChristopheDuong ChristopheDuong deleted the chris/fix-normalization-hashid branch November 13, 2020 21:36
roman-yermilov-gl added a commit that referenced this pull request Jan 19, 2023
* Source Github: handle 502 Bad Gateway with proper log message

* Source Github: bump version

* Source Github: update changelog for new 0.3.12 version

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
etsybaev pushed a commit that referenced this pull request Jan 19, 2023
Development

Successfully merging this pull request may close these issues.

normalization fails with >100 fields in a stream
4 participants