Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Postgres: Prevent temp relation identifiers from being too long #2850

Merged
merged 3 commits into from
Nov 9, 2020

Conversation

elexisvenator
Copy link
Contributor

@elexisvenator elexisvenator commented Oct 26, 2020

related #2197

Description

The currently postgres make_temp_relation adds a 29 character suffix to the end of the temp relation identifier (9 from default suffix and 20 from timestamp). This is a problem now that relations with more than 63 characters raise exceptions.
The fix is to shorten the suffix and also trim the base_relation identifier so that the total length is always less than 63 characters.

An exception can also be raised if the default suffix is overridden with a value that is too long.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot cla-bot bot added the cla:yes label Oct 26, 2020
Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR, @elexisvenator! Really cleverly done. This will enable us to say, once and for all, that 63 is the number we shall count, and the number of the counting shall be 63, regardless of materialization type.

Quick bits before we can merge this:

  1. Can you add a changelog note?
  2. I would have expected a failing test here. (More specifically, a pass where we don't expect it.) Could you take a quick look at 063_relation_name_tests and adjust so that we can test for (a) passage when a postgres incremental model name is ~60 chars (thanks to this PR), (b) failure when a postgres model name is itself >63 chars?

{% if suffix_length > relation_max_name_length %}
{% do exceptions.raise_compiler_error('Temp relation suffix is too long (' ~ suffix|length ~ ' characters). Maximum length is ' ~ (relation_max_name_length - dtstring|length) ~ ' characters.') %}
{% endif %}
{% set tmp_identifier = base_relation.identifier[:relation_max_name_length - suffix_length] ~ suffix ~ dtstring %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hesitate about replacing one silent truncation (by postgres) with another (by dbt), but I suppose this is only for temp relation naming, for use mid-materialization, and it won't have any effect on how users actually reference the final model

@elexisvenator
Copy link
Contributor Author

This will enable us to say, once and for all, that 63 is the number we shall count, and the number of the counting shall be 63, regardless of materialization type.

Almost, the actual limit is 51 due to most materializations creating persisted backup tables with the suffix __dbt_backup

I will add the changes tonight 👍

@jtcohen6
Copy link
Contributor

Almost, the actual limit is 51 due to most materializations creating persisted backup tables with the suffix __dbt_backup
make_temp_relation

Yes, that's silly and certainly for historical reasons. We should just use make_temp_relation in the table and view materializations as well, rather than rehashing all the logic for tmp_identifier and intermediate_relation.

If that's a change you wanted to address in this PR, I'd support it. You certainly don't have to, though, it's at the edge of the scope.

@elexisvenator
Copy link
Contributor Author

I'm less certain about touching backup tables. They are persisted and I frequently see them continuing to exist especially if dbt fails for some reason during a run. If they have a unique generated name then it wouldn't be possible to clean these up automatically.

@jtcohen6
Copy link
Contributor

jtcohen6 commented Oct 28, 2020

Ok, that's a fair point. I wouldn't want to give globally unique names to backup_relation, only to intermediate_relation, but there's still the matter of the suffix length. I'd like to give this more dedicated thought, but I think that's going to be a while from now, when it's finally necessary and proper to clean up our builtin materializations.

For the time being, then, 51 shall be the order of the day.

Related: dbt-labs#2197 

The currently postgres `make_temp_relation` adds a 29 character suffix to the end of the temp relation identifier (9 from default suffix and 20 from timestamp).  This is a problem now that relations with more than 63 characters raise exceptions. 
The fix is to shorten the suffix and also trim the base_relation identifier so that the total length is always less than 63 characters.

An exception can also be raised if the default suffix is overridden with a value that is too long.
@elexisvenator
Copy link
Contributor Author

I updated the integration tests to check up to 51 characters, and do a separate check that triggers make_temp_relation in an incremental model. There is also a skipped test that would pass if a way to deal with the __dbt_backup is found.

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for this contribution @elexisvenator!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants