Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't provide a diststyle to redshift tables when diststyle is configured as "auto" #2246

Closed
1 of 5 tasks
drewbanin opened this issue Mar 25, 2020 · 2 comments · Fixed by #2298
Closed
1 of 5 tasks
Labels
bug Something isn't working good_first_issue Straightforward + self-contained changes, good for new contributors! redshift

Comments

@drewbanin
Copy link
Contributor

Describe the bug

Redshift does not support explicitly setting a diststyle to auto. Instead, if a user configures a diststyle as "auto", dbt should ignore the config. This compels Redshift to use the "auto" diststyle.

See this thread for more info: #1882

Steps To Reproduce

-- models/my_model.sql

{{ config(materialized='table', dist='auto') }}

select 1 as id

The model will fail to build indicating that auto is not a valid diststyle.

System information

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

0.16.0
@drewbanin drewbanin added bug Something isn't working redshift good_first_issue Straightforward + self-contained changes, good for new contributors! labels Mar 25, 2020
@drewbanin drewbanin added this to the Octavius Catto milestone Mar 25, 2020
@rodrigodelmonte
Copy link
Contributor

rodrigodelmonte commented Mar 30, 2020

Hi @drewbanin

Thanks for keeping the burning flame of this topic! I like the idea of explicit is better than implicit.

I revisited the Redshift Docs and found it explaining how Redshift chooses the distribution key and the sort key for CTAs, referral documentation that brought me into this page here.

In a nutshell, if it is a simple query the CTA table will inherit the distribution key from the source table. In another hand, if it is a complex query with joins and group by, Redshift will choose the distribution key and the sort key based on the Query Plan.

If you don't specify DISTKEY or DISTSTYLE, CTAS determines the distribution style for the new table based on the query plan for the SELECT clause. For more information, see Inheritance of Column and Table Attributes.

My point is, it is still an automatic mechanism since the user can choose to not define the distribution key, and Redshift will choose one. However, it will not mark the distribution key as AUTO on the metadata tables definition. Probably, Redshift will not choose a new distribution key based on table usage for example.

I took the opportunity to execute your suggested example and check the distsyle (I've omitted the materialization in the config block), here it is:

select "table", diststyle from SVV_TABLE_INFO
where "table" like 'my_test%';
-- result:
-- table        diststyle
-- my_test	EVEN

Anyway, I will be very happy contributing with this PR.

@drewbanin
Copy link
Contributor Author

Awesome, thanks @rodrigodelmonte! A PR would be wonderful and very much welcomed!

rodrigodelmonte pushed a commit to rodrigodelmonte/dbt that referenced this issue Apr 3, 2020
rodrigodelmonte pushed a commit to rodrigodelmonte/dbt that referenced this issue Apr 6, 2020
  * Don't provide a diststyle to redshift tables when configured as "auto"
rodrigodelmonte pushed a commit to rodrigodelmonte/dbt that referenced this issue Apr 6, 2020
  * Don't provide a diststyle to redshift tables when configured as "auto"
rodrigodelmonte pushed a commit to rodrigodelmonte/dbt that referenced this issue Apr 7, 2020
  * Don't provide a diststyle to redshift tables when configured as "auto"
rodrigodelmonte pushed a commit to rodrigodelmonte/dbt that referenced this issue Apr 7, 2020
  * Don't provide a diststyle to redshift tables when configured as "auto"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good_first_issue Straightforward + self-contained changes, good for new contributors! redshift
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants