Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(redshift): Cannot add distkey to existing table #26733

Closed
ghost opened this issue Aug 11, 2023 · 2 comments · Fixed by #26789
Closed

(redshift): Cannot add distkey to existing table #26733

ghost opened this issue Aug 11, 2023 · 2 comments · Fixed by #26789
Labels
@aws-cdk/aws-redshift Related to Amazon Redshift bug This issue is a bug. effort/small Small work item – less than a day of effort p1

Comments

@ghost
Copy link

ghost commented Aug 11, 2023

Describe the bug

Adding distKey=true to a column where the table previously had no distribution key fails to deploy because it tries to create the table.

The diff:

[~] Custom::RedshiftDatabaseQuery staff_join_staff_roles-id/Resource/Resource staffjoinstaffrolesidFA762FF7 
 └─ [~] tableColumns
     └─ @@ -2,7 +2,7 @@
        [ ] {
        [ ]   "dataType": "varchar(36)",
        [ ]   "name": "staff_id",
        [-]   "distKey": false,
        [+]   "distKey": true,
        [ ]   "sortKey": false,
        [ ]   "id": "staff_id"
        [ ] },

The error:

3:45:11 PM | UPDATE_FAILED        | Custom::RedshiftDatabaseQuery               | staffjoinstaffrolesidFA762FF7
Received response status [FAILED] from custom resource. Message returned: Statement status was FAILED: ERROR: Relation "staff_join_staff_roles" already exist
s

Logs: /aws/lambda/dev-business-insights-QueryRedshiftDatabase3de5bea-fGJLJqA4DIkh

at waitForStatementComplete (/var/task/redshift-data.js:34:15)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async executeStatement (/var/task/redshift-data.js:18:5)
at async createTable (/var/task/table.js:47:5)
at async handler (/var/task/table.js:22:27) (RequestId: fd93e151-06ce-487f-8412-955a039bb9c3)

Expected Behavior

CDK should add a distribution key to the table.

Current Behavior

CDK tries to create the table from scratch.

Reproduction Steps

It's unfortunately difficult for me to create a runnable snippet due to the amount of proprietary code used to generate it, but I've been able to reproduce the error with various tables in our system. The basic steps are:

  1. Create a table with any number of columns, ensuring that
  • None are set as the distribution key.
  • Do not set the distribution style.
  1. Deploy
  2. Edit your code and set .distKey(true) for any single column
  3. Deploy -- it will fail

As best as I can figure it, our code does something like this:

List<Column> columns = List<Columns>.of(
    Column.builder()
          .name("foo")
          .dataType("VARCHAR(255)")
          .distKey(false)  // Change this to `true` on your 2nd deploy
          .sortKey(false)
          .build(),
    Column.builder()
          .name("bar")
          .dataType("VARCHAR(255)")
          .distKey(false)
          .sortKey(false)
          .build()
);

Table table = Table.Builder.create(databaseScope, "mytable-id")
    .tableName("mytable")
    .cluster(myDbCluster)
    .databaseName("mydatabase")
    .tableColumns(columns)
    .build();

Possible Solution

The current code issues a CREATE TABLE statement if the table either has no distkey and we're adding one, or had a distkey and we're removing it:

// aws-redshift-alpha/lib/private/database-query-provider/table.ts
if ((!oldDistKey && newDistKey ) || (oldDistKey && !newDistKey)) {
  return createTable(tableNamePrefix, tableNameSuffix, tableColumns, tableAndClusterProps);
} else if (oldDistKey !== newDistKey) {
  alterationStatements.push(`ALTER TABLE ${tableName} ALTER DISTKEY ${newDistKey}`);
}

The problem here is that the resulting query tries to create a new table with the same name, which Redshift will reject because the table already exists.

Redshift supports adding and removing distribution keys on existing tables (see docs here) so we should take advantage of that. I believe the following will work:

if (!oldDistKey && newDistKey) {
  // Table has no existing distribution key, add a new one
  alterationStatements.push(`ALTER TABLE ${tableName} ALTER DISTSTYLE KEY DISTKEY ${newDistKey}`);
} else if (oldDistKey && !newDistKey) {
  // Table has a distribution key, remove and set to AUTO
  alterationStatements.push(`ALTER TABLE ${tableName} ALTER DISTSTYLE AUTO`);
} else if (oldDistKey !== newDistKey) {
  // Table has an existing distribution key, change it
  alterationStatements.push(`ALTER TABLE ${tableName} ALTER DISTKEY ${newDistKey}`);
}

Additional Information/Context

No response

CDK CLI Version

2.88.0

Framework Version

2.17.184

Node.js Version

20.5.1

OS

MacOS 13.5

Language

Java

Language Version

JDK 17

Other information

No response

@ghost ghost added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Aug 11, 2023
@github-actions github-actions bot added the @aws-cdk/aws-redshift Related to Amazon Redshift label Aug 11, 2023
@pahud pahud added p1 effort/small Small work item – less than a day of effort and removed needs-triage This issue or PR still needs to be triaged. labels Aug 14, 2023
@pahud
Copy link
Contributor

pahud commented Aug 14, 2023

Yes we should address this use case.

mergify bot added a commit to lpizzinidev/aws-cdk that referenced this issue Aug 17, 2023
@mergify mergify bot closed this as completed in #26789 Aug 17, 2023
mergify bot pushed a commit that referenced this issue Aug 17, 2023
…26789)

The current implementation executes a `CREATE TABLE` if the table has no `distKey` specified and one is added, or if a `distKey` is present and is removed.
The resulting table is created with the same name, causing the update operation to fail.
This fixes the problem by using [`ALTER TABLE`](https://docs.aws.amazon.com/redshift/latest/dg/r_ALTER_TABLE.html) when adding/removing `distKey` on a table update. 

Closes #26733.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@github-actions
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-redshift Related to Amazon Redshift bug This issue is a bug. effort/small Small work item – less than a day of effort p1
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant