feat(redshift): column compression encodings and comments can now be customised #24177

Merged on Mar 8, 2023 (39 commits).
0f50d3f  addition: initial testing suite (Rizxcviii, Jan 6, 2023)
20fe7fa  addition: initial column encoding methods (Rizxcviii, Jan 6, 2023)
7517f8d  addition: docstring for ColumnEncoding (Rizxcviii, Jan 6, 2023)
8669223  addition: assigning enums to string variables (Rizxcviii, Jan 6, 2023)
38cfb4a  addition: adding encoding on creation of table (Rizxcviii, Jan 6, 2023)
656a307  addition: updates on column encoding (Rizxcviii, Jan 6, 2023)
7f2c6b9  addition: table comment and column comment (Rizxcviii, Jan 6, 2023)
5a72700  modification, addition (Rizxcviii, Jan 6, 2023)
77fe42c  Merge branch 'main' into feature/commentting-encoding (Rizxcviii, Jan 6, 2023)
68d8107  modification: integ test (Rizxcviii, Jan 6, 2023)
8689510  addition: docuementation for encoding and commentting (Rizxcviii, Jan 6, 2023)
73366fd  addition, modification: (Rizxcviii, Jan 23, 2023)
6f73d7a  modification: removing table comments code (Rizxcviii, Jan 26, 2023)
6ec1f26  modification: removing table comments code (Rizxcviii, Jan 26, 2023)
033bdc7  modification: integ test snapshot (Rizxcviii, Jan 26, 2023)
6a8ede8  modification: reverting import cleanup (Rizxcviii, Jan 26, 2023)
c8ac796  modification: removing table comments from README (Rizxcviii, Jan 26, 2023)
6ce5fa9  modification: bugfix, nested and incorrect test (Rizxcviii, Jan 26, 2023)
a96840c  modification: using private enum (Rizxcviii, Jan 26, 2023)
c4582e9  addition: line break on EOF (Rizxcviii, Jan 26, 2023)
2c13a23  Merge branch 'main' into feature/commentting-encoding (Rizxcviii, Feb 6, 2023)
07b6126  modification: using an actual compression encoding used by VARCHAR (Rizxcviii, Feb 9, 2023)
452d8ea  modification: rosetta fixing, was probably not run using the yarn com… (Rizxcviii, Feb 9, 2023)
461a93d  removal: lock file (Rizxcviii, Feb 9, 2023)
c78e737  modification: typo (Rizxcviii, Feb 9, 2023)
6a48280  Merge branch 'main' into feature/commentting-encoding (Rizxcviii, Feb 14, 2023)
e319510  linting (Rizxcviii, Feb 15, 2023)
a5c715b  updating README (Rizxcviii, Feb 15, 2023)
2662b65  Merge branch 'main' into feature/commentting-encoding (Rizxcviii, Feb 16, 2023)
88499ec  modification: post cdk update changes (Rizxcviii, Feb 16, 2023)
bdbc536  Merge branch 'main' into feature/commentting-encoding (Rizxcviii, Feb 20, 2023)
0cdebe6  Merge branch 'main' into feature/commentting-encoding (Rizxcviii, Feb 21, 2023)
783438e  typo (Rizxcviii, Feb 21, 2023)
3fb1a51  reverting unecessary change in integ test (Rizxcviii, Feb 21, 2023)
c006522  Merge branch 'main' into feature/commentting-encoding (Rizxcviii, Feb 27, 2023)
9b3c41e  Merge branch 'main' into feature/commentting-encoding (Rizxcviii, Mar 3, 2023)
b151288  updating README (Rizxcviii, Mar 3, 2023)
307e1b8  removing ColumnEncoding from private (Rizxcviii, Mar 6, 2023)
2613835  Merge branch 'main' into feature/commentting-encoding (comcalvi, Mar 7, 2023)
35 changes: 29 additions & 6 deletions packages/@aws-cdk/aws-redshift/README.md
````diff
@@ -200,17 +200,32 @@ new Table(this, 'Table', {
 });
 ```
 
-Tables can also be configured with a comment:
+Tables and their respective columns can be configured to contain comments:
 
 ```ts fixture=cluster
 new Table(this, 'Table', {
   tableColumns: [
-    { name: 'col1', dataType: 'varchar(4)' },
-    { name: 'col2', dataType: 'float' }
+    { name: 'col1', dataType: 'varchar(4)', comment: 'This is a column comment' },
+    { name: 'col2', dataType: 'float', comment: 'This is a another column comment' }
   ],
   cluster: cluster,
   databaseName: 'databaseName',
+  tableComment: 'This is a table comment',
+});
+```
+
+Table columns can be configured to use a specific compression encoding:
+
+```ts fixture=cluster
+import { ColumnEncoding } from '@aws-cdk/aws-redshift';
+
+new Table(this, 'Table', {
+  tableColumns: [
+    { name: 'col1', dataType: 'varchar(4)', encoding: ColumnEncoding.TEXT32K },
+    { name: 'col2', dataType: 'float', encoding: ColumnEncoding.DELTA32K },
+  ],
+  cluster: cluster,
+  databaseName: 'databaseName',
-  comment: 'This is a comment',
 });
 ```
 
````
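Not every compression encoding accepts every data type. The AWS compression-encodings reference lists `DELTA32K` for INT, BIGINT, DATE, TIMESTAMP and DECIMAL columns only, and Redshift treats `float` as DOUBLE PRECISION, so the `col2` line in the encoding example above may be rejected when the custom resource runs `CREATE TABLE`. A pre-deployment check could catch this. The following is a minimal sketch of such a check, not an API of `@aws-cdk/aws-redshift`: `checkEncoding`, `normaliseType`, `SUPPORTED_TYPES`, `ALIASES` and `ColumnSpec` are hypothetical, and the compatibility map covers only an illustrative subset of the AWS table.

```typescript
// Hypothetical pre-deployment check; not part of @aws-cdk/aws-redshift.
// The encoding names mirror the ColumnEncoding enum values added in this PR.
type Encoding =
  | 'AUTO' | 'RAW' | 'AZ64' | 'BYTEDICT' | 'DELTA' | 'DELTA32K' | 'LZO'
  | 'MOSTLY8' | 'MOSTLY16' | 'MOSTLY32' | 'RUNLENGTH' | 'TEXT255' | 'TEXT32K' | 'ZSTD';

interface ColumnSpec {
  name: string;
  dataType: string;
  encoding?: Encoding;
}

// Illustrative subset of the AWS "Compression encodings" table (assumption:
// verify against the current docs). Encodings not listed are not checked.
const SUPPORTED_TYPES: Partial<Record<Encoding, string[]>> = {
  DELTA32K: ['INTEGER', 'BIGINT', 'DATE', 'TIMESTAMP', 'DECIMAL'],
  MOSTLY32: ['BIGINT', 'DECIMAL'],
  TEXT255: ['VARCHAR'],
  TEXT32K: ['VARCHAR'],
};

// A few common Redshift type aliases; hypothetical and not exhaustive.
const ALIASES: Record<string, string> = {
  INT: 'INTEGER',
  INT4: 'INTEGER',
  INT8: 'BIGINT',
  NUMERIC: 'DECIMAL',
  FLOAT: 'DOUBLE PRECISION',
  FLOAT8: 'DOUBLE PRECISION',
  NVARCHAR: 'VARCHAR',
  'CHARACTER VARYING': 'VARCHAR',
};

// Drops any "(n)" or "(p,s)" suffix, upper-cases, then resolves aliases.
function normaliseType(dataType: string): string {
  const base = dataType.replace(/\(.*\)/, '').trim().toUpperCase();
  return ALIASES[base] ?? base;
}

// Returns a problem description, or undefined if nothing was detected.
function checkEncoding(column: ColumnSpec): string | undefined {
  if (!column.encoding) {
    return undefined;
  }
  const allowed = SUPPORTED_TYPES[column.encoding];
  const type = normaliseType(column.dataType);
  if (allowed !== undefined && !allowed.includes(type)) {
    return `${column.name}: ${column.encoding} does not support ${type}`;
  }
  return undefined;
}

// The two columns from the README encoding example.
const columns: ColumnSpec[] = [
  { name: 'col1', dataType: 'varchar(4)', encoding: 'TEXT32K' },
  { name: 'col2', dataType: 'float', encoding: 'DELTA32K' },
];
for (const column of columns) {
  const problem = checkEncoding(column);
  if (problem) {
    console.log(problem); // prints "col2: DELTA32K does not support DOUBLE PRECISION"
  }
}
```

Keeping the map partial means an unlisted encoding is never flagged, which trades completeness for fewer false positives; a real check would need the full, current AWS table.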
````diff
@@ -417,14 +432,16 @@ Some Amazon Redshift features require Amazon Redshift to access other AWS servic
 When you create an IAM role and set it as the default for the cluster using console, you don't have to provide the IAM role's Amazon Resource Name (ARN) to perform authentication and authorization.
 
 ```ts
+import * as ec2 from '@aws-cdk/aws-ec2';
+import * as iam from '@aws-cdk/aws-iam';
 declare const vpc: ec2.Vpc;
 
 const defaultRole = new iam.Role(this, 'DefaultRole', {
   assumedBy: new iam.ServicePrincipal('redshift.amazonaws.com'),
 },
 );
 
-new Cluster(stack, 'Redshift', {
+new Cluster(this, 'Redshift', {
   masterUser: {
     masterUsername: 'admin',
   },
````
````diff
@@ -437,14 +454,16 @@ new Cluster(stack, 'Redshift', {
 A default role can also be added to a cluster using the `addDefaultIamRole` method.
 
 ```ts
+import * as ec2 from '@aws-cdk/aws-ec2';
+import * as iam from '@aws-cdk/aws-iam';
 declare const vpc: ec2.Vpc;
 
 const defaultRole = new iam.Role(this, 'DefaultRole', {
   assumedBy: new iam.ServicePrincipal('redshift.amazonaws.com'),
 },
 );
 
-const redshiftCluster = new Cluster(stack, 'Redshift', {
+const redshiftCluster = new Cluster(this, 'Redshift', {
   masterUser: {
     masterUsername: 'admin',
   },
````
````diff
@@ -460,6 +479,8 @@ redshiftCluster.addDefaultIamRole(defaultRole);
 Attaching IAM roles to a Redshift Cluster grants permissions to the Redshift service to perform actions on your behalf.
 
 ```ts
+import * as ec2 from '@aws-cdk/aws-ec2';
+import * as iam from '@aws-cdk/aws-iam';
 declare const vpc: ec2.Vpc
 
 const role = new iam.Role(this, 'Role', {
````
````diff
@@ -477,6 +498,8 @@ const cluster = new Cluster(this, 'Redshift', {
 Additional IAM roles can be attached to a cluster using the `addIamRole` method.
 
 ```ts
+import * as ec2 from '@aws-cdk/aws-ec2';
+import * as iam from '@aws-cdk/aws-iam';
 declare const vpc: ec2.Vpc
 
 const role = new iam.Role(this, 'Role', {
````
```diff
@@ -18,3 +18,5 @@ export async function handler(event: AWSLambda.CloudFormationCustomResourceEvent
   }
   return subHandler(event.ResourceProperties, event);
 }
+
+export { ColumnEncoding } from './types';
```
```diff
@@ -1,7 +1,7 @@
 /* eslint-disable-next-line import/no-unresolved */
 import * as AWSLambda from 'aws-lambda';
 import { executeStatement } from './redshift-data';
-import { ClusterProps, TableAndClusterProps, TableSortStyle } from './types';
+import { ClusterProps, ColumnEncoding, TableAndClusterProps, TableSortStyle } from './types';
 import { areColumnsEqual, getDistKeyColumn, getSortKeyColumns } from './util';
 import { Column } from '../../table';
 
```
```diff
@@ -40,7 +40,7 @@ async function createTable(
   tableAndClusterProps: TableAndClusterProps,
 ): Promise<string> {
   const tableName = tableNamePrefix + tableNameSuffix;
-  const tableColumnsString = tableColumns.map(column => `${column.name} ${column.dataType}`).join();
+  const tableColumnsString = tableColumns.map(column => `${column.name} ${column.dataType}${getEncodingColumnString(column)}`).join();
 
   let statement = `CREATE TABLE ${tableName} (${tableColumnsString})`;
 
```
```diff
@@ -61,6 +61,11 @@ async function createTable(
 
   await executeStatement(statement, tableAndClusterProps);
 
+  for (const column of tableColumns) {
+    if (column.comment) {
+      await executeStatement(`COMMENT ON COLUMN ${tableName}.${column.name} IS '${column.comment}'`, tableAndClusterProps);
+    }
+  }
   if (tableAndClusterProps.tableComment) {
     await executeStatement(`COMMENT ON TABLE ${tableName} IS '${tableAndClusterProps.tableComment}'`, tableAndClusterProps);
   }
```
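The new `COMMENT ON COLUMN` statements, like the existing `COMMENT ON TABLE` statement, place the comment text between single quotes without escaping it, so a comment such as `it's a key` would yield malformed SQL. Redshift string literals represent an embedded single quote as two consecutive single quotes. The sketch below shows one way to apply that escaping when building the column statements; `quoteLiteral`, `buildColumnCommentStatements` and `ColumnSpec` are hypothetical names rather than functions from the handler.

```typescript
// Hypothetical, trimmed-down column shape for this sketch.
interface ColumnSpec {
  name: string;
  comment?: string;
}

// Renders text as a SQL string literal, doubling embedded single quotes.
function quoteLiteral(text: string): string {
  return `'${text.replace(/'/g, "''")}'`;
}

// One COMMENT ON COLUMN statement per column that has a non-empty comment,
// following the loop in createTable but with escaping applied.
function buildColumnCommentStatements(tableName: string, columns: ColumnSpec[]): string[] {
  return columns
    .filter(column => column.comment)
    .map(column => `COMMENT ON COLUMN ${tableName}.${column.name} IS ${quoteLiteral(column.comment!)}`);
}

const statements = buildColumnCommentStatements('my_table', [
  { name: 'col1', comment: "it's a key" },
  { name: 'col2' }, // no comment, so no statement
]);
console.log(statements.join('\n'));
// COMMENT ON COLUMN my_table.col1 IS 'it''s a key'
```

The same `quoteLiteral` helper would apply equally to the table-level comment; identifiers such as table and column names are still interpolated unquoted here, as in the handler.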
```diff
@@ -107,6 +112,20 @@ async function updateTable(
     alterationStatements.push(...columnAdditions.map(addition => `ALTER TABLE ${tableName} ${addition}`));
   }
 
+  const columnEncoding = tableColumns.filter(column => {
+    return oldTableColumns.some(oldColumn => column.name === oldColumn.name && column.encoding !== oldColumn.encoding);
+  }).map(column => `ALTER COLUMN ${column.name} ENCODE ${column.encoding || ColumnEncoding.AUTO}`);
+  if (columnEncoding.length > 0) {
+    alterationStatements.push(`ALTER TABLE ${tableName} ${columnEncoding.join(', ')}`);
+  }
+
+  const columnComments = tableColumns.filter(column => {
+    return oldTableColumns.some(oldColumn => column.name === oldColumn.name && column.comment !== oldColumn.comment);
+  }).map(column => `COMMENT ON COLUMN ${tableName}.${column.name} IS ${column.comment ? `'${column.comment}'` : 'NULL'}`);
+  if (columnComments.length > 0) {
+    alterationStatements.push(...columnComments);
+  }
+
   const oldDistStyle = oldResourceProperties.distStyle;
   if ((!oldDistStyle && tableAndClusterProps.distStyle) ||
     (oldDistStyle && !tableAndClusterProps.distStyle)) {
```
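On update, each new column is compared with the old column of the same name. A changed encoding produces an `ALTER COLUMN ... ENCODE` clause, falling back to `AUTO` when the encoding was removed (on creation, by contrast, the `ENCODE` clause is simply omitted), and all such clauses are combined into one `ALTER TABLE` statement. A changed comment produces a `COMMENT ON COLUMN` statement, using `IS NULL` when the comment was removed. Columns that do not appear in the old definition are skipped by both comparisons. The sketch below restates that logic as a pure function so the generated SQL can be inspected without a cluster; `diffColumns` and `ColumnSpec` are illustrative, and encodings are plain strings standing in for `ColumnEncoding` values.

```typescript
// Plain strings stand in for ColumnEncoding values in this sketch.
type Encoding = string;

interface ColumnSpec {
  name: string;
  dataType: string;
  encoding?: Encoding;
  comment?: string;
}

// Hypothetical restatement of the encoding/comment comparison in updateTable.
function diffColumns(tableName: string, oldColumns: ColumnSpec[], newColumns: ColumnSpec[]): string[] {
  const statements: string[] = [];

  // Columns present in both versions whose encoding changed.
  const encodingChanges = newColumns
    .filter(column => oldColumns.some(old => old.name === column.name && old.encoding !== column.encoding))
    .map(column => `ALTER COLUMN ${column.name} ENCODE ${column.encoding || 'AUTO'}`);
  if (encodingChanges.length > 0) {
    // All encoding changes go into a single ALTER TABLE statement.
    statements.push(`ALTER TABLE ${tableName} ${encodingChanges.join(', ')}`);
  }

  // Columns present in both versions whose comment changed; a removed comment becomes IS NULL.
  const commentChanges = newColumns
    .filter(column => oldColumns.some(old => old.name === column.name && old.comment !== column.comment))
    .map(column => `COMMENT ON COLUMN ${tableName}.${column.name} IS ${column.comment ? `'${column.comment}'` : 'NULL'}`);
  statements.push(...commentChanges);

  return statements;
}

const oldColumns: ColumnSpec[] = [
  { name: 'col1', dataType: 'varchar(4)', encoding: 'TEXT32K', comment: 'old comment' },
  { name: 'col2', dataType: 'bigint' },
];
const newColumns: ColumnSpec[] = [
  { name: 'col1', dataType: 'varchar(4)' }, // encoding and comment removed
  { name: 'col2', dataType: 'bigint', encoding: 'DELTA32K' }, // encoding added
];
console.log(diffColumns('my_table', oldColumns, newColumns).join('\n'));
// ALTER TABLE my_table ALTER COLUMN col1 ENCODE AUTO, ALTER COLUMN col2 ENCODE DELTA32K
// COMMENT ON COLUMN my_table.col1 IS NULL
```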
```diff
@@ -162,3 +181,10 @@ async function updateTable(
 function getSortKeyColumnsString(sortKeyColumns: Column[]) {
   return sortKeyColumns.map(column => column.name).join();
 }
+
+function getEncodingColumnString(column: Column): string {
+  if (column.encoding) {
+    return ` ENCODE ${column.encoding}`;
+  }
+  return '';
+}
```
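When a table is created, `getEncodingColumnString` adds an `ENCODE` clause only for columns that set `encoding`, and the column definitions are joined with `Array.prototype.join()`, whose default separator is a comma. The sketch below reproduces that string-building step in isolation; `ColumnSpec`, `encodingClause` and `buildCreateTable` are hypothetical names, and the sketch leaves out any distribution-style or sort-key clauses the handler may append to the statement.

```typescript
// Hypothetical column shape; encoding holds a ColumnEncoding value such as 'ZSTD'.
interface ColumnSpec {
  name: string;
  dataType: string;
  encoding?: string;
}

// Same behaviour as getEncodingColumnString: empty when no encoding is set.
function encodingClause(column: ColumnSpec): string {
  return column.encoding ? ` ENCODE ${column.encoding}` : '';
}

// Builds only the column-list part of CREATE TABLE.
function buildCreateTable(tableName: string, columns: ColumnSpec[]): string {
  const columnsString = columns
    .map(column => `${column.name} ${column.dataType}${encodingClause(column)}`)
    .join(); // default separator is ','
  return `CREATE TABLE ${tableName} (${columnsString})`;
}

console.log(buildCreateTable('my_table', [
  { name: 'col1', dataType: 'varchar(4)', encoding: 'TEXT32K' },
  { name: 'col2', dataType: 'bigint' },
]));
// CREATE TABLE my_table (col1 varchar(4) ENCODE TEXT32K,col2 bigint)
```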
```diff
@@ -24,3 +24,112 @@ export enum TableSortStyle {
    */
   INTERLEAVED = 'INTERLEAVED',
 }
+
+/**
+ * The compression encoding of a column.
+ *
+ * @see https://docs.aws.amazon.com/redshift/latest/dg/c_Compression_encodings.html
+ */
+export enum ColumnEncoding {
+  /**
+   * Amazon Redshift assigns an optimal encoding based on the column data.
+   * This is the default.
+   */
+  AUTO = 'AUTO',
+
+  /**
+   * The column is not compressed.
+   *
+   * @see https://docs.aws.amazon.com/redshift/latest/dg/c_Raw_encoding.html
+   */
+  RAW = 'RAW',
+
+  /**
+   * The column is compressed using the AZ64 algorithm.
+   *
+   * @see https://docs.aws.amazon.com/redshift/latest/dg/az64-encoding.html
+   */
+  AZ64 = 'AZ64',
+
+  /**
+   * The column is compressed using a separate dictionary for each block column value on disk.
+   *
+   * @see https://docs.aws.amazon.com/redshift/latest/dg/c_Byte_dictionary_encoding.html
+   */
+  BYTEDICT = 'BYTEDICT',
+
+  /**
+   * The column is compressed based on the difference between values in the column.
+   * This records differences as 1-byte values.
+   *
+   * @see https://docs.aws.amazon.com/redshift/latest/dg/c_Delta_encoding.html
+   */
+  DELTA = 'DELTA',
+
+  /**
+   * The column is compressed based on the difference between values in the column.
+   * This records differences as 2-byte values.
+   *
+   * @see https://docs.aws.amazon.com/redshift/latest/dg/c_Delta_encoding.html
+   */
+  DELTA32K = 'DELTA32K',
+
+  /**
+   * The column is compressed using the LZO algorithm.
+   *
+   * @see https://docs.aws.amazon.com/redshift/latest/dg/lzo-encoding.html
+   */
+  LZO = 'LZO',
+
+  /**
+   * The column is compressed to a smaller storage size than the original data type.
+   * The compressed storage size is 1 byte.
+   *
+   * @see https://docs.aws.amazon.com/redshift/latest/dg/c_MostlyN_encoding.html
+   */
+  MOSTLY8 = 'MOSTLY8',
+
+  /**
+   * The column is compressed to a smaller storage size than the original data type.
+   * The compressed storage size is 2 bytes.
+   *
+   * @see https://docs.aws.amazon.com/redshift/latest/dg/c_MostlyN_encoding.html
+   */
+  MOSTLY16 = 'MOSTLY16',
+
+  /**
+   * The column is compressed to a smaller storage size than the original data type.
+   * The compressed storage size is 4 bytes.
+   *
+   * @see https://docs.aws.amazon.com/redshift/latest/dg/c_MostlyN_encoding.html
+   */
+  MOSTLY32 = 'MOSTLY32',
+
+  /**
+   * The column is compressed by recording the number of occurrences of each value in the column.
+   *
+   * @see https://docs.aws.amazon.com/redshift/latest/dg/c_Runlength_encoding.html
+   */
+  RUNLENGTH = 'RUNLENGTH',
+
+  /**
+   * The column is compressed by recording the first 245 unique words and then using a 1-byte index to represent each word.
+   *
+   * @see https://docs.aws.amazon.com/redshift/latest/dg/c_Text255_encoding.html
+   */
+  TEXT255 = 'TEXT255',
+
+  /**
+   * The column is compressed by recording the first 32K unique words and then using a 2-byte index to represent each word.
+   *
+   * @see https://docs.aws.amazon.com/redshift/latest/dg/c_Text255_encoding.html
+   */
+  TEXT32K = 'TEXT32K',
+
+  /**
+   * The column is compressed using the ZSTD algorithm.
+   *
+   * @see https://docs.aws.amazon.com/redshift/latest/dg/zstd-encoding.html
+   */
+  ZSTD = 'ZSTD',
+}
```
17 changes: 17 additions & 0 deletions packages/@aws-cdk/aws-redshift/lib/table.ts
```diff
@@ -3,6 +3,7 @@ import { Construct, IConstruct } from 'constructs';
 import { ICluster } from './cluster';
 import { DatabaseOptions } from './database-options';
 import { DatabaseQuery } from './private/database-query';
+import { ColumnEncoding } from './private/database-query-provider';
 import { HandlerName } from './private/database-query-provider/handler-name';
 import { getDistKeyColumn, getSortKeyColumns } from './private/database-query-provider/util';
 import { TableHandlerProps } from './private/handler-props';
```
```diff
@@ -79,6 +80,20 @@ export interface Column {
    * @default - column is not a SORTKEY
    */
   readonly sortKey?: boolean;
+
+  /**
+   * The encoding to use for the column.
+   *
+   * @default - Amazon Redshift determines the encoding based on the data type.
+   */
+  readonly encoding?: ColumnEncoding;
+
+  /**
+   * A comment to attach to the column.
+   *
+   * @default - no comment
+   */
+  readonly comment?: string;
 }
 
 /**
```
```diff
@@ -344,3 +359,5 @@ export enum TableSortStyle {
    */
   INTERLEAVED = 'INTERLEAVED',
 }
+
+export { ColumnEncoding } from './private/database-query-provider';
```