feat(glue): add L2 resources for Database and Table #1988

Merged: 25 commits, merged Mar 14, 2019

Changes from 7 commits

Commits:
- 3551d75 Add glue database and table (Mar 8, 2019)
- 3de3b03 Add unit tests for database and schema (Mar 9, 2019)
- e4df159 Stash (Mar 9, 2019)
- 502b0cb Improve test coverage of table (Mar 11, 2019)
- a08dc37 Add integration tests and README (Mar 11, 2019)
- f4db178 Update README with types (Mar 11, 2019)
- 0d10989 Add validation for name uniqueness and at least one column (Mar 11, 2019)
- cf6d56d Use strongly named references (Mar 11, 2019)
- 9dc8e57 Update StorageType enums to be enum-like classes (Mar 11, 2019)
- c7f62d7 Add SSE-S3 and SSE-KMS encryption support (Mar 12, 2019)
- e0043a9 Restrict s3 grants to only objects containing the table's prefix (Mar 12, 2019)
- 231c36a Add tsdocs for Type (Mar 12, 2019)
- cee2e46 Add Encryption to README (Mar 12, 2019)
- 4595fde Add CSE encryption and distinguish SSE-KMS from SSE-KMS-MANAGED (Mar 12, 2019)
- b8f886b Minor fixes to the README (Mar 12, 2019)
- db1960b Some more minor fixes to the README (Mar 12, 2019)
- d245b1c Merge branch 'master' into samgood/glue (Mar 13, 2019)
- f98d5fd Rename prefix to s3Prefix and use haveResource in tests (Mar 13, 2019)
- d3ccd53 Use string concatentation (Mar 13, 2019)
- 538cff9 Add docs and fix string concatenation (Mar 13, 2019)
- 0c2f7fa Rename StorageType to DataFormat (Mar 13, 2019)
- 1cb7c4f Improve docs and make the TableEncryption enum more consistent with B… (Mar 13, 2019)
- a5d45f0 Refactor s3 bucket creation into separate function and support unencr… (Mar 13, 2019)
- 30f8a3c add test for CSE-KMS with an explicit bucket (Mar 13, 2019)
- 45434fe minor fixes to README (Mar 13, 2019)
packages/@aws-cdk/aws-glue/README.md: 29 changes (28 additions, 1 deletion)

@@ -22,7 +22,7 @@ new glue.Database(stack, 'MyDatabase', {

### Table

A Glue table describes the structure (column names and types),location of data (S3 objects with a common prefix in a S3 bucket) and format of the files (Json, Avro, Parquet, etc.):
A Glue table describes the structure (column names and types), location of data (S3 objects with a common prefix in an S3 bucket) and format of the files (JSON, Avro, Parquet, etc.):

```ts
new glue.Table(stack, 'MyTable', {
@@ -69,6 +69,33 @@ new glue.Table(stack, 'MyTable', {
});
```

### [Encryption](https://docs.aws.amazon.com/athena/latest/ug/encryption.html)

You can enable encryption on the table's S3 bucket:
* [SSE-S3](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.html) - Server-side encryption (SSE) with an Amazon S3-managed key.
```ts
new glue.Table(stack, 'MyTable', {
encryption: glue.TableEncryption.SSE_S3,

Review comment (Contributor): Enum names should be consistent with BucketEncryption.

Reply (Contributor Author): I would argue the other way around: the enum values are consistent with the S3, Athena, Glue and EMR documentation. What would I name CSE-KMS if I were copying BucketEncryption?

Reply (Contributor): Fair enough, but I think we have a problem with ALL_CAPS when converting those member names to other languages. Can we find names that are PascalCase?

...
});
```
* [SSE-KMS](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingKMSEncryption.html) - Server-side encryption (SSE) with an AWS Key Management Service (KMS) key: the AWS-managed key for S3 by default, or a customer-managed key passed as `encryptionKey`.

```ts
// with the AWS-managed KMS key for S3 (no encryptionKey given)
new glue.Table(stack, 'MyTable', {
encryption: glue.TableEncryption.SSE_KMS,
...
});

// with a customer-managed KMS key
new glue.Table(stack, 'MyTable', {
encryption: glue.TableEncryption.SSE_KMS,
encryptionKey: new kms.EncryptionKey(stack, 'MyKey'),
...
});
```
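A standalone sketch of how the encryption options above could resolve into bucket settings, mirroring `parseEncryption` in `lib/table.ts`. It does not import the CDK: the `BucketEncryptionSketch` enum, the `EncryptionSettings` shape and the `resolveEncryption` name are hypothetical stand-ins for `s3.BucketEncryption` and the real return type, and the key is modelled as a plain string ID.

```ts
// Hypothetical stand-ins for glue.TableEncryption and s3.BucketEncryption;
// they only mirror the member names used in this PR and are not the CDK types.
enum TableEncryption {
  Unencrypted = 'Unencrypted',
  SSE_S3 = 'SSE-S3',
  SSE_KMS = 'SSE-KMS',
}

enum BucketEncryptionSketch {
  S3Managed = 'S3Managed',
  KmsManaged = 'KmsManaged',
  Kms = 'Kms',
}

interface EncryptionSettings {
  encryption: BucketEncryptionSketch;
  encryptionKeyId?: string; // stands in for a kms.IEncryptionKey
}

function resolveEncryption(
  encryption?: TableEncryption,
  encryptionKeyId?: string,
): EncryptionSettings | undefined {
  if (encryption === undefined || encryption === TableEncryption.Unencrypted) {
    return undefined; // the bucket is created without default encryption
  }
  if (encryption === TableEncryption.SSE_KMS) {
    // No key given: fall back to the AWS-managed KMS key for S3.
    return encryptionKeyId === undefined
      ? { encryption: BucketEncryptionSketch.KmsManaged }
      : { encryption: BucketEncryptionSketch.Kms, encryptionKeyId };
  }
  // SSE-S3 uses S3-managed keys, so a KMS key cannot be combined with it.
  if (encryptionKeyId !== undefined) {
    throw new Error('a customer managed KMS key cannot be used with SSE-S3 encryption');
  }
  return { encryption: BucketEncryptionSketch.S3Managed };
}
```

Returning `undefined` for the unencrypted case lets the caller pass the result straight through to optional bucket props, as the constructor in `lib/table.ts` does.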

### Types

A table's schema is a collection of columns, each of which has a `name` and a `type`. Types are recursive structures, consisting of primitive and complex types:

packages/@aws-cdk/aws-glue/lib/database.ts: 3 changes (2 additions, 1 deletion)

@@ -94,12 +94,13 @@ export class Database extends cdk.Construct {
});

// see https://docs.aws.amazon.com/glue/latest/dg/glue-specifying-resource-arns.html#data-catalog-resource-arns
this.databaseName = resource.ref;
this.databaseName = resource.databaseName;
this.databaseArn = this.node.stack.formatArn({
service: 'glue',
resource: 'database',
resourceName: this.databaseName
});
// catalogId is implicitly the accountId, which is why we don't pass the catalogId here
this.catalogArn = this.node.stack.formatArn({
service: 'glue',
resource: 'catalog'
eladb marked this conversation as resolved.
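The database and catalog ARNs built with `formatArn` above follow the Data Catalog formats described at the link in the code comment: `arn:aws:glue:region:account-id:catalog` and `arn:aws:glue:region:account-id:database/database-name`. A plain-string sketch of the same construction, assuming the `aws` partition by default; `formatGlueArn` and `GlueArnParts` are hypothetical names, not part of the CDK.

```ts
// Hypothetical helper: a plain-string stand-in for stack.formatArn().
interface GlueArnParts {
  partition?: string; // assumed to default to 'aws'
  region: string;
  account: string; // the catalog ID is implicitly the account ID
  resource: 'catalog' | 'database';
  resourceName?: string;
}

function formatGlueArn(parts: GlueArnParts): string {
  const partition = parts.partition || 'aws';
  const base = `arn:${partition}:glue:${parts.region}:${parts.account}:${parts.resource}`;
  // The catalog ARN has no resource name; database ARNs append "/<name>".
  return parts.resourceName === undefined ? base : `${base}/${parts.resourceName}`;
}
```

For example, a database `my_db` in account `123456789012` and `us-east-1` yields `arn:aws:glue:us-east-1:123456789012:database/my_db`.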

packages/@aws-cdk/aws-glue/lib/schema.ts: 3 changes (3 additions, 0 deletions)

@@ -20,6 +20,9 @@ export interface Column {
comment?: string;
}

/**
* Represents a type of a column in a table schema.
*/
export interface Type {
sam-goodwin marked this conversation as resolved.
/**
* Indicates whether this type is a primitive data type.
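To illustrate why `Type` is a recursive structure, here is a minimal sketch in which complex types wrap other types and render to Hive type strings such as `array<string>` or `map<string,int>`. Only `isPrimitive` appears in this diff; the `inputString` field and the `arrayOf`, `mapOf` and `structOf` helpers are assumptions made for the sketch, not the module's API.

```ts
// Hypothetical mirror of the Type interface: only isPrimitive appears in the diff;
// inputString (the Hive type string) is an assumption for this sketch.
interface Type {
  isPrimitive: boolean;
  inputString: string;
}

interface Column {
  name: string;
  type: Type;
  comment?: string;
}

// A few primitive types, named after Hive's type keywords.
const STRING: Type = { isPrimitive: true, inputString: 'string' };
const INTEGER: Type = { isPrimitive: true, inputString: 'int' };

// Complex types wrap other types, which is what makes the structure recursive.
function arrayOf(itemType: Type): Type {
  return { isPrimitive: false, inputString: `array<${itemType.inputString}>` };
}

function mapOf(keyType: Type, valueType: Type): Type {
  if (!keyType.isPrimitive) {
    // Hive requires map keys to be primitive types.
    throw new Error(`the key type of a map must be a primitive, got: ${keyType.inputString}`);
  }
  return { isPrimitive: false, inputString: `map<${keyType.inputString},${valueType.inputString}>` };
}

function structOf(columns: Column[]): Type {
  const fields = columns.map(c => `${c.name}:${c.type.inputString}`).join(',');
  return { isPrimitive: false, inputString: `struct<${fields}>` };
}
```

Nesting composes naturally: `mapOf(STRING, arrayOf(INTEGER))` renders as `map<string,array<int>>`.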

packages/@aws-cdk/aws-glue/lib/storage-type.ts: 26 changes (16 additions, 10 deletions)
@@ -1,44 +1,50 @@
/**
* Absolute class name of the Hadoop `InputFormat` to use when reading table files.
*/
export enum InputFormat {
export class InputFormat {
/**
* An InputFormat for plain text files. Files are broken into lines. Either linefeed or
* carriage-return are used to signal end of line. Keys are the position in the file, and
* values are the line of text.
*
* @see https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/TextInputFormat.html
*/
TextInputFormat = 'org.apache.hadoop.mapred.TextInputFormat'
public static readonly TextInputFormat = new InputFormat('org.apache.hadoop.mapred.TextInputFormat');

constructor(public readonly className: string) {}
}

/**
* Absolute class name of the Hadoop `OutputFormat` to use when writing table files.
*/
export enum OutputFormat {
export class OutputFormat {
/**
* Writes text data with a null key (value only).
*
* @see https://hive.apache.org/javadocs/r2.2.0/api/org/apache/hadoop/hive/ql/io/HiveIgnoreKeyTextOutputFormat.html
*/
HiveIgnoreKeyTextOutputFormat = 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
public static readonly HiveIgnoreKeyTextOutputFormat = new OutputFormat('org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat');

constructor(public readonly className: string) {}
}

/**
* Serialization library to use when serializing/deserializing (SerDe) table records.
*
* @see https://cwiki.apache.org/confluence/display/Hive/SerDe
*/
export enum SerializationLibrary {
export class SerializationLibrary {
/**
* @see https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-JSON
*/
HiveJson = 'org.apache.hive.hcatalog.data.JsonSerDe',
public static readonly HiveJson = new SerializationLibrary('org.apache.hive.hcatalog.data.JsonSerDe');

/**
* @see https://github.com/rcongiu/Hive-JSON-Serde
*/
OpenXJson = 'org.openx.data.jsonserde.JsonSerDe'
public static readonly OpenXJson = new SerializationLibrary('org.openx.data.jsonserde.JsonSerDe');

constructor(public readonly className: string) {}
}

/**
@@ -48,17 +54,17 @@ export interface StorageType {
/**
* `InputFormat` for this storage type.
*/
inputFormat: string;
inputFormat: InputFormat;

/**
* `OutputFormat` for this storage type.
*/
outputFormat: string;
outputFormat: OutputFormat;

/**
* Serialization library for this storage type.
*/
serializationLibrary: string;
serializationLibrary: SerializationLibrary;
}

export namespace StorageType {
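Turning the string enums into classes with `static readonly` instances keeps the predefined names while letting callers supply any other Hadoop class name, which a TypeScript `enum` cannot do. A minimal sketch of the pattern using the `InputFormat` shape from the diff; the Parquet class name and the `renderInputFormat` helper are illustrative assumptions.

```ts
// Same shape as InputFormat in the diff: a class wrapping the Hadoop class name.
class InputFormat {
  public static readonly TextInputFormat = new InputFormat('org.apache.hadoop.mapred.TextInputFormat');

  constructor(public readonly className: string) {}
}

// Callers can still use the predefined value...
const text = InputFormat.TextInputFormat;
// ...or construct one the module does not predefine (class name assumed here).
const parquet = new InputFormat('org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat');

// Hypothetical helper: the CloudFormation property only needs the string.
function renderInputFormat(format: InputFormat): string {
  return format.className;
}
```

This is why the table resource now reads `props.storageType.inputFormat.className` instead of the value directly.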

packages/@aws-cdk/aws-glue/lib/table.ts: 131 changes (121 additions, 10 deletions)
@@ -1,4 +1,5 @@
import iam = require('@aws-cdk/aws-iam');
import kms = require('@aws-cdk/aws-kms');
import s3 = require('@aws-cdk/aws-s3');
import cdk = require('@aws-cdk/cdk');
import { IDatabase } from './database';
@@ -13,6 +14,38 @@ export interface ITable extends cdk.IConstruct {
export(): TableImportProps;
}

/**
* Encryption options for a Table.
*
* @see https://docs.aws.amazon.com/athena/latest/ug/encryption.html
*/
export enum TableEncryption {
Unencrypted = 'Unencrypted',

/**
* Server side encryption (SSE) with an Amazon S3-managed key.
*
* @see https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.html
*/
SSE_S3 = 'SSE-S3',

/**
* Server-side encryption (SSE) with an AWS KMS key: the AWS-managed key for S3 unless a customer-managed key is passed as `encryptionKey`.
*
* @see https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingKMSEncryption.html
*/
SSE_KMS = 'SSE-KMS',

/**
* Client-side encryption (CSE) with an AWS KMS customer managed key.
*
* @see https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingClientSideEncryption.html
*
* TODO: implement. It's not clear what properties to set on a table to support client-side encryption.
*/
// CSE_KMS = 'CSE-KMS'
}

export interface TableImportProps {
tableArn: string;
tableName: string;
@@ -41,7 +74,7 @@ export interface TableProps {
*
* @default one is created for you
*/
bucket?: s3.Bucket;
bucket?: s3.IBucket;

/**
* S3 prefix under which table objects are stored.
@@ -74,6 +107,28 @@ export interface TableProps {
*/
compressed?: boolean;

/**
* The kind of encryption to secure the data with.
*
* You can only provide this option if you are not explicitly passing in a Bucket.
*
* If you choose SSE-KMS, you can specify a KMS key via `encryptionKey`. If
* `encryptionKey` is not specified, the AWS-managed KMS key for S3 is used.
*
* @default Unencrypted
*/
encryption?: TableEncryption;

/**
* External KMS key to use for bucket encryption.
*
* Only used when `encryption` is set to `SSE_KMS`; combining it with `SSE_S3` is an error.
*
* @default If `encryption` is set to `SSE_KMS` and this property is undefined,
* the AWS-managed KMS key for S3 is used.
*/
encryptionKey?: kms.IEncryptionKey;

/**
* Indicates whether the table data is stored in subdirectories.
*
@@ -98,7 +153,7 @@ export class Table extends cdk.Construct implements ITable {
}

public readonly database: IDatabase;
public readonly bucket: s3.Bucket;
public readonly bucket: s3.IBucket;
public readonly prefix: string;
sam-goodwin marked this conversation as resolved.
sam-goodwin marked this conversation as resolved.

public readonly tableName: string;
@@ -109,9 +164,15 @@ export class Table extends cdk.Construct implements ITable {
public readonly partitionKeys?: Column[];

constructor(scope: cdk.Construct, id: string, props: TableProps) {
validateProps(props);
super(scope, id);

this.database = props.database;
this.bucket = props.bucket || new s3.Bucket(this, 'Bucket');
const encryption = parseEncryption(props);
this.bucket = props.bucket || new s3.Bucket(this, 'Bucket', {
encryption: encryption ? encryption.encryption : undefined,
encryptionKey: encryption ? encryption.encryptionKey : undefined
});
this.storageType = props.storageType;
this.prefix = props.prefix || 'data/';
this.columns = props.columns;
@@ -133,18 +194,18 @@ export class Table extends cdk.Construct implements ITable {
compressed: props.compressed === undefined ? false : props.compressed,
storedAsSubDirectories: props.storedAsSubDirectories === undefined ? false : props.storedAsSubDirectories,
columns: renderColumns(props.columns),
inputFormat: props.storageType.inputFormat,
outputFormat: props.storageType.outputFormat,
inputFormat: props.storageType.inputFormat.className,
outputFormat: props.storageType.outputFormat.className,
serdeInfo: {
serializationLibrary: props.storageType.serializationLibrary
serializationLibrary: props.storageType.serializationLibrary.className
}
},

tableType: 'EXTERNAL_TABLE'
}
});

this.tableName = tableResource.ref;
this.tableName = tableResource.tableName;
this.tableArn = cdk.Fn.join('', [this.database.databaseArn, '/', this.tableName]);
sam-goodwin marked this conversation as resolved.
}

@@ -162,7 +223,7 @@ export class Table extends cdk.Construct implements ITable {
*/
public grantRead(identity: iam.IPrincipal): void {
this.grant(identity, readPermissions);
this.bucket.grantRead(identity);
this.bucket.grantRead(identity, this.prefix + '*'); // every object under the table's prefix
}

/**
@@ -172,7 +233,7 @@ export class Table extends cdk.Construct implements ITable {
*/
public grantWrite(identity: iam.IPrincipal): void {
this.grant(identity, writePermissions);
this.bucket.grantWrite(identity);
this.bucket.grantWrite(identity, this.prefix + '*');
}

/**
@@ -182,7 +243,7 @@ export class Table extends cdk.Construct implements ITable {
*/
public grantReadWrite(identity: iam.IPrincipal): void {
this.grant(identity, readPermissions.concat(writePermissions));
this.bucket.grantReadWrite(identity);
this.bucket.grantReadWrite(identity, this.prefix + '*');
}

private grant(identity: iam.IPrincipal, permissions: string[]) {
@@ -192,6 +253,56 @@ export class Table extends cdk.Construct implements ITable {
}
}
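S3 grants take an object key pattern that is appended to the bucket ARN, so the prefix handed to `grantRead`, `grantWrite` and `grantReadWrite` decides which objects an identity can reach. The sketch below models that with plain strings; `objectArnForPattern` and `arnMatches` are hypothetical helpers, and the matcher is a simplified model of IAM resource wildcards (`*` for any run of characters, `?` for one character, everything else literal), not the IAM policy evaluator.

```ts
// Hypothetical helper: the object ARN an S3 grant would target for a key pattern.
function objectArnForPattern(bucketName: string, keyPattern: string): string {
  return `arn:aws:s3:::${bucketName}/${keyPattern}`;
}

// Simplified IAM-style match: '*' matches any run of characters, '?' exactly one;
// every other character (including '/') must match literally.
function arnMatches(pattern: string, arn: string): boolean {
  const regex = pattern
    .split('')
    .map(ch => (ch === '*' ? '.*' : ch === '?' ? '.' : ch.replace(/[.+^${}()|[\]\\]/g, '\\$&')))
    .join('');
  return new RegExp(`^${regex}$`).test(arn);
}
```

Under this model, a pattern of `data/` matches only the object whose key is exactly `data/`, while `data/*` matches every object stored under that prefix.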

/**
* Validate the props: an explicit bucket cannot be combined with encryption options, there must be at least one column, and column names and partition keys must be unique.
*
* @param props the TableProps
*/
function validateProps(props: TableProps): void {
if (props.bucket && (props.encryption || props.encryptionKey)) {
throw new Error('you can not specify encryption settings when an explicit s3 bucket is provided');
}
if (props.columns.length === 0) {
throw new Error('you must specify at least one column for the table');
}
const names = new Set<string>();
(props.columns.concat(props.partitionKeys || [])).forEach(column => {
if (names.has(column.name)) {
throw new Error(`column names and partition keys must be unique, but '${column.name}' is duplicated`);
}
names.add(column.name);
});
}
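The uniqueness rule in `validateProps` treats columns and partition keys as one namespace. A standalone sketch of just the column checks, with a hypothetical `findDuplicateName` helper that returns the offending name so the error can report which one collided; `validateColumns` is likewise an illustrative name.

```ts
interface Column {
  name: string;
}

// Hypothetical helper: returns the first name that appears more than once across
// columns and partition keys, or undefined if all names are unique.
function findDuplicateName(columns: Column[], partitionKeys: Column[] = []): string | undefined {
  const seen = new Set<string>();
  for (const column of columns.concat(partitionKeys)) {
    if (seen.has(column.name)) {
      return column.name;
    }
    seen.add(column.name);
  }
  return undefined;
}

function validateColumns(columns: Column[], partitionKeys?: Column[]): void {
  if (columns.length === 0) {
    throw new Error('you must specify at least one column for the table');
  }
  const duplicate = findDuplicateName(columns, partitionKeys);
  if (duplicate !== undefined) {
    throw new Error(`column names and partition keys must be unique, but '${duplicate}' is duplicated`);
  }
}
```

Separating the search from the throw keeps the check easy to test on its own.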

function parseEncryption(props: TableProps): {
encryption: s3.BucketEncryption;
encryptionKey?: kms.IEncryptionKey;
} | undefined {
if (props.encryption === undefined || props.encryption === TableEncryption.Unencrypted) {
return undefined;
}

if (props.encryption === TableEncryption.SSE_KMS) {
if (props.encryptionKey === undefined) {
return {
encryption: s3.BucketEncryption.KmsManaged
};
} else {
return {
encryption: s3.BucketEncryption.Kms,
encryptionKey: props.encryptionKey
};
}
} else {
if (props.encryptionKey) {
throw new Error('a customer managed KMS key cannot be used with SSE-S3 encryption');
}
return {
encryption: s3.BucketEncryption.S3Managed
};
}
}

const readPermissions = [
'glue:BatchDeletePartition',
'glue:BatchGetPartition',