Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Athena Databases and Tables #5237

Closed
wants to merge 1 commit into from

Conversation

richardhboyd
Copy link
Contributor

WIP for Athena Resources via AwsCustomResources

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@mergify
Copy link
Contributor

mergify bot commented Nov 28, 2019

Thanks so much for taking the time to contribute to the AWS CDK ❤️

We will shortly assign someone to review this pull request and help get it
merged. In the meantime, please take a minute to make sure you follow this
checklist
:

  • PR title type(scope): text
    • type: fix, feat, refactor go into CHANGELOG, chore is hidden
    • scope: name of module without aws- or cdk- prefix or postfix (e.g. s3 instead of aws-s3-deployment)
    • text: use all lower-case, do not end with a period, do not include issue refs
  • PR Description
    • Rationale: describe rationale of change and approach taken
    • Issues: indicate issues fixed via: fixes #xxx or closes #xxx
    • Breaking?: last paragraph: BREAKING CHANGE: <describe what changed + link for details>
  • Testing
    • Unit test added. Prefer to add a new test rather than modify existing tests
    • CLI or init templates change? Re-run/add CLI integration tests
  • Documentation
    • README: update module README to describe new features
    • API docs: public APIs must be documented. Copy from official AWS docs when possible
    • Design: for significant features, follow design process

Comment on lines +78 to +86
ARRAY = 'ARRAYTODO',
/**
* < primitive_type, data_type >
*/
MAP = 'MAPTODO',
/**
* < col_name : data_type [COMMENT col_comment] [, ...] >
*/
STRUCT = 'STRUCTTODO'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need a better way to represent these data structures inside the schema.


/**
* The name for the Table.
* @default none
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I need '@default' for required properties?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it doesn't really make sense to have @default for required properties - feel free to remove these.

this.tableName = props.tableName;

const s3Policy = new PolicyStatement();
s3Policy.addActions('s3:*');
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to scope this down to the read Actions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, you can create a PolicyStatement in one expression:

  new AwsCustomResource(this, 'CreateAthenaTable', {
    // ...
    policyStatements: [
      new PolicyStatement({
        actions: [...],
        resources: [...],
      }),
      // ...
    ],
  });

service: 'Athena',
action: 'startQueryExecution',
parameters: {
QueryString: `CREATE EXTERNAL TABLE IF NOT EXISTS ${props.databaseName}.${props.tableName} (${schemaStringBuilder(props.schema)}) LOCATION 's3://${props.queryBucket.bucketName}'/`,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the Comments arguments to the query

parameters: {
QueryString: `CREATE EXTERNAL TABLE IF NOT EXISTS ${props.databaseName}.${props.tableName} (${schemaStringBuilder(props.schema)}) LOCATION 's3://${props.queryBucket.bucketName}'/`,
ResultConfiguration: {
OutputLocation: `s3://${props.queryBucket.bucketName}/`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need a better place for this. we really don't care about the output of this query but this field is required.

Copy link
Contributor Author

@richardhboyd richardhboyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add resources for Workgroups, Catalogs, and (stretch-goal) Athena Connectors. Connectors will require more work than the rest of the changes combined.

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@netroy
Copy link
Contributor

netroy commented Nov 28, 2019

I really like that CDK is finally getting more Athena support. Thanks ❤️.
However, I'd like to point out a few things.

1. Deprecation of Athena's original metastore

Since Glue has been available across all regions, CREATE DATABASE & CREATE TABLE queries on Athena, actually create databases & tables on the Glue Data Catalog (instead of the metastore Athena originally launched with).
It might make sense to create Athena Database & Table classes as wrappers on top of the already existing classes in @aws-cdk/aws-glue, instead of making custom-resources for them.

More info

2. Avro has special schema requirements

For Apache Avro, column definitions alone don't work right now in Athena.
An Avro style schema also needs to be given,

  • either as Table.StorageDescriptor.SerDeInfo.Parameters if creating the Table on Glue
  • or, SERDEPROPERTIES in the CREATE EXTERNAL TABLE query, if creating the Table over Athena, or Spectrum.

More info

@richardhboyd
Copy link
Contributor Author

richardhboyd commented Nov 28, 2019 via email

@netroy
Copy link
Contributor

netroy commented Nov 28, 2019

@richardhboyd I created this PR earlier today.
It might be relevant in this context.
Perhaps, I can have your feedback 🙂

@@ -68,14 +68,18 @@
"pkglint": "1.18.0"
},
"dependencies": {
"@aws-cdk/core": "1.18.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave the @aws-cdk/core dependency (this module uses a bunch of things from it, like Construct).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was just pushed down a line so the dependencies are in alphabetical order

const tempStrings: string[] = [];
tempStrings.concat(Object.keys(schema)
.map(key => `'${key}' '${schema[key]}'`));
return `(${tempStrings.join(", ")})`;
Copy link
Contributor

@skinny85 skinny85 Nov 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very surprising way to write this code... concat doesn't modify its receiver, it returns a new array, and also you're calling concat on something you know is empty.

Did you mean something like this?

function schemaStringBuilder(schema: { [key: string]: DataType }): string {
  return '(' +
    Object.keys(schema)
      .map(key => `'${key}' '${schema[key]}'`)
      .join(', ')
    + ')';
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohhh, that's good to know. I'll make that change.

@mergify
Copy link
Contributor

mergify bot commented Dec 16, 2019

Thanks so much for taking the time to contribute to the AWS CDK ❤️

We will shortly assign someone to review this pull request and help get it
merged. In the meantime, please take a minute to make sure you follow this
checklist
:

  • PR title type(scope): text
    • type: fix, feat, refactor go into CHANGELOG, chore is hidden
    • scope: name of module without aws- or cdk- prefix or postfix (e.g. s3 instead of aws-s3-deployment)
    • text: use all lower-case, do not end with a period, do not include issue refs
  • PR Description
    • Rationale: describe rationale of change and approach taken
    • Issues: indicate issues fixed via: fixes #xxx or closes #xxx
    • Breaking?: last paragraph: BREAKING CHANGE: <describe what changed + link for details>
  • Testing
    • Unit test added. Prefer to add a new test rather than modify existing tests
    • CLI or init templates change? Re-run/add CLI integration tests
  • Documentation
    • README: update module README to describe new features
    • API docs: public APIs must be documented. Copy from official AWS docs when possible
    • Design: for significant features, follow design process

5 similar comments
@mergify
Copy link
Contributor

mergify bot commented Dec 16, 2019

Thanks so much for taking the time to contribute to the AWS CDK ❤️

We will shortly assign someone to review this pull request and help get it
merged. In the meantime, please take a minute to make sure you follow this
checklist
:

  • PR title type(scope): text
    • type: fix, feat, refactor go into CHANGELOG, chore is hidden
    • scope: name of module without aws- or cdk- prefix or postfix (e.g. s3 instead of aws-s3-deployment)
    • text: use all lower-case, do not end with a period, do not include issue refs
  • PR Description
    • Rationale: describe rationale of change and approach taken
    • Issues: indicate issues fixed via: fixes #xxx or closes #xxx
    • Breaking?: last paragraph: BREAKING CHANGE: <describe what changed + link for details>
  • Testing
    • Unit test added. Prefer to add a new test rather than modify existing tests
    • CLI or init templates change? Re-run/add CLI integration tests
  • Documentation
    • README: update module README to describe new features
    • API docs: public APIs must be documented. Copy from official AWS docs when possible
    • Design: for significant features, follow design process

@mergify
Copy link
Contributor

mergify bot commented Dec 16, 2019

Thanks so much for taking the time to contribute to the AWS CDK ❤️

We will shortly assign someone to review this pull request and help get it
merged. In the meantime, please take a minute to make sure you follow this
checklist
:

  • PR title type(scope): text
    • type: fix, feat, refactor go into CHANGELOG, chore is hidden
    • scope: name of module without aws- or cdk- prefix or postfix (e.g. s3 instead of aws-s3-deployment)
    • text: use all lower-case, do not end with a period, do not include issue refs
  • PR Description
    • Rationale: describe rationale of change and approach taken
    • Issues: indicate issues fixed via: fixes #xxx or closes #xxx
    • Breaking?: last paragraph: BREAKING CHANGE: <describe what changed + link for details>
  • Testing
    • Unit test added. Prefer to add a new test rather than modify existing tests
    • CLI or init templates change? Re-run/add CLI integration tests
  • Documentation
    • README: update module README to describe new features
    • API docs: public APIs must be documented. Copy from official AWS docs when possible
    • Design: for significant features, follow design process

@mergify
Copy link
Contributor

mergify bot commented Dec 16, 2019

Thanks so much for taking the time to contribute to the AWS CDK ❤️

We will shortly assign someone to review this pull request and help get it
merged. In the meantime, please take a minute to make sure you follow this
checklist
:

  • PR title type(scope): text
    • type: fix, feat, refactor go into CHANGELOG, chore is hidden
    • scope: name of module without aws- or cdk- prefix or postfix (e.g. s3 instead of aws-s3-deployment)
    • text: use all lower-case, do not end with a period, do not include issue refs
  • PR Description
    • Rationale: describe rationale of change and approach taken
    • Issues: indicate issues fixed via: fixes #xxx or closes #xxx
    • Breaking?: last paragraph: BREAKING CHANGE: <describe what changed + link for details>
  • Testing
    • Unit test added. Prefer to add a new test rather than modify existing tests
    • CLI or init templates change? Re-run/add CLI integration tests
  • Documentation
    • README: update module README to describe new features
    • API docs: public APIs must be documented. Copy from official AWS docs when possible
    • Design: for significant features, follow design process

@mergify
Copy link
Contributor

mergify bot commented Dec 16, 2019

Thanks so much for taking the time to contribute to the AWS CDK ❤️

We will shortly assign someone to review this pull request and help get it
merged. In the meantime, please take a minute to make sure you follow this
checklist
:

  • PR title type(scope): text
    • type: fix, feat, refactor go into CHANGELOG, chore is hidden
    • scope: name of module without aws- or cdk- prefix or postfix (e.g. s3 instead of aws-s3-deployment)
    • text: use all lower-case, do not end with a period, do not include issue refs
  • PR Description
    • Rationale: describe rationale of change and approach taken
    • Issues: indicate issues fixed via: fixes #xxx or closes #xxx
    • Breaking?: last paragraph: BREAKING CHANGE: <describe what changed + link for details>
  • Testing
    • Unit test added. Prefer to add a new test rather than modify existing tests
    • CLI or init templates change? Re-run/add CLI integration tests
  • Documentation
    • README: update module README to describe new features
    • API docs: public APIs must be documented. Copy from official AWS docs when possible
    • Design: for significant features, follow design process

@mergify
Copy link
Contributor

mergify bot commented Dec 16, 2019

Thanks so much for taking the time to contribute to the AWS CDK ❤️

We will shortly assign someone to review this pull request and help get it
merged. In the meantime, please take a minute to make sure you follow this
checklist
:

  • PR title type(scope): text
    • type: fix, feat, refactor go into CHANGELOG, chore is hidden
    • scope: name of module without aws- or cdk- prefix or postfix (e.g. s3 instead of aws-s3-deployment)
    • text: use all lower-case, do not end with a period, do not include issue refs
  • PR Description
    • Rationale: describe rationale of change and approach taken
    • Issues: indicate issues fixed via: fixes #xxx or closes #xxx
    • Breaking?: last paragraph: BREAKING CHANGE: <describe what changed + link for details>
  • Testing
    • Unit test added. Prefer to add a new test rather than modify existing tests
    • CLI or init templates change? Re-run/add CLI integration tests
  • Documentation
    • README: update module README to describe new features
    • API docs: public APIs must be documented. Copy from official AWS docs when possible
    • Design: for significant features, follow design process

@SomayaB SomayaB added draft pr/work-in-progress This PR is a draft and needs further work. and removed draft labels Jan 7, 2020
@netroy
Copy link
Contributor

netroy commented Jan 8, 2020

@richardhboyd can I somehow help move this forward?
If there is an RFC, I could maybe send some PRs.

Cheers

Copy link
Contributor

@skinny85 skinny85 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richardhboyd what's the status of this PR?

Marking as "Request changes" to remove it from my to-do list.

@mergify
Copy link
Contributor

mergify bot commented Mar 25, 2020

Once all the requested changes have been addressed, and the PR is ready for another review, remember to dismiss the review.


const gluePolicy = new PolicyStatement();
gluePolicy.addActions('glue:GetDatabase', 'glue:GetTable');
athenaPolicy.addResources(`arn:aws:glue:${Aws.REGION}:${Aws.ACCOUNT_ID}:catalog`);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be gluePolicy.addResources(...

@skinny85
Copy link
Contributor

I'm closing this one due to inactivity, please comment if you want it re-opened.

@skinny85 skinny85 closed this Jun 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr/work-in-progress This PR is a draft and needs further work.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants