feat: Athena Databases and Tables #5237
Thanks so much for taking the time to contribute to the AWS CDK ❤️ We will shortly assign someone to review this pull request and help get it
ARRAY = 'ARRAYTODO',
/**
 * < primitive_type, data_type >
 */
MAP = 'MAPTODO',
/**
 * < col_name : data_type [COMMENT col_comment] [, ...] >
 */
STRUCT = 'STRUCTTODO'
I need a better way to represent these data structures inside the schema.
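One possible way to model the nested types, sketched here as an assumption rather than anything the PR or the CDK provides, is a small recursive union that renders to the `array<...>`, `map<...>` and `struct<...>` forms described in the doc comments above. The names `ColumnType`, `PrimitiveType` and `renderType` are hypothetical, and the list of primitive types is deliberately incomplete.

```typescript
// Hypothetical names throughout; not part of the PR or of any CDK module.
type PrimitiveType = 'string' | 'int' | 'bigint' | 'double' | 'boolean' | 'timestamp';

// A column type is either a primitive or a container built from other column types.
type ColumnType =
  | { kind: 'primitive'; type: PrimitiveType }
  | { kind: 'array'; element: ColumnType }
  | { kind: 'map'; key: PrimitiveType; value: ColumnType }
  | { kind: 'struct'; fields: { name: string; type: ColumnType }[] };

// Recursively render a ColumnType into its DDL type string.
function renderType(t: ColumnType): string {
  switch (t.kind) {
    case 'primitive':
      return t.type;
    case 'array':
      return `array<${renderType(t.element)}>`;
    case 'map':
      return `map<${t.key},${renderType(t.value)}>`;
    case 'struct':
      return `struct<${t.fields.map(f => `${f.name}:${renderType(f.type)}`).join(',')}>`;
  }
}

const tags: ColumnType = { kind: 'array', element: { kind: 'primitive', type: 'string' } };
const address: ColumnType = {
  kind: 'struct',
  fields: [
    { name: 'city', type: { kind: 'primitive', type: 'string' } },
    { name: 'zip', type: { kind: 'primitive', type: 'int' } },
  ],
};
```

With this shape the enum members no longer need placeholder strings, because the rendered type is computed from the nested definition.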
/**
 * The name for the Table.
 * @default none
Do I need '@default' for required properties?
No, it doesn't really make sense to have @default for required properties - feel free to remove these.
this.tableName = props.tableName;

const s3Policy = new PolicyStatement();
s3Policy.addActions('s3:*');
Need to scope this down to the read Actions
Also, you can create a PolicyStatement in one expression:
new AwsCustomResource(this, 'CreateAthenaTable', {
  // ...
  policyStatements: [
    new PolicyStatement({
      actions: [...],
      resources: [...],
    }),
    // ...
  ],
});
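To make "scope this down to the read actions" concrete, here is a rough sketch of what the narrowed statements might contain. It uses plain objects in IAM policy JSON shape so it runs without the CDK; the helper name `s3ReadStatements` is hypothetical, the choice of actions is an assumption about what "read" should cover, and the `aws` partition is hard-coded. A query that writes its results into the same bucket would likely need write access to the output prefix as well.

```typescript
// Plain-object sketch of IAM statements; in the PR these would be PolicyStatement props.
interface StatementJson {
  Effect: 'Allow';
  Action: string[];
  Resource: string[];
}

// Hypothetical helper: read-only access to one bucket instead of 's3:*'.
function s3ReadStatements(bucketName: string): StatementJson[] {
  const bucketArn = `arn:aws:s3:::${bucketName}`; // assumes the standard "aws" partition
  return [
    // Bucket-level actions apply to the bucket ARN itself.
    { Effect: 'Allow', Action: ['s3:GetBucketLocation', 's3:ListBucket'], Resource: [bucketArn] },
    // Object-level actions apply to the keys inside the bucket.
    { Effect: 'Allow', Action: ['s3:GetObject'], Resource: [`${bucketArn}/*`] },
  ];
}

const statements = s3ReadStatements('my-query-bucket');
```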
service: 'Athena',
action: 'startQueryExecution',
parameters: {
  QueryString: `CREATE EXTERNAL TABLE IF NOT EXISTS ${props.databaseName}.${props.tableName} (${schemaStringBuilder(props.schema)}) LOCATION 's3://${props.queryBucket.bucketName}'/`,
Add the Comments arguments to the query
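As a sketch of how column comments could be threaded into the query, the column list might be rendered with an optional `COMMENT` clause per column, following the `col_name data_type [COMMENT col_comment]` form. The `ColumnSpec` interface and `renderColumns` function are hypothetical, identifiers are wrapped in backticks on the assumption that the DDL is parsed Hive-style, and the backslash escaping of single quotes is likewise an assumption.

```typescript
// Hypothetical column description; `comment` is optional.
interface ColumnSpec {
  name: string;
  type: string;
  comment?: string;
}

// Assumption: a backslash escapes a single quote inside a DDL string literal.
function escapeSqlString(value: string): string {
  return value.replace(/'/g, "\\'");
}

// Render "`name` type [COMMENT 'text']" for each column, comma separated.
function renderColumns(columns: ColumnSpec[]): string {
  return columns
    .map(col => {
      const suffix = col.comment === undefined ? '' : ` COMMENT '${escapeSqlString(col.comment)}'`;
      return '`' + col.name + '` ' + col.type + suffix;
    })
    .join(', ');
}

const ddl = renderColumns([
  { name: 'id', type: 'bigint', comment: 'primary key' },
  { name: 'payload', type: 'string' },
]);
```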
parameters: {
  QueryString: `CREATE EXTERNAL TABLE IF NOT EXISTS ${props.databaseName}.${props.tableName} (${schemaStringBuilder(props.schema)}) LOCATION 's3://${props.queryBucket.bucketName}'/`,
  ResultConfiguration: {
    OutputLocation: `s3://${props.queryBucket.bucketName}/`
I need a better place for this. We don't really care about the output of this query, but this field is required.
You need to add resources for Workgroups, Catalogs, and (stretch-goal) Athena Connectors. Connectors will require more work than the rest of the changes combined.
I really like that CDK is finally getting more Athena support. Thanks ❤️.
1. Deprecation of Athena's original metastore
Since Glue has been available across all regions,
2. Avro has special schema requirements
For Apache Avro, column definitions alone don't work right now in Athena.
This is great feedback. I was just working on trying to understand how Glue Tables and Athena Tables were related.
On Nov 28, 2019, at 2:05 PM, करतोफ्फेलस्क्रिप्ट™ <notifications@github.com> wrote:
I really like that CDK is finally getting more Athena support. Thanks ❤️.
However, I'd like to point out a few things.
1. Deprecation of Athena's original metastore
Since Glue has been available across all regions, CREATE DATABASE & CREATE TABLE queries on Athena, actually create databases & tables on the Glue Data Catalog (instead of the metastore Athena originally launched with).
It might make sense to create Athena Database & Table classes as wrappers on top of the already existing classes in @aws-cdk/aws-glue, instead of making custom-resources for them.
More info: https://docs.aws.amazon.com/athena/latest/ug/glue-faq.html
2. Avro has special schema requirements
For Apache Avro, column definitions alone don't work right now in Athena.
An Avro style schema also needs to be given,
* either as Table.StorageDescriptor.SerDeInfo.Parameters if creating the Table on Glue
* or, SERDEPROPERTIES in the CREATE EXTERNAL TABLE query, if creating the Table over Athena, or Spectrum.
More info: https://docs.aws.amazon.com/athena/latest/ug/avro.html
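For the Athena route, the Avro point could look roughly like the sketch below, which embeds the Avro schema through the `avro.schema.literal` SerDe property. The function name `avroCreateTableQuery` and its parameter object are hypothetical, and the sketch simply refuses schemas containing single quotes rather than guessing at an escaping rule.

```typescript
// Hypothetical parameter bag for the sketch.
interface AvroTableParams {
  database: string;
  table: string;
  columnsDdl: string;   // already-rendered column list, e.g. "`id` bigint"
  avroSchema: object;   // Avro schema as a plain JSON-serialisable object
  location: string;     // e.g. "s3://bucket/prefix/"
}

// Build a CREATE EXTERNAL TABLE statement that carries the Avro schema in SERDEPROPERTIES.
function avroCreateTableQuery(p: AvroTableParams): string {
  const schemaLiteral = JSON.stringify(p.avroSchema);
  if (schemaLiteral.indexOf("'") !== -1) {
    // Keep the sketch simple: avoid emitting a literal that would break the quoted string.
    throw new Error('Avro schema must not contain single quotes');
  }
  return [
    `CREATE EXTERNAL TABLE IF NOT EXISTS ${p.database}.${p.table} (${p.columnsDdl})`,
    `ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'`,
    `WITH SERDEPROPERTIES ('avro.schema.literal'='${schemaLiteral}')`,
    `STORED AS AVRO`,
    `LOCATION '${p.location}'`,
  ].join('\n');
}

const avroQuery = avroCreateTableQuery({
  database: 'analytics',
  table: 'events',
  columnsDdl: '`id` bigint',
  avroSchema: { type: 'record', name: 'Event', fields: [{ name: 'id', type: 'long' }] },
  location: 's3://my-data-bucket/events/',
});
```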
@richardhboyd I created this PR earlier today.
@@ -68,14 +68,18 @@
    "pkglint": "1.18.0"
  },
  "dependencies": {
    "@aws-cdk/core": "1.18.0"
Leave the @aws-cdk/core dependency (this module uses a bunch of things from it, like Construct).
It was just pushed down a line so the dependencies are in alphabetical order
const tempStrings: string[] = [];
tempStrings.concat(Object.keys(schema)
  .map(key => `'${key}' '${schema[key]}'`));
return `(${tempStrings.join(", ")})`;
This is a very surprising way to write this code... concat doesn't modify its receiver, it returns a new array, and also you're calling concat on something you know is empty.
Did you mean something like this?
function schemaStringBuilder(schema: { [key: string]: DataType }): string {
  return '(' +
    Object.keys(schema)
      .map(key => `'${key}' '${schema[key]}'`)
      .join(', ')
    + ')';
}
ohhh, that's good to know. I'll make that change.
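For anyone following along, the `concat` behaviour described above can be seen in isolation with a few lines. The column strings are made up for illustration.

```typescript
// Array.prototype.concat returns a new array and leaves its receiver untouched.
const tempStrings: string[] = [];
const returned = tempStrings.concat(['id bigint', 'name string']);

// tempStrings is still empty here; only `returned` holds the two entries.
const joinedOriginal = `(${tempStrings.join(', ')})`;
const joinedReturned = `(${returned.join(', ')})`;
```

This is why the original builder always produced an empty column list: the result of `concat` was discarded and the empty array was joined.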
@richardhboyd can I somehow help move this forward? Cheers
@richardhboyd what's the status of this PR?
Marking as "Request changes" to remove it from my to-do list.
Once all the requested changes have been addressed, and the PR is ready for another review, remember to dismiss the review.

const gluePolicy = new PolicyStatement();
gluePolicy.addActions('glue:GetDatabase', 'glue:GetTable');
athenaPolicy.addResources(`arn:aws:glue:${Aws.REGION}:${Aws.ACCOUNT_ID}:catalog`);
I think this should be gluePolicy.addResources(...
I'm closing this one due to inactivity; please comment if you want it re-opened.
WIP for Athena Resources via AwsCustomResources
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license