-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(stepfunctions-tasks): Support for Athena APIs: StartQueryExecution, StopQueryExeuction, GetQueryResults and GetQueryExecution #11045
Conversation
Title does not follow the guidelines of Conventional Commits. Please adjust title before merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Sumeet-Badyal overall looks great! I have some minor suggestions and a couple of properties I think we should consider re-naming.
thanks for the high quality first cut!! I'm only about 2/3 of the way through this PR but figured I'd publish the comments so that you can iterate on them.
packages/@aws-cdk/aws-stepfunctions-tasks/lib/athena/get-query-results.ts
Outdated
Show resolved
Hide resolved
packages/@aws-cdk/aws-stepfunctions-tasks/lib/athena/start-query-execution.ts
Outdated
Show resolved
Hide resolved
packages/@aws-cdk/aws-stepfunctions-tasks/lib/athena/start-query-execution.ts
Outdated
Show resolved
Hide resolved
packages/@aws-cdk/aws-stepfunctions-tasks/lib/athena/start-query-execution.ts
Outdated
Show resolved
Hide resolved
packages/@aws-cdk/aws-stepfunctions-tasks/lib/athena/get-query-execution.ts
Show resolved
Hide resolved
packages/@aws-cdk/aws-stepfunctions-tasks/lib/athena/get-query-execution.ts
Outdated
Show resolved
Hide resolved
packages/@aws-cdk/aws-stepfunctions-tasks/lib/athena/start-query-execution.ts
Outdated
Show resolved
Hide resolved
packages/@aws-cdk/aws-stepfunctions-tasks/lib/athena/start-query-execution.ts
Outdated
Show resolved
Hide resolved
Pull request has been modified.
packages/@aws-cdk/aws-stepfunctions-tasks/lib/athena/start-query-execution.ts
Outdated
Show resolved
Hide resolved
packages/@aws-cdk/aws-stepfunctions-tasks/lib/athena/get-query-execution.ts
Show resolved
Hide resolved
packages/@aws-cdk/aws-stepfunctions-tasks/lib/athena/start-query-execution.ts
Outdated
Show resolved
Hide resolved
packages/@aws-cdk/aws-stepfunctions-tasks/lib/athena/start-query-execution.ts
Outdated
Show resolved
Hide resolved
packages/@aws-cdk/aws-stepfunctions-tasks/lib/athena/start-query-execution.ts
Outdated
Show resolved
Hide resolved
/** | ||
* Name of catalog used in query execution | ||
* | ||
* @default - No catalog | ||
*/ | ||
readonly catalogName?: string; | ||
|
||
/** | ||
* Name of database used in query execution | ||
* | ||
* @default - No database | ||
*/ | ||
readonly databaseName?: string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious: is a query execution context valid if neither is present?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am going to provide two use cases and then hopefully that will clarify things:
relevant link https://docs.aws.amazon.com/IAM/latest/UserGuide/list_awsglue.html
- Start an an Athena Query that creates a table, this requires write permissions to catalog, database and table as shown in the actions table in the link above.
- Start an Athena Query that gets a userdefinedfunction(UDF) then runs the function. This requires read permissions to a UDF
In the table Resource types defined by AWS Glue the table shows how to scope down permissions if one has the UDF or table name which I don’t.
Basically any athena query possible must be able to run from a state machine hence the large permissions granted.
One thing I can do is that in the table Resource types defined by AWS Glue, the permissions are scoped down if the user sends an optional parameter databaseName.
i.e. currently: UDF /glue/user-defined-function/*
changed to: /glue/user-defined-function/DATABASENAME/*
or if no database name provided /glue/user-defined-function/default/*
This will prevent cross database access.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it, thanks for clarifying. I think we should seek to simplify this. it's unfortunate we don't have L2 constructs for Athena yet, as grant*
APIs would/should be leveraged here.
let's introduce any changes that will ensure least privilege for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe those changes have been included in the commit found at these two links:
https://github.com/aws/aws-cdk/pull/11045/files#diff-edd80b0b061b6b337000af6848daeda4011822de72da5749dbcd430c8089bc37R149
https://github.com/aws/aws-cdk/pull/11045/files#diff-edd80b0b061b6b337000af6848daeda4011822de72da5749dbcd430c8089bc37R154
Granting access to all tables and user defined functions in the specified database or the default database if no database has been included in the request.
packages/@aws-cdk/aws-stepfunctions-tasks/lib/athena/start-query-execution.ts
Show resolved
Hide resolved
Pull request has been modified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work @Sumeet-Badyal !!
Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork). |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork). |
's3:ListBucketMultipartUploads', | ||
's3:ListMultipartUploadParts', | ||
's3:PutObject'], | ||
resources: [this.props.resultConfiguration?.outputLocation ?? '*'], // Need S3 location where data is stored https://docs.aws.amazon.com/athena/latest/ug/security-iam-athena.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these policy resources expect ARN, passing outputLocation
(S3 URL like s3://...
) will get an error.
Even if outputLocation
is specified to props, we may have to pass *
to resources or parse S3 URL to get ARN correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for spotting this. I parsed the s3 bucket correctly like you suggested but still got an error Unable to verify/create output bucket
. When I reran the exact same state machine but with wildcard permissions the machine succeeded and sent the logs to the correct output location. I will be submitting a fix immediately granting wildcard permissions for S3 and LakeFormation
…ration patterns (#11188) The changes made by #11045 seem to support `WAIT_FOR_TASK_TOKEN (.waitForTaskToken)` but according to the documentation, only `Request Response` and `Run a job (.sync)` are supported: https://docs.aws.amazon.com/step-functions/latest/dg/connect-athena.html closes #11246 ---- *By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
…ryExecution (#11203) The changes made by #11045 grant S3 scoped permissions if the optional parameter output location is passed. The output location is not parsed correctly to be attached as a resource. When the output location is correctly parsed, state machines with a valid definition and a valid S3 bucket still fail due to an `Unable to verify/create output bucket` error. The exact same state machine and S3 bucket pass with wildcard permissions as such the resource for Start Query Execution must be changed to `*`. BREAKING CHANGE: type of `outputLocation` in the experimental Athena `StartQueryExecution` has been changed to `s3.Location` from `string` ---- *By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
feat(stepfunctions-tasks): support for Athena APIs: StartQueryExecution, StopQueryExeuction, GetQueryResults and GetQueryExecution
Implementation
Update package
@aws-cdk/aws-stepfunctions-tasks
to include support for Athena StartQueryExecution, StopQueryExeuction, GetQueryResults, GetQueryExecution API as per documentation here:https://docs.aws.amazon.com/step-functions/latest/dg/connect-athena.html
Includes support for the following Amazon Athena API calls:
StartQueryExecution
StopQueryExeuction
GetQueryResults
GetQueryExecution
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license