Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(stepfunctions-tasks): Support for Athena APIs: StartQueryExecution, StopQueryExeuction, GetQueryResults and GetQueryExecution #11045

Merged
merged 7 commits into from
Oct 26, 2020

Conversation

Sumeet-Badyal
Copy link
Contributor

feat(stepfunctions-tasks): support for Athena APIs: StartQueryExecution, StopQueryExeuction, GetQueryResults and GetQueryExecution

Implementation

Update package @aws-cdk/aws-stepfunctions-tasks to include support for Athena StartQueryExecution, StopQueryExeuction, GetQueryResults, GetQueryExecution API as per documentation here:
https://docs.aws.amazon.com/step-functions/latest/dg/connect-athena.html

Includes support for the following Amazon Athena API calls:

  • StartQueryExecution
  • StopQueryExeuction
  • GetQueryResults
  • GetQueryExecution

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@gitpod-io
Copy link

gitpod-io bot commented Oct 22, 2020

@mergify
Copy link
Contributor

mergify bot commented Oct 22, 2020

Title does not follow the guidelines of Conventional Commits. Please adjust title before merge.

Copy link
Contributor

@shivlaks shivlaks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Sumeet-Badyal overall looks great! I have some minor suggestions and a couple of properties I think we should consider re-naming.

thanks for the high quality first cut!! I'm only about 2/3 of the way through this PR but figured I'd publish the comments so that you can iterate on them.

@shivlaks shivlaks self-assigned this Oct 23, 2020
@mergify mergify bot dismissed shivlaks’s stale review October 23, 2020 21:07

Pull request has been modified.

packages/@aws-cdk/aws-stepfunctions-tasks/README.md Outdated Show resolved Hide resolved
Comment on lines +276 to +288
/**
* Name of catalog used in query execution
*
* @default - No catalog
*/
readonly catalogName?: string;

/**
* Name of database used in query execution
*
* @default - No database
*/
readonly databaseName?: string;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious: is a query execution context valid if neither is present?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to provide two use cases and then hopefully that will clarify things:
relevant link https://docs.aws.amazon.com/IAM/latest/UserGuide/list_awsglue.html

  1. Start an an Athena Query that creates a table, this requires write permissions to catalog, database and table as shown in the actions table in the link above.
  2. Start an Athena Query that gets a userdefinedfunction(UDF) then runs the function. This requires read permissions to a UDF

In the table Resource types defined by AWS Glue the table shows how to scope down permissions if one has the UDF or table name which I don’t.

Basically any athena query possible must be able to run from a state machine hence the large permissions granted.

One thing I can do is that in the table Resource types defined by AWS Glue, the permissions are scoped down if the user sends an optional parameter databaseName.
i.e. currently: UDF /glue/user-defined-function/*
changed to: /glue/user-defined-function/DATABASENAME/*
or if no database name provided /glue/user-defined-function/default/*

This will prevent cross database access.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, thanks for clarifying. I think we should seek to simplify this. it's unfortunate we don't have L2 constructs for Athena yet, as grant* APIs would/should be leveraged here.

let's introduce any changes that will ensure least privilege for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe those changes have been included in the commit found at these two links:

https://github.com/aws/aws-cdk/pull/11045/files#diff-edd80b0b061b6b337000af6848daeda4011822de72da5749dbcd430c8089bc37R149
https://github.com/aws/aws-cdk/pull/11045/files#diff-edd80b0b061b6b337000af6848daeda4011822de72da5749dbcd430c8089bc37R154

Granting access to all tables and user defined functions in the specified database or the default database if no database has been included in the request.

@mergify mergify bot dismissed shivlaks’s stale review October 26, 2020 19:20

Pull request has been modified.

Copy link
Contributor

@shivlaks shivlaks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work @Sumeet-Badyal !!

@mergify
Copy link
Contributor

mergify bot commented Oct 26, 2020

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject6AEA49D1-qxepHUsryhcu
  • Commit ID: bd20429
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify
Copy link
Contributor

mergify bot commented Oct 26, 2020

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

's3:ListBucketMultipartUploads',
's3:ListMultipartUploadParts',
's3:PutObject'],
resources: [this.props.resultConfiguration?.outputLocation ?? '*'], // Need S3 location where data is stored https://docs.aws.amazon.com/athena/latest/ug/security-iam-athena.html
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these policy resources expect ARN, passing outputLocation (S3 URL like s3://...) will get an error.
Even if outputLocation is specified to props, we may have to pass * to resources or parse S3 URL to get ARN correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for spotting this. I parsed the s3 bucket correctly like you suggested but still got an error Unable to verify/create output bucket. When I reran the exact same state machine but with wildcard permissions the machine succeeded and sent the logs to the correct output location. I will be submitting a fix immediately granting wildcard permissions for S3 and LakeFormation

mergify bot pushed a commit that referenced this pull request Nov 2, 2020
…ration patterns (#11188)

The changes made by #11045 seem to support `WAIT_FOR_TASK_TOKEN (.waitForTaskToken)` but according to the documentation, only `Request Response` and `Run a job (.sync)` are supported: https://docs.aws.amazon.com/step-functions/latest/dg/connect-athena.html

closes #11246

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
mergify bot pushed a commit that referenced this pull request Nov 3, 2020
…ryExecution (#11203)

The changes made by #11045 grant S3 scoped permissions if the optional parameter  output location is passed. The output location is not parsed correctly to be attached as a resource. When the output location is correctly parsed, state machines with a valid definition and a valid S3 bucket still fail due to an `Unable to verify/create output bucket` error. The exact same state machine and S3 bucket pass with wildcard permissions as such the resource for Start Query Execution must be changed to `*`.

BREAKING CHANGE: type of `outputLocation` in the experimental Athena `StartQueryExecution` has been changed to `s3.Location` from `string`

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants