Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

glue: Job construct does not honor SparkUIProps S3 prefix when granting S3 access #19862

Closed
raginjason opened this issue Apr 11, 2022 · 5 comments · Fixed by #26696
Closed

glue: Job construct does not honor SparkUIProps S3 prefix when granting S3 access #19862

raginjason opened this issue Apr 11, 2022 · 5 comments · Fixed by #26696
Assignees
Labels
@aws-cdk/aws-glue Related to AWS Glue bug This issue is a bug. effort/small Small work item – less than a day of effort p2

Comments

@raginjason
Copy link

Describe the bug

glue.Job accepts a SparkUIProps object as an argument. The job then in turn grants some S3 permissions for the bucket attribute of this object. Unfortunately, it does not take the prefix attribute of this object into account for the grant. See:

private setupSparkUI(executable: JobExecutableConfig, role: iam.IRole, props: SparkUIProps) {
if (JobType.PYTHON_SHELL === executable.type) {
throw new Error('Spark UI is not available for JobType.PYTHON_SHELL jobs');
}
const bucket = props.bucket ?? new s3.Bucket(this, 'SparkUIBucket');
bucket.grantReadWrite(role);

Expected Behavior

I expect glue.Job to include the prefix attribute of the SparkUIProps object when granting S3 permissions

Current Behavior

glue.Job grants read-write access to the entire bucket attribute of the SparkUIProps object passed to it

Reproduction Steps

from typing import cast

from aws_cdk import aws_glue as glue
from aws_cdk import aws_s3 as s3
from aws_cdk import core
from aws_cdk.assertions import Template, Match

stack = core.Stack()
code_bucket = s3.Bucket(stack, "CodeBucket")
ui_bucket = s3.Bucket(stack, "UIBucket")

spark_ui_prefix = "/foo/bar/baz"

job = glue.Job(stack, "Job",
               spark_ui=glue.SparkUIProps(enabled=True, bucket=ui_bucket, prefix=spark_ui_prefix),
               executable=glue.JobExecutable.python_etl(
                   glue_version=cast(glue.GlueVersion, glue.GlueVersion.V3_0),
                   python_version=glue.PythonVersion.THREE,
                   script=glue.Code.from_bucket(bucket=code_bucket, key="script.py")
               )

               )

template = Template.from_stack(stack)
template.has_resource_properties("AWS::IAM::Policy",
                                 Match.object_like(
                                     {
                                         "PolicyDocument": {
                                             "Statement": [
                                                 {
                                                     "Action": [
                                                         "s3:GetObject*",
                                                         "s3:GetBucket*",
                                                         "s3:List*",
                                                         "s3:DeleteObject*",
                                                         "s3:PutObject*",
                                                         "s3:Abort*"
                                                     ],
                                                     "Effect": "Allow",
                                                     "Resource": [
                                                         {
                                                             "Fn::GetAtt": [
                                                                 "UIBucketB980636D",
                                                                 "Arn"
                                                             ]
                                                         },
                                                         {
                                                             "Fn::Join": [
                                                                 "",
                                                                 [
                                                                     {
                                                                         "Fn::GetAtt": [
                                                                             "UIBucketB980636D",
                                                                             "Arn"
                                                                         ]
                                                                     },
                                                                     f"{spark_ui_prefix}*"
                                                                 ]
                                                             ]
                                                         }
                                                     ]
                                                 },
                                                 Match.any_value(),
                                             ]
                                         }
                                     }
                                 )
                                 )

Possible Solution

Change bucket.grantReadWrite(role); to bucket.grantReadWrite(role, props.prefix);

Additional Information/Context

No response

CDK CLI Version

1.148.0 (build 69a50f1)

Framework Version

No response

Node.js Version

v14.17.6

OS

OSX

Language

Python

Language Version

No response

Other information

No response

@raginjason raginjason added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Apr 11, 2022
@github-actions github-actions bot added the @aws-cdk/aws-glue Related to AWS Glue label Apr 11, 2022
@kaizencc
Copy link
Contributor

Good call. This is a small change but will also require integ testing it to ensure that the correct permissions are applied.

@kaizencc kaizencc added p2 effort/small Small work item – less than a day of effort and removed needs-triage This issue or PR still needs to be triaged. labels Apr 12, 2022
@kaizencc kaizencc removed their assignment Apr 12, 2022
@scanlonp scanlonp self-assigned this Jun 20, 2022
@scanlonp
Copy link
Contributor

scanlonp commented Aug 9, 2023

It would be helpful to have an example script that has spark write logs into the specified path. Trivial scripts do not write to the specified s3 bucket. Otherwise, meaningful integ testing will not be easy.

@raginjason
Copy link
Author

I can try to come up with something. The included assertions are not sufficient?

@scanlonp
Copy link
Contributor

scanlonp commented Aug 10, 2023

@raginjason I have made the fix so that the Cloudformation template has the necessary permissions, just like your assertions. However, once deployed, I have not been able to get a job to actually write Spark logs to s3. If your script.py (or some version of it) is open to be shared, I would be happy to test it. Otherwise, check to see if the change works for you once it gets merged, and open another issue + assign/ping me if it does not!

@mergify mergify bot closed this as completed in #26696 Aug 10, 2023
mergify bot pushed a commit that referenced this issue Aug 10, 2023
…granting S3 access (#26696)

The `SparkUIProps.prefix` attribute of `glue.Job` now has a consistent format, is validated, and the bucket ReadWrite role is only given permissions to the folder of the bucket specified by the `prefix` if provided.

Adds a suggested format for the prefix: `/foo/bar` and throws and error if the prefix does not start with a `/` or ends without a `/`.

This may be a breaking change for users who configured their prefix in a style different than this enforces. However, I believe it is the correct standardization going forward.

Closes #19862.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@github-actions
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-glue Related to AWS Glue bug This issue is a bug. effort/small Small work item – less than a day of effort p2
Projects
None yet
3 participants