Add spark work group #4

Open
Avinash-1394 opened this issue Jun 16, 2023 · 1 comment

Comments


Description

Add a Spark work group in the same catalog as the current work group that runs the functional tests.

Additional information

Preferred name of the work group: spark
Preferred engine config: lowest possible
Preferred session timeout: 1 hour.

Note: Do not create the work group in multiple catalogs, since Athena Spark does not have multi-catalog support yet.
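
For reference, the engine config and session timeout requested above are, as far as I know, set per session rather than on the work group itself. A minimal sketch of the session parameters, assuming the field names of the Athena `StartSession` request and assuming that "lowest possible" means 1 coordinator DPU, 1 executor DPU, and 2 concurrent DPUs. The interfaces and the `buildTestSessionInput` helper are hypothetical and only mirror the request shape locally; the "1 hour" timeout is assumed to map to the idle timeout:

```typescript
// Local mirror of the subset of the Athena StartSession request used here.
// Field names follow the Athena API; the concrete values are assumptions.
interface EngineConfiguration {
  CoordinatorDpuSize: number;
  MaxConcurrentDpus: number;
  DefaultExecutorDpuSize: number;
}

interface StartSessionInput {
  WorkGroup: string;
  EngineConfiguration: EngineConfiguration;
  SessionIdleTimeoutInMinutes: number;
}

// Hypothetical helper: builds the request body for a small test session.
function buildTestSessionInput(workGroup: string, timeoutHours: number): StartSessionInput {
  if (timeoutHours <= 0) {
    throw new Error("timeoutHours must be positive");
  }
  return {
    WorkGroup: workGroup,
    EngineConfiguration: {
      CoordinatorDpuSize: 1, // assumed minimum
      MaxConcurrentDpus: 2, // assumed minimum
      DefaultExecutorDpuSize: 1, // assumed minimum
    },
    // The API expresses the idle timeout in minutes.
    SessionIdleTimeoutInMinutes: Math.round(timeoutHours * 60),
  };
}

// Values from this issue: work group "spark", 1 hour timeout.
const sparkSessionInput = buildTestSessionInput("spark", 1);
```

The resulting object could be passed to whichever client starts the session; the limits should be checked against the current Athena documentation before relying on them.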

mattiamatrix self-assigned this on Jun 16, 2023

dacort commented Jun 16, 2023

I recently implemented this in CDK - feel free to use my IAM role as a reference:

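// Imports assume AWS CDK v2 (aws-cdk-lib); adjust if you are on a different version.
import * as cdk from "aws-cdk-lib";
import * as athena from "aws-cdk-lib/aws-athena";
import * as iam from "aws-cdk-lib/aws-iam";
import * as s3 from "aws-cdk-lib/aws-s3";
import { Construct } from "constructs";

export interface AthenaStackProps extends cdk.StackProps {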
  readonly bucket: s3.IBucket;
}

export class AthenaStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props: AthenaStackProps) {
    super(scope, id, props);

    // Athena infra and resources
    const athenaSparkName = "spark-poc";

    // Athena IAM role for both SQL and Spark workgroups
    const athenaRole = new iam.Role(this, "aws-poc-athena-role", {
      assumedBy: new iam.CompositePrincipal(
        new iam.ServicePrincipal("athena.amazonaws.com"),
        new iam.AccountPrincipal(this.account).withConditions({
          StringEquals: { "aws:SourceAccount": this.account },
          ArnLike: { "aws:SourceArn": `arn:aws:athena:${this.region}:${this.account}:workgroup/${athenaSparkName}` },
        })
      ),
      managedPolicies: [
        // iam.ManagedPolicy.fromAwsManagedPolicyName("AmazonS3ReadOnlyAccess"),
      ],
      inlinePolicies: {
        athenaDefaultSparkPolicy: new iam.PolicyDocument({
          statements: [
            new iam.PolicyStatement({
              actions: ["s3:PutObject", "s3:ListBucket", "s3:DeleteObject", "s3:GetObject"],
              resources: [props.bucket.bucketArn, props.bucket.arnForObjects("*")],
            }),
            new iam.PolicyStatement({
              actions: [
                "athena:GetWorkGroup",
                "athena:TerminateSession",
                "athena:GetSession",
                "athena:GetSessionStatus",
                "athena:ListSessions",
                "athena:StartCalculationExecution",
                "athena:GetCalculationExecutionCode",
                "athena:StopCalculationExecution",
                "athena:ListCalculationExecutions",
                "athena:GetCalculationExecution",
                "athena:GetCalculationExecutionStatus",
                "athena:ListExecutors",
                "athena:ExportNotebook",
                "athena:UpdateNotebook",
              ],
              resources: [`arn:aws:athena:${this.region}:${this.account}:workgroup/${athenaSparkName}`],
            }),
            new iam.PolicyStatement({
              actions: ["logs:CreateLogStream", "logs:DescribeLogStreams", "logs:CreateLogGroup", "logs:PutLogEvents"],
              resources: [
                `arn:aws:logs:${this.region}:${this.account}:log-group:/aws-athena:*`,
                `arn:aws:logs:${this.region}:${this.account}:log-group:/aws-athena*:log-stream:*`,
              ],
            }),
            new iam.PolicyStatement({
              actions: ["logs:DescribeLogGroups"],
              resources: [`arn:aws:logs:${this.region}:${this.account}:log-group::*`],
            }),
            new iam.PolicyStatement({
              actions: ["cloudwatch:PutMetricData"],
              resources: ["*"],
              conditions: { StringEquals: { "cloudwatch:namespace": "AmazonAthenaForApacheSpark" } },
            }),
          ],
        }),
      },
    });

    // Athena Spark Workgroup
    const athenaSpark = new athena.CfnWorkGroup(this, "aws-poc-athena-spark-workgroup", {
      name: athenaSparkName,
      recursiveDeleteOption: true,
      workGroupConfiguration: {
        engineVersion: {
          selectedEngineVersion: "PySpark engine version 3",
        },
        resultConfiguration: {
          outputLocation: `s3://${props.bucket.bucketName}/athena/results/spark/`,
        },
        executionRole: athenaRole.roleArn,
      },
    });
  }
}
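
The work group ARN is written out twice in the stack above (once in the trust conditions and once in the Athena policy statement). A small sketch of how it might be built in one place; `athenaWorkgroupArn` is a hypothetical helper that is not part of the original snippet, and the region and account values below are placeholders:

```typescript
// Hypothetical helper, not part of the original stack: builds the ARN of an
// Athena work group from its region, account ID, and name.
function athenaWorkgroupArn(region: string, account: string, workgroupName: string): string {
  return `arn:aws:athena:${region}:${account}:workgroup/${workgroupName}`;
}

// Example with placeholder region and account values.
const exampleArn = athenaWorkgroupArn("us-east-1", "111122223333", "spark-poc");
```

Inside the stack it could be called as `athenaWorkgroupArn(this.region, this.account, athenaSparkName)`, so that the trust condition and the policy resource cannot drift apart.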
