Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Redshift Spectrum / External Schema Data Catalog resource #32

Open
bitzstein opened this issue Jul 22, 2019 · 4 comments
Open

Comments

@bitzstein
Copy link

bitzstein commented Jul 22, 2019

I need support for the Redshift Spectrum external schema, specifically backed by data in S3 and a database in the AWS Glue Data Catalog.

I have written the resource code in the existing stub redshift/resource_redshift_external_schema_data_catalog.go, for the simplest use case:

  • database_name references database in AWS Glue Data Catalog
  • iam_role ARN with appropriate S3 and AWS Glue Data Catalog access

I based the code off the existing Redshift Schema resource, and took inspiration from the Postgres provider for the use of errwrap.

You can check it out here: https://github.com/bitzstein/terraform-provider-redshift/tree/feature/external-schema-data-catalog

It is working for me for configs built from scratch, as well as via terraform import using esoid in SVV_EXTERNAL_SCHEMAS table.

I wanted to see if there's interest in using my implementation of this resource, and any changes that would be needed to get it merged.

NOTE: I had trouble building the provider with the relative path ./redshift in main.go so also
needed to change that, for go build to work. See bitzstein@d88770c

@bitzstein
Copy link
Author

bitzstein commented Jul 22, 2019

Here is a an example config which requires a redshift cluster to already exist, with an associated IAM role called redshift-parent-role. Make sure you update local.s3_resources if you want to create real external tables (see https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_TABLE.html).

locals {
  database_name = "testdb"
  schema_name = "ext_testdb"
  s3_resources = "\"arn:aws:s3:::any_s3_repo\",\"arn:aws:s3:::any_s3_repo/some_prefix_*\""
  redshift_parent_role_name = "redshift-parent-role"
}

data "aws_region" "current" {}

# A role with "sts:AssumeRole" trust relationship with "redshift.amazonaws.com".
# It must also have been added to the IAM roles for the Redshift cluster.
# See: https://docs.aws.amazon.com/redshift/latest/mgmt/authorizing-redshift-service.html
data "aws_iam_role" "redshift_parent_role" {
  name = "${local.redshift_parent_role_name}"
}

resource "aws_glue_catalog_database" "catalog_database" {
  name = "${local.database_name}"
  description = "Redshift Spectrum external schema ${local.schema_name}"
}

resource "aws_iam_role" "redshift_spectrum_role" {
  name = "redshift-spectrum-${local.schema_name}-role"
  description = "Allows ${data.aws_iam_role.redshift_parent_role.id} to access S3 and Data Catalog for external schema ${local.schema_name}"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "AWS": "${data.aws_iam_role.redshift_parent_role.arn}"
      },
      "Effect": "Allow"
    }
  ]
}
EOF
}

# Policy restricted to necessary S3 and AWS Glue Data Catalog resources.
# See: https://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-iam-policies.html#spectrum-iam-policies-minimum-permissions
resource "aws_iam_role_policy" "redshift_spectrum_policy" {
  name = "redshift-spectrum-${local.schema_name}-policy"
  role = "${aws_iam_role.redshift_spectrum_role.id}"

  policy = <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListMultipartUploadParts",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads"
            ],
            "Resource": [${local.s3_resources}]
 },
        {
            "Effect": "Allow",
            "Action": [
                "glue:CreateDatabase",
                "glue:DeleteDatabase",
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:UpdateDatabase",
                "glue:CreateTable",
                "glue:DeleteTable",
                "glue:BatchDeleteTable",
                "glue:UpdateTable",
                "glue:GetTable",
                "glue:GetTables",
                "glue:BatchCreatePartition",
                "glue:CreatePartition",
                "glue:DeletePartition",
                "glue:BatchDeletePartition",
                "glue:UpdatePartition",
                "glue:GetPartition",
                "glue:GetPartitions",
                "glue:BatchGetPartition"
            ],
            "Resource": [
                "${format("arn:aws:glue:%s:%s:catalog", data.aws_region.current.name, aws_glue_catalog_database.catalog_database.catalog_id)}",
                "${format("arn:aws:glue:%s:%s:database/%s", data.aws_region.current.name, aws_glue_catalog_database.catalog_database.catalog_id, aws_glue_catalog_database.catalog_database.name)}",
                "${format("arn:aws:glue:%s:%s:table/%s/*", data.aws_region.current.name, aws_glue_catalog_database.catalog_database.catalog_id, aws_glue_catalog_database.catalog_database.name)}"
            ]
        }
    ]
}
EOF
}

# NOTE: A maximum of 10 policies can be attached to redshift_parent_role
resource "aws_iam_role_policy" "redshift_spectrum_role_chain" {
  name = "redshift-chain-spectrum-${local.schema_name}-policy"
  role = "${data.aws_iam_role.redshift_parent_role.id}"

  policy = <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sts:AssumeRole"
            ],
            "Resource": [ "${aws_iam_role.redshift_spectrum_role.arn}" ]
        }
    ]
}
EOF
}

resource "redshift_external_schema_data_catalog" "ext_schema" {
  schema_name = "${local.schema_name}"
  database_name = "${aws_glue_catalog_database.catalog_database.name}"

  # Chained IAM roles
  # See: https://docs.aws.amazon.com/redshift/latest/mgmt/authorizing-redshift-service.html#authorizing-redshift-service-chaining-roles
  iam_role = "${data.aws_iam_role.redshift_parent_role.arn},${aws_iam_role.redshift_spectrum_role.arn}"

  # Defaults to the redshift provider user
  # owner = "${redshift_user.a_role.id}"
}

As part of this investigation, I have found that CASCADE does nothing without DELETE EXTERNAL DATABASE also being specified. I'm not interested in supporting the CASCADE, DELETE EXTERNAL DATABASE and CREATE EXTERNAL DATABASE IF NOT EXISTS options, because those can be managed independently in Terraform using the aws_glue_catalog_database resource. Hence I will remove the cascade_on_delete parameter.

See for more info: https://docs.aws.amazon.com/redshift/latest/dg/r_DROP_SCHEMA.html

@frankfarrell
Copy link
Owner

That looks good to me @bitzstein , thanks for doing that
Please open a PR, to fullly understand what's involved

@macleodmac
Copy link

Hi, is there any progress in integrating this? I'm keen to see this feature!

@frankfarrell
Copy link
Owner

Hi there, I haven't looked at it and to be honest I haven't been actively working on this repository recently. But if you want to do a PR, I will happily review and merge it @macleodmac

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants