
Added missing s3_settings attributes #11576

Closed · wants to merge 5 commits

Conversation

lyle-nel

Extend DMS S3 settings to support updated API changes in aws-sdk-go, so that we can avoid using extra_connection_attributes.

The full list of supported attributes for s3_settings was found in models/apis/dms/2016-01-01/api-2.json.

Community Note

  • Please vote on this pull request by adding a 👍 reaction to the original pull request comment to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for pull request followers and do not help prioritize the request

Relates to or closes #8000, #8009

Release note for CHANGELOG:

ENHANCEMENTS:

  • resource/aws_dms_endpoint: Added missing s3_settings attributes ServiceAccessRoleArn, ExternalTableDefinition, CsvRowDelimiter, CsvDelimiter, BucketFolder, BucketName, CompressionType, TimestampColumnName, DataFormat, ParquetVersion, EncryptionMode, ServerSideEncryptionKmsKeyId

Output from acceptance testing:

$ make testacc TEST=./aws TESTARGS='-run=TestAccAwsDmsEndpoint_S3'
==> Checking that code complies with gofmt requirements...
TF_ACC=1 go test ./aws -v -count 1 -parallel 20 -run=TestAccAwsDmsEndpoint_S3 -timeout 120m
=== RUN   TestAccAwsDmsEndpoint_S3_Csv
=== PAUSE TestAccAwsDmsEndpoint_S3_Csv
=== RUN   TestAccAwsDmsEndpoint_S3_Parquet
=== PAUSE TestAccAwsDmsEndpoint_S3_Parquet
=== CONT  TestAccAwsDmsEndpoint_S3_Csv
=== CONT  TestAccAwsDmsEndpoint_S3_Parquet
--- PASS: TestAccAwsDmsEndpoint_S3_Csv (291.42s)
--- PASS: TestAccAwsDmsEndpoint_S3_Parquet (365.29s)
PASS
ok  	github.com/terraform-providers/terraform-provider-aws/aws	365.311s
...

@lyle-nel lyle-nel requested a review from a team January 13, 2020 06:27
@ghost ghost added size/L Managed by automation to categorize the size of a PR. needs-triage Waiting for first response or review from a maintainer. service/databasemigrationservice tests PRs: expanded test coverage. Issues: expanded coverage, enhancements to test infrastructure. labels Jan 13, 2020
@lyle-nel
Author

Hi folks, is there anything left to do, or can we merge this?

Note that even though the size estimation label says "size/L", this is actually a trivial change: the line count is high only because of the number of additional attributes that were added.

@gerovermaas

gerovermaas commented Jan 14, 2020

Hi @lyle-nel,

I was happy to see this PR since we were also facing this issue.

I tested it, but ran into a problem. I defined the endpoint as:

resource "aws_dms_endpoint" "s3_target" {
  endpoint_id   = "s3-target-endpoint"
  endpoint_type = "target"
  engine_name   = "s3"

  extra_connection_attributes = "includeOpForFullLoad=true"

  s3_settings {
    service_access_role_arn = aws_iam_role.s3_target.arn
    bucket_folder           = aws_db_instance.mysql.name
    bucket_name             = aws_s3_bucket.target.id
    # data_format             = "parquet"
    # timestamp_column_name   = "updateTS"

  }
}

This provisions fine with the 2.44.0 version of the AWS provider, but when I provision it with your version of the AWS provider, testing of the endpoint by AWS fails with the message: "Error Details: [message=Exception while converting from replicate model object Test endpoint, errType=, status=0, errMessage=, errDetails=]"

Enabling the commented lines (data_format, timestamp_column_name) does not change anything; it fails with the same message. (I disabled them because I wanted to verify it still worked without the extra attributes added by this PR.)

When running aws dms describe-endpoints and comparing the two endpoint definitions, we see that the PR version of the aws provider adds the EncryptionMode and TimestampColumnName attributes, both with an empty value. Does that give a clue?

Using PR version of aws provider:

     {
            "EndpointIdentifier": "s3-target-endpoint",
            "EndpointType": "TARGET",
            "EngineName": "s3",
            "EngineDisplayName": "Amazon S3",
            "ExtraConnectionAttributes": "bucketFolder=mydb;bucketName=dms-poc-target;compressionType=NONE;csvDelimiter=,;csvRowDelimiter=\\n;",
            "Status": "active",
            "EndpointArn": "arn:aws:dms:eu-west-1:xxxxxxxxxxxxx:endpoint:U5YWWN2ANAIPSVHFYT3EXIA6DU",
            "SslMode": "none",
            "ServiceAccessRoleArn": "arn:aws:iam::xxxxxxxxxxxxx:role/dms_test_s3_target_role",
            "ExternalTableDefinition": "",
            "S3Settings": {
                "ServiceAccessRoleArn": "arn:aws:iam::xxxxxxxxxxxxx:role/dms_test_s3_target_role",
                "ExternalTableDefinition": "",
                "CsvRowDelimiter": "\\n",
                "CsvDelimiter": ",",
                "BucketFolder": "mydb",
                "BucketName": "dms-poc-target",
                "CompressionType": "NONE",
                "EncryptionMode": "",
                "EnableStatistics": true,
                "TimestampColumnName": ""
            }
        }

Using 2.44.0 version:

        {
            "EndpointIdentifier": "s3-target-endpoint",
            "EndpointType": "TARGET",
            "EngineName": "s3",
            "EngineDisplayName": "Amazon S3",
            "ExtraConnectionAttributes": "bucketFolder=mydb;bucketName=dms-poc-target;compressionType=NONE;csvDelimiter=,;csvRowDelimiter=\\n;",
            "Status": "active",
            "EndpointArn": "arn:aws:dms:eu-west-1:xxxxxxxxxxxxx:endpoint:B4SPRUAZ6YTT2LRP76NMKZLVKM",
            "SslMode": "none",
            "ServiceAccessRoleArn": "arn:aws:iam::xxxxxxxxxxxxx:role/dms_test_s3_target_role",
            "ExternalTableDefinition": "",
            "S3Settings": {
                "ServiceAccessRoleArn": "arn:aws:iam::xxxxxxxxxxxxx:role/dms_test_s3_target_role",
                "ExternalTableDefinition": "",
                "CsvRowDelimiter": "\\n",
                "CsvDelimiter": ",",
                "BucketFolder": "mydb",
                "BucketName": "dms-poc-target",
                "CompressionType": "NONE",
                "EnableStatistics": true
            }
        }

Using Terraform 0.12.19

Am I missing something?

Regards,
Gero

@lyle-nel
Author

Hi @gerovermaas, I will look into this tomorrow morning (UTC+2) and get back to you as soon as possible.

@lyle-nel
Author

@gerovermaas Thanks, I am looking into why these additional attributes are showing up. Even with the 2.44.0 version in your example, the EnableStatistics attribute shows up, though that should only happen when writing to parquet.

I have tracked down one potential avenue where the defaults are set to empty strings, but taking that away does not have an effect on the outcome.

I will keep you updated if I learn anything useful. For now, I need to exclude the possibility that it is anything Terraform-related; then I can move on to checking whether the issue lies with aws-sdk-go.

@gerovermaas

@lyle-nel Any progress or update on this PR?

@lyle-nel
Author

@gerovermaas I am actively working on this. I don't have much of an update, besides that I have written some integration tests in the aws provider that expose the issue. I am currently writing some tests for the schema helpers in Terraform core. It is very unlikely that the issue is there, but I need to exclude that possibility.

For the sake of transparency, I will summarise the symptom:

Consider the following resource schema:

Schema: map[string]*schema.Schema{
	"s3_settings": {
		Type:     schema.TypeList,
		Optional: true,
		MaxItems: 1,
		DiffSuppressFunc: func(k, old, new string, d *schema.ResourceData) bool {
			if old == "1" && new == "0" {
				return true
			}
			return false
		},
		Elem: &schema.Resource{
			Schema: map[string]*schema.Schema{

				"compression_type": {
					Type:     schema.TypeString,
					Optional: true,
					Default:  "NONE",
				},
				"timestamp_column_name": {
					Type:     schema.TypeString,
					Optional: true,
				},
			},
		},
	},
},

It is clear from this that compression_type is optional with a default value of "NONE", so we should expect to find that default value in the endpoint configuration. It is also clear that timestamp_column_name has no default defined, so it should be omitted entirely from the endpoint configuration. However, what we find is that timestamp_column_name is not omitted as expected.
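The expected behaviour can be sketched outside the SDK. This is an illustrative, self-contained version of the pattern the resource code needs (the function and map names here are hypothetical, not the provider's actual helpers): copy an attribute into the API input only when it actually has a value, so optional attributes without a default are omitted rather than sent to DMS as empty strings.

```go
package main

import "fmt"

// buildS3Settings copies an attribute into the API input only when it
// has a value. Optional attributes without a schema default (like
// timestamp_column_name) are omitted entirely instead of being sent as
// empty strings.
func buildS3Settings(cfg map[string]string) map[string]string {
	out := map[string]string{}
	// compression_type has a schema default ("NONE"), so it is always set.
	if v := cfg["compression_type"]; v != "" {
		out["CompressionType"] = v
	}
	// timestamp_column_name has no default: include it only when the
	// practitioner explicitly configured it.
	if v := cfg["timestamp_column_name"]; v != "" {
		out["TimestampColumnName"] = v
	}
	return out
}

func main() {
	fmt.Println(buildS3Settings(map[string]string{"compression_type": "NONE"}))
	// prints map[CompressionType:NONE]
}
```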

@lyle-nel
Author

lyle-nel commented Jan 21, 2020

TL;DR: I need some help fixing one of the test statements to get this over the finish line.

@gerovermaas I managed to track down the issue. My tests had led me astray; the solution itself is pretty straightforward. The main issue I am facing now is getting the acceptance test to pass. I know the changes work, because when I view the response object, only the desired fields are set.

I am using resource_aws_elastic_beanstalk_application.go and resource_aws_elastic_beanstalk_application_test.go as a reference implementation that uses the TestCheckNoResourceAttr testing function.

Test output:

==> Checking that code complies with gofmt requirements...
TF_ACC=1 go test ./aws -v -count 1 -parallel 20 -run=TestAccAwsDmsEndpoint_S3 -timeout 120m
=== RUN   TestAccAwsDmsEndpoint_S3_Csv
=== PAUSE TestAccAwsDmsEndpoint_S3_Csv
=== RUN   TestAccAwsDmsEndpoint_S3_Parquet
=== PAUSE TestAccAwsDmsEndpoint_S3_Parquet
=== CONT  TestAccAwsDmsEndpoint_S3_Csv
=== CONT  TestAccAwsDmsEndpoint_S3_Parquet
--- FAIL: TestAccAwsDmsEndpoint_S3_Csv (52.02s)
    testing.go:640: Step 0 error: Check failed: Check 10/10 error: aws_dms_endpoint.dms_endpoint: Attribute 's3_settings.0.timestamp_column_name' found when not expected
--- FAIL: TestAccAwsDmsEndpoint_S3_Parquet (69.46s)
    testing.go:640: Step 0 error: Check failed: Check 7/11 error: aws_dms_endpoint.dms_endpoint: Attribute 's3_settings.0.timestamp_column_name' found when not expected
FAIL
FAIL    github.com/terraform-providers/terraform-provider-aws/aws       69.496s
FAIL
GNUmakefile:24: recipe for target 'testacc' failed

The response object from the create endpoint routine is as follows:

[DEBUG] DMS response from create endpoint: {
  Endpoints: [{
      EndpointArn: "arn:aws:dms:us-west-2:766914826347:endpoint:GKM7KR26IWYY7SWS6PBZORWYMA",
      EndpointIdentifier: "tf-test-dms-endpoint-i3kxqf9j-s3",
      EndpointType: "TARGET",
      EngineDisplayName: "Amazon S3",
      EngineName: "s3",
      ExtraConnectionAttributes: "bucketName=bucket_name;compressionType=NONE;csvDelimiter=,;csvRowDelimiter=\\n;",
      S3Settings: {
        BucketFolder: "",
        BucketName: "bucket_name",
        CompressionType: "NONE",
        CsvDelimiter: ",",
        CsvRowDelimiter: "\\n",
        EnableStatistics: true,
        ServiceAccessRoleArn: "arn:aws:iam::766914826347:role/tf-test-iam-s3-role-i3kxqf9j-s3"
      },
      ServiceAccessRoleArn: "arn:aws:iam::766914826347:role/tf-test-iam-s3-role-i3kxqf9j-s3",
      SslMode: "none",
      Status: "active"
    }]
}

@gerovermaas

Great to read @lyle-nel !

@ghost ghost added size/XL Managed by automation to categorize the size of a PR. and removed size/L Managed by automation to categorize the size of a PR. labels Jan 22, 2020
@lyle-nel lyle-nel force-pushed the master branch 2 times, most recently from 12c80af to 8a84842 Compare January 22, 2020 11:39
@lyle-nel
Author

lyle-nel commented Jan 22, 2020

I am using resource_aws_elastic_beanstalk_application_test.go as a reference, but I can't get this test to pass even though the response object clearly shows these attributes are not set. I must be missing something obvious, so I need someone more experienced with this to point me in the right direction:

https://github.com/terraform-providers/terraform-provider-aws/blob/8a84842b721b054e74af6648f62d0b84c3bd2d59/aws/resource_aws_dms_endpoint_test.go#L121-L125
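For what it's worth, the failure messages above are consistent with TestCheckNoResourceAttr erroring as soon as the key exists in the flat state map at all, even when the schema layer has written it as an empty string. A minimal self-contained sketch of those semantics (the function is a stand-in, not the SDK's implementation, and the state map contents are illustrative):

```go
package main

import "fmt"

// checkNoResourceAttr mimics the observed semantics of the SDK's
// resource.TestCheckNoResourceAttr: it fails whenever the key is
// present in the flat state map, regardless of its value.
func checkNoResourceAttr(state map[string]string, key string) error {
	if _, ok := state[key]; ok {
		return fmt.Errorf("Attribute '%s' found when not expected", key)
	}
	return nil
}

func main() {
	// An Optional string without a default can still end up in state as
	// an empty string, which is enough to trip the check.
	state := map[string]string{
		"s3_settings.0.compression_type":      "NONE",
		"s3_settings.0.timestamp_column_name": "",
	}
	fmt.Println(checkNoResourceAttr(state, "s3_settings.0.timestamp_column_name"))
}
```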

@gerovermaas

I have tested the version and can confirm that it now provisions correctly, and I'm able to generate parquet files. I used the following resource definition:

resource "aws_dms_endpoint" "s3_target" {
  endpoint_id   = "s3-target-endpoint"
  endpoint_type = "target"
  engine_name   = "s3"

  extra_connection_attributes = "includeOpForFullLoad=true"

  s3_settings {
    service_access_role_arn = aws_iam_role.s3_target.arn
    bucket_folder           = "somefolder"
    bucket_name             = aws_s3_bucket.target.id
    data_format             = "parquet"
    timestamp_column_name   = "updateTS"
  }
}

And aws dms describe-endpoints shows:

        {
            "EndpointIdentifier": "s3-target-endpoint",
            "EndpointType": "TARGET",
            "EngineName": "s3",
            "EngineDisplayName": "Amazon S3",
            "ExtraConnectionAttributes": "bucketFolder=somefolder;bucketName=dms-poc-target2;compressionType=NONE;csvDelimiter=,;csvRowDelimiter=\\n;",
            "Status": "active",
            "EndpointArn": "arn:aws:dms:eu-west-2:627871771344:endpoint:ZCD2MXHTZVRTJUH2YSNWVGQ4JA",
            "SslMode": "none",
            "ServiceAccessRoleArn": "arn:aws:iam::627871771344:role/dms_test_s3_target_role",
            "S3Settings": {
                "ServiceAccessRoleArn": "arn:aws:iam::627871771344:role/dms_test_s3_target_role",
                "CsvRowDelimiter": "\\n",
                "CsvDelimiter": ",",
                "BucketFolder": "somefolder",
                "BucketName": "dms-poc-target2",
                "CompressionType": "NONE",
                "DataFormat": "parquet",
                "EnableStatistics": true
            }
        },

Would be great if this PR can be merged, it would help us a lot!

@hagridaaron

This would also help me out quite a bit. Thank you so much @lyle-nel for all the hard work!

@gerovermaas

We have been doing some more testing, and although the dataFormat=parquet argument is now properly processed, we noticed that timestamp_column_name in s3_settings is still not applied. We used this resource definition:

resource "aws_dms_endpoint" "s3_target" {
  endpoint_id   = "s3-target-endpoint"
  endpoint_type = "target"
  engine_name   = "s3"

  s3_settings {
    service_access_role_arn = aws_iam_role.s3_target.arn
    bucket_folder           = "dms"
    bucket_name             = aws_s3_bucket.target.id
    data_format             = "parquet"
    timestamp_column_name   = "updateTS"
    compression_type = "GZIP"
  }
}

And aws dms describe-endpoints shows:

       {
            "EndpointIdentifier": "s3-target-endpoint",
            "EndpointType": "TARGET",
            "EngineName": "s3",
            "EngineDisplayName": "Amazon S3",
            "ExtraConnectionAttributes": "bucketFolder=dms;bucketName=dms-poc-target2;compressionType=GZIP;csvDelimiter=,;csvRowDelimiter=\\n;",
            "Status": "active",
            "EndpointArn": "arn:aws:dms:eu-west-2:xxxxxxxxx:endpoint:LCRRXDBE5PAJNUBGNT35ITUVNU",
            "SslMode": "none",
            "ServiceAccessRoleArn": "arn:aws:iam::xxxxxxxxx:role/dms_test_s3_target_role",
            "S3Settings": {
                "ServiceAccessRoleArn": "arn:aws:iam::xxxxxxxx:role/dms_test_s3_target_role",
                "CsvRowDelimiter": "\\n",
                "CsvDelimiter": ",",
                "BucketFolder": "dms",
                "BucketName": "dms-poc-target2",
                "CompressionType": "GZIP",
                "DataFormat": "parquet",
                "EnableStatistics": true
            }
        },

When we do an additional terraform apply with extra_connection_attributes = "TimestampColumnName=updateTSs" included in the resource, then the describe shows:

        {
            "EndpointIdentifier": "s3-target-endpoint",
            "EndpointType": "TARGET",
            "EngineName": "s3",
            "EngineDisplayName": "Amazon S3",
            "ExtraConnectionAttributes": "compressionType=none;csvDelimiter=,;csvRowDelimiter=\\n;maxFileSize=1024000000;TimestampColumnName=updateTSs;",
            "Status": "active",
            "EndpointArn": "arn:aws:dms:eu-west-2:627871771344:endpoint:LCRRXDBE5PAJNUBGNT35ITUVNU",
            "SslMode": "none",
            "ServiceAccessRoleArn": "arn:aws:iam::627871771344:role/dms_test_s3_target_role",
            "S3Settings": {
                "ServiceAccessRoleArn": "arn:aws:iam::627871771344:role/dms_test_s3_target_role",
                "CsvRowDelimiter": "\\n",
                "CsvDelimiter": ",",
                "BucketFolder": "",
                "BucketName": "",
                "CompressionType": "none",
                "DataFormat": "parquet",
                "EnableStatistics": true
            }
        },

Is this something you can look into @lyle-nel ?

…ss it has a default or is explicitly populated.
@lyle-nel
Author

@gerovermaas Yes, I know what the problem is.

My sincere apologies for this, folks. Once I figure out what I am doing wrong with this one test, it will expose issues like this one. I am hoping to work out how this part should behave soon, but some guidance from someone with experience developing providers would go a long way. In the meantime, I have pushed the fix.

@gerovermaas , thanks for your valuable feedback and continued patience.

@gerovermaas

Thanks @lyle-nel! I tested the new version, and the timestamp_column_name is now properly handled. Great!

I used this definition, and I'm happy that we no longer need extra_connection_attributes:

resource "aws_dms_endpoint" "s3_target" {
  endpoint_id   = "s3-target-endpoint"
  endpoint_type = "target"
  engine_name   = "s3"

  s3_settings {
    service_access_role_arn = aws_iam_role.s3_target.arn
    bucket_folder           = "dms"
    bucket_name             = aws_s3_bucket.target.id
    data_format             = "parquet"
    timestamp_column_name   = "updateTS"
    compression_type        = "GZIP"
  }
}

@lyle-nel
Author

@bflad I see that you are one of the more recent contributors to this component; I was hoping you could take a quick look at this. Am I missing something obvious that causes the Terraform state to contain the optional attributes even though they are not being set?

@lyle-nel
Author

I am closing this pull request until I can figure out how to fix the test.

@lyle-nel lyle-nel closed this Mar 18, 2020
@ghost

ghost commented Apr 18, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked and limited conversation to collaborators Apr 18, 2020
@breathingdust breathingdust removed the needs-triage Waiting for first response or review from a maintainer. label Sep 17, 2021
Successfully merging this pull request may close these issues.

DMS endpoint s3 target to support extra_connection_attributes
4 participants