-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
data.all should not be able to delete all Glue databases and tables in Environment account #531
Comments
Hi @jeffshep, thanks for raising the issue. Yes we are aware of this and were discussing internally how to tackle it best. We have 2 alternatives:
I would like to know your opinion on it so that we can start working on it and include it in V1.6 release (mid July) |
@dlpzx Thanks for the quick response, I would strongly advocate option 2. If I was using data.all to create resources I would expect it to delete them when I delete a dataset, my expectation would be that any data retention requirements would be my concern not data.all's. |
Thank you @jeffshep, we will try to include is as optional for v1.6 and at that state it will keep imported databases only. |
It was implemented in #524, it is now under validation |
### Feature or Bugfix - Bugfix ### Detail - When updating the dataset database handler custom resource to use a different PhysicalId to distinguish imported datasets CloudFormation forces a replacement of the resource which results in the deletion of the glue database. With this PR we add a condition to only delete databases when the custom resource physical id is prefixed by "CREATED-". - Force that Glue crawlers ccreated by data.all use the dataset IAM role as execution role. Previously a boto3 call updated this to use the pivotRole (contrary to what it was defined in CloudFormation). With this PR we make sure that the correct role is used in any existing crawler. I tested in a real AWS deployment that had previously created environments and datasets, with a crawler with the pivotRole assigned: - [X] Crawler execution role gets updated to dataset role - [X] Imported glue databases do not get deleted when the custom resource updates the physicalID ### Relates - #531 By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
### Feature or Bugfix Release PR with the following list of features. Refer to each PR for the details ### Detail - #498 - #482 - #543 - #524 (which also solves #531) - #532 - #535 - #497 - #515 - #529 - #562 - #455 - #572 - #567 - #573 - #579 - #578 - #582 ### Breaking changes - release notes -⚠️ IMPORTANT: upgrade to a version >V1.5.0 before upgrading to V1.6 to avoid deletion of resources in custom resource deletion -⚠️ IMPORTANT: requires an update of environments and then datasets after upgrading. Either using cdk.json parameter `enable_update_dataall_stacks_in_cicd_pipeline`, waiting for overnight update stack task, or manually updating first environments and then datasets - CloudFront distribution replace for #529 - Additional EC2 permissions in CDK Synth CodeBuild stage for #543 --> this can be avoided by upgrading to v1.5.6 before upgrading to v1.6.0 - local development affected by more restrictive pivotRole trust policy ### Relates V1.6.0 release By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. --------- Co-authored-by: Gezim Musliaj <102723839+gmuslia@users.noreply.github.com> Co-authored-by: Noah Paige <69586985+noah-paige@users.noreply.github.com> Co-authored-by: nikpodsh <124577300+nikpodsh@users.noreply.github.com> Co-authored-by: chamcca <40579012+chamcca@users.noreply.github.com> Co-authored-by: Nikita Podshivalov <nikpodsh@amazon.com> Co-authored-by: dbalintx <132444646+dbalintx@users.noreply.github.com> Co-authored-by: mourya-33 <134511711+mourya-33@users.noreply.github.com>
Closing issue - implemented in v1.6 |
Describe the bug
A fundamental issue with data.all
Deleting a dataset, that was imported from existing resources, results in the Glue database being deleted in the AWS environment. The Glue resources are not owned by data.all, they are owned by a data pipeline in the account already, and should exist independently of whether the data is shared with data.all or not.
How to Reproduce
In an AWS account, create a Glue database and table, pointing to data in S3.
In data.all, Import a new dataset with these resources.
Wait for data.all to create the CloudFormation stack to import the dataset successfully.
Delete the dataset in data.all.
Observe the dataall-gluedb-lf-handler-UUID resource delete the Glue database and table
Expected behavior
In an AWS account, create a Glue database and table, pointing to data in S3.
In data.all, Import a new dataset with these resources.
Wait for data.all to create the CloudFormation stack to import the dataset successfully.
Delete the dataset in data.all.
The database and table should remain in the AWS account, data.all permissions should be removed.
I would expect data.all to respect existing resources and delineate between an imported dataset and a created dataset, so that when the dataset is deleted, it doesn't overstep its boundaries and delete the resources that existed before data.all was involved.
As for a technical solution, given that Glue databases created by data.all have a prefix with the environment name
dataall_DBNAME_UUID
, can logic be added to the gluedatabasecustomresource (in backend/dataall/cdkproxy/assets/) to check if the name has this format, and skip deleting the database if it does not?Your project
No response
Screenshots
No response
OS
n/a
Python version
n/a
AWS data.all version
1.5.4
Additional context
No response
The text was updated successfully, but these errors were encountered: