Bugfix 932: Investigate why some shares did not go to failed state, but remained stuck or in-progress #933

Merged 69 commits on Jan 4, 2024
Changes from 67 commits
ad4ab1f
Add Additional Error Messages for KMS Key lookup on imported dataset …
noah-paige Sep 15, 2023
dbbef3c
Get Latest in main to v2m1m0 (#771)
noah-paige Sep 19, 2023
d096160
Handle Environment Import of IAM service roles (#749)
noah-paige Sep 26, 2023
a53434f
Build Compliant Names for Opensearch Resources (#750)
noah-paige Oct 5, 2023
16c7026
Merge branch 'main' into v2m1m0
dlpzx Oct 10, 2023
c61ba15
Update Lambda runtime (#782)
nikpodsh Oct 10, 2023
f84250e
Feat: limit pivot role S3 permissions (#780)
dlpzx Oct 12, 2023
7d9122d
Fix: ensure valid environments for share request and other objects cr…
dlpzx Oct 12, 2023
1801cf1
Adding configurable session timeout to IDP (#786)
manjulaK Oct 13, 2023
599fc1a
Fix: shell true semgrep (#760)
dlpzx Oct 16, 2023
b356bf2
Fix: allow to submit a share when you are both and approver and a req…
zsaltys Oct 16, 2023
793a078
feat: redirect upon creating a share request (#799)
zsaltys Oct 16, 2023
f448613
Fix: condition when there are no public subnets (#794)
lorchda Oct 18, 2023
66b9a08
feat: removing unused variable (#815)
zsaltys Oct 18, 2023
c833c26
feat: Handle Pre-filtering of tables (#811)
anushka-singh Oct 18, 2023
6cc564e
Fix Check other share exists before clean up (#769)
noah-paige Oct 18, 2023
8b7b82e
Email Notification on Share Workflow - Issue - 734 (#818)
TejasRGitHub Oct 20, 2023
48c32e5
feat: adding frontend and backend feature flags (#817)
zsaltys Oct 25, 2023
6d727e9
Feat: Refactor notifications from core to modules (#822)
dlpzx Oct 26, 2023
8ad760b
Merge branch 'main' into v2m1m0
dlpzx Oct 27, 2023
3f100b4
Feat: pivot role limit kms (#830)
dlpzx Oct 27, 2023
fb7b61b
Make hosted_zone_id optional, code update (#812)
lorchda Oct 27, 2023
b51da2c
Clean-up for v2.1 (#843)
dlpzx Oct 30, 2023
6d3c016
Merge branch 'main' into v2m1m0
dlpzx Oct 27, 2023
7912a24
Feat: pivot role limit kms (#830)
dlpzx Oct 27, 2023
55c579b
Make hosted_zone_id optional, code update (#812)
lorchda Oct 27, 2023
92d4324
Clean-up for v2.1 (#843)
dlpzx Oct 30, 2023
5fb7cf8
feat: Enabling S3 bucket share
anushka-singh Oct 31, 2023
cf9afc1
feat: Enabling S3 bucket share
anushka-singh Oct 31, 2023
ddf8623
Merge branch 'v2m1m0' of https://github.com/anushka-singh/aws-dataall…
anushka-singh Oct 31, 2023
b54860d
fix: adding missing pivot role permission to get key policy (#845)
zsaltys Oct 31, 2023
a05e548
Merge branch 'v2m1m0' into anu-s3-copy
dlpzx Oct 31, 2023
1365e92
Revert overwrites 2.
dlpzx Oct 31, 2023
bbcfbd5
Revert overwrites 3.
dlpzx Oct 31, 2023
9e8cdf1
Revert overwrites 4.
dlpzx Oct 31, 2023
5d90797
Revert overwrites 4.
dlpzx Oct 31, 2023
94be491
Revert overwrites 5.
dlpzx Oct 31, 2023
cff577f
Revert overwrites 6.
dlpzx Oct 31, 2023
5ff80fb
Revert overwrites 7.
dlpzx Oct 31, 2023
3383166
Revert overwrites 7.
dlpzx Oct 31, 2023
7ed96af
Revert overwrites 8.
dlpzx Oct 31, 2023
c051896
Revert overwrites 9.
dlpzx Oct 31, 2023
f5d62d7
Revert overwrites 10.
dlpzx Oct 31, 2023
3783a95
Revert overwrites 11.
dlpzx Oct 31, 2023
dacba14
Revert overwrites 12.
dlpzx Oct 31, 2023
3b404cd
Revert overwrites 13.
dlpzx Oct 31, 2023
5d0fe68
Fix down revision for migration script
dlpzx Oct 31, 2023
158925a
feat: Enabling S3 bucket share
anushka-singh Nov 2, 2023
d112a21
bugfix: Enabling S3 bucket share
anushka-singh Nov 3, 2023
06edb53
feat: Enabling S3 bucket share - Addressing comments on PR
anushka-singh Nov 8, 2023
f43003c
feat: Enabling S3 bucket share
anushka-singh Nov 10, 2023
4516f4d
feat: Enabling S3 bucket share - Addressing comments on PR
anushka-singh Nov 15, 2023
0f2faf7
feat: Enabling S3 bucket share - Addressing comments on PR
anushka-singh Nov 16, 2023
9b0ab34
Merge branch 'main' into bucket_share_anushka
anushka-singh Nov 16, 2023
7ab6427
feat: Enabling S3 bucket share
anushka-singh Nov 10, 2023
e251fcc
feat: Enabling S3 bucket share - Addressing comments on PR
anushka-singh Nov 15, 2023
e8bfb4b
feat: Enabling S3 bucket share - Addressing comments on PR
anushka-singh Nov 15, 2023
2ff67bc
Merge branch 'main' of https://github.com/anushka-singh/aws-dataall i…
anushka-singh Nov 16, 2023
3254260
Update share.js
anushka-singh Nov 16, 2023
eb8bf3d
Update index.js
anushka-singh Nov 16, 2023
a06838f
Merge branch 'main' of https://github.com/anushka-singh/aws-dataall
anushka-singh Dec 18, 2023
bed3d51
Bugfix#932: Investigate why some shares did not go to failed state, b…
anushka-singh Dec 19, 2023
35b1730
Bugfix#932: Investigate why some shares did not go to failed state, b…
anushka-singh Dec 27, 2023
e9debef
Bugfix#932: Investigate why some shares did not go to failed state, b…
anushka-singh Jan 3, 2024
2501cac
Bugfix#932: Investigate why some shares did not go to failed state, b…
anushka-singh Jan 3, 2024
530c098
Bugfix#932: Investigate why some shares did not go to failed state, b…
anushka-singh Jan 3, 2024
e55f2cd
Bugfix#932: Investigate why some shares did not go to failed state, b…
anushka-singh Jan 3, 2024
102f6fb
Bugfix#932: Investigate why some shares did not go to failed state, b…
anushka-singh Jan 3, 2024
b233396
Bugfix#932: Investigate why some shares did not go to failed state, b…
anushka-singh Jan 4, 2024
@@ -288,14 +288,18 @@ def update_state(self, session, share_uri, new_state):
         return True
 
     def update_state_single_item(self, session, share_item, new_state):
-        logger.info(f"Updating share item in DB {share_item.shareItemUri} status to {new_state}")
-        ShareObjectRepository.update_share_item_status(
-            session=session,
-            uri=share_item.shareItemUri,
-            status=new_state
-        )
-        self._state = new_state
-        return True
+        try:
+            logger.info(f"Updating share item in DB {share_item.shareItemUri} status to {new_state}")
+            ShareObjectRepository.update_share_item_status(
+                session=session,
+                uri=share_item.shareItemUri,
+                status=new_state
+            )
+            self._state = new_state
+            return True
+        except Exception as e:
+            logger.error("Could not update share item status: ", exc_info=True)
+            raise e
 
     @staticmethod
     def get_share_item_shared_states():
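
The hunk above wraps the status update in a try/except that logs the traceback and re-raises. A minimal standalone sketch of that log-and-re-raise pattern follows. `FakeRepository`, `ItemStateMachine`, and the state names are hypothetical stand-ins for illustration, not the data.all classes.

```python
import logging

logger = logging.getLogger(__name__)


class FakeRepository:
    """Hypothetical stand-in for the repository that persists item status."""

    def __init__(self, fail=False):
        self.fail = fail
        self.statuses = {}

    def update_status(self, uri, status):
        if self.fail:
            raise RuntimeError("database unavailable")
        self.statuses[uri] = status


class ItemStateMachine:
    """Hypothetical state holder mirroring the shape of the patched method."""

    def __init__(self, repository, state="Pending"):
        self.repository = repository
        self._state = state

    def update_state_single_item(self, uri, new_state):
        try:
            logger.info("Updating share item %s status to %s", uri, new_state)
            self.repository.update_status(uri, new_state)
            # Record the new state in memory only after the write succeeded.
            self._state = new_state
            return True
        except Exception:
            # logger.exception logs at ERROR level and appends the traceback.
            logger.exception("Could not update share item status")
            raise  # a bare raise keeps the original traceback intact
```

The sketch uses `logger.exception` and a bare `raise`, which behave like the `exc_info=True` and `raise e` pair in the diff while keeping the traceback unchanged.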
@@ -19,11 +19,9 @@ def trigger_table_sharing_failure_alarm(
         target_environment: Environment,
     ):
         log.info('Triggering share failure alarm...')
-        subject = (
-            f'ALARM: DATAALL Table {table.GlueTableName} Sharing Failure Notification'
-        )
+        subject = f'Data.all Share Failure for Table {table.GlueTableName}'[:100]
         message = f"""
-You are receiving this email because your DATAALL {self.envname} environment in the {self.region} region has entered the ALARM state, because it failed to share the table {table.GlueTableName} with Lake Formation.
+You are receiving this email because your Data.all {self.envname} environment in the {self.region} region has entered the ALARM state, because it failed to share the table {table.GlueTableName} with Lake Formation.
 
 Alarm Details:
     - State Change: OK -> ALARM
@@ -51,9 +49,9 @@ def trigger_revoke_table_sharing_failure_alarm(
         target_environment: Environment,
     ):
         log.info('Triggering share failure alarm...')
-        subject = f'ALARM: DATAALL Table {table.GlueTableName} Revoking LF permissions Failure Notification'
+        subject = f'Data.all Revoke LF Permissions Failure for Table {table.GlueTableName}'[:100]
         message = f"""
-You are receiving this email because your DATAALL {self.envname} environment in the {self.region} region has entered the ALARM state, because it failed to revoke Lake Formation permissions for table {table.GlueTableName} with Lake Formation.
+You are receiving this email because your Data.all {self.envname} environment in the {self.region} region has entered the ALARM state, because it failed to revoke Lake Formation permissions for table {table.GlueTableName} with Lake Formation.
 
 Alarm Details:
     - State Change: OK -> ALARM
@@ -76,11 +74,9 @@ def trigger_revoke_table_sharing_failure_alarm(
 
     def trigger_dataset_sync_failure_alarm(self, dataset: Dataset, error: str):
         log.info(f'Triggering dataset {dataset.name} tables sync failure alarm...')
-        subject = (
-            f'ALARM: DATAALL Dataset {dataset.name} Tables Sync Failure Notification'
-        )
+        subject = f'Data.all Dataset Tables Sync Failure for {dataset.name}'[:100]
         message = f"""
-You are receiving this email because your DATAALL {self.envname} environment in the {self.region} region has entered the ALARM state, because it failed to synchronize Dataset {dataset.name} tables from AWS Glue to the Search Catalog.
+You are receiving this email because your Data.all {self.envname} environment in the {self.region} region has entered the ALARM state, because it failed to synchronize Dataset {dataset.name} tables from AWS Glue to the Search Catalog.
 
 Alarm Details:
     - State Change: OK -> ALARM
@@ -101,11 +97,9 @@ def trigger_folder_sharing_failure_alarm(
         target_environment: Environment,
     ):
         log.info('Triggering share failure alarm...')
-        subject = (
-            f'ALARM: DATAALL Folder {folder.S3Prefix} Sharing Failure Notification'
-        )
+        subject = f'Data.all Folder Share Failure for {folder.S3Prefix}'[:100]
         message = f"""
-You are receiving this email because your DATAALL {self.envname} environment in the {self.region} region has entered the ALARM state, because it failed to share the folder {folder.S3Prefix} with S3 Access Point.
+You are receiving this email because your Data.all {self.envname} environment in the {self.region} region has entered the ALARM state, because it failed to share the folder {folder.S3Prefix} with S3 Access Point.
 Alarm Details:
     - State Change: OK -> ALARM
     - Reason for State Change: S3 Folder sharing failure
@@ -129,11 +123,9 @@ def trigger_revoke_folder_sharing_failure_alarm(
         target_environment: Environment,
     ):
         log.info('Triggering share failure alarm...')
-        subject = (
-            f'ALARM: DATAALL Folder {folder.S3Prefix} Sharing Revoke Failure Notification'
-        )
+        subject = f'Data.all Folder Share Revoke Failure for {folder.S3Prefix}'[:100]
         message = f"""
-You are receiving this email because your DATAALL {self.envname} environment in the {self.region} region has entered the ALARM state, because it failed to share the folder {folder.S3Prefix} with S3 Access Point.
+You are receiving this email because your Data.all {self.envname} environment in the {self.region} region has entered the ALARM state, because it failed to share the folder {folder.S3Prefix} with S3 Access Point.
 Alarm Details:
     - State Change: OK -> ALARM
     - Reason for State Change: S3 Folder sharing Revoke failure
@@ -173,11 +165,9 @@ def handle_bucket_sharing_failure(self, bucket: DatasetBucket,
                                       target_environment: Environment,
                                       alarm_type: str):
         log.info(f'Triggering {alarm_type} failure alarm...')
-        subject = (
-            f'ALARM: DATAALL S3 Bucket {bucket.S3BucketName} {alarm_type} Failure Notification'
-        )
+        subject = f'Data.all S3 Bucket Failure for {bucket.S3BucketName} {alarm_type}'[:100]
         message = f"""
-You are receiving this email because your DATAALL {self.envname} environment in the {self.region} region has entered the ALARM state, because it failed to {alarm_type} the S3 Bucket {bucket.S3BucketName}.
+You are receiving this email because your Data.all {self.envname} environment in the {self.region} region has entered the ALARM state, because it failed to {alarm_type} the S3 Bucket {bucket.S3BucketName}.
 Alarm Details:
     - State Change: OK -> ALARM
     - Reason for State Change: S3 Bucket {alarm_type} failure
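
Every new subject line in these hunks is cut with a `[:100]` slice. The diff does not say why; a likely reason, stated here only as an assumption, is a length limit on notification subjects. The sketch below shows the same truncation as a small helper. `build_subject` and `MAX_SUBJECT_LENGTH` are hypothetical names, not part of the PR.

```python
MAX_SUBJECT_LENGTH = 100  # assumed limit, taken from the [:100] slices in the diff


def build_subject(prefix: str, resource_name: str,
                  max_length: int = MAX_SUBJECT_LENGTH) -> str:
    """Join prefix and resource name, then cut the result to max_length characters."""
    subject = f"{prefix} {resource_name}"
    # Slicing never raises, even when the string is shorter than max_length.
    return subject[:max_length]


short = build_subject("Data.all Share Failure for Table", "orders")
long = build_subject("Data.all Share Failure for Table", "t" * 200)
```

Putting the fixed text first and the resource name last, as the new subjects do, means that truncation removes characters from the name rather than from the part that identifies the kind of failure.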
@@ -480,17 +480,21 @@ def handle_share_failure(
         -------
         True if alarm published successfully
         """
-        logging.error(
-            f'Failed to share table {table.GlueTableName} '
-            f'from source account {self.source_environment.AwsAccountId}//{self.source_environment.region} '
-            f'with target account {self.target_environment.AwsAccountId}/{self.target_environment.region}'
-            f'due to: {error}'
-        )
+        try:
+            logging.error(
+                f'Failed to share table {table.GlueTableName} '
+                f'from source account {self.source_environment.AwsAccountId}//{self.source_environment.region} '
+                f'with target account {self.target_environment.AwsAccountId}/{self.target_environment.region}'
+                f'due to: {error}'
+            )
 
-        DatasetAlarmService().trigger_table_sharing_failure_alarm(
-            table, self.share, self.target_environment
-        )
-        return True
+            DatasetAlarmService().trigger_table_sharing_failure_alarm(
+                table, self.share, self.target_environment
+            )
+            return True
+        except Exception as e:
+            logger.error("Could not process dataset alarms: ", exc_info=True)
+            return False
 
     def handle_revoke_failure(
         self,
@@ -504,16 +508,20 @@ def handle_revoke_failure(
         -------
         True if alarm published successfully
         """
-        logger.error(
-            f'Failed to revoke S3 permissions to table {table.GlueTableName} '
-            f'from source account {self.source_environment.AwsAccountId}//{self.source_environment.region} '
-            f'with target account {self.target_environment.AwsAccountId}/{self.target_environment.region} '
-            f'due to: {error}'
-        )
-        DatasetAlarmService().trigger_revoke_table_sharing_failure_alarm(
-            table, self.share, self.target_environment
-        )
-        return True
+        try:
+            logger.error(
+                f'Failed to revoke S3 permissions to table {table.GlueTableName} '
+                f'from source account {self.source_environment.AwsAccountId}//{self.source_environment.region} '
+                f'with target account {self.target_environment.AwsAccountId}/{self.target_environment.region} '
+                f'due to: {error}'
+            )
+            DatasetAlarmService().trigger_revoke_table_sharing_failure_alarm(
+                table, self.share, self.target_environment
+            )
+            return True
+        except Exception as e:
+            logger.error("Could not process dataset alarms: ", exc_info=True)
+            return False
 
     def glue_client(self):
         return GlueClient(
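
The two handlers above now catch any exception raised while logging or publishing the alarm and return `False` instead of propagating it. A minimal sketch of that contract, assuming a hypothetical `publish_alarm` callable in place of the `DatasetAlarmService` call:

```python
import logging

logger = logging.getLogger(__name__)


def handle_failure(publish_alarm, resource_name: str, error: Exception) -> bool:
    """Log the failure, try to publish an alarm, and report whether publishing worked.

    publish_alarm is a hypothetical callable standing in for the alarm service call.
    """
    try:
        logger.error("Failed to share %s due to: %s", resource_name, error)
        publish_alarm(resource_name)
        return True
    except Exception:
        # A broken alarm channel must not replace the original sharing failure.
        logger.exception("Could not process dataset alarms")
        return False
```

Because the handler never raises, a caller inside an `except` block can invoke it without the risk that a notification problem interrupts the rest of the failure handling.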
@@ -510,22 +510,27 @@ def handle_share_failure(self, error: Exception) -> None:
             self.target_folder, self.share, self.target_environment
         )
 
-    def handle_revoke_failure(self, error: Exception) -> None:
+    def handle_revoke_failure(self, error: Exception) -> bool:
         """
         Handles share failure by raising an alarm to alarmsTopic
         Returns
         -------
         True if alarm published successfully
         """
-        logger.error(
-            f'Failed to revoke S3 permissions to folder {self.s3_prefix} '
-            f'from source account {self.source_environment.AwsAccountId}//{self.source_environment.region} '
-            f'with target account {self.target_environment.AwsAccountId}/{self.target_environment.region} '
-            f'due to: {error}'
-        )
-        DatasetAlarmService().trigger_revoke_folder_sharing_failure_alarm(
-            self.target_folder, self.share, self.target_environment
-        )
+        try:
+            logger.error(
+                f'Failed to revoke S3 permissions to folder {self.s3_prefix} '
+                f'from source account {self.source_environment.AwsAccountId}//{self.source_environment.region} '
+                f'with target account {self.target_environment.AwsAccountId}/{self.target_environment.region} '
+                f'due to: {error}'
+            )
+            DatasetAlarmService().trigger_revoke_folder_sharing_failure_alarm(
+                self.target_folder, self.share, self.target_environment
+            )
+            return True
+        except Exception as e:
+            logger.error("Could not process dataset alarms: ", exc_info=True)
+            return False
 
     @staticmethod
     def generate_default_kms_decrypt_policy_statement(target_requester_arn):
@@ -438,16 +438,20 @@ def handle_revoke_failure(self, error: Exception) -> bool:
         -------
         True if alarm published successfully
         """
-        logger.error(
-            f'Failed to revoke S3 permissions to bucket {self.bucket_name} '
-            f'from source account {self.source_environment.AwsAccountId}//{self.source_environment.region} '
-            f'with target account {self.target_environment.AwsAccountId}/{self.target_environment.region} '
-            f'due to: {error}'
-        )
-        DatasetAlarmService().trigger_revoke_folder_sharing_failure_alarm(
-            self.target_bucket, self.share, self.target_environment
-        )
-        return True
+        try:
+            logger.error(
+                f'Failed to revoke S3 permissions to bucket {self.bucket_name} '
+                f'from source account {self.source_environment.AwsAccountId}//{self.source_environment.region} '
+                f'with target account {self.target_environment.AwsAccountId}/{self.target_environment.region} '
+                f'due to: {error}'
+            )
+            DatasetAlarmService().trigger_revoke_s3_bucket_sharing_failure_alarm(
+                self.target_bucket, self.share, self.target_environment
+            )
+            return True
+        except Exception as e:
+            logger.error("Could not process dataset alarms: ", exc_info=True)
+            return False
 
     @staticmethod
     def generate_default_bucket_read_policy_statement(s3_bucket_name, target_requester_arn):
@@ -161,9 +161,9 @@ def process_revoked_shares(
                 revoked_item_SM.update_state_single_item(session, removing_item, new_state)
 
             except Exception as e:
-                removing_bucket.handle_revoke_failure(e)
                 new_state = revoked_item_SM.run_transition(ShareItemActions.Failure.value)
                 revoked_item_SM.update_state_single_item(session, removing_item, new_state)
                 success = False
+                removing_bucket.handle_revoke_failure(e)
 
         return success
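
This last hunk moves `handle_revoke_failure` to the end of the `except` block, so the item is moved to its failed state before the alarm is attempted. The sketch below illustrates why the order matters: if notification ran first and raised, the state update would never execute and the item would stay in progress. `revoke_item`, the dict-based item, and the status strings are hypothetical and only illustrate the ordering.

```python
import logging

logger = logging.getLogger(__name__)


def revoke_item(item: dict, revoke, notify) -> bool:
    """Revoke one item; on failure, persist the failed state before notifying.

    item, revoke, and notify are hypothetical stand-ins for the share item,
    the permission removal step, and the alarm call.
    """
    try:
        revoke(item)
        item["status"] = "Revoke_Succeeded"
        return True
    except Exception as error:
        # Record the failure first so the item cannot stay in an in-progress state.
        item["status"] = "Revoke_Failed"
        try:
            notify(item, error)
        except Exception:
            logger.exception("Could not send failure notification")
        return False
```

Even when both the revoke step and the notification fail, the item ends in the failed state and the function still returns `False` to its caller.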