Create and import data set stack failure #423

Closed
mvidhu opened this issue Apr 20, 2023 · 2 comments
Labels
status: can't reproduce, status: needs more info, type: bug

Comments


mvidhu commented Apr 20, 2023

Describe the bug

Dataset creation and import stacks have been failing in CloudFormation and rolling back every time for the past few days. The issue started recently, although the data.all version in our organization has not changed in the past 5 months, so we have not been able to find the root cause.
When creating a dataset, the CloudFormation stack fails at the creation of the crawler with this error message:
S3 bucket dataall-<> does not exist. (Service: AWSGlue; Status Code: 400; Error Code: InvalidInputException; Request ID: dedb0351-c13a-4f6e-b84d-dea3cf0db051; Proxy: null)
We are using CDK version 14.

Error in CloudFormation while importing a dataset:
Creation of dataallDatasetDatabase fails:

CREATE_FAILED Received response status [FAILED] from custom resource. Message returned: Error: Could not create Glue Database dataall_<> in aws://<>/<>, received An error occurred (AccessDeniedException) when calling the CreateDatabase operation: Insufficient Lake Formation permission(s) on s3://<>/ Logs: /aws/lambda/dataall-gluedb-handler-m6up1tqu at invokeUserFunction (/var/task/framework.js:2:6) at processTicksAndRejections (internal/process/task_queues.js:97:5) at async onEvent (/var/task/framework.js:1:302) at async Runtime.handler (/var/task/cfn-response.js:1:1474) (RequestId: 58ef82a8-80a8-41f3-b633-28c71896598c)

How to Reproduce

Bootstrap an AWS account as an environment in data.all.
Create a dataset in the environment.
Stack creation fails and ends in the ROLLBACK_COMPLETE state.
Import an existing bucket into the environment.
Stack creation ends in the ROLLBACK_FAILED state.

Expected behavior

Both creating and importing a dataset should succeed.

Your project

No response

Screenshots

No response

OS

Mac

Python version

3.11

AWS data.all version

0.5.0

Additional context

No response

dlpzx added the type: bug and status: needs more info labels Apr 20, 2023
Contributor

dlpzx commented Apr 20, 2023

Hi @mvidhu :) Thanks for opening the issue. If I understand correctly you actually have 2 issues:

  • Creating a dataset: failure because the S3 bucket is not found. We also encountered this issue a couple of months ago because of a change in Glue crawler creation. We fixed it in "Add dependency in dataset stack" #385 by adding a dependency clause in the stack.
  • Importing a dataset: insufficient Lake Formation (LF) permissions during Glue database creation. I tried to reproduce your issue in the latest version and the error does not appear. I suggest upgrading to a newer version to fix both this issue and the other one. If you want to debug the issue, here are some details on the failure. As part of creating or importing a dataset, we create a Glue database and grant LF permissions on it using a CloudFormation custom resource (a Lambda that runs when the stack is created). I suggest taking a look at Lake Formation:
      1) In the Lake Formation console, check the data lake administrators. Make sure that the dataallPivotRole is one of these admins.
      2) In Lake Formation, check the data lake locations: is the imported S3 bucket already registered, and which role is Lake Formation using for it?
      3) Has the pivotRole been deleted and re-created at any point? If so, I would remove it from the Lake Formation data lake admins and add it again. Lake Formation points at the unique identifier of an IAM role: when a role is deleted and re-created, it appears as the same role in the console, but under the hood Lake Formation treats them as two different roles, which causes issues.
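The three Lake Formation checks above can also be run from a script instead of the console. The following is only a rough sketch using boto3; it is not part of data.all. The helper names are made up for illustration, the bucket name is a placeholder, and it assumes the role is literally named dataallPivotRole and that credentials for the environment account are configured. For brevity it reads only the first page of `list_resources`.

```python
from typing import Any, Dict, Optional


def pivot_role_is_admin(settings: Dict[str, Any], role_arn: str) -> bool:
    """Check 1: is the role listed among the data lake administrators?

    `settings` is the response of lakeformation.get_data_lake_settings().
    """
    admins = settings.get("DataLakeSettings", {}).get("DataLakeAdmins", [])
    return any(a.get("DataLakePrincipalIdentifier") == role_arn for a in admins)


def registered_location(resources: Dict[str, Any], bucket: str) -> Optional[Dict[str, Any]]:
    """Check 2: return the data lake location entry covering the bucket, if any.

    `resources` is the response of lakeformation.list_resources(); each entry
    carries the registered ResourceArn and the RoleArn used for it.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    for info in resources.get("ResourceInfoList", []):
        arn = info.get("ResourceArn", "")
        if arn == bucket_arn or arn.startswith(bucket_arn + "/"):
            return info
    return None


def run_checks(bucket: str, role_name: str = "dataallPivotRole") -> None:
    """Print the result of the three checks for one imported bucket."""
    import boto3  # imported lazily so the helpers above work without AWS access

    role = boto3.client("iam").get_role(RoleName=role_name)["Role"]
    lf = boto3.client("lakeformation")

    print("pivot role is LF admin:", pivot_role_is_admin(lf.get_data_lake_settings(), role["Arn"]))
    print("bucket registration:", registered_location(lf.list_resources(), bucket))
    # Check 3: a recent CreateDate suggests the role was deleted and re-created,
    # in which case removing and re-adding it as an LF admin is worth trying.
    print("role created at:", role["CreateDate"], "unique ID:", role["RoleId"])


if __name__ == "__main__":
    run_checks("example-imported-bucket")  # placeholder bucket name
```

If the first check prints False, or the creation date of the role is more recent than expected, steps 1 and 3 above are the likely place to start.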

I hope this helps, please comment here if you still face issues :)
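On the first issue (bucket not found when the crawler is created), here is a rough illustration of what a dependency clause means at the CloudFormation level. It is not the actual change from #385. The logical IDs, bucket name, role, and database name are all made up, and the helper `add_depends_on` is hypothetical. The assumption is that the crawler refers to the bucket by a plain string rather than a `Ref`, so CloudFormation cannot infer the creation order on its own and needs an explicit `DependsOn`.

```python
import json
from typing import Any, Dict

# Hypothetical template fragment: the crawler targets the bucket by its
# literal name, so CloudFormation sees no implicit dependency between them.
template: Dict[str, Any] = {
    "Resources": {
        "DatasetBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {"BucketName": "example-dataset-bucket"},
        },
        "DatasetCrawler": {
            "Type": "AWS::Glue::Crawler",
            "Properties": {
                "Role": "example-crawler-role",
                "DatabaseName": "example_db",
                "Targets": {"S3Targets": [{"Path": "s3://example-dataset-bucket/"}]},
            },
        },
    }
}


def add_depends_on(tpl: Dict[str, Any], resource_id: str, dependency_id: str) -> Dict[str, Any]:
    """Add an explicit DependsOn so `resource_id` is created after `dependency_id`."""
    resources = tpl["Resources"]
    if dependency_id not in resources:
        raise KeyError(f"unknown resource: {dependency_id}")
    resource = resources[resource_id]
    existing = resource.get("DependsOn", [])
    if isinstance(existing, str):  # DependsOn may be a single string or a list
        existing = [existing]
    if dependency_id not in existing:
        existing = existing + [dependency_id]
    resource["DependsOn"] = existing
    return tpl


if __name__ == "__main__":
    add_depends_on(template, "DatasetCrawler", "DatasetBucket")
    print(json.dumps(template["Resources"]["DatasetCrawler"], indent=2))
```

In a CDK stack the same effect is normally obtained by declaring the dependency on the construct (for example with `node.add_dependency`), which CDK then synthesizes into a `DependsOn` entry like the one above.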

dlpzx added the status: can't reproduce label Apr 20, 2023

dlpzx commented Jun 5, 2023

Closing due to inactivity. Re-open if needed.

dlpzx closed this as completed Jun 5, 2023