Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix created dataset naming convention #1002

Merged
merged 3 commits into from
Jan 26, 2024
Merged

Fix created dataset naming convention #1002

merged 3 commits into from
Jan 26, 2024

Conversation

noah-paige
Copy link
Contributor

Feature or Bugfix

  • Bugfix

Detail

  • Before we were creating naming convention for s3 buckets, kms keys, etc. for newly create datasets without using the targetUri because we were referencing the Dataset object before it was added to the RDS Table and thus before the datasetUri is created

Relates

  • N/A

Security

Please answer the questions below briefly where applicable, or write N/A. Based on
OWASP 10.

  • Does this PR introduce or modify any input fields or queries - this includes
    fetching data from storage outside the application (e.g. a database, an S3 bucket)?
    • Is the input sanitized?
    • What precautions are you taking before deserializing the data you consume?
    • Is injection prevented by parametrizing queries?
    • Have you ensured no eval or similar functions are used?
  • Does this PR introduce any functionality or component that requires authorization?
    • How have you ensured it respects the existing AuthN/AuthZ mechanisms?
    • Are you logging failed auth attempts?
  • Are you using or adding any cryptographic features?
    • Do you use a standard proven implementations?
    • Are the used keys controlled by the customer? Where are they stored?
  • Are you introducing any new policies/roles/users?
    • Have you used the least-privilege principle? How?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@noah-paige noah-paige marked this pull request as ready for review January 25, 2024 19:58
@dlpzx
Copy link
Contributor

dlpzx commented Jan 26, 2024

Great catch, thanks for the quick PR. Overall changes would solve the issue but I think the create_dataset function in dataset_service can be improved. Right now in that function we:

  1. check AWS account (check Quicksight subscription)
  2. create the Dataset in RDS with the function build_dataset
  3. check the imported resources
  4. create Activity in RDS and update some dataset info with the function create_dataset
  5. create bucket in RDS, create permissions, create stack in RDS, upsert in Catalog and deploy stack

The problem is that if the imported resources fail, we would have an orphan dataset in the UI and no permissions are attached at that point. Instead I would re-order the actions as:

  1. check AWS account (check Quicksight subscription)
  2. check the imported resources
  3. create the Dataset in RDS + set_aws_resources + create Activity
  4. create bucket in RDS, create permissions, create stack in RDS, upsert in Catalog and deploy stack

@noah-paige
Copy link
Contributor Author

Great catch, thanks for the quick PR. Overall changes would solve the issue but I think the create_dataset function in dataset_service can be improved. Right now in that function we:

1. check AWS account (check Quicksight subscription)

2. create the Dataset in RDS with the function `build_dataset`

3. check the imported resources

4. create Activity in RDS and update some dataset info with the function `create_dataset`

5. create bucket in RDS, create permissions, create stack in RDS, upsert in Catalog and deploy stack

The problem is that if the imported resources fail, we would have an orphan dataset in the UI and no permissions are attached at that point. Instead I would re-order the actions as:

1. check AWS account (check Quicksight subscription)

2. check the imported resources

3. create the Dataset in RDS + set_aws_resources + create Activity

4. create bucket in RDS, create permissions, create stack in RDS, upsert in Catalog and deploy stack

I do not think the above is true - after this PR the way we implemented create_dataset is:

  1. check AWS account (check Quicksight subscription)
  2. create Dataset Model + _set_import_data (not inserted into RDS)
  3. check the imported resources
  4. create the Dataset in RDS + set_aws_resources + create Activity
  5. create bucket in RDS, create permissions, create stack in RDS, upsert in Catalog and deploy stack

I also tested by importing a dataset with incorrect KMS Key alias and there was no orphaned dataset in RDS. Let me know if I am missing some edge case / testing scenario though

@dlpzx
Copy link
Contributor

dlpzx commented Jan 26, 2024

My bad! I did not see the commit(dataset) in create_dataset. I deployed the PR in AWS and I confirm:

  • CICD pipeline succeeds
  • Dataset creation/import succeeds with proper names

@noah-paige noah-paige merged commit 3e33479 into main Jan 26, 2024
8 checks passed
@dlpzx dlpzx deleted the fix-dataset-naming branch April 25, 2024 13:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants