
Integration tests executed on a real deployment as part of the CICD - Datasets #1358

Open
dlpzx opened this issue Jun 24, 2024 · 1 comment · Fixed by #1379

Comments


dlpzx commented Jun 24, 2024

Same as for #1220.

This issue tracks the progress for the Datasets modules.
It has its own dedicated issue because of the additional challenge of the pre-existing infrastructure needed to test datasets.


dlpzx commented Jul 1, 2024

Required tests for basic coverage

#1379

For fresh deployments

For each of the following API calls we need to test authorized and unauthorized scenarios, as well as all possible configurations (e.g. auto-approval...). A minimal sketch of the authorized/unauthorized test pattern is shown after the list below.

  • Create Dataset - includes testing if dataset is indexed in Catalog
  • Import Dataset - includes testing if dataset is indexed in Catalog
  • List Datasets
  • Edit Dataset
  • Delete dataset
  • Start crawler
  • Sync tables
  • Preview table
  • Delete table
  • Create folder
  • Delete folder
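
A minimal sketch of that pattern, assuming a pytest setup with GraphQL client fixtures for two users (`client1` belongs to the dataset admin group, `client2` does not) and a hypothetical `create_dataset` helper wrapping the createDataset mutation; all names here are illustrative, not the final test code.

```python
from assertpy import assert_that  # assertion helper; assumed available in the test environment


def create_dataset(client, label, group, environment_uri):
    # Hypothetical wrapper around the createDataset GraphQL mutation.
    query = {
        'operationName': 'createDataset',
        'variables': {'input': {'label': label, 'SamlAdminGroupName': group, 'environmentUri': environment_uri}},
        'query': 'mutation createDataset($input: NewDatasetInput) { createDataset(input: $input) { datasetUri label } }',
    }
    return client.query(query=query).data.createDataset


def test_create_dataset_authorized(client1, group1, env1):
    # Authorized scenario: the admin group can create the dataset.
    dataset = create_dataset(client1, 'it-created-dataset', group1, env1.environmentUri)
    assert_that(dataset.datasetUri).is_not_none()


def test_create_dataset_unauthorized(client2, group1, env1):
    # Unauthorized scenario: a user outside the admin group must be rejected.
    assert_that(create_dataset).raises(Exception).when_called_with(
        client2, 'it-created-dataset', group1, env1.environmentUri
    ).contains('UnauthorizedOperation')
```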

For backwards compatibility

  • Update created Dataset stack
  • Update imported Dataset stack
  • Create Dataset in updated Environment
  • Import Dataset in updated Environment

Full coverage

For fresh deployments

For backwards compatibility

For the updated Dataset stacks:

  • AWS access to Dataset - Credentials
  • AWS access to Dataset - S3 redirect
  • Start crawler
  • Sync tables

@dlpzx dlpzx linked a pull request Jul 2, 2024 that will close this issue
dlpzx added a commit that referenced this issue Jul 9, 2024
…a.all (#1379)

### Feature or Bugfix
- Feature

### Detail
It implements some tests for s3_datasets (check full list in #1358)
### For fresh deployments
- [x] Create Dataset
- [x] Import Dataset --> IMPORTANT: see the details below on the AWS actions needed for testing
- [x] List Datasets
- [X] Get Dataset
- [x] Edit Dataset
- [x] Delete dataset - decision: I only added an explicit test for
delete_unauthorized, since delete_dataset is covered in the fixtures and
it takes a long time to deploy + delete. If needed we can introduce the
test for better reporting.
- [X] Access dataset assume role url
- [X] Generate dataset access token
- [X] Dataset upload data presigned url
- [X] Backwards compatibility - update dataset
- [X] Backwards compatibility - import dataset

🔦 **AWS actions outside of data.all**
There are some actions that in real life are performed outside of
data.all. To run the tests we need to either perform these actions
manually before the tests are executed or use the AWS SDK to automate
them. The most important actions performed outside of data.all are:
- Creation of consumption roles
- Creation of the imported dataset bucket, KMS key and Glue database (done IN THIS PR)
- Create VPCs for Notebooks
- Validate shares - we assume the share request role for this

To create resources we need to assume a role in the environment account.
We could assume the pivot role, but then we would need to ensure that it
has CreateBucket... permissions, which is not the case. I have opted to
create a separate isolated role `dataall-integration-tests-role` as part
of the environment stack ONLY when we are creating environments during
integration testing. As part of the global config of environments, users
can use the boto3 session of this role to perform direct AWS calls in
the environment account (see the sketch below).
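
For illustration, a rough sketch of how the tests could obtain a boto3 session in the environment account by assuming that role and pre-create the resources the import-dataset tests expect; the exact plumbing in data.all may differ, and the bucket/database names are placeholders.

```python
import boto3


def environment_session(environment_account_id: str, region: str) -> boto3.Session:
    # Assume the dedicated integration-tests role deployed by the environment stack.
    credentials = boto3.client('sts').assume_role(
        RoleArn=f'arn:aws:iam::{environment_account_id}:role/dataall-integration-tests-role',
        RoleSessionName='dataall-integration-tests',
    )['Credentials']
    return boto3.Session(
        aws_access_key_id=credentials['AccessKeyId'],
        aws_secret_access_key=credentials['SecretAccessKey'],
        aws_session_token=credentials['SessionToken'],
        region_name=region,
    )


def create_imported_dataset_prerequisites(session: boto3.Session, bucket: str, database: str):
    # Pre-create the S3 bucket and Glue database that the import-dataset tests expect to exist.
    session.client('s3').create_bucket(Bucket=bucket)  # add CreateBucketConfiguration outside us-east-1
    session.client('glue').create_database(DatabaseInput={'Name': database})
```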

In https://github.com/data-dot-all/dataall/pull/1382/files we discussed
some alternatives. In this PR we use the `environmentType` variable in
the environment model, which was not used for anything (it always
defaulted to Data environments). The create-environment API call takes
environmentType = IntegrationTesting as input ---> in the environment
stack we check the environment type and deploy the integration test role.

Then we use an SSM parameter to read the tooling account ID needed for
the assume-role trust policy (sketched below).
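
A rough sketch (not the actual stack code) of the conditional role creation described above, using the CDK for Python; the SSM parameter name is an assumption.

```python
from aws_cdk import aws_iam as iam, aws_ssm as ssm


def add_integration_tests_role(scope, environment_type: str):
    # Only environments created with environmentType = IntegrationTesting get the extra role.
    if environment_type != 'IntegrationTesting':
        return None
    # Read the tooling account id from SSM so the role trusts the account that runs the tests.
    tooling_account_id = ssm.StringParameter.value_for_string_parameter(
        scope, parameter_name='/dataall/toolingAccount'  # hypothetical parameter name
    )
    return iam.Role(
        scope,
        'IntegrationTestsRole',
        role_name='dataall-integration-tests-role',
        assumed_by=iam.AccountPrincipal(tooling_account_id),
    )
```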


### Relates
- #1358

### Security
Please answer the questions below briefly where applicable, or write
`N/A`. Based on
[OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this
includes
fetching data from storage outside the application (e.g. a database, an
S3 bucket)?
  - Is the input sanitized?
- What precautions are you taking before deserializing the data you
consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires
authorization?
- How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use standard, proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

---------

Co-authored-by: Noah Paige <noahpaig@amazon.com>
@dlpzx dlpzx added this to v2.7.0 Jul 12, 2024
@github-project-automation github-project-automation bot moved this to Nominated in v2.7.0 Jul 12, 2024
@dlpzx dlpzx reopened this Jul 12, 2024
@dlpzx dlpzx moved this from Nominated to Prioritized To do in v2.7.0 Jul 12, 2024
@dlpzx dlpzx moved this from Backlog to In progress in v2.7.0 Sep 5, 2024
@dlpzx dlpzx added this to v2.8.0 Sep 9, 2024
@github-project-automation github-project-automation bot moved this to Nominated in v2.8.0 Sep 9, 2024
@dlpzx dlpzx removed this from v2.8.0 Sep 9, 2024
noah-paige added a commit that referenced this issue Sep 16, 2024
### Feature or Bugfix
- Feature


### Detail
- Adding integration tests for Dataset Table Data Filters

- PENDING TESTS PASSING IN DEV AWS ENV
- Merge after #1391

### Relates
- related to #1220 and
#1358


### Security
Please answer the questions below briefly where applicable, or write
`N/A`. Based on
[OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this
includes
fetching data from storage outside the application (e.g. a database, an
S3 bucket)?
  - Is the input sanitized?
- What precautions are you taking before deserializing the data you
consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires
authorization?
- How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use standard, proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

---------

Co-authored-by: dlpzx <dlpzx@amazon.com>
dlpzx added a commit that referenced this issue Sep 18, 2024
#1533)

### Feature or Bugfix
- Feature: Tests
TO BE MERGED AFTER #1391

### Detail
Follow-up of #1391. This PR
adds:
- Tests for profiling jobs - because it is an easy submodule I decided
to "chain" the tests and make each one dependent on the previous one
(see the sketch after this list). I could also create a fixture for a
profiling job (note the warning: profiling jobs cannot be deleted)
- Added missing tests in datasets_base - we still need to add redshift
datasets and other dataset types every time a new dataset type is added.
- Added missing tests in s3_datasets:
test_list_s3_datasets_owned_by_env_group.
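
Illustrative sketch of the chained profiling-job tests (hypothetical helper names, not the merged code): a module-scoped fixture starts a single profiling run and the subsequent tests reuse it, since profiling runs cannot be deleted.

```python
import pytest


@pytest.fixture(scope='module')
def profiling_run(client1, table1):
    # Started once per module; there is no teardown because profiling runs cannot be deleted.
    return start_dataset_profiling_run(client1, table_uri=table1.tableUri)  # hypothetical helper


def test_start_profiling_run(profiling_run):
    assert profiling_run.profilingRunUri


def test_get_profiling_run(client1, profiling_run):
    # Depends on the run created above, i.e. the tests are chained through the fixture.
    run = get_dataset_profiling_run(client1, profiling_run.profilingRunUri)  # hypothetical helper
    assert run.status in ('RUNNING', 'SUCCEEDED')
```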

⚠️ Issues discovered during testing.
They are not bugs, they are missing functionality:
- Profiling jobs can never be deleted. A profiling run is just a record
in the RDS database, but nevertheless it cannot be deleted.
- It would be nice to have an API that checks the status of a Glue
crawler (a boto3 polling workaround is sketched below)
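
Until such an API exists, the tests could poll the crawler directly with boto3 using the environment-account session from #1379; a minimal sketch with an illustrative crawler name and timeout:

```python
import time


def wait_for_crawler(session, crawler_name: str, timeout_seconds: int = 900) -> str:
    # Poll the Glue crawler until it returns to READY (i.e. the run has finished).
    glue = session.client('glue')
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        state = glue.get_crawler(Name=crawler_name)['Crawler']['State']
        if state == 'READY':
            return state
        time.sleep(15)
    raise TimeoutError(f'Crawler {crawler_name} did not finish within {timeout_seconds}s')
```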

### Relates
- #1358
- #1391
### Security
Please answer the questions below briefly where applicable, or write
`N/A`. Based on
[OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this
includes
fetching data from storage outside the application (e.g. a database, an
S3 bucket)?
  - Is the input sanitized?
- What precautions are you taking before deserializing the data you
consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires
authorization?
- How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use standard, proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

---------

Co-authored-by: Noah Paige <noahpaig@amazon.com>
dlpzx added a commit that referenced this issue Sep 24, 2024
### Feature or Bugfix
- Feature: Testing

### Detail
Follow-up of #1391 

- Implement Table Column tests

### Relates
- #1358
- #1391 

### Security
Please answer the questions below briefly where applicable, or write
`N/A`. Based on
[OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this
includes
fetching data from storage outside the application (e.g. a database, an
S3 bucket)?
  - Is the input sanitized?
- What precautions are you taking before deserializing the data you
consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires
authorization?
- How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use standard, proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

---------

Co-authored-by: Noah Paige <noahpaig@amazon.com>