Add Dataset integration tests - Tables, Folders #1391

noah-paige · 2024-07-09T02:52:13Z

Feature or Bugfix

Feature: Testing

Detail

In this PR we add new fixtures for S3 datasets. They are used by the tests for S3 datasets, tables and folders, and also by the tests developed in #1389:

- Fix the imported KMS dataset: there was an error in the KMS keys and in the registration of the Glue database.
- Folders as a separate fixture, created with the create_folder data.all API.
- Tables as a separate fixture: boto3 calls create the table and upload data, then the sync_tables data.all API is used. The data can be queried!
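As a rough illustration of the table fixture described above, the boto3 part could look like the sketch below. The helper names (`build_glue_table_input`, `create_table_with_data`), the column list and the CSV layout are assumptions made for this example and are not taken from the PR; `session` is assumed to be a `boto3.Session` with access to the dataset account. The subsequent sync_tables call through the data.all API is not shown.

```python
from typing import Dict, List, Tuple


def build_glue_table_input(table_name: str, bucket: str, columns: List[Tuple[str, str]]) -> Dict:
    """Build the TableInput argument for Glue's create_table call.

    The table points at s3://<bucket>/<table_name>/ and expects CSV files.
    """
    return {
        'Name': table_name,
        'TableType': 'EXTERNAL_TABLE',
        'Parameters': {'classification': 'csv'},
        'StorageDescriptor': {
            'Columns': [{'Name': name, 'Type': col_type} for name, col_type in columns],
            'Location': f's3://{bucket}/{table_name}/',
            'InputFormat': 'org.apache.hadoop.mapred.TextInputFormat',
            'OutputFormat': 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat',
            'SerdeInfo': {
                'SerializationLibrary': 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe',
                'Parameters': {'field.delim': ','},
            },
        },
    }


def create_table_with_data(session, database: str, bucket: str, table_name: str) -> None:
    """Create the Glue table and upload one small CSV file (needs AWS credentials)."""
    columns = [('id', 'int'), ('name', 'string')]  # hypothetical sample schema
    session.client('glue').create_table(
        DatabaseName=database,
        TableInput=build_glue_table_input(table_name, bucket, columns),
    )
    session.client('s3').put_object(
        Bucket=bucket,
        Key=f'{table_name}/sample.csv',
        Body=b'1,alice\n2,bob\n',
    )
```

Keeping the `TableInput` construction in a pure function makes it possible to check the table definition without touching AWS; only `create_table_with_data` needs real credentials.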

This PR moves dataset_base testing scenarios to datasets_base/test_dataset.py. Testing scenarios have been defined for the S3 datasets and the remaining test scenarios for the datasets_base APIs are defined with their signature and a TODO comment.

It also splits the S3 dataset tests into their corresponding API subcategories (in backend/.../s3_datasets/api):

- test_s3_datasets
- test_s3_tables
- test_s3_tables_profiling
- test_s3_tables_columns
- test_s3_folders

Implement testing scenarios for test_s3_folders covering all APIs and dataset types (parametrized tests). Note that to avoid duplication of tests, unauthorized test cases are tested with only one of the dataset types as the code executed is the same for all cases.
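A minimal sketch of this pattern with pytest is shown below. The fixture names, their return values and the test bodies are hypothetical placeholders, not the fixtures defined in this PR; the point is only that the authorized test is parametrized over every dataset type while the unauthorized test picks a single one.

```python
import pytest

# Hypothetical fixture names, one per dataset type.
DATASET_FIXTURES = ['new_dataset', 'imported_sse_s3_dataset', 'imported_kms_dataset']


@pytest.fixture
def new_dataset():
    return {'type': 'new'}


@pytest.fixture
def imported_sse_s3_dataset():
    return {'type': 'imported-sse-s3'}


@pytest.fixture
def imported_kms_dataset():
    return {'type': 'imported-kms'}


@pytest.mark.parametrize('dataset_fixture_name', DATASET_FIXTURES)
def test_create_folder(dataset_fixture_name, request):
    # Resolve the fixture by name so one test body covers every dataset type.
    dataset = request.getfixturevalue(dataset_fixture_name)
    assert dataset['type']


def test_create_folder_unauthorized(request):
    # The authorization code path is shared, so one dataset type is enough.
    dataset = request.getfixturevalue(DATASET_FIXTURES[0])
    assert dataset['type'] == 'new'
```

Resolving fixtures through `request.getfixturevalue` keeps the parameter list as plain strings, so expensive dataset fixtures are only created for the tests that actually request them.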

Implement testing scenarios for test_s3_tables covering all APIs and dataset types (parametrized tests). As with folders, unauthorized tests are performed on a single dataset type. New tests include: sync_tables with real tables, preview tables with real tables, preview unauthorized depending on the confidentiality level, get_dataset_level, list_dataset_tables.

For test_s3_datasets, only test_create_dataset_unauthorized is new; the other existing tests now run for all dataset types (parametrized tests).

Next steps

In follow-up PRs we should implement the missing commented TODO tests for:

- datasets_base ---> list owned tests
- s3_datasets ---> list owned tests
- s3_tables ---> data filters tests
- s3_tables_profiling ---> some tests
- s3_tables_columns ---> all tests
- Review backwards compatibility tests and add table and folder test cases

Relates

Security

Please answer the questions below briefly where applicable, or write N/A. Based on OWASP 10.

- Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)?
  - Is the input sanitized?
  - What precautions are you taking before deserializing the data you consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no eval or similar functions are used?
- Does this PR introduce any functionality or component that requires authorization?
  - How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use standard proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.


SofiaSazonova · 2024-09-10T14:08:52Z

tests_new/integration_tests/modules/s3_datasets/global_conftest.py

-    dataset_name, client, group, env, bucket=None, kms_alias=None, glue_database=None
-):
-    dataset_name = 'persistent_s3_dataset1'
+def get_or_create_persistent_s3_dataset(dataset_name, client, group, env, bucket=None, kms_alias='', glue_database=''):


dataset_name is further used only for bucket naming and tags. Let's use it as the dataset name as well. Otherwise all datasets have the same name, TestDatasetCreated.
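To illustrate the suggestion, here is a minimal get-or-create sketch in which `dataset_name` is also the dataset's label. `FakeClient` and its methods are an in-memory stand-in invented for this example; they do not reflect the real data.all test client or the signature of get_or_create_persistent_s3_dataset.

```python
from typing import Dict, List, Optional


class FakeClient:
    """In-memory stand-in for the test API client (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.datasets: List[Dict] = []

    def list_datasets(self, term: str) -> List[Dict]:
        # Substring search, similar in spirit to a filter term on a list API.
        return [d for d in self.datasets if term in d['label']]

    def create_dataset(self, label: str, bucket: Optional[str] = None) -> Dict:
        dataset = {'label': label, 'bucket': bucket or f'bucket-{label}'}
        self.datasets.append(dataset)
        return dataset


def get_or_create_dataset(client: FakeClient, dataset_name: str, bucket: Optional[str] = None) -> Dict:
    """Return the dataset labelled dataset_name, creating it only if it is missing."""
    matches = [d for d in client.list_datasets(term=dataset_name) if d['label'] == dataset_name]
    if matches:
        return matches[0]
    # Using dataset_name as the label keeps persistent datasets distinguishable.
    return client.create_dataset(label=dataset_name, bucket=bucket)
```

With a unique label per persistent dataset, a second test run finds the existing dataset by name instead of creating another one with an identical generic name.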

dlpzx · 2024-09-12T08:37:24Z

Tested with latest changes locally and in a real CICD in AWS

SofiaSazonova

Tested in my deployment. Good to go

petrkalos · 2024-09-13T13:28:16Z

tests_new/integration_tests/modules/s3_datasets/test_s3_tables_profiling.py

+    ).contains('UnauthorizedOperation', 'PROFILE_DATASET_TABLE', dataset_uri)
+
+
+def test_list_table_profiling_runs():


you could mark them as @pytest.mark.skip(reason="no way of currently testing this") to avoid polluting the reports
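For reference, a placeholder test marked this way could look like the sketch below. The test name comes from the diff above; the `SKIP_REASON` constant and the empty body are assumptions for illustration.

```python
import pytest

SKIP_REASON = 'no way of currently testing this'


@pytest.mark.skip(reason=SKIP_REASON)
def test_list_table_profiling_runs():
    # Placeholder: only the signature is defined, the body comes in a follow-up PR.
    pass
```

pytest then reports the test as skipped with the given reason instead of counting an empty test as passed.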

### Feature or Bugfix
- Feature

### Detail
- Adding integration tests for Dataset Table Data Filters
- PENDING TESTS PASSING IN DEV AWS ENV
- Merge after #1391

### Relates
- related to #1220 and #1358

Co-authored-by: dlpzx <dlpzx@amazon.com>

#1533)

### Feature or Bugfix
- Feature: Tests

TO BE MERGED AFTER #1391

### Detail
Follow-up of #1391. This PR adds:
- Tests for profiling jobs. Because it is an easy submodule, I decided to "chain" the tests so that each one depends on the previous one. I could also create a fixture for a profiling job (check warning, profiling jobs cannot be deleted).
- Missing tests in datasets_base. We still need to add redshift datasets, and other dataset types every time a new one is added.
- Missing tests in s3_datasets: test_list_s3_datasets_owned_by_env_group.

⚠️ Issues discovered during testing. They are not bugs, they are missing functionalities:
- Profiling jobs can never be deleted. It is just information on the RDS database, but nevertheless it cannot be deleted.
- It would be nice to have an API that checks the status of a Glue crawler.

### Relates
- #1358
- #1391

Co-authored-by: Noah Paige <noahpaig@amazon.com>

### Feature or Bugfix
- Feature: Testing

### Detail
Follow-up of #1391
- Implement Table Column tests

### Relates
- #1358
- #1391

Co-authored-by: Noah Paige <noahpaig@amazon.com>

dlpzx and others added 18 commits July 1, 2024 19:08

Add integration tests for datasets - basic queries and conftest

b188538

add list + get queries, add persistent datasets, begin create/update/delete tests

45d1407

Add integration test role in Environment stack + session in conftest + aws clients for dataset

cd27097

simplified conftests for datasets

d04b525

create integration role with region in name

5e5507e

New environment type: IntegrationTests + ssm param with tooling account id

fa69dde

Error on cdk add_to_policy

3e19596

Add filter term include tags datasets

c05de67

Add sample data and tests for dataset role access

8f2a918

Add sample data and tests for dataset role access

9b2c711

Add assume role permissions to codebuild role

2dcd60f

Add naming checks in clients + create table

c261da7

Add permissions, confidentiality and commented tests

1e9732b

revert persistent environment

5ea8b6b

Fix check_stack_ready in dataset creation

520a34e

Revert session environment and add tests

972c883

fix integration role datasets

7b1c942

Fix presigned URL upload test

d9042dc

dlpzx changed the title ~~Feat/integration tests datasets pt2~~ Add Dataset integration tests - Tables, Folders Jul 9, 2024

dlpzx added 2 commits July 9, 2024 15:40

Uncomment drafted table/folder tests

b633938

SofiaSazonova reviewed Sep 3, 2024

tests_new/integration_tests/modules/s3_datasets/global_conftest.py

SofiaSazonova reviewed Sep 3, 2024

tests_new/integration_tests/modules/s3_datasets/global_conftest.py

dlpzx added 7 commits September 5, 2024 10:30

Merge branch 'refs/heads/main' into feat/integration-tests-datasets-pt2

2330021

Ruff and readme

a857d0a

Split dataset tests and added signature of each test for all APIs. Fix issue with retry decorator

5968fd3

Added all dataset query definitions and placeholders for tests

052bc7e

Started parametrization of tests

9ad774e

Started parametrization of tests

146c45e

Started parametrization of tests

0907ea3

SofiaSazonova reviewed Sep 10, 2024

dlpzx added 4 commits September 10, 2024 17:44

Moving fixture parameters to conftest

196fb6e

Update requisite in README

d3bb8be

PR review comments - functions to create AWS imported resources, names

ec38ec0

PR review comments - 2

ccb6887

dlpzx requested a review from petrkalos September 11, 2024 09:38

Merge branch 'refs/heads/main' into feat/integration-tests-datasets-pt2

01c65c8

dlpzx mentioned this pull request Sep 11, 2024

Add Dataset integration tests - Dataset missing tests, Table Profiling #1533

Merged

dlpzx requested a review from SofiaSazonova September 11, 2024 15:22

dlpzx force-pushed the feat/integration-tests-datasets-pt2 branch from 93b0997 to 6e9fbb1 Compare September 12, 2024 06:48

Issue persistent buckets

5358677

dlpzx force-pushed the feat/integration-tests-datasets-pt2 branch from 6e9fbb1 to 5358677 Compare September 12, 2024 07:56

dlpzx force-pushed the feat/integration-tests-datasets-pt2 branch from 7e51219 to 4084a3b Compare September 12, 2024 12:05

noah-paige mentioned this pull request Sep 12, 2024

Feat/integration tests dataset filters #1539

Merged

dlpzx force-pushed the feat/integration-tests-datasets-pt2 branch from 4084a3b to 0eb4f73 Compare September 12, 2024 12:41

Rewrite if-clause existing infra and resource for imported dataset

590909b

dlpzx force-pushed the feat/integration-tests-datasets-pt2 branch from 0eb4f73 to 590909b Compare September 12, 2024 13:35

SofiaSazonova approved these changes Sep 12, 2024

Small return issue

0674531

dlpzx force-pushed the feat/integration-tests-datasets-pt2 branch from 84dcfea to 0674531 Compare September 13, 2024 08:55

dlpzx mentioned this pull request Sep 13, 2024

Add Dataset integration tests - Table Columns #1548

Merged

petrkalos approved these changes Sep 13, 2024

petrkalos reviewed Sep 13, 2024

dlpzx merged commit 405019d into main Sep 13, 2024
9 checks passed

dlpzx mentioned this pull request Sep 16, 2024

Integration tests executed on a real deployment as part of the CICD - Datasets #1358

Open

dlpzx deleted the feat/integration-tests-datasets-pt2 branch September 19, 2024 09:28
